Skip to content

Oracle恢复 - 8. page

ORA-1194/00354/00353/00334处理思路:_allow_resetlogs_corruption

遭遇ORA-1194,00354等错误,在线日志损坏,处理思路过程如下,

--告警报错信息
Thu Jul 24 00:16:08 2014
ORA-1194 signalled during: alter database open resetlogs...
Thu Jul 24 00:16:23 2014
ALTER DATABASE RECOVER database until cancel
Thu Jul 24 00:16:23 2014
Media Recovery Start
parallel recovery started with 2 processes
ORA-279 signalled during: ALTER DATABASE RECOVER database until cancel ...
Thu Jul 24 00:18:35 2014
ALTER DATABASE RECOVER LOGFILE '/oradata/ora10g/lu10g/redo03.log'
Thu Jul 24 00:18:35 2014
Media Recovery Log /oradata/ora10g/lu10g/redo03.log
Thu Jul 24 00:18:36 2014
Errors in file /ora10g/admin/lu10g/udump/lu10g_ora_14994.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 1508 change 4305448435 time 07/23/2014 23:56:57
ORA-00334: archived log: '/oradata/ora10g/lu10g/redo03.log'
Recovery interrupted!
Recovered data files to a consistent state at change 10480752

通过告警报错ORA-00354: corrupt redo log block header 从该错误可以看出当前日志的redo block的header出现问题,导致oracle无法读取redo进行恢复。通过以下检查发现数据库文件头都一致,没有需要恢复的文件,但是由于数据库是非正常关闭,last_change scn没有被写入,数据库判断需要恢复,则读取相应的redo log group 3,但是此时该文件已经损坏,尝试清理log,但是因为group 3在宕机时候为current状态,所以不能被清楚,则尝试通过设置_allow_reset_logs_corruption参数让系统跳过该错误,具体如下过程:

--检查文件状态
SQL> select file#,online_status,to_char(change#,'999999999999') from v$recover_file;

no rows selected

SQL> select file#,to_char(checkpoint_change#,'999999999999'),
2 to_char(last_change#,'999999999999') from v$datafile;

FILE# TO_CHAR(CHECK TO_CHAR(LAST_
---------- ------------- -------------
1 10480749
2 10480749
3 10480749
4 10480749
SQL> /

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
1 1 31 52428800 1 YES INACTIVE 10391212 19-JUL-14
3 1 33 52428800 1 NO CURRENT 10480310 23-JUL-14
2 1 32 52428800 1 YES INACTIVE 10427919 20-JUL-14

SQL> alter database clear unarchived logfile group 3;
alter database clear unarchived logfile group 3
*
ERROR at line 1:
ORA-01624: log 3 needed for crash recovery of instance lu10g (thread 1)
ORA-00312: online log 3 thread 1: '/oradata/ora10g/lu10g/redo03.log'

--在pfile中加入参数
_allow_resetlogs_corruption=true
_allow_error_simulation=TRUE

重启数据库以resetlog方式打开,得到以下告警信息:

Thu Jul 24 00:45:45 2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Thu Jul 24 00:45:45 2014
Errors in file /ora10g/admin/lu10g/udump/lu10g_ora_16481.trc:
ORA-00265: instance recovery required, cannot set ARCHIVELOG mode
Archive all online redo logfiles failed:265
RESETLOGS after incomplete recovery UNTIL CHANGE 10480752
Resetting resetlogs activation ID 2561579617 (0x98ae9a61)
Thu Jul 24 00:45:47 2014
Setting recovery target incarnation to 3
Thu Jul 24 00:45:47 2014
Assigning activation ID 2580873586 (0x99d50172)
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=16, OS id=16490
Thu Jul 24 00:45:47 2014
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=17, OS id=16492
Thu Jul 24 00:45:47 2014
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /oradata/ora10g/lu10g/redo01.log
Successful open of redo thread 1
Thu Jul 24 00:45:47 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Jul 24 00:45:47 2014
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Thu Jul 24 00:45:47 2014
ARC1: Becoming the heartbeat ARCH
Thu Jul 24 00:45:47 2014
SMON: enabling cache recovery
Thu Jul 24 00:45:47 2014
Errors in file /ora10g/admin/lu10g/udump/lu10g_ora_16481.trc:
ORA-00600: internal error code, arguments: [2662], [0], [10480758], [0], [10480787], [4194313], [], []
Thu Jul 24 00:45:48 2014
Errors in file /ora10g/admin/lu10g/udump/lu10g_ora_16481.trc:
ORA-00600: internal error code, arguments: [2662], [0], [10480758], [0], [10480787], [4194313], [], []
Thu Jul 24 00:45:48 2014
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 16481
ORA-1092 signalled during: alter database open resetlogs...

至此数据库2662错误已经出来了,那就好办了,2662的处理办法参考手工推进SCN的办法以及《谈谈adjust_scn的计算方法》

手工推进oracle scn方式(1)Oradebug 推进scn

环境:LINUX_x86

这里需要注意地方为:

1.我的系统为32bit的小端系统.
2.大小端的SCN_WRAP和SCN_BASE是相反的.在大端中,kcsgscn_的前4字节为SCN_BASE,5-8字节为SCN_WRAP.
3.在32bit系统中寻址只能是4字节,而64位系统可以8字节.

该测试为32bit 小端系统,环境如下:

[root@ludatou ~]# file /sbin/init
/sbin/init: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped

[root@ludatou ~]# su - ora10g
[ora10g@ludatou ~]$ sqlplus '/as sysdba'

SQL*Plus: Release 10.2.0.4.0 - Production on Thu Jul 24 01:58:10 2014

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> startup mount
ORACLE instance started.

Total System Global Area  318767104 bytes
Fixed Size                  1267260 bytes
Variable Size             100665796 bytes
Database Buffers          213909504 bytes
Redo Buffers                2924544 bytes
Database mounted.

检查当前scn情况并以oradebug的方式推进scn:

--当前scn
SQL> select to_char(checkpoint_change#,'XXXXXXXXXXXXXXXX') from v$database;

TO_CHAR(CHECKPOIN
-----------------
A03E2F

SQL> oradebug setmypid
Statement processed.

--获取内存scn对应
SQL> oradebug dumpvar sga kcsgscn_
kcslf kcsgscn_ [20009228, 20009248) = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 20009034
--因为32bit系统所以采用4字节寻址,更改scn_wrap
SQL> oradebug poke 0x20009228 4 0x01
BEFORE: [20009228, 2000922C) = 00000000
AFTER: [20009228, 2000922C) = 00000001
--更改scn_base
SQL> oradebug poke 0x2000922C 4 0x01
BEFORE: [2000922C, 20009230) = 00000000
AFTER: [2000922C, 20009230) = 00000001
SQL> oradebug dumpvar sga kcsgscn_
kcslf kcsgscn_ [20009228, 20009248) = 00000001 00000001 00000000 00000000 00000000 00000000 00000000 20009034
SQL> oradebug poke 0x2000922C 4 0xA03E2F
BEFORE: [2000922C, 20009230) = 00000001
AFTER: [2000922C, 20009230) = 00A03E2F
SQL> oradebug dumpvar sga kcsgscn_
kcslf kcsgscn_ [20009228, 20009248) = 00000001 00A03E2F 00000000 00000000 00000000 00000000 00000000 20009034
--更改完成后打开数据
SQL> alter database open;

Database altered.
--可以发现scn已经推进了wrap+1位
SQL> select to_char(current_scn,'XXXXXXXXXXXXXXXX') from v$database;

TO_CHAR(CURRENT_S
-----------------
100A04185

 

浅谈如何计算oracle文件的偏移量

##############################
www.ludatou.com 大头
转载请指明处,否则追究法律责任
##############################

了解oracle的偏移量可以帮助在一些特殊恢复场景时候急救需求用到。
关于oracle 偏移量在10g之前一直是DBA需要惦记的事情,曾经有一个在AIX平台存在的经典bug2620053就是因为关于在裸设备上偏移量的问题,在那个时代DBA在aix系统普遍使用裸设备,每次加数据文件总是要考虑到offset的问题。

在众多平台中,目前只有在AIX平台上在为Oracle使用裸设备时候,会预留出4k的空间保留裸设备元数据,俗称磁盘头(lv header)。而在以前HP TRU64也存在一样的情况,只是目前TRU 64已经不受Oracle支持了。
在每一个oracle文件都存在一个系统文件头(block 0),它们由oracle维护,数据文件上的BLOCK 0的大小等于该数据文件的block_size,而重做日志文件上的BLOCK 0除了hp平台为1M外其他都为512bytes(具体可以通过dump logfile,通过tracefile的block_size查出)。BLOCK 0之后的BLOCK 1就是Oracle header block。这里的BLOCK 1的offset就等于v$datafile中的block1_offset。

在正常情况下,不算上AIX,HP tru64的平台中使用裸设备的话,实际的oracle数据的byte offset通用计算方法如下:

byte offset=oracle_block_number * oracle_block_size (对应文件上的block number)
另外一种算法是
byte offset=block1_offset + 1 + ((oracle_block_number - 1) * oracle_block_size)
因为严格意义上讲,block1_offset并不是oracle真实数据的起始块,这两种结果都一样,只是从意义上区分开来了。

在AIX,HP tru64系统上使用裸设备的情况下,byte offset计算方法如下:

offset = raw_device_block0_offset + 1 + (oracle_block_number * oracle_block_size) 

其实到这里就可以看出来了,在AIX平台上,当使用裸设备时候(这里的裸设备不是使用-T -O的big vg或者scan vg划分出来的lv),脱离了文件系统管理,file_syatem_block0_offset为0。只有在使用文件系统存放oracle文件的时才存在file_syatem_block0_offset为对应的blocksize或者redoblocksize。

AIX平台裸设备offset:

raw_device_block0_offset = 4096
file_system_block0_offset = 0

HP tru64平台裸设备offset:

raw_device_block0_offset = 65536
file_system_block0_offset = 0

从oracle 9203开始,Oracle就建议使用rlv这种不带raw_device_block0_offset的类型裸设备(lv),具体信息参考
《关于aix上的逻辑卷偏移量》

相比较而言,

遭遇ora-00600 [2662]一例

遭遇2662错误,

Thu Jul 24 00:45:47 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Jul 24 00:45:47 2014
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Thu Jul 24 00:45:47 2014
ARC1: Becoming the heartbeat ARCH
Thu Jul 24 00:45:47 2014
SMON: enabling cache recovery
Thu Jul 24 00:45:47 2014
Errors in file /ora10g/admin/lu10g/udump/lu10g_ora_16481.trc:
ORA-00600: internal error code, arguments: [2662], [0], [10480758], [0], [10480787], [4194313], [], []
Thu Jul 24 00:45:48 2014
Errors in file /ora10g/admin/lu10g/udump/lu10g_ora_16481.trc:
ORA-00600: internal error code, arguments: [2662], [0], [10480758], [0], [10480787], [4194313], [], []
Thu Jul 24 00:45:48 2014
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 16481
ORA-1092 signalled during: alter database open resetlogs...

根据《谈谈adjust_scn的计算方法》的方式处理,使用adjust处理的计算过程如下:
因为c=0所以c*4=0,因为D=10480787 < 1073741824,所以WRAP=0+1,所以整个adjust的level 为1,则处理的语句为如下:
alter session set events ‘IMMEDIATE trace name ADJUST_SCN level 1’;

当然这里也可以通过oradebug poke的方式更改scn的wrap+1以达到一样的效果.

登录数据库执行处理如下:

SQL&gt; conn / as sysdba
Connected to an idle instance.
SQL&gt; startup mount
ORACLE instance started.
Total System Global Area 318767104 bytes
Fixed Size 1267260 bytes
Variable Size 100665796 bytes
Database Buffers 213909504 bytes
Redo Buffers 2924544 bytes
Database mounted.

SQL&gt; alter session set events 'IMMEDIATE trace name ADJUST_SCN level 1';

Session altered.

SQL&gt; alter database open;

Database altered.

至此数据库已经打开,此时应该用dump工具导出数据重建数据库,因为此时数据库运行会非常艰难,后台会不断报错,比如:
ORA-00600: internal error code, arguments: [kcsadjn1], [], [], [], [], [], [], []

所以建议还是需要做好备份以及容灾,避免出现各类问题。当然这里的scn还可以通过其他推进scn的方式来解决,比如在11204版本后,就需要使用其他手段来推进SCN,传统方式在110204已经不支持,具体参考以前写的手工推荐scn的5种方式。