Skip to content

Database - 41. page

有关log file parallel write等待

该事件只在lgwr进程上发生的等待事件,为lgwr进程将内存中redo entry写进redo file的IO工作期间等待该事件。此等待事件与db file parallel write等待原理类似。比如在IO系统没达到高负荷时候,但是脏缓冲图的数据量过多时候,db file parallel write的等待事件就会增多,与此相同,当IO没问题时候,但是重做数据量过多的时候,log file parallel write等待就可能会增加。该问题一方面受限于oracle lgwr的处理能力,一方面受限于redo entry的产生量(归根到底还是应用方面)。

如果想要降低不是因为IO问题引起的log file parallel write的等待,一般都从以下几点思考

1.减少提交次数
2.尽量少产生redo entry
3.避免hot backup
4.提高redo所在文件系统的io性能

该等待事件在平常能留给我们调整的余地较小,在目前的OLTP容灾普及的情况下,nologging方式基本不可能。

手工推进oracle scn方式(2)通过修改控制文件推进scn

手工推进oracle scn方式(2)通过修改控制文件推进scn

其实该方式可以理解为bbed修改数据文件头重建控制文件的另外一种方式,直接修改控制文件。生产库敬请谨慎变更!

该方式的要点,
1.定位控制文件的scn位置
2.修改scn以及check sum
3.重启DB确认修改成功

1.数据库当前的SCN号

SQL> select file#,checkpoint_change#,last_change# from v$datafile;

FILE# CHECKPOINT_CHANGE# LAST_CHANGE#
---------- ------------------ ------------
1 507224 507224
2 507224 507224
3 507224 507224
4 507224 507224

SQL> select checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
------------------
507224

SQL> select to_char('507224','xxxxxxxxx') from dual;

TO_CHAR('5
----------
7bd58

2.用ue打开控制文件
通过07bd58定位当前scn所在的块为00048000H,偏移量为60的行的第9个字节开始到11个字节之间;flag为该块第一行偏移量为15的0x04;checksum为该块第二行的offset 16,17的0x6682
C0DP4J09_R1{X$ZYMWTZS)O

这里将scn推进1个g,总的scn为1024*1024*1024+507224=1074249048,转化为16进制为如下

SQL> select to_char('1074249048','xxxxxxxxx') from dual;

TO_CHAR('1
----------
4007bd58

在UE修改控制文件将checksum以及flag清0,将scn变更为0x004007bd58,变更后的控制文件为如下图:

6]W127Q)LLLT$T$76_{VX~5

重启数据库验证

SQL> shutdown immediate
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 436207616 bytes
Fixed Size 1267908 bytes
Variable Size 125830972 bytes
Database Buffers 301989888 bytes
Redo Buffers 7118848 bytes
Database mounted.

检查修改后的SCN,已经变为我们要修改的SCN号了^@^

SQL> select checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
------------------
1074249048

SQL> alter database open;

Database altered.

在生产库上请慎重,该方式风险比较大。

通过oradebug的方式解决ORA-00600 [2662]

2662出问题的根源是数据文件的某些block的scn比文件头的scn还要大,通常处理的方法就是将当前的scn推进到比block的scn更大的时候就可以.

使用oradebug的方式处理该问题主要流程如下:

1.定位出问题的block,一般为2662错误后面的e为该block dba,查处该block的scn(2662错误后的d)
2.定位当前系统的scn,一般为2662错误后面的b为该current scn
3.核对系统scn与block scn的差值(根据adjust算法得出的scn差值 也可以)
4.使用oradebug 推荐当前系统scn比block还大为止
5.打开数据库,执行重建

ORA-1194/00354/00353/00334处理思路:_allow_resetlogs_corruption

遭遇ORA-1194,00354等错误,在线日志损坏,处理思路过程如下,

--告警报错信息
Thu Jul 24 00:16:08 2014
ORA-1194 signalled during: alter database open resetlogs...
Thu Jul 24 00:16:23 2014
ALTER DATABASE RECOVER database until cancel
Thu Jul 24 00:16:23 2014
Media Recovery Start
parallel recovery started with 2 processes
ORA-279 signalled during: ALTER DATABASE RECOVER database until cancel ...
Thu Jul 24 00:18:35 2014
ALTER DATABASE RECOVER LOGFILE '/oradata/ora10g/lu10g/redo03.log'
Thu Jul 24 00:18:35 2014
Media Recovery Log /oradata/ora10g/lu10g/redo03.log
Thu Jul 24 00:18:36 2014
Errors in file /ora10g/admin/lu10g/udump/lu10g_ora_14994.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 1508 change 4305448435 time 07/23/2014 23:56:57
ORA-00334: archived log: '/oradata/ora10g/lu10g/redo03.log'
Recovery interrupted!
Recovered data files to a consistent state at change 10480752

通过告警报错ORA-00354: corrupt redo log block header 从该错误可以看出当前日志的redo block的header出现问题,导致oracle无法读取redo进行恢复。通过以下检查发现数据库文件头都一致,没有需要恢复的文件,但是由于数据库是非正常关闭,last_change scn没有被写入,数据库判断需要恢复,则读取相应的redo log group 3,但是此时该文件已经损坏,尝试清理log,但是因为group 3在宕机时候为current状态,所以不能被清楚,则尝试通过设置_allow_reset_logs_corruption参数让系统跳过该错误,具体如下过程:

--检查文件状态
SQL> select file#,online_status,to_char(change#,'999999999999') from v$recover_file;

no rows selected

SQL> select file#,to_char(checkpoint_change#,'999999999999'),
2 to_char(last_change#,'999999999999') from v$datafile;

FILE# TO_CHAR(CHECK TO_CHAR(LAST_
---------- ------------- -------------
1 10480749
2 10480749
3 10480749
4 10480749
SQL> /

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
1 1 31 52428800 1 YES INACTIVE 10391212 19-JUL-14
3 1 33 52428800 1 NO CURRENT 10480310 23-JUL-14
2 1 32 52428800 1 YES INACTIVE 10427919 20-JUL-14

SQL> alter database clear unarchived logfile group 3;
alter database clear unarchived logfile group 3
*
ERROR at line 1:
ORA-01624: log 3 needed for crash recovery of instance lu10g (thread 1)
ORA-00312: online log 3 thread 1: '/oradata/ora10g/lu10g/redo03.log'

--在pfile中加入参数
_allow_resetlogs_corruption=true
_allow_error_simulation=TRUE

重启数据库以resetlog方式打开,得到以下告警信息:

Thu Jul 24 00:45:45 2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Thu Jul 24 00:45:45 2014
Errors in file /ora10g/admin/lu10g/udump/lu10g_ora_16481.trc:
ORA-00265: instance recovery required, cannot set ARCHIVELOG mode
Archive all online redo logfiles failed:265
RESETLOGS after incomplete recovery UNTIL CHANGE 10480752
Resetting resetlogs activation ID 2561579617 (0x98ae9a61)
Thu Jul 24 00:45:47 2014
Setting recovery target incarnation to 3
Thu Jul 24 00:45:47 2014
Assigning activation ID 2580873586 (0x99d50172)
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=16, OS id=16490
Thu Jul 24 00:45:47 2014
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=17, OS id=16492
Thu Jul 24 00:45:47 2014
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /oradata/ora10g/lu10g/redo01.log
Successful open of redo thread 1
Thu Jul 24 00:45:47 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Jul 24 00:45:47 2014
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Thu Jul 24 00:45:47 2014
ARC1: Becoming the heartbeat ARCH
Thu Jul 24 00:45:47 2014
SMON: enabling cache recovery
Thu Jul 24 00:45:47 2014
Errors in file /ora10g/admin/lu10g/udump/lu10g_ora_16481.trc:
ORA-00600: internal error code, arguments: [2662], [0], [10480758], [0], [10480787], [4194313], [], []
Thu Jul 24 00:45:48 2014
Errors in file /ora10g/admin/lu10g/udump/lu10g_ora_16481.trc:
ORA-00600: internal error code, arguments: [2662], [0], [10480758], [0], [10480787], [4194313], [], []
Thu Jul 24 00:45:48 2014
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 16481
ORA-1092 signalled during: alter database open resetlogs...

至此数据库2662错误已经出来了,那就好办了,2662的处理办法参考手工推进SCN的办法以及《谈谈adjust_scn的计算方法》

手工推进oracle scn方式(1)Oradebug 推进scn

环境:LINUX_x86

这里需要注意地方为:

1.我的系统为32bit的小端系统.
2.大小端的SCN_WRAP和SCN_BASE是相反的.在大端中,kcsgscn_的前4字节为SCN_BASE,5-8字节为SCN_WRAP.
3.在32bit系统中寻址只能是4字节,而64位系统可以8字节.

该测试为32bit 小端系统,环境如下:

[root@ludatou ~]# file /sbin/init
/sbin/init: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped

[root@ludatou ~]# su - ora10g
[ora10g@ludatou ~]$ sqlplus '/as sysdba'

SQL*Plus: Release 10.2.0.4.0 - Production on Thu Jul 24 01:58:10 2014

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> startup mount
ORACLE instance started.

Total System Global Area  318767104 bytes
Fixed Size                  1267260 bytes
Variable Size             100665796 bytes
Database Buffers          213909504 bytes
Redo Buffers                2924544 bytes
Database mounted.

检查当前scn情况并以oradebug的方式推进scn:

--当前scn
SQL> select to_char(checkpoint_change#,'XXXXXXXXXXXXXXXX') from v$database;

TO_CHAR(CHECKPOIN
-----------------
A03E2F

SQL> oradebug setmypid
Statement processed.

--获取内存scn对应
SQL> oradebug dumpvar sga kcsgscn_
kcslf kcsgscn_ [20009228, 20009248) = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 20009034
--因为32bit系统所以采用4字节寻址,更改scn_wrap
SQL> oradebug poke 0x20009228 4 0x01
BEFORE: [20009228, 2000922C) = 00000000
AFTER: [20009228, 2000922C) = 00000001
--更改scn_base
SQL> oradebug poke 0x2000922C 4 0x01
BEFORE: [2000922C, 20009230) = 00000000
AFTER: [2000922C, 20009230) = 00000001
SQL> oradebug dumpvar sga kcsgscn_
kcslf kcsgscn_ [20009228, 20009248) = 00000001 00000001 00000000 00000000 00000000 00000000 00000000 20009034
SQL> oradebug poke 0x2000922C 4 0xA03E2F
BEFORE: [2000922C, 20009230) = 00000001
AFTER: [2000922C, 20009230) = 00A03E2F
SQL> oradebug dumpvar sga kcsgscn_
kcslf kcsgscn_ [20009228, 20009248) = 00000001 00A03E2F 00000000 00000000 00000000 00000000 00000000 20009034
--更改完成后打开数据
SQL> alter database open;

Database altered.
--可以发现scn已经推进了wrap+1位
SQL> select to_char(current_scn,'XXXXXXXXXXXXXXXX') from v$database;

TO_CHAR(CURRENT_S
-----------------
100A04185