Skip to content

遭遇ora-00600 ktbair1 伴随25012,2662

这个错误比较偏门与诡异,mos上是有个文章关于这个错误,但是只是说了bug,涉及版本号.
还有个文档说是和6061错误一起伴随,是索引的logical corruption伴随一起出现,而实际情况是否认的.

我的场景是9208 sun sparc 64bit平台,9月到现在出现该错误3次,伴随错误主要是2662和25012.期间怀疑过(trace文件中报错的SQL包含的对象)涉及的2张表和索引过大导致,这几个对象没做过碎片清理操作,而且其中有个表和索引已经超过300G,第一次和第二次前后不短的时间出现问题后将该表数据清理剩余500M,同时对表和索引进行重建,过3周后同样问题继续出现,mos也没有吻合的现象解释和bug,首先就是报错ktbair1错误,接着undo和内存中的数据出现不一致报错2662和25012,重启后smon自行修复完成.通过rman对database分别进行phy corruption和logical corruption 进行检查都没发现对应的错误,

我的报错信息:

Ora-600 errors reported in alert log file,

Wed Dec 17 12:58:10 2014
Errors in file /oracle/admin/boss/udump/boss_ora_19696.trc:
ORA-00600: internal error code, arguments: [ktbair1], [0], [6], [], [], [], [], []
Wed Dec 17 12:58:11 2014
Errors in file /oracle/admin/boss/udump/boss_ora_19696.trc:
ORA-07445: exception encountered: core dump [0000000100638F9C] [SIGBUS] [Invalid address alignment] [0x203C2290D2BFFEF] [] []
ORA-00600: internal error code, arguments: [ktbair1], [0], [6], [], [], [], [], []
Wed Dec 17 12:59:01 2014
Errors in file /oracle/admin/boss/udump/boss_ora_6438.trc:
ORA-00600: internal error code, arguments: [2662], [281], [2416071487], [513], [370935493], [3305242969], [], []
ORA-08103: object no longer exists
Wed Dec 17 12:59:01 2014
Errors in file /oracle/admin/boss/udump/boss_ora_6438.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [281], [2416071487], [513], [370935493], [3305242969], [], []
ORA-08103: object no longer exists
Wed Dec 17 12:59:09 2014
Errors in file /oracle/admin/boss/udump/boss_ora_17808.trc:
ORA-00600: internal error code, arguments: [25012], [12], [606], [], [], [], [], []
.
.
Wed Dec 17 13:31:59 2014
Errors in file /oracle/admin/boss/udump/boss_ora_27048.trc:
ORA-00600: internal error code, arguments: [2662], [281], [2422679772], [513], [370935493], [3305242969], [], []
Wed Dec 17 13:36:55 2014
Shutting down instance: further logons disabled
Shutting down instance (immediate)
License high water mark = 648
Wed Dec 17 13:41:56 2014
Shutting down instance (abort)
License high water mark = 648
Instance terminated by USER, pid = 11139
Wed Dec 17 13:43:20 2014
Starting ORACLE instance (force)
.
.
Completed: ALTER DATABASE MOUNT
Wed Dec 17 13:43:57 2014
ALTER DATABASE OPEN
Wed Dec 17 13:43:57 2014
Beginning crash recovery of 1 threads
Wed Dec 17 13:43:57 2014
Started redo scan
Wed Dec 17 13:43:57 2014
Completed redo scan
788 redo blocks read, 64 data blocks need recovery
Wed Dec 17 13:43:57 2014
Started recovery at
Thread 1: logseq 323774, block 803834, scn 0.0
Wed Dec 17 13:43:57 2014
Recovery of Online Redo Log: Thread 1 Group 6 Seq 323774 Reading mem 0
Mem# 0 errs 0: /tzboss_sysdata/boss/redo06.log
Wed Dec 17 13:43:57 2014
Completed redo application
Wed Dec 17 13:43:57 2014
Ended recovery at
Thread 1: logseq 323774, block 804622, scn 281.2424682375
64 data blocks read, 64 data blocks written, 788 redo blocks read
Crash recovery completed successfully

From trace file ora-19696.trc,

BH (0x58bfede00) file#: 64 rdba: 0x101a691a (64/1730842) class 1 ba: 0x58bd7c000
set: 80 dbwrid: 7 obj: 672167 objn: 66172
hash: [551f8af00,a67d5b010] lru: [52bfdfe68,5a4fcd768]
ckptq: [NULL] fileq: [NULL]
use: [a683b8eb8,a683b8eb8] wait: [NULL]
st: CR md: EXCL rsop: 0x0 tch: 0
cr:[[scn: 0x0119.8e356168],[xid: 0x000a.02c.008b183e],[uba: 0x2dc978b1.2a93.04],[cls: 0x0119.900211c1],[sfl: 0x1]]
flags: buffer_dirty mod_started only_sequential_access
change state: ACTIVE
buffer tsn: 15 rdba: 0x0203c229 (8/246313)
scn: 0x06c5.072c0007 seq: 0x02 flg: 0x03 tail: 0x128c0600
frmt: 0x04 chkval: 0x1f46 type: 0xc1=unknown
Hex dump of corrupt header 4 = CORRUPT

涉及的对象和文件:

SQL> col object_name format a30
SQL> select owner,object_name,object_type,subobject_name,status,temporary from dba_objects where data_object_id=672167;
select file#,rfile#,name,ts# from v$datafile where rfile# in (64,8);

no rows selected

SQL>
FILE# RFILE# NAME TS#
---------- ---------- -------------------------------------------------- ----------
8 8 /tzboss_oradata/boss/ms_01.dbf 8
64 64 /tzboss_oradata/boss/bm_02.dbf 15

逻辑坏块验证结果:

Recovery Manager: Release 9.2.0.8.0 - 64bit Production

Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.

connected to target database: BOSS (DBID=1157181921)

RMAN> run {
2> allocate channel d1 type disk;
3> allocate channel d2 type disk;
4> allocate channel d3 type disk;
5> allocate channel d4 type disk;
6> backup validate check logical database;
7> }
8>
using target database controlfile instead of recovery catalog
allocated channel: d1
channel d1: sid=240 devtype=DISK

allocated channel: d2
channel d2: sid=21 devtype=DISK

allocated channel: d3
channel d3: sid=277 devtype=DISK

allocated channel: d4
channel d4: sid=179 devtype=DISK

Starting backup at 05-NOV-14
channel d1: starting full datafile backupset
channel d1: specifying datafile(s) in backupset
input datafile fno=00009 name=/tzboss_oradata/boss/cs_01.dbf
input datafile fno=00028 name=/tzboss_index/boss/cs_idx_01.dbf
input datafile fno=00183 name=/tzboss_sysdata/boss/undotbs1_05.dbf
....
input datafile fno=00226 name=/tzboss_oradata/boss/ei_05.dbf
input datafile fno=00231 name=/tzboss_oradata/boss/tr_03.dbf
input datafile fno=00243 name=/tzboss_oradata/boss/bp_18.dbf
channel d2: backup set complete, elapsed time: 00:02:36
channel d2: starting full datafile backupset
channel d2: specifying datafile(s) in backupset
input datafile fno=00251 name=/tzboss_oradata/boss/pm_04.dbf
input datafile fno=00252 name=/tzboss_oradata/boss/ei_06.dbf
channel d4: backup set complete, elapsed time: 00:02:56
channel d1: backup set complete, elapsed time: 00:02:16
channel d2: backup set complete, elapsed time: 00:01:40
channel d3: backup set complete, elapsed time: 00:02:40
Finished backup at 06-NOV-14
released channel: d1
released channel: d2
released channel: d3
released channel: d4

Recovery Manager complete.

SQL> select count(*) from v$database_block_corruption;

COUNT(*)
----------
0

oracle官方1级SR的诊断结果为

-- Search --

Note: This is INTERNAL ONLY research. No action should be taken by the customer on this information. This is research only, and may NOT be applicable to your specific situation. Searched on: Bug 6401576 : ORA-600 [KTBAIR1] / ORA-600 [KCBZPB_1] / CORRUPTION MESSAGES --> DB CRASH

但是实际上我这里从未出现过6101的错误,该bug按照如下的分析并不吻合我这个数据库所在的情况.但是按照解释这个bug也可以产生表示为ktbair1的错误,最终的流程都是先抛错误ORA-600 [KTBAIR1] / ORA-600 [KCBZPB_1] / CORRUPTION MESSAGES –> DB CRASH .进一步的核验正在进行,目前打算将报错涉及到的DBA地址所在的数据文件对象转移到其他表空间,然后看看是否能避开此类错误,如果还存在则对该版本打上6401576的补丁继续观察,与此错误的纠缠还没完.

Bug 6401576 : ORA-600 [KTBAIR1] / ORA-600 [KCBZPB_1] / CORRUPTION MESSAGES –> DB CRASH 描述:

Hdr: 6401576 9.2.0.8.0 RDBMS 9.2.0.8 RAM INDEX PRODID-5 PORTID-59 ORA-600
Abstract: ORA-600 [KTBAIR1] / ORA-600 [KCBZPB_1] / CORRUPTION MESSAGES --> DB CRASH

*** 09/07/07 03:47 am ***
*** 09/07/07 03:49 am ***
*** 09/07/07 04:19 am *** (CHG: Sta->16)
*** 09/07/07 04:19 am ***
*** 09/07/07 04:32 am ***
*** 09/07/07 04:33 am ***
*** 09/07/07 07:30 am *** (CHG: Sta->10)
*** 09/07/07 07:30 am ***
*** 09/07/07 08:01 am ***
*** 09/07/07 08:01 am ***
*** 09/07/07 08:14 am ***
*** 09/07/07 08:21 am *** ESCALATED
*** 09/07/07 08:21 am ***
*** 10/22/07 03:13 am *** (CHG: Sta->33 Asg->RDBMSREP SubComp->UNKNOWN)
*** 10/22/07 03:13 am ***
*** 10/22/07 03:13 am ***
*** 10/26/07 01:05 am *** (CHG: Sta->16)
*** 10/26/07 01:05 am ***
*** 10/26/07 01:07 am ***
*** 10/29/07 07:03 am ***
*** 10/29/07 07:03 am ***
*** 10/29/07 07:13 am ***
*** 10/29/07 11:39 am *** (CHG: Sta->10)
*** 10/29/07 11:39 am ***
*** 10/30/07 03:19 am ***
*** 10/31/07 02:32 am *** (CHG: Sta->16)
*** 10/31/07 02:32 am ***
*** 10/31/07 02:33 am ***
*** 10/31/07 02:39 am ***
*** 10/31/07 02:49 am ***
*** 10/31/07 02:50 am ***
*** 10/31/07 04:31 am *** (CHG: Sta->10)
*** 10/31/07 04:31 am ***
*** 10/31/07 07:21 am ***
*** 10/31/07 07:21 am ***
*** 11/02/07 06:40 am *** (CHG: Sta->16)
*** 11/02/07 06:40 am ***
*** 11/03/07 03:53 am *** (CHG: Sta->11 SubComp->RAM INDEX)
*** 11/03/07 03:53 am ***
*** 11/03/07 03:54 am ***
*** 11/03/07 03:54 am ***
*** 11/03/07 03:58 am ***
*** 11/06/07 08:27 pm ***
*** 11/06/07 08:27 pm ***
*** 11/06/07 08:39 pm ***
*** 11/06/07 08:39 pm *** (CHG: Sta->30)
*** 11/06/07 08:39 pm ***
*** 11/06/07 11:55 pm ***
*** 11/08/07 03:36 am *** (CHG: Sta->11)
*** 11/08/07 03:36 am ***
*** 11/08/07 06:51 am ***
*** 11/12/07 06:30 pm *** (CHG: Sta->30)
*** 11/12/07 06:30 pm ***
*** 11/14/07 03:38 am *** (CHG: Sta->11)
*** 11/14/07 03:38 am ***
*** 11/14/07 03:53 am ***
*** 11/14/07 03:28 pm ***
*** 11/21/07 04:35 pm *** (CHG: Sta->30)
*** 11/21/07 04:35 pm ***
*** 11/26/07 07:05 pm ***
*** 12/05/07 06:23 am *** (CHG: Sta->11)
*** 12/05/07 06:23 am ***
*** 12/14/07 06:04 pm *** (CHG: Sta->30)
*** 12/14/07 06:04 pm ***
*** 12/18/07 03:58 am *** (CHG: Sta->11)
*** 12/18/07 03:58 am ***
*** 12/19/07 03:42 am ***
*** 12/19/07 08:23 am ***
*** 12/20/07 11:35 am ***
*** 01/03/08 03:14 pm ***
*** 01/07/08 06:56 am ***
*** 01/07/08 08:36 am ***
*** 01/11/08 03:19 pm ***
*** 01/15/08 11:55 am ***
*** 01/16/08 08:10 am *** (ADD: Impact/Symptom->DATA CORRUPTION )
*** 01/16/08 08:11 am *** (ADD: Impact/Symptom->MEMORY CORRUPTION )
*** 01/16/08 08:12 am ***
*** 01/16/08 08:12 am *** (CHG: Sta->80)
*** 01/16/08 08:22 am ***
*** 01/16/08 08:23 am ***
Rediscovery Information: If ORA-600[6101] is encountered during the
compaction of a compressed
index block, then it is this bug.  There can also be other internal
errors such as ktbair1, kcbzpb_1, 4519.

Workaround: none
Release Notes:
]] Internal errors could occur during the compaction of a compressed
]] index block.

浅谈datapump迁移数据时候的一些技巧以及坑点(待细化)

做了不少数据迁移和升级的项目,大大小小的,为此也总结了一些技巧和坑点,记录下来以作分享和日后参考。

1.方案,流程,测试,完善
2.借机梳理系统交互的dblink
梳理dblink的思路流程和测试脚本
function和procedures对此有依赖
tnsnames的切换与外部dblink的重建
3.导出数据有技巧
完整数据的导出需要注意的设置
停机以及增量追进的办法
4.缩短数据传输时间的优化选择
数据传输的方式
5.导入数据有技巧
导入前的role,profile补建检查
导入效率上的设置:内存,并行,索引控制,归档,logging
导入约束触发器,约束和任务的设置

大坑之对象权限在基于单个schemas的迁移时候会丢失。

6.迁移顺便把索引和表的分离存储,表分区

7.job和scheduler的迁移注意点
注意检测源和目的对比,impdp导入job有缺陷
job的执行时间点

8.编译对象有技巧

9.导入后数据的对比是必要的

10.收集统计信息有技巧

11.一个最小报错的迁移方法建议

1.impdp必检job/scheduler
2.复制schemas时候impdp不会导入系统权限
2.一个最小报错的迁移方法

遭遇scls_scr_create,bug 4632899

没啥说的,刚熬完通宵,客户数据库挂了,说是机器挂了,重启以后系统恢复了,尝试启动crsctl报错

Failure at scls_scr_create with code 1
 Internal Error Information:
 Category: 1234
 Operation: scls_scr_create
 Location: mkdir
 Other: Unable to make user dir
 Dep: 2

support描述与主机名大小写有关,solaris上的一个bug 4632899.而该主机主机名设置混乱,hostname和/etc/hosts下不一致,sysconfig/network的主机名也未设置!

该bug描述:

 

Bug 4632899  CSS does not start if hostname has capital letters  This note gives a brief overview of bug 4632899.
The content was last updated on: 14-OCT-2011
Click here for details of each of the sections below.

Affects:

Product (Component) Oracle Server (PCW)
Range of versions believed to be affected Versions BELOW 11.1
Versions confirmed as being affected
Platforms affected Generic (all / most platforms affected)

Fixed:

This issue is fixed in

Symptoms:

Related To:

  • (None Specified)

Description

CSS does not start on Solaris if the hostname is in upper case.

eg:
 The following errors might be noticed when using CRS scripts:
  Failure at scls_scr_create with code 1
  Category: 1234
  Operation: scls_scr_create
  Location: mkdir
  Other: Unable to make user dir
  Dep: 2

Workaround
  Change the hostname to from uppercase to lower case.

HOOKS PSE:A203 LIKELYAFFECTS XAFFECTS_10.2.0.1 XAFFECTS_V10020001 AFFECTS=10.2.0.1 XAFFECTS_10.2.0.2 XAFFECTS_V10020002 AFFECTS=10.2.0.2 XAFFECTS_10.2.0.3 XAFFECTS_V10020003 AFFECTS=10.2.0.3 XAFFECTS_10.2.0.4 XAFFECTS_V10020004 AFFECTS=10.2.0.4 XPRODID_5 PRODUCT_ID=5 PRODID-5 PCW XCOMP_PCW COMPONENT=PCW TAG_OPSM OPSM FIXED_10.2.0.4.CRS02 FIXED_10.2.0.5 FIXED_11.1.0.6
Please note: The above is a summary description only. Actual symptoms can vary. Matching to any symptoms here does not confirm that you are encountering this problem. For questions about this bug please consult Oracle Support.

References

Bug:4632899 (This link will only work for PUBLISHED bugs)
Note:245840.1 Information on the sections in this article

索引块上递归事务专用的itl slot争用的识别判断

承接《Oracle10g版本后enq: TX – allocate ITL entry等待事件的根源以及解决思路》

由于itl争用主要是initrans不足(10g后今本不受此影响maxtrans失效),块空间不足原因引起,但是特殊的情况下的索引递归事务引起的itl争用的识别也是需要掌握的技术,虽然大量的递归情况较少见,但如何区分相比前面的2种情况就相对复杂点。主要的思路是根据索引块的itl信息来识别,因为上一篇中讲到在索引的枝节点上,有且只有一个ITL slot,它是用于当发生节点分裂的递归事务(Recursive Transaction)。在叶子节点上,第一条ITL Slot也是用于分裂的递归事务的。只要根据相关索引块的负责递归事务的itl事务槽的使用情况就可以判断争用的情况。

思路如下:
1.找出被阻塞的事务
2.根据阻塞的事务找到相关的回滚块以及相关事务起始回滚编号
3.根据回滚块的内容识别相关的索引块
4.dump出相关的索引块识别对应的itl争用情况

例子:

1.测试阻塞

session 1 更新61080块的100行

SQL> update luda set a=a
  2  where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61080
  3  and dbms_rowid.ROWID_ROW_NUMBER(rowid)=100;

1 row updated.

commit;

session 2 更新61080块的200行

SQL> update luda set a=a
  2  where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61080
  3  and dbms_rowid.ROWID_ROW_NUMBER(rowid)=200;

1 row updated.

commit;

session 3 更新61080块的300行

SQL> update luda set a=a
  2  where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61080
  3  and dbms_rowid.ROWID_ROW_NUMBER(rowid)=300;

1 row updated.

session 4 更新61080块的400行

SQL> update luda set a=a
  2  where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61080
  3  and dbms_rowid.ROWID_ROW_NUMBER(rowid)=400;

1 row updated.

session 5 更新61080块的500行hang住

2.找出被阻塞的事务以及相关回滚段信息

确认itl阻塞:

SQL> select s.sid, s.event, s.row_wait_obj#
  2  from v$session s where s.sid=151;

       SID EVENT                          ROW_WAIT_OBJ#
---------- ------------------------------ -------------
       148 enq: TX - allocate ITL entry              -1

确认对应的阻塞事务的回滚相关信息,可以发现阻塞事务对应的回滚在2号数据文件的2109数据块中,起始的回滚记录是0x2e(46的16进制)

SQL> select l.sid req_session, s.sid lock_session, l.lmode, l.request, t.xidusn, t.xidslot, t.start_ubafil, t.start_ubablk, t.start_ubarec
  2  from v$lock l, v$transaction t, v$session s
  3  where l.type = 'TX'
  4  and trunc(id1/power(2,16)) = t.xidusn
  5  and l.id2 = t.xidsqn
  6  and id1 - power(2,16)*trunc(id1/power(2,16)) = t.xidslot
  7  and t.addr = s.taddr
  8  and l.request = 4;

REQ_SESSION LOCK_SESSION      LMODE    REQUEST     XIDUSN    XIDSLOT START_UBAFIL START_UBABLK START_UBAREC
----------- ------------ ---------- ---------- ---------- ---------- ------------ ------------ ------------
        148          159          0          4          5         26            2         2109           46

dump出2号数据文件的2109号数据块

SQL> alter system dump datafile 2 block 2109;

System altered.

确认相关对象的object_id

SQL> select object_name,object_id,data_object_id from dba_objects where object_name in ('LUDA','IDX_TEST') and owner='SYS';

OBJECT_NAME                               OBJECT_ID DATA_OBJECT_ID
---------------------------------------- ---------- --------------
IDX_TEST                                      51980          51980
LUDA                                          51978          51978

分析2号数据文件的2109号数据块dump文件,从Rec #0x2e部分开始到结束只有0x2e此条与对象号51980,51978相关回滚记录

*-----------------------------
* Rec #0x2e  slt: 0x1a  objn: 51978(0x0000cb0a)  objd: 51978  tblspc: 0(0x00000000)
*       Layer:  11 (Row)   opc: 1   rci 0x00
Undo type:  Regular undo    Begin trans    Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
rdba: 0x00000000
*-----------------------------
uba: 0x0080083d.014b.2d ctl max scn: 0x0000.000bb362 prv tx scn: 0x0000.000bb37b
txn start scn: scn: 0x0000.000bcdc1 logon user: 0
 prev brb: 8388917 prev bcl: 0
KDO undo record:
KTB Redo
op: 0x04  ver: 0x01
op: L  itl: xid:  0x0002.010.0000013b uba: 0x00800051.0100.3c
                      flg: C---    lkc:  0     scn: 0x0000.000bcd90
KDO Op code: URP row dependencies Disabled
  xtype: XAxtype KDO_KDOM2 flags: 0x00000080  bdba: 0x0040ee98  hdba: 0x0040ee91
itli: 1  ispac: 0  maxfr: 4863
tabn: 0 slot: 400(0x190) flag: 0x2c lock: 0 ckix: 71
ncol: 1 nnew: 1 size: 0
Vector content:
col  0: [ 3]  c2 31 06

End dump data blocks tsn: 1 file#: 2 minblk 2109 maxblk 2109

从bdba地址分析可以获得当前更新对象对luda,前面更新提交的1条语句的部分就是luda的61080号块。

SQL> select dbms_utility.data_block_address_file(TO_NUMBER('0040ee98', 'XXXXXXXX')) file_id,
  2  dbms_utility.data_block_address_block(TO_NUMBER('0040ee98', 'XXXXXXXX')) block_id from dual;

   FILE_ID   BLOCK_ID
---------- ----------
         1      61080

在回滚信息中没有发现索引块类型,事务对象只有objno为51978的luda表,类似的索引对象也是用此方法分析。