Skip to content

Database - 40. page

遭遇ora-00600 [6711]

错误如下:

 

Sat Jul 26 10:22:23 2014
Errors in file /oracle/admin/icdb/udump/icdb_ora_459112.trc:
ORA-00600: internal error code, arguments: [6711], [4257353], [1], [4255900], [0], [], [], []
Sat Jul 26 10:24:07 2014
Errors in file /oracle/admin/icdb/udump/icdb_ora_459112.trc:
ORA-00600: internal error code, arguments: [6711], [4257353], [1], [4255900], [0], [], [], []
Sat Jul 26 10:31:38 2014
Errors in file /oracle/admin/icdb/udump/icdb_ora_56557800.trc:
ORA-00600: 内部错误代码, 参数: [6711], [4257353], [1], [4255900], [0], [], [], []
Sat Jul 26 10:31:59 2014
Errors in file /oracle/admin/icdb/udump/icdb_ora_56557800.trc:
ORA-00600: 内部错误代码, 参数: [6711], [4257353], [1], [4255900], [0], [], [], []
Sat Jul 26 11:00:30 2014
Errors in file /oracle/admin/icdb/bdump/icdb_m000_56557774.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ], [0], [0], [1], [], [], [], []
Sat Jul 26 11:01:00 2014
Errors in file /oracle/admin/icdb/udump/icdb_ora_57344088.trc:
ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], []
Sat Jul 26 12:00:39 2014
Errors in file /oracle/admin/icdb/bdump/icdb_m000_63307938.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ], [0], [0], [1], [], [], [], []
Sat Jul 26 12:06:47 2014
Thread 1 advanced to log sequence 32908 (LGWR switch)
  Current log# 1 seq# 32908 mem# 0: /database/icdb/redo01.log
Sat Jul 26 12:09:41 2014
Thread 1 advanced to log sequence 32909 (LGWR switch)
  Current log# 2 seq# 32909 mem# 0: /database/icdb/redo02.log
Sat Jul 26 12:13:06 2014
Thread 1 advanced to log sequence 32910 (LGWR switch)
  Current log# 3 seq# 32910 mem# 0: /database/icdb/redo03.log
Sat Jul 26 12:16:03 2014
Thread 1 advanced to log sequence 32911 (LGWR switch)
  Current log# 1 seq# 32911 mem# 0: /database/icdb/redo01.log
Sat Jul 26 12:18:58 2014
Thread 1 advanced to log sequence 32912 (LGWR switch)
  Current log# 2 seq# 32912 mem# 0: /database/icdb/redo02.log
Sat Jul 26 13:00:41 2014
Errors in file /oracle/admin/icdb/bdump/icdb_m000_7668146.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ], [0], [0], [1], [], [], [], []
Sat Jul 26 13:00:44 2014
Errors in file /oracle/admin/icdb/bdump/icdb_m000_7668146.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [9001], [12637871], [26], [12613723], [3], []

可以确认
ORA-00600: internal error code, arguments: [qertbFetchByRowID]
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []以及为BUG

7329252 10.2.0.4.4, 10.2.0.5, 11.1.0.7.5, 11.2.0.1 ORA-8102/ORA-1499/OERI[kdsgrp1] Index corruption after rebuild index ONLINE

 

剩下就是

[kcbz_check_objd_typ], [13013],[6711]

先分析6711的错误,根据对整体日志的分析过程如下:

[root@ludatou /]# cat alert_icdb.log | grep 6711 |wc -l
39
[root@ludatou /]# cat alert_icdb.log | grep 4257353 |wc -l
31
[root@ludatou /]# cat alert_icdb.log | grep 4255900 |wc -l
31
[root@ludatou /]# cat alert_icdb.log | grep -v 4257353 |wc -l
171179
[root@ludatou /]# cat alert_icdb.log | grep -v 4257353 |grep 6711 |wc -l
8
[root@ludatou /]# cat alert_icdb.log | grep -v 4257353 |grep 6711
Thread 1 cannot allocate new log, sequence 6711
Thread 1 advanced to log sequence 6711 (LGWR switch)
 Current log# 3 seq# 6711 mem# 0: /database/icdb/redo03.log
 Current log# 3 seq# 6711 mem# 0: /database/icdb/redo03.log
Thread 1 advanced to log sequence 16711 (LGWR switch)
 Current log# 1 seq# 16711 mem# 0: /database/icdb/redo01.log
Thread 1 advanced to log sequence 26711 (LGWR switch)
 Current log# 2 seq# 26711 mem# 0: /database/icdb/redo02.log

通过以上的分析我们可以发现,6711的错误集中的对应数据块都为
[6711], [4257353], [1], [4255900]

SQL> SELECT dbms_utility.data_block_address_block(4257353) "BLOCK",
 2 dbms_utility.data_block_address_file(4257353) "FILE" from dual;

 BLOCK FILE
---------- ----------
 63049 1

SQL> SELECT dbms_utility.data_block_address_block(4255900) "BLOCK",
 2 dbms_utility.data_block_address_file(4255900) "FILE" from dual;

 BLOCK FILE
---------- ----------
 61596 1

set linesize 200;
select segment_name, segment_type
from dba_extents
where relative_fno = 1
and (663049 between block_id and block_id + blocks or
61596 between block_id and block_id + blocks);



遭遇ORA-00603/27102|Linux-x86_64 Error: 12

这个问题来的比较急,来邮件时候是下午4点多,但是数据库运行是正常的,从告警日志上看是进程无法获取内存了,导致客户端进程宕掉。

27102错误一般是oracle用户的系统资源限制设置过小导致,但是这里的情况却是不一样。

具体先看如下告警日志:
 

Fri Jul 25 12:51:08 CST 2014
Thread 1 advanced to log sequence 5862 (LGWR switch)
  Current log# 3 seq# 5862 mem# 0: +DG1/wasudb/onlinelog/group_3.263.842714473
Fri Jul 25 12:51:10 CST 2014
ARC1: Standby redo logfile selected for thread 1 sequence 5861 for destination LOG_ARCHIVE_DEST_2
Fri Jul 25 12:51:57 CST 2014
Thread 1 cannot allocate new log, sequence 5863
Checkpoint not complete
  Current log# 3 seq# 5862 mem# 0: +DG1/wasudb/onlinelog/group_3.263.842714473
Fri Jul 25 12:51:59 CST 2014
Thread 1 advanced to log sequence 5863 (LGWR switch)
  Current log# 1 seq# 5863 mem# 0: +DG1/wasudb/onlinelog/group_1.261.842714471
Fri Jul 25 12:52:02 CST 2014
ARC1: Standby redo logfile selected for thread 1 sequence 5862 for destination LOG_ARCHIVE_DEST_2
Fri Jul 25 13:01:45 CST 2014
Errors in file /oracle/admin/wasudb/udump/wasudb_ora_13870.trc:
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 108
Additional information: 3080192
Fri Jul 25 13:01:48 CST 2014
Errors in file /oracle/admin/wasudb/udump/wasudb_ora_13870.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 108
Additional information: 3080192

告警的分析在第一段话就已经带出,在此时我们通过ulimitd -a查看oracle用户的分配信息,都是非常大的,不存在过小的设置。
在这种情况下非常有必要看下这个ora_13870.trc文件,此文件部分信息如下:

/oracle/admin/wasudb/udump/wasudb_ora_13870.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/10.2.0/db_1
System name:	Linux
Node name:	localhost
Release:	3.8.13-16.2.1.el6uek.x86_64
Version:	#1 SMP Thu Nov 7 17:01:44 PST 2013
Machine:	x86_64
Instance name: wasudb
Redo thread mounted by this instance: 1
Oracle process number: 51
Unix process pid: 13870, image: oracle@localhost

*** 2014-07-25 13:01:44.015
*** SERVICE NAME:(SYS$USERS) 2014-07-25 13:01:43.995
*** SESSION ID:(6209.59205) 2014-07-25 13:01:43.995
=======================================
TOP 10 MEMORY USES FOR THIS PROCESS
---------------------------------------
81% 3328 MB, 213622 chunks: "pmuccst: adt/record       "
         koh-kghu call   ds=0x7f8e128632f8  dsprt=0x6ae9740
13%  544 MB, 8868 chunks: "permanent memory          "  SQL
         sort subheap    ds=0x7f8e121f7a00  dsprt=0x7f8e12891058
 5%  209 MB, 13443 chunks: "pmucalm coll              "
         koh-kghu call   ds=0x7f8e128632f8  dsprt=0x6ae9740
 0% 7467 KB, 8948 chunks: "free memory               "
         session heap    ds=0x7f8e12866290  dsprt=0x6aed2c0
 0% 1284 KB,  22 chunks: "permanent memory          "
         pga heap        ds=0x6ae9740  dsprt=(nil)
 0%  424 KB,   6 chunks: "free memory               "
         top call heap   ds=0x6aed0a0  dsprt=(nil)
 0%  331 KB,  82 chunks: "kxsFrame4kPage            "
         session heap    ds=0x7f8e12866290  dsprt=0x6aed2c0
 0%  274 KB,  10 chunks: "static frame of inst      "
         koh-kghu sessi  ds=0x7f8e11fabe48  dsprt=0x7f8e12866290
 0%  224 KB,   7 chunks: "kxsFrame32kPage           "
         session heap    ds=0x7f8e12866290  dsprt=0x6aed2c0
 0%  128 KB,   1 chunk : "free memory               "
         KSFD vec heap   ds=0x6aeed80  dsprt=0x6ae9740

=======================================
PRIVATE MEMORY SUMMARY FOR THIS PROCESS
---------------------------------------
******************************************************
PRIVATE HEAP SUMMARY DUMP
4103 MB total:
  4101 MB commented, 1286 KB permanent
   537 KB free (0 KB in empty extents),
    3546 MB,   3 heaps:   "koh-kghu call  "            51 KB free held
     554 MB,   1 heap:    "session heap   "
------------------------------------------------------
Summary of subheaps at depth 1
4091 MB total:
  4084 MB commented, 87 KB permanent
  7603 KB free (132 KB in empty extents),
     2 KB uncommented freeable with mark,     3328 MB, 213622 chunks:  "pmuccst: adt/record       "
     552 MB,   1 heap:    "kxs-heap-w     "            7408 KB free held

=========================================
REAL-FREE ALLOCATOR DUMP FOR THIS PROCESS
-----------------------------------------

Dump of Real-Free Memory Allocator Heap [0x7f8e1283e000]
mag=0xfefe0001 flg=0x5000003 fds=0x7 blksz=65536
blkdstbl=0x7f8e1283e010, iniblk=68608 maxblk=262144 numsegs=115
In-use num=65638 siz=8978432, Freeable num=0 siz=0, Free num=1 siz=6619136

==========================================
INSTANCE-WIDE PRIVATE MEMORY USAGE SUMMARY
------------------------------------------

Dumping Work Area Table (level=1)
=====================================

  Global SGA Info
  ---------------

    global target:    40000 MB
    auto target:      31163 MB
    max pga:           2048 MB
    pga limit:         4096 MB
    pga limit known:  0
    pga limit errors:     0

    pga inuse:         6534 MB
    pga alloc:         7384 MB
    pga freeable:       707 MB
    pga freed:        5776451 MB
    pga to free:          0 %

    pga auto:          1160 MB
    pga manual:           0 MB

    pga alloc  (max): 15183 MB
    pga auto   (max):  4383 MB
    pga manual (max):    10 MB

    # workareas     :     0
    # workareas(max):   115



================================
PER-PROCESS PRIVATE MEMORY USAGE
--------------------------------

Private memory usage per Oracle process

-------------------------
Top 10 processes:
-------------------------
(percentage is of 8065 MB total allocated memory)
51% pid 51: 4102 MB used of 4105 MB allocated  < = CURRENT PROC
 7% pid 101: 536 MB used of 537 MB allocated
 3% pid 127: 52 MB used of 211 MB allocated (158 MB freeable)
 2% pid 94: 37 MB used of 138 MB allocated (101 MB freeable)
 1% pid 215: 101 MB used of 120 MB allocated (18 MB freeable)
 1% pid 216: 101 MB used of 119 MB allocated (18 MB freeable)
 1% pid 254: 101 MB used of 119 MB allocated (18 MB freeable)
 1% pid 278: 101 MB used of 119 MB allocated (18 MB freeable)
 1% pid 155: 100 MB used of 116 MB allocated (15 MB freeable)
 1% pid 156: 111 MB used of 111 MB allocated

-------------------------
All processes:
-------------------------
(session detail when over 403 MB allocated)

pid 2: 295 KB used of 374 KB allocated
pid 3: 294 KB used of 374 KB allocated
pid 4: 388 KB used of 502 KB allocated
pid 5: 8205 KB used of 14 MB allocated (3136 KB freeable)
pid 6: 8409 KB used of 13 MB allocated (2112 KB freeable)
pid 7: 8356 KB used of 15 MB allocated (4160 KB freeable)
....................(省略很多)

通过trace中的信息我们可以看到PER-PROCESS PRIVATE MEMORY USAGE的TOP 10 process以及all process的信息,其中PID为51的进程(TOP 1)马上就入了大头的法眼~进程开销PGA已经4个G了,基本可以联想到肯定是超大DML事务带有排序等没有批量提交导致把内存撑爆!经过检查发现是一个存储过程在跑大量数据的处理,验证了我的判断,协助客户发给应用开发处理,并建议缩小事务,尽量采用批量的提交方式。

从以上也可以看出,硬件资源是有限的,即使给oracle用户分配的资源再多,只要应用开发设计得不合理,你永远都不会满足这应用的需求!

长沙/武汉/南京招聘Oracle DBA

公司需要在长沙,武汉,南京需要招Oracle DBA了,感兴趣的朋友欢迎自荐或推荐。此次职位的工作地点是长沙/武汉/南京,如果家乡在这三个城市或者有意向在这三个城市发展的的朋友,这是不可多得的机会。
公司:北京东方龙马软件发展有限公司
简历请发:feigigi@qq.com
职位描述如下:
一、加入公司,即会获得:

1、良好的工作待遇
2、齐全的保险和公积金
3、以人为本,开放自由的Home office工作环境
4、公司提供良好的职业发展机遇
5、金融领域Oracle的经验积累

二、要求

1、对Oracle数据库有强烈的兴趣
2、有扎实的基础,擅长Oracle数据库的安装部署、性能优化、故障处理,熟悉RAC
3、有3年以上工作经验
4、有OCP证书优先
5、有较强的沟通能力
6、有较强的文档写作能力

三、工作地点

长沙/武汉/南京

手工推进oracle scn方式(3)通过bbed推进scn

采用bbed修改scn的原理如下

File 1 block 1 是块头,记录了该 datafile header scn(整个datafile 的scn已该header scn为准的)

BBED> set file 1 block 1
FILE# 1
BLOCK# 1

BBED> p kcvfhckp (只用在block为1的地方)
struct kcvfhckp, 36 bytes @484
struct kcvcpscn, 8 bytes @484
ub4 kscnbas @484 0x000f2d1a
ub2 kscnwrp @488 0x0000
ub4 kcvcptim @492 0x2def0690
ub2 kcvcpthr @496 0x0001
union u, 12 bytes @500
struct kcvcprba, 12 bytes @500
ub4 kcrbaseq @500 0x00000005
ub4 kcrbabno @504 0x000033cf
ub2 kcrbabof @508 0x0010
ub1 kcvcpetb[0] @512 0x02
ub1 kcvcpetb[1] @513 0x00
ub1 kcvcpetb[2] @514 0x00
ub1 kcvcpetb[3] @515 0x00
ub1 kcvcpetb[4] @516 0x00
ub1 kcvcpetb[5] @517 0x00
ub1 kcvcpetb[6] @518 0x00
ub1 kcvcpetb[7] @519 0x00

一个block 8k(8192),在偏移量为8的位置上记录了该block的SCN,

BBED> set block 128
BLOCK# 128

BBED> p bas_kcbh
ub4 bas_kcbh @8 0x000f2d16

可以看到当前block的scn(0x000f2d16<0x000f2d1a) 正常

BBED> p tailchk
ub4 tailchk @8188 0x2d160e01

Tailchecks (p tailchk)
The tail of an Oracle 8+ block is a concatenation of the lower order two bytes ofthe SCN base, the block type and the SCN sequence number.
Oracle block tail (该block的 最后4个bytes)由4个bytes组成,但实际上只用了低2个bytes来存放。 2个bytes的tail 由scn base,block type 和 scn sequence 组成。

E.g, if the SCN base number is 0x000f2d16 ,the block type is 0e and the SCN sequence number is 0x01, the tail check wouldbe 0x97280602:

SCN base Type SCN seq
2d16 0e 01

可以计算出 在block 的offset 8188(8192-2*2)为该block的 scn base=用p bas_kcbh查出的 后四位
所以在修改block的scn时,除了需要修改block bas_kcbh(offset 8) 还要修改tailchk的scn,只修改scn bash,而block type、scn sequence不变
并且注意:
在存放的时候是按照低位字节存放的the numbers are stored in little endian format (the low-order byte of thenumber is stored in memory at the lowest address) as this example database isrunning on Linux on an Intel platform.

根据以上描述同理,我们可以对数据库的scn进行变更,增大或者缩小。而2662的错误就是因为,某个数据块的scn与数据文件头scn不一致,比数据文件头的scn还要大导致的,所以同理也可以用此方法模拟ora-00600 2662错误。