遭遇ORA-00603/27102|Linux-x86_64 Error: 12
这个问题来的比较急,来邮件时候是下午4点多,但是数据库运行是正常的,从告警日志上看是进程无法获取内存了,导致客户端进程宕掉。
27102错误一般是oracle用户的系统资源限制设置过小导致,但是这里的情况却是不一样。
具体先看如下告警日志:
Fri Jul 25 12:51:08 CST 2014 Thread 1 advanced to log sequence 5862 (LGWR switch) Current log# 3 seq# 5862 mem# 0: +DG1/wasudb/onlinelog/group_3.263.842714473 Fri Jul 25 12:51:10 CST 2014 ARC1: Standby redo logfile selected for thread 1 sequence 5861 for destination LOG_ARCHIVE_DEST_2 Fri Jul 25 12:51:57 CST 2014 Thread 1 cannot allocate new log, sequence 5863 Checkpoint not complete Current log# 3 seq# 5862 mem# 0: +DG1/wasudb/onlinelog/group_3.263.842714473 Fri Jul 25 12:51:59 CST 2014 Thread 1 advanced to log sequence 5863 (LGWR switch) Current log# 1 seq# 5863 mem# 0: +DG1/wasudb/onlinelog/group_1.261.842714471 Fri Jul 25 12:52:02 CST 2014 ARC1: Standby redo logfile selected for thread 1 sequence 5862 for destination LOG_ARCHIVE_DEST_2 Fri Jul 25 13:01:45 CST 2014 Errors in file /oracle/admin/wasudb/udump/wasudb_ora_13870.trc: ORA-27102: out of memory Linux-x86_64 Error: 12: Cannot allocate memory Additional information: 108 Additional information: 3080192 Fri Jul 25 13:01:48 CST 2014 Errors in file /oracle/admin/wasudb/udump/wasudb_ora_13870.trc: ORA-00603: ORACLE server session terminated by fatal error ORA-27102: out of memory Linux-x86_64 Error: 12: Cannot allocate memory Additional information: 108 Additional information: 3080192
告警的分析在第一段话就已经带出,在此时我们通过ulimitd -a查看oracle用户的分配信息,都是非常大的,不存在过小的设置。
在这种情况下非常有必要看下这个ora_13870.trc文件,此文件部分信息如下:
/oracle/admin/wasudb/udump/wasudb_ora_13870.trc Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options ORACLE_HOME = /oracle/product/10.2.0/db_1 System name: Linux Node name: localhost Release: 3.8.13-16.2.1.el6uek.x86_64 Version: #1 SMP Thu Nov 7 17:01:44 PST 2013 Machine: x86_64 Instance name: wasudb Redo thread mounted by this instance: 1 Oracle process number: 51 Unix process pid: 13870, image: oracle@localhost *** 2014-07-25 13:01:44.015 *** SERVICE NAME:(SYS$USERS) 2014-07-25 13:01:43.995 *** SESSION ID:(6209.59205) 2014-07-25 13:01:43.995 ======================================= TOP 10 MEMORY USES FOR THIS PROCESS --------------------------------------- 81% 3328 MB, 213622 chunks: "pmuccst: adt/record " koh-kghu call ds=0x7f8e128632f8 dsprt=0x6ae9740 13% 544 MB, 8868 chunks: "permanent memory " SQL sort subheap ds=0x7f8e121f7a00 dsprt=0x7f8e12891058 5% 209 MB, 13443 chunks: "pmucalm coll " koh-kghu call ds=0x7f8e128632f8 dsprt=0x6ae9740 0% 7467 KB, 8948 chunks: "free memory " session heap ds=0x7f8e12866290 dsprt=0x6aed2c0 0% 1284 KB, 22 chunks: "permanent memory " pga heap ds=0x6ae9740 dsprt=(nil) 0% 424 KB, 6 chunks: "free memory " top call heap ds=0x6aed0a0 dsprt=(nil) 0% 331 KB, 82 chunks: "kxsFrame4kPage " session heap ds=0x7f8e12866290 dsprt=0x6aed2c0 0% 274 KB, 10 chunks: "static frame of inst " koh-kghu sessi ds=0x7f8e11fabe48 dsprt=0x7f8e12866290 0% 224 KB, 7 chunks: "kxsFrame32kPage " session heap ds=0x7f8e12866290 dsprt=0x6aed2c0 0% 128 KB, 1 chunk : "free memory " KSFD vec heap ds=0x6aeed80 dsprt=0x6ae9740 ======================================= PRIVATE MEMORY SUMMARY FOR THIS PROCESS --------------------------------------- ****************************************************** PRIVATE HEAP SUMMARY DUMP 4103 MB total: 4101 MB commented, 1286 KB permanent 537 KB free (0 KB in empty extents), 3546 MB, 3 heaps: "koh-kghu call " 51 KB free held 554 MB, 1 heap: "session heap " ------------------------------------------------------ Summary of subheaps at depth 1 4091 MB total: 4084 MB commented, 87 KB permanent 7603 KB free (132 KB in empty extents), 2 KB uncommented freeable with mark, 3328 MB, 213622 chunks: "pmuccst: adt/record " 552 MB, 1 heap: "kxs-heap-w " 7408 KB free held ========================================= REAL-FREE ALLOCATOR DUMP FOR THIS PROCESS ----------------------------------------- Dump of Real-Free Memory Allocator Heap [0x7f8e1283e000] mag=0xfefe0001 flg=0x5000003 fds=0x7 blksz=65536 blkdstbl=0x7f8e1283e010, iniblk=68608 maxblk=262144 numsegs=115 In-use num=65638 siz=8978432, Freeable num=0 siz=0, Free num=1 siz=6619136 ========================================== INSTANCE-WIDE PRIVATE MEMORY USAGE SUMMARY ------------------------------------------ Dumping Work Area Table (level=1) ===================================== Global SGA Info --------------- global target: 40000 MB auto target: 31163 MB max pga: 2048 MB pga limit: 4096 MB pga limit known: 0 pga limit errors: 0 pga inuse: 6534 MB pga alloc: 7384 MB pga freeable: 707 MB pga freed: 5776451 MB pga to free: 0 % pga auto: 1160 MB pga manual: 0 MB pga alloc (max): 15183 MB pga auto (max): 4383 MB pga manual (max): 10 MB # workareas : 0 # workareas(max): 115 ================================ PER-PROCESS PRIVATE MEMORY USAGE -------------------------------- Private memory usage per Oracle process ------------------------- Top 10 processes: ------------------------- (percentage is of 8065 MB total allocated memory) 51% pid 51: 4102 MB used of 4105 MB allocated < = CURRENT PROC 7% pid 101: 536 MB used of 537 MB allocated 3% pid 127: 52 MB used of 211 MB allocated (158 MB freeable) 2% pid 94: 37 MB used of 138 MB allocated (101 MB freeable) 1% pid 215: 101 MB used of 120 MB allocated (18 MB freeable) 1% pid 216: 101 MB used of 119 MB allocated (18 MB freeable) 1% pid 254: 101 MB used of 119 MB allocated (18 MB freeable) 1% pid 278: 101 MB used of 119 MB allocated (18 MB freeable) 1% pid 155: 100 MB used of 116 MB allocated (15 MB freeable) 1% pid 156: 111 MB used of 111 MB allocated ------------------------- All processes: ------------------------- (session detail when over 403 MB allocated) pid 2: 295 KB used of 374 KB allocated pid 3: 294 KB used of 374 KB allocated pid 4: 388 KB used of 502 KB allocated pid 5: 8205 KB used of 14 MB allocated (3136 KB freeable) pid 6: 8409 KB used of 13 MB allocated (2112 KB freeable) pid 7: 8356 KB used of 15 MB allocated (4160 KB freeable) ....................(省略很多)
通过trace中的信息我们可以看到PER-PROCESS PRIVATE MEMORY USAGE的TOP 10 process以及all process的信息,其中PID为51的进程(TOP 1)马上就入了大头的法眼~进程开销PGA已经4个G了,基本可以联想到肯定是超大DML事务带有排序等没有批量提交导致把内存撑爆!经过检查发现是一个存储过程在跑大量数据的处理,验证了我的判断,协助客户发给应用开发处理,并建议缩小事务,尽量采用批量的提交方式。
从以上也可以看出,硬件资源是有限的,即使给oracle用户分配的资源再多,只要应用开发设计得不合理,你永远都不会满足这应用的需求!
长沙/武汉/南京招聘Oracle DBA
公司需要在长沙,武汉,南京需要招Oracle DBA了,感兴趣的朋友欢迎自荐或推荐。此次职位的工作地点是长沙/武汉/南京,如果家乡在这三个城市或者有意向在这三个城市发展的的朋友,这是不可多得的机会。
公司:北京东方龙马软件发展有限公司
简历请发:feigigi@qq.com
职位描述如下:
一、加入公司,即会获得:
1、良好的工作待遇
2、齐全的保险和公积金
3、以人为本,开放自由的Home office工作环境
4、公司提供良好的职业发展机遇
5、金融领域Oracle的经验积累
二、要求
1、对Oracle数据库有强烈的兴趣
2、有扎实的基础,擅长Oracle数据库的安装部署、性能优化、故障处理,熟悉RAC
3、有3年以上工作经验
4、有OCP证书优先
5、有较强的沟通能力
6、有较强的文档写作能力
三、工作地点
长沙/武汉/南京
手工推进oracle scn方式(3)通过bbed推进scn
采用bbed修改scn的原理如下
File 1 block 1 是块头,记录了该 datafile header scn(整个datafile 的scn已该header scn为准的)
BBED> set file 1 block 1 FILE# 1 BLOCK# 1 BBED> p kcvfhckp (只用在block为1的地方) struct kcvfhckp, 36 bytes @484 struct kcvcpscn, 8 bytes @484 ub4 kscnbas @484 0x000f2d1a ub2 kscnwrp @488 0x0000 ub4 kcvcptim @492 0x2def0690 ub2 kcvcpthr @496 0x0001 union u, 12 bytes @500 struct kcvcprba, 12 bytes @500 ub4 kcrbaseq @500 0x00000005 ub4 kcrbabno @504 0x000033cf ub2 kcrbabof @508 0x0010 ub1 kcvcpetb[0] @512 0x02 ub1 kcvcpetb[1] @513 0x00 ub1 kcvcpetb[2] @514 0x00 ub1 kcvcpetb[3] @515 0x00 ub1 kcvcpetb[4] @516 0x00 ub1 kcvcpetb[5] @517 0x00 ub1 kcvcpetb[6] @518 0x00 ub1 kcvcpetb[7] @519 0x00
一个block 8k(8192),在偏移量为8的位置上记录了该block的SCN,
BBED> set block 128 BLOCK# 128 BBED> p bas_kcbh ub4 bas_kcbh @8 0x000f2d16
可以看到当前block的scn(0x000f2d16<0x000f2d1a) 正常
BBED> p tailchk ub4 tailchk @8188 0x2d160e01
Tailchecks (p tailchk)
The tail of an Oracle 8+ block is a concatenation of the lower order two bytes ofthe SCN base, the block type and the SCN sequence number.
Oracle block tail (该block的 最后4个bytes)由4个bytes组成,但实际上只用了低2个bytes来存放。 2个bytes的tail 由scn base,block type 和 scn sequence 组成。
E.g, if the SCN base number is 0x000f2d16 ,the block type is 0e and the SCN sequence number is 0x01, the tail check wouldbe 0x97280602:
SCN base Type SCN seq
2d16 0e 01
可以计算出 在block 的offset 8188(8192-2*2)为该block的 scn base=用p bas_kcbh查出的 后四位
所以在修改block的scn时,除了需要修改block bas_kcbh(offset 8) 还要修改tailchk的scn,只修改scn bash,而block type、scn sequence不变
并且注意:
在存放的时候是按照低位字节存放的the numbers are stored in little endian format (the low-order byte of thenumber is stored in memory at the lowest address) as this example database isrunning on Linux on an Intel platform.
根据以上描述同理,我们可以对数据库的scn进行变更,增大或者缩小。而2662的错误就是因为,某个数据块的scn与数据文件头scn不一致,比数据文件头的scn还要大导致的,所以同理也可以用此方法模拟ora-00600 2662错误。
有关log file parallel write等待
该事件只在lgwr进程上发生的等待事件,为lgwr进程将内存中redo entry写进redo file的IO工作期间等待该事件。此等待事件与db file parallel write等待原理类似。比如在IO系统没达到高负荷时候,但是脏缓冲图的数据量过多时候,db file parallel write的等待事件就会增多,与此相同,当IO没问题时候,但是重做数据量过多的时候,log file parallel write等待就可能会增加。该问题一方面受限于oracle lgwr的处理能力,一方面受限于redo entry的产生量(归根到底还是应用方面)。
如果想要降低不是因为IO问题引起的log file parallel write的等待,一般都从以下几点思考
1.减少提交次数
2.尽量少产生redo entry
3.避免hot backup
4.提高redo所在文件系统的io性能
该等待事件在平常能留给我们调整的余地较小,在目前的OLTP容灾普及的情况下,nologging方式基本不可能。