这个问题来的比较急,来邮件时候是下午4点多,但是数据库运行是正常的,从告警日志上看是进程无法获取内存了,导致客户端进程宕掉。
27102错误一般是oracle用户的系统资源限制设置过小导致,但是这里的情况却是不一样。
具体先看如下告警日志:
Fri Jul 25 12:51:08 CST 2014 Thread 1 advanced to log sequence 5862 (LGWR switch) Current log# 3 seq# 5862 mem# 0: +DG1/wasudb/onlinelog/group_3.263.842714473 Fri Jul 25 12:51:10 CST 2014 ARC1: Standby redo logfile selected for thread 1 sequence 5861 for destination LOG_ARCHIVE_DEST_2 Fri Jul 25 12:51:57 CST 2014 Thread 1 cannot allocate new log, sequence 5863 Checkpoint not complete Current log# 3 seq# 5862 mem# 0: +DG1/wasudb/onlinelog/group_3.263.842714473 Fri Jul 25 12:51:59 CST 2014 Thread 1 advanced to log sequence 5863 (LGWR switch) Current log# 1 seq# 5863 mem# 0: +DG1/wasudb/onlinelog/group_1.261.842714471 Fri Jul 25 12:52:02 CST 2014 ARC1: Standby redo logfile selected for thread 1 sequence 5862 for destination LOG_ARCHIVE_DEST_2 Fri Jul 25 13:01:45 CST 2014 Errors in file /oracle/admin/wasudb/udump/wasudb_ora_13870.trc: ORA-27102: out of memory Linux-x86_64 Error: 12: Cannot allocate memory Additional information: 108 Additional information: 3080192 Fri Jul 25 13:01:48 CST 2014 Errors in file /oracle/admin/wasudb/udump/wasudb_ora_13870.trc: ORA-00603: ORACLE server session terminated by fatal error ORA-27102: out of memory Linux-x86_64 Error: 12: Cannot allocate memory Additional information: 108 Additional information: 3080192
告警的分析在第一段话就已经带出,在此时我们通过ulimitd -a查看oracle用户的分配信息,都是非常大的,不存在过小的设置。
在这种情况下非常有必要看下这个ora_13870.trc文件,此文件部分信息如下:
/oracle/admin/wasudb/udump/wasudb_ora_13870.trc Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options ORACLE_HOME = /oracle/product/10.2.0/db_1 System name: Linux Node name: localhost Release: 3.8.13-16.2.1.el6uek.x86_64 Version: #1 SMP Thu Nov 7 17:01:44 PST 2013 Machine: x86_64 Instance name: wasudb Redo thread mounted by this instance: 1 Oracle process number: 51 Unix process pid: 13870, image: oracle@localhost *** 2014-07-25 13:01:44.015 *** SERVICE NAME:(SYS$USERS) 2014-07-25 13:01:43.995 *** SESSION ID:(6209.59205) 2014-07-25 13:01:43.995 ======================================= TOP 10 MEMORY USES FOR THIS PROCESS --------------------------------------- 81% 3328 MB, 213622 chunks: "pmuccst: adt/record " koh-kghu call ds=0x7f8e128632f8 dsprt=0x6ae9740 13% 544 MB, 8868 chunks: "permanent memory " SQL sort subheap ds=0x7f8e121f7a00 dsprt=0x7f8e12891058 5% 209 MB, 13443 chunks: "pmucalm coll " koh-kghu call ds=0x7f8e128632f8 dsprt=0x6ae9740 0% 7467 KB, 8948 chunks: "free memory " session heap ds=0x7f8e12866290 dsprt=0x6aed2c0 0% 1284 KB, 22 chunks: "permanent memory " pga heap ds=0x6ae9740 dsprt=(nil) 0% 424 KB, 6 chunks: "free memory " top call heap ds=0x6aed0a0 dsprt=(nil) 0% 331 KB, 82 chunks: "kxsFrame4kPage " session heap ds=0x7f8e12866290 dsprt=0x6aed2c0 0% 274 KB, 10 chunks: "static frame of inst " koh-kghu sessi ds=0x7f8e11fabe48 dsprt=0x7f8e12866290 0% 224 KB, 7 chunks: "kxsFrame32kPage " session heap ds=0x7f8e12866290 dsprt=0x6aed2c0 0% 128 KB, 1 chunk : "free memory " KSFD vec heap ds=0x6aeed80 dsprt=0x6ae9740 ======================================= PRIVATE MEMORY SUMMARY FOR THIS PROCESS --------------------------------------- ****************************************************** PRIVATE HEAP SUMMARY DUMP 4103 MB total: 4101 MB commented, 1286 KB permanent 537 KB free (0 KB in empty extents), 3546 MB, 3 heaps: "koh-kghu call " 51 KB free held 554 MB, 1 heap: "session heap " ------------------------------------------------------ Summary of subheaps at depth 1 4091 MB total: 4084 MB commented, 87 KB permanent 7603 KB free (132 KB in empty extents), 2 KB uncommented freeable with mark, 3328 MB, 213622 chunks: "pmuccst: adt/record " 552 MB, 1 heap: "kxs-heap-w " 7408 KB free held ========================================= REAL-FREE ALLOCATOR DUMP FOR THIS PROCESS ----------------------------------------- Dump of Real-Free Memory Allocator Heap [0x7f8e1283e000] mag=0xfefe0001 flg=0x5000003 fds=0x7 blksz=65536 blkdstbl=0x7f8e1283e010, iniblk=68608 maxblk=262144 numsegs=115 In-use num=65638 siz=8978432, Freeable num=0 siz=0, Free num=1 siz=6619136 ========================================== INSTANCE-WIDE PRIVATE MEMORY USAGE SUMMARY ------------------------------------------ Dumping Work Area Table (level=1) ===================================== Global SGA Info --------------- global target: 40000 MB auto target: 31163 MB max pga: 2048 MB pga limit: 4096 MB pga limit known: 0 pga limit errors: 0 pga inuse: 6534 MB pga alloc: 7384 MB pga freeable: 707 MB pga freed: 5776451 MB pga to free: 0 % pga auto: 1160 MB pga manual: 0 MB pga alloc (max): 15183 MB pga auto (max): 4383 MB pga manual (max): 10 MB # workareas : 0 # workareas(max): 115 ================================ PER-PROCESS PRIVATE MEMORY USAGE -------------------------------- Private memory usage per Oracle process ------------------------- Top 10 processes: ------------------------- (percentage is of 8065 MB total allocated memory) 51% pid 51: 4102 MB used of 4105 MB allocated < = CURRENT PROC 7% pid 101: 536 MB used of 537 MB allocated 3% pid 127: 52 MB used of 211 MB allocated (158 MB freeable) 2% pid 94: 37 MB used of 138 MB allocated (101 MB freeable) 1% pid 215: 101 MB used of 120 MB allocated (18 MB freeable) 1% pid 216: 101 MB used of 119 MB allocated (18 MB freeable) 1% pid 254: 101 MB used of 119 MB allocated (18 MB freeable) 1% pid 278: 101 MB used of 119 MB allocated (18 MB freeable) 1% pid 155: 100 MB used of 116 MB allocated (15 MB freeable) 1% pid 156: 111 MB used of 111 MB allocated ------------------------- All processes: ------------------------- (session detail when over 403 MB allocated) pid 2: 295 KB used of 374 KB allocated pid 3: 294 KB used of 374 KB allocated pid 4: 388 KB used of 502 KB allocated pid 5: 8205 KB used of 14 MB allocated (3136 KB freeable) pid 6: 8409 KB used of 13 MB allocated (2112 KB freeable) pid 7: 8356 KB used of 15 MB allocated (4160 KB freeable) ....................(省略很多)
通过trace中的信息我们可以看到PER-PROCESS PRIVATE MEMORY USAGE的TOP 10 process以及all process的信息,其中PID为51的进程(TOP 1)马上就入了大头的法眼~进程开销PGA已经4个G了,基本可以联想到肯定是超大DML事务带有排序等没有批量提交导致把内存撑爆!经过检查发现是一个存储过程在跑大量数据的处理,验证了我的判断,协助客户发给应用开发处理,并建议缩小事务,尽量采用批量的提交方式。
从以上也可以看出,硬件资源是有限的,即使给oracle用户分配的资源再多,只要应用开发设计得不合理,你永远都不会满足这应用的需求!