Skip to content

Oracle 排障配置与调整 - 4. page

Ora-00494 / Aiowait timeout cause instance crashed.(SYSTEM DUMP分析案例)

报错如下:

Wed Oct 21 03:06:22 2015
Thread 1 advanced to log sequence 254206 (LGWR switch)
  Current log# 9 seq# 254206 mem# 0: /boss_sysdata/oradata/dtvboss/redo09.log
Wed Oct 21 03:17:10 2015
WARNING: aiowait timed out 1 times
Wed Oct 21 03:22:11 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 367'
Wed Oct 21 03:22:17 2015
System State dumped to trace file /oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc
Wed Oct 21 03:22:50 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_lgwr_365.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 367'
Wed Oct 21 03:22:55 2015
Killing enqueue blocker (pid=367) on resource CF-00000000-00000000
 by killing session 1648.1
Wed Oct 21 03:22:55 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_lgwr_365.trc:
ORA-07445: exception encountered: core dump [ksuklms()+672] [SIGSEGV] [Address not mapped to object] [0x000000062] [] []
Wed Oct 21 03:23:18 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_pmon_343.trc:
ORA-00470: LGWR process terminated with error
Wed Oct 21 03:23:18 2015
PMON: terminating instance due to error 470
Wed Oct 21 03:23:18 2015
ORA-470 encountered when generating server alert SMG-3503
Wed Oct 21 03:23:18 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_j006_22335.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00470: LGWR process terminated with error
Wed Oct 21 03:23:22 2015
Termination issued to instance processes. Waiting for the processes to exit
Wed Oct 21 03:23:28 2015
Instance termination failed to kill one or more processes
Wed Oct 21 03:23:58 2015
Instance terminated by PMON, pid = 343
Wed Oct 21 03:24:06 2015
Shutting down instance (abort)
Wed Oct 21 03:25:18 2015
WARNING: inbound connection timed out (ORA-3136)
Wed Oct 21 03:25:18 2015
WARNING: inbound connection timed out (ORA-3136)
Wed Oct 21 03:25:18 2015

1.宕机的起因分析:

告警日志分析:
Wed Oct 21 03:06:22 2015
Thread 1 advanced to log sequence 254206 (LGWR switch)
Current log# 9 seq# 254206 mem# 0: /boss_sysdata/oradata/dtvboss/redo09.log
Wed Oct 21 03:17:10 2015
WARNING: aiowait timed out 1 times –该时间段系统出现aio等待的告警,该告警由_iowait_timeouts隐含参数控制,超过该参数设置的值后会导致系统宕机,这里并未超过系统设置值。但是提示我们当时系统在io调度上出现了问题。
Wed Oct 21 03:22:11 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by ‘inst 1, osid 367’ –22分17秒系统抛错,进程367持有controlfile的enqueue lock时间被其他进程(arc0)等待超时,该enqueue lock持有时间由隐含参数_controlfile_enqueue_timeout控制,默认为900秒
Wed Oct 21 03:22:17 2015
System State dumped to trace file /oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc
Wed Oct 21 03:22:50 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_lgwr_365.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by ‘inst 1, osid 367’ –22分50秒,进程367持有的controlfile的enqueue lock时间被其他进程(lgwr)等待超时抛错
Wed Oct 21 03:22:55 2015
Killing enqueue blocker (pid=367) on resource CF-00000000-00000000 –LGWR进程发现异常查杀持有controlfile equeue lock的进程367
by killing session 1648.1
Wed Oct 21 03:22:55 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_lgwr_365.trc:
ORA-07445: exception encountered: core dump [ksuklms()+672] [SIGSEGV] [Address not mapped to object] [0x000000062] [] []
Wed Oct 21 03:23:18 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_pmon_343.trc: –后续一系列报错宕机开始
ORA-00470: LGWR process terminated with error
Wed Oct 21 03:23:18 2015
PMON: terminating instance due to error 470
Wed Oct 21 03:23:18 2015
ORA-470 encountered when generating server alert SMG-3503
Wed Oct 21 03:23:18 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_j006_22335.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00470: LGWR process terminated with error
Wed Oct 21 03:23:22 2015
Termination issued to instance processes. Waiting for the processes to exit
Wed Oct 21 03:23:28 2015
Instance termination failed to kill one or more processes

Trace文件分析(trace过长摘取关键部分):
*** SERVICE NAME:(SYS$BACKGROUND) 2015-10-21 03:22:11.264
*** SESSION ID:(1646.3) 2015-10-21 03:22:11.264
*** 2015-10-21 03:22:11.264
Unable to get enqueue on resource CF-00000000-00000000 (ges mode req=3 held=6)
Possible local blocker ospid=367 sid=1648 sser=1 time_held=37 secs (ges mode req=6 held=4)
DUMP LOCAL BLOCKER: initiate state dump for KILL BLOCKER
possible owner[17.367] on resource CF-00000000-00000000
Dumping process info of pid[17.367] requested by pid[23.404]
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by ‘inst 1, osid 367’

——————————————————————————-
ENQUEUE [CF] HELD FOR TOO LONG

enqueue holder: ‘inst 1, osid 367’

Process ‘inst 1, osid 367’ is holding an enqueue for maximum allowed time.
The process will be terminated.

Oracle Support Services triaging information: to find the root-cause, look
at the call stack of process ‘inst 1, osid 367’ located below. Ask the
developer that owns the first NON-service layer in the stack to investigate.
Common service layers are enqueues (ksq), latches (ksl), library cache
pins and locks (kgl), and row cache locks (kqr).

Dumping process 17.367 info:
*** 2015-10-21 03:22:11.300
Dumping diagnostic information for ospid 367:
OS pid = 367
loadavg : 0.44 0.62 1.82
swap info: free_mem = 1435.35M rsv = 59461.24M
alloc = 55941.14M avail = 55124.91 swap_free = 58645.01M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S oracle 367 1 0 39 20 ? 4116160 ? Mar 30 ? 498:14 ora_ckpt_dtvboss — 整个系统当时的空闲内存和swap还留有余地,足够当时系统的使用,所以内存不足的原因可以排除。这里提示进程367为oracle ckpt进程,可以确认当时持有controlfile enqueue lock的进程为oracle ckpt进程

PROCESS 23: –该进程为arc0 归档进程,trace信息显示在等待controlfile enqueue 队列锁。等待事件为enq: CF – contention
—————————————-
SO: 8a9004f88, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=23, calls cur/top: 8a95c6650/8a95c6650, flag: (2) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 24
last post received-location: ksasnd
last process to post me: b66002018 1 6
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: b66005770 1 2
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 5eb3388f0
O/S info: user: oracle, term: UNKNOWN, ospid: 404
OSD pid info: Unix process pid: 404, image: oracle@hzboss-ora1 (ARC0)
Dump of memory from 0x00000005F033B1F0 to 0x00000005F033B3F8
5F033B1F0 0000000C 00000000 00000008 AF739C70 [………….s.p]
5F033B200 00000006 000313A7 00000008 A95D1620 [………….]. ]
5F033B210 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B220 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B230 00000006 000313A7 00000008 A95D1FB8 [………….]..]
5F033B240 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B250 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B260 00000006 000313A7 00000008 A95D1FB8 [………….]..]
5F033B270 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B280 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B290 00000006 000313A7 00000008 A95D1FB8 [………….]..]
5F033B2A0 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B2B0 00000006 000313A7 00000008 A95D17E8 [………….]..]
5F033B2C0 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B2D0 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B2E0 00000006 000313A7 00000008 A95D1880 [………….]..]
5F033B2F0 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B300 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B310 00000006 000313A7 00000008 A95D16B8 [………….]..]
5F033B320 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B330 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B340 00000006 000313A7 00000008 A95D1880 [………….]..]
5F033B350 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B360 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B370 00000006 000313A7 00000008 A95D16B8 [………….]..]
5F033B380 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B390 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B3A0 00000006 000313A7 00000008 A95D17E8 [………….]..]
5F033B3B0 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B3C0 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B3D0 00000006 000313A7 00000008 A95D17E8 [………….]..]
5F033B3E0 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B3F0 00000006 000313A7 [……..]
(FOB) flags=2 fib=8af713780 incno=0 pending i/o cnt=0
fname=/boss_sysdata/oradata/dtvboss/control03.ctl
fno=2 lblksz=16384 fsiz=1308
(FOB) flags=2 fib=8af7133e0 incno=0 pending i/o cnt=0
fname=/boss_sysdata/oradata/dtvboss/control02.ctl
fno=1 lblksz=16384 fsiz=1308
(FOB) flags=2 fib=8af713040 incno=0 pending i/o cnt=0
fname=/boss_sysdata/oradata/dtvboss/control01.ctl
fno=0 lblksz=16384 fsiz=1308
—————————————-
SO: 8a9778500, type: 11, owner: 8a9004f88, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: 8a9004f88,
event: 21, last message event: 21,
last message waited event: 21, messages read: 0
channel: (5eb52cc20) scumnt mount lock
scope: 1, event: 22, last mesage event: 20,
publishers/subscribers: 0/20,
messages published: 1
—————————————-
SO: 8a95b7328, type: 4, owner: 8a9004f88, flag: INIT/-/-/0x00
(session) sid: 1646 trans: 0, creator: 8a9004f88, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-0017-0000000D, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
service name: SYS$BACKGROUND
last wait for ‘enq: CF – contention’ blocking sess=0x0 seq=43322 wait_time=2929723 seconds since wait started=36
name|mode=43460004, 0=0, operation=0
Dumping Session Wait History
for ‘enq: CF – contention’ count=1 wait_time=2929723
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929725
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929726
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929726
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929725
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929723
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929725
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929726
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929724
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929750
name|mode=43460004, 0=0, operation=0

SO: 8a95d1fb8, type: 5, owner: 8a95c6650, flag: INIT/-/-/0x00
(enqueue) CF-00000000-00000000 DID: 0001-0017-0000000D
lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0x5eb40f608, lock_flag: 0x0
own: 0x8a95b7328, sess: 0x8a95b7328, prv: 0x8a95d1fe8 –这里说明arc0在申请0x5eb40f608的队列锁资源。而在process 17号的ckpt中可以看到0x5eb40f608资源被ckpt持有

process 17(ckpt进程)中的句柄:

SO: 8a95b9df8, type: 4, owner: 8a90037d0, flag: INIT/-/-/0x00
(session) sid: 1648 trans: 0, creator: 8a90037d0, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-0011-00000005, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
service name: SYS$BACKGROUND
waiting for ‘direct path write’ blocking sess=0x0 seq=39369 wait_time=0 seconds since wait started=37
file number=72, first dba=1, block cnt=1
Dumping Session Wait History
for ‘direct path write’ count=1 wait_time=0
file number=73, first dba=1, block cnt=1
for ‘direct path write’ count=1 wait_time=0
file number=74, first dba=1, block cnt=1
for ‘direct path write’ count=1 wait_time=0
file number=75, first dba=1, block cnt=1
for ‘direct path write’ count=1 wait_time=0
file number=76, first dba=1, block cnt=1
for ‘direct path write’ count=1 wait_time=309
file number=77, first dba=1, block cnt=1
for ‘direct path read’ count=1 wait_time=1
file number=8b, first dba=1, block cnt=1
for ‘direct path read’ count=1 wait_time=0
file number=8a, first dba=1, block cnt=1
for ‘direct path read’ count=1 wait_time=0
file number=89, first dba=1, block cnt=1
for ‘direct path read’ count=1 wait_time=1
file number=88, first dba=1, block cnt=1
for ‘direct path read’ count=1 wait_time=1
file number=87, first dba=1, block cnt=1
temporary object counter: 0
—————————————-
SO: 8a95d1cc0, type: 5, owner: 8a95c5df8, flag: INIT/-/-/0x00
(enqueue) CF-00000000-00000000 DID: 0001-0011-00000005
lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0x5eb40f608, mode: SSX, lock_flag: 0x0 —————-ckpt以ssx的方式持有了该资源
own: 0x8a95b9df8, sess: 0x8a95b9df8, proc: 0x8a90037d0, prv: 0x5eb40f618

PROCESS 10: LGWR进程
—————————————-
SO: b66002018, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=10, calls cur/top: b665c6de0/b665c6de0, flag: (6) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 24
last post received-location: ksasnd
last process to post me: 8a9000860 1 6
last post sent: 109 0 4
last post sent-location: kslpsr
last process posted by me: 8a902ef90 105 0
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 5eb3388f0
O/S info: user: oracle, term: UNKNOWN, ospid: 365
OSD pid info: Unix process pid: 365, image: oracle@hzboss-ora1 (LGWR)

SO: b665bf398, type: 4, owner: b66002018, flag: INIT/-/-/0x00
(session) sid: 824 trans: 0, creator: b66002018, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-000A-00000006, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
service name: SYS$BACKGROUND
waiting for ‘enq: CF – contention’ blocking sess=0x8a95b9df8 seq=61266 wait_time=0 seconds since wait started=0 0x8a95b9df8该句柄持有者为process 17的ckpt进程
name|mode=43460005, 0=0, operation=0
Dumping Session Wait History
for ‘enq: CF – contention’ count=1 wait_time=2929719
name|mode=43460005, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929716
name|mode=43460005, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929716
name|mode=43460005, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929718
name|mode=43460005, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929719
name|mode=43460005, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929719

—————————————-
SO: b665c6de0, type: 3, owner: b66002018, flag: INIT/-/-/0x00
(call) sess: cur b665bf398, rec 0, usr b665bf398; depth: 0
—————————————-
SO: b665cce90, type: 5, owner: b665c6de0, flag: INIT/-/-/0x00
(enqueue) CF-00000000-00000000 DID: 0001-000A-00000006
lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0x5eb40f608, req: SSX, lock_flag: 0x0 ————前面已经说明了,该5eb40f608的资源队列锁已经被ckpt进程持有,而lgwr正在申请以ssx的方式持有。
own: 0xb665bf398, sess: 0xb665bf398, proc: 0xb66002018, prv: 0x5eb40f628

综上可以得出结论,由于arc0进程要写日志需要等待控制文件CF锁,lgwr估计要写日志或者切换也在等待控制文件CF锁,而ckpt进程持有controlfile enqueue lock时间过长,arch0等待超过900秒,导致oracle数据库系统判断异常。
此时oracle数据库系统会通知lgwr进程kill持有controlfile enqueue lock的进程,相当于lgwr进程杀了ckpt进程,而ckpt进程又为oracle关键进程,连锁反应,引起宕机。

Oracle Rac hang 案例分析一则(WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK)

LOG FILE
————–
Filename=zsinsraca1_151006_0000.nmon.xlsx
See the following:

+++ CPU_ALL ========> CPU usage is NOT high

CPU Total zsinsraca1 User% Sys% Wait% Idle% CPU% Logical CPUs (SMTmode=4)
10/6/15 10:30 AM 5.2 1.7 0.2 92.9 6.9 48
10/6/15 10:31 AM 4.3 1.6 0.1 94 5.9 48
10/6/15 10:32 AM 7.6 1.5 0.2 90.6 9.1 48
10/6/15 10:33 AM 8.6 1.7 0 89.7 10.3 48
10/6/15 10:34 AM 8.8 1.8 0 89.3 10.6 48
10/6/15 10:35 AM 10.7 2 0.1 87.2 12.7 48
10/6/15 10:36 AM 9.3 2.1 0.1 88.6 11.4 48
10/6/15 10:37 AM 9.4 2.1 0.1 88.4 11.5 48
10/6/15 10:38 AM 10.7 2.5 0.2 86.6 13.2 48
10/6/15 10:39 AM 10.2 2.4 0.1 87.3 12.6 48
10/6/15 10:40 AM 9.6 1.8 0 88.5 11.4 48
10/6/15 10:41 AM 9.6 2.1 0 88.2 11.7 48
10/6/15 10:42 AM 9 2.2 0.1 88.7 11.2 48
10/6/15 10:43 AM 9.2 1.7 0 89 10.9 48
10/6/15 10:44 AM 11 1.8 0 87.2 12.8 48
10/6/15 10:45 AM 10.7 1.9 0.1 87.3 12.6 48
10/6/15 10:46 AM 12.9 2.1 0.1 84.9 15 48
10/6/15 10:47 AM 9.4 1.5 0 89 10.9 48
10/6/15 10:48 AM 11.3 1.6 0.1 87.1 12.9 48
10/6/15 10:49 AM 11.8 2.4 0.1 85.7 14.2 48
10/6/15 10:50 AM 12.2 1.8 0 85.9 14 48
10/6/15 10:51 AM 13.2 1.9 0.1 84.8 15.1 48
10/6/15 10:52 AM 12.4 1.3 0.1 86.3 13.7 48
10/6/15 10:53 AM 12.7 1.8 0 85.5 14.5 48
10/6/15 10:54 AM 11.9 1.9 0.1 86.2 13.8 48
10/6/15 10:55 AM 10.2 2 0.1 87.6 12.2 48
10/6/15 10:56 AM 11.5 2.4 0.2 85.9 13.9 48
10/6/15 10:57 AM 11.4 1.8 0.1 86.6 13.2 48
10/6/15 10:58 AM 11.9 1.7 0 86.3 13.6 48
10/6/15 10:59 AM 10.3 1.6 0 88.1 11.9 48
10/6/15 11:00 AM 12.3 1.5 0.2 86 13.8 48
10/6/15 11:01 AM 12.1 1.7 0.1 86.2 13.8 48
10/6/15 11:02 AM 10.6 1.6 0.1 87.8 12.2 48
10/6/15 11:03 AM 13.8 2.4 0.1 83.7 16.2 48
10/6/15 11:04 AM 12.5 2.6 0.1 84.9 15.1 48
10/6/15 11:05 AM 11.4 1.8 0.1 86.7 13.2 48
10/6/15 11:06 AM 10.1 2.3 0.1 87.5 12.4 48
10/6/15 11:07 AM 9.7 2.1 0.1 88.2 11.8 48
10/6/15 11:08 AM 10.5 2 0 87.5 12.5 48
10/6/15 11:09 AM 11.3 1.1 0.1 87.5 12.4 48
10/6/15 11:10 AM 11.6 1.8 0.1 86.6 13.4 48
10/6/15 11:11 AM 9.5 1.4 0.1 89 10.9 48
10/6/15 11:12 AM 8.4 1.4 0 90.2 9.8 48
10/6/15 11:13 AM 9 1.6 0 89.3 10.6 48
10/6/15 11:14 AM 10.1 1.9 0 87.9 12 48
10/6/15 11:15 AM 11.7 2.2 0.1 86.1 13.9 48
10/6/15 11:16 AM 10.5 1.9 0.1 87.6 12.4 48
10/6/15 11:17 AM 11 1.2 0.1 87.8 12.2 48
10/6/15 11:18 AM 12.6 2.2 0.1 85.2 14.8 48
10/6/15 11:19 AM 9.9 1.6 0.1 88.4 11.5 48
10/6/15 11:20 AM 11.3 2.1 0.1 86.5 13.4 48
10/6/15 11:21 AM 11.4 1.3 0.1 87.3 12.7 48
10/6/15 11:22 AM 11.6 1.5 0.1 86.8 13.1 48
10/6/15 11:23 AM 12.7 3.2 0.1 84 15.9 48
10/6/15 11:24 AM 11.8 3.1 0.1 85.1 14.9 48
10/6/15 11:25 AM 9.1 1.5 0 89.3 10.6 48
10/6/15 11:26 AM 11.1 1.9 0.1 86.9 13 48
10/6/15 11:27 AM 13.3 3.9 0.4 82.4 17.2 48
10/6/15 11:28 AM 10.3 2 0.8 86.9 12.3 48
10/6/15 11:29 AM 10.2 1.2 0 88.6 11.4 48
10/6/15 11:30 AM 11.3 1.7 0 87 13 48
10/6/15 11:31 AM 13.9 2.8 0.1 83.2 16.7 48
10/6/15 11:32 AM 13.2 3.1 0.1 83.6 16.3 48
10/6/15 11:33 AM 10.9 2.2 0.1 86.8 13.1 48
10/6/15 11:34 AM 10.6 2.6 0.1 86.8 13.2 48
10/6/15 11:35 AM 12 2.6 0 85.4 14.6 48
10/6/15 11:36 AM 14.5 5.3 0.1 80.1 19.8 48
10/6/15 11:37 AM 11.4 1.6 0.1 86.9 13 48
10/6/15 11:38 AM 10.8 1.7 0 87.5 12.5 48
10/6/15 11:39 AM 9.7 1.6 0 88.6 11.3 48
10/6/15 11:40 AM 13.2 1.9 0 84.9 15.1 48
10/6/15 11:41 AM 10.6 1.3 0 88.2 11.9 48
10/6/15 11:42 AM 12.1 4 0.2 83.7 16.1 48
10/6/15 11:43 AM 11.9 4.4 0 83.6 16.3 48
10/6/15 11:44 AM 10.1 1.6 0 88.3 11.7 48
10/6/15 11:45 AM 12.4 2.4 0.4 84.8 14.8 48
10/6/15 11:46 AM 15.3 6.5 1.3 77 21.8 48
10/6/15 11:47 AM 13.4 4.4 1 81.2 17.8 48
10/6/15 11:48 AM 16.5 6.8 0.1 76.6 23.3 48
10/6/15 11:49 AM 15.9 5.1 0.1 78.9 21 48
10/6/15 11:50 AM 8.5 4.8 0.2 86.5 13.3 48

+++ MEM =========> free memory 在11:50时仍有3GB free

Memory zsinsraca1 Real Free % Virtual free % Real free(MB) Virtual free(MB) Real total(MB) Virtual total(MB)
10/6/15 10:30 AM 7.6 88 7476.4 28845 98304 32768
10/6/15 10:31 AM 7.6 88 7472.8 28845 98304 32768
10/6/15 10:32 AM 7.6 88 7454.3 28845 98304 32768
10/6/15 10:33 AM 7.5 88 7412.8 28845 98304 32768
10/6/15 10:34 AM 7.5 88 7374.2 28845.5 98304 32768
10/6/15 10:35 AM 7.4 88 7303.3 28845.5 98304 32768
10/6/15 10:36 AM 7.3 88 7147.1 28845.5 98304 32768
10/6/15 10:37 AM 7.3 88 7210.1 28845.5 98304 32768
10/6/15 10:38 AM 7.3 88 7170 28845.5 98304 32768
10/6/15 10:39 AM 7.3 88 7128 28845.5 98304 32768
10/6/15 10:40 AM 7.2 88 7126.4 28845.5 98304 32768
10/6/15 10:41 AM 7.2 88 7104.1 28845.5 98304 32768
10/6/15 10:42 AM 7.2 88 7052 28845.5 98304 32768
10/6/15 10:43 AM 7 88 6888.5 28845.6 98304 32768
10/6/15 10:44 AM 7.1 88 7028.3 28845.6 98304 32768
10/6/15 10:45 AM 7 88 6914.8 28845 98304 32768
10/6/15 10:46 AM 7 88 6917.3 28844.5 98304 32768
10/6/15 10:47 AM 7 88 6886.8 28844.1 98304 32768
10/6/15 10:48 AM 6.9 88 6822.9 28843.3 98304 32768
10/6/15 10:49 AM 6.8 88 6668 28842 98304 32768
10/6/15 10:50 AM 6.7 88 6602.4 28841.1 98304 32768
10/6/15 10:51 AM 6.6 88 6463.4 28840 98304 32768
10/6/15 10:52 AM 6.3 88 6235.7 28838.4 98304 32768
10/6/15 10:53 AM 6.5 88 6397.8 28837.6 98304 32768
10/6/15 10:54 AM 6.5 88 6384.6 28837.4 98304 32768
10/6/15 10:55 AM 6.5 88 6340.9 28836.2 98304 32768
10/6/15 10:56 AM 6.2 88 6123.9 28836.2 98304 32768
10/6/15 10:57 AM 6.3 88 6228.7 28835.8 98304 32768
10/6/15 10:58 AM 6.2 88 6143.9 28834.9 98304 32768
10/6/15 10:59 AM 6.2 88 6111.3 28834.2 98304 32768
10/6/15 11:00 AM 6.1 88 6017.9 28833.1 98304 32768
10/6/15 11:01 AM 5.9 88 5837.6 28832.9 98304 32768
10/6/15 11:02 AM 6 88 5888.8 28832 98304 32768
10/6/15 11:03 AM 5.9 88 5848.5 28831.3 98304 32768
10/6/15 11:04 AM 5.7 88 5606.2 28827.8 98304 32768
10/6/15 11:05 AM 5.7 88 5555.2 28827.1 98304 32768
10/6/15 11:06 AM 5.6 88 5503.2 28826.3 98304 32768
10/6/15 11:07 AM 5.4 88 5271.7 28825.5 98304 32768
10/6/15 11:08 AM 5.5 88 5388.1 28824.7 98304 32768
10/6/15 11:09 AM 5.3 88 5239.6 28824.7 98304 32768
10/6/15 11:10 AM 5.5 88 5391.1 28824.1 98304 32768
10/6/15 11:11 AM 5.4 88 5312.4 28823.9 98304 32768
10/6/15 11:12 AM 5.4 88 5276.5 28823.4 98304 32768
10/6/15 11:13 AM 5.4 88 5325.3 28823.3 98304 32768
10/6/15 11:14 AM 5.3 88 5231.7 28822.8 98304 32768
10/6/15 11:15 AM 5.2 88 5157.4 28821.8 98304 32768
10/6/15 11:16 AM 5.1 88 5054 28820.6 98304 32768
10/6/15 11:17 AM 5.1 88 5017.2 28820.5 98304 32768
10/6/15 11:18 AM 5 88 4928.5 28819.5 98304 32768
10/6/15 11:19 AM 5.1 88 5049.6 28821.2 98304 32768
10/6/15 11:20 AM 5.1 88 5015.3 28820.4 98304 32768
10/6/15 11:21 AM 5 88 4935.5 28819.6 98304 32768
10/6/15 11:22 AM 4.9 87.9 4815.3 28818.4 98304 32768
10/6/15 11:23 AM 4.7 87.9 4594.5 28816 98304 32768
10/6/15 11:24 AM 4.6 87.9 4528.6 28815.4 98304 32768
10/6/15 11:25 AM 4.6 87.9 4550.3 28814.9 98304 32768
10/6/15 11:26 AM 4.6 87.9 4486.5 28814.3 98304 32768
10/6/15 11:27 AM 4.4 87.9 4361.3 28813.4 98304 32768
10/6/15 11:28 AM 4.4 87.9 4287 28812.4 98304 32768
10/6/15 11:29 AM 4.3 87.9 4251.7 28811.8 98304 32768
10/6/15 11:30 AM 4.3 87.9 4187.3 28811 98304 32768
10/6/15 11:31 AM 4.2 87.9 4097.9 28810.4 98304 32768
10/6/15 11:32 AM 4.2 87.9 4101.4 28809.9 98304 32768
10/6/15 11:33 AM 4.1 87.9 4062.2 28809.6 98304 32768
10/6/15 11:34 AM 4 87.9 3960.2 28808.4 98304 32768
10/6/15 11:35 AM 3.9 87.9 3858.5 28806.8 98304 32768
10/6/15 11:36 AM 3.9 87.9 3790.8 28805.9 98304 32768
10/6/15 11:37 AM 3.8 87.9 3774.6 28805.7 98304 32768
10/6/15 11:38 AM 3.6 87.9 3586.9 28805.1 98304 32768
10/6/15 11:39 AM 3.9 87.9 3838.9 28804.7 98304 32768
10/6/15 11:40 AM 3.8 87.9 3763 28804.7 98304 32768
10/6/15 11:41 AM 3.7 87.9 3661.5 28804 98304 32768
10/6/15 11:42 AM 3.8 88.1 3763 28864.2 98304 32768
10/6/15 11:43 AM 3.8 88.1 3753.1 28864.3 98304 32768
10/6/15 11:44 AM 3.7 88.1 3676.4 28864.3 98304 32768
10/6/15 11:45 AM 3.5 88.1 3405.6 28864.3 98304 32768
10/6/15 11:46 AM 3.5 88.1 3484.2 28864.4 98304 32768
10/6/15 11:47 AM 3.5 88.1 3477 28864.5 98304 32768
10/6/15 11:48 AM 3.6 88.1 3570.6 28864.5 98304 32768
10/6/15 11:49 AM 4.7 88.1 4646.2 28868.9 98304 32768
10/6/15 11:50 AM 34.5 98 33959.3 32123.8 98304 32768

LOG FILE
————–
Filename=alert_zsiims1.log
See the following:

Tue Oct 6 02:34:52 2015
Thread 1 advanced to log sequence 37413 (LGWR switch)
Current log# 1 seq# 37413 mem# 0: +V5DATA/zsiims/onlinelog/group_1.9844.799569609
Tue Oct 6 10:50:38 2015
ALTER SYSTEM ARCHIVE LOG
Tue Oct 6 10:50:39 2015
Thread 1 advanced to log sequence 37414 (LGWR switch)
Current log# 3 seq# 37414 mem# 0: +V5DATA/zsiims/onlinelog/group_3.9851.799569423
Tue Oct 6 11:21:49 2015
Errors in file /oracle/admin/zsiims/bdump/zsiims1_j001_10027156.trc:
ORA-12012: error on auto execute of job 54375
ORA-27468: “EXFSYS.RLM$SCHDNEGACTION” is locked by another process
Tue Oct 6 11:23:07 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=142
System State dumped to trace file /oracle/admin/zsiims/udump/zsiims1_ora_20971728.trc
Tue Oct 6 11:25:43 2015
Errors in file /oracle/admin/zsiims/bdump/zsiims1_j002_6815818.trc:
ORA-12012: error on auto execute of job 54374
ORA-27468: “EXFSYS.RLM$EVTCLEANUP” is locked by another process
Tue Oct 6 11:26:57 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=28
System State dumped to trace file /oracle/admin/zsiims/bdump/zsiims1_smon_15335466.trc
Tue Oct 6 11:35:26 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=193
System State dumped to trace file /oracle/admin/zsiims/udump/zsiims1_ora_32309462.trc
Tue Oct 6 11:41:27 2015
Shutting down instance: further logons disabled

LOG FILE
————–
Filename=alert_zsiims2.log
See the following:
Tue Oct 6 10:50:40 2015
Thread 2 advanced to log sequence 25153 (LGWR switch)
Current log# 5 seq# 25153 mem# 0: +V5DATA/zsiims/onlinelog/group_5.9853.799569371
Tue Oct 6 11:45:06 2015
IPC Send timeout detected.Sender: ospid 7798804
Receiver: inst 1 binc 4 ospid 6881304
Tue Oct 6 11:45:08 2015
Trace dumping is performing id=[cdmp_20151006114508]
Tue Oct 6 11:48:58 2015
Shutting down instance (abort)
License high water mark = 450
Instance terminated by USER, pid = 20578736
Tue Oct 6 11:49:50 2015
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0

Filename=zsiims1_ora_20971728A.trc ===> 发生问题后产生的第一个systemstate dump, 在Oct 6 11:23:07AM产生的,此时数据库的状况应该已经很糟糕了
See the following:

50几个session在等待latch: row cache objects

33: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=45472
37: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=18549
68: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=13886
71: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=46595
81: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=48525
106: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11855
118: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11183
122: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8913
132: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=12194
140: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12643
149: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12173
154: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=15022
156: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12143
160: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=20615
173: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11470
174: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11668
177: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11249
183: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=10725
184: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11485
200: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9970
204: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11019
207: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8356
208: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8811
215: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8914
219: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10169
230: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9156
241: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=7911
244: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8785
249: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=17542
255: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9209
264: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9247
276: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8368
288: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8211
310: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=7413
317: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=17869
331: waiting for ‘latch: row cache objects'[Rcache object=700000418a9adf8,] seq=5309
352: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2625
353: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10183
361: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5948
364: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5303
366: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=5300
378: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=760
381: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4788
394: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4758
401: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=109
424: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2924
429: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=3120
432: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2426
454: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1861
457: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1579
463: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1603
468: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1016
470: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1076
486: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=501
576: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=16368
606: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=30065

还有很多session在等待row cache lock

28: waiting for ‘row cache lock’ [Rcache object=70000064a573ef8,] seq=40578
40: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=59768
93: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2938
98: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2777
193: waiting for ‘row cache lock’ [Rcache object=700000621252470,] seq=966
199: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=107
214: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3501
220: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=86
232: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1148
239: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=12734
252: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2908
259: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1134
260: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=65
266: waiting for ‘row cache lock’ [Rcache object=700000550af2690,] seq=1791
268: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1288
274: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=821
278: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1357
287: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=637
289: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
293: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3363
302: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
319: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=98
320: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=366
323: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2618
327: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=167
328: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1137
330: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=31
332: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=53
333: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
334: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=368
337: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=202
341: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=56
342: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=18
343: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
345: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=359
349: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=562
350: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=24
351: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=407
358: waiting for ‘row cache lock’ [Rcache object=70000078b3d6290,] seq=28
359: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
363: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
365: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
371: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
373: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
375: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=89
379: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=605
383: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=523
385: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=595
387: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=243
390: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
391: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=355
393: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=20
395: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
397: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1080
404: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
409: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
410: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=60
411: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1101
413: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=95
414: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=948
416: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=315
417: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=516
419: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=102
420: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=607
421: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=831
426: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=42
433: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=58
437: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=421
442: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=44
445: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=185
447: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=35
450: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=55
456: waiting for ‘row cache lock’ [Rcache object=700000789bc8cc8,] seq=349
459: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=127
465: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=234
474: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=52
475: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=25
478: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=41
490: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=49
610: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=15070

分析发生问题时候的systemstate dump trace zsiims1_ora_20971728A.trc:

50几个session在等待latch: row cache objects, 大部分是和700000768fd8ea8有关的

33: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=45472
37: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=18549
68: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=13886
71: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=46595
81: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=48525
106: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11855
118: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11183
122: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8913
132: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=12194
140: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12643
149: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12173
154: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=15022
156: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12143
160: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=20615
173: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11470
174: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11668
177: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11249
183: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=10725
184: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11485
200: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9970
204: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11019
207: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8356
208: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8811
215: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8914
219: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10169
230: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9156
241: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=7911
244: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8785
249: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=17542
255: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9209
264: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9247
276: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8368
288: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8211
310: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=7413
317: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=17869
331: waiting for ‘latch: row cache objects'[Rcache object=700000418a9adf8,] seq=5309
352: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2625
353: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10183
361: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5948
364: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5303
366: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=5300
378: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=760
381: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4788
394: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4758
401: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=109
424: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2924
429: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=3120
432: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2426
454: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1861
457: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1579
463: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1603
468: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1016
470: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1076
486: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=501
576: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=16368
606: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=30065

以其中一个比较典型的process为例:

PROCESS 576:
—————————————-
SO: 70000078326f638, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=576, calls cur/top: 70000039983fd08/70000039983f788, flag: (0) –
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 109 0 4
last post received-location: kslpsr
last process to post me: 700000786279d80 1 6
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: 700000786279d80 1 6
(latch info) wait_event=0 bits=0
Location from where call was made: kghfrunp: clatch: wait:
waiting for 700000768fd8ea8 Child row cache objects level=4 child#=9
Location from where latch is held: kghfrunp: clatch: wait: ===============> 可以看到这个latch是因为kghfrunp相关的功能发起的,而这个功能是需要从shared pool中释放空间有关
Context saved from call: 0 ===========================>kghfrunp: (KGH: Ask client to free unpinned space)
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
207 (5, 1444101815, 2)
174 (5, 1444101815, 2)
322 (5, 1444101815, 2)

还有很多session在等待row cache lock, 最多的对象70000078e9f0d28

28: waiting for ‘row cache lock’ [Rcache object=70000064a573ef8,] seq=40578
40: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=59768
93: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2938
98: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2777
193: waiting for ‘row cache lock’ [Rcache object=700000621252470,] seq=966
199: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=107
214: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3501
220: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=86
232: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1148
239: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=12734
252: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2908
259: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1134
260: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=65
266: waiting for ‘row cache lock’ [Rcache object=700000550af2690,] seq=1791
268: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1288
274: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=821
278: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1357
287: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=637
289: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
293: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3363
302: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
319: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=98
320: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=366
323: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2618
327: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=167
328: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1137
330: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=31
332: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=53
333: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
334: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=368
337: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=202
341: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=56
342: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=18
343: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
345: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=359
349: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=562
350: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=24
351: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=407
358: waiting for ‘row cache lock’ [Rcache object=70000078b3d6290,] seq=28
359: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
363: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
365: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
371: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
373: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
375: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=89
379: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=605
383: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=523
385: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=595
387: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=243
390: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
391: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=355
393: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=20
395: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
397: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1080
404: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
409: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
410: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=60
411: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1101
413: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=95
414: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=948
416: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=315
417: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=516
419: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=102
420: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=607
421: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=831
426: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=42
433: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=58
437: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=421
442: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=44
445: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=185
447: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=35
450: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=55
456: waiting for ‘row cache lock’ [Rcache object=700000789bc8cc8,] seq=349
459: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=127
465: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=234
474: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=52
475: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=25
478: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=41
490: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=49
610: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=15070

以其中的process 610为例:

PROCESS 610:
—————————————-
SO: 700000784271270, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=610, calls cur/top: 700000387cb3b58/700000387cb35d8, flag: (0) –
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 109 0 4
last post received-location: kslpsr
last process to post me: 700000786279d80 1 6
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: 700000786279d80 1 6
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 7000007852bdfc8
O/S info: user: oracle, term: UNKNOWN, ospid: 39321988
OSD pid info: Unix process pid: 39321988, image: oracle@zsinsraca1
Short stack dump:
ksdxfstk+002c< -ksdxcb+04e4<-sspuser+0074<-000047DC<-sskgpwwait+0034<-skgpwwait+00bc<-ksliwat+06c0<-kslwaitns_timed+0024<-kskthbwt+022c<-kslwait+00f4<-kqrigt+05e0<-kqrpre1+0960<-kqrpre+001c<-ktusmous_online_undoseg+05f4<-ktusmaus_add_us+0300<-ktubnd+04ac<-ktuchg+055c<-ktbchg2+01f0<-kdtchg+0524<-kdtwrp+2540<-kdtInsRow+0a20<-insrow+043c<-insdrv+0428<-inscovexe+02b8<-insExecStmtExecIniEngine+005c<-insexe+02f8<-opiexe+2738<-opipls+185c<-opiodr+0ae0<-rpidrus+01bc<-skgmstack+00c8<-rpidru+0088<-rpiswu2+034c<-rpidrv+095c<-psddr0+02bc<-psdnal+01d0<-pevm_EXECC+01f8<-pfrinstr_EXECC+0070<-pfrrun_no_tool+005c<-pfrrun+1014<-plsql_run+06b4<-pricar+042c<-pricbr+0258<-prient2+03a8<-prient+0b0c<-kkxrpc+0250<-kporpc+01d4<-opiodr+0ae0<-ttcpip+1020<-opitsk+1124<-opiino+0990<-opiodr+0ae0<-opidrv+0484<-sou2o+0090<-opimai_real+01bc<-main+0098<-__start+0070 ---------------------------------------- SO: 70000078f2c84a0, type: 4, owner: 700000784271270, flag: INIT/-/-/0x00 (session) sid: 236 trans: 7000007619fcf08, creator: 700000784271270, flag: (100041) USR/- BSY/-/-/-/-/- DID: 0001-025F-00004ADB, short-term DID: 0001-0262-00000005 txn branch: 700000760262cd0 oct: 2, prv: 0, sql: 70000043a39dda8, psql: 7000006feac35a8, user: 88/ZSCLMINTF service name: zsiims O/S info: user: oracle, term: , ospid: 10027038, machine: zsinsraca2 program: oracle@zsinsraca2 (TNS V1-V3) application name: oracle@zsinsraca2 (TNS V1-V3), hash value=450405937 waiting for 'row cache lock' blocking sess=0x0 seq=15070 wait_time=0 seconds since wait started=1605 cache id=3, mode=0, request=3 ============> cache id为3, 即dc_rollback_segments上的争用
Dumping Session Wait History
for ‘row cache lock’ count=1 wait_time=1388976
cache id=3, mode=0, request=3

此时我们并不能简单的认为,这个问题和rollback/undo有关
因为此时LCK0进程也在等待和kghfrunp(释放shared pool中内存)有关的latch,而在RAC环境里, row cache lock/library cache lock这样的instance lock的获取和释放必须经过LCK0进程,如果LCK0异常,那么就会发生row cache lock的问题.

PROCESS 33:
—————————————-
SO: 70000078627a560, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=33, calls cur/top: 7000007853a1520/7000007853a1520, flag: (6) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 33
last post received-location: ksrpublish
last process to post me: 700000782261488 12 0
last post sent: 0 0 21
last post sent-location: ksbria
last process posted by me: 70000078627a560 1 6
(latch info) wait_event=0 bits=0
Location from where call was made: kqrbip:
waiting for 70000075f1cd020 Child row cache objects level=4 child#=16
Location from where latch is held: kghfrunp: clatch: nowait:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
33 (3, 3, 3)
431 (3, 1444101789, 3)
486 (3, 1444101789, 3)
492 (3, 1444101789, 3)
468 (3, 1444101789, 3)
321 (3, 1444101789, 3)
470 (3, 1444101789, 3)
66 (3, 1444101789, 3)
276 (3, 1444101789, 3)
388 (3, 1444101789, 3)
207 (3, 1444101789, 3)
205 (3, 1444101789, 3)
443 (3, 1444101789, 3)
waiter count=13
gotten 1588533458 times wait, failed first 46002888 sleeps 3053343
gotten 2849791 times nowait, failed: 4267742
possible holder pid = 75 ospid=32440736
on wait list for 70000075f1cd020
Process Group: DEFAULT, pseudo proc: 7000007852bdfc8
O/S info: user: oracle, term: UNKNOWN, ospid: 6881304
OSD pid info: Unix process pid: 6881304, image: oracle@zsinsraca1 (LCK0)
Short stack dump:
ksdxfstk+002c< -ksdxcb+04e4<-sspuser+0074<-000047DC<-sskgpwwait+0034<-skgpwwait+00bc<-kslges+054c<-kslgetl+033c<-kqrbip+03c8<-kqrbfr+01ec<-kqrbtm+00c8<-ksbcti+03c8<-ksbabs+03fc<-kclabs+00d8<-ksbrdp+04b4<-opirip+03fc<-opidrv+0458<-sou2o+0090<-opimai_real+0150<-main+0098<-__start+0070 ---------------------------------------- SO: 70000078f366290, type: 4, owner: 70000078627a560, flag: INIT/-/-/0x00 (session) sid: 1062 trans: 0, creator: 70000078627a560, flag: (51) USR/- BSY/-/-/-/-/- DID: 0000-0021-00000003, short-term DID: 0000-0000-00000000 txn branch: 0 oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS service name: SYS$BACKGROUND waiting for 'latch: row cache objects' blocking sess=0x0 seq=45472 wait_time=0 seconds since wait started=0 address=70000075f1cd020, number=c8, tries=1 Dumping Session Wait History for 'latch: row cache objects' count=1 wait_time=906249 address=70000075f1cd020, number=c8, tries=0 for 'latch: row cache objects' count=1 wait_time=27503 address=70000075f1cd020, number=c8, tries=2 for 'latch: row cache objects' count=1 wait_time=292982 address=70000075f1cd020, number=c8, tries=1 综合考虑,从现在得到的信息来看,很可能发生问题时shared pool的使用出现了问题,但是因为我们没有更具体的dba_hist_active_sess_history数据,并且并没有在发生问题的第一时间收集systemstate dump,我们可能无法更进一步的确认它了. LOG FILE -------------- Filename=awrrpt_1_25919_25920.html ===> 发生问题前1个小时的AWR report
See the following:

Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 25919 06-Oct-15 09:00:26 224 6.6
End Snap: 25920 06-Oct-15 10:00:36 233 6.3
Elapsed: 60.17 (mins)
DB Time: 18.51 (mins)

Cache Sizes

Begin End
Buffer Cache: 21,776M 21,776M Std Block Size: 8K
Shared Pool Size: 8,752M 8,752M Log Buffer: 30,672K

SGA breakdown difference
ordered by Pool, Name
N/A value for Begin MB or End MB indicates the size of that Pool/Name was insignificant, or zero in that snapshot
Pool Name Begin MB End MB % Diff
shared CCursor 302.83 323.98 6.98
shared Cursor Stats 222.80 222.80 0.00
shared KGH: NO ACCESS 6,056.99 6,056.99 0.00 ===========> 将近6个GB是被KGH: NO ACCESS所使用
shared PCursor 252.15 274.83 9.00
shared free memory 5,239.86 4,815.81 -8.09
shared gcs resources 418.68 418.68 0.00
shared gcs shadows 384.49 384.49 0.00
shared kglsim heap 245.99 245.99 0.00
shared kglsim object batch 416.87 416.87 0.00
shared library cache 251.35 258.46 2.83
shared sql area 382.75 764.75 99.80

1. 发生问题前的awr report显示shared pool中大部分的内存都被KGH: NO ACCESS占用
2. 而发生问题后的systemstate dump显示非常多的进程都在等待跟kghfrunp功能有关的latch (LCK0也在等这样的latch, 导致了系统出现严重的row cache enqueue/row cache lock问题)
而这个kghfrunp功能是要从shared pool中释放空间,这说明shared pool当时不够用

8666117.8, patch 8666117 可以优化lck0在shared pool有压力下的工作,可以减缓row cache lock的问题
note 801787.1里 ,有以下的三个方式可以减少 KGH: NO ACCESS的大小:

a). 设置最小的buffer cache及shared pool的值
b). 减少shared pool resize的频率:设置 _memory_broker_stat_interval
c). 打patch 7189722

Aix clone cause the user process limit value changed

aix 5以及aix6系统在克隆以后,导致oracle用户的process数量限制初始化为128.在一切ha搭建或者主备机切换过程会导致监听以及中间件,客户端时而可连时而不可连接的状况。

在实施oracle要求中这个参数应该设置为65536或者大于65536.可以通过在root用户中进去smit chgsys修改,当然也可以用命令 chdev -l sys0 -a maxuproc ‘65536’来修改。

最近几个客户在克隆系统后,都遭遇了此类问题。希望对大伙有帮助,能引起注意。

ORA-12518: TNS: listener could not hand off client connection

ORA-12518: TNS: listener could not hand off client connection

Note:
Starting in 11gR2, the listener log is not activated by default. You have to turn on listener logging: logging_listener_name=on

Oracle docs note this about ORA-12518:

ORA-12518:
TNS:listener could not hand off client connection

Cause:
The process of handing off a client connection to another process failed.

Action:
Turn on listener tracing and re-execute the operation. Verify that the listener and database instance are properly configured for direct handoff. If problem persists, call Oracle Support.

ORA-12518 has been known to occur in Oracle Net Services version 9.2.0.5-11.2.0.1 in which attempting new connections on the database throw either ORA-12518 or TNS-12518.ORA-12518 appears in the listener log upon new user connection.

来自orace.群员的提问摘录。

微信一个朋友问的imp过程遭遇Ora-01114

Oracle process number: 8

Windows thread id: 580, image: ORACLE.EXE


*** SESSION ID:(7.10) 2015-09-10 00:15:03.000
*** 2015-09-10 00:15:03.000
ksedmp: internal or fatal error
ORA-01114: 将块写入文件 201 时出现 IO 错误 (块 # 524044)
ORA-27069: skgfdisp: 尝试在文件范围外执行 I/O
OSD-04026: 传递的参数无效。 (OS 524050)
ORA-01114: 将块写入文件 201 时出现 IO 错误 (块 # 524044)
ORA-27069: skgfdisp: 尝试在文件范围外执行 I/O
OSD-04026: 传递的参数无效。 (OS 524050)
ORA-01114: 将块写入文件 201 时出现 IO 错误 (块 # 524044)
ORA-27069: skgfdisp: 尝试在文件范围外执行 I/O
OSD-04026: 传递的参数无效。 (OS 524050)
ORA-01114: 将块写入文件 201 时出现 IO 错误 (块 # 524037)
ORA-27069: skgfdisp: 尝试在文件范围外执行 I/O
OSD-04026: 传递的参数无效。 (OS 524043)
No current SQL statement being executed.
----- Call Stack Trace -----

创建唯一索引时候报错,文件201号,temp4G大小(db中唯一的4g大小文件),524*8k/1024=4G,不能远程查验,初步猜测临时表空间不能扩展导致,而且此过程报错‘ORA-27069: skgfdisp: 尝试在文件范围外执行 I/O’,说明io超出了文件范围,验证了临时表空间的可能性。
最终通过增加temp的大小解决。