Skip to content

Oracle Rac hang 案例分析一则(WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK)

LOG FILE
————–
Filename=zsinsraca1_151006_0000.nmon.xlsx
See the following:

+++ CPU_ALL ========> CPU usage is NOT high

CPU Total zsinsraca1 User% Sys% Wait% Idle% CPU% Logical CPUs (SMTmode=4)
10/6/15 10:30 AM 5.2 1.7 0.2 92.9 6.9 48
10/6/15 10:31 AM 4.3 1.6 0.1 94 5.9 48
10/6/15 10:32 AM 7.6 1.5 0.2 90.6 9.1 48
10/6/15 10:33 AM 8.6 1.7 0 89.7 10.3 48
10/6/15 10:34 AM 8.8 1.8 0 89.3 10.6 48
10/6/15 10:35 AM 10.7 2 0.1 87.2 12.7 48
10/6/15 10:36 AM 9.3 2.1 0.1 88.6 11.4 48
10/6/15 10:37 AM 9.4 2.1 0.1 88.4 11.5 48
10/6/15 10:38 AM 10.7 2.5 0.2 86.6 13.2 48
10/6/15 10:39 AM 10.2 2.4 0.1 87.3 12.6 48
10/6/15 10:40 AM 9.6 1.8 0 88.5 11.4 48
10/6/15 10:41 AM 9.6 2.1 0 88.2 11.7 48
10/6/15 10:42 AM 9 2.2 0.1 88.7 11.2 48
10/6/15 10:43 AM 9.2 1.7 0 89 10.9 48
10/6/15 10:44 AM 11 1.8 0 87.2 12.8 48
10/6/15 10:45 AM 10.7 1.9 0.1 87.3 12.6 48
10/6/15 10:46 AM 12.9 2.1 0.1 84.9 15 48
10/6/15 10:47 AM 9.4 1.5 0 89 10.9 48
10/6/15 10:48 AM 11.3 1.6 0.1 87.1 12.9 48
10/6/15 10:49 AM 11.8 2.4 0.1 85.7 14.2 48
10/6/15 10:50 AM 12.2 1.8 0 85.9 14 48
10/6/15 10:51 AM 13.2 1.9 0.1 84.8 15.1 48
10/6/15 10:52 AM 12.4 1.3 0.1 86.3 13.7 48
10/6/15 10:53 AM 12.7 1.8 0 85.5 14.5 48
10/6/15 10:54 AM 11.9 1.9 0.1 86.2 13.8 48
10/6/15 10:55 AM 10.2 2 0.1 87.6 12.2 48
10/6/15 10:56 AM 11.5 2.4 0.2 85.9 13.9 48
10/6/15 10:57 AM 11.4 1.8 0.1 86.6 13.2 48
10/6/15 10:58 AM 11.9 1.7 0 86.3 13.6 48
10/6/15 10:59 AM 10.3 1.6 0 88.1 11.9 48
10/6/15 11:00 AM 12.3 1.5 0.2 86 13.8 48
10/6/15 11:01 AM 12.1 1.7 0.1 86.2 13.8 48
10/6/15 11:02 AM 10.6 1.6 0.1 87.8 12.2 48
10/6/15 11:03 AM 13.8 2.4 0.1 83.7 16.2 48
10/6/15 11:04 AM 12.5 2.6 0.1 84.9 15.1 48
10/6/15 11:05 AM 11.4 1.8 0.1 86.7 13.2 48
10/6/15 11:06 AM 10.1 2.3 0.1 87.5 12.4 48
10/6/15 11:07 AM 9.7 2.1 0.1 88.2 11.8 48
10/6/15 11:08 AM 10.5 2 0 87.5 12.5 48
10/6/15 11:09 AM 11.3 1.1 0.1 87.5 12.4 48
10/6/15 11:10 AM 11.6 1.8 0.1 86.6 13.4 48
10/6/15 11:11 AM 9.5 1.4 0.1 89 10.9 48
10/6/15 11:12 AM 8.4 1.4 0 90.2 9.8 48
10/6/15 11:13 AM 9 1.6 0 89.3 10.6 48
10/6/15 11:14 AM 10.1 1.9 0 87.9 12 48
10/6/15 11:15 AM 11.7 2.2 0.1 86.1 13.9 48
10/6/15 11:16 AM 10.5 1.9 0.1 87.6 12.4 48
10/6/15 11:17 AM 11 1.2 0.1 87.8 12.2 48
10/6/15 11:18 AM 12.6 2.2 0.1 85.2 14.8 48
10/6/15 11:19 AM 9.9 1.6 0.1 88.4 11.5 48
10/6/15 11:20 AM 11.3 2.1 0.1 86.5 13.4 48
10/6/15 11:21 AM 11.4 1.3 0.1 87.3 12.7 48
10/6/15 11:22 AM 11.6 1.5 0.1 86.8 13.1 48
10/6/15 11:23 AM 12.7 3.2 0.1 84 15.9 48
10/6/15 11:24 AM 11.8 3.1 0.1 85.1 14.9 48
10/6/15 11:25 AM 9.1 1.5 0 89.3 10.6 48
10/6/15 11:26 AM 11.1 1.9 0.1 86.9 13 48
10/6/15 11:27 AM 13.3 3.9 0.4 82.4 17.2 48
10/6/15 11:28 AM 10.3 2 0.8 86.9 12.3 48
10/6/15 11:29 AM 10.2 1.2 0 88.6 11.4 48
10/6/15 11:30 AM 11.3 1.7 0 87 13 48
10/6/15 11:31 AM 13.9 2.8 0.1 83.2 16.7 48
10/6/15 11:32 AM 13.2 3.1 0.1 83.6 16.3 48
10/6/15 11:33 AM 10.9 2.2 0.1 86.8 13.1 48
10/6/15 11:34 AM 10.6 2.6 0.1 86.8 13.2 48
10/6/15 11:35 AM 12 2.6 0 85.4 14.6 48
10/6/15 11:36 AM 14.5 5.3 0.1 80.1 19.8 48
10/6/15 11:37 AM 11.4 1.6 0.1 86.9 13 48
10/6/15 11:38 AM 10.8 1.7 0 87.5 12.5 48
10/6/15 11:39 AM 9.7 1.6 0 88.6 11.3 48
10/6/15 11:40 AM 13.2 1.9 0 84.9 15.1 48
10/6/15 11:41 AM 10.6 1.3 0 88.2 11.9 48
10/6/15 11:42 AM 12.1 4 0.2 83.7 16.1 48
10/6/15 11:43 AM 11.9 4.4 0 83.6 16.3 48
10/6/15 11:44 AM 10.1 1.6 0 88.3 11.7 48
10/6/15 11:45 AM 12.4 2.4 0.4 84.8 14.8 48
10/6/15 11:46 AM 15.3 6.5 1.3 77 21.8 48
10/6/15 11:47 AM 13.4 4.4 1 81.2 17.8 48
10/6/15 11:48 AM 16.5 6.8 0.1 76.6 23.3 48
10/6/15 11:49 AM 15.9 5.1 0.1 78.9 21 48
10/6/15 11:50 AM 8.5 4.8 0.2 86.5 13.3 48

+++ MEM =========> free memory 在11:50时仍有3GB free

Memory zsinsraca1 Real Free % Virtual free % Real free(MB) Virtual free(MB) Real total(MB) Virtual total(MB)
10/6/15 10:30 AM 7.6 88 7476.4 28845 98304 32768
10/6/15 10:31 AM 7.6 88 7472.8 28845 98304 32768
10/6/15 10:32 AM 7.6 88 7454.3 28845 98304 32768
10/6/15 10:33 AM 7.5 88 7412.8 28845 98304 32768
10/6/15 10:34 AM 7.5 88 7374.2 28845.5 98304 32768
10/6/15 10:35 AM 7.4 88 7303.3 28845.5 98304 32768
10/6/15 10:36 AM 7.3 88 7147.1 28845.5 98304 32768
10/6/15 10:37 AM 7.3 88 7210.1 28845.5 98304 32768
10/6/15 10:38 AM 7.3 88 7170 28845.5 98304 32768
10/6/15 10:39 AM 7.3 88 7128 28845.5 98304 32768
10/6/15 10:40 AM 7.2 88 7126.4 28845.5 98304 32768
10/6/15 10:41 AM 7.2 88 7104.1 28845.5 98304 32768
10/6/15 10:42 AM 7.2 88 7052 28845.5 98304 32768
10/6/15 10:43 AM 7 88 6888.5 28845.6 98304 32768
10/6/15 10:44 AM 7.1 88 7028.3 28845.6 98304 32768
10/6/15 10:45 AM 7 88 6914.8 28845 98304 32768
10/6/15 10:46 AM 7 88 6917.3 28844.5 98304 32768
10/6/15 10:47 AM 7 88 6886.8 28844.1 98304 32768
10/6/15 10:48 AM 6.9 88 6822.9 28843.3 98304 32768
10/6/15 10:49 AM 6.8 88 6668 28842 98304 32768
10/6/15 10:50 AM 6.7 88 6602.4 28841.1 98304 32768
10/6/15 10:51 AM 6.6 88 6463.4 28840 98304 32768
10/6/15 10:52 AM 6.3 88 6235.7 28838.4 98304 32768
10/6/15 10:53 AM 6.5 88 6397.8 28837.6 98304 32768
10/6/15 10:54 AM 6.5 88 6384.6 28837.4 98304 32768
10/6/15 10:55 AM 6.5 88 6340.9 28836.2 98304 32768
10/6/15 10:56 AM 6.2 88 6123.9 28836.2 98304 32768
10/6/15 10:57 AM 6.3 88 6228.7 28835.8 98304 32768
10/6/15 10:58 AM 6.2 88 6143.9 28834.9 98304 32768
10/6/15 10:59 AM 6.2 88 6111.3 28834.2 98304 32768
10/6/15 11:00 AM 6.1 88 6017.9 28833.1 98304 32768
10/6/15 11:01 AM 5.9 88 5837.6 28832.9 98304 32768
10/6/15 11:02 AM 6 88 5888.8 28832 98304 32768
10/6/15 11:03 AM 5.9 88 5848.5 28831.3 98304 32768
10/6/15 11:04 AM 5.7 88 5606.2 28827.8 98304 32768
10/6/15 11:05 AM 5.7 88 5555.2 28827.1 98304 32768
10/6/15 11:06 AM 5.6 88 5503.2 28826.3 98304 32768
10/6/15 11:07 AM 5.4 88 5271.7 28825.5 98304 32768
10/6/15 11:08 AM 5.5 88 5388.1 28824.7 98304 32768
10/6/15 11:09 AM 5.3 88 5239.6 28824.7 98304 32768
10/6/15 11:10 AM 5.5 88 5391.1 28824.1 98304 32768
10/6/15 11:11 AM 5.4 88 5312.4 28823.9 98304 32768
10/6/15 11:12 AM 5.4 88 5276.5 28823.4 98304 32768
10/6/15 11:13 AM 5.4 88 5325.3 28823.3 98304 32768
10/6/15 11:14 AM 5.3 88 5231.7 28822.8 98304 32768
10/6/15 11:15 AM 5.2 88 5157.4 28821.8 98304 32768
10/6/15 11:16 AM 5.1 88 5054 28820.6 98304 32768
10/6/15 11:17 AM 5.1 88 5017.2 28820.5 98304 32768
10/6/15 11:18 AM 5 88 4928.5 28819.5 98304 32768
10/6/15 11:19 AM 5.1 88 5049.6 28821.2 98304 32768
10/6/15 11:20 AM 5.1 88 5015.3 28820.4 98304 32768
10/6/15 11:21 AM 5 88 4935.5 28819.6 98304 32768
10/6/15 11:22 AM 4.9 87.9 4815.3 28818.4 98304 32768
10/6/15 11:23 AM 4.7 87.9 4594.5 28816 98304 32768
10/6/15 11:24 AM 4.6 87.9 4528.6 28815.4 98304 32768
10/6/15 11:25 AM 4.6 87.9 4550.3 28814.9 98304 32768
10/6/15 11:26 AM 4.6 87.9 4486.5 28814.3 98304 32768
10/6/15 11:27 AM 4.4 87.9 4361.3 28813.4 98304 32768
10/6/15 11:28 AM 4.4 87.9 4287 28812.4 98304 32768
10/6/15 11:29 AM 4.3 87.9 4251.7 28811.8 98304 32768
10/6/15 11:30 AM 4.3 87.9 4187.3 28811 98304 32768
10/6/15 11:31 AM 4.2 87.9 4097.9 28810.4 98304 32768
10/6/15 11:32 AM 4.2 87.9 4101.4 28809.9 98304 32768
10/6/15 11:33 AM 4.1 87.9 4062.2 28809.6 98304 32768
10/6/15 11:34 AM 4 87.9 3960.2 28808.4 98304 32768
10/6/15 11:35 AM 3.9 87.9 3858.5 28806.8 98304 32768
10/6/15 11:36 AM 3.9 87.9 3790.8 28805.9 98304 32768
10/6/15 11:37 AM 3.8 87.9 3774.6 28805.7 98304 32768
10/6/15 11:38 AM 3.6 87.9 3586.9 28805.1 98304 32768
10/6/15 11:39 AM 3.9 87.9 3838.9 28804.7 98304 32768
10/6/15 11:40 AM 3.8 87.9 3763 28804.7 98304 32768
10/6/15 11:41 AM 3.7 87.9 3661.5 28804 98304 32768
10/6/15 11:42 AM 3.8 88.1 3763 28864.2 98304 32768
10/6/15 11:43 AM 3.8 88.1 3753.1 28864.3 98304 32768
10/6/15 11:44 AM 3.7 88.1 3676.4 28864.3 98304 32768
10/6/15 11:45 AM 3.5 88.1 3405.6 28864.3 98304 32768
10/6/15 11:46 AM 3.5 88.1 3484.2 28864.4 98304 32768
10/6/15 11:47 AM 3.5 88.1 3477 28864.5 98304 32768
10/6/15 11:48 AM 3.6 88.1 3570.6 28864.5 98304 32768
10/6/15 11:49 AM 4.7 88.1 4646.2 28868.9 98304 32768
10/6/15 11:50 AM 34.5 98 33959.3 32123.8 98304 32768

LOG FILE
————–
Filename=alert_zsiims1.log
See the following:

Tue Oct 6 02:34:52 2015
Thread 1 advanced to log sequence 37413 (LGWR switch)
Current log# 1 seq# 37413 mem# 0: +V5DATA/zsiims/onlinelog/group_1.9844.799569609
Tue Oct 6 10:50:38 2015
ALTER SYSTEM ARCHIVE LOG
Tue Oct 6 10:50:39 2015
Thread 1 advanced to log sequence 37414 (LGWR switch)
Current log# 3 seq# 37414 mem# 0: +V5DATA/zsiims/onlinelog/group_3.9851.799569423
Tue Oct 6 11:21:49 2015
Errors in file /oracle/admin/zsiims/bdump/zsiims1_j001_10027156.trc:
ORA-12012: error on auto execute of job 54375
ORA-27468: “EXFSYS.RLM$SCHDNEGACTION” is locked by another process
Tue Oct 6 11:23:07 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=142
System State dumped to trace file /oracle/admin/zsiims/udump/zsiims1_ora_20971728.trc
Tue Oct 6 11:25:43 2015
Errors in file /oracle/admin/zsiims/bdump/zsiims1_j002_6815818.trc:
ORA-12012: error on auto execute of job 54374
ORA-27468: “EXFSYS.RLM$EVTCLEANUP” is locked by another process
Tue Oct 6 11:26:57 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=28
System State dumped to trace file /oracle/admin/zsiims/bdump/zsiims1_smon_15335466.trc
Tue Oct 6 11:35:26 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=193
System State dumped to trace file /oracle/admin/zsiims/udump/zsiims1_ora_32309462.trc
Tue Oct 6 11:41:27 2015
Shutting down instance: further logons disabled

LOG FILE
————–
Filename=alert_zsiims2.log
See the following:
Tue Oct 6 10:50:40 2015
Thread 2 advanced to log sequence 25153 (LGWR switch)
Current log# 5 seq# 25153 mem# 0: +V5DATA/zsiims/onlinelog/group_5.9853.799569371
Tue Oct 6 11:45:06 2015
IPC Send timeout detected.Sender: ospid 7798804
Receiver: inst 1 binc 4 ospid 6881304
Tue Oct 6 11:45:08 2015
Trace dumping is performing id=[cdmp_20151006114508]
Tue Oct 6 11:48:58 2015
Shutting down instance (abort)
License high water mark = 450
Instance terminated by USER, pid = 20578736
Tue Oct 6 11:49:50 2015
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0

Filename=zsiims1_ora_20971728A.trc ===> 发生问题后产生的第一个systemstate dump, 在Oct 6 11:23:07AM产生的,此时数据库的状况应该已经很糟糕了
See the following:

50几个session在等待latch: row cache objects

33: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=45472
37: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=18549
68: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=13886
71: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=46595
81: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=48525
106: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11855
118: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11183
122: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8913
132: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=12194
140: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12643
149: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12173
154: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=15022
156: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12143
160: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=20615
173: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11470
174: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11668
177: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11249
183: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=10725
184: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11485
200: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9970
204: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11019
207: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8356
208: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8811
215: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8914
219: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10169
230: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9156
241: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=7911
244: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8785
249: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=17542
255: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9209
264: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9247
276: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8368
288: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8211
310: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=7413
317: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=17869
331: waiting for ‘latch: row cache objects'[Rcache object=700000418a9adf8,] seq=5309
352: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2625
353: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10183
361: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5948
364: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5303
366: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=5300
378: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=760
381: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4788
394: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4758
401: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=109
424: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2924
429: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=3120
432: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2426
454: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1861
457: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1579
463: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1603
468: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1016
470: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1076
486: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=501
576: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=16368
606: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=30065

还有很多session在等待row cache lock

28: waiting for ‘row cache lock’ [Rcache object=70000064a573ef8,] seq=40578
40: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=59768
93: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2938
98: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2777
193: waiting for ‘row cache lock’ [Rcache object=700000621252470,] seq=966
199: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=107
214: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3501
220: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=86
232: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1148
239: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=12734
252: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2908
259: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1134
260: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=65
266: waiting for ‘row cache lock’ [Rcache object=700000550af2690,] seq=1791
268: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1288
274: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=821
278: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1357
287: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=637
289: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
293: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3363
302: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
319: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=98
320: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=366
323: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2618
327: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=167
328: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1137
330: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=31
332: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=53
333: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
334: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=368
337: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=202
341: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=56
342: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=18
343: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
345: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=359
349: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=562
350: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=24
351: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=407
358: waiting for ‘row cache lock’ [Rcache object=70000078b3d6290,] seq=28
359: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
363: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
365: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
371: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
373: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
375: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=89
379: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=605
383: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=523
385: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=595
387: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=243
390: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
391: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=355
393: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=20
395: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
397: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1080
404: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
409: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
410: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=60
411: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1101
413: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=95
414: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=948
416: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=315
417: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=516
419: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=102
420: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=607
421: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=831
426: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=42
433: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=58
437: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=421
442: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=44
445: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=185
447: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=35
450: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=55
456: waiting for ‘row cache lock’ [Rcache object=700000789bc8cc8,] seq=349
459: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=127
465: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=234
474: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=52
475: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=25
478: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=41
490: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=49
610: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=15070

分析发生问题时候的systemstate dump trace zsiims1_ora_20971728A.trc:

50几个session在等待latch: row cache objects, 大部分是和700000768fd8ea8有关的

33: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=45472
37: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=18549
68: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=13886
71: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=46595
81: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=48525
106: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11855
118: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11183
122: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8913
132: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=12194
140: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12643
149: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12173
154: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=15022
156: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12143
160: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=20615
173: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11470
174: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11668
177: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11249
183: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=10725
184: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11485
200: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9970
204: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11019
207: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8356
208: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8811
215: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8914
219: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10169
230: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9156
241: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=7911
244: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8785
249: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=17542
255: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9209
264: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9247
276: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8368
288: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8211
310: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=7413
317: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=17869
331: waiting for ‘latch: row cache objects'[Rcache object=700000418a9adf8,] seq=5309
352: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2625
353: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10183
361: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5948
364: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5303
366: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=5300
378: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=760
381: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4788
394: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4758
401: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=109
424: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2924
429: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=3120
432: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2426
454: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1861
457: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1579
463: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1603
468: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1016
470: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1076
486: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=501
576: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=16368
606: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=30065

以其中一个比较典型的process为例:

PROCESS 576:
—————————————-
SO: 70000078326f638, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=576, calls cur/top: 70000039983fd08/70000039983f788, flag: (0) –
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 109 0 4
last post received-location: kslpsr
last process to post me: 700000786279d80 1 6
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: 700000786279d80 1 6
(latch info) wait_event=0 bits=0
Location from where call was made: kghfrunp: clatch: wait:
waiting for 700000768fd8ea8 Child row cache objects level=4 child#=9
Location from where latch is held: kghfrunp: clatch: wait: ===============> 可以看到这个latch是因为kghfrunp相关的功能发起的,而这个功能是需要从shared pool中释放空间有关
Context saved from call: 0 ===========================>kghfrunp: (KGH: Ask client to free unpinned space)
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
207 (5, 1444101815, 2)
174 (5, 1444101815, 2)
322 (5, 1444101815, 2)

还有很多session在等待row cache lock, 最多的对象70000078e9f0d28

28: waiting for ‘row cache lock’ [Rcache object=70000064a573ef8,] seq=40578
40: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=59768
93: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2938
98: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2777
193: waiting for ‘row cache lock’ [Rcache object=700000621252470,] seq=966
199: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=107
214: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3501
220: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=86
232: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1148
239: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=12734
252: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2908
259: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1134
260: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=65
266: waiting for ‘row cache lock’ [Rcache object=700000550af2690,] seq=1791
268: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1288
274: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=821
278: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1357
287: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=637
289: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
293: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3363
302: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
319: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=98
320: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=366
323: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2618
327: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=167
328: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1137
330: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=31
332: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=53
333: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
334: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=368
337: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=202
341: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=56
342: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=18
343: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
345: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=359
349: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=562
350: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=24
351: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=407
358: waiting for ‘row cache lock’ [Rcache object=70000078b3d6290,] seq=28
359: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
363: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
365: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
371: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
373: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
375: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=89
379: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=605
383: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=523
385: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=595
387: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=243
390: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
391: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=355
393: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=20
395: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
397: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1080
404: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
409: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
410: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=60
411: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1101
413: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=95
414: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=948
416: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=315
417: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=516
419: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=102
420: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=607
421: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=831
426: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=42
433: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=58
437: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=421
442: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=44
445: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=185
447: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=35
450: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=55
456: waiting for ‘row cache lock’ [Rcache object=700000789bc8cc8,] seq=349
459: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=127
465: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=234
474: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=52
475: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=25
478: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=41
490: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=49
610: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=15070

以其中的process 610为例:

PROCESS 610:
—————————————-
SO: 700000784271270, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=610, calls cur/top: 700000387cb3b58/700000387cb35d8, flag: (0) –
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 109 0 4
last post received-location: kslpsr
last process to post me: 700000786279d80 1 6
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: 700000786279d80 1 6
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 7000007852bdfc8
O/S info: user: oracle, term: UNKNOWN, ospid: 39321988
OSD pid info: Unix process pid: 39321988, image: oracle@zsinsraca1
Short stack dump:
ksdxfstk+002c< -ksdxcb+04e4<-sspuser+0074<-000047DC<-sskgpwwait+0034<-skgpwwait+00bc<-ksliwat+06c0<-kslwaitns_timed+0024<-kskthbwt+022c<-kslwait+00f4<-kqrigt+05e0<-kqrpre1+0960<-kqrpre+001c<-ktusmous_online_undoseg+05f4<-ktusmaus_add_us+0300<-ktubnd+04ac<-ktuchg+055c<-ktbchg2+01f0<-kdtchg+0524<-kdtwrp+2540<-kdtInsRow+0a20<-insrow+043c<-insdrv+0428<-inscovexe+02b8<-insExecStmtExecIniEngine+005c<-insexe+02f8<-opiexe+2738<-opipls+185c<-opiodr+0ae0<-rpidrus+01bc<-skgmstack+00c8<-rpidru+0088<-rpiswu2+034c<-rpidrv+095c<-psddr0+02bc<-psdnal+01d0<-pevm_EXECC+01f8<-pfrinstr_EXECC+0070<-pfrrun_no_tool+005c<-pfrrun+1014<-plsql_run+06b4<-pricar+042c<-pricbr+0258<-prient2+03a8<-prient+0b0c<-kkxrpc+0250<-kporpc+01d4<-opiodr+0ae0<-ttcpip+1020<-opitsk+1124<-opiino+0990<-opiodr+0ae0<-opidrv+0484<-sou2o+0090<-opimai_real+01bc<-main+0098<-__start+0070 ---------------------------------------- SO: 70000078f2c84a0, type: 4, owner: 700000784271270, flag: INIT/-/-/0x00 (session) sid: 236 trans: 7000007619fcf08, creator: 700000784271270, flag: (100041) USR/- BSY/-/-/-/-/- DID: 0001-025F-00004ADB, short-term DID: 0001-0262-00000005 txn branch: 700000760262cd0 oct: 2, prv: 0, sql: 70000043a39dda8, psql: 7000006feac35a8, user: 88/ZSCLMINTF service name: zsiims O/S info: user: oracle, term: , ospid: 10027038, machine: zsinsraca2 program: oracle@zsinsraca2 (TNS V1-V3) application name: oracle@zsinsraca2 (TNS V1-V3), hash value=450405937 waiting for 'row cache lock' blocking sess=0x0 seq=15070 wait_time=0 seconds since wait started=1605 cache id=3, mode=0, request=3 ============> cache id为3, 即dc_rollback_segments上的争用
Dumping Session Wait History
for ‘row cache lock’ count=1 wait_time=1388976
cache id=3, mode=0, request=3

此时我们并不能简单的认为,这个问题和rollback/undo有关
因为此时LCK0进程也在等待和kghfrunp(释放shared pool中内存)有关的latch,而在RAC环境里, row cache lock/library cache lock这样的instance lock的获取和释放必须经过LCK0进程,如果LCK0异常,那么就会发生row cache lock的问题.

PROCESS 33:
—————————————-
SO: 70000078627a560, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=33, calls cur/top: 7000007853a1520/7000007853a1520, flag: (6) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 33
last post received-location: ksrpublish
last process to post me: 700000782261488 12 0
last post sent: 0 0 21
last post sent-location: ksbria
last process posted by me: 70000078627a560 1 6
(latch info) wait_event=0 bits=0
Location from where call was made: kqrbip:
waiting for 70000075f1cd020 Child row cache objects level=4 child#=16
Location from where latch is held: kghfrunp: clatch: nowait:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
33 (3, 3, 3)
431 (3, 1444101789, 3)
486 (3, 1444101789, 3)
492 (3, 1444101789, 3)
468 (3, 1444101789, 3)
321 (3, 1444101789, 3)
470 (3, 1444101789, 3)
66 (3, 1444101789, 3)
276 (3, 1444101789, 3)
388 (3, 1444101789, 3)
207 (3, 1444101789, 3)
205 (3, 1444101789, 3)
443 (3, 1444101789, 3)
waiter count=13
gotten 1588533458 times wait, failed first 46002888 sleeps 3053343
gotten 2849791 times nowait, failed: 4267742
possible holder pid = 75 ospid=32440736
on wait list for 70000075f1cd020
Process Group: DEFAULT, pseudo proc: 7000007852bdfc8
O/S info: user: oracle, term: UNKNOWN, ospid: 6881304
OSD pid info: Unix process pid: 6881304, image: oracle@zsinsraca1 (LCK0)
Short stack dump:
ksdxfstk+002c< -ksdxcb+04e4<-sspuser+0074<-000047DC<-sskgpwwait+0034<-skgpwwait+00bc<-kslges+054c<-kslgetl+033c<-kqrbip+03c8<-kqrbfr+01ec<-kqrbtm+00c8<-ksbcti+03c8<-ksbabs+03fc<-kclabs+00d8<-ksbrdp+04b4<-opirip+03fc<-opidrv+0458<-sou2o+0090<-opimai_real+0150<-main+0098<-__start+0070 ---------------------------------------- SO: 70000078f366290, type: 4, owner: 70000078627a560, flag: INIT/-/-/0x00 (session) sid: 1062 trans: 0, creator: 70000078627a560, flag: (51) USR/- BSY/-/-/-/-/- DID: 0000-0021-00000003, short-term DID: 0000-0000-00000000 txn branch: 0 oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS service name: SYS$BACKGROUND waiting for 'latch: row cache objects' blocking sess=0x0 seq=45472 wait_time=0 seconds since wait started=0 address=70000075f1cd020, number=c8, tries=1 Dumping Session Wait History for 'latch: row cache objects' count=1 wait_time=906249 address=70000075f1cd020, number=c8, tries=0 for 'latch: row cache objects' count=1 wait_time=27503 address=70000075f1cd020, number=c8, tries=2 for 'latch: row cache objects' count=1 wait_time=292982 address=70000075f1cd020, number=c8, tries=1 综合考虑,从现在得到的信息来看,很可能发生问题时shared pool的使用出现了问题,但是因为我们没有更具体的dba_hist_active_sess_history数据,并且并没有在发生问题的第一时间收集systemstate dump,我们可能无法更进一步的确认它了. LOG FILE -------------- Filename=awrrpt_1_25919_25920.html ===> 发生问题前1个小时的AWR report
See the following:

Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 25919 06-Oct-15 09:00:26 224 6.6
End Snap: 25920 06-Oct-15 10:00:36 233 6.3
Elapsed: 60.17 (mins)
DB Time: 18.51 (mins)

Cache Sizes

Begin End
Buffer Cache: 21,776M 21,776M Std Block Size: 8K
Shared Pool Size: 8,752M 8,752M Log Buffer: 30,672K

SGA breakdown difference
ordered by Pool, Name
N/A value for Begin MB or End MB indicates the size of that Pool/Name was insignificant, or zero in that snapshot
Pool Name Begin MB End MB % Diff
shared CCursor 302.83 323.98 6.98
shared Cursor Stats 222.80 222.80 0.00
shared KGH: NO ACCESS 6,056.99 6,056.99 0.00 ===========> 将近6个GB是被KGH: NO ACCESS所使用
shared PCursor 252.15 274.83 9.00
shared free memory 5,239.86 4,815.81 -8.09
shared gcs resources 418.68 418.68 0.00
shared gcs shadows 384.49 384.49 0.00
shared kglsim heap 245.99 245.99 0.00
shared kglsim object batch 416.87 416.87 0.00
shared library cache 251.35 258.46 2.83
shared sql area 382.75 764.75 99.80

1. 发生问题前的awr report显示shared pool中大部分的内存都被KGH: NO ACCESS占用
2. 而发生问题后的systemstate dump显示非常多的进程都在等待跟kghfrunp功能有关的latch (LCK0也在等这样的latch, 导致了系统出现严重的row cache enqueue/row cache lock问题)
而这个kghfrunp功能是要从shared pool中释放空间,这说明shared pool当时不够用

8666117.8, patch 8666117 可以优化lck0在shared pool有压力下的工作,可以减缓row cache lock的问题
note 801787.1里 ,有以下的三个方式可以减少 KGH: NO ACCESS的大小:

a). 设置最小的buffer cache及shared pool的值
b). 减少shared pool resize的频率:设置 _memory_broker_stat_interval
c). 打patch 7189722

One thought on “Oracle Rac hang 案例分析一则(WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK)

  1. luda says:

    Common Cause for ORA-4031 in 10gR2, Excess “KGH: NO ACCESS” Memory Allocation [Video] (文档 ID 801787.1) 转到底部转到底部

    In this Document
    Symptoms
    Cause
    Solution
    References
    APPLIES TO:

    Oracle Database – Enterprise Edition – Version 10.2.0.1 to 10.2.0.5 [Release 10.2]
    Information in this document applies to any platform.
    ***Checked for relevance on 12-Mar-2013***
    ***Checked for relevance on 13-Aug-2013***
    SYMPTOMS

    Video – How to determine you are hitting this issue (02:44) Trouble seeing this video?

    After several hours of database operation, ORA-4031 errors are reported.

    An examination of the ORA-4031 trace file will show the Memory Allocation called “KGH: NO ACCESS” is consuming large amount of memory. For example the following shows over 500Mbytes for this allocation.

    Allocation Name Size
    _________________________ __________
    “free memory ” 163766600

    “KGH: NO ACCESS ” 560713888

    Note that it is normal to periodically see “KGH: NO ACCESS” allocations up to about 64M.
    This is memory that is in transition between SGA components when automatic memory management is resizing SGA components. However, it is not normal to see persistent high allocations or a steady build up of this allocation type over time. The exception is when the database needs to make large changes i.e. when changing memory after a heavy load, or starting up using sub-optimal SGA settings, such as when an SPFILE is not being used.

    The following query confirms that Memory Allocation “KGH: NO ACCESS” is large:

    select * from v$sgastat where pool = ‘shared pool’ and (name in (‘free memory’, ‘sql area’, ‘library cache’, ‘miscellaneous’, ‘row cache’, ‘KGH: NO ACCESS’) );

    The following query shows many GROW and SHRINK operations for “DEFAULT buffer cache” and “shared pool”:

    ALTER SESSION SET nls_date_format = ‘DD/MM/YYYY HH:MI:SS’;
    SET PAGESIZE 900
    SET LINESIZE 255
    COL COMPONENT FORMAT A25
    COL INITIAL_SIZE FORMAT A10
    COL FINAL_SIZE FORMAT A10
    SPOOL ASMM_RESIZE.TXT
    select START_TIME, component, oper_type, oper_mode, initial_size/1024/1024 “INITIAL”, FINAL_SIZE/1024/1024 “FINAL”, END_TIME
    from v$sga_resize_ops
    where component in (‘DEFAULT buffer cache’, ‘shared pool’) and status = ‘COMPLETE’
    order by start_time, component;
    SPOOL END
    CAUSE

    Too frequent resizes of Shared Pool and Buffer Cache, leading to excess “KGH: NO ACCESS” memory allocation, that consumes SGA memory.

    Several bugs have been logged for this problem, for different versions within 10gR2;

    Fixed 10.2.0.2
    Unpublished Bug 4507532: SGA_TARGET DOESN’T WORK AS EXPECTED
    Bug 5045507 ASMM – FREQUENT RESIZING OF SHARED POOL & BUFFER CACHE
    Fixed 10.2.0.4
    Unpublished Bug 6528336: APPSST 10G GSI: LARGE NUMBER OF SESSIONS WAITING ON CURSOR: PIN S WAIT ON X
    Fixed 10.2.0.5
    Unpublished Bug 7189722: APPSST GSI 10G: VERY FREQUENT GROW/SHRINK SGA RESIZE OPERATION HAPPENING
    If you are seeing this in version: 11.1.0.6 to 11.2.0.1, please see note:
    Note 1127833.1 ORA-04031 in 11g & 11gR2, Excess “KGH: NO ACCESS” Memory Allocation

    SOLUTION

    In summary;

    1. Disable ASMM

    OR

    2. Set the minimum values for the Shared Pool and the Database Buffer Cache

    OR

    3. Increase the time between resize operations

    OR

    4. Apply the fix, by either upgrading or applying a one-off patch, depending on your version.

    To fix the problem being encountered, where excessive ASMM resize operations leads to the “KGH: NO ACCESS” memory allocation consuming the memory available to the SGA, the above solutions are available, as per MOS notes;
    Note 451960.1 How To Prevent The Growth Of The Component ‘KGH: NO ACCESS’ In The Shared Pool When ASMM Is Enabled
    Note 742599.1 High ‘cursor: pin S wait on X’ and/or ‘library cache lock’ Waits. Cause: Shared Pool/Buffer Cache Resize Activity

    Disabling ASMM will mean memory is no longer swapped between the pools, but it does require the SGA parameters to be set manually.

    Setting minimum values for the Shared Pool and Buffer cache when ASMM is enabled means ASMM is still in operation, but any operations that would try to change the SGA component sizes below the minimum values will not occur, leading to less memory needing to pass through the “KGH: NO ACCESS” memory allocation. This is the only operation that does not need a database shutdown/startup.

    Increase the time between the resize operations will mean that the default of 30 seconds is increased to a much larger time interval.

    Finally, applying the patch or upgrading will fix the code in accordance to Oracle Development recommendations.

    Solution 1: Disable ASMM and revert to setting the SGA Manually

    Determine reasonable values for SGA Parameters DB_CACHE_SIZE, SHARED_POOL_SIZE, LARGE_POOL_SIZE, JAVA_POOL_SIZE and STREAMS_POOL_SIZE (as apprioriate).
    For further assistance, please review MOS note:
    Note 1008866.6 How to determine SGA Size (7.x, 8.0.x, 8i, 9i, 10g)
    Disable ASMM:
    SQL> alter system set SGA_TARGET=0 scope=spfile;

    Set the SGA pool sizes manually, using the values you determined from Step 1 (above):
    For example:
    SQL> alter system set SHARED_POOL_SIZE=1G scope=spfile.

    NOTE: Not all the parameters may need to be set, and will default to 0.
    Shutdown and startup the database, so that ASMM is turned off, and the new manual SGA settings take affect.
    Solution 2: Leave ASMM enabled, but set minimum values for the Shared Pool and Buffer Cache

    During a typical, busy period on the database, run the following query:
    SET PAGESIZE 100
    COL COMPONENT FORMAT A25
    COL FINAL_SIZE FORMAT A15
    select component, AVG(FINAL_SIZE) “AVG FINAL”, MEDIAN(FINAL_SIZE) “MEDIAN FINAL”, MAX(FINAL_SIZE) “MAX FINAL”
    from v$sga_resize_ops
    group by component;

    For the row “DEFAULT buffer cache”, determine which value is greater, “AVG FINAL” or “MEDIAN FINAL”.
    Record this value as the Minimum Buffer Cache n.
    For the row “shared pool”, determine which value is greater, “AVG FINAL” or “MEDIAN FINAL”.
    Record this value as the Minimum Shared Pool m.
    Add together the values for the Minimum Buffer Cache & Minimum Shared Pool, and compare this to your current SGA_TARGET or SGA_MAX_SIZE.
    If the total is greater than SGA_TARGET or SGA_MAX_SIZE, then they will also need to be increased. Determine how big SGA_TARGET or SGA_MAX_SIZE can be, and then implement the increase, for example:
    SQL> ALTER SYSTEM SET SGA_TARGET=nnn SCOPE=BOTH;

    Set parameter DB_CACHE_SIZE to the value of the Minimum Buffer Cache (as per Step 2 above).
    SQL> ALTER SYSTEM SET DB_CACHE_SIZE=n SCOPE=SPFILE;

    Set parameter SHARED_POOL_SIZE to the value of the Minimum Shared Pool (as per Step 3 above).
    SQL> ALTER SYSTEM SET SHARED_POOL_SIZE=m SCOPE=SPFILE;

    Re-start the database.
    Alternately, you can try to perform the memory changes without re-starting the database.
    However, you need to determine what the dynamic settings are first, by running the following query:
    SQL> select component, current_size from v$sga_dynamic_components where component like ‘% pool’ or component = ‘DEFAULT buffer cache’;

    If both the “shared pool” and the “DEFAULT buffer cache” are less than the minimum values determined in Steps 2 & 3 above, then you can try using SCOPE=BOTH, when setting the DB_CACHE_SIZE and SHARED_POOL_SIZE.
    Solution 3: Increase the time between resize operations

    Review the Solutions from Note 742599.1, in particular, the solution involving parameter “_memory_broker_stat_interval”.
    For your database, determine a reasonable period of time for the delay between resize operations, given the default is 30 seconds.
    Set parameter “_memory_broker_stat_interval” to the time period n you determined in Step 2 (above):
    SQL> ALTER SYSTEM SET “_memory_broker_stat_interval”=n SCOPE=SPFILE;

    Shutdown and startup the database, so that the underscore parameter takes affect.
    Solution 4: Apply the fix

    10.2.0.1

    If you are using v10.2.0.1, upgrade to a minimum of v10.2.0.3 (or, see below).
    10.2.0.2

    If you are using v10.2.0.2, no bugs have been logged for this version, so it is recommended that you upgrade (see below).
    10.2.0.3

    Download and review the readme and pre-requisites for Patch 6528336
    As of April 2009, the platforms are; Linux x86, IBM AIX 64-bit, HP-UX Itanium.
    If the patch does not exist for your platform, please log a Service Request, requesting a Backport of Bug 6528336 to your platform.
    10.2.0.4

    Download and review the readme and pre-requisites for Patch 7189722
    As of April 2009, the platforms are; Linux x86, Linux x86-64, IBM AIX 64-bit, HP-UX Itanium, Sun Solaris SPARC 64-bit.
    If the patch does not exist for your platform, please log a Service Request, requesting a Backport of Bug 7189722 to your platform.
    OR Oracle recommends that you upgrade to v10.2.0.5, when it is available for your platform.

    REFERENCES

    BUG:5045507 – ASMM – FREQUENT RESIZING OF SHARED POOL & BUFFER CACHE

    NOTE:1008866.6 – How to determine SGA Size (7.x, 8.x, 9.x, 10g)
    NOTE:1127833.1 – ORA-04031 in 11g & 11gR2, Excess “KGH: NO ACCESS” Memory Allocation
    NOTE:451960.1 – How To Prevent The Growth Of The Component ‘KGH: NO ACCESS’ In The Shared Pool When ASMM Is Enabled
    NOTE:742599.1 – High ‘cursor: pin S wait on X’ and/or ‘library cache lock’ Waits. Cause: Shared Pool/Buffer Cache Resize Activity

Comments are closed.