Skip to content

Instance tuning

Exadata x6 遭遇大量的 TCP socket 等待事件

客户从IBM POWER迁移到EXADATA X6后遭遇大量的TCP SOCKET等待,该等待开销了大约15%的cpu资源, 系统运行未受影响,待过一段时间后,系统cpu降低,该等待减少.

经过分析客户大量使用了utl相关东西,网络延时不稳定. 该等待事件与HTTP和网络的延时有关.

和power上历史允许的性能数据对比发现历史也存在TCP SOCKET在top 5里.

如下图:

历史小机运行信息:

IMG_3634

x6运行信息:

IMG_3632

 

X6网络情况:

IMG_3633

官方给予的解释:数据库层面不用理会,关注http和网络相关情况.

“TCP Socket (KGAS)” Reference Note

This is a reference note for the wait event “TCP Socket (KGAS)” which includes the following subsections:

See Note:61998.1 for an introduction to Wait Events.

Definition:

  • Versions:10.2 – Documentation: None
  • KGAS is a component in the server which handles TCP/IP sockets which is typically used in dedicated connections in 10.2+ by some PLSQL built in packages such as UTL_HTTP and UTL_TCP.
    The KGAS interface is not involved in client / server communication but is a layer which may be used when a session on the server makes some outbound TCP/IP call using a PLSQL package such as UTL_HTTP. Packages such as UTL_FILE have also reported this wait when making an SMTP call.
  • Note that this wait event is new in 10gR2. Earlier versions of Oracle doing the same operations would still wait inside KGAS for socket operations but such waits were not instrumented and so did not show up as waits.

Individual Waits:

 Parameters:	
    P1 = operation
    P2 = Not used
    P3 = Not used

	operation   P1 gives an indication of what 
		    operation is being performed. 
		    Possible values include:
			 1 Event Post
			 2 Call
			 3 Disconnect
			 4 Receive data
			 5 Send data
			 6 Wait for an event (eg wait for data to arrive)
			 7 Sleep
			 8 Special wait (for single object)
			 9 Special wait (for multiple objects)
			10 Select operation
                

 Wait Time:	
    The wait blocks until the current operation completes
    (or times out / errors as appropriate).

Systemwide Waits:

If the TIME spent waiting for this event is significant then it is best to determine which sessions are showing the wait and drill into what those sessions are doing as the wait is usually related to whatever application code is executing .
eg: What part of the application may be using UTL_HTTP or similar and is experiencing waits. This statement can be used to see which sessions may be worth tracing:

  SELECT sid, total_waits, time_waited
    FROM v$session_event
   WHERE event='TCP Socket (KGAS)'
     and total_waits>0
   ORDER BY 3,2
  ;

Reducing Waits / Wait times:

The waits incurred depend on what sockets are being opened to which remote end points and for what reason. To help find the origin of the socket operations try:

  • Check the current SQL / module / action of V$SESSION for sessions that are waiting on the event at the time that they are waiting to try and identify any common area of application code waiting on the event.
  • Get an ERRORSTACK level 3 dump of some sessions waiting on the event. This should help show the exact PLSQL and C call stacks invoking the socket operation if the dump is taken when the session is waiting. Customers may need assistance from Oracle Support in order to get and interpret such a dump but it can help pinpoint the relevant application code.
  • Trace sessions incurring the waits including wait tracing to try and place the waits in the context of the code executing around the waits. eg: Use event 10046 level 8 or DBMS_MONITOR.SESSION_TRACE_ENABLE.
  • Use DBA_DEPENDENCIES to find any application packages which may ultimately be using UTL_HTTP or UTL_TCP underneath for some operation.

Note that there are no real tunables within Oracle for these waits as they involve the session making a call to some remote TCP/IP socket and typically waiting on data from that source. Once you know what is being called, and why, then you can determine if the response times from that remote source are sensible or not and if not why.Example:
Execute the following SQL from a session on a dedicated connection and then check the resulting trace file to see “TCP Socket (KGAS)” waits:

alter session set events '10046 trace name context forever, level 8';
select utl_http.request('http://www.oracle.com/') from dual;

Related:

Tracing User sessions Note:404833.1

关于11g密码大小写验证以及密码延迟验证特性引发的血案所思

在Oracle的11g之前的版本中密码是不区分大小写的(使用双引号强制除外)。在Oracle的11g版本中对此有所增强。从此密码有了大小写的区分,这个大小写敏感特性是通过SEC_CASE_SENSITIVE_LOGON参数来控制的。该参数默认设置为true。

前阵子客户数据从10g迁往11g后,这个参数未修改,新上线第二天全国业务人员输入密码时候参照以前习惯密码全部小写,导致部分业务无法登陆,由于反复尝试登陆导致触发了著名的登陆延迟验证的特性延伸出了library cache lock(Delay after three failed login attempts),导致全线应用响应缓慢假死。当然解决过程当时根据现象,进程达到限制值,做了等待事件模型的分析,发现大量的session都没有成功登陆,很多信息都为null逐判断因为密码延迟验证的原因导致,而密码错误的事情再经过现场验证后发现了大小写敏感的问题。后面就是很熟套的剧情,领导抽了根烟,痛骂了原厂一顿,把此2个特性都关掉后,没有再出现过进程累积的现象。由此问题引发了对密码大小写验证以及密码延迟验证特性的思考.

一:关于sec_case_sensitive_logon的设置

关于sec_case_sensitive_logon参数所关联的密码大小写敏感,我建议在升级数据库系统时候关闭,在新建数据库系统时候关闭。可以配合应用密码策略使用,默认时候可以商讨关闭。

二:密码延迟验证的特性使用与否
关于Delay after three failed login attempts的特性参考yangtingkun的http://blog.itpub.net/4227/viewspace-672925

这个密码延迟特性特性的解决办法有几种,这里我大概描述下:

    1.设置登陆密码验证失败次数超过几次后锁定用户,该办法可以防止用户反复登陆验证,但是也会增加一定的维护工作
    2.通过诊断事件28401关闭密码延迟验证

同时我并不建议用第二种办法,密码延迟验证的特性可以有效遏制恶意破解密码的行为,因此从数据库安全的角度我建议设置密码失败超数锁用户。

关于event 28401

ALTER SYSTEM SET EVENT = '28401 TRACE NAME CONTEXT FOREVER, LEVEL 1' SCOPE = SPFILE

[oracle@ludatou ~]$ oerr ora 28401
28401, 00000, "Event to disable delay after three failed login attempts"
// *Document: NO
// *Cause: N/A
// *Action: Set this event in your environment to disable the login delay
//          which will otherwise take place after three failed login attempts.
// *Note: THIS IS NOT A USER ERROR NUMBER/MESSAGE. THIS DOES NOT NEED TO BE
//        TRANSLATED OR DOCUMENTED.

Oracle Rac hang 案例分析一则(WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK)

LOG FILE
————–
Filename=zsinsraca1_151006_0000.nmon.xlsx
See the following:

+++ CPU_ALL ========> CPU usage is NOT high

CPU Total zsinsraca1 User% Sys% Wait% Idle% CPU% Logical CPUs (SMTmode=4)
10/6/15 10:30 AM 5.2 1.7 0.2 92.9 6.9 48
10/6/15 10:31 AM 4.3 1.6 0.1 94 5.9 48
10/6/15 10:32 AM 7.6 1.5 0.2 90.6 9.1 48
10/6/15 10:33 AM 8.6 1.7 0 89.7 10.3 48
10/6/15 10:34 AM 8.8 1.8 0 89.3 10.6 48
10/6/15 10:35 AM 10.7 2 0.1 87.2 12.7 48
10/6/15 10:36 AM 9.3 2.1 0.1 88.6 11.4 48
10/6/15 10:37 AM 9.4 2.1 0.1 88.4 11.5 48
10/6/15 10:38 AM 10.7 2.5 0.2 86.6 13.2 48
10/6/15 10:39 AM 10.2 2.4 0.1 87.3 12.6 48
10/6/15 10:40 AM 9.6 1.8 0 88.5 11.4 48
10/6/15 10:41 AM 9.6 2.1 0 88.2 11.7 48
10/6/15 10:42 AM 9 2.2 0.1 88.7 11.2 48
10/6/15 10:43 AM 9.2 1.7 0 89 10.9 48
10/6/15 10:44 AM 11 1.8 0 87.2 12.8 48
10/6/15 10:45 AM 10.7 1.9 0.1 87.3 12.6 48
10/6/15 10:46 AM 12.9 2.1 0.1 84.9 15 48
10/6/15 10:47 AM 9.4 1.5 0 89 10.9 48
10/6/15 10:48 AM 11.3 1.6 0.1 87.1 12.9 48
10/6/15 10:49 AM 11.8 2.4 0.1 85.7 14.2 48
10/6/15 10:50 AM 12.2 1.8 0 85.9 14 48
10/6/15 10:51 AM 13.2 1.9 0.1 84.8 15.1 48
10/6/15 10:52 AM 12.4 1.3 0.1 86.3 13.7 48
10/6/15 10:53 AM 12.7 1.8 0 85.5 14.5 48
10/6/15 10:54 AM 11.9 1.9 0.1 86.2 13.8 48
10/6/15 10:55 AM 10.2 2 0.1 87.6 12.2 48
10/6/15 10:56 AM 11.5 2.4 0.2 85.9 13.9 48
10/6/15 10:57 AM 11.4 1.8 0.1 86.6 13.2 48
10/6/15 10:58 AM 11.9 1.7 0 86.3 13.6 48
10/6/15 10:59 AM 10.3 1.6 0 88.1 11.9 48
10/6/15 11:00 AM 12.3 1.5 0.2 86 13.8 48
10/6/15 11:01 AM 12.1 1.7 0.1 86.2 13.8 48
10/6/15 11:02 AM 10.6 1.6 0.1 87.8 12.2 48
10/6/15 11:03 AM 13.8 2.4 0.1 83.7 16.2 48
10/6/15 11:04 AM 12.5 2.6 0.1 84.9 15.1 48
10/6/15 11:05 AM 11.4 1.8 0.1 86.7 13.2 48
10/6/15 11:06 AM 10.1 2.3 0.1 87.5 12.4 48
10/6/15 11:07 AM 9.7 2.1 0.1 88.2 11.8 48
10/6/15 11:08 AM 10.5 2 0 87.5 12.5 48
10/6/15 11:09 AM 11.3 1.1 0.1 87.5 12.4 48
10/6/15 11:10 AM 11.6 1.8 0.1 86.6 13.4 48
10/6/15 11:11 AM 9.5 1.4 0.1 89 10.9 48
10/6/15 11:12 AM 8.4 1.4 0 90.2 9.8 48
10/6/15 11:13 AM 9 1.6 0 89.3 10.6 48
10/6/15 11:14 AM 10.1 1.9 0 87.9 12 48
10/6/15 11:15 AM 11.7 2.2 0.1 86.1 13.9 48
10/6/15 11:16 AM 10.5 1.9 0.1 87.6 12.4 48
10/6/15 11:17 AM 11 1.2 0.1 87.8 12.2 48
10/6/15 11:18 AM 12.6 2.2 0.1 85.2 14.8 48
10/6/15 11:19 AM 9.9 1.6 0.1 88.4 11.5 48
10/6/15 11:20 AM 11.3 2.1 0.1 86.5 13.4 48
10/6/15 11:21 AM 11.4 1.3 0.1 87.3 12.7 48
10/6/15 11:22 AM 11.6 1.5 0.1 86.8 13.1 48
10/6/15 11:23 AM 12.7 3.2 0.1 84 15.9 48
10/6/15 11:24 AM 11.8 3.1 0.1 85.1 14.9 48
10/6/15 11:25 AM 9.1 1.5 0 89.3 10.6 48
10/6/15 11:26 AM 11.1 1.9 0.1 86.9 13 48
10/6/15 11:27 AM 13.3 3.9 0.4 82.4 17.2 48
10/6/15 11:28 AM 10.3 2 0.8 86.9 12.3 48
10/6/15 11:29 AM 10.2 1.2 0 88.6 11.4 48
10/6/15 11:30 AM 11.3 1.7 0 87 13 48
10/6/15 11:31 AM 13.9 2.8 0.1 83.2 16.7 48
10/6/15 11:32 AM 13.2 3.1 0.1 83.6 16.3 48
10/6/15 11:33 AM 10.9 2.2 0.1 86.8 13.1 48
10/6/15 11:34 AM 10.6 2.6 0.1 86.8 13.2 48
10/6/15 11:35 AM 12 2.6 0 85.4 14.6 48
10/6/15 11:36 AM 14.5 5.3 0.1 80.1 19.8 48
10/6/15 11:37 AM 11.4 1.6 0.1 86.9 13 48
10/6/15 11:38 AM 10.8 1.7 0 87.5 12.5 48
10/6/15 11:39 AM 9.7 1.6 0 88.6 11.3 48
10/6/15 11:40 AM 13.2 1.9 0 84.9 15.1 48
10/6/15 11:41 AM 10.6 1.3 0 88.2 11.9 48
10/6/15 11:42 AM 12.1 4 0.2 83.7 16.1 48
10/6/15 11:43 AM 11.9 4.4 0 83.6 16.3 48
10/6/15 11:44 AM 10.1 1.6 0 88.3 11.7 48
10/6/15 11:45 AM 12.4 2.4 0.4 84.8 14.8 48
10/6/15 11:46 AM 15.3 6.5 1.3 77 21.8 48
10/6/15 11:47 AM 13.4 4.4 1 81.2 17.8 48
10/6/15 11:48 AM 16.5 6.8 0.1 76.6 23.3 48
10/6/15 11:49 AM 15.9 5.1 0.1 78.9 21 48
10/6/15 11:50 AM 8.5 4.8 0.2 86.5 13.3 48

+++ MEM =========> free memory 在11:50时仍有3GB free

Memory zsinsraca1 Real Free % Virtual free % Real free(MB) Virtual free(MB) Real total(MB) Virtual total(MB)
10/6/15 10:30 AM 7.6 88 7476.4 28845 98304 32768
10/6/15 10:31 AM 7.6 88 7472.8 28845 98304 32768
10/6/15 10:32 AM 7.6 88 7454.3 28845 98304 32768
10/6/15 10:33 AM 7.5 88 7412.8 28845 98304 32768
10/6/15 10:34 AM 7.5 88 7374.2 28845.5 98304 32768
10/6/15 10:35 AM 7.4 88 7303.3 28845.5 98304 32768
10/6/15 10:36 AM 7.3 88 7147.1 28845.5 98304 32768
10/6/15 10:37 AM 7.3 88 7210.1 28845.5 98304 32768
10/6/15 10:38 AM 7.3 88 7170 28845.5 98304 32768
10/6/15 10:39 AM 7.3 88 7128 28845.5 98304 32768
10/6/15 10:40 AM 7.2 88 7126.4 28845.5 98304 32768
10/6/15 10:41 AM 7.2 88 7104.1 28845.5 98304 32768
10/6/15 10:42 AM 7.2 88 7052 28845.5 98304 32768
10/6/15 10:43 AM 7 88 6888.5 28845.6 98304 32768
10/6/15 10:44 AM 7.1 88 7028.3 28845.6 98304 32768
10/6/15 10:45 AM 7 88 6914.8 28845 98304 32768
10/6/15 10:46 AM 7 88 6917.3 28844.5 98304 32768
10/6/15 10:47 AM 7 88 6886.8 28844.1 98304 32768
10/6/15 10:48 AM 6.9 88 6822.9 28843.3 98304 32768
10/6/15 10:49 AM 6.8 88 6668 28842 98304 32768
10/6/15 10:50 AM 6.7 88 6602.4 28841.1 98304 32768
10/6/15 10:51 AM 6.6 88 6463.4 28840 98304 32768
10/6/15 10:52 AM 6.3 88 6235.7 28838.4 98304 32768
10/6/15 10:53 AM 6.5 88 6397.8 28837.6 98304 32768
10/6/15 10:54 AM 6.5 88 6384.6 28837.4 98304 32768
10/6/15 10:55 AM 6.5 88 6340.9 28836.2 98304 32768
10/6/15 10:56 AM 6.2 88 6123.9 28836.2 98304 32768
10/6/15 10:57 AM 6.3 88 6228.7 28835.8 98304 32768
10/6/15 10:58 AM 6.2 88 6143.9 28834.9 98304 32768
10/6/15 10:59 AM 6.2 88 6111.3 28834.2 98304 32768
10/6/15 11:00 AM 6.1 88 6017.9 28833.1 98304 32768
10/6/15 11:01 AM 5.9 88 5837.6 28832.9 98304 32768
10/6/15 11:02 AM 6 88 5888.8 28832 98304 32768
10/6/15 11:03 AM 5.9 88 5848.5 28831.3 98304 32768
10/6/15 11:04 AM 5.7 88 5606.2 28827.8 98304 32768
10/6/15 11:05 AM 5.7 88 5555.2 28827.1 98304 32768
10/6/15 11:06 AM 5.6 88 5503.2 28826.3 98304 32768
10/6/15 11:07 AM 5.4 88 5271.7 28825.5 98304 32768
10/6/15 11:08 AM 5.5 88 5388.1 28824.7 98304 32768
10/6/15 11:09 AM 5.3 88 5239.6 28824.7 98304 32768
10/6/15 11:10 AM 5.5 88 5391.1 28824.1 98304 32768
10/6/15 11:11 AM 5.4 88 5312.4 28823.9 98304 32768
10/6/15 11:12 AM 5.4 88 5276.5 28823.4 98304 32768
10/6/15 11:13 AM 5.4 88 5325.3 28823.3 98304 32768
10/6/15 11:14 AM 5.3 88 5231.7 28822.8 98304 32768
10/6/15 11:15 AM 5.2 88 5157.4 28821.8 98304 32768
10/6/15 11:16 AM 5.1 88 5054 28820.6 98304 32768
10/6/15 11:17 AM 5.1 88 5017.2 28820.5 98304 32768
10/6/15 11:18 AM 5 88 4928.5 28819.5 98304 32768
10/6/15 11:19 AM 5.1 88 5049.6 28821.2 98304 32768
10/6/15 11:20 AM 5.1 88 5015.3 28820.4 98304 32768
10/6/15 11:21 AM 5 88 4935.5 28819.6 98304 32768
10/6/15 11:22 AM 4.9 87.9 4815.3 28818.4 98304 32768
10/6/15 11:23 AM 4.7 87.9 4594.5 28816 98304 32768
10/6/15 11:24 AM 4.6 87.9 4528.6 28815.4 98304 32768
10/6/15 11:25 AM 4.6 87.9 4550.3 28814.9 98304 32768
10/6/15 11:26 AM 4.6 87.9 4486.5 28814.3 98304 32768
10/6/15 11:27 AM 4.4 87.9 4361.3 28813.4 98304 32768
10/6/15 11:28 AM 4.4 87.9 4287 28812.4 98304 32768
10/6/15 11:29 AM 4.3 87.9 4251.7 28811.8 98304 32768
10/6/15 11:30 AM 4.3 87.9 4187.3 28811 98304 32768
10/6/15 11:31 AM 4.2 87.9 4097.9 28810.4 98304 32768
10/6/15 11:32 AM 4.2 87.9 4101.4 28809.9 98304 32768
10/6/15 11:33 AM 4.1 87.9 4062.2 28809.6 98304 32768
10/6/15 11:34 AM 4 87.9 3960.2 28808.4 98304 32768
10/6/15 11:35 AM 3.9 87.9 3858.5 28806.8 98304 32768
10/6/15 11:36 AM 3.9 87.9 3790.8 28805.9 98304 32768
10/6/15 11:37 AM 3.8 87.9 3774.6 28805.7 98304 32768
10/6/15 11:38 AM 3.6 87.9 3586.9 28805.1 98304 32768
10/6/15 11:39 AM 3.9 87.9 3838.9 28804.7 98304 32768
10/6/15 11:40 AM 3.8 87.9 3763 28804.7 98304 32768
10/6/15 11:41 AM 3.7 87.9 3661.5 28804 98304 32768
10/6/15 11:42 AM 3.8 88.1 3763 28864.2 98304 32768
10/6/15 11:43 AM 3.8 88.1 3753.1 28864.3 98304 32768
10/6/15 11:44 AM 3.7 88.1 3676.4 28864.3 98304 32768
10/6/15 11:45 AM 3.5 88.1 3405.6 28864.3 98304 32768
10/6/15 11:46 AM 3.5 88.1 3484.2 28864.4 98304 32768
10/6/15 11:47 AM 3.5 88.1 3477 28864.5 98304 32768
10/6/15 11:48 AM 3.6 88.1 3570.6 28864.5 98304 32768
10/6/15 11:49 AM 4.7 88.1 4646.2 28868.9 98304 32768
10/6/15 11:50 AM 34.5 98 33959.3 32123.8 98304 32768

LOG FILE
————–
Filename=alert_zsiims1.log
See the following:

Tue Oct 6 02:34:52 2015
Thread 1 advanced to log sequence 37413 (LGWR switch)
Current log# 1 seq# 37413 mem# 0: +V5DATA/zsiims/onlinelog/group_1.9844.799569609
Tue Oct 6 10:50:38 2015
ALTER SYSTEM ARCHIVE LOG
Tue Oct 6 10:50:39 2015
Thread 1 advanced to log sequence 37414 (LGWR switch)
Current log# 3 seq# 37414 mem# 0: +V5DATA/zsiims/onlinelog/group_3.9851.799569423
Tue Oct 6 11:21:49 2015
Errors in file /oracle/admin/zsiims/bdump/zsiims1_j001_10027156.trc:
ORA-12012: error on auto execute of job 54375
ORA-27468: “EXFSYS.RLM$SCHDNEGACTION” is locked by another process
Tue Oct 6 11:23:07 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=142
System State dumped to trace file /oracle/admin/zsiims/udump/zsiims1_ora_20971728.trc
Tue Oct 6 11:25:43 2015
Errors in file /oracle/admin/zsiims/bdump/zsiims1_j002_6815818.trc:
ORA-12012: error on auto execute of job 54374
ORA-27468: “EXFSYS.RLM$EVTCLEANUP” is locked by another process
Tue Oct 6 11:26:57 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=28
System State dumped to trace file /oracle/admin/zsiims/bdump/zsiims1_smon_15335466.trc
Tue Oct 6 11:35:26 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=193
System State dumped to trace file /oracle/admin/zsiims/udump/zsiims1_ora_32309462.trc
Tue Oct 6 11:41:27 2015
Shutting down instance: further logons disabled

LOG FILE
————–
Filename=alert_zsiims2.log
See the following:
Tue Oct 6 10:50:40 2015
Thread 2 advanced to log sequence 25153 (LGWR switch)
Current log# 5 seq# 25153 mem# 0: +V5DATA/zsiims/onlinelog/group_5.9853.799569371
Tue Oct 6 11:45:06 2015
IPC Send timeout detected.Sender: ospid 7798804
Receiver: inst 1 binc 4 ospid 6881304
Tue Oct 6 11:45:08 2015
Trace dumping is performing id=[cdmp_20151006114508]
Tue Oct 6 11:48:58 2015
Shutting down instance (abort)
License high water mark = 450
Instance terminated by USER, pid = 20578736
Tue Oct 6 11:49:50 2015
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0

Filename=zsiims1_ora_20971728A.trc ===> 发生问题后产生的第一个systemstate dump, 在Oct 6 11:23:07AM产生的,此时数据库的状况应该已经很糟糕了
See the following:

50几个session在等待latch: row cache objects

33: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=45472
37: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=18549
68: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=13886
71: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=46595
81: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=48525
106: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11855
118: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11183
122: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8913
132: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=12194
140: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12643
149: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12173
154: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=15022
156: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12143
160: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=20615
173: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11470
174: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11668
177: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11249
183: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=10725
184: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11485
200: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9970
204: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11019
207: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8356
208: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8811
215: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8914
219: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10169
230: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9156
241: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=7911
244: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8785
249: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=17542
255: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9209
264: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9247
276: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8368
288: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8211
310: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=7413
317: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=17869
331: waiting for ‘latch: row cache objects'[Rcache object=700000418a9adf8,] seq=5309
352: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2625
353: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10183
361: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5948
364: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5303
366: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=5300
378: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=760
381: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4788
394: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4758
401: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=109
424: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2924
429: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=3120
432: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2426
454: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1861
457: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1579
463: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1603
468: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1016
470: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1076
486: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=501
576: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=16368
606: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=30065

还有很多session在等待row cache lock

28: waiting for ‘row cache lock’ [Rcache object=70000064a573ef8,] seq=40578
40: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=59768
93: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2938
98: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2777
193: waiting for ‘row cache lock’ [Rcache object=700000621252470,] seq=966
199: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=107
214: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3501
220: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=86
232: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1148
239: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=12734
252: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2908
259: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1134
260: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=65
266: waiting for ‘row cache lock’ [Rcache object=700000550af2690,] seq=1791
268: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1288
274: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=821
278: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1357
287: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=637
289: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
293: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3363
302: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
319: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=98
320: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=366
323: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2618
327: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=167
328: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1137
330: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=31
332: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=53
333: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
334: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=368
337: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=202
341: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=56
342: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=18
343: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
345: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=359
349: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=562
350: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=24
351: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=407
358: waiting for ‘row cache lock’ [Rcache object=70000078b3d6290,] seq=28
359: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
363: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
365: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
371: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
373: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
375: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=89
379: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=605
383: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=523
385: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=595
387: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=243
390: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
391: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=355
393: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=20
395: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
397: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1080
404: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
409: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
410: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=60
411: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1101
413: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=95
414: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=948
416: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=315
417: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=516
419: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=102
420: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=607
421: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=831
426: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=42
433: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=58
437: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=421
442: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=44
445: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=185
447: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=35
450: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=55
456: waiting for ‘row cache lock’ [Rcache object=700000789bc8cc8,] seq=349
459: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=127
465: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=234
474: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=52
475: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=25
478: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=41
490: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=49
610: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=15070

分析发生问题时候的systemstate dump trace zsiims1_ora_20971728A.trc:

50几个session在等待latch: row cache objects, 大部分是和700000768fd8ea8有关的

33: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=45472
37: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=18549
68: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=13886
71: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=46595
81: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=48525
106: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11855
118: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11183
122: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8913
132: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=12194
140: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12643
149: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12173
154: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=15022
156: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12143
160: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=20615
173: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11470
174: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11668
177: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11249
183: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=10725
184: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11485
200: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9970
204: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11019
207: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8356
208: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8811
215: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8914
219: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10169
230: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9156
241: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=7911
244: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8785
249: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=17542
255: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9209
264: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9247
276: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8368
288: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8211
310: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=7413
317: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=17869
331: waiting for ‘latch: row cache objects'[Rcache object=700000418a9adf8,] seq=5309
352: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2625
353: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10183
361: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5948
364: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5303
366: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=5300
378: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=760
381: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4788
394: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4758
401: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=109
424: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2924
429: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=3120
432: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2426
454: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1861
457: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1579
463: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1603
468: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1016
470: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1076
486: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=501
576: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=16368
606: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=30065

以其中一个比较典型的process为例:

PROCESS 576:
—————————————-
SO: 70000078326f638, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=576, calls cur/top: 70000039983fd08/70000039983f788, flag: (0) –
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 109 0 4
last post received-location: kslpsr
last process to post me: 700000786279d80 1 6
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: 700000786279d80 1 6
(latch info) wait_event=0 bits=0
Location from where call was made: kghfrunp: clatch: wait:
waiting for 700000768fd8ea8 Child row cache objects level=4 child#=9
Location from where latch is held: kghfrunp: clatch: wait: ===============> 可以看到这个latch是因为kghfrunp相关的功能发起的,而这个功能是需要从shared pool中释放空间有关
Context saved from call: 0 ===========================>kghfrunp: (KGH: Ask client to free unpinned space)
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
207 (5, 1444101815, 2)
174 (5, 1444101815, 2)
322 (5, 1444101815, 2)

还有很多session在等待row cache lock, 最多的对象70000078e9f0d28

28: waiting for ‘row cache lock’ [Rcache object=70000064a573ef8,] seq=40578
40: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=59768
93: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2938
98: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2777
193: waiting for ‘row cache lock’ [Rcache object=700000621252470,] seq=966
199: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=107
214: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3501
220: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=86
232: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1148
239: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=12734
252: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2908
259: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1134
260: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=65
266: waiting for ‘row cache lock’ [Rcache object=700000550af2690,] seq=1791
268: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1288
274: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=821
278: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1357
287: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=637
289: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
293: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3363
302: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
319: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=98
320: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=366
323: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2618
327: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=167
328: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1137
330: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=31
332: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=53
333: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
334: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=368
337: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=202
341: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=56
342: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=18
343: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
345: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=359
349: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=562
350: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=24
351: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=407
358: waiting for ‘row cache lock’ [Rcache object=70000078b3d6290,] seq=28
359: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
363: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
365: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
371: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
373: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
375: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=89
379: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=605
383: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=523
385: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=595
387: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=243
390: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
391: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=355
393: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=20
395: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
397: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1080
404: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
409: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
410: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=60
411: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1101
413: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=95
414: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=948
416: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=315
417: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=516
419: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=102
420: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=607
421: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=831
426: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=42
433: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=58
437: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=421
442: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=44
445: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=185
447: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=35
450: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=55
456: waiting for ‘row cache lock’ [Rcache object=700000789bc8cc8,] seq=349
459: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=127
465: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=234
474: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=52
475: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=25
478: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=41
490: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=49
610: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=15070

以其中的process 610为例:

PROCESS 610:
—————————————-
SO: 700000784271270, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=610, calls cur/top: 700000387cb3b58/700000387cb35d8, flag: (0) –
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 109 0 4
last post received-location: kslpsr
last process to post me: 700000786279d80 1 6
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: 700000786279d80 1 6
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 7000007852bdfc8
O/S info: user: oracle, term: UNKNOWN, ospid: 39321988
OSD pid info: Unix process pid: 39321988, image: oracle@zsinsraca1
Short stack dump:
ksdxfstk+002c< -ksdxcb+04e4<-sspuser+0074<-000047DC<-sskgpwwait+0034<-skgpwwait+00bc<-ksliwat+06c0<-kslwaitns_timed+0024<-kskthbwt+022c<-kslwait+00f4<-kqrigt+05e0<-kqrpre1+0960<-kqrpre+001c<-ktusmous_online_undoseg+05f4<-ktusmaus_add_us+0300<-ktubnd+04ac<-ktuchg+055c<-ktbchg2+01f0<-kdtchg+0524<-kdtwrp+2540<-kdtInsRow+0a20<-insrow+043c<-insdrv+0428<-inscovexe+02b8<-insExecStmtExecIniEngine+005c<-insexe+02f8<-opiexe+2738<-opipls+185c<-opiodr+0ae0<-rpidrus+01bc<-skgmstack+00c8<-rpidru+0088<-rpiswu2+034c<-rpidrv+095c<-psddr0+02bc<-psdnal+01d0<-pevm_EXECC+01f8<-pfrinstr_EXECC+0070<-pfrrun_no_tool+005c<-pfrrun+1014<-plsql_run+06b4<-pricar+042c<-pricbr+0258<-prient2+03a8<-prient+0b0c<-kkxrpc+0250<-kporpc+01d4<-opiodr+0ae0<-ttcpip+1020<-opitsk+1124<-opiino+0990<-opiodr+0ae0<-opidrv+0484<-sou2o+0090<-opimai_real+01bc<-main+0098<-__start+0070 ---------------------------------------- SO: 70000078f2c84a0, type: 4, owner: 700000784271270, flag: INIT/-/-/0x00 (session) sid: 236 trans: 7000007619fcf08, creator: 700000784271270, flag: (100041) USR/- BSY/-/-/-/-/- DID: 0001-025F-00004ADB, short-term DID: 0001-0262-00000005 txn branch: 700000760262cd0 oct: 2, prv: 0, sql: 70000043a39dda8, psql: 7000006feac35a8, user: 88/ZSCLMINTF service name: zsiims O/S info: user: oracle, term: , ospid: 10027038, machine: zsinsraca2 program: oracle@zsinsraca2 (TNS V1-V3) application name: oracle@zsinsraca2 (TNS V1-V3), hash value=450405937 waiting for 'row cache lock' blocking sess=0x0 seq=15070 wait_time=0 seconds since wait started=1605 cache id=3, mode=0, request=3 ============> cache id为3, 即dc_rollback_segments上的争用
Dumping Session Wait History
for ‘row cache lock’ count=1 wait_time=1388976
cache id=3, mode=0, request=3

此时我们并不能简单的认为,这个问题和rollback/undo有关
因为此时LCK0进程也在等待和kghfrunp(释放shared pool中内存)有关的latch,而在RAC环境里, row cache lock/library cache lock这样的instance lock的获取和释放必须经过LCK0进程,如果LCK0异常,那么就会发生row cache lock的问题.

PROCESS 33:
—————————————-
SO: 70000078627a560, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=33, calls cur/top: 7000007853a1520/7000007853a1520, flag: (6) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 33
last post received-location: ksrpublish
last process to post me: 700000782261488 12 0
last post sent: 0 0 21
last post sent-location: ksbria
last process posted by me: 70000078627a560 1 6
(latch info) wait_event=0 bits=0
Location from where call was made: kqrbip:
waiting for 70000075f1cd020 Child row cache objects level=4 child#=16
Location from where latch is held: kghfrunp: clatch: nowait:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
33 (3, 3, 3)
431 (3, 1444101789, 3)
486 (3, 1444101789, 3)
492 (3, 1444101789, 3)
468 (3, 1444101789, 3)
321 (3, 1444101789, 3)
470 (3, 1444101789, 3)
66 (3, 1444101789, 3)
276 (3, 1444101789, 3)
388 (3, 1444101789, 3)
207 (3, 1444101789, 3)
205 (3, 1444101789, 3)
443 (3, 1444101789, 3)
waiter count=13
gotten 1588533458 times wait, failed first 46002888 sleeps 3053343
gotten 2849791 times nowait, failed: 4267742
possible holder pid = 75 ospid=32440736
on wait list for 70000075f1cd020
Process Group: DEFAULT, pseudo proc: 7000007852bdfc8
O/S info: user: oracle, term: UNKNOWN, ospid: 6881304
OSD pid info: Unix process pid: 6881304, image: oracle@zsinsraca1 (LCK0)
Short stack dump:
ksdxfstk+002c< -ksdxcb+04e4<-sspuser+0074<-000047DC<-sskgpwwait+0034<-skgpwwait+00bc<-kslges+054c<-kslgetl+033c<-kqrbip+03c8<-kqrbfr+01ec<-kqrbtm+00c8<-ksbcti+03c8<-ksbabs+03fc<-kclabs+00d8<-ksbrdp+04b4<-opirip+03fc<-opidrv+0458<-sou2o+0090<-opimai_real+0150<-main+0098<-__start+0070 ---------------------------------------- SO: 70000078f366290, type: 4, owner: 70000078627a560, flag: INIT/-/-/0x00 (session) sid: 1062 trans: 0, creator: 70000078627a560, flag: (51) USR/- BSY/-/-/-/-/- DID: 0000-0021-00000003, short-term DID: 0000-0000-00000000 txn branch: 0 oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS service name: SYS$BACKGROUND waiting for 'latch: row cache objects' blocking sess=0x0 seq=45472 wait_time=0 seconds since wait started=0 address=70000075f1cd020, number=c8, tries=1 Dumping Session Wait History for 'latch: row cache objects' count=1 wait_time=906249 address=70000075f1cd020, number=c8, tries=0 for 'latch: row cache objects' count=1 wait_time=27503 address=70000075f1cd020, number=c8, tries=2 for 'latch: row cache objects' count=1 wait_time=292982 address=70000075f1cd020, number=c8, tries=1 综合考虑,从现在得到的信息来看,很可能发生问题时shared pool的使用出现了问题,但是因为我们没有更具体的dba_hist_active_sess_history数据,并且并没有在发生问题的第一时间收集systemstate dump,我们可能无法更进一步的确认它了. LOG FILE -------------- Filename=awrrpt_1_25919_25920.html ===> 发生问题前1个小时的AWR report
See the following:

Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 25919 06-Oct-15 09:00:26 224 6.6
End Snap: 25920 06-Oct-15 10:00:36 233 6.3
Elapsed: 60.17 (mins)
DB Time: 18.51 (mins)

Cache Sizes

Begin End
Buffer Cache: 21,776M 21,776M Std Block Size: 8K
Shared Pool Size: 8,752M 8,752M Log Buffer: 30,672K

SGA breakdown difference
ordered by Pool, Name
N/A value for Begin MB or End MB indicates the size of that Pool/Name was insignificant, or zero in that snapshot
Pool Name Begin MB End MB % Diff
shared CCursor 302.83 323.98 6.98
shared Cursor Stats 222.80 222.80 0.00
shared KGH: NO ACCESS 6,056.99 6,056.99 0.00 ===========> 将近6个GB是被KGH: NO ACCESS所使用
shared PCursor 252.15 274.83 9.00
shared free memory 5,239.86 4,815.81 -8.09
shared gcs resources 418.68 418.68 0.00
shared gcs shadows 384.49 384.49 0.00
shared kglsim heap 245.99 245.99 0.00
shared kglsim object batch 416.87 416.87 0.00
shared library cache 251.35 258.46 2.83
shared sql area 382.75 764.75 99.80

1. 发生问题前的awr report显示shared pool中大部分的内存都被KGH: NO ACCESS占用
2. 而发生问题后的systemstate dump显示非常多的进程都在等待跟kghfrunp功能有关的latch (LCK0也在等这样的latch, 导致了系统出现严重的row cache enqueue/row cache lock问题)
而这个kghfrunp功能是要从shared pool中释放空间,这说明shared pool当时不够用

8666117.8, patch 8666117 可以优化lck0在shared pool有压力下的工作,可以减缓row cache lock的问题
note 801787.1里 ,有以下的三个方式可以减少 KGH: NO ACCESS的大小:

a). 设置最小的buffer cache及shared pool的值
b). 减少shared pool resize的频率:设置 _memory_broker_stat_interval
c). 打patch 7189722