Skip to content

诊断事件:hang 诊断

2011 0428 /*luda*/
一. trc文件以及日志文件

在$ORACLE_BASE/admin/ORACLE_SID/下面有不同类别的跟踪文件
alert.log文件
system log文件 /var/adm/messages

二. hang situations
.what’s hang?
1.查看CPU的使用情况,作为参考指标,一般hang的cpu的使用率是不高的(至少oracle进程的cpu使用率很低)
2.查看进程是否在等待一些不存在的进程。
hang的诊断办法:

1.system/进程 诊断事件
2.从v$session_wait,v$lock,v$latch,v$latchholder
3.hanganalyze event

(1)
V$SESSION_WAITdisplays the resources or events for which active sessions are waiting.

在诊断数据库hang的时候,v$session_wait可以提供数据库级别有用的信息
从v$session_wait字段可以找到有用的信息

sid                  session id(v$session)
seq#                 sequeue number of the wait for this session
event                the event that the session is waiting for just finished for waiting for
wait_time            the time wait for the session
second_in_wait       the approximate time in seconds at the start of the wait state :
                                                                                  
                                                                                   0 – WAITING (the session is currently waiting)

                                                                                   -2 – WAITED UNKNOWN TIME (duration of last wait is unknown)

                                                                                   -1 – WAITED SHORT TIME (last wait <1/100th of a second)

                                                                                   >0 – WAITED KNOWN TIME (WAIT_TIME = duration of last wait)
示例:

SQL> set linesize 2000
SQL> select sid,seq#,event,wait_time,seconds_in_wait from v$session_wait;

       SID       SEQ# EVENT                                                             WAIT_TIME SECONDS_IN_WAIT
———- ———- —————————————————————- ———- —————
       138          1 jobq slave wait                                                           0              25
       149         68 Streams AQ: waiting for time management or cleanup tasks                  0          575891
       150         35 Streams AQ: qmn slave idle wait                                           0           65739
       151          6 Streams AQ: qmn coordinator idle wait                                     0         1863190
       155      43903 rdbms ipc message                                                         0              57
       156      51563 rdbms ipc message                                                         0             208
       159         31 SQL*Net message to client                                                -1               0
       160         10 rdbms ipc message                                                         0          634929
       161       5670 rdbms ipc message                                                         0            1552
       162      59394 rdbms ipc message                                                         0              26
       163         11 rdbms ipc message                                                         0          230558

       SID       SEQ# EVENT                                                             WAIT_TIME SECONDS_IN_WAIT
———- ———- —————————————————————- ———- —————
       164       5051 smon timer                                                                0            3419
       165      62808 rdbms ipc message                                                         0               0
       166      14978 rdbms ipc message                                                         0              21
       167      23334 rdbms ipc message                                                         0              21
       168         20 rdbms ipc message                                                         0           59201
       169      58551 rdbms ipc message                                                         0              24
       170          8 pmon timer                                                                0         1863202

(2)hanganaylyze event
hang诊断事件一般用在数据库hang住或者死锁的情况下
SQL> alter session set events ‘immediate trace name hanganalyze level 4’;

Session altered.

SQL> @gettrc

TRACE_FILE_NAME
—————————————————
/oracle/admin/znjtepp/udump/znjtepp_ora_18886.trc

马上就可以找到跟踪文件在udump目录下的 18886的文件
  –用hanganalyze 监听oracle – 60错误

    在init文件中加入
    event=“60 trace name hanganalyze level 5”
  –oracle debug的使用
  这里handanalyze有6个级别:
  10   全部的进程信息导出,伴随大量的数据,这个一般不被采用
  5    导出全部和等待事件有关的进程
  4    导出等待事件的分支节点
  3    导出被认为是hang的进程
  2    低限度的导出信息
  1    只导出少部分数据
 
loop的诊断办法

系统的诊断dump是非常有帮助于诊断oracle的loop
操作示范:
(1)系统状态转储
SQL> alter session set events ‘immediate trace name systemstate level 10’;

Session altered.
SQL> @gettrc

TRACE_FILE_NAME
——————————————————————————–
/oracle/admin/znjtepp/udump/znjtepp_ora_28508.trc

(2)进程状态转储
SQL> alter session set events ‘immediate trace name processstate level 10’;

Session altered.

SQL> @gettrc

TRACE_FILE_NAME
——————————————————————————–
/oracle/admin/znjtepp/udump/znjtepp_ora_28508.trc

或者

oradebug  setospid <pid>
oradebug dump systemstate 10
下来就是分析trc文件了,当然为了工作便捷,ass.awk可以快速列出当前的等待事件