Diagnostic events: hang diagnosis

2011 0428 /*luda*/
I. Trace (.trc) files and log files

Different kinds of trace files are kept under $ORACLE_BASE/admin/$ORACLE_SID/
the alert.log file
the system log file, /var/adm/messages
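
The directories that actually hold these files can be confirmed from inside the database. A minimal sketch, assuming a pre-11g layout in which the dump destination parameters point at the bdump, udump and cdump directories (on 11g and later the ADR location under diagnostic_dest is used instead):

```sql
-- Where the alert log and background process traces (bdump),
-- user session traces (udump) and core dumps (cdump) are written.
select name, value
  from v$parameter
 where name in ('background_dump_dest', 'user_dump_dest', 'core_dump_dest');
```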

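II. Hang situations
What is a hang?
1. Check CPU usage as a reference indicator. During a hang, CPU usage is generally not high (at least the CPU usage of the Oracle processes is very low).
2. Check whether any process is waiting on a process that no longer exists.
Ways to diagnose a hang:

1. system state / process state diagnostic events
2. the views v$session_wait, v$lock, v$latch and v$latchholder
3. the hanganalyze event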
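
As a starting point for item 2, the lock and latch views can be queried directly. A sketch that uses only standard columns of v$lock and v$latchholder; how long a lock may be held before it counts as a problem is left to the reader:

```sql
-- Sessions holding a lock that another session is waiting on (block = 1)
-- and sessions waiting for a lock (request > 0).
select sid, type, id1, id2, lmode, request, ctime, block
  from v$lock
 where block = 1 or request > 0
 order by id1, id2, request;

-- Latches currently held, and by which session.
select pid, sid, laddr, name
  from v$latchholder;
```

A holder and its waiters share the same type, id1 and id2 values, so the ordering above groups them together.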

(1) v$session_wait
V$SESSION_WAIT displays the resources or events for which active sessions are waiting.

When diagnosing a database hang, v$session_wait provides useful database-level information.
The most useful columns of v$session_wait are:

sid                  session ID (joins to v$session)
seq#                 sequence number of the wait for this session
event                the event the session is waiting for, or has just finished waiting for
wait_time            state of the wait:
                        0   WAITING (the session is currently waiting)
                       -2   WAITED UNKNOWN TIME (duration of last wait is unknown)
                       -1   WAITED SHORT TIME (last wait < 1/100th of a second)
                       >0   WAITED KNOWN TIME (WAIT_TIME = duration of last wait)
seconds_in_wait      the approximate time, in seconds, since the start of the wait
Example:

SQL> set linesize 2000
SQL> select sid,seq#,event,wait_time,seconds_in_wait from v$session_wait;

       SID       SEQ# EVENT                                                             WAIT_TIME SECONDS_IN_WAIT
---------- ---------- ---------------------------------------------------------------- ---------- ---------------
       138          1 jobq slave wait                                                           0              25
       149         68 Streams AQ: waiting for time management or cleanup tasks                  0          575891
       150         35 Streams AQ: qmn slave idle wait                                           0           65739
       151          6 Streams AQ: qmn coordinator idle wait                                     0         1863190
       155      43903 rdbms ipc message                                                         0              57
       156      51563 rdbms ipc message                                                         0             208
       159         31 SQL*Net message to client                                                -1               0
       160         10 rdbms ipc message                                                         0          634929
       161       5670 rdbms ipc message                                                         0            1552
       162      59394 rdbms ipc message                                                         0              26
       163         11 rdbms ipc message                                                         0          230558

       SID       SEQ# EVENT                                                             WAIT_TIME SECONDS_IN_WAIT
---------- ---------- ---------------------------------------------------------------- ---------- ---------------
       164       5051 smon timer                                                                0            3419
       165      62808 rdbms ipc message                                                         0               0
       166      14978 rdbms ipc message                                                         0              21
       167      23334 rdbms ipc message                                                         0              21
       168         20 rdbms ipc message                                                         0           59201
       169      58551 rdbms ipc message                                                         0              24
       170          8 pmon timer                                                                0         1863202
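
Most rows in the output above are idle waits from background processes (rdbms ipc message, pmon timer, smon timer and so on) and say nothing about a hang. A sketch that hides them, assuming Oracle 10g or later, where v$session_wait exposes the wait_class and state columns:

```sql
-- Non-idle waits only, longest current wait first.
select sid, seq#, event, state, wait_time, seconds_in_wait
  from v$session_wait
 where wait_class <> 'Idle'
 order by seconds_in_wait desc;
```

The state column spells out the wait_time codes listed earlier, so the two can be cross-checked in the same row.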

(2) hanganalyze event
The hanganalyze diagnostic event is generally used when the database hangs or sessions are deadlocked.
SQL> alter session set events 'immediate trace name hanganalyze level 4';

Session altered.

SQL> @gettrc

TRACE_FILE_NAME
---------------------------------------------------
/oracle/admin/znjtepp/udump/znjtepp_ora_18886.trc

The trace file, the one for process 18886, can be found right away in the udump directory.
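
The gettrc script itself is not shown in this post. A hypothetical sketch of what such a script might contain, assuming a pre-11g instance where user traces are written to user_dump_dest and named <instance>_ora_<spid>.trc:

```sql
-- gettrc.sql (hypothetical): build the trace file name of the current session
-- from the udump directory, the instance name and the server process's OS pid.
select d.value || '/' || lower(i.instance_name) || '_ora_' || p.spid || '.trc'
         as trace_file_name
  from v$parameter d, v$instance i, v$process p, v$session s
 where d.name = 'user_dump_dest'
   and p.addr = s.paddr
   and s.sid  = (select sid from v$mystat where rownum = 1);
```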
-- Using hanganalyze to trap ORA-60 (deadlock) errors

    Add the following to the init file:
    event="60 trace name hanganalyze level 5"
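
If the instance runs from an spfile rather than a text init file, the same event can be set with ALTER SYSTEM. A sketch; event is a static parameter, so the setting only takes effect after the instance is restarted:

```sql
-- Same event string as in the init file, stored in the spfile.
alter system set event = '60 trace name hanganalyze level 5' scope = spfile;
```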
-- Using Oracle debug
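hanganalyze has six levels:
10   dump all processes; this produces a very large amount of data and is generally not used
5    dump all processes involved in wait chains
4    dump the leaf nodes (blockers) of the wait chains
3    dump the processes thought to be in a hang
2    minimal output
1    only a small amount of data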
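
The same dump can also be requested through oradebug from a SYSDBA session. A sketch, assuming level 3; taking two dumps a short time apart makes it easier to tell a real hang from a system that is merely slow:

```sql
connect / as sysdba
oradebug setmypid
oradebug hanganalyze 3
-- wait a minute or so, then take a second dump for comparison
oradebug hanganalyze 3
-- print the name of the trace file that was written
oradebug tracefile_name
```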

Diagnosing a loop

A system state dump is very helpful when diagnosing an Oracle process that is stuck in a loop.
Example:
(1) System state dump
SQL> alter session set events 'immediate trace name systemstate level 10';

Session altered.
SQL> @gettrc

TRACE_FILE_NAME
--------------------------------------------------------------------------------
/oracle/admin/znjtepp/udump/znjtepp_ora_28508.trc

(2) Process state dump
SQL> alter session set events 'immediate trace name processstate level 10';

Session altered.

SQL> @gettrc

TRACE_FILE_NAME
--------------------------------------------------------------------------------
/oracle/admin/znjtepp/udump/znjtepp_ora_28508.trc

or

oradebug setospid <pid>
oradebug dump systemstate 10
The next step is to analyze the trc file. To speed this up, ass.awk can quickly list the current wait events.
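
Put together, the oradebug variant might look like the sequence below, run as SYSDBA. A sketch: <pid> is the OS process ID of an Oracle server process and is left as a placeholder, as in the text above. oradebug unlimit lifts the trace file size limit, which matters because a level 10 systemstate dump can be large.

```sql
connect / as sysdba
-- attach to the target Oracle process by its OS pid
oradebug setospid <pid>
oradebug unlimit
oradebug dump systemstate 10
-- print the name of the trace file that was written
oradebug tracefile_name
```

For a suspected loop, repeating the dump a few times at short intervals shows whether the process keeps executing in the same place.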
