Skip to content

hang

诊断事件:hang 诊断

2011 0428 /*luda*/
一. trc文件以及日志文件

在$ORACLE_BASE/admin/ORACLE_SID/下面有不同类别的跟踪文件
alert.log文件
system log文件 /var/adm/messages

二. hang situations
.what’s hang?
1.查看CPU的使用情况,作为参考指标,一般hang的cpu的使用率是不高的(至少oracle进程的cpu使用率很低)
2.查看进程是否在等待一些不存在的进程。
hang的诊断办法:

1.system/进程 诊断事件
2.从v$session_wait,v$lock,v$latch,v$latchholder
3.hanganalyze event

(1)
V$SESSION_WAITdisplays the resources or events for which active sessions are waiting.

在诊断数据库hang的时候,v$session_wait可以提供数据库级别有用的信息
从v$session_wait字段可以找到有用的信息

sid                  session id(v$session)
seq#                 sequeue number of the wait for this session
event                the event that the session is waiting for just finished for waiting for
wait_time            the time wait for the session
second_in_wait       the approximate time in seconds at the start of the wait state :
                                                                                  
                                                                                   0 – WAITING (the session is currently waiting)

                                                                                   -2 – WAITED UNKNOWN TIME (duration of last wait is unknown)

                                                                                   -1 – WAITED SHORT TIME (last wait <1/100th of a second)

                                                                                   >0 – WAITED KNOWN TIME (WAIT_TIME = duration of last wait)
示例:

SQL> set linesize 2000
SQL> select sid,seq#,event,wait_time,seconds_in_wait from v$session_wait;

       SID       SEQ# EVENT                                                             WAIT_TIME SECONDS_IN_WAIT
———- ———- —————————————————————- ———- —————
       138          1 jobq slave wait                                                           0              25
       149         68 Streams AQ: waiting for time management or cleanup tasks                  0          575891
       150         35 Streams AQ: qmn slave idle wait                                           0           65739
       151          6 Streams AQ: qmn coordinator idle wait                                     0         1863190
       155      43903 rdbms ipc message                                                         0              57
       156      51563 rdbms ipc message                                                         0             208
       159         31 SQL*Net message to client                                                -1               0
       160         10 rdbms ipc message                                                         0          634929
       161       5670 rdbms ipc message                                                         0            1552
       162      59394 rdbms ipc message                                                         0              26
       163         11 rdbms ipc message                                                         0          230558

       SID       SEQ# EVENT                                                             WAIT_TIME SECONDS_IN_WAIT
———- ———- —————————————————————- ———- —————
       164       5051 smon timer                                                                0            3419
       165      62808 rdbms ipc message                                                         0               0
       166      14978 rdbms ipc message                                                         0              21
       167      23334 rdbms ipc message                                                         0              21
       168         20 rdbms ipc message                                                         0           59201
       169      58551 rdbms ipc message                                                         0              24
       170          8 pmon timer                                                                0         1863202

(2)hanganaylyze event
hang诊断事件一般用在数据库hang住或者死锁的情况下
SQL> alter session set events ‘immediate trace name hanganalyze level 4’;

Session altered.

SQL> @gettrc

TRACE_FILE_NAME
—————————————————
/oracle/admin/znjtepp/udump/znjtepp_ora_18886.trc

马上就可以找到跟踪文件在udump目录下的 18886的文件
  –用hanganalyze 监听oracle – 60错误

    在init文件中加入
    event=“60 trace name hanganalyze level 5”
  –oracle debug的使用
  这里handanalyze有6个级别:
  10   全部的进程信息导出,伴随大量的数据,这个一般不被采用
  5    导出全部和等待事件有关的进程
  4    导出等待事件的分支节点
  3    导出被认为是hang的进程
  2    低限度的导出信息
  1    只导出少部分数据
 
loop的诊断办法

系统的诊断dump是非常有帮助于诊断oracle的loop
操作示范:
(1)系统状态转储
SQL> alter session set events ‘immediate trace name systemstate level 10’;

Session altered.
SQL> @gettrc

TRACE_FILE_NAME
——————————————————————————–
/oracle/admin/znjtepp/udump/znjtepp_ora_28508.trc

(2)进程状态转储
SQL> alter session set events ‘immediate trace name processstate level 10’;

Session altered.

SQL> @gettrc

TRACE_FILE_NAME
——————————————————————————–
/oracle/admin/znjtepp/udump/znjtepp_ora_28508.trc

或者

oradebug  setospid <pid>
oradebug dump systemstate 10
下来就是分析trc文件了,当然为了工作便捷,ass.awk可以快速列出当前的等待事件

BUG:shutdown immediate (hung住)

ora 10g 10201

BUG1:shutdown immediate 时系统hung住,经metalink查询:

Verify that temporary segments are decreasing
———————————————
To verify that the temporary segments are decreasing have an active session
available in Server Manager or SQLPLUS during the SHUTDOWN IMMEDIATE. Issue the following
query to ensure the database is not hanging, but is actually perform extent
cleanup:

SVRMGR/SQL> select count(block#) from fet$;
COUNT(BLOC
———-
7

SVRMGR/SQL> select count(block#) from uet$;
COUNT(BLOC
———-
402

After some time has elapsed, reissue the query and see that the values for fet$
have increased while the values or uet$ have decreased:

SVRMGR/SQL> select count(block#) from fet$;
COUNT(BLOC
———-
10

SVRMGR/SQL> select count(block#) from uet$;
COUNT(BLOC
———-
399

During shutdown the SMON process is cleaning up extents and updating the data
dictionary tables with the marked free extents. As the extents are marked as
freed, they are removed from the table for used extents, UET$ and placed on the
table for free extents, FET$.

How to Avoid creating many Temporary Extents
——————————————–
Once the database has shutdown cleanly, to avoid creating many temporary
extents change the initial and next extent sizes on temporary tablespaces
to a more appropriate size:

ALTER TABLESPACE DEFAULT STORAGE (INITIAL M/K NEXT M/K);

Note: If the temporary tablespace is of type TEMPORARY, then this change
will only affect temporary segments created after issuing the above
command. Any existing temporary segments already in the TEMPORARY tablespace
will not be affected till the instance is restarted. On shutdown, existing
temporary segments are dropped. If the TEMPORARY TABLESPACE is of type
PERMANENT, then cleanup is performed by SMON after completion of the process
using it.

Increasing the initial and next extent size will decrease the number of extents
that are allocated to temporary segments. Since there are fewer extents to
deallocate, the database should shutdown more speedily.

Take the following scenario:

A database was subject to large sorts with the following sort parameter in
the “init.ora” file:

– sort_area_size=1000000

The temporary tablespaces for this database were all created with initial and
next extents sized at 50k and the total database size was about 300mb.

Database sorts will utilize memory as much as possible based on the “init.ora”
parameter “sort_area_size”. Once this memory-based sort area is filled, the
database will utilize the temporary table space associated with the database
user to complete the sort operation. During a shutdown normal, the database
will attempt to clean up the temporary tablespaces.

If a small extent size is used, then a large number of extents will be created
for a large sort. The cleanup of the temporary tablespace takes much longer
with a large number of extents.

Note:
=====
You have to do a shutdown abort and then bring the database
back up to run the suggested queries.

For other reasons for slow/hung shutdown see also these notes:

Note 375935.1 – What To Do and Not To Do When ‘shutdown immediate’ Hangs
Note 428688.1 – Bug:5057695: Shutdown Immediate Very Slow To Close Database.

References:
===========Note:61997.1 SMON – Temporary Segment Cleanup and Free Space Coalescing

Search Words:
=============
hanging
shutdown