Skip to content

oracle 9i的一个小故障

客户在中午11点20分打电话非常紧急的说到,李工,快看看数据库,不知道为什么客户端登录不上去,看了数据库服务都很闲
第一反应就是数据库可能陷入死循环了,骂了一声,最近没少碰到数据库死循环的。
远程直接登录sqlplus~


[oracle@node1 21:56:26|~]sqlplus '/as sysdba'

SQL*Plus: Release 9.2.0.7.0 - Production on Fri May 11 ****** 2012

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 9.2.0.7.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL&

发现数据库可以登录,想想是不是监听网络有问题,逐测试一下服务名

[oracle@node1 22:11:52|/oracle/product/1020/network/admin]tnsping ecgolo

TNS Ping Utility for Linux: Version 9.2.0.7.0 - Production on 11-MAY-2012 ******

Copyright (c) 1997, 2004, Oracle. All rights reserved.

Used parameter files:

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521)) (CONNECT_DATA = (server = dedicated) (SID_NAME=ecgolo)))
OK (240 msec)

tnsping验证oracle的网络设置也没问题,那到底是哪里有问题?

测试一拔
sqlplus test/test@ecgolo

[oracle@node1 21:56:26|~]sqlplus test/test@egcolo

SQL*Plus: Release 9.2.0.7.0 - Production on Fri May 11 ******** 2012

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

发现登录时候挂在这里无法登录进实例,可以确实是oracle引起的问题,尝试关闭数据库,数据库关闭过程卡住,ok既然确定数据库有问题先看告警,结果发现
Fri May 11 11:28:32 2012
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=6
Fri May 11 11:50:56 2012
Shutting down instance: further logons disabled
Shutting down instance (immediate)
License high water mark = 342

从上面可以看到在关闭数据库shudown immediate命令已经执行,但是同事也有相关的告警

>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=6

这个告警不可忽略,如果告警中出现row cache 队列锁那么往往造成的问题都是不可忽视的哦,那么这里的话出现了这个错误
很多人联想到的都是row cache,数据字典池出问题了,没错,我们的知识往往能把我们很快的定位到相关的地方,但是这里
我要说的是,判断一个故障…知识大多只能起到辅助作用,由于此库是此集团全国几十家五星酒店的业务数据库,一旦hang住
所有的入住手续既无法办理,所以在碰到这个故障时候我首先的态度还是以恢复业务为目的,发现这个告警可以再对近期的告警
进行排查,一般此告警都有伴随性的提示,结果真的发现如下:
Fri May 11 06:31:08 2012
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: ALTER DATABASE OPEN
Fri May 11 09:44:36 2012
Thread 1 advanced to log sequence 79384
Current log# 1 seq# 79384 mem# 0: /oadb/oradata/ecology/redo01.log
Fri May 11 09:45:35 2012
Thread 1 advanced to log sequence 79385
Current log# 3 seq# 79385 mem# 0: /oadb/oradata/ecology/redo03.log
Fri May 11 09:46:41 2012
ORACLE Instance ecology - Can not allocate log, archival required
Fri May 11 09:46:41 2012
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 79386
All online logs needed archivinga
Current log# 3 seq# 79385 mem# 0: /oadb/oradata/ecology/redo03.log

我想此时问题有点头绪了,客户重启了一次数据库,因为正在搬迁机房,因为是9i的数据库,在重启的过程中并没有修改log_archive_start为
TRUE,造成数据库在启动后运行一段时间之后就无法运行了,因为在线日志组都已经写满了,而且arch进程并不会自动去归档在线日志组,当切
换到一个轮回后就hang住了,具体内部的切换的涉及到的sql修改的内部表这里就不做描述了,我们直接来验证是否如我所判断的一样:

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_start boolean FALSE
SQL>

果然不出大头所料,修改之,并强制关闭数据库再次启动,尝试切换归档日志,成功。

ARC1: Evaluating archive log 2 thread 1 sequence 79386
ARC1: Beginning to archive log 2 thread 1 sequence 79386
Creating archive destination LOG_ARCHIVE_DEST_1: '/oadbarc/1_79386.dbf'
Fri May 11 12:08:30 2012
Successfully onlined Undo Tablespace 1.
Fri May 11 12:08:30 2012
SMON: enabling tx recovery
Fri May 11 12:08:30 2012
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: alter database open
Fri May 11 12:08:32 2012
ARC0: Completed archiving log 3 thread 1 sequence 79385
Fri May 11 12:08:32 2012
ARC1: Completed archiving log 2 thread 1 sequence 79386
Fri May 11 12:08:32 2012
SMON: Parallel transaction recovery tried
Fri May 11 12:08:40 2012
Thread 1 advanced to log sequence 79388
Current log# 3 seq# 79388 mem# 0: /oadb/oradata/ecology/redo03.log
Fri May 11 12:08:40 2012
ARC0: Evaluating archive log 1 thread 1 sequence 79387
ARC0: Beginning to archive log 1 thread 1 sequence 79387
Creating archive destination LOG_ARCHIVE_DEST_1: '/oadbarc/1_79387.dbf'
ARC0: Completed archiving log 1 thread 1 sequence 79387
Fri May 11 12:08:57 2012
Thread 1 advanced to log sequence 79389
Current log# 2 seq# 79389 mem# 0: /oadb/oradata/ecology/redo02.log
Fri May 11 12:08:57 2012
ARC0: Evaluating archive log 3 thread 1 sequence 79388
ARC0: Beginning to archive log 3 thread 1 sequence 79388
Creating archive destination LOG_ARCHIVE_DEST_1: '/oadbarc/1_79388.dbf'
ARC0: Completed archiving log 3 thread 1 sequence 79388