概述

某客户于2021/06/07凌晨01点43分左右，出现应用程序无法连接，前台业务无法办理等情况。在此情况下代维科技工程师在接到故障支持电话后，及时进行响应，通过远程的方式接入数据库进行故障排除，快速解决了问题使得业务恢复正常使用。

问题及相关日志分析

1. 数据库告警日志分析分析

通过对数据库告警日志查看，节点1在01:22:14出现ORA-29970错误，LMD进程无法响应，在01:22:29被LMHB进程将实例终止，实例终止后自动启动。节点2在01:29:31被LMS0进程将实例终止，实例终止后自动启动。

1节点：

Mon Jun 07 01:22:14 2021

LMD0 (ospid: 10613) has not called a wait for sub 0 secs.

LMD1 (ospid: 10617) has not called a wait for sub 0 secs.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmhb_10637.trc (incident=688375) (PDBNAME=CDB$ROOT):

ORA-29770: global enqueue process LMD0 (OSID 10613) is hung for more than 70 seconds

Incident details in: /u01/app/oracle/diag/rdbms/cdb/cdb1/incident/incdir_688375/cdb1_lmhb_10637_i688375.trc

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmhb_10637.trc (incident=688376) (PDBNAME=CDB$ROOT):

ORA-29770: global enqueue process LMD1 (OSID 10617) is hung for more than 70 seconds

Incident details in: /u01/app/oracle/diag/rdbms/cdb/cdb1/incident/incdir_688376/cdb1_lmhb_10637_i688376.trc

LOCK_DBGRP: GCR_SYSTEST debug event locked group GR+DB_CDB by memno 0

ERROR: Some process(s) is not making progress.

LMHB (ospid: 10637) is terminating the instance.

Please check LMHB trace file for more details.

Please also check the CPU load, I/O load and other system properties for anomalous behavior

ERROR: Some process(s) is not making progress.

LMHB (ospid: 10637): terminating the instance due to error 29770

Mon Jun 07 01:22:24 2021

System state dump requested by (instance=1, osid=10637 (LMHB)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_diag_10575_20210607012224.trc

Mon Jun 07 01:22:26 2021

License high water mark = 591

Mon Jun 07 01:22:29 2021

Instance terminated by LMHB, pid = 10637

Mon Jun 07 01:22:29 2021

USER (ospid: 21601): terminating the instance

Mon Jun 07 01:22:29 2021

Instance terminated by USER, pid = 21601

Mon Jun 07 01:22:32 2021

Starting ORACLE instance (normal) (OS id: 21660)

Mon Jun 07 01:22:32 2021

CLI notifier numLatches:37 maxDescs:3986

Mon Jun 07 01:22:32 2021

**********************************************************************

Mon Jun 07 01:22:32 2021

Dump of system resources acquired for SHARED GLOBAL AREA (SGA)

Mon Jun 07 01:22:32 2021

Per process system memlock (soft) limit = UNLIMITED

Mon Jun 07 01:22:32 2021

Expected per process system memlock (soft) limit to lock

SHARED GLOBAL AREA (SGA) into memory: 128G

Mon Jun 07 01:22:32 2021

Available system pagesizes:

4K, 2048K

Mon Jun 07 01:22:32 2021

Supported system pagesize(s):

Mon Jun 07 01:22:32 2021

PAGESIZE AVAILABLE_PAGES EXPECTED_PAGES ALLOCATED_PAGES ERROR(s)

Mon Jun 07 01:22:32 2021

2048K 66823 65538 65538 NONE

Mon Jun 07 01:22:32 2021

Reason for not supporting certain system pagesizes:

Mon Jun 07 01:22:32 2021

4K – Large pagesizes only

Mon Jun 07 01:22:32 2021

**********************************************************************

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Initial number of CPU is 24

2节点：

Mon Jun 07 01:25:05 2021

Set master node info

Mon Jun 07 01:26:05 2021

Auto-tuning: Shutting down background process GTXb

Mon Jun 07 01:27:53 2021

IPC Send timeout detected. Sender: ospid 62152 [oracle@node1 (PING)]

Receiver: inst 1 binc 912546309 ospid 10589

Mon Jun 07 01:29:25 2021

LMD0 (ospid: 62168) received an instance eviction notification from instance 1 [2]

Mon Jun 07 01:29:26 2021

Received an instance abort message from instance 1

Mon Jun 07 01:29:26 2021

Received an instance abort message from instance 1

Mon Jun 07 01:29:26 2021

Please check instance 1 alert and LMON trace files for detail.

Mon Jun 07 01:29:26 2021

Please check instance 1 alert and LMON trace files for detail.

Mon Jun 07 01:29:26 2021

LMS0 (ospid: 62192): terminating the instance due to error 481

Mon Jun 07 01:29:26 2021

System state dump requested by (instance=2, osid=62192 (LMS0)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_diag_62123_20210607012926.trc

Mon Jun 07 01:29:26 2021

ORA-1092 : opitsk aborting process

Mon Jun 07 01:29:27 2021

License high water mark = 1251

Mon Jun 07 01:29:31 2021

Instance terminated by LMS0, pid = 62192

Mon Jun 07 01:29:31 2021

USER (ospid: 85262): terminating the instance

Mon Jun 07 01:29:31 2021

Instance terminated by USER, pid = 85262

Mon Jun 07 01:29:33 2021

Starting ORACLE instance (normal) (OS id: 85397)

Mon Jun 07 01:29:33 2021

CLI notifier numLatches:37 maxDescs:3986

Mon Jun 07 01:29:33 2021

**********************************************************************

Mon Jun 07 01:29:33 2021

Dump of system resources acquired for SHARED GLOBAL AREA (SGA)

Mon Jun 07 01:29:33 2021

Per process system memlock (soft) limit = UNLIMITED

Mon Jun 07 01:29:33 2021

Expected per process system memlock (soft) limit to lock

SHARED GLOBAL AREA (SGA) into memory: 128G

Mon Jun 07 01:29:33 2021

Available system pagesizes:

4K, 2048K

Mon Jun 07 01:29:33 2021

Supported system pagesize(s):

Mon Jun 07 01:29:33 2021

PAGESIZE AVAILABLE_PAGES EXPECTED_PAGES ALLOCATED_PAGES ERROR(s)

Mon Jun 07 01:29:33 2021

2048K 67200 65538 65538 NONE

Mon Jun 07 01:29:33 2021

Reason for not supporting certain system pagesizes:

Mon Jun 07 01:29:33 2021

4K – Large pagesizes only

Mon Jun 07 01:29:33 2021

**********************************************************************

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Initial number of CPU is 24

查看更多日志，发现在关闭之前，出现了较多的ORA-04031错误。

Mon Jun 07 01:18:14 2021

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_14795.trc (incident=691727) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^1069″,”kglseshtTable”)