Skip to content

概述

医院数据库监控平台显示连接HIS数据库1节点异常,通过远程的方式接入数据库进行故障排除,快速解决了问题使得业务恢复正常使用。

问题及相关日志分析

1. 检查监听

通过2021/05/18的巡检报告中发现监听中存在1.80的IP地址,而此次检查发现监听中无1.80的IP地址。

 

2. 检查集群状态

检查集群状态,发现集群中多个资源状态显示OFFLINE。

[grid@cxhisdb02 ~]$ crsctl stat res -t

——————————————————————————–

NAME           TARGET  STATE        SERVER                   STATE_DETAILS

——————————————————————————–

Local Resources

——————————————————————————–

ora.DATA.dg

ONLINE  OFFLINE      cxhisdb01

ONLINE  ONLINE       cxhisdb02

ora.LISTENER.lsnr

ONLINE  OFFLINE      cxhisdb01

ONLINE  ONLINE       cxhisdb02

ora.OCR.dg

ONLINE  OFFLINE      cxhisdb01

ONLINE  ONLINE       cxhisdb02

ora.SSD.dg

ONLINE  OFFLINE      cxhisdb01

ONLINE  ONLINE       cxhisdb02

ora.asm

ONLINE  OFFLINE      cxhisdb01

ONLINE  ONLINE       cxhisdb02                Started

ora.gsd

OFFLINE OFFLINE      cxhisdb01

OFFLINE OFFLINE      cxhisdb02

ora.net1.network

ONLINE  OFFLINE      cxhisdb01

ONLINE  ONLINE       cxhisdb02

ora.ons

ONLINE  OFFLINE      cxhisdb01

ONLINE  ONLINE       cxhisdb02

ora.registry.acfs

ONLINE  OFFLINE      cxhisdb01

ONLINE  ONLINE       cxhisdb02

——————————————————————————–

Cluster Resources

——————————————————————————–

ora.LISTENER_SCAN1.lsnr

1        ONLINE  ONLINE       cxhisdb02

ora.cvu

1        ONLINE  OFFLINE

ora.cxhisdb01.vip

1        ONLINE  OFFLINE

ora.cxhisdb02.vip

1        ONLINE  ONLINE       cxhisdb02

ora.hospital.db

1        ONLINE  OFFLINE

2        ONLINE  ONLINE       cxhisdb02                Open

ora.oc4j

1        ONLINE  OFFLINE

ora.scan1.vip

1        ONLINE  ONLINE       cxhisdb02

 

 

3. 检查集群日志

仅有CRSD服务异常,其他集群资源均正常,因此数据库未宕机。

2021-05-26 13:23:46.059:

[crsd(145215)]CRS-1006:The OCR location +OCR is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:23:46.068:

[crsd(145215)]CRS-1006:The OCR location +OCR is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:23:56.293:

[/u01/app/11.2.0/grid/bin/oraagent.bin(66885)]CRS-5822:Agent ‘/u01/app/11.2.0/grid/bin/oraagent_grid’ disconnected from server. Details at (:CRSAGF00117:) {0:21:18} in

/u01/app/11.2.0/grid/log/cxhisdb01/agent/crsd/oraagent_grid/oraagent_grid.log.

2021-05-26 13:23:56.294:

[/u01/app/11.2.0/grid/bin/oraagent.bin(31320)]CRS-5822:Agent ‘/u01/app/11.2.0/grid/bin/oraagent_oracle’ disconnected from server. Details at (:CRSAGF00117:) {0:19:5060

3} in /u01/app/11.2.0/grid/log/cxhisdb01/agent/crsd/oraagent_oracle/oraagent_oracle.log.

2021-05-26 13:23:56.461:

[/u01/app/11.2.0/grid/bin/orarootagent.bin(145347)]CRS-5822:Agent ‘/u01/app/11.2.0/grid/bin/orarootagent_root’ disconnected from server. Details at (:CRSAGF00117:) {0:

5:1568} in /u01/app/11.2.0/grid/log/cxhisdb01/agent/crsd/orarootagent_root/orarootagent_root.log.

2021-05-26 13:23:56.485:

[/u01/app/11.2.0/grid/bin/scriptagent.bin(145549)]CRS-5822:Agent ‘/u01/app/11.2.0/grid/bin/scriptagent_grid’ disconnected from server. Details at (:CRSAGF00117:) {0:9:

68} in /u01/app/11.2.0/grid/log/cxhisdb01/agent/crsd/scriptagent_grid/scriptagent_grid.log.

2021-05-26 13:23:56.651:

[ohasd(144192)]CRS-2765:Resource ‘ora.crsd’ has failed on server ‘cxhisdb01’.

2021-05-26 13:23:58.540:

[crsd(5795)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:23:58.548:

[crsd(5795)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage

]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:23:58.964:

[ohasd(144192)]CRS-2765:Resource ‘ora.crsd’ has failed on server ‘cxhisdb01’.

2021-05-26 13:24:00.374:

[crsd(5834)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:00.382:

[crsd(5834)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage

]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:01.010:

[ohasd(144192)]CRS-2765:Resource ‘ora.crsd’ has failed on server ‘cxhisdb01’.

2021-05-26 13:24:02.447:

[crsd(5886)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:02.455:

[crsd(5886)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage

]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:03.068:

[ohasd(144192)]CRS-2765:Resource ‘ora.crsd’ has failed on server ‘cxhisdb01’.

2021-05-26 13:24:04.457:

[crsd(5909)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:04.465:

[crsd(5909)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage

]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:05.102:

[ohasd(144192)]CRS-2765:Resource ‘ora.crsd’ has failed on server ‘cxhisdb01’.

2021-05-26 13:24:06.492:

[crsd(5937)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:06.501:

[crsd(5937)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage

]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:07.132:

[ohasd(144192)]CRS-2765:Resource ‘ora.crsd’ has failed on server ‘cxhisdb01’.

2021-05-26 13:24:08.517:

[crsd(5986)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:08.525:

[crsd(5986)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage

]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:09.162:

[ohasd(144192)]CRS-2765:Resource ‘ora.crsd’ has failed on server ‘cxhisdb01’.

2021-05-26 13:24:10.544:

[crsd(6015)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:10.552:

[crsd(6015)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage

]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:11.193:

[ohasd(144192)]CRS-2765:Resource ‘ora.crsd’ has failed on server ‘cxhisdb01’.

2021-05-26 13:24:12.581:

[crsd(6051)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:12.589:

[crsd(6051)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage

]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:13.223:

[ohasd(144192)]CRS-2765:Resource ‘ora.crsd’ has failed on server ‘cxhisdb01’.

2021-05-26 13:24:14.614:

[crsd(6070)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:14.622:

[crsd(6070)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage

]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:15.253:

[ohasd(144192)]CRS-2765:Resource ‘ora.crsd’ has failed on server ‘cxhisdb01’.

2021-05-26 13:24:16.643:

[crsd(6090)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:16.650:

[crsd(6090)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage

]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:24:17.284:

[ohasd(144192)]CRS-2765:Resource ‘ora.crsd’ has failed on server ‘cxhisdb01’.

2021-05-26 13:24:17.284:

[ohasd(144192)]CRS-2771:Maximum restart attempts reached for resource ‘ora.crsd’; will not restart.

2021-05-26 13:24:17.315:

[ohasd(144192)]CRS-2769:Unable to failover resource ‘ora.crsd’.

 

4. 检查ASM日志

ASM日志中2021/05/26 12:19:57显示OCR仲裁盘有问题,13:23出现读写错误。

Wed May 26 12:19:57 2021

WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.

WARNING: Waited 15 secs for write IO to PST disk 2 in group 3.

WARNING: Waited 15 secs for write IO to PST disk 3 in group 3.

WARNING: Waited 15 secs for write IO to PST disk 4 in group 3.

WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.

WARNING: Waited 15 secs for write IO to PST disk 2 in group 3.

WARNING: Waited 15 secs for write IO to PST disk 3 in group 3.

WARNING: Waited 15 secs for write IO to PST disk 4 in group 3.

Wed May 26 12:19:57 2021

NOTE: process _b000_+asm1 (160488) initiating offline of disk 0.1409468596 (OCR_0000) with mask 0x7e in group 3

NOTE: process _b000_+asm1 (160488) initiating offline of disk 2.1409468594 (OCR_0002) with mask 0x7e in group 3

NOTE: process _b000_+asm1 (160488) initiating offline of disk 3.1409468595 (OCR_0003) with mask 0x7e in group 3

NOTE: process _b000_+asm1 (160488) initiating offline of disk 4.1409468592 (OCR_0004) with mask 0x7e in group 3

NOTE: checking PST: grp = 3

GMON checking disk modes for group 3 at 15 for pid 46, osid 160488

ERROR: no read quorum in group: required 3, found 1 disks

NOTE: checking PST for grp 3 done.

NOTE: initiating PST update: grp = 3, dsk = 0/0x5402c8b4, mask = 0x6a, op = clear

NOTE: initiating PST update: grp = 3, dsk = 2/0x5402c8b2, mask = 0x6a, op = clear

NOTE: initiating PST update: grp = 3, dsk = 3/0x5402c8b3, mask = 0x6a, op = clear

NOTE: initiating PST update: grp = 3, dsk = 4/0x5402c8b0, mask = 0x6a, op = clear

GMON updating disk modes for group 3 at 16 for pid 46, osid 160488

ERROR: no read quorum in group: required 3, found 1 disks

Wed May 26 12:19:57 2021

NOTE: cache dismounting (not clean) group 3/0xA242386E (OCR)

NOTE: messaging CKPT to quiesce pins Unix process pid: 160495, image: oracle@cxhisdb01 (B001)

Wed May 26 12:19:57 2021

NOTE: halting all I/Os to diskgroup 3 (OCR)

Wed May 26 12:19:57 2021

NOTE: LGWR doing non-clean dismount of group 3 (OCR)

NOTE: LGWR sync ABA=15.85 last written ABA 15.85

WARNING: Offline for disk OCR_0000 in mode 0x7f failed.

WARNING: Offline for disk OCR_0002 in mode 0x7f failed.

WARNING: Offline for disk OCR_0003 in mode 0x7f failed.

WARNING: Offline for disk OCR_0004 in mode 0x7f failed.

Wed May 26 12:19:58 2021

kjbdomdet send to inst 2

detach from dom 3, sending detach message to inst 2

Wed May 26 12:19:58 2021

NOTE: No asm libraries found in the system

Wed May 26 12:19:58 2021

List of instances:

1 2

Dirty detach reconfiguration started (new ddet inc 1, cluster inc 4)

Global Resource Directory partially frozen for dirty detach

* dirty detach – domain 3 invalid = TRUE

2 GCS resources traversed, 0 cancelled

Dirty Detach Reconfiguration complete

Wed May 26 12:19:58 2021

WARNING: dirty detached from domain 3

NOTE: cache dismounted group 3/0xA242386E (OCR)

 

 

2021-05-26 13:23:46.059:

[crsd(145215)]CRS-1006:The OCR location +OCR is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

2021-05-26 13:23:46.068:

[crsd(145215)]CRS-1006:The OCR location +OCR is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.

 

 

问题处理步骤

由于数据库仅1节点OCR磁盘组访问异常,2节点以及其他盘访问正常,因而导致1节点CRS资源异常,且数据库未宕机,处理步骤需启动CRS资源,并且重启监听。

1. 1节点挂载OCR磁盘组

[root@cxhisdb01 ~]# su – grid

[grid@cxhisdb01 ~]$ sqlplus / as sysasm

 

SQL*Plus: Release 11.2.0.4.0 Production on Thu May 27 12:12:17 2021

 

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

 

 

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 – 64bit Production

With the Real Application Clusters and Automatic Storage Management options

 

SQL> alter diskgroup ocr mount;

 

Diskgroup altered.

 

SQL> exit

2. 启动CRS

[grid@cxhisdb01 ~]$ crsctl start res ora.crsd -init

CRS-2672: Attempting to start ‘ora.crsd’ on ‘cxhisdb01’

CRS-2676: Start of ‘ora.crsd’ on ‘cxhisdb01’ succeeded

3. 检查集群资源

[grid@cxhisdb01 ~]$ crsctl stat res -t

——————————————————————————–

NAME           TARGET  STATE        SERVER                   STATE_DETAILS

——————————————————————————–

Local Resources

——————————————————————————–

ora.DATA.dg

ONLINE  ONLINE       cxhisdb01

ONLINE  ONLINE       cxhisdb02

ora.LISTENER.lsnr

ONLINE  ONLINE       cxhisdb01

ONLINE  ONLINE       cxhisdb02

ora.OCR.dg

ONLINE  ONLINE       cxhisdb01

ONLINE  ONLINE       cxhisdb02

ora.SSD.dg

ONLINE  ONLINE       cxhisdb01

ONLINE  ONLINE       cxhisdb02

ora.asm

ONLINE  ONLINE       cxhisdb01                Started

ONLINE  ONLINE       cxhisdb02                Started

ora.gsd

OFFLINE OFFLINE      cxhisdb01

OFFLINE OFFLINE      cxhisdb02

ora.net1.network

ONLINE  ONLINE       cxhisdb01

ONLINE  ONLINE       cxhisdb02

ora.ons

ONLINE  ONLINE       cxhisdb01

ONLINE  ONLINE       cxhisdb02

ora.registry.acfs

ONLINE  ONLINE       cxhisdb01

ONLINE  ONLINE       cxhisdb02

——————————————————————————–

Cluster Resources

——————————————————————————–

ora.LISTENER_SCAN1.lsnr

1        ONLINE  ONLINE       cxhisdb02

ora.cvu

1        ONLINE  ONLINE       cxhisdb01

ora.cxhisdb01.vip

1        ONLINE  ONLINE       cxhisdb01

ora.cxhisdb02.vip

1        ONLINE  ONLINE       cxhisdb02

ora.hospital.db

1        ONLINE  ONLINE       cxhisdb01                Open

2        ONLINE  ONLINE       cxhisdb02                Open

ora.oc4j

1        ONLINE  ONLINE       cxhisdb01

ora.scan1.vip

1        ONLINE  ONLINE       cxhisdb02

4. 重启监听

[grid@cxhisdb01 ~]$ srvctl stop listener -n cxhisdb01

[grid@cxhisdb01 ~]$ srvctl start listener -n cxhisdb01

[grid@cxhisdb01 ~]$ lsnrctl status

 

LSNRCTL for Linux: Version 11.2.0.4.0 – Production on 27-MAY-2021 12:20:44

 

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

 

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))

STATUS of the LISTENER

————————

Alias                     LISTENER

Version                   TNSLSNR for Linux: Version 11.2.0.4.0 – Production

Start Date                27-MAY-2021 12:20:40

Uptime                    0 days 0 hr. 0 min. 4 sec

Trace Level               off

Security                  ON: Local OS Authentication

SNMP                      OFF

Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora

Listener Log File         /u01/app/grid/diag/tnslsnr/cxhisdb01/listener/alert/log.xml

Listening Endpoints Summary…

(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))

(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.80)(PORT=1521)))

(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.10)(PORT=1521)))

Services Summary…

Service “hospital” has 1 instance(s).

Instance “hospital1”, status READY, has 1 handler(s) for this service…

Service “hospitalXDB” has 1 instance(s).

Instance “hospital1”, status READY, has 1 handler(s) for this service…

The command completed successfully

[grid@cxhisdb01 ~]$ cd /u01/app/grid/diag/tnslsnr/cxhisdb01/listener/trace/

[grid@cxhisdb01 trace]$ tail -f listener.log

27-MAY-2021 12:20:51 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))(SERVICE_NAME=hospital)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.171)(PORT=62045)) * establish * hospital * 0

27-MAY-2021 12:20:51 * (CONNECT_DATA=(SERVICE_NAME=hospital)(CID=(PROGRAM=e:\zjhis\电子病历PB9\emrproject.exe)(HOST=ZY-603300-02-YS)(USER=his))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.9.58)(PORT=3676)) * establish * hospital * 0

27-MAY-2021 12:20:51 * (CONNECT_DATA=(SERVICE_NAME=hospital)(CID=(PROGRAM=e:\zjhis\电子病历PB9\emrproject.exe)(HOST=ZY-603300-02-YS)(USER=his))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.9.58)(PORT=3677)) * establish * hospital * 0

27-MAY-2021 12:20:55 * (CONNECT_DATA=(SERVICE_NAME=HOSPITAL)(CID=(PROGRAM=oracle)(HOST=lis-server)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.11)(PORT=23431)) * establish * HOSPITAL * 0

27-MAY-2021 12:20:55 * (CONNECT_DATA=(SERVICE_NAME=hospital)(CID=(PROGRAM=e:\zjhis\电子病历PB9\emrproject.exe)(HOST=JZ-EK-001)(USER=his))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.3.99)(PORT=4442)) * establish * hospital * 0

27-MAY-2021 12:20:55 * (CONNECT_DATA=(SERVICE_NAME=hospital)(CID=(PROGRAM=e:\zjhis\电子病历PB9\emrproject.exe)(HOST=JZ-EK-001)(USER=his))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.3.99)(PORT=4444)) * establish * hospital * 0

27-MAY-2021 12:20:56 * (CONNECT_DATA=(SID=hospital1)(CID=(PROGRAM=配置数据库.exe)(HOST=YYJQZZJFW)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.244)(PORT=58397)) * establish * hospital1 * 0

27-MAY-2021 12:20:56 * (CONNECT_DATA=(SERVICE_NAME=HOSPITAL)(CID=(PROGRAM=oracle)(HOST=lis-server)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.11)(PORT=23432)) * establish * HOSPITAL * 0

27-MAY-2021 12:20:56 * (CONNECT_DATA=(SERVICE_NAME=hospital)(CID=(PROGRAM=e:\zjhis\电子病历PB9\emrproject.exe)(HOST=ZY-603300-02-YS)(USER=his))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.9.58)(PORT=3681)) * establish * hospital * 0

27-MAY-2021 12:20:57 * service_update * hospital1 * 0

27-MAY-2021 12:20:58 * (CONNECT_DATA=(SERVICE_NAME=HOSPITAL)(CID=(PROGRAM=oracle)(HOST=lis-server)(USER=his))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.11)(PORT=23438)) * establish * HOSPITAL * 0

27-MAY-2021 12:20:59 * (CONNECT_DATA=(SERVICE_NAME=hospital)(CID=(PROGRAM=e:\zjhis\电子病历PB9\emrproject.exe)(HOST=JZ-EK-001)(USER=his))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.3.99)(PORT=4460)) * establish * hospital * 0

27-MAY-2021 12:21:00 * service_update * hospital1 * 0

Thu May 27 12:21:01 2021

27-MAY-2021 12:21:01 * (CONNECT_DATA=(SID=hospital1)(CID=(PROGRAM=配置数据库.exe)(HOST=YYJQZZJFW)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.244)(PORT=58398)) * establish * hospital1 * 0

27-MAY-2021 12:21:03 * service_update * hospital1 * 0

27-MAY-2021 12:21:03 * (CONNECT_DATA=(SERVICE_NAME=HOSPITAL)(CID=(PROGRAM=oracle)(HOST=lis-server)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.11)(PORT=23444)) * establish * HOSPITAL * 0

 

总结与后续处理建议

1. 问题总结

HIS数据库1节点访问OCR仲裁盘异常导致CRSD异常,进而引起1节点上多个集群资源offline,监听资源中相较于以前也少了192.2.1.80的IP,因此监控平台显示连接HIS数据库1节点异常,由于2节点和其他磁盘组无异常,因此数据库未发生宕机现象。

2. 处理操作

  • HIS数据库1节点手工mount OCR磁盘组
  • 启动CRS资源
  • 1节点重启监听资源

3. 后续建议

此次故障发生是由于集群资源访问OCR磁盘组异常导致,建议联系存储工程师,排查相关时间点存储运行情况,并定期做好存储巡检和状态监控工作。

Oracle voting disk 故障处理一例

概述

某医院客户于2021年11月5日上午9时左右出现HIS数据库连接异常的情况,在此情况下通过远程的方式接入数据库进行故障排除,快速解决了问题使得业务恢复正常使用。

问题及相关日志分析

1. 问题现状

前台业务反映HIS数据库时而连得上时而连不上,连不上时候报错ORA-12516,报错内容截图如下:

 

由于业务通过节点1的vip(192.2.1.10)连接数据库,测试节点1的vip连接,结果如下:

[oracle@cxhisdb01 ~]$ sqlplus a/b@192.2.1.10:1521/hospital

 

SQL*Plus: Release 11.2.0.4.0 Production on Fri Nov 5 09:45:15 2021

 

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

 

ERROR:

ORA-12516: TNS:listener could not find available handler with matching protocol

stack

 

2. 检查数据库alert日志

ORA-00020: maximum number of processes (4500) exceeded

ORA-20 errors will not be written to the alert log for

the next minute. Please look at trace files to see all

the ORA-20 errors.

Fri Nov 05 09:35:48 2021

Thread 1 advanced to log sequence 191580 (LGWR switch)

Current log# 1 seq# 191580 mem# 0: +SSD/hospital/onlinelog/redo01.log

Fri Nov 05 09:35:48 2021

LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_4

Fri Nov 05 09:35:48 2021

LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_3

Fri Nov 05 09:35:48 2021

LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_2

Fri Nov 05 09:35:48 2021

Archived Log entry 904316 added for thread 1 sequence 191579 ID 0x431aef8a dest 1:

Fri Nov 05 09:37:30 2021

ORA-00020: maximum number of processes (4500) exceeded

ORA-20 errors will not be written to the alert log for

the next minute. Please look at trace files to see all

the ORA-20 errors.

alert中出现大量的ORA-00020错误告警,数据库连接数达到4500上限。

3. 检查数据库参数和连接

SQL> show parameter process;

 

NAME                                 TYPE        VALUE

———————————— ———– ——————————

aq_tm_processes                      integer     1

cell_offload_processing              boolean     TRUE

db_writer_processes                  integer     10

gcs_server_processes                 integer     4

global_txn_processes                 integer     1

job_queue_processes                  integer     1000

log_archive_max_processes            integer     4

processes                            integer     4500

processor_group_name                 string

 

SQL> select inst_id,count(*) from gv$session group by inst_id;

 

INST_ID   COUNT(*)

———- ———-

1       4496

2        715

查询得知,1节点连接数达到4496,已经达到连接数4500的上限,因此前端业务出现连接不上的情况。

4. 定位异常连接

SQL> select program,machine,count(*)

2  from gv$session group by program,machine,sql_id order by 3;

 

 

PROGRAM                                          MACHINE       COUNT(*)

——————————– ————————–  ———-

w3wp.exe                         WORKGROUP\SXZZ_SERVER_CX            10

Medinfo.Quartz.DbJob.exe         WORKGROUP\QUANGUO-NB                10

yszj.exe                         WORKGROUP\MZ-3F-EBHYS7              10

yszj.exe                         WORKGROUP\MZ-2F-NK2YS4              10

yszj.exe                         WORKGROUP\MZ1F-EKYS7                10

emrproject.exe                   WORKGROUP\ZY1-2X-YS2                10

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-2F-WKYS9               11

yszj.exe                         WORKGROUP\MZ-2F-FKYS13              11

emrjz.exe                        WORKGROUP\JZ-QJS-HS1                11

yzxt.exe                         WORKGROUP\ZY-604200-01-HS           11

yszj.exe                         WORKGROUP\MZ-2F-GKYS3               12

yszj.exe                         WORKGROUP\MZ-3F-KQYS6               12

yzxt.exe                         WORKGROUP\ZY-609400-02-HS           12

HisServer.exe                    WORKGROUP\YYGHSERV                  12

yszj.exe                         WORKGROUP\GRK-GBMZ-01               12

oracle@lis-server (TNS V1-V3)    lis-server                          12

yszj.exe                         WORKGROUP\MZ-2F-WKYS10              13

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-1F-EKYS6               13

yszj.exe                         WORKGROUP\MZ-NKJ-01                 13

yzxt.exe                         WORKGROUP\ZY-ZHNK1-HS-04            13

yszj.exe                         WORKGROUP\MZ-3F-YKYS2               13

yszj.exe                         WORKGROUP\M3-YK-YS1                 14

yszj.exe                         WORKGROUP\MZ-2F-FKYS-01             14

yszj.exe                         WORKGROUP\MZ-3F-YKYS4               14

yszj.exe                         WORKGROUP\MZ-2F-FKYS10              14

yszj.exe                         WORKGROUP\MZ-3F-TJWKYS1             15

yszj.exe                         WORKGROUP\MZ-3F-WKYS5               15

MAF.exe                          WORKGROUP\QUYUSC1                   16

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-3F-WKYS3               16

yzxt.exe                         WORKGROUP\HIS-CXYY                  16

yszj.exe                         WORKGROUP\MZ-2F-NK2YS05             16

yszj.exe                         WORKGROUP\MZ-2F-WKYS5               17

yszj.exe                         WORKGROUP\MZ-3F-WKYS4               17

yszj.exe                         WORKGROUP\MZ-3F-KQKYS2              17

w3wp.exe                         WORKGROUP\WX_DDJY                   17

MediII.Adapter.WinForm.Scanner_V WORKGROUP\YYQZ                      18

yszj.exe                         WORKGROUP\MZ-1F-EKYS4               18

yszj.exe                         WORKGROUP\MZ-2F-NK2YS6              18

emrproject.exe                   WORKGROUP\3020-CXYY                 19

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-2F-GK24                19

yszj.exe                         WORKGROUP\ZY-609400-02-HS           19

yszj.exe                         WORKGROUP\MZ-GK-25                  20

emrproject.exe                   WORKGROUP\HIS-PC                    20

yszj.exe                         WORKGROUP\ZY-609400-02-YS           21

yszj.exe                         WORKGROUP\MZ-GUKE-YS04              21

yszj.exe                         WORKGROUP\MZ-2F-FKYS9               22

yszj.exe                         MSHOME\JZ-WKYS2                     22

yszj.exe                         WORKGROUP\MZ-2F-NK2YS-01            22

w3wp.exe                         WORKGROUP\YINYIQIANZHI1             23

yszj.exe                         WORKGROUP\MZ-2F-WKYS4               23

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-2F-GKYS2               23

yszj.exe                         WORKGROUP\MZ-2F-FKYS6               24

yszj.exe                         WORKGROUP\MZ-2F-WKYS6               25

yszj.exe                         WORKGROUP\MZ-XEK7                   25

yszj.exe                         WORKGROUP\MZ-3F-EBYHYS2             25

yszj.exe                         WORKGROUP\MZ-1F-EKYS8               26

yszj.exe                         WORKGROUP\MZ-3F-ZYYS-8              26

yszj.exe                         WORKGROUP\MZ-2F-FKYS3               27

yszj.exe                         WORKGROUP\MZ-2F-WKYS2               27

yszj.exe                         WORKGROUP\MZ-2F-WKYS3               28

yszj.exe                         WORKGROUP\MZ-2F-NK2YS2              28

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-2F-NK2-YS09            29

oracle@cxemrdb01 (TNS V1-V3)     cxemrdb01                           30

yszj.exe                         WORKGROUP\MZ-2F-NK2-9               30

yszj.exe                         WORKGROUP\MZ-3F-ZYYS2               30

yszj.exe                         WORKGROUP\MZ-3F-EBHYS6              30

yszj.exe                         WORKGROUP\MZ-2F-GKYS21              31

yszj.exe                         WORKGROUP\MZ-WAIKE-YS12             33

yzxt.exe                         WORKGROUP\ZY-7D-HS3                 33

yszj.exe                         WORKGROUP\MZ-2F-FKYS5               34

yszj.exe                         WORKGROUP\MZ-2F-NK1YS3              34

yszj.exe                         WORKGROUP\DESKTOP-VI7O6SC           36

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-2F-WKYS1               38

yszj.exe                         WORKGROUP\MZ-NK2-07                 39

yszj.exe                         WORKGROUP\MZ-2F-FKYS11              42

yszj.exe                         WORKGROUP\MZ-2F-NK1YS5              43

yszj.exe                         WORKGROUP\JZ-EK-001                 45

yszj.exe                         WORKGROUP\LHMZ_YS2                  46

yszj.exe                         WORKGROUP\MZ-2F-NK1YS2              50

w3wp.exe                         WORKGROUP\HIS-YIYAO                 50

yszj.exe                         WORKGROUP\MZ-2F-WKYS11              56

yszj.exe                         WORKGROUP\LHMZ_YS1                  56

yszj.exe                         WORKGROUP\MZ-2F-NK1YS14             58

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-2F-NK1YS13             62

yszj.exe                         WORKGROUP\JZ-NK-2                   64

yszj.exe                         WORKGROUP\MZ-2F-WKYS007             66

yszj.exe                         WORKGROUP\MZ-2F-FKYS4               66

yszj.exe                         WORKGROUP\MZ-2F-NKYS7               69

yszj.exe                         WORKGROUP\MZ-2F-NK1YS4             103

yszj.exe                         WORKGROUP\MZ-PAOZHEN-01            108

oracle@lis-server (TNS V1-V3)    lis-server                         136

oracle@cxemrdb01 (TNS V1-V3)     cxemrdb01                          395

查询得知,yszj.exe应用程序对HIS数据库建立了较高数量的连接,其中主机名为“MZ-2F”开头的客户端尤为明显。

 

 

 

 

 

 

 

 

 

 

 

 

三、问题处理步骤

1. kill session

查询yszj.exe应用程序的进程ID

SQL> select b.spid

2  from v$session a,v$process b

3  where a.paddr=b.addr

4  and a.program=’yszj.exe’;

5  and a.machine like ‘MZ-2F%’;

 

SPID

————————

70593

72017

76189

77305

78409

78481

78636

78676

79541

80183

80816

杀会话

[oracle@cxhisdb01 ~]$ kill -9 70593

[oracle@cxhisdb01 ~]$ kill -9 72017

[oracle@cxhisdb01 ~]$ kill -9 76189

[oracle@cxhisdb01 ~]$ kill -9 77305

[oracle@cxhisdb01 ~]$ kill -9 78409

[oracle@cxhisdb01 ~]$ kill -9 78481

[oracle@cxhisdb01 ~]$ kill -9 78636

[oracle@cxhisdb01 ~]$ kill -9 78676

[oracle@cxhisdb01 ~]$ kill -9 79541

[oracle@cxhisdb01 ~]$ kill -9 80183

[oracle@cxhisdb01 ~]$ kill -9 80816

执行kill session后,效果不明显,因为会话查杀后,又有新的会话连接进来。

2. 客户沟通

通过与客户沟通了解到,YZSJ.EXE程序新上了检查报告互认功能,会导致程序连接不释放问题,导致程序到数据库连接数不断增加,从而数据库连接数达到上限。

3. 释放进程

客户安排前台业务程序主机重启,释放连接进程,业务端主机重启后,检查连接情况如下:

SQL> select inst_id,count(*) from gv$session group by inst_id;

 

INST_ID   COUNT(*)

———- ———-

1       3427

2        681

 

SQL> select program,machine,count(*)

2  from gv$session  where program=’yszj.exe’ group by program,machine order by 3;

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\XZ-YGK-07                     1

yszj.exe        WORKGROUP\YJ-NKJ-MZFT                   1

yszj.exe        WORKGROUP\MZ-3F-WKYS4                   2

yszj.exe        WORKGROUP\RLS                           2

yszj.exe        WORKGROUP\MZ-3F-KQKYS2                  2

yszj.exe        WORKGROUP\MZ-2F-GKYS3                   2

yszj.exe        WORKGROUP\MZ-FWT-01                     2

yszj.exe        WORKGROUP\FDHJ54365                     2

yszj.exe        WORKGROUP\MZ-2F-WKYS2                   2

yszj.exe        WORKGROUP\MZ-3F-KQK7                    2

yszj.exe        WORKGROUP\MZ-2F-WKYS9                   2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-2F-NK1-YS6                 2

yszj.exe        WORKGROUP\MZ-2F-FKYS9                   2

yszj.exe        WORKGROUP\MZ-2F-GKYS2                   2

yszj.exe        WORKGROUP\MZ-2F-WKYS6                   2

yszj.exe        WORKGROUP\CXYY-YZ2                      2

yszj.exe        WORKGROUP\MZ-2F-FKYS13                  2

yszj.exe        WORKGROUP\MZ-KQK-YS-1                   2

yszj.exe        WORKGROUP\MZ-2F-NK1YS13                 2

yszj.exe        WORKGROUP\ZY-XYJHZX-01-YS               2

yszj.exe        WORKGROUP\MZ-2F-WKYS8                   2

yszj.exe        WORKGROUP\MZ-2F-FKYS4                   2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-2F-NK1YS4                  2

yszj.exe        WORKGROUP\MZ-2F-WKYS3                   2

yszj.exe        WORKGROUP\MZ-2F-NKYS7                   2

yszj.exe        WORKGROUP\MZ-3F-WKYS5                   2

yszj.exe        WORKGROUP\MZ-2F-WKYS007                 2

yszj.exe        WORKGROUP\GAMZ-01                       2

yszj.exe        WORKGROUP\MZ-WJ3                        2

yszj.exe        WORKGROUP\MZ-3F-PFYS3                   2

yszj.exe        WORKGROUP\ZY-609400-03-HS               2

yszj.exe        WORKGROUP\MZ-2F-WKYS1                   2

yszj.exe        WORKGROUP\ZY1-4X-YS5                    2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-2F-FKYS7                   2

yszj.exe        WORKGROUP\MZ-3F-TJWKYS1                 2

yszj.exe        WORKGROUP\ZY-601700-05-YS               2

yszj.exe        WORKGROUP\MZ-PICC-02                    2

yszj.exe        WORKGROUP\JZ-EK-001                     2

yszj.exe        WORKGROUP\FSK-SSZX-02                   2

yszj.exe        WORKGROUP\ZY1-5D-YS4                    2

yszj.exe        WORKGROUP\MZ-3F-KFYJT                   2

yszj.exe        WORKGROUP\ZY-XSR-YS5                    2

yszj.exe        WORKGROUP\MZ-ZY-YS9                     2

yszj.exe        WORKGROUP\MZ-3F-EBYHYS2                 2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-2F-FKYS15                  2

yszj.exe        WORKGROUP\MZ-2F-FK12                    2

yszj.exe        WORKGROUP\MZ-2F-NK1YS5                  2

yszj.exe        WORKGROUP\MZ-WK-SKZK                    2

yszj.exe        WORKGROUP\MZ-2F-NK1YS-17                2

yszj.exe        WORKGROUP\MZ-2F-FKYS10                  2

yszj.exe        WORKGROUP\MZ-2F-GK24                    2

yszj.exe        WORKGROUP\MZ-2F-NK2YS10                 2

yszj.exe        WORKGROUP\MZ-KOUQIANG-Y11               2

yszj.exe        WORKGROUP\JZ-EK-YS-06                   2

yszj.exe        WORKGROUP\MZ-NKJ-01                     2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-2F-NK2-9                   2

yszj.exe        WORKGROUP\MZ-3F-ZYYS2                   2

yszj.exe        WORKGROUP\MZ-3F-EBHYS6                  2

yszj.exe        MSHOME\JZ-QJ-04                         2

yszj.exe        WORKGROUP\MZ-3F-YKYS2                   2

yszj.exe        WORKGROUP\MZ-3F-ZY-YS3-                 2

yszj.exe        WORKGROUP\MZ-2F-NK1YS2                  2

yszj.exe        WORKGROUP\MZ-KOUQIANG-Y13               2

yszj.exe        WORKGROUP\MZ-3F-ZYYS10                  2

yszj.exe        WORKGROUP\JZ-EBH-01                     2

yszj.exe        WORKGROUP\JZ-GK-YS1                     2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-NK1-YS-017                 2

yszj.exe        WORKGROUP\MZ1F-EKYS7                    2

yszj.exe        WORKGROUP\JZ-NK-2                       2

yszj.exe        WORKGROUP\MZ-2F-FKYS11                  2

yszj.exe        WORKGROUP\MZ-2F-FKYS6                   2

yszj.exe        WORKGROUP\MZ-2F-FKYS-01                 2

yszj.exe        WORKGROUP\MZ-2F-NK1YS3                  2

yszj.exe        WORKGROUP\ZY2-11F-YS2                   2

yszj.exe        WORKGROUP\MZ-WAIKE-YS12                 2

yszj.exe        WORKGROUP\MZ-2F-NK2-YS09                2

yszj.exe        WORKGROUP\MZ-FWT-03                     2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-2F-NK2YS6                  2

yszj.exe        WORKGROUP\MZ-1F-EKYS6                   2

yszj.exe        WORKGROUP\JZ_XIONGTONG                  2

yszj.exe        WORKGROUP\MZ-3F-KQYS6                   2

yszj.exe        WORKGROUP\MZ-3F-YK13                    2

yszj.exe        WORKGROUP\ZY-TXBQ-YYK                   2

yszj.exe        WORKGROUP\LXZL-205                      2

yszj.exe        WORKGROUP\MZ-NK-11-YS                   2

yszj.exe        WORKGROUP\MZ-3F-PFK-2                   2

yszj.exe        WORKGROUP\MZ-NK-YS8                     2

yszj.exe        WORKGROUP\MZ-2F-NK2YS2                  2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-3F-KQKYS4                  2

yszj.exe        WORKGROUP\MZ-2F-NK2YS-01                2

yszj.exe        WORKGROUP\MZ-2F-NK1YS14                 2

yszj.exe        WORKGROUP\MZ-3F-ZYYS1                   2

yszj.exe        WORKGROUP\FRMZ-YS-03                    2

yszj.exe        WORKGROUP\MZ-NKZJ-01                    2

yszj.exe        WORKGROUP\MZ-2F-NK2YS3                  2

yszj.exe        WORKGROUP\MZ-2F-NK2YS05                 2

yszj.exe        WORKGROUP\MZ-XEK7                       2

yszj.exe        WORKGROUP\MZ-ZJNK-YS8-A                 2

yszj.exe        WORKGROUP\MZ-ZHFW-05                    2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-2F-FKYS3                   2

yszj.exe        WORKGROUP\ZY2-6F-ZLNKYS01               2

yszj.exe        WORKGROUP\MZ-FWT-04                     2

yszj.exe        WORKGROUP\FRMZ_YS2                      2

yszj.exe        WORKGROUP\MZ-3F-WKYS3                   2

yszj.exe        WORKGROUP\JZ-QJS-YS3                    2

yszj.exe        WORKGROUP\ZY-703700-05-YS               2

yszj.exe        WORKGROUP\TSZX-02                       3

yszj.exe        WORKGROUP\MZ-3F-KQXFS1                  3

yszj.exe        WORKGROUP\DESKTOP-VI7O6SC               3

yszj.exe        WORKGROUP\ZY-600400-06-YS               3

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\FSK-DPS-08                    4

yszj.exe        WORKGROUP\MZ-3F-ZYYS5                   4

yszj.exe        MSHOME\JZ-WKYS01                        4

yszj.exe        WORKGROUP\SF-YWS-YS1                    5

yszj.exe        WORKGROUP\MZ-3F-ZYYS4                   5

yszj.exe        WORKGROUP\ZY1-1X-YS                     5

yszj.exe        WorkGroup\USER-2021SXCVV                5

yszj.exe        WORKGROUP\MZ2-NJZX-001                  5

yszj.exe        WORKGROUP\ZY-GK3-HS4                    6

yszj.exe        WORKGROUP\MZ-GB-CDMZ                    6

yszj.exe        WORKGROUP\MZ-2F-NK2YS11                 8

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-3L-ZYNK-007                9

yszj.exe        WORKGROUP\MZ-3F-ZJK1                   11

yszj.exe        WORKGROUP\MZ-3F-KQKYS5                 14

yszj.exe        WORKGROUP\HIS-PC                       15

yszj.exe        WORKGROUP\MZ-2F-WKYS10                 16

yszj.exe        WORKGROUP\GRK-GBMZ-01                  16

yszj.exe        WORKGROUP\MZ-PIFU-YS3                  18

yszj.exe        WORKGROUP\ZY-609400-02-YS              21

yszj.exe        WORKGROUP\MZ-3F-KQKYS3                 22

yszj.exe        WORKGROUP\MZ-PAOZHEN-01                23

yszj.exe        WORKGROUP\MZ-GUKE-YS04                 23

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-FR-EK-1                   24

yszj.exe        WORKGROUP\ZY-609400-02-HS              25

yszj.exe        WORKGROUP\MZ-3F-PFKYS24                25

yszj.exe        WORKGROUP\MZ-2F-WKYS5                  34

yszj.exe        WORKGROUP\MZ-3F-YKYS4                  34

yszj.exe        WORKGROUP\MZ-2F-WKYS4                  35

yszj.exe        WORKGROUP\MZ-1F-EKYS8                  35

yszj.exe        MSHOME\JZ-WKYS2                        35

yszj.exe        WORKGROUP\MZ-3F-ZYYS-8                 37

yszj.exe        WORKGROUP\MZ-1F-EKYS4                  45

yszj.exe        WORKGROUP\LHMZ_YS2                     64

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\LHMZ_YS1                     78

yszj.exe        WORKGROUP\MZ-2F-WKYS11                 87

 

145 rows selected.

通过查询得知,前端业务主机重启后,释放了一部分进程,节点1连接数已经下降至3400+,前端业务连接正常,未发现ORA-12516错误。

 

 

总结与后续处理建议

1. 问题总结

YZSJ.EXE应用程序通过192.2.1.10(节点1的VIP)连接HIS数据库,由于YZSJ.EXE程序新上了检查报告互认功能,有程序连接不释放问题,导致程序到数据库连接数不断增加,数据库连接数(4500)达到上限,从而导致前端业务连接数据库失败,出现ORA-12516:TNS:监听程序找不到符合协议堆栈要求的可用处理程序。

2. 处理操作

  • 查杀会话,释放进程资源 ==>效果不明显,会话杀掉后程序会自动重连
  • 重启业务前端主机,释放资源 ==>暂时缓解,数据库连接数已经下降至3400+
  • 修复exe应用程序  ==>彻底解决,预计中午业务研发部门对yszj.exe做修复

3. 后续建议

此次故障发生的根本原因是由于YZSJ.EXE程序更新了检查报告互认功能导致的程序连接不释放导致数据库连接数达到上限,从而导致前端业务无法连接数据库,针对此次发生的故障,建议点如下:

  • 业务测试,业务更新上线前,在开发库或测试环境做充分的业务功能性测试和性能测试;
  • 调整数据库连接数参数,当前数据库的process参数为4500,可根据实际需要适当调整增大数据库的process参数,调整此参数需要重启数据库生效,因此需要安排停机时间。

 

Oracle 连接异常处理故障一例

概述

某医院与2021/10/22上午10点48分左右,出现应用程序无法响应,前台业务无法办理等情况。通过远程的方式接入数据库进行故障排除,快速解决了问题使得业务恢复正常使用。

问题及相关日志分析

1. 查看数据库日志

Fri Oct 22 10:27:29 2021

Errors in file /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/trace/emrdb1_ora_98357.trc  (incident=677598):

ORA-04031: ?T·¨·??? 4000 ×??úμ?12?í?ú′? (“shared pool”,”SELECT /*+ index(BL_BINGLIJS…”,”sga heap(4,0)”,”kglsim heap”)

Incident details in: /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/incident/incdir_677598/emrdb1_ora_98357_i677598.trc

Fri Oct 22 10:27:31 2021

Errors in file /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/trace/emrdb1_ora_46120.trc  (incident=675174):

ORA-04031: ?T·¨·??? 3896 ×??úμ?12?í?ú′? (“shared pool”,”select jilunr from bl_bingli…”,”sga heap(6,0)”,”kglsim object batch”)

Incident details in: /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/incident/incdir_675174/emrdb1_ora_46120_i675174.trc

Fri Oct 22 10:27:31 2021

Dumping diagnostic data in directory=[cdmp_20211022102731], requested by (instance=1, osid=98357), summary=[incident=677598].

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Fri Oct 22 10:27:33 2021

Sweep [inc][677598]: completed

Sweep [inc][675174]: completed

Sweep [inc2][677598]: completed

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Fri Oct 22 10:27:36 2021

Errors in file /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/trace/emrdb1_ora_85236.trc  (incident=675246):

ORA-04031: ?T·¨·??? 3896 ×??úμ?12?í?ú′? (“shared pool”,”update BL_BINGLIJLNR set Val…”,”sga heap(3,0)”,”kglsim object batch”)

Incident details in: /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/incident/incdir_675246/emrdb1_ora_85236_i675246.trc

Fri Oct 22 10:27:37 2021

Errors in file /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/trace/emrdb1_ora_81262.trc  (incident=675606):

ORA-04031: unable to allocate 4000 bytes of shared memory (“shared pool”,”DELETE BL_BINGANSY WHERE BIN…”,”sga heap(1,0)”,”kglsim heap”)

Incident details in: /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/incident/incdir_675606/emrdb1_ora_81262_i675606.trc

Fri Oct 22 10:27:40 2021

Errors in file /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/trace/emrdb1_ora_36572.trc  (incident=674934):

ORA-04031: ?T·¨·??? 3896 ×??úμ?12?í?ú′? (“shared pool”,”update BL_BINGCHENGJLNR set …”,”sga heap(2,0)”,”kglsim object batch”)

Incident details in: /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/incident/incdir_674934/emrdb1_ora_36572_i674934.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

 

通过数据库alert日志发现,数据库在10:27开始发生了大量的ORA-04031错误。

2. 查看操作系统日志

Oct 22 00:06:07 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 00:36:08 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 01:06:08 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 01:21:51 cxemrdb01 auditd[7192]: Audit daemon rotating log files

Oct 22 01:36:17 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 02:08:17 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 02:49:54 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 03:09:42 cxemrdb01 kernel: qla2xxx [0000:0c:00.0]-801c:0: Abort command issued nexus=0:1:7 —  1 2002.

Oct 22 03:22:03 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 03:22:03 cxemrdb01 rhsmd: In order for Subscription Manager to provide your system with updates, your system must be registered with the Customer Portal. Please enter your Red Hat login t

o ensure your system is up-to-date.

Oct 22 03:35:22 cxemrdb01 kernel: qla2xxx [0000:0c:00.0]-801c:0: Abort command issued nexus=0:0:12 —  1 2002.

Oct 22 03:46:37 cxemrdb01 kernel: qla2xxx [0000:0c:00.0]-801c:0: Abort command issued nexus=0:0:7 —  1 2002.

Oct 22 03:52:03 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 03:56:21 cxemrdb01 kernel: qla2xxx [0000:0c:00.0]-801c:0: Abort command issued nexus=0:1:6 —  1 2002.

Oct 22 03:57:05 cxemrdb01 kernel: qla2xxx [0000:0c:00.0]-801c:0: Abort command issued nexus=0:1:9 —  1 2002.

Oct 22 03:57:48 cxemrdb01 kernel: qla2xxx [0000:0c:00.0]-801c:0: Abort command issued nexus=0:0:14 —  1 2002.

Oct 22 04:00:52 cxemrdb01 auditd[7192]: Audit daemon rotating log files

Oct 22 04:22:04 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 04:54:28 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 05:25:57 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 05:59:49 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 06:35:21 cxemrdb01 auditd[7192]: Audit daemon rotating log files

Oct 22 06:46:06 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 07:17:35 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 07:50:07 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 08:20:08 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 09:02:08 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 09:14:12 cxemrdb01 auditd[7192]: Audit daemon rotating log files

Oct 22 09:35:29 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 10:14:02 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 10:54:50 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 11:26:03 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 11:52:33 cxemrdb01 auditd[7192]: Audit daemon rotating log files

Oct 22 11:56:03 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 12:26:04 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 12:56:20 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 13:27:57 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 13:59:17 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 14:24:03 cxemrdb01 auditd[7192]: Audit daemon rotating log files

Oct 22 14:32:22 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 15:04:22 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 15:38:23 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 16:12:00 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 16:42:24 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 16:58:54 cxemrdb01 auditd[7192]: Audit daemon rotating log files

Oct 22 17:14:01 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 17:44:01 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

操作系统中未发现内存不足等相关报错和告警信息。

3. 监听日志信息分析 

每小时的请求连接数

930 22-OCT-2021 00

797 22-OCT-2021 01

715 22-OCT-2021 02

719 22-OCT-2021 03

631 22-OCT-2021 04

671 22-OCT-2021 05

683 22-OCT-2021 06

1220 22-OCT-2021 07

1789 22-OCT-2021 08

1619 22-OCT-2021 09

2157 22-OCT-2021 10

8656 22-OCT-2021 11

654 22-OCT-2021 12

1085 22-OCT-2021 13

920 22-OCT-2021 14

855 22-OCT-2021 15

954 22-OCT-2021 16

417 22-OCT-2021 17

63 22-OCT-2021 18

10:00 – 10:59内每分钟连接数统计

97 22-OCT-2021 10:00

15 22-OCT-2021 10:01

29 22-OCT-2021 10:02

8 22-OCT-2021 10:03

12 22-OCT-2021 10:04

54 22-OCT-2021 10:05

13 22-OCT-2021 10:06

21 22-OCT-2021 10:07

13 22-OCT-2021 10:08

22 22-OCT-2021 10:09

50 22-OCT-2021 10:10

13 22-OCT-2021 10:11

6 22-OCT-2021 10:12

16 22-OCT-2021 10:13

21 22-OCT-2021 10:14

42 22-OCT-2021 10:15

18 22-OCT-2021 10:16

15 22-OCT-2021 10:17

19 22-OCT-2021 10:18

18 22-OCT-2021 10:19

45 22-OCT-2021 10:20

2 22-OCT-2021 10:21

12 22-OCT-2021 10:22

15 22-OCT-2021 10:23

17 22-OCT-2021 10:24

58 22-OCT-2021 10:25

18 22-OCT-2021 10:26

19 22-OCT-2021 10:27

26 22-OCT-2021 10:28

41 22-OCT-2021 10:29

109 22-OCT-2021 10:30

54 22-OCT-2021 10:31

62 22-OCT-2021 10:32

58 22-OCT-2021 10:33

45 22-OCT-2021 10:34

76 22-OCT-2021 10:35

37 22-OCT-2021 10:36

33 22-OCT-2021 10:37

45 22-OCT-2021 10:38

36 22-OCT-2021 10:39

79 22-OCT-2021 10:40

30 22-OCT-2021 10:41

41 22-OCT-2021 10:42

34 22-OCT-2021 10:43

55 22-OCT-2021 10:44

53 22-OCT-2021 10:45

28 22-OCT-2021 10:46

48 22-OCT-2021 10:47

65 22-OCT-2021 10:48

54 22-OCT-2021 10:49

72 22-OCT-2021 10:50

43 22-OCT-2021 10:51

30 22-OCT-2021 10:52

29 22-OCT-2021 10:53

20 22-OCT-2021 10:54

76 22-OCT-2021 10:55

28 22-OCT-2021 10:56

16 22-OCT-2021 10:57

27 22-OCT-2021 10:58

19 22-OCT-2021 10:59

 

查询得知,数据库未发现有短时间的大量连接。

4. 查看数据库运行信息

SQL> select inst_id,instance_name,to_char(startup_time,’yyyy-mm-dd hh24:mi:ss’) from gv$instance;

 

INST_ID INSTANCE_NAME    TO_CHAR(STARTUP_TIM

———- —————- ——————-

1 emrdb1           2021-10-22 11:11:06

2 emrdb2           2019-12-25 17:39:15

查询得知,emrdb实例子2019年12月25日启动一来,至今运行月1年10个月的时间,1节点实例今日处理故障时重启,在重启前时间与2节点一致。

5. 查看AWR报告

 

如上图,数据库中存在大量的无绑定变量的低效SQL,均占用1MB+的内存空间,列举几个SQL:

8mfvdk3v7xcnu

select * from bl_webwenshujl t left join v_zy_bingrenjbxx b on t.bingrenid = b.BINGRENID where b.DANGQIANBQ = ‘600700’ and b.CHUYUANRQ is null and (( t.mobanid = ‘CXZX003′ and t.zongfen > 44 and t.jilusj=(select max(jilusj) from bl_webwenshujl where bingrenid=t.bingrenid and mobanid=’CXZX003’)) or ( t.mobanid = ‘CXZX008′ and t.zongfen > 11 and t.jilusj=(select max(jilusj) from bl_webwenshujl where bingrenid=t.bingrenid and mobanid=’CXZX008’)))

 

gjv7strrc2mr6

select * from bl_webwenshujl t left join v_zy_bingrenjbxx b on t.bingrenid = b.BINGRENID where b.DANGQIANBQ = ‘600800’ and b.CHUYUANRQ is null and (( t.mobanid = ‘CXZX003′ and t.zongfen > 44 and t.jilusj=(select max(jilusj) from bl_webwenshujl where bingrenid=t.bingrenid and mobanid=’CXZX003’)) or ( t.mobanid = ‘CXZX008′ and t.zongfen > 11 and t.jilusj=(select max(jilusj) from bl_webwenshujl where bingrenid=t.bingrenid and mobanid=’CXZX008’)))

 

7gc2xtvupk227

select * from bl_webwenshujl t left join v_zy_bingrenjbxx b on t.bingrenid = b.BINGRENID where t.mobanid = ‘CXZX002’ and t.zongfen < 13 and b.DANGQIANBQ = ‘601800’ and b.CHUYUANRQ is null and t.jilusj=(select max(jilusj) from bl_webwenshujl where bingrenid=t.bingrenid and mobanid=’CXZX002′)

 

3nh5uf64xy8sa

select * from bl_webwenshujl t left join v_zy_bingrenjbxx b on t.bingrenid = b.BINGRENID where t.mobanid = ‘CXZX002’ and t.zongfen < 13 and b.DANGQIANBQ = ‘600800’ and b.CHUYUANRQ is null and t.jilusj=(select max(jilusj) from bl_webwenshujl where bingrenid=t.bingrenid and mobanid=’CXZX002′)

 

6. 维护人员处理动作

由于检查只有1节点实例出现ORA-04031错误,而2节点实例正常,尝试手工刷新shared pool,但刷新失败。

 

为了尽快恢复业务,保证业务正常运行,对1节点实例进行了重启

Fri Oct 22 11:04:04 2021

Shutting down instance (immediate)

Stopping background process SMCO

Shutting down instance: further logons disabled

Stopping background process QMNC

Fri Oct 22 11:04:06 2021

Stopping background process CJQ0

Stopping background process MMNL

Stopping background process MMON

License high water mark = 1049

Fri Oct 22 11:05:02 2021

All dispatchers and shared servers shutdown

ALTER DATABASE CLOSE NORMAL

Fri Oct 22 11:05:05 2021

SMON: disabling tx recovery

Stopping background process RCBG

SMON: disabling cache recovery

 

总结与后续处理建议

1. 问题总结

  • 数据库2节点实例运行正常,1节点实例出现了ORA-04031错误;
  • 数据库自2019/12/25运行至今已达1年10个月之久,内存中存在大量的内存碎片;
  • 数据库中存在较多的无绑定变量SQL,无绑定变量的SQL均需要硬解析,消耗shared pool内存资源;
  • 重启数据库后,内存资源释放,内存刷新,内存碎片问题消失,业务恢复正常。

7. 处理操作

此次处理过程中,通过手工刷新shared pool失败,进而通过重启1节点实例的方式释放内存资源、重整内存碎片后,ORA-04031问题得以解决。

典型的Ora-04031错误 – 记某人民医院核心系统故障处理纪要

对于在云数据库 (OCI) 创建时 (虚拟机还没有生成) 的问题,只提供第一步的信息就可以。

 

对于其他的问题,需要提供步骤一到步骤三的所有信息。

1. 从 OCI 的控制台搜集数据库(OCI) 的相关信息:

a. Cloud Account Name
b. Region:
Region will be available on the top corner of every page in OCI console
c. Availability Domain:
Availability Domain can be found next to the database system name in OCI console
d. Tenancy ID:
In OCI console, Navigate to Administration –> Tenancy Details, Look OCID under the Tenancy Information
e. Compartment OCID:
In OCI console, Navigate to Identity –> Compartments. Look OCID under the compartment name
f. Database system OCID:
In OCI console, Navigate to Menu –> Bare Metal, VM, and Exadata, Look for Database System OCID near the database system name
g. Database OCID:
In OCI console, navigate to Database System page and click on the problematic database system. In the Database System page, look for the Database OCID under the “Databases”

 

2. 如果数据库 (OCI) 对应的虚拟机可以访问,那么用 root 用户登陆虚拟机,搜集基础架构、代理、数据库和任务详细信息:

基础架构详细信息

 

# sudo -s

# curl -s http://169.254.169.254/opc/v1/instance/ | egrep -v “user_data|ssh_authorized_keys|timeCreated”

代理详细信息

# sudo -s

# rpm -qa | grep dcs
# initctl status initdcsagent
# initctl status initdcsadmin

集群、数据库详细信息

# sudo -s

# Replace the grid home and issue below command
# /u01/app/12.2.0.1/grid/bin/crsctl check crs

# sudo su – oracle

# sqlplus / as sysdba

— Run below SQLs in SQLPLUS prompt

select status from v$instance;
select name, open_mode from v$database;
select banner from v$version where banner like ‘Oracle Database%’;

— if Multitenant

show pdbs;

任务详细信息

# sudo -s

# /opt/oracle/dcs/bin/dbcli describe-component > /tmp/dcs_job_details.log
# /opt/oracle/dcs/bin/dbcli list-databases >> /tmp/dcs_job_details.log
# /opt/oracle/dcs/bin/dbcli list-jobs >> /tmp/dcs_job_details.log
# /opt/oracle/dcs/bin/dbcli describe-job -i <faild_job_ID> >> /tmp/dcs_job_details.log

上传 /tmp/dcs_job_details.log

3. 使用 opc 用户登陆到数据库 (OCI) 对应的虚拟机,并执行如下命令:

# sudo /opt/oracle/dcs/bin/diagcollector.py

Sample Output:
=============[oci@ludatou ~]$ sudo /opt/oracle/dcs/bin/diagcollector.py
Log files collected to :/tmp/dcsdiag/diagLogs-1526004897.zip

Logs are being collected to:
/ludatou/ocidiag/diagLogs-4758698722.zip

该命令会生成 /tmp/dcsdiag/diagLogs-xxxxx.zip 的文件。

用如下命令上传这个文件到 SR:
Note 1547088.2 – How to upload large files to Oracle Support

# curl -T <path_and_filename>” -u “<userID>” https://transport.oracle.com/upload/issue/<sr-number>/

For example:
curl -T /ludatou/ocidiag/diagLogs-4758698722.zip -u ********@oracle.com https://transport.oracle.com/upload/issue/

Oracle cloud 上的云资源诊断信息收集

有一些用户的海外系统上到oracle的云,在海外oracle云上的备份设置以及相关信息查询加入到了工作常态,下面是描述如何在oracle database cloud上检查数据库的备份信息

 

1. 使用’opc’用户登录到主机

2. 切换到’root’用户并使用下面的任意一个命令:

a) 要检查备份的进度,使用下面的bkup_api的命令:
$ /var/opt/oracle/bkup_api/bkup_api bkup_status 

b) 使用bkup_api命令列出可用的备份

$ /var/opt/oracle/bkup_api/bkup_api recover_list

-或者-

c) 使用dbaascli命令列出正常的备份

$ dbaascli orec –args -list

-或者-

d) 使用dbaascli命令来列出长期保留的备份

$dbaascli orec –args -keep -list

3. 检查备份产生的obkup.log

/home/oracle/bkup/<SID>/log/obkup.log

4. 切换至’oracle’用户并连接到RMAN

RMAN target /RMAN> list backup;

-或者-

RMAN> list backup summary;

5. 关于处理了多少bytes的数据,备份持续的时间等更多细节,可以以SYS登陆数据库并检查动态视图V$RMAN_BACKUP_JOB_DETAILS

如何在Oracle Database Cloud上检查数据库备份的信息