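Overview
The hospital's database monitoring platform reported that connections to node 1 of the HIS database were failing. We connected to the database remotely to troubleshoot, quickly resolved the problem, and restored normal service.
Problem and log analysis
1. Check the listener
The inspection report of 2021/05/18 showed the IP address 192.2.1.80 among the listener's endpoints, but this check found that the listener no longer had that address.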
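The comparison against the earlier inspection can be scripted. The following Python sketch is not part of the original procedure. It assumes you have saved the text printed by `lsnrctl status`, extracts the HOST value of each TCP endpoint, and reports baseline addresses that have disappeared. The function names and the sample baseline are hypothetical.

```python
import re

# Matches the HOST value of a TCP endpoint in `lsnrctl status` output, e.g.
# (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.80)(PORT=1521)))
TCP_HOST_RE = re.compile(r"\(PROTOCOL=tcp\)\(HOST=([^)]+)\)", re.IGNORECASE)


def listener_hosts(status_text: str) -> set[str]:
    """Return every TCP host/IP the listener reports it is listening on."""
    return set(TCP_HOST_RE.findall(status_text))


def missing_endpoints(baseline: set[str], status_text: str) -> set[str]:
    """Addresses present in the baseline (e.g. the last inspection) but absent now."""
    return baseline - listener_hosts(status_text)


if __name__ == "__main__":
    # Hypothetical baseline recorded during an earlier inspection.
    baseline = {"192.2.1.80", "192.2.1.10"}
    # Hypothetical degraded output in which the 192.2.1.80 endpoint is gone.
    current = """Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.10)(PORT=1521)))"""
    print(sorted(missing_endpoints(baseline, current)))
```

Comparing against a stored baseline, rather than a hard-coded expected list, keeps the check valid if the node's addresses are changed deliberately later.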
2. Check the cluster status
Checking the cluster status showed several resources on cxhisdb01 in the OFFLINE state.
[grid@cxhisdb02 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  OFFLINE      cxhisdb01
               ONLINE  ONLINE       cxhisdb02
ora.LISTENER.lsnr
               ONLINE  OFFLINE      cxhisdb01
               ONLINE  ONLINE       cxhisdb02
ora.OCR.dg
               ONLINE  OFFLINE      cxhisdb01
               ONLINE  ONLINE       cxhisdb02
ora.SSD.dg
               ONLINE  OFFLINE      cxhisdb01
               ONLINE  ONLINE       cxhisdb02
ora.asm
               ONLINE  OFFLINE      cxhisdb01
               ONLINE  ONLINE       cxhisdb02                Started
ora.gsd
               OFFLINE OFFLINE      cxhisdb01
               OFFLINE OFFLINE      cxhisdb02
ora.net1.network
               ONLINE  OFFLINE      cxhisdb01
               ONLINE  ONLINE       cxhisdb02
ora.ons
               ONLINE  OFFLINE      cxhisdb01
               ONLINE  ONLINE       cxhisdb02
ora.registry.acfs
               ONLINE  OFFLINE      cxhisdb01
               ONLINE  ONLINE       cxhisdb02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       cxhisdb02
ora.cvu
      1        ONLINE  OFFLINE
ora.cxhisdb01.vip
      1        ONLINE  OFFLINE
ora.cxhisdb02.vip
      1        ONLINE  ONLINE       cxhisdb02
ora.hospital.db
      1        ONLINE  OFFLINE
      2        ONLINE  ONLINE       cxhisdb02                Open
ora.oc4j
      1        ONLINE  OFFLINE
ora.scan1.vip
      1        ONLINE  ONLINE       cxhisdb02
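Reading the table by eye works for two nodes, but the same check can be done programmatically. This Python sketch is an illustration, not an Oracle tool. It parses saved `crsctl stat res -t` text, relying on the 11.2 layout shown above (resource names unindented, per-node or per-instance state lines indented), and lists resources whose TARGET is ONLINE but whose STATE is OFFLINE. The function names are hypothetical, and the STATE_DETAILS column is ignored.

```python
def parse_crsctl_table(text: str) -> list[tuple[str, str, str, str]]:
    """Parse `crsctl stat res -t` text into (resource, target, state, server) rows.

    `server` is "" when a cluster resource is not placed on any node.
    """
    rows = []
    current = None  # resource name that the following indented lines belong to
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("-"):
            continue  # blank line or dashed separator
        if not line[0].isspace():
            # Unindented lines are resource names or headers such as "NAME ..."
            # and "Local Resources"; only resource names start with "ora.".
            current = line.strip() if line.startswith("ora.") else None
            continue
        fields = line.split()
        if fields[0].isdigit():
            fields = fields[1:]  # cluster resources carry an instance number first
        if current is None or len(fields) < 2:
            continue
        target, state = fields[0], fields[1]
        server = fields[2] if len(fields) > 2 else ""
        rows.append((current, target, state, server))
    return rows


def unexpected_offline(text: str) -> list[tuple[str, str]]:
    """(resource, server) pairs whose TARGET is ONLINE but whose STATE is OFFLINE."""
    return [(res, server) for res, target, state, server in parse_crsctl_table(text)
            if target == "ONLINE" and state == "OFFLINE"]


if __name__ == "__main__":
    sample = "\n".join([
        "ora.OCR.dg",
        "               ONLINE  OFFLINE      cxhisdb01",
        "               ONLINE  ONLINE       cxhisdb02",
        "ora.gsd",
        "               OFFLINE OFFLINE      cxhisdb01",
    ])
    print(unexpected_offline(sample))
```

Filtering on TARGET as well as STATE matters here: ora.gsd is OFFLINE on both nodes by design, so it should not be flagged.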
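3. Check the clusterware alert log
The log shows that only the CRSD service failed: it aborted because the OCR in the +OCR disk group was inaccessible, and ohasd stopped restarting it after reaching the maximum number of attempts. The rest of the clusterware stack was normal, so the database did not go down.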
2021-05-26 13:23:46.059:
[crsd(145215)]CRS-1006:The OCR location +OCR is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:23:46.068:
[crsd(145215)]CRS-1006:The OCR location +OCR is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:23:56.293:
[/u01/app/11.2.0/grid/bin/oraagent.bin(66885)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:21:18} in /u01/app/11.2.0/grid/log/cxhisdb01/agent/crsd/oraagent_grid/oraagent_grid.log.
2021-05-26 13:23:56.294:
[/u01/app/11.2.0/grid/bin/oraagent.bin(31320)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:19:50603} in /u01/app/11.2.0/grid/log/cxhisdb01/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2021-05-26 13:23:56.461:
[/u01/app/11.2.0/grid/bin/orarootagent.bin(145347)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:1568} in /u01/app/11.2.0/grid/log/cxhisdb01/agent/crsd/orarootagent_root/orarootagent_root.log.
2021-05-26 13:23:56.485:
[/u01/app/11.2.0/grid/bin/scriptagent.bin(145549)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:9:68} in /u01/app/11.2.0/grid/log/cxhisdb01/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2021-05-26 13:23:56.651:
[ohasd(144192)]CRS-2765:Resource 'ora.crsd' has failed on server 'cxhisdb01'.
2021-05-26 13:23:58.540:
[crsd(5795)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:23:58.548:
[crsd(5795)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:23:58.964:
[ohasd(144192)]CRS-2765:Resource 'ora.crsd' has failed on server 'cxhisdb01'.
2021-05-26 13:24:00.374:
[crsd(5834)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:00.382:
[crsd(5834)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:01.010:
[ohasd(144192)]CRS-2765:Resource 'ora.crsd' has failed on server 'cxhisdb01'.
2021-05-26 13:24:02.447:
[crsd(5886)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:02.455:
[crsd(5886)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:03.068:
[ohasd(144192)]CRS-2765:Resource 'ora.crsd' has failed on server 'cxhisdb01'.
2021-05-26 13:24:04.457:
[crsd(5909)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:04.465:
[crsd(5909)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:05.102:
[ohasd(144192)]CRS-2765:Resource 'ora.crsd' has failed on server 'cxhisdb01'.
2021-05-26 13:24:06.492:
[crsd(5937)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:06.501:
[crsd(5937)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:07.132:
[ohasd(144192)]CRS-2765:Resource 'ora.crsd' has failed on server 'cxhisdb01'.
2021-05-26 13:24:08.517:
[crsd(5986)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:08.525:
[crsd(5986)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:09.162:
[ohasd(144192)]CRS-2765:Resource 'ora.crsd' has failed on server 'cxhisdb01'.
2021-05-26 13:24:10.544:
[crsd(6015)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:10.552:
[crsd(6015)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:11.193:
[ohasd(144192)]CRS-2765:Resource 'ora.crsd' has failed on server 'cxhisdb01'.
2021-05-26 13:24:12.581:
[crsd(6051)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:12.589:
[crsd(6051)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:13.223:
[ohasd(144192)]CRS-2765:Resource 'ora.crsd' has failed on server 'cxhisdb01'.
2021-05-26 13:24:14.614:
[crsd(6070)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:14.622:
[crsd(6070)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:15.253:
[ohasd(144192)]CRS-2765:Resource 'ora.crsd' has failed on server 'cxhisdb01'.
2021-05-26 13:24:16.643:
[crsd(6090)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:16.650:
[crsd(6090)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage ]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:24:17.284:
[ohasd(144192)]CRS-2765:Resource 'ora.crsd' has failed on server 'cxhisdb01'.
2021-05-26 13:24:17.284:
[ohasd(144192)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2021-05-26 13:24:17.315:
[ohasd(144192)]CRS-2769:Unable to failover resource 'ora.crsd'.
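The failure pattern in this log (CRSD aborting with CRS-0804, ohasd reporting CRS-2765 after each abort, and finally CRS-2771) can be detected automatically. The Python sketch below is a hypothetical helper, not an Oracle utility. It assumes the 11.2 alert-log message prefix `[process(pid)]CRS-nnnn:` with one message per line, counts distinct crsd processes that aborted, and reports whether ohasd gave up restarting ora.crsd.

```python
import re

# Clusterware alert-log messages start with "[process(pid)]CRS-nnnn:", e.g.
# [crsd(5795)]CRS-0804:Cluster Ready Service aborted ...
MSG_RE = re.compile(r"\[([^\]()]+)\((\d+)\)\](CRS-\d{4}):")


def summarize_crsd_failures(log_text: str) -> dict:
    """Summarize the crsd restart loop found in a clusterware alert log."""
    aborted_pids = set()  # crsd processes that died with CRS-0804 (OCR error)
    failures = 0          # CRS-2765 "has failed" reports for ora.crsd
    gave_up = False       # CRS-2771 "Maximum restart attempts reached"
    for line in log_text.splitlines():
        m = MSG_RE.search(line)
        if not m:
            continue  # timestamp lines and anything without a CRS prefix
        proc, pid, code = m.groups()
        if proc == "crsd" and code == "CRS-0804":
            aborted_pids.add(pid)
        elif code == "CRS-2765" and "'ora.crsd'" in line:
            failures += 1
        elif code == "CRS-2771" and "'ora.crsd'" in line:
            gave_up = True
    return {
        "crsd_aborts": len(aborted_pids),
        "crsd_failures": failures,
        "restart_limit_reached": gave_up,
    }
```

Once `restart_limit_reached` is true, ohasd will not bring CRSD back on its own, which is why a manual `crsctl start res ora.crsd -init` was needed later.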
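4. Check the ASM alert log
The ASM alert log shows problems with the OCR quorum disks at 2021/05/26 12:19:57: PST writes to four disks in group 3 (OCR) stalled, ASM found no read quorum, and the OCR disk group was dismounted. Read/write errors followed at 13:23.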
Wed May 26 12:19:57 2021
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 3 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 4 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 3 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 4 in group 3.
Wed May 26 12:19:57 2021
NOTE: process _b000_+asm1 (160488) initiating offline of disk 0.1409468596 (OCR_0000) with mask 0x7e in group 3
NOTE: process _b000_+asm1 (160488) initiating offline of disk 2.1409468594 (OCR_0002) with mask 0x7e in group 3
NOTE: process _b000_+asm1 (160488) initiating offline of disk 3.1409468595 (OCR_0003) with mask 0x7e in group 3
NOTE: process _b000_+asm1 (160488) initiating offline of disk 4.1409468592 (OCR_0004) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
GMON checking disk modes for group 3 at 15 for pid 46, osid 160488
ERROR: no read quorum in group: required 3, found 1 disks
NOTE: checking PST for grp 3 done.
NOTE: initiating PST update: grp = 3, dsk = 0/0x5402c8b4, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 2/0x5402c8b2, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 3/0x5402c8b3, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 4/0x5402c8b0, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 16 for pid 46, osid 160488
ERROR: no read quorum in group: required 3, found 1 disks
Wed May 26 12:19:57 2021
NOTE: cache dismounting (not clean) group 3/0xA242386E (OCR)
NOTE: messaging CKPT to quiesce pins Unix process pid: 160495, image: oracle@cxhisdb01 (B001)
Wed May 26 12:19:57 2021
NOTE: halting all I/Os to diskgroup 3 (OCR)
Wed May 26 12:19:57 2021
NOTE: LGWR doing non-clean dismount of group 3 (OCR)
NOTE: LGWR sync ABA=15.85 last written ABA 15.85
WARNING: Offline for disk OCR_0000 in mode 0x7f failed.
WARNING: Offline for disk OCR_0002 in mode 0x7f failed.
WARNING: Offline for disk OCR_0003 in mode 0x7f failed.
WARNING: Offline for disk OCR_0004 in mode 0x7f failed.
Wed May 26 12:19:58 2021
kjbdomdet send to inst 2
detach from dom 3, sending detach message to inst 2
Wed May 26 12:19:58 2021
NOTE: No asm libraries found in the system
Wed May 26 12:19:58 2021
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 4)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 3 invalid = TRUE
 2 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Wed May 26 12:19:58 2021
WARNING: dirty detached from domain 3
NOTE: cache dismounted group 3/0xA242386E (OCR)
2021-05-26 13:23:46.059:
[crsd(145215)]CRS-1006:The OCR location +OCR is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
2021-05-26 13:23:46.068:
[crsd(145215)]CRS-1006:The OCR location +OCR is inaccessible. Details in /u01/app/11.2.0/grid/log/cxhisdb01/crsd/crsd.log.
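The ASM messages can be read as a quorum calculation. The Python sketch below pulls the disks with stalled PST writes out of the alert log and checks what remains against a read quorum. Two assumptions are labeled in the code: that the OCR disk group holds five PST copies (disks 0 to 4, inferred from the disk numbers and the logged "required 3"), and that the quorum is a simple majority. The majority rule matches "required 3" for five copies but is an inference, not a statement of ASM internals. The function names are hypothetical.

```python
import re

# ASM warns when a write to a disk's Partner Status Table (PST) copy stalls, e.g.
# "WARNING: Waited 15 secs for write IO to PST disk 2 in group 3."
PST_WAIT_RE = re.compile(r"Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)")


def stalled_pst_disks(alert_text: str, group: int) -> set[int]:
    """Disk numbers in `group` that logged at least one stalled PST write."""
    return {int(disk) for _secs, disk, grp in PST_WAIT_RE.findall(alert_text)
            if int(grp) == group}


def has_read_quorum(pst_copies: int, reachable: int) -> bool:
    """ASSUMPTION: a simple majority of PST copies must be readable."""
    return reachable >= pst_copies // 2 + 1


if __name__ == "__main__":
    alert = "\n".join(
        f"WARNING: Waited 15 secs for write IO to PST disk {d} in group 3." for d in (0, 2, 3, 4)
    )
    stalled = stalled_pst_disks(alert, group=3)
    pst_copies = 5  # ASSUMPTION: disks 0-4 each hold a PST copy
    reachable = pst_copies - len(stalled)
    print(sorted(stalled), reachable, has_read_quorum(pst_copies, reachable))
```

Under these assumptions, four stalled disks leave one reachable copy against a requirement of three, which lines up with "required 3, found 1 disks" and explains why ASM dismounted the group instead of just offlining the disks.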
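Resolution steps
Only node 1 lost access to the OCR disk group; node 2 and the other disk groups remained accessible. As a result, only the CRS resources on node 1 failed and the database stayed up, so the fix was to bring the CRS resources back up and restart the listener.
1. Mount the OCR disk group on node 1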
[root@cxhisdb01 ~]# su - grid
[grid@cxhisdb01 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Thu May 27 12:12:17 2021
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup ocr mount;
Diskgroup altered.
SQL> exit
2. Start CRS
[grid@cxhisdb01 ~]$ crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.crsd' on 'cxhisdb01'
CRS-2676: Start of 'ora.crsd' on 'cxhisdb01' succeeded
3. Check the cluster resources
[grid@cxhisdb01 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       cxhisdb01
               ONLINE  ONLINE       cxhisdb02
ora.LISTENER.lsnr
               ONLINE  ONLINE       cxhisdb01
               ONLINE  ONLINE       cxhisdb02
ora.OCR.dg
               ONLINE  ONLINE       cxhisdb01
               ONLINE  ONLINE       cxhisdb02
ora.SSD.dg
               ONLINE  ONLINE       cxhisdb01
               ONLINE  ONLINE       cxhisdb02
ora.asm
               ONLINE  ONLINE       cxhisdb01                Started
               ONLINE  ONLINE       cxhisdb02                Started
ora.gsd
               OFFLINE OFFLINE      cxhisdb01
               OFFLINE OFFLINE      cxhisdb02
ora.net1.network
               ONLINE  ONLINE       cxhisdb01
               ONLINE  ONLINE       cxhisdb02
ora.ons
               ONLINE  ONLINE       cxhisdb01
               ONLINE  ONLINE       cxhisdb02
ora.registry.acfs
               ONLINE  ONLINE       cxhisdb01
               ONLINE  ONLINE       cxhisdb02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       cxhisdb02
ora.cvu
      1        ONLINE  ONLINE       cxhisdb01
ora.cxhisdb01.vip
      1        ONLINE  ONLINE       cxhisdb01
ora.cxhisdb02.vip
      1        ONLINE  ONLINE       cxhisdb02
ora.hospital.db
      1        ONLINE  ONLINE       cxhisdb01                Open
      2        ONLINE  ONLINE       cxhisdb02                Open
ora.oc4j
      1        ONLINE  ONLINE       cxhisdb01
ora.scan1.vip
      1        ONLINE  ONLINE       cxhisdb02
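Comparing this output with the table from step 2 confirms what the restart changed. A minimal Python sketch, assuming both tables have already been turned into dictionaries keyed by (resource, slot), where the slot is the node name for local resources and the instance number for cluster resources. This snapshot format and the function names are hypothetical.

```python
# Hypothetical snapshot format: (resource, slot) -> STATE, where slot is the node
# name for local resources and the instance number for cluster resources.
Snapshot = dict[tuple[str, str], str]


def state_changes(before: Snapshot, after: Snapshot) -> dict[tuple[str, str], tuple[str, str]]:
    """Return {key: (old_state, new_state)} for every key whose state differs."""
    changes = {}
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key, "ABSENT"), after.get(key, "ABSENT")
        if old != new:
            changes[key] = (old, new)
    return changes


def still_down(after: Snapshot, expected_offline: set[str]) -> list[tuple[str, str]]:
    """Keys that are not ONLINE, ignoring resources meant to be OFFLINE (e.g. ora.gsd)."""
    return sorted(k for k, state in after.items()
                  if state != "ONLINE" and k[0] not in expected_offline)


if __name__ == "__main__":
    before = {("ora.OCR.dg", "cxhisdb01"): "OFFLINE", ("ora.hospital.db", "1"): "OFFLINE",
              ("ora.gsd", "cxhisdb01"): "OFFLINE"}
    after = {("ora.OCR.dg", "cxhisdb01"): "ONLINE", ("ora.hospital.db", "1"): "ONLINE",
             ("ora.gsd", "cxhisdb01"): "OFFLINE"}
    print(state_changes(before, after))
    print(still_down(after, {"ora.gsd"}))
```

An empty `still_down` result, together with a change list covering every resource that step 2 showed as OFFLINE on cxhisdb01, is the programmatic equivalent of the visual check above.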
4. Restart the listener
[grid@cxhisdb01 ~]$ srvctl stop listener -n cxhisdb01
[grid@cxhisdb01 ~]$ srvctl start listener -n cxhisdb01
[grid@cxhisdb01 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 27-MAY-2021 12:20:44
Copyright (c) 1991, 2013, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                27-MAY-2021 12:20:40
Uptime                    0 days 0 hr. 0 min. 4 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/cxhisdb01/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.80)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.10)(PORT=1521)))
Services Summary...
Service "hospital" has 1 instance(s).
  Instance "hospital1", status READY, has 1 handler(s) for this service...
Service "hospitalXDB" has 1 instance(s).
  Instance "hospital1", status READY, has 1 handler(s) for this service...
The command completed successfully
[grid@cxhisdb01 ~]$ cd /u01/app/grid/diag/tnslsnr/cxhisdb01/listener/trace/
[grid@cxhisdb01 trace]$ tail -f listener.log
27-MAY-2021 12:20:51 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))(SERVICE_NAME=hospital)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.171)(PORT=62045)) * establish * hospital * 0
27-MAY-2021 12:20:51 * (CONNECT_DATA=(SERVICE_NAME=hospital)(CID=(PROGRAM=e:\zjhis\电子病历PB9\emrproject.exe)(HOST=ZY-603300-02-YS)(USER=his))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.9.58)(PORT=3676)) * establish * hospital * 0
27-MAY-2021 12:20:51 * (CONNECT_DATA=(SERVICE_NAME=hospital)(CID=(PROGRAM=e:\zjhis\电子病历PB9\emrproject.exe)(HOST=ZY-603300-02-YS)(USER=his))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.9.58)(PORT=3677)) * establish * hospital * 0
27-MAY-2021 12:20:55 * (CONNECT_DATA=(SERVICE_NAME=HOSPITAL)(CID=(PROGRAM=oracle)(HOST=lis-server)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.11)(PORT=23431)) * establish * HOSPITAL * 0
27-MAY-2021 12:20:55 * (CONNECT_DATA=(SERVICE_NAME=hospital)(CID=(PROGRAM=e:\zjhis\电子病历PB9\emrproject.exe)(HOST=JZ-EK-001)(USER=his))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.3.99)(PORT=4442)) * establish * hospital * 0
27-MAY-2021 12:20:55 * (CONNECT_DATA=(SERVICE_NAME=hospital)(CID=(PROGRAM=e:\zjhis\电子病历PB9\emrproject.exe)(HOST=JZ-EK-001)(USER=his))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.3.99)(PORT=4444)) * establish * hospital * 0
27-MAY-2021 12:20:56 * (CONNECT_DATA=(SID=hospital1)(CID=(PROGRAM=配置数据库.exe)(HOST=YYJQZZJFW)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.244)(PORT=58397)) * establish * hospital1 * 0
27-MAY-2021 12:20:56 * (CONNECT_DATA=(SERVICE_NAME=HOSPITAL)(CID=(PROGRAM=oracle)(HOST=lis-server)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.11)(PORT=23432)) * establish * HOSPITAL * 0
27-MAY-2021 12:20:56 * (CONNECT_DATA=(SERVICE_NAME=hospital)(CID=(PROGRAM=e:\zjhis\电子病历PB9\emrproject.exe)(HOST=ZY-603300-02-YS)(USER=his))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.9.58)(PORT=3681)) * establish * hospital * 0
27-MAY-2021 12:20:57 * service_update * hospital1 * 0
27-MAY-2021 12:20:58 * (CONNECT_DATA=(SERVICE_NAME=HOSPITAL)(CID=(PROGRAM=oracle)(HOST=lis-server)(USER=his))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.11)(PORT=23438)) * establish * HOSPITAL * 0
27-MAY-2021 12:20:59 * (CONNECT_DATA=(SERVICE_NAME=hospital)(CID=(PROGRAM=e:\zjhis\电子病历PB9\emrproject.exe)(HOST=JZ-EK-001)(USER=his))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.3.99)(PORT=4460)) * establish * hospital * 0
27-MAY-2021 12:21:00 * service_update * hospital1 * 0
Thu May 27 12:21:01 2021
27-MAY-2021 12:21:01 * (CONNECT_DATA=(SID=hospital1)(CID=(PROGRAM=配置数据库.exe)(HOST=YYJQZZJFW)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.244)(PORT=58398)) * establish * hospital1 * 0
27-MAY-2021 12:21:03 * service_update * hospital1 * 0
27-MAY-2021 12:21:03 * (CONNECT_DATA=(SERVICE_NAME=HOSPITAL)(CID=(PROGRAM=oracle)(HOST=lis-server)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.1.11)(PORT=23444)) * establish * HOSPITAL * 0
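The listener.log tail shows clients reconnecting after the restart. To quantify this rather than eyeball it, the following Python sketch counts successful `establish` records per client IP. It assumes the record layout shown above, where the last field is the return code and 0 means success; records with a non-zero code (a TNS error number) are skipped. The function name is hypothetical.

```python
import re
from collections import Counter

# A listener.log connect record ends like:
# ... * (ADDRESS=(PROTOCOL=tcp)(HOST=192.2.9.58)(PORT=3676)) * establish * hospital * 0
# The final field is the return code; 0 means the connection was handed off.
RECORD_RE = re.compile(
    r"\(ADDRESS=\(PROTOCOL=tcp\)\(HOST=([^)]+)\)\(PORT=\d+\)\) \* establish \* (\S+) \* (\d+)"
)


def connections_by_client(log_text: str) -> Counter:
    """Count successful 'establish' events per client IP address."""
    counts = Counter()
    for host, _service, rc in RECORD_RE.findall(log_text):
        if rc == "0":
            counts[host] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "27-MAY-2021 12:20:51 * (CONNECT_DATA=(SERVICE_NAME=hospital)) * "
        "(ADDRESS=(PROTOCOL=tcp)(HOST=192.2.9.58)(PORT=3676)) * establish * hospital * 0\n"
        "27-MAY-2021 12:20:57 * service_update * hospital1 * 0\n"
    )
    print(connections_by_client(sample).most_common())
```

Lines such as `service_update` carry no client address and are ignored by the pattern, so only real client connections are counted.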
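Summary and follow-up recommendations
1. Problem summary
Node 1 of the HIS database lost access to the OCR quorum disks, which caused CRSD to fail and took several cluster resources on node 1 OFFLINE. Compared with earlier inspections, the listener was also missing the IP address 192.2.1.80, which is why the monitoring platform reported connection failures to node 1. Because node 2 and the other disk groups were unaffected, the database did not go down.
2. Actions taken
- Manually mounted the OCR disk group on node 1 of the HIS database
- Started the CRS resources
- Restarted the listener on node 1
3. Follow-up recommendations
This failure was caused by the cluster losing access to the OCR disk group. We recommend asking the storage engineer to examine how the storage was behaving around the time of the failure, and carrying out regular storage inspections and status monitoring.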
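As one possible piece of that monitoring, the ASM alert log can be scanned for the early warnings seen in this incident, which appeared roughly an hour before CRSD failed. The Python sketch below is an illustration under stated assumptions: the watch list is hypothetical and should be tuned locally, it expects the 11.2 timestamp lines shown above (such as `Wed May 26 12:19:57 2021`), and locating the alert log file and scheduling the scan are left out.

```python
import re
from datetime import datetime

# ASM alert logs print a timestamp line such as "Wed May 26 12:19:57 2021"
# before each group of messages.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
TS_RE = re.compile(r"^\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")

# HYPOTHETICAL watch list based on the messages in this incident; tune locally.
PATTERNS = (
    "for write IO to PST disk",
    "no read quorum",
    "dismounting (not clean)",
)


def scan_alert_log(lines):
    """Yield (timestamp, message) for each line matching a watched pattern.

    The timestamp is the most recent timestamp line seen, or None if the
    match appears before any timestamp.
    """
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if TS_RE.match(line):
            current = datetime.strptime(line, TS_FORMAT)
        elif any(p in line for p in PATTERNS):
            yield current, line.strip()


if __name__ == "__main__":
    sample = [
        "Wed May 26 12:19:57 2021",
        "WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.",
        "NOTE: checking PST: grp = 3",
        "ERROR: no read quorum in group: required 3, found 1 disks",
    ]
    for ts, msg in scan_alert_log(sample):
        print(ts, msg)
```

Because `scan_alert_log` is a generator over any iterable of lines, it can read an open file handle directly without loading the whole log into memory.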
A case study in handling an Oracle voting disk failure