概述
某医院客户于2021年11月5日上午9时左右出现HIS数据库连接异常的情况,在此情况下通过远程的方式接入数据库进行故障排除,快速解决了问题使得业务恢复正常使用。
问题及相关日志分析
1. 问题现状
前台业务反映HIS数据库时而连得上时而连不上,连不上时候报错ORA-12516,报错内容截图如下:
由于业务通过节点1的vip(192.2.1.10)连接数据库,测试节点1的vip连接,结果如下:
[oracle@cxhisdb01 ~]$ sqlplus a/b@192.2.1.10:1521/hospital
SQL*Plus: Release 11.2.0.4.0 Production on Fri Nov 5 09:45:15 2021
Copyright (c) 1982, 2013, Oracle. All rights reserved.
ERROR: ORA-12516: TNS:listener could not find available handler with matching protocol stack |
2. 检查数据库alert日志
ORA-00020: maximum number of processes (4500) exceeded
ORA-20 errors will not be written to the alert log for the next minute. Please look at trace files to see all the ORA-20 errors. Fri Nov 05 09:35:48 2021 Thread 1 advanced to log sequence 191580 (LGWR switch) Current log# 1 seq# 191580 mem# 0: +SSD/hospital/onlinelog/redo01.log Fri Nov 05 09:35:48 2021 LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_4 Fri Nov 05 09:35:48 2021 LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_3 Fri Nov 05 09:35:48 2021 LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_2 Fri Nov 05 09:35:48 2021 Archived Log entry 904316 added for thread 1 sequence 191579 ID 0x431aef8a dest 1: Fri Nov 05 09:37:30 2021 ORA-00020: maximum number of processes (4500) exceeded ORA-20 errors will not be written to the alert log for the next minute. Please look at trace files to see all the ORA-20 errors. |
alert中出现大量的ORA-00020错误告警,数据库连接数达到4500上限。
3. 检查数据库参数和连接
SQL> show parameter process;
NAME TYPE VALUE ———————————— ———– —————————— aq_tm_processes integer 1 cell_offload_processing boolean TRUE db_writer_processes integer 10 gcs_server_processes integer 4 global_txn_processes integer 1 job_queue_processes integer 1000 log_archive_max_processes integer 4 processes integer 4500 processor_group_name string
SQL> select inst_id,count(*) from gv$session group by inst_id;
INST_ID COUNT(*) ———- ———- 1 4496 2 715 |
查询得知,1节点连接数达到4496,已经达到连接数4500的上限,因此前端业务出现连接不上的情况。
4. 定位异常连接
SQL> select program,machine,count(*)
2 from gv$session group by program,machine,sql_id order by 3;
…
PROGRAM MACHINE COUNT(*) ——————————– ————————– ———- w3wp.exe WORKGROUP\SXZZ_SERVER_CX 10 Medinfo.Quartz.DbJob.exe WORKGROUP\QUANGUO-NB 10 yszj.exe WORKGROUP\MZ-3F-EBHYS7 10 yszj.exe WORKGROUP\MZ-2F-NK2YS4 10 yszj.exe WORKGROUP\MZ1F-EKYS7 10 emrproject.exe WORKGROUP\ZY1-2X-YS2 10
PROGRAM MACHINE COUNT(*) ——————————– ————————– ———- yszj.exe WORKGROUP\MZ-2F-WKYS9 11 yszj.exe WORKGROUP\MZ-2F-FKYS13 11 emrjz.exe WORKGROUP\JZ-QJS-HS1 11 yzxt.exe WORKGROUP\ZY-604200-01-HS 11 yszj.exe WORKGROUP\MZ-2F-GKYS3 12 yszj.exe WORKGROUP\MZ-3F-KQYS6 12 yzxt.exe WORKGROUP\ZY-609400-02-HS 12 HisServer.exe WORKGROUP\YYGHSERV 12 yszj.exe WORKGROUP\GRK-GBMZ-01 12 oracle@lis-server (TNS V1-V3) lis-server 12 yszj.exe WORKGROUP\MZ-2F-WKYS10 13
PROGRAM MACHINE COUNT(*) ——————————– ————————– ———- yszj.exe WORKGROUP\MZ-1F-EKYS6 13 yszj.exe WORKGROUP\MZ-NKJ-01 13 yzxt.exe WORKGROUP\ZY-ZHNK1-HS-04 13 yszj.exe WORKGROUP\MZ-3F-YKYS2 13 yszj.exe WORKGROUP\M3-YK-YS1 14 yszj.exe WORKGROUP\MZ-2F-FKYS-01 14 yszj.exe WORKGROUP\MZ-3F-YKYS4 14 yszj.exe WORKGROUP\MZ-2F-FKYS10 14 yszj.exe WORKGROUP\MZ-3F-TJWKYS1 15 yszj.exe WORKGROUP\MZ-3F-WKYS5 15 MAF.exe WORKGROUP\QUYUSC1 16
PROGRAM MACHINE COUNT(*) ——————————– ————————– ———- yszj.exe WORKGROUP\MZ-3F-WKYS3 16 yzxt.exe WORKGROUP\HIS-CXYY 16 yszj.exe WORKGROUP\MZ-2F-NK2YS05 16 yszj.exe WORKGROUP\MZ-2F-WKYS5 17 yszj.exe WORKGROUP\MZ-3F-WKYS4 17 yszj.exe WORKGROUP\MZ-3F-KQKYS2 17 w3wp.exe WORKGROUP\WX_DDJY 17 MediII.Adapter.WinForm.Scanner_V WORKGROUP\YYQZ 18 yszj.exe WORKGROUP\MZ-1F-EKYS4 18 yszj.exe WORKGROUP\MZ-2F-NK2YS6 18 emrproject.exe WORKGROUP\3020-CXYY 19
PROGRAM MACHINE COUNT(*) ——————————– ————————– ———- yszj.exe WORKGROUP\MZ-2F-GK24 19 yszj.exe WORKGROUP\ZY-609400-02-HS 19 yszj.exe WORKGROUP\MZ-GK-25 20 emrproject.exe WORKGROUP\HIS-PC 20 yszj.exe WORKGROUP\ZY-609400-02-YS 21 yszj.exe WORKGROUP\MZ-GUKE-YS04 21 yszj.exe WORKGROUP\MZ-2F-FKYS9 22 yszj.exe MSHOME\JZ-WKYS2 22 yszj.exe WORKGROUP\MZ-2F-NK2YS-01 22 w3wp.exe WORKGROUP\YINYIQIANZHI1 23 yszj.exe WORKGROUP\MZ-2F-WKYS4 23
PROGRAM MACHINE COUNT(*) ——————————– ————————– ———- yszj.exe WORKGROUP\MZ-2F-GKYS2 23 yszj.exe WORKGROUP\MZ-2F-FKYS6 24 yszj.exe WORKGROUP\MZ-2F-WKYS6 25 yszj.exe WORKGROUP\MZ-XEK7 25 yszj.exe WORKGROUP\MZ-3F-EBYHYS2 25 yszj.exe WORKGROUP\MZ-1F-EKYS8 26 yszj.exe WORKGROUP\MZ-3F-ZYYS-8 26 yszj.exe WORKGROUP\MZ-2F-FKYS3 27 yszj.exe WORKGROUP\MZ-2F-WKYS2 27 yszj.exe WORKGROUP\MZ-2F-WKYS3 28 yszj.exe WORKGROUP\MZ-2F-NK2YS2 28
PROGRAM MACHINE COUNT(*) ——————————– ————————– ———- yszj.exe WORKGROUP\MZ-2F-NK2-YS09 29 oracle@cxemrdb01 (TNS V1-V3) cxemrdb01 30 yszj.exe WORKGROUP\MZ-2F-NK2-9 30 yszj.exe WORKGROUP\MZ-3F-ZYYS2 30 yszj.exe WORKGROUP\MZ-3F-EBHYS6 30 yszj.exe WORKGROUP\MZ-2F-GKYS21 31 yszj.exe WORKGROUP\MZ-WAIKE-YS12 33 yzxt.exe WORKGROUP\ZY-7D-HS3 33 yszj.exe WORKGROUP\MZ-2F-FKYS5 34 yszj.exe WORKGROUP\MZ-2F-NK1YS3 34 yszj.exe WORKGROUP\DESKTOP-VI7O6SC 36
PROGRAM MACHINE COUNT(*) ——————————– ————————– ———- yszj.exe WORKGROUP\MZ-2F-WKYS1 38 yszj.exe WORKGROUP\MZ-NK2-07 39 yszj.exe WORKGROUP\MZ-2F-FKYS11 42 yszj.exe WORKGROUP\MZ-2F-NK1YS5 43 yszj.exe WORKGROUP\JZ-EK-001 45 yszj.exe WORKGROUP\LHMZ_YS2 46 yszj.exe WORKGROUP\MZ-2F-NK1YS2 50 w3wp.exe WORKGROUP\HIS-YIYAO 50 yszj.exe WORKGROUP\MZ-2F-WKYS11 56 yszj.exe WORKGROUP\LHMZ_YS1 56 yszj.exe WORKGROUP\MZ-2F-NK1YS14 58
PROGRAM MACHINE COUNT(*) ——————————– ————————– ———- yszj.exe WORKGROUP\MZ-2F-NK1YS13 62 yszj.exe WORKGROUP\JZ-NK-2 64 yszj.exe WORKGROUP\MZ-2F-WKYS007 66 yszj.exe WORKGROUP\MZ-2F-FKYS4 66 yszj.exe WORKGROUP\MZ-2F-NKYS7 69 yszj.exe WORKGROUP\MZ-2F-NK1YS4 103 yszj.exe WORKGROUP\MZ-PAOZHEN-01 108 oracle@lis-server (TNS V1-V3) lis-server 136 oracle@cxemrdb01 (TNS V1-V3) cxemrdb01 395 |
查询得知,yszj.exe应用程序对HIS数据库建立了较高数量的连接,其中主机名为“MZ-2F”开头的客户端尤为明显。
三、问题处理步骤
1. kill session
查询yszj.exe应用程序的进程ID
SQL> select b.spid 2 from v$session a,v$process b 3 where a.paddr=b.addr 4 and a.program=’yszj.exe’; 5 and a.machine like ‘MZ-2F%’;
SPID ———————— … 70593 72017 76189 77305 78409 78481 78636 78676 79541 80183 80816 … 杀会话 … [oracle@cxhisdb01 ~]$ kill -9 70593 [oracle@cxhisdb01 ~]$ kill -9 72017 [oracle@cxhisdb01 ~]$ kill -9 76189 [oracle@cxhisdb01 ~]$ kill -9 77305 [oracle@cxhisdb01 ~]$ kill -9 78409 [oracle@cxhisdb01 ~]$ kill -9 78481 [oracle@cxhisdb01 ~]$ kill -9 78636 [oracle@cxhisdb01 ~]$ kill -9 78676 [oracle@cxhisdb01 ~]$ kill -9 79541 [oracle@cxhisdb01 ~]$ kill -9 80183 [oracle@cxhisdb01 ~]$ kill -9 80816 … |
执行kill session后,效果不明显,因为会话查杀后,又有新的会话连接进来。
2. 客户沟通
通过与客户沟通了解到,YZSJ.EXE程序新上了检查报告互认功能,会导致程序连接不释放问题,导致程序到数据库连接数不断增加,从而数据库连接数达到上限。
3. 释放进程
客户安排前台业务程序主机重启,释放连接进程,业务端主机重启后,检查连接情况如下:
SQL> select inst_id,count(*) from gv$session group by inst_id;
INST_ID COUNT(*) ———- ———- 1 3427 2 681
SQL> select program,machine,count(*) 2 from gv$session where program=’yszj.exe’ group by program,machine order by 3;
PROGRAM MACHINE COUNT(*) ————— —————————— ———- yszj.exe WORKGROUP\XZ-YGK-07 1 yszj.exe WORKGROUP\YJ-NKJ-MZFT 1 yszj.exe WORKGROUP\MZ-3F-WKYS4 2 yszj.exe WORKGROUP\RLS 2 yszj.exe WORKGROUP\MZ-3F-KQKYS2 2 yszj.exe WORKGROUP\MZ-2F-GKYS3 2 yszj.exe WORKGROUP\MZ-FWT-01 2 yszj.exe WORKGROUP\FDHJ54365 2 yszj.exe WORKGROUP\MZ-2F-WKYS2 2 yszj.exe WORKGROUP\MZ-3F-KQK7 2 yszj.exe WORKGROUP\MZ-2F-WKYS9 2
PROGRAM MACHINE COUNT(*) ————— —————————— ———- yszj.exe WORKGROUP\MZ-2F-NK1-YS6 2 yszj.exe WORKGROUP\MZ-2F-FKYS9 2 yszj.exe WORKGROUP\MZ-2F-GKYS2 2 yszj.exe WORKGROUP\MZ-2F-WKYS6 2 yszj.exe WORKGROUP\CXYY-YZ2 2 yszj.exe WORKGROUP\MZ-2F-FKYS13 2 yszj.exe WORKGROUP\MZ-KQK-YS-1 2 yszj.exe WORKGROUP\MZ-2F-NK1YS13 2 yszj.exe WORKGROUP\ZY-XYJHZX-01-YS 2 yszj.exe WORKGROUP\MZ-2F-WKYS8 2 yszj.exe WORKGROUP\MZ-2F-FKYS4 2
PROGRAM MACHINE COUNT(*) ————— —————————— ———- yszj.exe WORKGROUP\MZ-2F-NK1YS4 2 yszj.exe WORKGROUP\MZ-2F-WKYS3 2 yszj.exe WORKGROUP\MZ-2F-NKYS7 2 yszj.exe WORKGROUP\MZ-3F-WKYS5 2 yszj.exe WORKGROUP\MZ-2F-WKYS007 2 yszj.exe WORKGROUP\GAMZ-01 2 yszj.exe WORKGROUP\MZ-WJ3 2 yszj.exe WORKGROUP\MZ-3F-PFYS3 2 yszj.exe WORKGROUP\ZY-609400-03-HS 2 yszj.exe WORKGROUP\MZ-2F-WKYS1 2 yszj.exe WORKGROUP\ZY1-4X-YS5 2
PROGRAM MACHINE COUNT(*) ————— —————————— ———- yszj.exe WORKGROUP\MZ-2F-FKYS7 2 yszj.exe WORKGROUP\MZ-3F-TJWKYS1 2 yszj.exe WORKGROUP\ZY-601700-05-YS 2 yszj.exe WORKGROUP\MZ-PICC-02 2 yszj.exe WORKGROUP\JZ-EK-001 2 yszj.exe WORKGROUP\FSK-SSZX-02 2 yszj.exe WORKGROUP\ZY1-5D-YS4 2 yszj.exe WORKGROUP\MZ-3F-KFYJT 2 yszj.exe WORKGROUP\ZY-XSR-YS5 2 yszj.exe WORKGROUP\MZ-ZY-YS9 2 yszj.exe WORKGROUP\MZ-3F-EBYHYS2 2
PROGRAM MACHINE COUNT(*) ————— —————————— ———- yszj.exe WORKGROUP\MZ-2F-FKYS15 2 yszj.exe WORKGROUP\MZ-2F-FK12 2 yszj.exe WORKGROUP\MZ-2F-NK1YS5 2 yszj.exe WORKGROUP\MZ-WK-SKZK 2 yszj.exe WORKGROUP\MZ-2F-NK1YS-17 2 yszj.exe WORKGROUP\MZ-2F-FKYS10 2 yszj.exe WORKGROUP\MZ-2F-GK24 2 yszj.exe WORKGROUP\MZ-2F-NK2YS10 2 yszj.exe WORKGROUP\MZ-KOUQIANG-Y11 2 yszj.exe WORKGROUP\JZ-EK-YS-06 2 yszj.exe WORKGROUP\MZ-NKJ-01 2
PROGRAM MACHINE COUNT(*) ————— —————————— ———- yszj.exe WORKGROUP\MZ-2F-NK2-9 2 yszj.exe WORKGROUP\MZ-3F-ZYYS2 2 yszj.exe WORKGROUP\MZ-3F-EBHYS6 2 yszj.exe MSHOME\JZ-QJ-04 2 yszj.exe WORKGROUP\MZ-3F-YKYS2 2 yszj.exe WORKGROUP\MZ-3F-ZY-YS3- 2 yszj.exe WORKGROUP\MZ-2F-NK1YS2 2 yszj.exe WORKGROUP\MZ-KOUQIANG-Y13 2 yszj.exe WORKGROUP\MZ-3F-ZYYS10 2 yszj.exe WORKGROUP\JZ-EBH-01 2 yszj.exe WORKGROUP\JZ-GK-YS1 2
PROGRAM MACHINE COUNT(*) ————— —————————— ———- yszj.exe WORKGROUP\MZ-NK1-YS-017 2 yszj.exe WORKGROUP\MZ1F-EKYS7 2 yszj.exe WORKGROUP\JZ-NK-2 2 yszj.exe WORKGROUP\MZ-2F-FKYS11 2 yszj.exe WORKGROUP\MZ-2F-FKYS6 2 yszj.exe WORKGROUP\MZ-2F-FKYS-01 2 yszj.exe WORKGROUP\MZ-2F-NK1YS3 2 yszj.exe WORKGROUP\ZY2-11F-YS2 2 yszj.exe WORKGROUP\MZ-WAIKE-YS12 2 yszj.exe WORKGROUP\MZ-2F-NK2-YS09 2 yszj.exe WORKGROUP\MZ-FWT-03 2
PROGRAM MACHINE COUNT(*) ————— —————————— ———- yszj.exe WORKGROUP\MZ-2F-NK2YS6 2 yszj.exe WORKGROUP\MZ-1F-EKYS6 2 yszj.exe WORKGROUP\JZ_XIONGTONG 2 yszj.exe WORKGROUP\MZ-3F-KQYS6 2 yszj.exe WORKGROUP\MZ-3F-YK13 2 yszj.exe WORKGROUP\ZY-TXBQ-YYK 2 yszj.exe WORKGROUP\LXZL-205 2 yszj.exe WORKGROUP\MZ-NK-11-YS 2 yszj.exe WORKGROUP\MZ-3F-PFK-2 2 yszj.exe WORKGROUP\MZ-NK-YS8 2 yszj.exe WORKGROUP\MZ-2F-NK2YS2 2
PROGRAM MACHINE COUNT(*) ————— —————————— ———- yszj.exe WORKGROUP\MZ-3F-KQKYS4 2 yszj.exe WORKGROUP\MZ-2F-NK2YS-01 2 yszj.exe WORKGROUP\MZ-2F-NK1YS14 2 yszj.exe WORKGROUP\MZ-3F-ZYYS1 2 yszj.exe WORKGROUP\FRMZ-YS-03 2 yszj.exe WORKGROUP\MZ-NKZJ-01 2 yszj.exe WORKGROUP\MZ-2F-NK2YS3 2 yszj.exe WORKGROUP\MZ-2F-NK2YS05 2 yszj.exe WORKGROUP\MZ-XEK7 2 yszj.exe WORKGROUP\MZ-ZJNK-YS8-A 2 yszj.exe WORKGROUP\MZ-ZHFW-05 2
PROGRAM MACHINE COUNT(*) ————— —————————— ———- yszj.exe WORKGROUP\MZ-2F-FKYS3 2 yszj.exe WORKGROUP\ZY2-6F-ZLNKYS01 2 yszj.exe WORKGROUP\MZ-FWT-04 2 yszj.exe WORKGROUP\FRMZ_YS2 2 yszj.exe WORKGROUP\MZ-3F-WKYS3 2 yszj.exe WORKGROUP\JZ-QJS-YS3 2 yszj.exe WORKGROUP\ZY-703700-05-YS 2 yszj.exe WORKGROUP\TSZX-02 3 yszj.exe WORKGROUP\MZ-3F-KQXFS1 3 yszj.exe WORKGROUP\DESKTOP-VI7O6SC 3 yszj.exe WORKGROUP\ZY-600400-06-YS 3
PROGRAM MACHINE COUNT(*) ————— —————————— ———- yszj.exe WORKGROUP\FSK-DPS-08 4 yszj.exe WORKGROUP\MZ-3F-ZYYS5 4 yszj.exe MSHOME\JZ-WKYS01 4 yszj.exe WORKGROUP\SF-YWS-YS1 5 yszj.exe WORKGROUP\MZ-3F-ZYYS4 5 yszj.exe WORKGROUP\ZY1-1X-YS 5 yszj.exe WorkGroup\USER-2021SXCVV 5 yszj.exe WORKGROUP\MZ2-NJZX-001 5 yszj.exe WORKGROUP\ZY-GK3-HS4 6 yszj.exe WORKGROUP\MZ-GB-CDMZ 6 yszj.exe WORKGROUP\MZ-2F-NK2YS11 8
PROGRAM MACHINE COUNT(*) ————— —————————— ———- yszj.exe WORKGROUP\MZ-3L-ZYNK-007 9 yszj.exe WORKGROUP\MZ-3F-ZJK1 11 yszj.exe WORKGROUP\MZ-3F-KQKYS5 14 yszj.exe WORKGROUP\HIS-PC 15 yszj.exe WORKGROUP\MZ-2F-WKYS10 16 yszj.exe WORKGROUP\GRK-GBMZ-01 16 yszj.exe WORKGROUP\MZ-PIFU-YS3 18 yszj.exe WORKGROUP\ZY-609400-02-YS 21 yszj.exe WORKGROUP\MZ-3F-KQKYS3 22 yszj.exe WORKGROUP\MZ-PAOZHEN-01 23 yszj.exe WORKGROUP\MZ-GUKE-YS04 23
PROGRAM MACHINE COUNT(*) ————— —————————— ———- yszj.exe WORKGROUP\MZ-FR-EK-1 24 yszj.exe WORKGROUP\ZY-609400-02-HS 25 yszj.exe WORKGROUP\MZ-3F-PFKYS24 25 yszj.exe WORKGROUP\MZ-2F-WKYS5 34 yszj.exe WORKGROUP\MZ-3F-YKYS4 34 yszj.exe WORKGROUP\MZ-2F-WKYS4 35 yszj.exe WORKGROUP\MZ-1F-EKYS8 35 yszj.exe MSHOME\JZ-WKYS2 35 yszj.exe WORKGROUP\MZ-3F-ZYYS-8 37 yszj.exe WORKGROUP\MZ-1F-EKYS4 45 yszj.exe WORKGROUP\LHMZ_YS2 64
PROGRAM MACHINE COUNT(*) ————— —————————— ———- yszj.exe WORKGROUP\LHMZ_YS1 78 yszj.exe WORKGROUP\MZ-2F-WKYS11 87
145 rows selected. |
通过查询得知,前端业务主机重启后,释放了一部分进程,节点1连接数已经下降至3400+,前端业务连接正常,未发现ORA-12516错误。
总结与后续处理建议
1. 问题总结
YZSJ.EXE应用程序通过192.2.1.10(节点1的VIP)连接HIS数据库,由于YZSJ.EXE程序新上了检查报告互认功能,有程序连接不释放问题,导致程序到数据库连接数不断增加,数据库连接数(4500)达到上限,从而导致前端业务连接数据库失败,出现ORA-12516:TNS:监听程序找不到符合协议堆栈要求的可用处理程序。
2. 处理操作
- 查杀会话,释放进程资源 ==>效果不明显,会话杀掉后程序会自动重连
- 重启业务前端主机,释放资源 ==>暂时缓解,数据库连接数已经下降至3400+
- 修复exe应用程序 ==>彻底解决,预计中午业务研发部门对yszj.exe做修复
3. 后续建议
此次故障发生的根本原因是由于YZSJ.EXE程序更新了检查报告互认功能导致的程序连接不释放导致数据库连接数达到上限,从而导致前端业务无法连接数据库,针对此次发生的故障,建议点如下:
- 业务测试,业务更新上线前,在开发库或测试环境做充分的业务功能性测试和性能测试;
- 调整数据库连接数参数,当前数据库的process参数为4500,可根据实际需要适当调整增大数据库的process参数,调整此参数需要重启数据库生效,因此需要安排停机时间。