Skip to content

Database - 11. page

问题时间点,数据库主要为enq: TM – contention等待事件;

节点1:

节点2:

TOP SQL语句的分析

数据库有一条SQL执行时间长,经沟通确认为OGG中添加表附加日志的操作产生的:BEGIN DBMS_CAPTURE_ADM.PREPARE_TABLE_INSTANTIATION(table_name => ‘”CBS”.”NEXF_APPROPRIATE”‘, supplemental_logging => ‘none’); END;

 ASH报告中异常发生的时间段

从ASH性能报告中,可以看到enq: TM – contention等待事件发生在17:20-17:45分左右:

 等待事件对应的对象信息

查看ASH报告,可以发现异常的TM等待对应的表是:CBS.NEXF_APPROPRIATE

对应的SQL语句是:INSERT INTO tran_jrnl VALUES(:id, :fiscal_date, :request_id, :respond_id, :tran_date, :tran_code, :channel_id, :teller_id, :branch_id, :resp_code, :resp_mesg, :status, :original_jrnl_id, :actual_teller_id, :service_name)

CBS.NEXF_APPROPRIATE表与 tran_jrnl 表存在主外键关联关系。

阻塞问题的进一步分析

 enq: TM – contention阻塞原理的分析

通过前面分析,可以发现从ASH报告中找出的TOP SQL对象之间存在主外键约束关系,具体为:

  • CBS用户的NEXF_APPROPRIATE表的REG_JRNL列,使用外键约束,对应的是tran_jrnl表的主键ID列。
  • 外键约束列上未创建索引。

DBMS_CAPTURE_ADM.PREPARE_TABLE_INSTANTIATION与TM等待的关系

OGG执行的操作,在数据库中对应的SQL是BEGIN DBMS_CAPTURE_ADM.PREPARE_TABLE_INSTANTIATION(table_name => ‘”CBS”.”NEXF_APPROPRIATE”‘, supplemental_logging => ‘none’); END;

ORACLE官方文档中(High ‘enq: TM – contention’ Wait Event Causing Session Hangs in an Oracle Streams Environment (Doc ID 740728.1)),对于OGG附加日志时执行的语句:BEGIN DBMS_CAPTURE_ADM.PREPARE_TABLE_INSTANTIATION(table_name => ‘”CBS”.”NEXF_APPROPRIATE”‘, supplemental_logging => ‘none’); END; 的说明,此语句语句本身会引起TM锁;

总结

通过对数据库性能数据的分析,可以发现问题时段数据库出现严重的enq: TM – contention锁等待;结合ASH性能报告中可以对应找出出现异常等待事件的时对象为CBS.NEXF_APPROPRIATE表与 tran_jrnl 表,这两张表存在主外键关联关系;

因此结合ORACLE官方文档中(High ‘enq: TM – contention’ Wait Event Causing Session Hangs in an Oracle Streams Environment (Doc ID 740728.1)),对于OGG附加日志时执行的语句:BEGIN DBMS_CAPTURE_ADM.PREPARE_TABLE_INSTANTIATION(table_name => ‘”CBS”.”NEXF_APPROPRIATE”‘, supplemental_logging => ‘none’); END; 的说明,此语句语句本身会引起TM锁;同时NEXF_APPROPRIATE表与 tran_jrnl 表存在主外键关联关系且外建列上无索引,tran_jrnl 表做为业务流水表会存在较多的INSERT操作,加剧了问题发生时的TM锁争用。

某行业务Oracle数据库卡顿简要分析

概述

 

某客户出现应用程序无法连接,前台业务无法办理等情况,登入后台发现数据库异常关闭

问题及相关日志分析

1. 数据库告警日志分析分析

通过对数据库告警日志查看,显示数据库1节点和2节点实例分别在14:33:59和14:40:41相继关闭。

1节点:

Thu Jan 28 14:40:41 2022

LMS1 (ospid: 80131) received an instance eviction notification from instance 2 [2]

Thu Jan 28 14:40:42 2022

Instance Critical Process (pid: 14, ospid: 80113, LMON) died unexpectedly

PMON (ospid: 80061): terminating the instance due to error 481

Thu Jan 28 14:40:42 2022

System state dump requested by (instance=1, osid=80061 (PMON)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_diag_22810_20220128144042.trc

Thu Jan 28 14:40:44 2022

License high water mark = 2257

Thu Jan 28 14:40:47 2022

Instance terminated by PMON, pid = 80061

Thu Jan 28 14:40:48 2022

USER (ospid: 12966): terminating the instance

 

2节点:

Thu Jan 28 14:33:59 2022

Instance Critical Process (pid: 16, ospid: 91944, LMD1) died unexpectedly

PMON (ospid: 91885): terminating the instance due to error 482

Thu Jan 28 14:33:59 2022

ORA-1092 : opitsk aborting process

Thu Jan 28 14:33:59 2022

System state dump requested by (instance=2, osid=91885 (PMON)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_diag_96171_20220128143359.trc

Thu Jan 28 14:34:01 2022

License high water mark = 1949

Thu Jan 28 14:34:04 2022

Instance terminated by PMON, pid = 91885

查看更多日志,发现在关闭之前,出现了较多的ORA-04031错误。

1节点:

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_8372.trc  (incident=656535) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^2051″,”kglseshtTable”)

Thu Jan 28 14:38:23 2022

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_8392.trc  (incident=656543) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^2147″,”kglseshtTable”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Thu Jan 28 14:38:23 2022

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_p033_97352.trc  (incident=652943) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^3061″,”kglseshtTable”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Thu Jan 28 14:38:23 2022

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_p02w_90573.trc  (incident=651431) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^3538″,”kglseshtTable”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_p033_97352.trc  (incident=652944) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate  bytes of shared memory (“”,””,””,””)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_p02w_90573.trc  (incident=651432) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate  bytes of shared memory (“”,””,””,””)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmon_80113.trc  (incident=668994) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmon_80113.trc  (incident=668995) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmon_80113.trc  (incident=669000) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmon_80113.trc  (incident=669001) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Thu Jan 28 14:38:31 2022

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_p032_97348.trc  (incident=652071) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^1124″,”kglseshtTable”)

 

2节点:

ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_lmd0_91940.trc  (incident=521915) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_lmd1_91944.trc  (incident=521926) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_lmd1_91944.trc  (incident=521927) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_lmd1_91944.trc  (incident=541065) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Thu Jan 28 14:31:40 2022

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_ora_77682.trc  (incident=524416) (PDBNAME=YDHL):

ORA-04031: 脦脼路篓路脰脜盲 8504 脳脰陆脷碌脛鹿虏脧铆脛脷麓忙 (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_lmd1_91944.trc  (incident=541066) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_rmv1_92030.trc  (incident=522082) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Thu Jan 28 14:31:48 2022

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_lck1_91968.trc  (incident=521960) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Thu Jan 28 14:31:49 2022

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_ora_90698.trc  (incident=524824) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^2953″,”kglseshtTable”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

2. AWR报告信息分析

 

通过分析得知,数据库服务器物理内存256GB,SGA分配了40G,由于AMM,shared pool已经达到30GB,Buffer Cache只有9G,该数据库内存分配存在严重问题。

 

总结与后续处理建议

1. 问题总结

移动护理数据库内存参数配置存在较严重的问题,物理内存256G,SGA分配40G,shared pool、buffer cache均为自动配置,没有手工分配,因此,需要对内存参数进行调整,增加SGA参数值,手工固定shared pool和buffer cache等值。

3. 处理操作

由于部分内存参数为静态参数,因此修改参数就要停机。我方于2022年1月29日晚上22:00:00开始对移动护理进行内存参数优化的操作,为保证对业务的影响达到最小,此次采用轮流优化和重启1节点和2节点实例的方式。

检查操作系统大页配置:

1节点:

[oracle@node0 ~]$ cat /proc/meminfo |grep HugePage

HugePages_Total:   67200

HugePages_Free:     1378

HugePages_Rsvd:       93

HugePages_Surp:        0

2节点:

[oracle@node1 ~]$ cat /proc/meminfo |grep HugePage

HugePages_Total:   67200

HugePages_Free:     1378

HugePages_Rsvd:       93

HugePages_Surp:        0

 

大页在此之前已经有过预配置,约131G,此次计划配置内存参数为:

SGA 128GB

Shared_pool 20GB

Buffer_Cache 40GB

1节点:

alter system set sga_max_size=128G scope=spfile sid=’cdb1′;

alter system set sga_target=128G scope=spfile sid=’cdb1′;

alter system set shared_pool_size=20G scope=spfile sid=’cdb1′;

alter system set db_cache_size=40G scope=spfile sid=’cdb1′;

 

alter system switch logfile;

alter system checkpoint;

shutdown immediate;

startup;

 

alter pluggable database HEALTH open read write;

alter pluggable database YDHL open read write;

alter pluggable database MANDALA open read write;

alter pluggable database TUTEORCL open read write;

alter pluggable database DOCARE open read write;

alter pluggable database DOCARE5 open read write;

alter pluggable database BAGL open read write;

alter pluggable database EMPI open read write;

alter pluggable database LIS_OLD open read write;

alter pluggable database NCSYS open read write;

alter pluggable database ICU open read write;

alter pluggable database ORCL open read write;

 

2节点:

1节点:

alter system set sga_max_size=128G scope=spfile sid=’cdb2′;

alter system set sga_target=128G scope=spfile sid=’cdb2′;

alter system set shared_pool_size=20G scope=spfile sid=’cdb2′;

alter system set db_cache_size=40G scope=spfile sid=’cdb2′;

 

alter system switch logfile;

alter system checkpoint;

shutdown immediate;

startup;

 

alter pluggable database HEALTH open read write;

alter pluggable database YDHL open read write;

alter pluggable database MANDALA open read write;

alter pluggable database TUTEORCL open read write;

alter pluggable database DOCARE open read write;

alter pluggable database DOCARE5 open read write;

alter pluggable database BAGL open read write;

alter pluggable database EMPI open read write;

alter pluggable database LIS_OLD open read write;

alter pluggable database NCSYS open read write;

alter pluggable database ICU open read write;

alter pluggable database ORCL open read write;

在启动过程中,观察数据库告警日志,确认大页已经生效:

**********************************************************************

Fri Jan 29 22:04:03 2022

Dump of system resources acquired for SHARED GLOBAL AREA (SGA)

 

Fri Jan 29 22:04:03 2022

Per process system memlock (soft) limit = 227G

Fri Jan 29 22:04:03 2022

Expected per process system memlock (soft) limit to lock

SHARED GLOBAL AREA (SGA) into memory: 128G

Fri Jan 29 22:04:03 2022

Available system pagesizes:

4K, 2048K

Fri Jan 29 22:04:03 2022

Supported system pagesize(s):

Fri Jan 29 22:04:03 2022

PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)

Fri Jan 29 22:04:03 2022

2048K            66823           65538           65538        NONE

Fri Jan 29 22:04:03 2022

Reason for not supporting certain system pagesizes:

Fri Jan 29 22:04:03 2022

4K – Large pagesizes only

Fri Jan 29 22:04:03 2022

**********************************************************************

为保证后续更好的故障定位和问题跟踪,此次内存参数优化调整过程中,我方还对AWR相关配置进行了优化,调整AWR信息保留至30天,采集频率调整至30分钟。

execute dbms_workload_repository.modify_snapshot_settings(interval=>30,retention=>43200);

 

PL/SQL procedure successfully completed.

 

SQL> select snap_interval,retention from dba_hist_wr_control;

 

SNAP_INTERVAL                       RETENTION

———————————– ———————————–

+00000 00:30:00.0                   +00030 00:00:00.0

Oracle 内存配置不合理导致数据库异常关闭分析一例

概述

 

某客户出现应用程序响应慢,前台业务办理困难等情况。通过远程的方式接入数据库进行故障排除

问题及相关日志分析

1. 数据库等待事件分析

进入数据库中查询得到数据库中存在大量的db file sequential read、gc buffer busy acquire、read by other session等待事件和资源争用

EVENT

—————————————————————-

db file sequential read

gc buffer busy acquire

read by other session

2. AWR报告信息分析

由上图得知,在10:00~11:00,节点2总体负载较高。

 

 

 

数据库中出现了较高的IO类等等待事件,在TOP SQL中,SQL ID为a5ab3mmfmppj6的SQL语句占用了较多的资源,

通过对该SQL的有史以来执行计划查看,该SQL共有3个执行计划:

其中1785204928执行计划为最优执行计划。

 

总结与后续处理建议

1. 问题总结

该SQL语句最近发生了执行计划改变,改变的执行计划效率较差,导致SQL语句执行时间、占用资源等都较高。

3. 处理操作

运行coe_xfr_sql_profile.SQL,将SQL ID为a5ab3mmfmppj6的SQL语句绑定1785204928执行计划

4. 后续建议

通过此次的故障,建议对数据库进行深度的全方位优化,涉及内容包括但不限于如下:

序号 内容 描述
1 业务梳理 对各业务的运行时间、持续时长等做相关梳理
2 内核参数优化 对数据库所在服务器内核参数进行优化
3 数据库参数优化 对数据库PFILE/SPFILE/PROFILE等参数进行优化
4 数据库segment优化 对数据库中表、索引等对象进行优化,如碎片整理、HWM处理等。
5 数据库segment结构优化 对数据库中表、索引等结构进行优化,如表结构优化(heap table、partition table)、索引优化(global、local)等
Oracle SQL 执行计划变更导致性能慢快速分析一例

概述

 

某客户电子发票ISMS业务程序异从2022/2/20日早上8:52分至9:07分左右使用异常,之后自动恢复正常。通过远程方式接入数据库环境进行故障排查,通过对相关日志等信息的深入分析,找出问题点并提出解决方案

 

告警中的表空间使用率高问题分析

1. HIS数据库节点2在问题时间段的日志信息

节点2:系统自动扩大数据文件记录

2022-02-20T08:51:50.076038+08:00

HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.

2022-02-20T08:52:12.250967+08:00

HISDB(3):Resize operation completed for file# 50, old size 1048576K, new size 1069056K

2022-02-20T08:52:12.623885+08:00

HISDB(3):Resize operation completed for file# 50, old size 1069056K, new size 1110016K

2022-02-20T08:56:42.517712+08:00

HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.

2022-02-20T08:56:42.736925+08:00

HISDB(3):Resize operation completed for file# 49, old size 1048576K, new size 1069056K

2022-02-20T08:56:42.743704+08:00

HISDB(3):Resize operation completed for file# 50, old size 1110016K, new size 1130496K

 

2. HIS数据库节点1在问题时间段的日志信息

 

节点1:9:17分人工扩大数据文件日志记录

2022-02-20T09:17:43.482573+08:00

HISDB(3): alter database datafile ‘+DATADG/CDB/HISDB/DATAFILE/isms_data01’ resize 2G

2022-02-20T09:17:43.726081+08:00

HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.

HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.

2022-02-20T09:17:44.483661+08:00

HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.

HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.

2022-02-20T09:17:44.556208+08:00

HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.

HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.

2022-02-20T09:17:45.556743+08:00

HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.

HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.

2022-02-20T09:17:45.638312+08:00

HISDB(3):Resize operation completed for file# 49, old size 1069056K, new size 2097152K

HISDB(3):Completed:  alter database datafile ‘+DATADG/CDB/HISDB/DATAFILE/isms_data01’ resize 2G

2022-02-20T09:17:45.663855+08:00

 

HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.

2022-02-20T09:17:58.864830+08:00

HISDB(3):alter database datafile ‘+DATADG/CDB/HISDB/DATAFILE/isms_idx01’ resize 2G

2022-02-20T09:17:59.466571+08:00

HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.

HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.

2022-02-20T09:18:00.541385+08:00

 

3. 表空间相关属性信息

查看数据文件属性,支持自动扩展

SQL> select file_id,file_name,bytes/1024/1024 mb,AUTOEXTENSIBLE,status,maxbytes/1024/1024  from dba_data_files where tablespace_name=’ISMS_DATA’;

 

FILE_ID FILE_NAME                                                  MB AUT STATUS    MAXBYTES/1024/1024

———- ————————————————– ———- — ——— ——————

49 +DATADG/CDB/HISDB/DATAFILE/isms_data01                   2048 YES AVAILABLE         32767.9844

 

SQL> select file_id,file_name,bytes/1024/1024 mb,AUTOEXTENSIBLE,status,maxbytes/1024/1024  from dba_data_files where tablespace_name=’ISMS_IDX’;

 

FILE_ID FILE_NAME                                                  MB AUT STATUS    MAXBYTES/1024/1024

———- ————————————————– ———- — ——— ——————

50 +DATADG/CDB/HISDB/DATAFILE/isms_idx01                    2048 YES AVAILABLE         32767.9844

表空间使用率-2022/2/20 13:45分记录

TABLESPACE_NAME                            Total(M) usage_of_tablespace% left_space(M) Extent_man

—————————————- ———- ——————– ————- ———-

ISMS_IDX                                       2048                   51      1009.125 LOCAL

ISMS_DATA                                      2048                   49          1054 LOCAL

4. 表空间问题分析描述

微信告警的表空间问题的分析:

1.监控报警截图8:52分报ISMS_DATA及ISMS_IDX表空间使用率高,截图中数据文件大小为1GB。

2.检查表空间及数据文件的属性信息,数据文件支持自动扩大(最大32GB)。

3.查看数据库日志-节点2中,在08:52:12及08:56:42均有自动扩大的记录

4.查看数据库日志-节点1,在9:17分人工将数据文件扩大到2GB。

 

表空间告警问题分析总结:

由于ISMS_DATA及ISMS_IDX表空间的属性支持自动扩大(最大32GB),告警时大小为1GB。在数据库日志中08:52:12及08:56:42均有数据库系统自动扩大空间的记录,因此不存在表空间不足导致应用程序无法使用的问题。

 

 

数据库性能分析

对数据库在问题时段段的性能进行分析,结合监控软件中的数据,可以发现当时存在较多的TX锁等待事件,对应的SQL是对应表名:ISMS.ISMS_TICKET_STOCK (TABLE)

对应语句是:

SELECT TICKET_DEPT_CODE, TICKET_DICTIONARY_CODE, TICKET_VERSION, TICKET_START_NUMBER,

TICKET_END_NUMBER, STOCK_NUMBER, UNIT_CODE, STOCK_CREATE_DATE, TICKET_STOCK_CODE

FROM ISMS_TICKET_STOCK WHERE TICKET_STOCK_CODE=:1 FOR UPDATE

SQL执行计划没有问题,查询条件的TICKET_STOCK_CODE字段是惟一属性。

1. 数据库ASH报告中的对应的行锁等待事件

2. 数据库ASH报告中的对应的锁等待SQL语句

对应的SQL语句:select * FROM ISMS_TICKET_STOCK

WHERE   TICKET_STOCK_CODE=:1    FOR UPDATE

 

 

 

3. SQL语句执行计划等分析

SQL信息:

SQL> set linesize 180 pagesize 1800

SQL> select sql_id,PLAN_HASH_VALUE,CHILD_number,EXECUTIONS,BUFFER_GETS/EXECUTIONS,ELAPSED_TIME/EXECUTIONS/1000 from v$sql where sql_id=’7yjz8nhdaad7k’;

 

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER EXECUTIONS BUFFER_GETS/EXECUTIONS ELAPSED_TIME/EXECUTIONS/1000

————- ————— ———— ———- ———————- —————————-

7yjz8nhdaad7k      2932473141            0       9869             10.4900193                   6640.86119

 

SQL> col plan_table_output for a150

SQL> set long 9000

SQL> select * from table(dbms_xplan.display_cursor(‘7yjz8nhdaad7k’,0,’advanced’));

 

PLAN_TABLE_OUTPUT

——————————————————————————————————————————————————

SQL_ID  7yjz8nhdaad7k, child number 0

————————————-

SELECT       TICKET_DEPT_CODE, TICKET_DICTIONARY_CODE, TICKET_VERSION,

TICKET_START_NUMBER,   TICKET_END_NUMBER, STOCK_NUMBER,   UNIT_CODE,

STOCK_CREATE_DATE, TICKET_STOCK_CODE      FROM ISMS_TICKET_STOCK

WHERE   TICKET_STOCK_CODE=:1    FOR UPDATE

 

Plan hash value: 2932473141

 

—————————————————————————————————–

| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |

—————————————————————————————————–

|   0 | SELECT STATEMENT             |                      |       |       |     1 (100)|          |

|   1 |  FOR UPDATE                  |                      |       |       |            |          |

|   2 |   TABLE ACCESS BY INDEX ROWID| ISMS_TICKET_STOCK    |     1 |    57 |     1   (0)| 00:00:01 |

|*  3 |    INDEX UNIQUE SCAN         | ISMS_TICKET_STOCK_PK |     1 |       |     0   (0)|          |

 

 

 

4. SQL历史执行信息

 

 

JBOSS中间件问题排查

1. JBOSS中间件报错日志

应用程序维护人员提供的JBOSS中间件日志报错中显示的问题为不能从JBOSS中间件连接池获取链接来连上数据库:

主要报错是:

IJ000453: Unable to get managed connection for java:jboss/datasources/ISMS:

IJ000655: No managed connections available within configured blocking timeout (30000 [ms]) 这些错误说明连接池达到最大值,没有连接可用。

相关日志:

2022-02-20 08:52:30,275 ERROR [com.th.supcom.framework.dubbo.extensions.rpc.filter.ExceptionFilter] (DubboServerHandler-172.20.2.12:20880-thread-143) Got unchecked and undeclared exception which called by 172.20.2.139. service: com.th.supcom.isms.storage.stock.service.IsmsTicketStockOutService, method: eInvoiceSaveUseOfOutDocu, exception: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:jboss/datasources/ISMS: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:jboss/datasources/ISMS

at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:80) [spring-jdbc-3.2.13.RELEASE.jar:3.2.13.RELEASE]

at org.mybatis.spring.SqlSessionUtils.getSqlSession(SqlSessionUtils.java:116) [mybatis-spring-1.0.1.jar:1.0.1]

at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:333) [mybatis-spring-1.0.1.jar:1.0.1]

at com.sun.proxy.$Proxy32.selectList(Unknown Source)

at org.mybatis.spring.SqlSessionTemplate.selectList(SqlSessionTemplate.java:189) [mybatis-spring-1.0.1.jar:1.0.1]

at org.apache.ibatis.binding.MapperMethod.executeForList(MapperMethod.java:100) [mybatis-3.0.5.jar:3.0.5]

at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:70) [mybatis-3.0.5.jar:3.0.5]

at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:38) [mybatis-3.0.5.jar:3.0.5]

 

……………………

Caused by: java.sql.SQLException: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:jboss/datasources/ISMS

at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:147) [ironjacamar-jdbc-1.0.17.Final-redhat-1.jar:1.0.17.Final-redhat-1]

at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111) [spring-jdbc-3.2.13.RELEASE.jar:3.2.13.RELEASE]

at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77) [spring-jdbc-3.2.13.RELEASE.jar:3.2.13.RELEASE]

… 55 more

Caused by: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:jboss/datasources/ISMS

at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.getManagedConnection(AbstractConnectionManager.java:390)

at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.getManagedConnection(TxConnectionManagerImpl.java:368)

at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:464)

at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:139) [ironjacamar-jdbc-1.0.17.Final-redhat-1.jar:1.0.17.Final-redhat-1]

… 57 more

Caused by: javax.resource.ResourceException: IJ000655: No managed connections available within configured blocking timeout (30000 [ms])

 

 

6. 检查JBOSS中间件连接池的配置信息

发现最大连接池参数设置为50,即允许最多50个。

 

 

 

 

 

 

 

总结与后续处理建议

通过对数据库相关日志以及性能分析,对JBOSS中间件在问题时间点的报错的分析,可以得出如下信息:

1. 微信告警的表空间问题的分析:

监控报警截图8:52分报ISMS_DATA及ISMS_IDX表空间使用率高,截图中数据文件大小为1GB

检查表空间及数据文件的属性信息,数据文件支持自动扩大(最大32GB)。

查看数据库日志-节点2中,在08:52:12及08:56:42均有自动扩大的记录

查看数据库日志-节点1,在9:17分人工将数据文件扩大到2GB。

表空间告警问题分析总结:

由于ISMS_DATA及ISMS_IDX表空间的属性支持自动扩大(最大32GB),告警时大小为1GB。在数据库日志中08:52:12及08:56:42均有自动扩大的记录,因此表空间使用率高不会导致应用程序出现无法使用问题。而当时空间使用量增加较快,说明当时应用程序对于ISMS_DATA有较多的数据写入/修改等动作。

2. 数据库性能分析锁等待的SQL语句

监控软件中显示的问题在8:52至9:07分存在较多TX锁

对应表名:ISMS用户的ISMS_TICKET_STOCK (TABLE)

对应语句是:SELECT …… FROM ISMS_TICKET_STOCK WHERE TICKET_STOCK_CODE=:1 FOR UPDATE

此SQL执行计划及执行速度没有问题,查询条件的TICKET_STOCK_CODE字段是惟一属性。SQL执行速度较慢平均需要33秒,主要是由于应用程序未及时提交事务引起的锁不释放问题。这点从3.4章节的SQLHC报告截图中可以清楚的反映出来。同时这条SQL语句的执行频率在今天上午的问题时间段也较高,达到几千次;而从数据库的历史性能数据中未显示之前有如此高的执行频率,已经将此应用程序执行SQL语句频率高的问题点交由应用软件厂商排查。

3. JBOSS中间件的问题

应用程序维护人员提供的JBOSS中间件日志报错中显示的问题为不能从JBOSS中间件连接池获取链接来连上数据库:主要报错是:

IJ000453: Unable to get managed connection for java:jboss/datasources/ISMS:

IJ000655: No managed connections available within configured blocking timeout (30000 [ms]) 这些错误说明连接池达到最大值,没有连接可用。

 

检查JBOSS中间件连接池相关配置,发现max-pool-size最大连接池参数设置为50,即允许最多50个。

5. 问题发生的原因

应用程序执行问题SQL的次数异常增加(基于SQL执行频率)或程序异常(基于问题SQL不释放锁);

比平时多的并发连接导致了JBOSS连接池用满(JBOSS连接最大50),新的应用程序进程无法从连接池分配链接;

程序报错无法连接到数据库。

 

6. 下一步排查及处理方向

因此,接下来需要及时排查的问题为:

  1. 20日上午问题点时应用程序执行问题SQL的次数异常增加的原因排查。
  2. JBOSS连接池配置max-pool-size当前为50,建议增加到200.并根据后续程序运行情况及时监测、调整。

 

7. 后续运维建议

  1. 及时关注数据库运行状态及告警,这一点会将第三方运维人员也加入告警群组来实现更早、更及时的处理数据库告警信息。
  2. 对应用程序、中间件的各项配置,要再进行排查并结合单位实际业务需求进行调整,特别是连接数等配置,确保够用。

3.定期进行数据库、中间件、应用程序等的巡检维护、性能优化等动作,确保各项配置能够满足不断发展的业务需求。

业务暴增导致Oracle出现性能故障分析一例

概述

 

某客户于2021/06/07凌晨01点43分左右,出现应用程序无法连接,前台业务无法办理等情况。在此情况下代维科技工程师在接到故障支持电话后,及时进行响应,通过远程的方式接入数据库进行故障排除,快速解决了问题使得业务恢复正常使用。

问题及相关日志分析

1. 数据库告警日志分析分析

通过对数据库告警日志查看,节点1在01:22:14出现ORA-29970错误,LMD进程无法响应,在01:22:29被LMHB进程将实例终止,实例终止后自动启动。节点2在01:29:31被LMS0进程将实例终止,实例终止后自动启动。

1节点:

Mon Jun 07 01:22:14 2021

LMD0 (ospid: 10613) has not called a wait for sub 0 secs.

LMD1 (ospid: 10617) has not called a wait for sub 0 secs.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmhb_10637.trc  (incident=688375) (PDBNAME=CDB$ROOT):

ORA-29770: global enqueue process LMD0 (OSID 10613) is hung for more than 70 seconds

Incident details in: /u01/app/oracle/diag/rdbms/cdb/cdb1/incident/incdir_688375/cdb1_lmhb_10637_i688375.trc

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmhb_10637.trc  (incident=688376) (PDBNAME=CDB$ROOT):

ORA-29770: global enqueue process LMD1 (OSID 10617) is hung for more than 70 seconds

Incident details in: /u01/app/oracle/diag/rdbms/cdb/cdb1/incident/incdir_688376/cdb1_lmhb_10637_i688376.trc

LOCK_DBGRP: GCR_SYSTEST debug event locked group GR+DB_CDB by memno 0

ERROR: Some process(s) is not making progress.

LMHB (ospid: 10637) is terminating the instance.

Please check LMHB trace file for more details.

Please also check the CPU load, I/O load and other system properties for anomalous behavior

ERROR: Some process(s) is not making progress.

LMHB (ospid: 10637): terminating the instance due to error 29770

Mon Jun 07 01:22:24 2021

System state dump requested by (instance=1, osid=10637 (LMHB)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_diag_10575_20210607012224.trc

Mon Jun 07 01:22:26 2021

License high water mark = 591

Mon Jun 07 01:22:29 2021

Instance terminated by LMHB, pid = 10637

Mon Jun 07 01:22:29 2021

USER (ospid: 21601): terminating the instance

Mon Jun 07 01:22:29 2021

Instance terminated by USER, pid = 21601

Mon Jun 07 01:22:32 2021

Starting ORACLE instance (normal) (OS id: 21660)

Mon Jun 07 01:22:32 2021

CLI notifier numLatches:37 maxDescs:3986

Mon Jun 07 01:22:32 2021

**********************************************************************

Mon Jun 07 01:22:32 2021

Dump of system resources acquired for SHARED GLOBAL AREA (SGA)

 

Mon Jun 07 01:22:32 2021

Per process system memlock (soft) limit = UNLIMITED

Mon Jun 07 01:22:32 2021

Expected per process system memlock (soft) limit to lock

SHARED GLOBAL AREA (SGA) into memory: 128G

Mon Jun 07 01:22:32 2021

Available system pagesizes:

4K, 2048K

Mon Jun 07 01:22:32 2021

Supported system pagesize(s):

Mon Jun 07 01:22:32 2021

PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)

Mon Jun 07 01:22:32 2021

2048K            66823           65538           65538        NONE

Mon Jun 07 01:22:32 2021

Reason for not supporting certain system pagesizes:

Mon Jun 07 01:22:32 2021

4K – Large pagesizes only

Mon Jun 07 01:22:32 2021

**********************************************************************

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Initial number of CPU is 24

 

2节点:

Mon Jun 07 01:25:05 2021

Set master node info

Mon Jun 07 01:26:05 2021

Auto-tuning: Shutting down background process GTXb

Mon Jun 07 01:27:53 2021

IPC Send timeout detected. Sender: ospid 62152 [oracle@node1 (PING)]

Receiver: inst 1 binc 912546309 ospid 10589

Mon Jun 07 01:29:25 2021

LMD0 (ospid: 62168) received an instance eviction notification from instance 1 [2]

Mon Jun 07 01:29:26 2021

Received an instance abort message from instance 1

Mon Jun 07 01:29:26 2021

Received an instance abort message from instance 1

Mon Jun 07 01:29:26 2021

Please check instance 1 alert and LMON trace files for detail.

Mon Jun 07 01:29:26 2021

Please check instance 1 alert and LMON trace files for detail.

Mon Jun 07 01:29:26 2021

LMS0 (ospid: 62192): terminating the instance due to error 481

Mon Jun 07 01:29:26 2021

System state dump requested by (instance=2, osid=62192 (LMS0)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_diag_62123_20210607012926.trc

Mon Jun 07 01:29:26 2021

ORA-1092 : opitsk aborting process

Mon Jun 07 01:29:27 2021

License high water mark = 1251

Mon Jun 07 01:29:31 2021

Instance terminated by LMS0, pid = 62192

Mon Jun 07 01:29:31 2021

USER (ospid: 85262): terminating the instance

Mon Jun 07 01:29:31 2021

Instance terminated by USER, pid = 85262

Mon Jun 07 01:29:33 2021

Starting ORACLE instance (normal) (OS id: 85397)

Mon Jun 07 01:29:33 2021

CLI notifier numLatches:37 maxDescs:3986

Mon Jun 07 01:29:33 2021

**********************************************************************

Mon Jun 07 01:29:33 2021

Dump of system resources acquired for SHARED GLOBAL AREA (SGA)

 

Mon Jun 07 01:29:33 2021

Per process system memlock (soft) limit = UNLIMITED

Mon Jun 07 01:29:33 2021

Expected per process system memlock (soft) limit to lock

SHARED GLOBAL AREA (SGA) into memory: 128G

Mon Jun 07 01:29:33 2021

Available system pagesizes:

4K, 2048K

Mon Jun 07 01:29:33 2021

Supported system pagesize(s):

Mon Jun 07 01:29:33 2021

PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)

Mon Jun 07 01:29:33 2021

2048K            67200           65538           65538        NONE

Mon Jun 07 01:29:33 2021

Reason for not supporting certain system pagesizes:

Mon Jun 07 01:29:33 2021

4K – Large pagesizes only

Mon Jun 07 01:29:33 2021

**********************************************************************

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Initial number of CPU is 24

查看更多日志,发现在关闭之前,出现了较多的ORA-04031错误。

Mon Jun 07 01:18:14 2021

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_14795.trc  (incident=691727) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^1069″,”kglseshtTable”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmd0_10613.trc  (incident=707486) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Mon Jun 07 01:18:16 2021

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_14832.trc  (incident=691688) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^320″,”kglseshtTable”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Mon Jun 07 01:18:16 2021

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_14837.trc  (incident=691735) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^1234″,”kglseshtTable”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Mon Jun 07 01:18:18 2021

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_14912.trc  (incident=691871) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^159″,”kglseshtTable”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmd0_10613.trc  (incident=707487) (PDBNAME=CDB$ROOT):

ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

DDE: Problem Key ‘ORA 4031’ was completely flood controlled (0x6)

Further messages for this problem key will be suppressed for up to 10 minutes

Mon Jun 07 01:18:33 2021

Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_m000_15479.trc:

ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^2147″,”kglseshtTable”)

 

 

2. AWR报告信息分析

 

拉取了宕机前半小时的AWR报告,从AWR报告中得知,shared pool中ges enqueues和ges resource dynamic分别达到了18G和16G。

3. Oracle官方参考文献

 

根据Oracle官方文档记载:ORA-04031 Errors Occurring with High “ges resource dynamic” & “ges enqueues” Memory Usage In The Shared Pool (Doc ID 2063751.1),该问题是Oracle的BUG,该BUG出现于Oracle RAC 12.1.0.1至12.1.0.2,此问题在Oracle RAC 12.2中被修复,移动护理数据库版本为Oracle RAC 12.1.0.2。

 

 

4. 数据库硬解析比例高的SQL语句

SQL_TEXT

———————————————————————————————————————————————————————————————————————————————————-

unshared count

————–

SELECT A.PATIENT_ID, A.VISIT_ID,A.ORDER_NO,A.ORDER_SUB_NO,TO_CHAR(T.BED_NO) BED_NO,TO_CHAR(START_DATE_TIME,’YYYY-MM-DD HH24:MI:SS’) AS START_DATE_TIME, TO_   FROM  V_EMR_TIME,’YYYY-MM-DD HH24:MI:SS’) AS STOP_DATE_TIME

ORDERS_PERFORMAN  AND  A.PATIENT_ID=’690086500ID’,’7′)

8′              AND (A.STOP_DATE_TIME IS NULL OR A.STOP_DATE_TIME >= TO_DATE(‘2021-06-07’||’ 00:00:00′,’YYYY-MM-DD HH24:MI:SS’)) AND A.START_DATE_TIME <=

TO_DATE(‘2021-06-07 07:52:53’, ‘YYYY-MM-DD HH24:MI:SS’)

35162

 

update KYEEHIS.hl7_send_records set type = ‘1’ where rec_id = ‘1003839362’

31808

 

AND T.BARCODE<>’1# IS NOT NULL2021-06-07′, ‘YYYY-MM-DD’)

ORDER BY T.OPER_TIME  DESC日(Q12)’, ‘1/12小时’, ‘1/8小时’)

14151

 

SELECT RECORD_TIME,ALL_S     WHERE PATIENT_ID=’1110234022′ AND VISIT_ID=’3′ AND PG_CODE=’8′ –AND RECORD_TIME BETWEEN TO_DATE(‘2021-06-07 00:00:00′,’YYYY-MM-DD HH24:MI:SS’) AND TO_DATE(‘2021-06-07 23:59:59

‘,’YYYY-MM-DD HH24:MI:SS     ORDER BY RECORD_TIME DESC

7829

 

select distinct barcode,’1’ exec_fg,nurse_name,nurse oper_nurse,” oper_time,order_exec_id,order_id,order_no,order_text,patient_id,visit_id                         where patient_id = ‘6900_exec_log

767269’                  and order_exec_id = ‘288132019’

6180

 

SELECT T.PATIENT_ID,            TO_DATE(TO_CHAR(T.EVENT_DATE,’YYYY-MM-DD’)||’ ‘|| TO_CHAR(T.TIME_POINT, ‘HH24:MI:SS’),

‘YYY      WHAND T.PATIENT_ID = ’40TAMP_CODEP_DICT K

10479126’                       AND (T.STAMP_CODE = ‘10003083’ OR T.STAMP_CODE = ‘10003084’)

5894

 

select t.* from KYEEHIS.docs_ord and order_no = ‘49712239’47427′

and exec_time = TO_DATE(‘2021-06-07 09:00:00′,’YYYY-MM-DD HH24:MI:SS’)

5070

 

SELECT REC_ID, PATIENT_ID, VISIT_ID, MODULE_CODE, MODULE_NAME, CREATE_TIME, ENBALE_VALUE FROM DOCS_OPERATION_MODULE_REC WHERE PATIENT_ID=’4010110870′ AND VISIT_ID=’3′ AND MODULE_CODE=’19’ AND ENBALE_VALUE=’Y’

4849

 

SELECT PATIENT_ID,TO_CHAR(VISIT_ID) AS VISIT_ID,INP_NO,WARD_CODE,WARD_NAME,DEPT_CODE,DEPT_NAME,PATIENT_NAME,SEX,TO_CHAR(BED_NO) AS BED_NO, BED_LABEL,DATE_OF_BIRTH,DECODE(NURSE_LEVEL,’特级护理’,’0′,’一级护理’,’1′,’二级护理’,’2′,’三级护理’,’3′) NURSE_L

EVEL,STATUS,CHARGE_TYPE,ADMISSION_DATE,DEPTED_TIME,DIAGNOSIS,MEAL,ALLERGY,ADDRESS, LEAVE_TIME,DOCTOR_IN_CHARGE ,       DOCTOR_NAME,       NURSE_IN_CHARGE,  NURSE_NAME,H_CODE,TYPE FROM KYEEHIS.v_his_pats_discharge T WHERE T.PATIENT_ID=’6901544170

‘ AND T.VISIT_ID=5 UNION  SELECT PATIENT_ID,TO_CHAR(VISIT_ID) AS VISIT_ID,INP_NO,WARD_CODE,WARD_NAME,DEPT_CODE,DEPT_NAME,PATIENT_NAME,SEX,TO_CHAR(BED_NO) AS BED_NO, BED_LABEL,DATE_OF_BIRTH,DECODE(NURSE_LEVEL,’特级护理’,’0′,’一级护理’,’1′,’二级护理’,’

2′,’三级护理’,’3′) NURSE_LEVEL,STATUS,CHARGE_TYPE,ADMISSION_DATE,DEPTED_TIME,DIAGNOSIS,MEAL,ALLERGY,ADDRESS,SYSDATE AS LEAVE_TIME,DOCTOR_IN_CHARGE ,          DOCTOR_NAME,       NURSE_IN_CHARGE,       NURSE_NAME,H_CODE,TYPE FROM KYEEHIS.V_HIS_PATS_IN_HOS

4139

 

select d.item_name,d.item_value  and t.patient_id = ‘6900510081’_eval_report_detail_rec d

and t.visit_id = ‘3’          and d.enabled_value = ‘Y’sionRecords’

4061

 

 

10 rows selected.


三、总结与后续处理建议

1. 问题总结

移动护理数据库版本是ORACLE RAC 12.1.0.2.0,本次故障发生前出现内存不足相关ORA-04031以及后台进程LMD无影响等报错信息,并最终被LMHB监测进程终止了数据库实例的运行。检查数据库后台TRACE及AWR信息报告,可以发现问题发生前shared pool内在中主要有组件ges resource dynamic和ges enqueues占用了35GB内存;

参照ORACLE官方文档信息该版本存在BUG 21373473,会导致shared pool中的ges resource dynamic和ges enqueues等组件占用内存不断增多,会导致数据库LMD后台进程无响应,从而导致实例终止宕机,这与本次问题的理解及后台TRACE吻合,因此判断为此BUG导致的数据库宕机。

同时数据库中存在硬解析比例高的问题,在2.4章节中已经将数据库中TOP 10的硬解析语句进行了抓取提供,需要软件开发商配合对此类SQL进行优化及使用SQL绑定变量等改写。

2. 问题详细分析过程

1.故障时间状态:在2021年6月7日凌晨00:30 – 01:00,移动护理数据库shared pool内存池中的ges enqueues和ges resource dynamic组件分别达到了18GB和16.8GB;

移动护理数据库在正常运行状态时ges enqueues和ges resource dynamic分别大约为3G和1G。

 

2.故障相关报错信息:问题时段这两个内存组件占用的内存显著异常,导致Oracle内存池剩余内存不足,引起数据库中进程出现ORA-04031的内存错误及后台进程LMD进程无响应(ORA-29770: global enqueue process LMD1 (OSID 10617) is hung for more than 70 seconds)。

报错信息为:ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^1234″,”kglseshtTable”)

 

3.报错信息与BUG的对应匹配:以上信息与ORA-04031 Errors Occurring with High “ges resource dynamic” & “ges enqueues” Memory Usage In The Shared Pool (Doc ID 2063751.1)及ORA-4031 due to Huge “ges resource dynamic” and “ges enqueues” and Instance Terminates: ORA-29770: ‘SGA: allocation forcing component growth’ (Doc ID 1951758.1)文档中的BUG 21373473描述吻合:ORA-04031 errors occurring in a RAC instance with the trace file showing high memory usage for “ges resource dynamic” and “ges enqueues” memory in the shared pool.  This can cause LMD processes to become unresponsive leading to an instance termination.

 

4.数据库实例被LMHB进程终止的原因:从11gR2版本,新增加LMHB进程,通过LMHB进程来定期监控lmon/lms/lmd/lck0等进程,如果LMHB进程发现某一个重要后台进程一段时间内(默认70秒)未更新自己的状态信息,会进行分析,如必要则通过重启本地节点的方式来解决问题。本次即为lmd进程超过70秒无响应后,LMHB对数据库进程进行了KILL,从而引起数据库宕机。

 

5. 后续处理及建议

  1. BUG问题的解决

根据Oracle官方参考文献,解决该问题有两种方式:

  1. 解决方案1

修改数据库参数,_GES_DIRECT_FREE_RES_TYPE=”CTARAHDXBB”,该参数是静态参数,需要重启数据库生效,可采取2个节点轮流修改的方法,减少业务停机时间,推荐采用此方案。

  1. 解决方案2

安装数据库补丁,补丁号Patch:21260431,需要安排业务停机时间安装补丁。

 

  1. 数据库硬解析比例高的解决

在2.4章节中已经将数据库中TOP 10的硬解析语句进行了抓取提供,需要软件开发商配合对此类SQL进行优化及使用SQL绑定变量等改写,从而提升系统性能,提高系统的稳定性。

ORA-29970 故障处理一例