概述
某客户出现应用程序无法连接,前台业务无法办理等情况,登入后台发现数据库异常关闭
问题及相关日志分析
1. 数据库告警日志分析分析
通过对数据库告警日志查看,显示数据库1节点和2节点实例分别在14:33:59和14:40:41相继关闭。
1节点:
Thu Jan 28 14:40:41 2022
LMS1 (ospid: 80131) received an instance eviction notification from instance 2 [2]
Thu Jan 28 14:40:42 2022
Instance Critical Process (pid: 14, ospid: 80113, LMON) died unexpectedly
PMON (ospid: 80061): terminating the instance due to error 481
Thu Jan 28 14:40:42 2022
System state dump requested by (instance=1, osid=80061 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_diag_22810_20220128144042.trc
Thu Jan 28 14:40:44 2022
License high water mark = 2257
Thu Jan 28 14:40:47 2022
Instance terminated by PMON, pid = 80061
Thu Jan 28 14:40:48 2022
USER (ospid: 12966): terminating the instance
2节点:
Thu Jan 28 14:33:59 2022
Instance Critical Process (pid: 16, ospid: 91944, LMD1) died unexpectedly
PMON (ospid: 91885): terminating the instance due to error 482
Thu Jan 28 14:33:59 2022
ORA-1092 : opitsk aborting process
Thu Jan 28 14:33:59 2022
System state dump requested by (instance=2, osid=91885 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_diag_96171_20220128143359.trc
Thu Jan 28 14:34:01 2022
License high water mark = 1949
Thu Jan 28 14:34:04 2022
Instance terminated by PMON, pid = 91885
查看更多日志,发现在关闭之前,出现了较多的ORA-04031错误。
1节点:
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_8372.trc (incident=656535) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^2051″,”kglseshtTable”)
Thu Jan 28 14:38:23 2022
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_8392.trc (incident=656543) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^2147″,”kglseshtTable”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jan 28 14:38:23 2022
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_p033_97352.trc (incident=652943) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^3061″,”kglseshtTable”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jan 28 14:38:23 2022
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_p02w_90573.trc (incident=651431) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^3538″,”kglseshtTable”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_p033_97352.trc (incident=652944) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate bytes of shared memory (“”,””,””,””)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_p02w_90573.trc (incident=651432) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate bytes of shared memory (“”,””,””,””)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmon_80113.trc (incident=668994) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmon_80113.trc (incident=668995) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmon_80113.trc (incident=669000) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmon_80113.trc (incident=669001) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jan 28 14:38:31 2022
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_p032_97348.trc (incident=652071) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^1124″,”kglseshtTable”)
2节点:
ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_lmd0_91940.trc (incident=521915) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_lmd1_91944.trc (incident=521926) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_lmd1_91944.trc (incident=521927) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_lmd1_91944.trc (incident=541065) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jan 28 14:31:40 2022
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_ora_77682.trc (incident=524416) (PDBNAME=YDHL):
ORA-04031: 脦脼路篓路脰脜盲 8504 脳脰陆脷碌脛鹿虏脧铆脛脷麓忙 (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_lmd1_91944.trc (incident=541066) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_rmv1_92030.trc (incident=522082) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jan 28 14:31:48 2022
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_lck1_91968.trc (incident=521960) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(5,0)”,”ges big msg pool”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jan 28 14:31:49 2022
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_ora_90698.trc (incident=524824) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 12312 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^2953″,”kglseshtTable”)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2. AWR报告信息分析
通过分析得知,数据库服务器物理内存256GB,SGA分配了40G,由于AMM,shared pool已经达到30GB,Buffer Cache只有9G,该数据库内存分配存在严重问题。
总结与后续处理建议
1. 问题总结
移动护理数据库内存参数配置存在较严重的问题,物理内存256G,SGA分配40G,shared pool、buffer cache均为自动配置,没有手工分配,因此,需要对内存参数进行调整,增加SGA参数值,手工固定shared pool和buffer cache等值。
3. 处理操作
由于部分内存参数为静态参数,因此修改参数就要停机。我方于2022年1月29日晚上22:00:00开始对移动护理进行内存参数优化的操作,为保证对业务的影响达到最小,此次采用轮流优化和重启1节点和2节点实例的方式。
检查操作系统大页配置:
1节点:
[oracle@node0 ~]$ cat /proc/meminfo |grep HugePage
HugePages_Total: 67200
HugePages_Free: 1378
HugePages_Rsvd: 93
HugePages_Surp: 0
2节点:
[oracle@node1 ~]$ cat /proc/meminfo |grep HugePage
HugePages_Total: 67200
HugePages_Free: 1378
HugePages_Rsvd: 93
HugePages_Surp: 0
大页在此之前已经有过预配置,约131G,此次计划配置内存参数为:
SGA 128GB
Shared_pool 20GB
Buffer_Cache 40GB
1节点:
alter system set sga_max_size=128G scope=spfile sid=’cdb1′;
alter system set sga_target=128G scope=spfile sid=’cdb1′;
alter system set shared_pool_size=20G scope=spfile sid=’cdb1′;
alter system set db_cache_size=40G scope=spfile sid=’cdb1′;
alter system switch logfile;
alter system checkpoint;
shutdown immediate;
startup;
alter pluggable database HEALTH open read write;
alter pluggable database YDHL open read write;
alter pluggable database MANDALA open read write;
alter pluggable database TUTEORCL open read write;
alter pluggable database DOCARE open read write;
alter pluggable database DOCARE5 open read write;
alter pluggable database BAGL open read write;
alter pluggable database EMPI open read write;
alter pluggable database LIS_OLD open read write;
alter pluggable database NCSYS open read write;
alter pluggable database ICU open read write;
alter pluggable database ORCL open read write;
2节点:
1节点:
alter system set sga_max_size=128G scope=spfile sid=’cdb2′;
alter system set sga_target=128G scope=spfile sid=’cdb2′;
alter system set shared_pool_size=20G scope=spfile sid=’cdb2′;
alter system set db_cache_size=40G scope=spfile sid=’cdb2′;
alter system switch logfile;
alter system checkpoint;
shutdown immediate;
startup;
alter pluggable database HEALTH open read write;
alter pluggable database YDHL open read write;
alter pluggable database MANDALA open read write;
alter pluggable database TUTEORCL open read write;
alter pluggable database DOCARE open read write;
alter pluggable database DOCARE5 open read write;
alter pluggable database BAGL open read write;
alter pluggable database EMPI open read write;
alter pluggable database LIS_OLD open read write;
alter pluggable database NCSYS open read write;
alter pluggable database ICU open read write;
alter pluggable database ORCL open read write;
在启动过程中,观察数据库告警日志,确认大页已经生效:
**********************************************************************
Fri Jan 29 22:04:03 2022
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)
Fri Jan 29 22:04:03 2022
Per process system memlock (soft) limit = 227G
Fri Jan 29 22:04:03 2022
Expected per process system memlock (soft) limit to lock
SHARED GLOBAL AREA (SGA) into memory: 128G
Fri Jan 29 22:04:03 2022
Available system pagesizes:
4K, 2048K
Fri Jan 29 22:04:03 2022
Supported system pagesize(s):
Fri Jan 29 22:04:03 2022
PAGESIZE AVAILABLE_PAGES EXPECTED_PAGES ALLOCATED_PAGES ERROR(s)
Fri Jan 29 22:04:03 2022
2048K 66823 65538 65538 NONE
Fri Jan 29 22:04:03 2022
Reason for not supporting certain system pagesizes:
Fri Jan 29 22:04:03 2022
4K – Large pagesizes only
Fri Jan 29 22:04:03 2022
**********************************************************************
为保证后续更好的故障定位和问题跟踪,此次内存参数优化调整过程中,我方还对AWR相关配置进行了优化,调整AWR信息保留至30天,采集频率调整至30分钟。
execute dbms_workload_repository.modify_snapshot_settings(interval=>30,retention=>43200);
PL/SQL procedure successfully completed.
SQL> select snap_interval,retention from dba_hist_wr_control;
SNAP_INTERVAL RETENTION
———————————– ———————————–
+00000 00:30:00.0 +00030 00:00:00.0