Skip to content

概述

某医院客户于2021年11月5日上午9时左右出现HIS数据库连接异常的情况,在此情况下通过远程的方式接入数据库进行故障排除,快速解决了问题使得业务恢复正常使用。

问题及相关日志分析

1. 问题现状

前台业务反映HIS数据库时而连得上时而连不上,连不上时候报错ORA-12516,报错内容截图如下:

 

由于业务通过节点1的vip(192.2.1.10)连接数据库,测试节点1的vip连接,结果如下:

[oracle@cxhisdb01 ~]$ sqlplus a/b@192.2.1.10:1521/hospital

 

SQL*Plus: Release 11.2.0.4.0 Production on Fri Nov 5 09:45:15 2021

 

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

 

ERROR:

ORA-12516: TNS:listener could not find available handler with matching protocol

stack

 

2. 检查数据库alert日志

ORA-00020: maximum number of processes (4500) exceeded

ORA-20 errors will not be written to the alert log for

the next minute. Please look at trace files to see all

the ORA-20 errors.

Fri Nov 05 09:35:48 2021

Thread 1 advanced to log sequence 191580 (LGWR switch)

Current log# 1 seq# 191580 mem# 0: +SSD/hospital/onlinelog/redo01.log

Fri Nov 05 09:35:48 2021

LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_4

Fri Nov 05 09:35:48 2021

LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_3

Fri Nov 05 09:35:48 2021

LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_2

Fri Nov 05 09:35:48 2021

Archived Log entry 904316 added for thread 1 sequence 191579 ID 0x431aef8a dest 1:

Fri Nov 05 09:37:30 2021

ORA-00020: maximum number of processes (4500) exceeded

ORA-20 errors will not be written to the alert log for

the next minute. Please look at trace files to see all

the ORA-20 errors.

alert中出现大量的ORA-00020错误告警,数据库连接数达到4500上限。

3. 检查数据库参数和连接

SQL> show parameter process;

 

NAME                                 TYPE        VALUE

———————————— ———– ——————————

aq_tm_processes                      integer     1

cell_offload_processing              boolean     TRUE

db_writer_processes                  integer     10

gcs_server_processes                 integer     4

global_txn_processes                 integer     1

job_queue_processes                  integer     1000

log_archive_max_processes            integer     4

processes                            integer     4500

processor_group_name                 string

 

SQL> select inst_id,count(*) from gv$session group by inst_id;

 

INST_ID   COUNT(*)

———- ———-

1       4496

2        715

查询得知,1节点连接数达到4496,已经达到连接数4500的上限,因此前端业务出现连接不上的情况。

4. 定位异常连接

SQL> select program,machine,count(*)

2  from gv$session group by program,machine,sql_id order by 3;

 

 

PROGRAM                                          MACHINE       COUNT(*)

——————————– ————————–  ———-

w3wp.exe                         WORKGROUP\SXZZ_SERVER_CX            10

Medinfo.Quartz.DbJob.exe         WORKGROUP\QUANGUO-NB                10

yszj.exe                         WORKGROUP\MZ-3F-EBHYS7              10

yszj.exe                         WORKGROUP\MZ-2F-NK2YS4              10

yszj.exe                         WORKGROUP\MZ1F-EKYS7                10

emrproject.exe                   WORKGROUP\ZY1-2X-YS2                10

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-2F-WKYS9               11

yszj.exe                         WORKGROUP\MZ-2F-FKYS13              11

emrjz.exe                        WORKGROUP\JZ-QJS-HS1                11

yzxt.exe                         WORKGROUP\ZY-604200-01-HS           11

yszj.exe                         WORKGROUP\MZ-2F-GKYS3               12

yszj.exe                         WORKGROUP\MZ-3F-KQYS6               12

yzxt.exe                         WORKGROUP\ZY-609400-02-HS           12

HisServer.exe                    WORKGROUP\YYGHSERV                  12

yszj.exe                         WORKGROUP\GRK-GBMZ-01               12

oracle@lis-server (TNS V1-V3)    lis-server                          12

yszj.exe                         WORKGROUP\MZ-2F-WKYS10              13

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-1F-EKYS6               13

yszj.exe                         WORKGROUP\MZ-NKJ-01                 13

yzxt.exe                         WORKGROUP\ZY-ZHNK1-HS-04            13

yszj.exe                         WORKGROUP\MZ-3F-YKYS2               13

yszj.exe                         WORKGROUP\M3-YK-YS1                 14

yszj.exe                         WORKGROUP\MZ-2F-FKYS-01             14

yszj.exe                         WORKGROUP\MZ-3F-YKYS4               14

yszj.exe                         WORKGROUP\MZ-2F-FKYS10              14

yszj.exe                         WORKGROUP\MZ-3F-TJWKYS1             15

yszj.exe                         WORKGROUP\MZ-3F-WKYS5               15

MAF.exe                          WORKGROUP\QUYUSC1                   16

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-3F-WKYS3               16

yzxt.exe                         WORKGROUP\HIS-CXYY                  16

yszj.exe                         WORKGROUP\MZ-2F-NK2YS05             16

yszj.exe                         WORKGROUP\MZ-2F-WKYS5               17

yszj.exe                         WORKGROUP\MZ-3F-WKYS4               17

yszj.exe                         WORKGROUP\MZ-3F-KQKYS2              17

w3wp.exe                         WORKGROUP\WX_DDJY                   17

MediII.Adapter.WinForm.Scanner_V WORKGROUP\YYQZ                      18

yszj.exe                         WORKGROUP\MZ-1F-EKYS4               18

yszj.exe                         WORKGROUP\MZ-2F-NK2YS6              18

emrproject.exe                   WORKGROUP\3020-CXYY                 19

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-2F-GK24                19

yszj.exe                         WORKGROUP\ZY-609400-02-HS           19

yszj.exe                         WORKGROUP\MZ-GK-25                  20

emrproject.exe                   WORKGROUP\HIS-PC                    20

yszj.exe                         WORKGROUP\ZY-609400-02-YS           21

yszj.exe                         WORKGROUP\MZ-GUKE-YS04              21

yszj.exe                         WORKGROUP\MZ-2F-FKYS9               22

yszj.exe                         MSHOME\JZ-WKYS2                     22

yszj.exe                         WORKGROUP\MZ-2F-NK2YS-01            22

w3wp.exe                         WORKGROUP\YINYIQIANZHI1             23

yszj.exe                         WORKGROUP\MZ-2F-WKYS4               23

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-2F-GKYS2               23

yszj.exe                         WORKGROUP\MZ-2F-FKYS6               24

yszj.exe                         WORKGROUP\MZ-2F-WKYS6               25

yszj.exe                         WORKGROUP\MZ-XEK7                   25

yszj.exe                         WORKGROUP\MZ-3F-EBYHYS2             25

yszj.exe                         WORKGROUP\MZ-1F-EKYS8               26

yszj.exe                         WORKGROUP\MZ-3F-ZYYS-8              26

yszj.exe                         WORKGROUP\MZ-2F-FKYS3               27

yszj.exe                         WORKGROUP\MZ-2F-WKYS2               27

yszj.exe                         WORKGROUP\MZ-2F-WKYS3               28

yszj.exe                         WORKGROUP\MZ-2F-NK2YS2              28

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-2F-NK2-YS09            29

oracle@cxemrdb01 (TNS V1-V3)     cxemrdb01                           30

yszj.exe                         WORKGROUP\MZ-2F-NK2-9               30

yszj.exe                         WORKGROUP\MZ-3F-ZYYS2               30

yszj.exe                         WORKGROUP\MZ-3F-EBHYS6              30

yszj.exe                         WORKGROUP\MZ-2F-GKYS21              31

yszj.exe                         WORKGROUP\MZ-WAIKE-YS12             33

yzxt.exe                         WORKGROUP\ZY-7D-HS3                 33

yszj.exe                         WORKGROUP\MZ-2F-FKYS5               34

yszj.exe                         WORKGROUP\MZ-2F-NK1YS3              34

yszj.exe                         WORKGROUP\DESKTOP-VI7O6SC           36

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-2F-WKYS1               38

yszj.exe                         WORKGROUP\MZ-NK2-07                 39

yszj.exe                         WORKGROUP\MZ-2F-FKYS11              42

yszj.exe                         WORKGROUP\MZ-2F-NK1YS5              43

yszj.exe                         WORKGROUP\JZ-EK-001                 45

yszj.exe                         WORKGROUP\LHMZ_YS2                  46

yszj.exe                         WORKGROUP\MZ-2F-NK1YS2              50

w3wp.exe                         WORKGROUP\HIS-YIYAO                 50

yszj.exe                         WORKGROUP\MZ-2F-WKYS11              56

yszj.exe                         WORKGROUP\LHMZ_YS1                  56

yszj.exe                         WORKGROUP\MZ-2F-NK1YS14             58

 

PROGRAM                          MACHINE                       COUNT(*)

——————————– ————————–  ———-

yszj.exe                         WORKGROUP\MZ-2F-NK1YS13             62

yszj.exe                         WORKGROUP\JZ-NK-2                   64

yszj.exe                         WORKGROUP\MZ-2F-WKYS007             66

yszj.exe                         WORKGROUP\MZ-2F-FKYS4               66

yszj.exe                         WORKGROUP\MZ-2F-NKYS7               69

yszj.exe                         WORKGROUP\MZ-2F-NK1YS4             103

yszj.exe                         WORKGROUP\MZ-PAOZHEN-01            108

oracle@lis-server (TNS V1-V3)    lis-server                         136

oracle@cxemrdb01 (TNS V1-V3)     cxemrdb01                          395

查询得知,yszj.exe应用程序对HIS数据库建立了较高数量的连接,其中主机名为“MZ-2F”开头的客户端尤为明显。

 

 

 

 

 

 

 

 

 

 

 

 

三、问题处理步骤

1. kill session

查询yszj.exe应用程序的进程ID

SQL> select b.spid

2  from v$session a,v$process b

3  where a.paddr=b.addr

4  and a.program=’yszj.exe’;

5  and a.machine like ‘MZ-2F%’;

 

SPID

————————

70593

72017

76189

77305

78409

78481

78636

78676

79541

80183

80816

杀会话

[oracle@cxhisdb01 ~]$ kill -9 70593

[oracle@cxhisdb01 ~]$ kill -9 72017

[oracle@cxhisdb01 ~]$ kill -9 76189

[oracle@cxhisdb01 ~]$ kill -9 77305

[oracle@cxhisdb01 ~]$ kill -9 78409

[oracle@cxhisdb01 ~]$ kill -9 78481

[oracle@cxhisdb01 ~]$ kill -9 78636

[oracle@cxhisdb01 ~]$ kill -9 78676

[oracle@cxhisdb01 ~]$ kill -9 79541

[oracle@cxhisdb01 ~]$ kill -9 80183

[oracle@cxhisdb01 ~]$ kill -9 80816

执行kill session后,效果不明显,因为会话查杀后,又有新的会话连接进来。

2. 客户沟通

通过与客户沟通了解到,YZSJ.EXE程序新上了检查报告互认功能,会导致程序连接不释放问题,导致程序到数据库连接数不断增加,从而数据库连接数达到上限。

3. 释放进程

客户安排前台业务程序主机重启,释放连接进程,业务端主机重启后,检查连接情况如下:

SQL> select inst_id,count(*) from gv$session group by inst_id;

 

INST_ID   COUNT(*)

———- ———-

1       3427

2        681

 

SQL> select program,machine,count(*)

2  from gv$session  where program=’yszj.exe’ group by program,machine order by 3;

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\XZ-YGK-07                     1

yszj.exe        WORKGROUP\YJ-NKJ-MZFT                   1

yszj.exe        WORKGROUP\MZ-3F-WKYS4                   2

yszj.exe        WORKGROUP\RLS                           2

yszj.exe        WORKGROUP\MZ-3F-KQKYS2                  2

yszj.exe        WORKGROUP\MZ-2F-GKYS3                   2

yszj.exe        WORKGROUP\MZ-FWT-01                     2

yszj.exe        WORKGROUP\FDHJ54365                     2

yszj.exe        WORKGROUP\MZ-2F-WKYS2                   2

yszj.exe        WORKGROUP\MZ-3F-KQK7                    2

yszj.exe        WORKGROUP\MZ-2F-WKYS9                   2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-2F-NK1-YS6                 2

yszj.exe        WORKGROUP\MZ-2F-FKYS9                   2

yszj.exe        WORKGROUP\MZ-2F-GKYS2                   2

yszj.exe        WORKGROUP\MZ-2F-WKYS6                   2

yszj.exe        WORKGROUP\CXYY-YZ2                      2

yszj.exe        WORKGROUP\MZ-2F-FKYS13                  2

yszj.exe        WORKGROUP\MZ-KQK-YS-1                   2

yszj.exe        WORKGROUP\MZ-2F-NK1YS13                 2

yszj.exe        WORKGROUP\ZY-XYJHZX-01-YS               2

yszj.exe        WORKGROUP\MZ-2F-WKYS8                   2

yszj.exe        WORKGROUP\MZ-2F-FKYS4                   2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-2F-NK1YS4                  2

yszj.exe        WORKGROUP\MZ-2F-WKYS3                   2

yszj.exe        WORKGROUP\MZ-2F-NKYS7                   2

yszj.exe        WORKGROUP\MZ-3F-WKYS5                   2

yszj.exe        WORKGROUP\MZ-2F-WKYS007                 2

yszj.exe        WORKGROUP\GAMZ-01                       2

yszj.exe        WORKGROUP\MZ-WJ3                        2

yszj.exe        WORKGROUP\MZ-3F-PFYS3                   2

yszj.exe        WORKGROUP\ZY-609400-03-HS               2

yszj.exe        WORKGROUP\MZ-2F-WKYS1                   2

yszj.exe        WORKGROUP\ZY1-4X-YS5                    2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-2F-FKYS7                   2

yszj.exe        WORKGROUP\MZ-3F-TJWKYS1                 2

yszj.exe        WORKGROUP\ZY-601700-05-YS               2

yszj.exe        WORKGROUP\MZ-PICC-02                    2

yszj.exe        WORKGROUP\JZ-EK-001                     2

yszj.exe        WORKGROUP\FSK-SSZX-02                   2

yszj.exe        WORKGROUP\ZY1-5D-YS4                    2

yszj.exe        WORKGROUP\MZ-3F-KFYJT                   2

yszj.exe        WORKGROUP\ZY-XSR-YS5                    2

yszj.exe        WORKGROUP\MZ-ZY-YS9                     2

yszj.exe        WORKGROUP\MZ-3F-EBYHYS2                 2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-2F-FKYS15                  2

yszj.exe        WORKGROUP\MZ-2F-FK12                    2

yszj.exe        WORKGROUP\MZ-2F-NK1YS5                  2

yszj.exe        WORKGROUP\MZ-WK-SKZK                    2

yszj.exe        WORKGROUP\MZ-2F-NK1YS-17                2

yszj.exe        WORKGROUP\MZ-2F-FKYS10                  2

yszj.exe        WORKGROUP\MZ-2F-GK24                    2

yszj.exe        WORKGROUP\MZ-2F-NK2YS10                 2

yszj.exe        WORKGROUP\MZ-KOUQIANG-Y11               2

yszj.exe        WORKGROUP\JZ-EK-YS-06                   2

yszj.exe        WORKGROUP\MZ-NKJ-01                     2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-2F-NK2-9                   2

yszj.exe        WORKGROUP\MZ-3F-ZYYS2                   2

yszj.exe        WORKGROUP\MZ-3F-EBHYS6                  2

yszj.exe        MSHOME\JZ-QJ-04                         2

yszj.exe        WORKGROUP\MZ-3F-YKYS2                   2

yszj.exe        WORKGROUP\MZ-3F-ZY-YS3-                 2

yszj.exe        WORKGROUP\MZ-2F-NK1YS2                  2

yszj.exe        WORKGROUP\MZ-KOUQIANG-Y13               2

yszj.exe        WORKGROUP\MZ-3F-ZYYS10                  2

yszj.exe        WORKGROUP\JZ-EBH-01                     2

yszj.exe        WORKGROUP\JZ-GK-YS1                     2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-NK1-YS-017                 2

yszj.exe        WORKGROUP\MZ1F-EKYS7                    2

yszj.exe        WORKGROUP\JZ-NK-2                       2

yszj.exe        WORKGROUP\MZ-2F-FKYS11                  2

yszj.exe        WORKGROUP\MZ-2F-FKYS6                   2

yszj.exe        WORKGROUP\MZ-2F-FKYS-01                 2

yszj.exe        WORKGROUP\MZ-2F-NK1YS3                  2

yszj.exe        WORKGROUP\ZY2-11F-YS2                   2

yszj.exe        WORKGROUP\MZ-WAIKE-YS12                 2

yszj.exe        WORKGROUP\MZ-2F-NK2-YS09                2

yszj.exe        WORKGROUP\MZ-FWT-03                     2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-2F-NK2YS6                  2

yszj.exe        WORKGROUP\MZ-1F-EKYS6                   2

yszj.exe        WORKGROUP\JZ_XIONGTONG                  2

yszj.exe        WORKGROUP\MZ-3F-KQYS6                   2

yszj.exe        WORKGROUP\MZ-3F-YK13                    2

yszj.exe        WORKGROUP\ZY-TXBQ-YYK                   2

yszj.exe        WORKGROUP\LXZL-205                      2

yszj.exe        WORKGROUP\MZ-NK-11-YS                   2

yszj.exe        WORKGROUP\MZ-3F-PFK-2                   2

yszj.exe        WORKGROUP\MZ-NK-YS8                     2

yszj.exe        WORKGROUP\MZ-2F-NK2YS2                  2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-3F-KQKYS4                  2

yszj.exe        WORKGROUP\MZ-2F-NK2YS-01                2

yszj.exe        WORKGROUP\MZ-2F-NK1YS14                 2

yszj.exe        WORKGROUP\MZ-3F-ZYYS1                   2

yszj.exe        WORKGROUP\FRMZ-YS-03                    2

yszj.exe        WORKGROUP\MZ-NKZJ-01                    2

yszj.exe        WORKGROUP\MZ-2F-NK2YS3                  2

yszj.exe        WORKGROUP\MZ-2F-NK2YS05                 2

yszj.exe        WORKGROUP\MZ-XEK7                       2

yszj.exe        WORKGROUP\MZ-ZJNK-YS8-A                 2

yszj.exe        WORKGROUP\MZ-ZHFW-05                    2

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-2F-FKYS3                   2

yszj.exe        WORKGROUP\ZY2-6F-ZLNKYS01               2

yszj.exe        WORKGROUP\MZ-FWT-04                     2

yszj.exe        WORKGROUP\FRMZ_YS2                      2

yszj.exe        WORKGROUP\MZ-3F-WKYS3                   2

yszj.exe        WORKGROUP\JZ-QJS-YS3                    2

yszj.exe        WORKGROUP\ZY-703700-05-YS               2

yszj.exe        WORKGROUP\TSZX-02                       3

yszj.exe        WORKGROUP\MZ-3F-KQXFS1                  3

yszj.exe        WORKGROUP\DESKTOP-VI7O6SC               3

yszj.exe        WORKGROUP\ZY-600400-06-YS               3

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\FSK-DPS-08                    4

yszj.exe        WORKGROUP\MZ-3F-ZYYS5                   4

yszj.exe        MSHOME\JZ-WKYS01                        4

yszj.exe        WORKGROUP\SF-YWS-YS1                    5

yszj.exe        WORKGROUP\MZ-3F-ZYYS4                   5

yszj.exe        WORKGROUP\ZY1-1X-YS                     5

yszj.exe        WorkGroup\USER-2021SXCVV                5

yszj.exe        WORKGROUP\MZ2-NJZX-001                  5

yszj.exe        WORKGROUP\ZY-GK3-HS4                    6

yszj.exe        WORKGROUP\MZ-GB-CDMZ                    6

yszj.exe        WORKGROUP\MZ-2F-NK2YS11                 8

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-3L-ZYNK-007                9

yszj.exe        WORKGROUP\MZ-3F-ZJK1                   11

yszj.exe        WORKGROUP\MZ-3F-KQKYS5                 14

yszj.exe        WORKGROUP\HIS-PC                       15

yszj.exe        WORKGROUP\MZ-2F-WKYS10                 16

yszj.exe        WORKGROUP\GRK-GBMZ-01                  16

yszj.exe        WORKGROUP\MZ-PIFU-YS3                  18

yszj.exe        WORKGROUP\ZY-609400-02-YS              21

yszj.exe        WORKGROUP\MZ-3F-KQKYS3                 22

yszj.exe        WORKGROUP\MZ-PAOZHEN-01                23

yszj.exe        WORKGROUP\MZ-GUKE-YS04                 23

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\MZ-FR-EK-1                   24

yszj.exe        WORKGROUP\ZY-609400-02-HS              25

yszj.exe        WORKGROUP\MZ-3F-PFKYS24                25

yszj.exe        WORKGROUP\MZ-2F-WKYS5                  34

yszj.exe        WORKGROUP\MZ-3F-YKYS4                  34

yszj.exe        WORKGROUP\MZ-2F-WKYS4                  35

yszj.exe        WORKGROUP\MZ-1F-EKYS8                  35

yszj.exe        MSHOME\JZ-WKYS2                        35

yszj.exe        WORKGROUP\MZ-3F-ZYYS-8                 37

yszj.exe        WORKGROUP\MZ-1F-EKYS4                  45

yszj.exe        WORKGROUP\LHMZ_YS2                     64

 

PROGRAM         MACHINE                          COUNT(*)

————— —————————— ———-

yszj.exe        WORKGROUP\LHMZ_YS1                     78

yszj.exe        WORKGROUP\MZ-2F-WKYS11                 87

 

145 rows selected.

通过查询得知,前端业务主机重启后,释放了一部分进程,节点1连接数已经下降至3400+,前端业务连接正常,未发现ORA-12516错误。

 

 

总结与后续处理建议

1. 问题总结

YZSJ.EXE应用程序通过192.2.1.10(节点1的VIP)连接HIS数据库,由于YZSJ.EXE程序新上了检查报告互认功能,有程序连接不释放问题,导致程序到数据库连接数不断增加,数据库连接数(4500)达到上限,从而导致前端业务连接数据库失败,出现ORA-12516:TNS:监听程序找不到符合协议堆栈要求的可用处理程序。

2. 处理操作

  • 查杀会话,释放进程资源 ==>效果不明显,会话杀掉后程序会自动重连
  • 重启业务前端主机,释放资源 ==>暂时缓解,数据库连接数已经下降至3400+
  • 修复exe应用程序  ==>彻底解决,预计中午业务研发部门对yszj.exe做修复

3. 后续建议

此次故障发生的根本原因是由于YZSJ.EXE程序更新了检查报告互认功能导致的程序连接不释放导致数据库连接数达到上限,从而导致前端业务无法连接数据库,针对此次发生的故障,建议点如下:

  • 业务测试,业务更新上线前,在开发库或测试环境做充分的业务功能性测试和性能测试;
  • 调整数据库连接数参数,当前数据库的process参数为4500,可根据实际需要适当调整增大数据库的process参数,调整此参数需要重启数据库生效,因此需要安排停机时间。

 

Oracle 连接异常处理故障一例

概述

某医院与2021/10/22上午10点48分左右,出现应用程序无法响应,前台业务无法办理等情况。通过远程的方式接入数据库进行故障排除,快速解决了问题使得业务恢复正常使用。

问题及相关日志分析

1. 查看数据库日志

Fri Oct 22 10:27:29 2021

Errors in file /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/trace/emrdb1_ora_98357.trc  (incident=677598):

ORA-04031: ?T·¨·??? 4000 ×??úμ?12?í?ú′? (“shared pool”,”SELECT /*+ index(BL_BINGLIJS…”,”sga heap(4,0)”,”kglsim heap”)

Incident details in: /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/incident/incdir_677598/emrdb1_ora_98357_i677598.trc

Fri Oct 22 10:27:31 2021

Errors in file /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/trace/emrdb1_ora_46120.trc  (incident=675174):

ORA-04031: ?T·¨·??? 3896 ×??úμ?12?í?ú′? (“shared pool”,”select jilunr from bl_bingli…”,”sga heap(6,0)”,”kglsim object batch”)

Incident details in: /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/incident/incdir_675174/emrdb1_ora_46120_i675174.trc

Fri Oct 22 10:27:31 2021

Dumping diagnostic data in directory=[cdmp_20211022102731], requested by (instance=1, osid=98357), summary=[incident=677598].

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Fri Oct 22 10:27:33 2021

Sweep [inc][677598]: completed

Sweep [inc][675174]: completed

Sweep [inc2][677598]: completed

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Fri Oct 22 10:27:36 2021

Errors in file /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/trace/emrdb1_ora_85236.trc  (incident=675246):

ORA-04031: ?T·¨·??? 3896 ×??úμ?12?í?ú′? (“shared pool”,”update BL_BINGLIJLNR set Val…”,”sga heap(3,0)”,”kglsim object batch”)

Incident details in: /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/incident/incdir_675246/emrdb1_ora_85236_i675246.trc

Fri Oct 22 10:27:37 2021

Errors in file /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/trace/emrdb1_ora_81262.trc  (incident=675606):

ORA-04031: unable to allocate 4000 bytes of shared memory (“shared pool”,”DELETE BL_BINGANSY WHERE BIN…”,”sga heap(1,0)”,”kglsim heap”)

Incident details in: /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/incident/incdir_675606/emrdb1_ora_81262_i675606.trc

Fri Oct 22 10:27:40 2021

Errors in file /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/trace/emrdb1_ora_36572.trc  (incident=674934):

ORA-04031: ?T·¨·??? 3896 ×??úμ?12?í?ú′? (“shared pool”,”update BL_BINGCHENGJLNR set …”,”sga heap(2,0)”,”kglsim object batch”)

Incident details in: /u01/app/oracle/diag/rdbms/emrdb_new/emrdb1/incident/incdir_674934/emrdb1_ora_36572_i674934.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

 

通过数据库alert日志发现,数据库在10:27开始发生了大量的ORA-04031错误。

2. 查看操作系统日志

Oct 22 00:06:07 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 00:36:08 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 01:06:08 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 01:21:51 cxemrdb01 auditd[7192]: Audit daemon rotating log files

Oct 22 01:36:17 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 02:08:17 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 02:49:54 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 03:09:42 cxemrdb01 kernel: qla2xxx [0000:0c:00.0]-801c:0: Abort command issued nexus=0:1:7 —  1 2002.

Oct 22 03:22:03 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 03:22:03 cxemrdb01 rhsmd: In order for Subscription Manager to provide your system with updates, your system must be registered with the Customer Portal. Please enter your Red Hat login t

o ensure your system is up-to-date.

Oct 22 03:35:22 cxemrdb01 kernel: qla2xxx [0000:0c:00.0]-801c:0: Abort command issued nexus=0:0:12 —  1 2002.

Oct 22 03:46:37 cxemrdb01 kernel: qla2xxx [0000:0c:00.0]-801c:0: Abort command issued nexus=0:0:7 —  1 2002.

Oct 22 03:52:03 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 03:56:21 cxemrdb01 kernel: qla2xxx [0000:0c:00.0]-801c:0: Abort command issued nexus=0:1:6 —  1 2002.

Oct 22 03:57:05 cxemrdb01 kernel: qla2xxx [0000:0c:00.0]-801c:0: Abort command issued nexus=0:1:9 —  1 2002.

Oct 22 03:57:48 cxemrdb01 kernel: qla2xxx [0000:0c:00.0]-801c:0: Abort command issued nexus=0:0:14 —  1 2002.

Oct 22 04:00:52 cxemrdb01 auditd[7192]: Audit daemon rotating log files

Oct 22 04:22:04 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 04:54:28 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 05:25:57 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 05:59:49 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 06:35:21 cxemrdb01 auditd[7192]: Audit daemon rotating log files

Oct 22 06:46:06 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 07:17:35 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 07:50:07 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 08:20:08 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 09:02:08 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 09:14:12 cxemrdb01 auditd[7192]: Audit daemon rotating log files

Oct 22 09:35:29 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 10:14:02 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 10:54:50 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 11:26:03 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 11:52:33 cxemrdb01 auditd[7192]: Audit daemon rotating log files

Oct 22 11:56:03 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 12:26:04 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 12:56:20 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 13:27:57 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 13:59:17 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 14:24:03 cxemrdb01 auditd[7192]: Audit daemon rotating log files

Oct 22 14:32:22 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 15:04:22 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 15:38:23 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 16:12:00 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 16:42:24 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 16:58:54 cxemrdb01 auditd[7192]: Audit daemon rotating log files

Oct 22 17:14:01 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Oct 22 17:44:01 cxemrdb01 CLSD: The clock on host cxemrdb01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

操作系统中未发现内存不足等相关报错和告警信息。

3. 监听日志信息分析 

每小时的请求连接数

930 22-OCT-2021 00

797 22-OCT-2021 01

715 22-OCT-2021 02

719 22-OCT-2021 03

631 22-OCT-2021 04

671 22-OCT-2021 05

683 22-OCT-2021 06

1220 22-OCT-2021 07

1789 22-OCT-2021 08

1619 22-OCT-2021 09

2157 22-OCT-2021 10

8656 22-OCT-2021 11

654 22-OCT-2021 12

1085 22-OCT-2021 13

920 22-OCT-2021 14

855 22-OCT-2021 15

954 22-OCT-2021 16

417 22-OCT-2021 17

63 22-OCT-2021 18

10:00 – 10:59内每分钟连接数统计

97 22-OCT-2021 10:00

15 22-OCT-2021 10:01

29 22-OCT-2021 10:02

8 22-OCT-2021 10:03

12 22-OCT-2021 10:04

54 22-OCT-2021 10:05

13 22-OCT-2021 10:06

21 22-OCT-2021 10:07

13 22-OCT-2021 10:08

22 22-OCT-2021 10:09

50 22-OCT-2021 10:10

13 22-OCT-2021 10:11

6 22-OCT-2021 10:12

16 22-OCT-2021 10:13

21 22-OCT-2021 10:14

42 22-OCT-2021 10:15

18 22-OCT-2021 10:16

15 22-OCT-2021 10:17

19 22-OCT-2021 10:18

18 22-OCT-2021 10:19

45 22-OCT-2021 10:20

2 22-OCT-2021 10:21

12 22-OCT-2021 10:22

15 22-OCT-2021 10:23

17 22-OCT-2021 10:24

58 22-OCT-2021 10:25

18 22-OCT-2021 10:26

19 22-OCT-2021 10:27

26 22-OCT-2021 10:28

41 22-OCT-2021 10:29

109 22-OCT-2021 10:30

54 22-OCT-2021 10:31

62 22-OCT-2021 10:32

58 22-OCT-2021 10:33

45 22-OCT-2021 10:34

76 22-OCT-2021 10:35

37 22-OCT-2021 10:36

33 22-OCT-2021 10:37

45 22-OCT-2021 10:38

36 22-OCT-2021 10:39

79 22-OCT-2021 10:40

30 22-OCT-2021 10:41

41 22-OCT-2021 10:42

34 22-OCT-2021 10:43

55 22-OCT-2021 10:44

53 22-OCT-2021 10:45

28 22-OCT-2021 10:46

48 22-OCT-2021 10:47

65 22-OCT-2021 10:48

54 22-OCT-2021 10:49

72 22-OCT-2021 10:50

43 22-OCT-2021 10:51

30 22-OCT-2021 10:52

29 22-OCT-2021 10:53

20 22-OCT-2021 10:54

76 22-OCT-2021 10:55

28 22-OCT-2021 10:56

16 22-OCT-2021 10:57

27 22-OCT-2021 10:58

19 22-OCT-2021 10:59

 

查询得知,数据库未发现有短时间的大量连接。

4. 查看数据库运行信息

SQL> select inst_id,instance_name,to_char(startup_time,’yyyy-mm-dd hh24:mi:ss’) from gv$instance;

 

INST_ID INSTANCE_NAME    TO_CHAR(STARTUP_TIM

———- —————- ——————-

1 emrdb1           2021-10-22 11:11:06

2 emrdb2           2019-12-25 17:39:15

查询得知,emrdb实例子2019年12月25日启动一来,至今运行月1年10个月的时间,1节点实例今日处理故障时重启,在重启前时间与2节点一致。

5. 查看AWR报告

 

如上图,数据库中存在大量的无绑定变量的低效SQL,均占用1MB+的内存空间,列举几个SQL:

8mfvdk3v7xcnu

select * from bl_webwenshujl t left join v_zy_bingrenjbxx b on t.bingrenid = b.BINGRENID where b.DANGQIANBQ = ‘600700’ and b.CHUYUANRQ is null and (( t.mobanid = ‘CXZX003′ and t.zongfen > 44 and t.jilusj=(select max(jilusj) from bl_webwenshujl where bingrenid=t.bingrenid and mobanid=’CXZX003’)) or ( t.mobanid = ‘CXZX008′ and t.zongfen > 11 and t.jilusj=(select max(jilusj) from bl_webwenshujl where bingrenid=t.bingrenid and mobanid=’CXZX008’)))

 

gjv7strrc2mr6

select * from bl_webwenshujl t left join v_zy_bingrenjbxx b on t.bingrenid = b.BINGRENID where b.DANGQIANBQ = ‘600800’ and b.CHUYUANRQ is null and (( t.mobanid = ‘CXZX003′ and t.zongfen > 44 and t.jilusj=(select max(jilusj) from bl_webwenshujl where bingrenid=t.bingrenid and mobanid=’CXZX003’)) or ( t.mobanid = ‘CXZX008′ and t.zongfen > 11 and t.jilusj=(select max(jilusj) from bl_webwenshujl where bingrenid=t.bingrenid and mobanid=’CXZX008’)))

 

7gc2xtvupk227

select * from bl_webwenshujl t left join v_zy_bingrenjbxx b on t.bingrenid = b.BINGRENID where t.mobanid = ‘CXZX002’ and t.zongfen < 13 and b.DANGQIANBQ = ‘601800’ and b.CHUYUANRQ is null and t.jilusj=(select max(jilusj) from bl_webwenshujl where bingrenid=t.bingrenid and mobanid=’CXZX002′)

 

3nh5uf64xy8sa

select * from bl_webwenshujl t left join v_zy_bingrenjbxx b on t.bingrenid = b.BINGRENID where t.mobanid = ‘CXZX002’ and t.zongfen < 13 and b.DANGQIANBQ = ‘600800’ and b.CHUYUANRQ is null and t.jilusj=(select max(jilusj) from bl_webwenshujl where bingrenid=t.bingrenid and mobanid=’CXZX002′)

 

6. 维护人员处理动作

由于检查只有1节点实例出现ORA-04031错误,而2节点实例正常,尝试手工刷新shared pool,但刷新失败。

 

为了尽快恢复业务,保证业务正常运行,对1节点实例进行了重启

Fri Oct 22 11:04:04 2021

Shutting down instance (immediate)

Stopping background process SMCO

Shutting down instance: further logons disabled

Stopping background process QMNC

Fri Oct 22 11:04:06 2021

Stopping background process CJQ0

Stopping background process MMNL

Stopping background process MMON

License high water mark = 1049

Fri Oct 22 11:05:02 2021

All dispatchers and shared servers shutdown

ALTER DATABASE CLOSE NORMAL

Fri Oct 22 11:05:05 2021

SMON: disabling tx recovery

Stopping background process RCBG

SMON: disabling cache recovery

 

总结与后续处理建议

1. 问题总结

  • 数据库2节点实例运行正常,1节点实例出现了ORA-04031错误;
  • 数据库自2019/12/25运行至今已达1年10个月之久,内存中存在大量的内存碎片;
  • 数据库中存在较多的无绑定变量SQL,无绑定变量的SQL均需要硬解析,消耗shared pool内存资源;
  • 重启数据库后,内存资源释放,内存刷新,内存碎片问题消失,业务恢复正常。

7. 处理操作

此次处理过程中,通过手工刷新shared pool失败,进而通过重启1节点实例的方式释放内存资源、重整内存碎片后,ORA-04031问题得以解决。

典型的Ora-04031错误 – 记某人民医院核心系统故障处理纪要

对于在云数据库 (OCI) 创建时 (虚拟机还没有生成) 的问题,只提供第一步的信息就可以。

 

对于其他的问题,需要提供步骤一到步骤三的所有信息。

1. 从 OCI 的控制台搜集数据库(OCI) 的相关信息:

a. Cloud Account Name
b. Region:
Region will be available on the top corner of every page in OCI console
c. Availability Domain:
Availability Domain can be found next to the database system name in OCI console
d. Tenancy ID:
In OCI console, Navigate to Administration –> Tenancy Details, Look OCID under the Tenancy Information
e. Compartment OCID:
In OCI console, Navigate to Identity –> Compartments. Look OCID under the compartment name
f. Database system OCID:
In OCI console, Navigate to Menu –> Bare Metal, VM, and Exadata, Look for Database System OCID near the database system name
g. Database OCID:
In OCI console, navigate to Database System page and click on the problematic database system. In the Database System page, look for the Database OCID under the “Databases”

 

2. 如果数据库 (OCI) 对应的虚拟机可以访问,那么用 root 用户登陆虚拟机,搜集基础架构、代理、数据库和任务详细信息:

基础架构详细信息

 

# sudo -s

# curl -s http://169.254.169.254/opc/v1/instance/ | egrep -v “user_data|ssh_authorized_keys|timeCreated”

代理详细信息

# sudo -s

# rpm -qa | grep dcs
# initctl status initdcsagent
# initctl status initdcsadmin

集群、数据库详细信息

# sudo -s

# Replace the grid home and issue below command
# /u01/app/12.2.0.1/grid/bin/crsctl check crs

# sudo su – oracle

# sqlplus / as sysdba

— Run below SQLs in SQLPLUS prompt

select status from v$instance;
select name, open_mode from v$database;
select banner from v$version where banner like ‘Oracle Database%’;

— if Multitenant

show pdbs;

任务详细信息

# sudo -s

# /opt/oracle/dcs/bin/dbcli describe-component > /tmp/dcs_job_details.log
# /opt/oracle/dcs/bin/dbcli list-databases >> /tmp/dcs_job_details.log
# /opt/oracle/dcs/bin/dbcli list-jobs >> /tmp/dcs_job_details.log
# /opt/oracle/dcs/bin/dbcli describe-job -i <faild_job_ID> >> /tmp/dcs_job_details.log

上传 /tmp/dcs_job_details.log

3. 使用 opc 用户登陆到数据库 (OCI) 对应的虚拟机,并执行如下命令:

# sudo /opt/oracle/dcs/bin/diagcollector.py

Sample Output:
=============[oci@ludatou ~]$ sudo /opt/oracle/dcs/bin/diagcollector.py
Log files collected to :/tmp/dcsdiag/diagLogs-1526004897.zip

Logs are being collected to:
/ludatou/ocidiag/diagLogs-4758698722.zip

该命令会生成 /tmp/dcsdiag/diagLogs-xxxxx.zip 的文件。

用如下命令上传这个文件到 SR:
Note 1547088.2 – How to upload large files to Oracle Support

# curl -T <path_and_filename>” -u “<userID>” https://transport.oracle.com/upload/issue/<sr-number>/

For example:
curl -T /ludatou/ocidiag/diagLogs-4758698722.zip -u ********@oracle.com https://transport.oracle.com/upload/issue/

Oracle cloud 上的云资源诊断信息收集

有一些用户的海外系统上到oracle的云,在海外oracle云上的备份设置以及相关信息查询加入到了工作常态,下面是描述如何在oracle database cloud上检查数据库的备份信息

 

1. 使用’opc’用户登录到主机

2. 切换到’root’用户并使用下面的任意一个命令:

a) 要检查备份的进度,使用下面的bkup_api的命令:
$ /var/opt/oracle/bkup_api/bkup_api bkup_status 

b) 使用bkup_api命令列出可用的备份

$ /var/opt/oracle/bkup_api/bkup_api recover_list

-或者-

c) 使用dbaascli命令列出正常的备份

$ dbaascli orec –args -list

-或者-

d) 使用dbaascli命令来列出长期保留的备份

$dbaascli orec –args -keep -list

3. 检查备份产生的obkup.log

/home/oracle/bkup/<SID>/log/obkup.log

4. 切换至’oracle’用户并连接到RMAN

RMAN target /RMAN> list backup;

-或者-

RMAN> list backup summary;

5. 关于处理了多少bytes的数据,备份持续的时间等更多细节,可以以SYS登陆数据库并检查动态视图V$RMAN_BACKUP_JOB_DETAILS

如何在Oracle Database Cloud上检查数据库备份的信息

optimizer_dynamic_sampling 参数与sql语句执行时的动态采样相关,这关系到sql的执行计划的准确与否。一般在表的统计信息收集准确的情况下执行计划(实际执行的)可以准确的,但有些情况下不一定,比如数据分布不均(要收集柱状图信息)、或使用了临时表(应用程序处于某种目的:如为了提高应用性能使用临时表存放事务的中间结果集或者将子查询作为一个临时表),都有可能执行计划不准确,最终导致sql语句在实际执行计划的产生严重的效率。

下面是收集关于这个参数的一些资料,同时在ORACLE10D的Oracle Database Documentation Library中也有记录(Oracle Database Documentation Library–>Books—>Reference–> Initialization Parameters—>OPTIMIZER_DYNAMIC_SAMPLING,Oracle Database Documentation Library–>Performance Tuning Guide–>14.5.6 Estimating Statistics with Dynamic Sampling–>Dynamic Sampling Levels

这个参数的default value=1(9i) (10g :optimizer_dynamic_sampling parameter=2) 

 

1. Dynamic Sampling (动态采样)

The purpose of dynamic sampling is to improve server performance by determining more accurate selectivity and cardinality estimates.
More accurate selectivity and cardinality estimates allow the optimizer to produce better performing plans.
(动态采样的目的是为了通过更精确的seletivity值cardinality值来提高服务器性能,更精确的seletivity值cardinality值可以让优化器提供更好的执行计划。)

Estimate single-table predicate selectivities when collected statistics cannot be used or are likely to lead to significant errors in estimation.
Estimate table cardinality for tables without statistics or for tables whose statistics are too out of date to trust.
(当没有使用statistics统计信息或者可能导致评估错误的时候,可以提前预估出来单表的selectivities值。
当表没有收集统计信息时,或者表的统计信息过期的时候,可以估算出表的cardinality值。)

2. How Dynamic Sampling Works(动态采样如何工作)

The primary performance attribute is compile time.
Oracle determines at compile time whether a query would benefit from dynamic sampling.
If so, a recursive SQL statement is issued to scan a small random sample of the table’s blocks,
and to apply the relevant single table predicates to estimate predicate selectivities.
The sample cardinality can also be used, in some cases, to estimate table cardinality.
(主要的性能影响被归因于编译时间。ORACLE来判断在编译的时候,动态采样是否对查询是否有意。如果是,那么sql语句会发起
相对应表的快的小部分随机采样,然后应用相关的单表去前瞻性预估相应的selectivities值。

3. When to Use Dynamic Sampling(什么时候使用动态采样)

(1) A better plan can be found using dynamic sampling.
(2) The sampling time is a small fraction of total execution time for the query.
(3) The query will be executed many times.

(1) 使用动态采样可以更好的生成执行计划
(2) 动态采样的时间占查询执行的时间一小部分
(3) 查询语句将被执行许多次

4. How to Use Dynamic Sampling to Improve Performance
   (如何使用动态采样提高性能)

 Level 0: dynamic sampling will not be done.
(动态采样不会收集)

 Level 1: (default value) dynamic sampling will be performed if all of the following conditions are true:
(1) There is more than one table in the query.
(2) Some table has not been analyzed and has no indexes.
(3) The optimizer determines that a relatively expensive table scan would be required for this unanalyzed table.
(默认值,如果如下的条件全部满足的时候,那么动态采样将被执行
(1) 有超过一个表的查询
(2) 一些表没有被分析,而且没有index
(3) 优化器认为这个没有被分析的表会消耗相当昂贵的表扫描资源)

 Level 2: Apply dynamic sampling to all unanalyzed tables.
The number of blocks sampled is the default number of dynamic sampling blocks.
(针对所有没有被分析的表应用动态采样,采样blocks的数量是默认的动态采样的数量)

Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all tables for which standard selectivity estimation used a guess for some predicate
that is a potential dynamic sampling predicate.
The number of blocks sampled is the default number of dynamic sampling blocks.
(根据level2的标准,应用动态采样到所有的表,以及为一些标准selectivity值的表使用一些采样预测,采样blocks的数量是默认的动态采样的数量)

Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all tables that have single-table predicates that reference 2 or more columns.
The number of blocks sampled is the default number of dynamic sampling blocks.
(根据level3的标准,应用动态采样到所有的表,以及一些大于2列的单表的预测。采样blocks的数量是默认的动态采样的数量)

Level 5: Apply dynamic sampling to all tables that meet the Level 4 criteria using 2 times the default number of dynamic sampling blocks.
(根据level4的标准,应用动态采样到所有的表,并且采样blocks的数量是默认的动态采样的数量的2倍)

 Level 6: Apply dynamic sampling to all tables that meet the Level 5 criteria using 4 times the default number of dynamic sampling blocks.
(根据level5的标准,应用动态采样到所有的表,并且采样blocks的数量是默认的动态采样的数量的4倍)

Level 7: Apply dynamic sampling to all tables that meet the Level 6 criteria using 8 times the default number of dynamic sampling blocks.
(根据level6的标准,应用动态采样到所有的表,并且采样blocks的数量是默认的动态采样的数量的8倍)

Level 8: Apply dynamic sampling to all tables that meet the Level 7 criteria using 32 times the default number of dynamic sampling blocks.
(根据level7的标准,应用动态采样到所有的表,并且采样blocks的数量是默认的动态采样的数量的32倍)

Level 9: Apply dynamic sampling to all tables that meet the Level 8 criteria using 128 times the default number of dynamic sampling blocks.
(根据level8的标准,应用动态采样到所有的表,并且采样blocks的数量是默认的动态采样的数量的128倍)

 Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using all blocks in the table.
(根据level9的标准,应用动态采样到所有的表,并且采样表中所有的blocks)

Increasing the value of the parameter results in more aggressive application of dynamic sampling, in terms of both the type of tables sampled (analyzed or unanalyzed) and the amount of I/O spent on sampling.
Dynamic sampling is repeatable if no rows have been inserted, deleted, or updated in the table being sampled.
(增加这个参数的值,从表的采样和I/O消耗的角度来说,动态采样将导致更多资源的征用。
在被采样的表中,即使没有记录被insert, deleted, update,采样的操作仍会被重复。)

举例如下:

SQL> select a,count(*) from t_zft group by a;

A   COUNT(*)
———- ———-
1     100000
2          1
SQL> set linesize 2000
SQL> show parameter dyna

NAME                                 TYPE                   VALUE
———————————— ———————- ——————————
optimizer_dynamic_sampling           integer                2                       –10g默认的方式
SQL> alter system set optimizer_dynamic_sampling=0;                     –改为0

系统已更改。

SQL> alter system flush shared_pool;

系统已更改。

SQL> set autotrace trace exp
SQL> select * from t_zft where a=1;

执行计划
———————————————————-
Plan hash value: 1844927459

—————————————————————————————–
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
—————————————————————————————–
|   0 | SELECT STATEMENT            |           | 27409 |    52M|   231   (0)| 00:00:03 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T_ZFT     | 27409 |    52M|   231   (0)| 00:00:03 |
|*  2 |   INDEX RANGE SCAN          | I_T_ZFT_A | 10964 |       |    97   (0)| 00:00:02 |
—————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

2 – access(“A”=1)

SQL> select * from t_zft where a=2;

执行计划
———————————————————-
Plan hash value: 1844927459

—————————————————————————————–
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
—————————————————————————————–
|   0 | SELECT STATEMENT            |           | 27409 |    52M|   231   (0)| 00:00:03 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T_ZFT     | 27409 |    52M|   231   (0)| 00:00:03 |
|*  2 |   INDEX RANGE SCAN          | I_T_ZFT_A | 10964 |       |    97   (0)| 00:00:02 |
—————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

2 – access(“A”=2)

—-默认情况下都是用索引扫描
SQL> alter system set optimizer_dynamic_sampling=2;             –重新改为2

系统已更改。

SQL> alter system flush shared_pool;

系统已更改。

SQL> select * from t_zft where a=1;

执行计划
———————————————————-
Plan hash value: 1060505680

—————————————————————————
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
—————————————————————————
|   0 | SELECT STATEMENT  |       |   102K|   196M|  9116   (1)| 00:01:50 |
|*  1 |  TABLE ACCESS FULL| T_ZFT |   102K|   196M|  9116   (1)| 00:01:50 |
—————————————————————————

Predicate Information (identified by operation id):
—————————————————

1 – filter(“A”=1)

Note
—–
– dynamic sampling used for this statement
SQL>  select * from t_zft where a=2;

执行计划
———————————————————-
Plan hash value: 1844927459

—————————————————————————————–
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
—————————————————————————————–
|   0 | SELECT STATEMENT            |           |     1 |  2015 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T_ZFT     |     1 |  2015 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_T_ZFT_A |     1 |       |     1   (0)| 00:00:01 |
—————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

2 – access(“A”=2)

Note
—–
– dynamic sampling used for this statement

因为oracle采样了默认的采样,级别为2
Level 2:    Sample all unanalyzed tables referenced in teh query using default sampling amounts(small sample)
也就是说你实际上已经使用了直方图的信息,只是这个信息是执行的时

探讨oracle optimizer_dynamic_sampling 参数与执行计划的关系