Skip to content

Oracle 排障配置与调整

手动升级到 Non-CDB Oracle Database 12c Release 2(12.2)的核对清单

步骤1:升级到数据库 12.2 的升级路径

能够直接升级到 Oracle 12c Release 2 (12.2) 的数据库最小版本:

 

源数据库 目标数据库
11.2.0.3 / 11.2.0.4 12.2.x
12.1.0.1 / 12.1.0.2 12.2.x

以下的数据库版本需要间接升级:

源数据库 升级路径 目标数据库
11.2.0.1 / 11.2.0.2 –> 11.2.0.4 –> 12.2.x
11.1.0.6 / 11.1.0.7 –> 11.2.0.4 –> 12.2.x
10.2.0.2/10.2.0.3/10.2.0.4/10.2.0.5 –> 11.2.0.4 / 12.1.0.2 –> 12.2.x
10.1.0.5 –> 11.2.0.4 / 12.1.0.2 –> 12.2.x
9.2.0.8 –> 11.2.0.3 / 11.2.0.4 –> 12.2.x

 

比如

  • 如果源库是 11.2.0.2 或者 11.1.0.7,那么你需要先升级至 11.2.0.4。
  • 如果源库是 10.2.0.2,10.2.0.3,10.2.0.4,10.2.0.5 或者 10.1.0.5,需要先升级至 11.2.0.4 或者 12.1.0.2。
  • 对于 9.2.0.8 版本的数据库,需要先升级至一个中间版本,比如:9.2.0.8 -> 11.2.0.3 或者 11.2.0.4 -> 12.1。

步骤2:推荐/需要在源库上完成的

 

  • 对源库做备份,冷备份或热备份都可以。
  • 禁用所有自定义的 before/after DDL 类型的触发器,完成升级后再启用它们。
  • 在 11g 数据库上定义的 Data security roles 不能自动转换成 ORAS。所以在升级前,需要删除所有在 11g 数据库上定义的 data security roles。升级后可以使用 Analytic Workspace Manager 12c 重新定义 data security roles。
  • 如果从 11g 升级到 12c 之前未删除 data security roles,那么所有的 data security policies 以及 data security role 都会在 12c 上失效。
  • Timezone 版本应当小于等于目标数据库的 Timezone 版本。
  • 如果源库上已经安装了 APEX 组件,那么升级数据库前需要先在源库上升级 APEX 组件。

步骤3:推荐/需要在目标库上完成的

 

  • 需要先检查您的硬件平台/操作系统是否兼容 12.2。点击此处来确定兼容性。
  • 安装数据库软件 12.2.0.1,并确保没有安装方面的问题。
  • 如果有的话,下载并应用最新的 PSU。
  • 从源库的 ORACLE_HOME/dbs 下拷贝 spfile 或者 pfile 到目标 ORACLE_HOME/dbs。
  • 从参数文件中删除所有废弃的参数。
  • 注意升级到 12.2 需要的最低的参数 COMPATIBLE 值为”11.2.0”,确保参数 COMPATIBLE 值设置为 11.2.0 或者更高。
  • 查看文档 “Patches to apply before upgrading Oracle GI and DB to 12.2.0.1 (Doc ID 2180188.1)” 给出的推荐补丁

步骤 4:检查源库的健康状况

 

  • 执行 dbupgdiag.sql(可以从 note 556610.1 下载这个脚本),并且确认是否有 SYS/SYSTEM 用户下的失效对象或者失效组件。如果存在的话,那么需要在升级前解决这些问题。你可以多次执行 utlrp.sql 来解决问题。如果在这样做之后仍然存在失效对象,那么开一个 SR 来解决这个问题。
  • 多次执行脚本 utlrp.sql 确认数据库中没有失效对象。

 

步骤5:升级前检查

 

清理数据库

 

清空回收站
检查 SYS 及 SYSTEM 用户的失效对象
检查 SYS 及 SYSTEM 用户下的重复对象
检查失效的、必需的、废弃的组件

 

检查物化视图

 

检查所有的物化视图的状态,刷新所有没有刷新的物化视图。
检查物化视图日志的大小,如果物化视图日志的行数非零,那么刷新物化视图。
检查 direct loader 日志以及 PMOP 日志(分区维护操作日志),如果 direct loader log 或者 PMOP 日志非空,那么刷新日志显示的物化视图。
升级数据库前,必须确保所有的物化视图都已经刷新完毕。

 

1. 执行下面的 SQL 查询:

 

SQL> SELECT o.name FROM sys.obj$ o, sys.user$ u, sys.sum$ s WHERE o.type# = 42 AND bitand(s.mflags, 8) =8; 

性能方面

 

保存性能相关指标
检查网络性能
收集优化器统计信息

收集统计信息可以减少停机时间,Oracle建议使用 DBMS_STATS.GATHER_DICTIONARY_STATS 来收集这些统计信息,比如:

 

SQL> EXEC DBMS_STATS.GATHER_DICTIONARY_STATS;

检查时区设置

 

Oracle database 12.2 的默认 time zone 文件版本是 V26。
源库的 time zone 文件版本应该小于或者等于目标库的 time zone 文件版本。如果源库的 time zone 文件版本更高,那么需要升级目标库的 time zone 文件版本来对应源库的 time zone 文件。

 

备份数据库

 

备份数据库,创建 guaranteed flashback restore point。
在升级数据库窗口前应最少测试一下回滚策略。确保回滚策略不仅考虑到升级中,同时也要考虑到升级后的失败。

1)连接到 RMAN

rman “target / nocatalog”
2)执行 RMAN 脚本来备份
RUN
{
ALLOCATE CHANNEL chan_name TYPE DISK;
BACKUP DATABASE FORMAT ‘some_backup_directory%U’ TAG before_upgrade;
BACKUP CURRENT CONTROLFILE FORMAT ‘controlfile location and name’;
}

 

确保升级前所有的文件都没有处于备份模式

 

执行下面的语句:

 

SQL> SELECT * FROM v$backup WHERE status != ‘NOT ACTIVE’; 

 

清空回收站

 

要清空回收站,执行下面的语句:

SQL> PURGE DBA_RECYCLEBIN

注意:升级前务必清空回收站来避免 ORA-00600 错误并且减少升级时间。

 

 

备份 Oracle EM DB Control 配置及数据

 

如果在升级数据库到 12.2 版本后,有需要再降级,那么我们必须在升级前使用 emdwgrd 工具备份 Database Control 的文件,这样在降级后可以恢复这些文件。

 

备份数据的步骤:

 

1. 安装 12.2 的数据库软件。
2. 设置 ORACLE_HOME 到旧的数据库版本。
3. 设置 ORACLE_SID 为要升级的数据库 SID。
4. 设置 PATH, LD_LIBRARY_PATH 和 SHLIB_PATH 到旧的 ORACLE_HOME 相关的目录下。
5. 切换目录到 12c 数据库软件的 ORACLE_HOME/bin。
6. 执行 emdwgrd。

对于单数据库实例 (非 RAC) 运行下面的命令:

 

emdwgrd[sh|bat] -save -sid old_SID -path save_directory

Oracle Real Application Clusters(Oracle RAC)数据库:

需要跨节点远程拷贝。定义一个环境变量 EM_REMCP 来指向远程拷贝的命令,比如:export EM_REMCP /usr/bin/scp

emdwgrd -save -cluster -sid old_SID -path save_directory

7. 输入 SYS 密码。

 

使用 emremove.sql 手工删除 DB control

 

1)关闭 DB control

emctl stop dbconsole

 
2)使用 sysdba 登陆

SQL>SET ECHO ON
SQL>SET SERVEROUTPUT ON
SQL>@emremove.sql >> 脚本位于新的 12c ORACLE_HOME/rdbms/admin

从系统中手工删除 ORACLE_HOME/HOSTNAME_SID/ 和 ORACLE_HOME/oc4j/j2ee/OC4J_DBConsole_HOSTNAME_SID 目录。
如果是 windows 系统则删除 DB Console service OracleDBConsoleSID。

 

删除 JSON-Enabled Context search 索引

 

如果源库版本为 12.1.0.2 并且创建了 JSON search index 那么 Oracle 推荐先删除这些索引,在升级后再创建回来。

 

检查使用大小写不敏感密码的用户

 

使用管理用户登录 SQL*Plus,并且执行下面的语句:

 

SELECT USERNAME,PASSWORD_VERSIONS FROM DBA_USERS;

如果存 在10g 版本的用户,建议根据 Oracle 文档解决这个问题,否则升级后用户会被 LOCK。

 

删除 Unified Auditing Schema and Roles

 

注意:如果在 Oracle 12.1 数据库上已经创建了 AUDIT_ADMIN   或者 AUDIT_VIEWER 用户或者 roles,或者数据库是在 12.1 版本上创建(而不是升级上来的),那么不需要删除这些 role 和 AUDSYS 用户。

更多信息请参照 URL

 

https://docs.oracle.com/database/122/UPGRD/unified-auditing-audit_admin-audit_viewer-changes.htm#UPGRD60010

 

使用 SYS 以 SYSDBA 权限登录 SQL*Plus,删除 AUDSYS,如果存在的话。以 migrate 模式启动数据库并删除 AUDSYS 用户

 

SQL> startup migrate pfile=$T_WORK/t_init1.ora
ORACLE instance started.
SQL> drop user audsys cascade;

删除 AUDIT_ADMIN 和 AUDIT_VIEWER roles

DROP ROLE AUDIT_ADMIN;
DROP ROLE AUDIT_VIEWER;

 

在升级过程中把某些 schema 的表空间置于 offline

 

记下所有在升级过程中需要 offline 的表空间,使用 -T 选项指定表空间的名字

 

dbupgrade –T

从 Oracle database 12.2 开始,可以在并行升级时使用 -T 参数来 offline 一些用户表空间。把用户表空间 offline 可以减少升级前的备份工作。并行升级工具(catctl.pl)可以自动选取正确的表空间来设置为只读。这个工具不会把任何包含Oracle 自带的对象的表空间设置为只读。

 

保留降级的能力

 

如果计划把数据库降级到之前的版本,那么需要在源库上打 patch 20898997,否则不能降级。

在源库执行:

$ORACLE_HOME/OPatch/opatch lsinventory -bugs_fixed |grep –i “20898997”

 
如果尚未应用这个补丁,那么从 MOS 下载 patch 20898997 并安装。

 

关于 Audit table 的升级前要求

 

 

如果要升级的数据库版本低于 12.1 并且使用了 Oracle database Vault,Oracle Label Security,那么必须先执行 olspreupgrade.sql。从目标 Oracle_home 拷贝 $ORACLE_HOME/rdbms/admin/olspreupgrade.sql 到源库的ORACLE_HOME,使用 DVOWNER 登陆源库

 

SQL> GRANT DV_PATCH_ADMIN to SYS;

在把 DV_PATCH_ADMIN 权限赋给 SYS 后,使用 SYSDBA 登陆执行 olspreupgrade.sql

 

SQL>ORACLE_HOME/rdbms/admin/olspreupgrade.sql

执行完毕后,以 DVOWNER 登陆数据库并收回 SYS 的 DV_PATCH_ADMIN 权限

 

SQL> REVOKE DV_PATCH_ADMIN from SYS;

 

集群数据库的需要

 

在升级数据库前需要先升级 GI 软件。如果是 RAC,那么需要把参数文件中的 CLUSTER_DATABASE 参数设置为 false。

注意!在升级完毕后需要把CLUSTER_DATABASE再设置回来

 

其它检查

 

1. 确保 ARCHIVE_LOG 以及 FLASHBACK 目录有足够的磁盘空间。

2. 执行下面的查询来检查是否有和SDO_GEOMETRY关联的表

 

col owner format a15
col table_name format a30
col column_name format a30
SELECT owner,table_name,column_name FROM dba_tab_columns WHERE data_type = ‘SDO_GEOMETRY’ AND owner != ‘MDSYS’ ORDER BY 1,2,3;

如果有返回行数,那么在升级前需要往12.2的ORACLE HOME上打补丁 patch 25293022

 

步骤6:Preupgrade 步骤

 

在源库执行 Preupgrade 脚本

 

$Earlier_release_Oracle_home/jdk/bin/java -jar $New_release_Oracle_home/rdbms/admin/preupgrade.jar FILE TEXT DIR output_dir

 

FILE – 使用这个参数把输出写入输出文件
TEXT – 使用这个参数指定日志格式为 TEXT 模式(如果不指定的话则为 XML 格式)
DIR – 日志会创建在<output_dir>指定的这个目录中

 

建议执行 pre-upgrade 的 fixup 脚本,如果发现的问题是可以使用这个脚本修复的话。

 

Network Utility 包的依赖关系

 

执行下面的语句

 

SQL> SELECT * FROM DBA_DEPENDENCIES WHERE referenced_name IN (‘UTL_TCP’,’UTL_SMTP’,’UTL_MAIL’,’UTL_HTTP’,’UTL_INADDR’,’DBMS_LDAP’) AND owner NOT IN (‘SYS’,’PUBLIC’,’ORDPLUGINS’);

 

在升级测试中,确保使用新的访问控制。在升级后确保这些包是可用的,在升级后,根据源库的使用情况赋予正确的权限。

 

检查 Time zone 文件版本

 

检查目标数据库的 time zone 文件版本是否低于源库的 time zone 文件版本,如果是的话,需要升级目标数据库的 time zone 文件版本。数据库 DST 补丁可以从 Note 412160.1 下载。

 

步骤7:升级数据库到 12.2

 

设置环境变量指向目标 ORACLE_HOME

 

export ORACLE_HOME=<Oracle 12.2的目录>
export PATH=$ORACLE_HOME/bin:$PATH
export ORACLE_BASE=<安装时指定的Oracle_Base目录>

 

使用目标 ORACLE_HOME(设置 ORACLE_HOME 为目标 ORACLE_HOME)启动数据库到 upgrade 模式

 

CONNECT / AS SYSDBA
SQL> startup upgrade;
SQL> exit

在 Linux/Unix 上

cd $ORACLE_HOME/bin
./dbupgrade 

在 Windows 上

cd %ORACLE_HOME%\bin
dbupgrade

执行 Post-Upgrade Status 工具,utlu122s.sql 并且检查升级的日志。在新的版本下执行 Post-Upgrade Status 工具。

 

$ sqlplus “/as sysdba”
SQL> STARTUP
SQL> @utlu122s.sql

 

检查升级日志看是否脚本 catuppst.sql 已被执行。如果尚未执行,那么在新的 ORACLE_HOME 里手工执行。这个脚本被放置在 $ORACLE_HOME/rdbms/admin 目录。

SQL> @catuppst.sql

在另一个 session 里执行 utlrp.sql 来编译 stored PL/SQL 和 Java 代码。

SQL> @utlrp.sql

检查诊断升级/迁移相关的状态的 Oracle 数据字典。dbupgdiag.sql 脚本可以收集和升级迁移诊断信息有关的数据字典的信息,可以在升级的数据库上以 SYS 用户来执行它,关于更多信息,请参考文档 Note 556610.1 Script to Collect DB Upgrade/Migrate Diagnostic Information (dbupgdiag.sql)

如果脚本 dbupgdiag.sql 发现了失效对象,执行 $ORACLE_HOME/rdbms/admin/utlrp.sql (多次) 来使它们生效,直到失效对象的个数不再改变。之后重新执行 dbupgdiag.sql 并确保没有任何问题。

如果使用了集群,那么必须升级这个数据库的 Oracle Clusterware keys,运行 srvctl 来完成,比如:

ORACLE_HOME/bin/srvctl upgrade database -db name -o ORACLE_HOME

 

步骤8:升级后步骤

 

在 Linux 和 Unix 上设置环境变量

 

确保下面的环境变量指向了新的 ORACLE_HOME 对应的目录:

ORACLE_HOME
PATH

 

更新 oratab 文件

 

修改 /etc/oratab 文件对应的条目指向新的 ORACLE_HOME 目录

 

Post-upgrade fixup 脚本

 

执行 pre-upgrade 产生的 post-upgrade fixup 脚本

 

升级依赖 Oracle-Maintained 类型的表

 

从 Oracle 12.2 开始,必须手工升级依赖 Oracle-Maintained 类型的用户表。

在数据库升级后确认需要升级的用户表,使用 SYSDBA 连接到数据库并执行下面的语句来列出这些表:

 

COLUMN owner FORMAT A30
COLUMN table_name FORMAT A30
SELECT DISTINCT owner, table_name
FROM dba_tab_cols
WHERE data_upgraded = ‘NO’
ORDER BY 1,2;

使用 sysdba 权限的用户,或者有权限 alter 所有这些表的用户执行脚本 utluptabdata.sql:

 

SET SERVEROUTPUT ON
@utluptabdata.sql

 

启用新的 Extended Data Type 功能(并不是必须的)

 

启用 extended data types 需要一些特定的操作。

Oracle 数据库 12c 引入了扩展 VARCHAR2,NVARCHAR2,以及 RAW 数据类型的大小的功能。设置 MAX_STRING_SIZE = EXTENDED 可以扩展这些数据类型最大限制到32767字节。

要实现这个功能,初始化参数 COMPATIBLE 要设置为 12.0.0.0 或者更高。 关于更多信息,请参考Note 1570297.1

 

升级 Recovery Catalog

 

如果 recovery catalog schema 比要备份的数据库版本低,那么必须升级它。可以使用 UPGRADE CATALOG 命令来升级。

请参 照Oracle 文档的”Upgrading the Recovery Catalog” 部分来得到更多信息。

 

在升级数据库后升级 Time Zone 文件版本

 

如果 Pre-Upgrade Information 工具要求在升级数据库后升级 time zone 文件版本,那么需要使用 DBMS_DST PL/SQL 包来更新 RDBMS DST (timezone) 版本。

参照 Oracle 文档的”Steps to Upgrade Time Zone File and Timestamp with Time Zone Data” 部分以及 Note 1509653.1 “Updating the RDBMS DST version in 12c Release 1 (12.1.0.1 and up) using DBMS_DST”

 

升级统计信息表

 

如果之前使用 DBMS_STATS.CREATE_STAT_TABLE 创建了一些统计信息表,那么使用 DBMS_STATS.UPGRADE_STAT_TABLE 来升级这些表。在下面的例子里,统计信息表的名字是’dictstattab’,而 SYS 是这个表的 owner。

EXECUTE DBMS_STATS.UPGRADE_STAT_TABLE(‘SYS’, ‘dictstattab’);

对所有的统计信息表都执行类似上面的语句。

 

升级外部验证的 SSL 用户

 

如果是从 9.2 或者 10.1 的数据库升级上来的,并且使用了外部验证的 SSL 用户,那么需要执行 SSL external users conversion (extusrupgrade) 脚本来升级这些用户。

ORACLE_HOME/rdbms/bin/extusrupgrade –dbconnectstring hostname:port_no:sid –dbuser db_admin –dbuserpassword password -a

 

在升级数据库后安装 Oracle Text Supplied Knowledge Bases

 

Oracle Text-supplied knowledge bases 是数据库 12c 的扩展产品的一部分,并不会在升级后立刻可用。所有依赖这个产品的 Oracle Text 功能在升级后都会不可用。如需重新启用这些功能,需要从安装介质安装 Oracle Text supplied knowledge bases。

 

升级数据库后更新 Oracle Application Express Configuration

 

如果要升级的数据库包含了 Oracle Application Express release 3.2 或更高版本,那么不需要做任何额外的操作。但是如果 Oracle Application Express 存在 registry里,那么 Oracle Application Express 是从低版本升级而来,那么需要设置open_cursors 参数最小为200。

 

对额外的 Network Services 配置访问控制列表(ACLs)

 

如果应用程序使用了 UTL_TCP, UTL_SMTP, UTL_MAIL, UTL_HTTP, 或者 UTL_INADDR 包,那么升级后需要配置这些包的网络访问控制列表 (ACLs)来让它们像之前版本一样正常工作。如果没有正确配置访问控制列表 (ACLs),应用程序会碰到错误”ORA-24247: network access denied by access control list (ACL).”

 

检查参数 SQLNET.ALLOWED_LOGON_VERSION

 

10g 之前的客户端连接到 12c 数据库会碰到 ORA-28040: No matching authentication protocol 错误,请参考 Oracle 文档来解决 ORA-28040: No matching authentication protocol 问题。

手动升级到 Non-CDB Oracle Database 12c(12.2)的核对清单

Oracle 12c 升级失败的降级步骤

 

降级前步骤
– XML DB 组件在 12c 中是必需的。
在升级到 12c 期间,将安装 XML DB 组件(如果未安装)。
从 12c 降级将删除安装的 XDB 组件

- Enterprise Manager 不支持降级。在降级之前,请重新配置 Oracle EM 控件。请参阅
Oracle Database Upgrade Guide 12c Release 1 (12.1) E17642-10
6 Downgrading Oracle Database to an Earlier Release
6.6.5 Restoring Oracle Enterprise Manager after Downgrading Oracle Database

- 升级到 12c 期间,将删除 Database Control 资料档案库。降级之后,需重新配置 DB Control。

Note 870877.1 How To Save Oracle Enterprise Manager Database Control Data Before Upgrading The Single Instance Database To Other Release ?
Note 876353.1 How To Restore The Oracle Enterprise Manager Data To Downgrade The Single Instance Database To Previous/Source Release ?
- compatible 参数不能已经更改到 12.1.0。
- 禁用 Data Vault(如果已启用)。
Note 803948.1  How To Uninstall Or Reinstall Database Vault in 11g (UNIX)
Note 453902.1 Enabling and Disabling Oracle Database Vault in WINDOWS
- 如果数据库使用 Oracle Label Security,则在新 Oracle Database 12c Oracle 主目录中运行 Oracle Label Security (OLS) 预处理降级 olspredowngrade.sql 脚本(在 $ORACLE_HOME/rdbms/admin 上提供)。注意!此步骤仅在需要降级到12c之前的版本时才需要
- 时区版本应相同。假设升级数据库的过程中时区版本也被升级了,那么要在升级前在源库打patch。例如11.2.0.4已经升级到12.1.0.2.0,时区作为升级的一部分也升级到了18,那么在降级到11.2.0.4.0之前,在11.2.0.4.0的home上打时区patch 18。
- 取消设置并指向 12c 主目录的 ORA_TZFILE(如果已设置)。
- 如果数据库上有 Oracle Application Express,则必须将 apxrelod.sql 文件从 Oracle Database 12c $ORACLE_HOME/apex/ 目录复制到 Oracle 主目录之外的目录,例如系统上的临时目录以稍后执行。
- 如果基于固定对象创建了对象,则删除这些对象以避免可能的 ORA-00600 错误。您可以在降级之后重新创建这些对象。
- 如果降级集群数据库,则彻底关闭实例并将 CLUSTER_DATABASE 初始化参数更改为 FALSE。降级之后,必须将此参数设置回 TRUE。
- 假设在升级之后你已经在升级的数据库打了patch,那么在降级之前,在数据库级别使用datapatch回滚这些patch。

满足以上先决条件之后,可以继续进行降级。

数据库的降级步骤

1) 确保所有数据库组件有效。只能从成功升级的数据库执行降级。要验证数据库组件状态,请执行以下查询

以 SYS 用户身份连接到数据库

col comp_id format a10
col comp_name format a30
col version format a10
col status format a8

select substr(comp_id,1,15) comp_id,substr(comp_name,1,30) comp_name,substr(version,1,10) version,status from dba_registry

2) 验证没有属于 sys/system 的无效对象

select owner, count(object_name) "Invalid object count" from dba_objects where status!='VALID' and owner in ('SYS','SYSTEM') group by owner;

如果计数为零,则可以继续降级。

如果有无效对象,则执行 utlrp.sql 多次,如果对象无法解析为有效状态,则不能继续降级。建立 SR 或在 DBA 社区上发帖以寻求帮助。

或者,对于 1 和 2,运行以下脚本:

Note 556610.1  Script to Collect DB Upgrade/Migrate Diagnostic Information (dbupgdiag.sql)

3) 关闭数据库

Shutdown immediate

4)  对 12c 数据库做备份

5)  以降级模式启动数据库

Startup downgrade;

6)  执行降级脚本

Sql> Spool downgrade.log

Sql> @$ORACLE_HOME/rdbms/admin/catdwgrd.sql

注:
$ORACLE_HOME 应指向 12c 主目录

catdwgrd.sql 脚本将数据库中的所有组件降级到支持的主版本或补丁集版本(您最初升级时的版本)

Sql> spool off

Sql> shutdown immediate

Exit SQL Plus

Sql> exit;

7) 如果操作系统为 LINUX/UNIX:

将以下环境变量更改为要降级到的源数据库:

ORACLE_HOME

PATH

编辑 /etc/oratab or /var/opt/oracle/oratab 以更改

将数据库映射到源数据库 Oracle 主目录

如果操作系统是 Windows,则完成以下步骤:


a. 停止所有 Oracle 服务,包括 Oracle Database 12c 数据库的 OracleServiceSID Oracle 服务,其中 SID 是实例名称。

例如,如果 SID 为 ORCL,则在命令行提示符中输入以下内容:

C:\> NET STOP OracleServiceORCL
b. 在命令提示符下,通过运行 ORADIM 命令删除 Oracle 服务。如果出现提示,则输入此 Windows 系统上活动标准用户帐户的口令。

例如,如果 SID 为 ORCL,则输入以下命令:

C:\> ORADIM -DELETE -SID ORCL
c. 在命令提示符下,使用 ORADIM 命令创建要降级的数据库的 Oracle 服务。

C:\> ORADIM -NEW -SID SID -INTPWD PASSWORD -MAXUSERS USERS

-STARTMODE AUTO -PFILE ORACLE_HOME\DATABASE\INITSID.ORA

8) 还原配置文件

将配置文件(口令文件、参数文件等)还原到降级版本的 ORACLE_HOME。

9) 如果这是 Oracle RAC 数据库,则执行以下命令以将数据库修改为单实例模式:

SET CLUSTER_DATABASE=FALSE

10) 从降级版本 $ORACLE_HOME/rdbms/admin 目录执行 catrelod 脚本。

启动 sqlplus,以具有 sysdba 权限的用户 SYS 身份连接到数据库实例,然后以升级模式启动数据库:

: cd $ORACLE_HOME/rdbms/admin

: sqlplus

sql> connect sys as sysdba

sql> startup upgrade

sql> spool catrelod.log

sql> @?/rdbms/admin/catrelod.sql

sql> spool off

catrelod.sql 脚本在降级的数据库中重新加载各个数据库组件的合适版本。

11) 运行 utlrp.sql 脚本:

SQL> @utlrp.sql

Sql> exit;

utlrp.sql 脚本重新编译先前处于 INVALID 状态的所有现有 PL/SQL 模块,例如 package、procedure、type 等。

12) 检查已降级数据库的状态:

Note 556610.1  Script to Collect DB Upgrade/Migrate Diagnostic Information (dbupgdiag.sql)
此 sql 脚本是一组查询语句,用在升级前后诊断数据库的状态。脚本将创建名为 db_upg_diag__<时间戳>.log 的文件。

13) 降级之后,可能在 sys 用户下发现无效的 QT 视图。这是因为视图已从基表中选择了错误的列。需要重新创建这些视图。

请参阅说明:

Note 1520209.1 QT_*BUFER Views Invalid after downgrade from 12C

降级后步骤:
1)如果是降级到 Oracle Database 11g 版本 1 (11.1.0.7) 并且数据库中有 Oracle Application Express,则将 apxrelod.sql 脚本复制到的目录(在降级前步骤中)。
运行 apxrelod.sql 脚本以手动重新加载 Oracle Application Express:

SQL> @apxrelod.sql

运行 apxrelod.sql 脚本以避免程序包 APEX_030200.WWV_FLOW_HELP 由于以下错误而成为 INVALID 状态:

PLS-00201: identifier ‘CTX_DDL’ must be declared
2) 如果数据库中启用了 Oracle Label Security,则执行以下脚本

a. 从 Oracle Database 12c 的 Oracle 主目录下将 olstrig.sql 脚本复制到要将数据库降级到的版本的 Oracle 主目录。

b. 从降级到的版本的 Oracle 主目录,运行 olstrig.sql 以在表上使用 Oracle Label Security 策略重新创建 DML 触发器:

SQL> @olstrig.sql

3) 如果降级集群数据库,则必须运行以下命令以降级 Oracle Clusterware database 配置:

$ srvctl downgrade database -d db-unique-name -o oraclehome -t to_version
其中 db-unique-name 是数据库名称(而非实例名称),oraclehome 是已降级数据库的旧 Oracle 主目录的位置,to_version 是数据库所降级到的数据库版本

因为代码的改变,现在-to_version对应的值如下:

RDBMS Version -to_version Value
9.2.0.*              9.2.0.0.0
10.1.0.*            10.0.0.0.0
10.2.0.*            10.2.0.0.0
11.1.0.*            11.0.0.0.0
11.2.0.1            11.2.0.1.0
11.2.0.2            11.2.0.2.0
11.2.0.3            11.2.0.3.0
11.2.0.4            11.2.0.4.0

4) 如果发现失效的SYS/SYETEM的对象,那么可以使用下面的方法来编译这些失效对象

$ sqlplus "/ as sysdba"
SQL> startup upgrade
SQL> spool catout.log
SQL> @?/rdbms/admin/catproc.sql
SQL> @?/rdbms/admin/utlrp.sql
SQL> spool off
SQL> shutdown immediate

5) 如果数据库从11.2.0.4升级到12.1.0.2之前没有安装XDB,那么在降级回11.2.0.4的时候,XDB的对象需要被移除。然而,我们还是可以发现无效的XDB对象残留在数据库中。这已经被记录为一个bug 22854967,patch已经可以下载了。

Invalid XDB-related Objects After Downgrading To 11.2.0.4 (Doc ID 2163596.1)

Oracle 12c 升级失败的降级步骤

Oracle 11g Rman – 08317错误

在一次帮助客户解决归档满的过程中遭遇了此错误. 客户是新上线系统,11g版本.设置了归档清除脚本(脚本参考:http://www.ludatou.com/?p=766),结果发现以往没问题的脚本在此刻居然行不通,报了一对Rman-08517的错误,经查询发现是在11g里归档日志记录的一个机制导致的,必须加上force才可以.

 

命令如下:

delete noprompt force archivelog until time ‘sysdate – 7’;

 

经查询确认为Oracle 11g的归档日志记录机制导致的,需要在delete后加上force,metalink解释为:

RMAN-08137 When deleting archivelogs even when Streams CAPTURE does not require them [ID 1079953.1]
修改时间 27-FEB-2011     类型 PROBLEM     状态 PUBLISHED

In this Document
Symptoms
Changes
Cause
Solution
References

Symptoms

Applies To: 11gR1-11gR2

RMAN is unable to delete archive logs from database even though they are not needed by CAPTURE process anymore.

Changes

 

Cause

In 11g, RMAN looks at MIN_REQUIRED_CAPTURE_CHANGE# in v$database and not at DBA_CAPTURE. By default this value is updated every 6 hours. So by default the value in v$database will be six hours behind the current value.

Solution

Since changing frequency of update to MIN_REQUIRED_CAPTURE_CHANGE# in v$database can not be controlled, following workaround can be used:

Archive log which are not needed by capture process can be deleted using:
delete noprompt force archivelog all completed before ‘sysdate-10/1440’;

关于11g密码大小写验证以及密码延迟验证特性引发的血案所思

在Oracle的11g之前的版本中密码是不区分大小写的(使用双引号强制除外)。在Oracle的11g版本中对此有所增强。从此密码有了大小写的区分,这个大小写敏感特性是通过SEC_CASE_SENSITIVE_LOGON参数来控制的。该参数默认设置为true。

前阵子客户数据从10g迁往11g后,这个参数未修改,新上线第二天全国业务人员输入密码时候参照以前习惯密码全部小写,导致部分业务无法登陆,由于反复尝试登陆导致触发了著名的登陆延迟验证的特性延伸出了library cache lock(Delay after three failed login attempts),导致全线应用响应缓慢假死。当然解决过程当时根据现象,进程达到限制值,做了等待事件模型的分析,发现大量的session都没有成功登陆,很多信息都为null逐判断因为密码延迟验证的原因导致,而密码错误的事情再经过现场验证后发现了大小写敏感的问题。后面就是很熟套的剧情,领导抽了根烟,痛骂了原厂一顿,把此2个特性都关掉后,没有再出现过进程累积的现象。由此问题引发了对密码大小写验证以及密码延迟验证特性的思考.

一:关于sec_case_sensitive_logon的设置

关于sec_case_sensitive_logon参数所关联的密码大小写敏感,我建议在升级数据库系统时候关闭,在新建数据库系统时候关闭。可以配合应用密码策略使用,默认时候可以商讨关闭。

二:密码延迟验证的特性使用与否
关于Delay after three failed login attempts的特性参考yangtingkun的http://blog.itpub.net/4227/viewspace-672925

这个密码延迟特性特性的解决办法有几种,这里我大概描述下:

    1.设置登陆密码验证失败次数超过几次后锁定用户,该办法可以防止用户反复登陆验证,但是也会增加一定的维护工作
    2.通过诊断事件28401关闭密码延迟验证

同时我并不建议用第二种办法,密码延迟验证的特性可以有效遏制恶意破解密码的行为,因此从数据库安全的角度我建议设置密码失败超数锁用户。

关于event 28401

ALTER SYSTEM SET EVENT = '28401 TRACE NAME CONTEXT FOREVER, LEVEL 1' SCOPE = SPFILE

[oracle@ludatou ~]$ oerr ora 28401
28401, 00000, "Event to disable delay after three failed login attempts"
// *Document: NO
// *Cause: N/A
// *Action: Set this event in your environment to disable the login delay
//          which will otherwise take place after three failed login attempts.
// *Note: THIS IS NOT A USER ERROR NUMBER/MESSAGE. THIS DOES NOT NEED TO BE
//        TRANSLATED OR DOCUMENTED.

Ora-00494 / Aiowait timeout cause instance crashed.(SYSTEM DUMP分析案例)

报错如下:

Wed Oct 21 03:06:22 2015
Thread 1 advanced to log sequence 254206 (LGWR switch)
  Current log# 9 seq# 254206 mem# 0: /boss_sysdata/oradata/dtvboss/redo09.log
Wed Oct 21 03:17:10 2015
WARNING: aiowait timed out 1 times
Wed Oct 21 03:22:11 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 367'
Wed Oct 21 03:22:17 2015
System State dumped to trace file /oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc
Wed Oct 21 03:22:50 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_lgwr_365.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 367'
Wed Oct 21 03:22:55 2015
Killing enqueue blocker (pid=367) on resource CF-00000000-00000000
 by killing session 1648.1
Wed Oct 21 03:22:55 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_lgwr_365.trc:
ORA-07445: exception encountered: core dump [ksuklms()+672] [SIGSEGV] [Address not mapped to object] [0x000000062] [] []
Wed Oct 21 03:23:18 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_pmon_343.trc:
ORA-00470: LGWR process terminated with error
Wed Oct 21 03:23:18 2015
PMON: terminating instance due to error 470
Wed Oct 21 03:23:18 2015
ORA-470 encountered when generating server alert SMG-3503
Wed Oct 21 03:23:18 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_j006_22335.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00470: LGWR process terminated with error
Wed Oct 21 03:23:22 2015
Termination issued to instance processes. Waiting for the processes to exit
Wed Oct 21 03:23:28 2015
Instance termination failed to kill one or more processes
Wed Oct 21 03:23:58 2015
Instance terminated by PMON, pid = 343
Wed Oct 21 03:24:06 2015
Shutting down instance (abort)
Wed Oct 21 03:25:18 2015
WARNING: inbound connection timed out (ORA-3136)
Wed Oct 21 03:25:18 2015
WARNING: inbound connection timed out (ORA-3136)
Wed Oct 21 03:25:18 2015

1.宕机的起因分析:

告警日志分析:
Wed Oct 21 03:06:22 2015
Thread 1 advanced to log sequence 254206 (LGWR switch)
Current log# 9 seq# 254206 mem# 0: /boss_sysdata/oradata/dtvboss/redo09.log
Wed Oct 21 03:17:10 2015
WARNING: aiowait timed out 1 times –该时间段系统出现aio等待的告警,该告警由_iowait_timeouts隐含参数控制,超过该参数设置的值后会导致系统宕机,这里并未超过系统设置值。但是提示我们当时系统在io调度上出现了问题。
Wed Oct 21 03:22:11 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by ‘inst 1, osid 367’ –22分17秒系统抛错,进程367持有controlfile的enqueue lock时间被其他进程(arc0)等待超时,该enqueue lock持有时间由隐含参数_controlfile_enqueue_timeout控制,默认为900秒
Wed Oct 21 03:22:17 2015
System State dumped to trace file /oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc
Wed Oct 21 03:22:50 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_lgwr_365.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by ‘inst 1, osid 367’ –22分50秒,进程367持有的controlfile的enqueue lock时间被其他进程(lgwr)等待超时抛错
Wed Oct 21 03:22:55 2015
Killing enqueue blocker (pid=367) on resource CF-00000000-00000000 –LGWR进程发现异常查杀持有controlfile equeue lock的进程367
by killing session 1648.1
Wed Oct 21 03:22:55 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_lgwr_365.trc:
ORA-07445: exception encountered: core dump [ksuklms()+672] [SIGSEGV] [Address not mapped to object] [0x000000062] [] []
Wed Oct 21 03:23:18 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_pmon_343.trc: –后续一系列报错宕机开始
ORA-00470: LGWR process terminated with error
Wed Oct 21 03:23:18 2015
PMON: terminating instance due to error 470
Wed Oct 21 03:23:18 2015
ORA-470 encountered when generating server alert SMG-3503
Wed Oct 21 03:23:18 2015
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_j006_22335.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00470: LGWR process terminated with error
Wed Oct 21 03:23:22 2015
Termination issued to instance processes. Waiting for the processes to exit
Wed Oct 21 03:23:28 2015
Instance termination failed to kill one or more processes

Trace文件分析(trace过长摘取关键部分):
*** SERVICE NAME:(SYS$BACKGROUND) 2015-10-21 03:22:11.264
*** SESSION ID:(1646.3) 2015-10-21 03:22:11.264
*** 2015-10-21 03:22:11.264
Unable to get enqueue on resource CF-00000000-00000000 (ges mode req=3 held=6)
Possible local blocker ospid=367 sid=1648 sser=1 time_held=37 secs (ges mode req=6 held=4)
DUMP LOCAL BLOCKER: initiate state dump for KILL BLOCKER
possible owner[17.367] on resource CF-00000000-00000000
Dumping process info of pid[17.367] requested by pid[23.404]
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by ‘inst 1, osid 367’

——————————————————————————-
ENQUEUE [CF] HELD FOR TOO LONG

enqueue holder: ‘inst 1, osid 367’

Process ‘inst 1, osid 367’ is holding an enqueue for maximum allowed time.
The process will be terminated.

Oracle Support Services triaging information: to find the root-cause, look
at the call stack of process ‘inst 1, osid 367’ located below. Ask the
developer that owns the first NON-service layer in the stack to investigate.
Common service layers are enqueues (ksq), latches (ksl), library cache
pins and locks (kgl), and row cache locks (kqr).

Dumping process 17.367 info:
*** 2015-10-21 03:22:11.300
Dumping diagnostic information for ospid 367:
OS pid = 367
loadavg : 0.44 0.62 1.82
swap info: free_mem = 1435.35M rsv = 59461.24M
alloc = 55941.14M avail = 55124.91 swap_free = 58645.01M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S oracle 367 1 0 39 20 ? 4116160 ? Mar 30 ? 498:14 ora_ckpt_dtvboss — 整个系统当时的空闲内存和swap还留有余地,足够当时系统的使用,所以内存不足的原因可以排除。这里提示进程367为oracle ckpt进程,可以确认当时持有controlfile enqueue lock的进程为oracle ckpt进程

PROCESS 23: –该进程为arc0 归档进程,trace信息显示在等待controlfile enqueue 队列锁。等待事件为enq: CF – contention
—————————————-
SO: 8a9004f88, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=23, calls cur/top: 8a95c6650/8a95c6650, flag: (2) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 24
last post received-location: ksasnd
last process to post me: b66002018 1 6
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: b66005770 1 2
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 5eb3388f0
O/S info: user: oracle, term: UNKNOWN, ospid: 404
OSD pid info: Unix process pid: 404, image: oracle@hzboss-ora1 (ARC0)
Dump of memory from 0x00000005F033B1F0 to 0x00000005F033B3F8
5F033B1F0 0000000C 00000000 00000008 AF739C70 [………….s.p]
5F033B200 00000006 000313A7 00000008 A95D1620 [………….]. ]
5F033B210 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B220 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B230 00000006 000313A7 00000008 A95D1FB8 [………….]..]
5F033B240 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B250 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B260 00000006 000313A7 00000008 A95D1FB8 [………….]..]
5F033B270 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B280 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B290 00000006 000313A7 00000008 A95D1FB8 [………….]..]
5F033B2A0 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B2B0 00000006 000313A7 00000008 A95D17E8 [………….]..]
5F033B2C0 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B2D0 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B2E0 00000006 000313A7 00000008 A95D1880 [………….]..]
5F033B2F0 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B300 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B310 00000006 000313A7 00000008 A95D16B8 [………….]..]
5F033B320 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B330 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B340 00000006 000313A7 00000008 A95D1880 [………….]..]
5F033B350 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B360 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B370 00000006 000313A7 00000008 A95D16B8 [………….]..]
5F033B380 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B390 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B3A0 00000006 000313A7 00000008 A95D17E8 [………….]..]
5F033B3B0 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B3C0 00000006 000313A7 00000008 AF739C70 [………….s.p]
5F033B3D0 00000006 000313A7 00000008 A95D17E8 [………….]..]
5F033B3E0 00000005 000313A7 00000008 AF739C70 [………….s.p]
5F033B3F0 00000006 000313A7 [……..]
(FOB) flags=2 fib=8af713780 incno=0 pending i/o cnt=0
fname=/boss_sysdata/oradata/dtvboss/control03.ctl
fno=2 lblksz=16384 fsiz=1308
(FOB) flags=2 fib=8af7133e0 incno=0 pending i/o cnt=0
fname=/boss_sysdata/oradata/dtvboss/control02.ctl
fno=1 lblksz=16384 fsiz=1308
(FOB) flags=2 fib=8af713040 incno=0 pending i/o cnt=0
fname=/boss_sysdata/oradata/dtvboss/control01.ctl
fno=0 lblksz=16384 fsiz=1308
—————————————-
SO: 8a9778500, type: 11, owner: 8a9004f88, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: 8a9004f88,
event: 21, last message event: 21,
last message waited event: 21, messages read: 0
channel: (5eb52cc20) scumnt mount lock
scope: 1, event: 22, last mesage event: 20,
publishers/subscribers: 0/20,
messages published: 1
—————————————-
SO: 8a95b7328, type: 4, owner: 8a9004f88, flag: INIT/-/-/0x00
(session) sid: 1646 trans: 0, creator: 8a9004f88, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-0017-0000000D, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
service name: SYS$BACKGROUND
last wait for ‘enq: CF – contention’ blocking sess=0x0 seq=43322 wait_time=2929723 seconds since wait started=36
name|mode=43460004, 0=0, operation=0
Dumping Session Wait History
for ‘enq: CF – contention’ count=1 wait_time=2929723
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929725
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929726
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929726
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929725
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929723
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929725
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929726
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929724
name|mode=43460004, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929750
name|mode=43460004, 0=0, operation=0

SO: 8a95d1fb8, type: 5, owner: 8a95c6650, flag: INIT/-/-/0x00
(enqueue) CF-00000000-00000000 DID: 0001-0017-0000000D
lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0x5eb40f608, lock_flag: 0x0
own: 0x8a95b7328, sess: 0x8a95b7328, prv: 0x8a95d1fe8 –这里说明arc0在申请0x5eb40f608的队列锁资源。而在process 17号的ckpt中可以看到0x5eb40f608资源被ckpt持有

process 17(ckpt进程)中的句柄:

SO: 8a95b9df8, type: 4, owner: 8a90037d0, flag: INIT/-/-/0x00
(session) sid: 1648 trans: 0, creator: 8a90037d0, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-0011-00000005, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
service name: SYS$BACKGROUND
waiting for ‘direct path write’ blocking sess=0x0 seq=39369 wait_time=0 seconds since wait started=37
file number=72, first dba=1, block cnt=1
Dumping Session Wait History
for ‘direct path write’ count=1 wait_time=0
file number=73, first dba=1, block cnt=1
for ‘direct path write’ count=1 wait_time=0
file number=74, first dba=1, block cnt=1
for ‘direct path write’ count=1 wait_time=0
file number=75, first dba=1, block cnt=1
for ‘direct path write’ count=1 wait_time=0
file number=76, first dba=1, block cnt=1
for ‘direct path write’ count=1 wait_time=309
file number=77, first dba=1, block cnt=1
for ‘direct path read’ count=1 wait_time=1
file number=8b, first dba=1, block cnt=1
for ‘direct path read’ count=1 wait_time=0
file number=8a, first dba=1, block cnt=1
for ‘direct path read’ count=1 wait_time=0
file number=89, first dba=1, block cnt=1
for ‘direct path read’ count=1 wait_time=1
file number=88, first dba=1, block cnt=1
for ‘direct path read’ count=1 wait_time=1
file number=87, first dba=1, block cnt=1
temporary object counter: 0
—————————————-
SO: 8a95d1cc0, type: 5, owner: 8a95c5df8, flag: INIT/-/-/0x00
(enqueue) CF-00000000-00000000 DID: 0001-0011-00000005
lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0x5eb40f608, mode: SSX, lock_flag: 0x0 —————-ckpt以ssx的方式持有了该资源
own: 0x8a95b9df8, sess: 0x8a95b9df8, proc: 0x8a90037d0, prv: 0x5eb40f618

PROCESS 10: LGWR进程
—————————————-
SO: b66002018, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=10, calls cur/top: b665c6de0/b665c6de0, flag: (6) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 24
last post received-location: ksasnd
last process to post me: 8a9000860 1 6
last post sent: 109 0 4
last post sent-location: kslpsr
last process posted by me: 8a902ef90 105 0
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 5eb3388f0
O/S info: user: oracle, term: UNKNOWN, ospid: 365
OSD pid info: Unix process pid: 365, image: oracle@hzboss-ora1 (LGWR)

SO: b665bf398, type: 4, owner: b66002018, flag: INIT/-/-/0x00
(session) sid: 824 trans: 0, creator: b66002018, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-000A-00000006, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
service name: SYS$BACKGROUND
waiting for ‘enq: CF – contention’ blocking sess=0x8a95b9df8 seq=61266 wait_time=0 seconds since wait started=0 0x8a95b9df8该句柄持有者为process 17的ckpt进程
name|mode=43460005, 0=0, operation=0
Dumping Session Wait History
for ‘enq: CF – contention’ count=1 wait_time=2929719
name|mode=43460005, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929716
name|mode=43460005, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929716
name|mode=43460005, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929718
name|mode=43460005, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929719
name|mode=43460005, 0=0, operation=0
for ‘enq: CF – contention’ count=1 wait_time=2929719

—————————————-
SO: b665c6de0, type: 3, owner: b66002018, flag: INIT/-/-/0x00
(call) sess: cur b665bf398, rec 0, usr b665bf398; depth: 0
—————————————-
SO: b665cce90, type: 5, owner: b665c6de0, flag: INIT/-/-/0x00
(enqueue) CF-00000000-00000000 DID: 0001-000A-00000006
lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0x5eb40f608, req: SSX, lock_flag: 0x0 ————前面已经说明了,该5eb40f608的资源队列锁已经被ckpt进程持有,而lgwr正在申请以ssx的方式持有。
own: 0xb665bf398, sess: 0xb665bf398, proc: 0xb66002018, prv: 0x5eb40f628

综上可以得出结论,由于arc0进程要写日志需要等待控制文件CF锁,lgwr估计要写日志或者切换也在等待控制文件CF锁,而ckpt进程持有controlfile enqueue lock时间过长,arch0等待超过900秒,导致oracle数据库系统判断异常。
此时oracle数据库系统会通知lgwr进程kill持有controlfile enqueue lock的进程,相当于lgwr进程杀了ckpt进程,而ckpt进程又为oracle关键进程,连锁反应,引起宕机。

Oracle Rac hang 案例分析一则(WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK)

LOG FILE
————–
Filename=zsinsraca1_151006_0000.nmon.xlsx
See the following:

+++ CPU_ALL ========> CPU usage is NOT high

CPU Total zsinsraca1 User% Sys% Wait% Idle% CPU% Logical CPUs (SMTmode=4)
10/6/15 10:30 AM 5.2 1.7 0.2 92.9 6.9 48
10/6/15 10:31 AM 4.3 1.6 0.1 94 5.9 48
10/6/15 10:32 AM 7.6 1.5 0.2 90.6 9.1 48
10/6/15 10:33 AM 8.6 1.7 0 89.7 10.3 48
10/6/15 10:34 AM 8.8 1.8 0 89.3 10.6 48
10/6/15 10:35 AM 10.7 2 0.1 87.2 12.7 48
10/6/15 10:36 AM 9.3 2.1 0.1 88.6 11.4 48
10/6/15 10:37 AM 9.4 2.1 0.1 88.4 11.5 48
10/6/15 10:38 AM 10.7 2.5 0.2 86.6 13.2 48
10/6/15 10:39 AM 10.2 2.4 0.1 87.3 12.6 48
10/6/15 10:40 AM 9.6 1.8 0 88.5 11.4 48
10/6/15 10:41 AM 9.6 2.1 0 88.2 11.7 48
10/6/15 10:42 AM 9 2.2 0.1 88.7 11.2 48
10/6/15 10:43 AM 9.2 1.7 0 89 10.9 48
10/6/15 10:44 AM 11 1.8 0 87.2 12.8 48
10/6/15 10:45 AM 10.7 1.9 0.1 87.3 12.6 48
10/6/15 10:46 AM 12.9 2.1 0.1 84.9 15 48
10/6/15 10:47 AM 9.4 1.5 0 89 10.9 48
10/6/15 10:48 AM 11.3 1.6 0.1 87.1 12.9 48
10/6/15 10:49 AM 11.8 2.4 0.1 85.7 14.2 48
10/6/15 10:50 AM 12.2 1.8 0 85.9 14 48
10/6/15 10:51 AM 13.2 1.9 0.1 84.8 15.1 48
10/6/15 10:52 AM 12.4 1.3 0.1 86.3 13.7 48
10/6/15 10:53 AM 12.7 1.8 0 85.5 14.5 48
10/6/15 10:54 AM 11.9 1.9 0.1 86.2 13.8 48
10/6/15 10:55 AM 10.2 2 0.1 87.6 12.2 48
10/6/15 10:56 AM 11.5 2.4 0.2 85.9 13.9 48
10/6/15 10:57 AM 11.4 1.8 0.1 86.6 13.2 48
10/6/15 10:58 AM 11.9 1.7 0 86.3 13.6 48
10/6/15 10:59 AM 10.3 1.6 0 88.1 11.9 48
10/6/15 11:00 AM 12.3 1.5 0.2 86 13.8 48
10/6/15 11:01 AM 12.1 1.7 0.1 86.2 13.8 48
10/6/15 11:02 AM 10.6 1.6 0.1 87.8 12.2 48
10/6/15 11:03 AM 13.8 2.4 0.1 83.7 16.2 48
10/6/15 11:04 AM 12.5 2.6 0.1 84.9 15.1 48
10/6/15 11:05 AM 11.4 1.8 0.1 86.7 13.2 48
10/6/15 11:06 AM 10.1 2.3 0.1 87.5 12.4 48
10/6/15 11:07 AM 9.7 2.1 0.1 88.2 11.8 48
10/6/15 11:08 AM 10.5 2 0 87.5 12.5 48
10/6/15 11:09 AM 11.3 1.1 0.1 87.5 12.4 48
10/6/15 11:10 AM 11.6 1.8 0.1 86.6 13.4 48
10/6/15 11:11 AM 9.5 1.4 0.1 89 10.9 48
10/6/15 11:12 AM 8.4 1.4 0 90.2 9.8 48
10/6/15 11:13 AM 9 1.6 0 89.3 10.6 48
10/6/15 11:14 AM 10.1 1.9 0 87.9 12 48
10/6/15 11:15 AM 11.7 2.2 0.1 86.1 13.9 48
10/6/15 11:16 AM 10.5 1.9 0.1 87.6 12.4 48
10/6/15 11:17 AM 11 1.2 0.1 87.8 12.2 48
10/6/15 11:18 AM 12.6 2.2 0.1 85.2 14.8 48
10/6/15 11:19 AM 9.9 1.6 0.1 88.4 11.5 48
10/6/15 11:20 AM 11.3 2.1 0.1 86.5 13.4 48
10/6/15 11:21 AM 11.4 1.3 0.1 87.3 12.7 48
10/6/15 11:22 AM 11.6 1.5 0.1 86.8 13.1 48
10/6/15 11:23 AM 12.7 3.2 0.1 84 15.9 48
10/6/15 11:24 AM 11.8 3.1 0.1 85.1 14.9 48
10/6/15 11:25 AM 9.1 1.5 0 89.3 10.6 48
10/6/15 11:26 AM 11.1 1.9 0.1 86.9 13 48
10/6/15 11:27 AM 13.3 3.9 0.4 82.4 17.2 48
10/6/15 11:28 AM 10.3 2 0.8 86.9 12.3 48
10/6/15 11:29 AM 10.2 1.2 0 88.6 11.4 48
10/6/15 11:30 AM 11.3 1.7 0 87 13 48
10/6/15 11:31 AM 13.9 2.8 0.1 83.2 16.7 48
10/6/15 11:32 AM 13.2 3.1 0.1 83.6 16.3 48
10/6/15 11:33 AM 10.9 2.2 0.1 86.8 13.1 48
10/6/15 11:34 AM 10.6 2.6 0.1 86.8 13.2 48
10/6/15 11:35 AM 12 2.6 0 85.4 14.6 48
10/6/15 11:36 AM 14.5 5.3 0.1 80.1 19.8 48
10/6/15 11:37 AM 11.4 1.6 0.1 86.9 13 48
10/6/15 11:38 AM 10.8 1.7 0 87.5 12.5 48
10/6/15 11:39 AM 9.7 1.6 0 88.6 11.3 48
10/6/15 11:40 AM 13.2 1.9 0 84.9 15.1 48
10/6/15 11:41 AM 10.6 1.3 0 88.2 11.9 48
10/6/15 11:42 AM 12.1 4 0.2 83.7 16.1 48
10/6/15 11:43 AM 11.9 4.4 0 83.6 16.3 48
10/6/15 11:44 AM 10.1 1.6 0 88.3 11.7 48
10/6/15 11:45 AM 12.4 2.4 0.4 84.8 14.8 48
10/6/15 11:46 AM 15.3 6.5 1.3 77 21.8 48
10/6/15 11:47 AM 13.4 4.4 1 81.2 17.8 48
10/6/15 11:48 AM 16.5 6.8 0.1 76.6 23.3 48
10/6/15 11:49 AM 15.9 5.1 0.1 78.9 21 48
10/6/15 11:50 AM 8.5 4.8 0.2 86.5 13.3 48

+++ MEM =========> free memory 在11:50时仍有3GB free

Memory zsinsraca1 Real Free % Virtual free % Real free(MB) Virtual free(MB) Real total(MB) Virtual total(MB)
10/6/15 10:30 AM 7.6 88 7476.4 28845 98304 32768
10/6/15 10:31 AM 7.6 88 7472.8 28845 98304 32768
10/6/15 10:32 AM 7.6 88 7454.3 28845 98304 32768
10/6/15 10:33 AM 7.5 88 7412.8 28845 98304 32768
10/6/15 10:34 AM 7.5 88 7374.2 28845.5 98304 32768
10/6/15 10:35 AM 7.4 88 7303.3 28845.5 98304 32768
10/6/15 10:36 AM 7.3 88 7147.1 28845.5 98304 32768
10/6/15 10:37 AM 7.3 88 7210.1 28845.5 98304 32768
10/6/15 10:38 AM 7.3 88 7170 28845.5 98304 32768
10/6/15 10:39 AM 7.3 88 7128 28845.5 98304 32768
10/6/15 10:40 AM 7.2 88 7126.4 28845.5 98304 32768
10/6/15 10:41 AM 7.2 88 7104.1 28845.5 98304 32768
10/6/15 10:42 AM 7.2 88 7052 28845.5 98304 32768
10/6/15 10:43 AM 7 88 6888.5 28845.6 98304 32768
10/6/15 10:44 AM 7.1 88 7028.3 28845.6 98304 32768
10/6/15 10:45 AM 7 88 6914.8 28845 98304 32768
10/6/15 10:46 AM 7 88 6917.3 28844.5 98304 32768
10/6/15 10:47 AM 7 88 6886.8 28844.1 98304 32768
10/6/15 10:48 AM 6.9 88 6822.9 28843.3 98304 32768
10/6/15 10:49 AM 6.8 88 6668 28842 98304 32768
10/6/15 10:50 AM 6.7 88 6602.4 28841.1 98304 32768
10/6/15 10:51 AM 6.6 88 6463.4 28840 98304 32768
10/6/15 10:52 AM 6.3 88 6235.7 28838.4 98304 32768
10/6/15 10:53 AM 6.5 88 6397.8 28837.6 98304 32768
10/6/15 10:54 AM 6.5 88 6384.6 28837.4 98304 32768
10/6/15 10:55 AM 6.5 88 6340.9 28836.2 98304 32768
10/6/15 10:56 AM 6.2 88 6123.9 28836.2 98304 32768
10/6/15 10:57 AM 6.3 88 6228.7 28835.8 98304 32768
10/6/15 10:58 AM 6.2 88 6143.9 28834.9 98304 32768
10/6/15 10:59 AM 6.2 88 6111.3 28834.2 98304 32768
10/6/15 11:00 AM 6.1 88 6017.9 28833.1 98304 32768
10/6/15 11:01 AM 5.9 88 5837.6 28832.9 98304 32768
10/6/15 11:02 AM 6 88 5888.8 28832 98304 32768
10/6/15 11:03 AM 5.9 88 5848.5 28831.3 98304 32768
10/6/15 11:04 AM 5.7 88 5606.2 28827.8 98304 32768
10/6/15 11:05 AM 5.7 88 5555.2 28827.1 98304 32768
10/6/15 11:06 AM 5.6 88 5503.2 28826.3 98304 32768
10/6/15 11:07 AM 5.4 88 5271.7 28825.5 98304 32768
10/6/15 11:08 AM 5.5 88 5388.1 28824.7 98304 32768
10/6/15 11:09 AM 5.3 88 5239.6 28824.7 98304 32768
10/6/15 11:10 AM 5.5 88 5391.1 28824.1 98304 32768
10/6/15 11:11 AM 5.4 88 5312.4 28823.9 98304 32768
10/6/15 11:12 AM 5.4 88 5276.5 28823.4 98304 32768
10/6/15 11:13 AM 5.4 88 5325.3 28823.3 98304 32768
10/6/15 11:14 AM 5.3 88 5231.7 28822.8 98304 32768
10/6/15 11:15 AM 5.2 88 5157.4 28821.8 98304 32768
10/6/15 11:16 AM 5.1 88 5054 28820.6 98304 32768
10/6/15 11:17 AM 5.1 88 5017.2 28820.5 98304 32768
10/6/15 11:18 AM 5 88 4928.5 28819.5 98304 32768
10/6/15 11:19 AM 5.1 88 5049.6 28821.2 98304 32768
10/6/15 11:20 AM 5.1 88 5015.3 28820.4 98304 32768
10/6/15 11:21 AM 5 88 4935.5 28819.6 98304 32768
10/6/15 11:22 AM 4.9 87.9 4815.3 28818.4 98304 32768
10/6/15 11:23 AM 4.7 87.9 4594.5 28816 98304 32768
10/6/15 11:24 AM 4.6 87.9 4528.6 28815.4 98304 32768
10/6/15 11:25 AM 4.6 87.9 4550.3 28814.9 98304 32768
10/6/15 11:26 AM 4.6 87.9 4486.5 28814.3 98304 32768
10/6/15 11:27 AM 4.4 87.9 4361.3 28813.4 98304 32768
10/6/15 11:28 AM 4.4 87.9 4287 28812.4 98304 32768
10/6/15 11:29 AM 4.3 87.9 4251.7 28811.8 98304 32768
10/6/15 11:30 AM 4.3 87.9 4187.3 28811 98304 32768
10/6/15 11:31 AM 4.2 87.9 4097.9 28810.4 98304 32768
10/6/15 11:32 AM 4.2 87.9 4101.4 28809.9 98304 32768
10/6/15 11:33 AM 4.1 87.9 4062.2 28809.6 98304 32768
10/6/15 11:34 AM 4 87.9 3960.2 28808.4 98304 32768
10/6/15 11:35 AM 3.9 87.9 3858.5 28806.8 98304 32768
10/6/15 11:36 AM 3.9 87.9 3790.8 28805.9 98304 32768
10/6/15 11:37 AM 3.8 87.9 3774.6 28805.7 98304 32768
10/6/15 11:38 AM 3.6 87.9 3586.9 28805.1 98304 32768
10/6/15 11:39 AM 3.9 87.9 3838.9 28804.7 98304 32768
10/6/15 11:40 AM 3.8 87.9 3763 28804.7 98304 32768
10/6/15 11:41 AM 3.7 87.9 3661.5 28804 98304 32768
10/6/15 11:42 AM 3.8 88.1 3763 28864.2 98304 32768
10/6/15 11:43 AM 3.8 88.1 3753.1 28864.3 98304 32768
10/6/15 11:44 AM 3.7 88.1 3676.4 28864.3 98304 32768
10/6/15 11:45 AM 3.5 88.1 3405.6 28864.3 98304 32768
10/6/15 11:46 AM 3.5 88.1 3484.2 28864.4 98304 32768
10/6/15 11:47 AM 3.5 88.1 3477 28864.5 98304 32768
10/6/15 11:48 AM 3.6 88.1 3570.6 28864.5 98304 32768
10/6/15 11:49 AM 4.7 88.1 4646.2 28868.9 98304 32768
10/6/15 11:50 AM 34.5 98 33959.3 32123.8 98304 32768

LOG FILE
————–
Filename=alert_zsiims1.log
See the following:

Tue Oct 6 02:34:52 2015
Thread 1 advanced to log sequence 37413 (LGWR switch)
Current log# 1 seq# 37413 mem# 0: +V5DATA/zsiims/onlinelog/group_1.9844.799569609
Tue Oct 6 10:50:38 2015
ALTER SYSTEM ARCHIVE LOG
Tue Oct 6 10:50:39 2015
Thread 1 advanced to log sequence 37414 (LGWR switch)
Current log# 3 seq# 37414 mem# 0: +V5DATA/zsiims/onlinelog/group_3.9851.799569423
Tue Oct 6 11:21:49 2015
Errors in file /oracle/admin/zsiims/bdump/zsiims1_j001_10027156.trc:
ORA-12012: error on auto execute of job 54375
ORA-27468: “EXFSYS.RLM$SCHDNEGACTION” is locked by another process
Tue Oct 6 11:23:07 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=142
System State dumped to trace file /oracle/admin/zsiims/udump/zsiims1_ora_20971728.trc
Tue Oct 6 11:25:43 2015
Errors in file /oracle/admin/zsiims/bdump/zsiims1_j002_6815818.trc:
ORA-12012: error on auto execute of job 54374
ORA-27468: “EXFSYS.RLM$EVTCLEANUP” is locked by another process
Tue Oct 6 11:26:57 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=28
System State dumped to trace file /oracle/admin/zsiims/bdump/zsiims1_smon_15335466.trc
Tue Oct 6 11:35:26 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=193
System State dumped to trace file /oracle/admin/zsiims/udump/zsiims1_ora_32309462.trc
Tue Oct 6 11:41:27 2015
Shutting down instance: further logons disabled

LOG FILE
————–
Filename=alert_zsiims2.log
See the following:
Tue Oct 6 10:50:40 2015
Thread 2 advanced to log sequence 25153 (LGWR switch)
Current log# 5 seq# 25153 mem# 0: +V5DATA/zsiims/onlinelog/group_5.9853.799569371
Tue Oct 6 11:45:06 2015
IPC Send timeout detected.Sender: ospid 7798804
Receiver: inst 1 binc 4 ospid 6881304
Tue Oct 6 11:45:08 2015
Trace dumping is performing id=[cdmp_20151006114508]
Tue Oct 6 11:48:58 2015
Shutting down instance (abort)
License high water mark = 450
Instance terminated by USER, pid = 20578736
Tue Oct 6 11:49:50 2015
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0

Filename=zsiims1_ora_20971728A.trc ===> 发生问题后产生的第一个systemstate dump, 在Oct 6 11:23:07AM产生的,此时数据库的状况应该已经很糟糕了
See the following:

50几个session在等待latch: row cache objects

33: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=45472
37: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=18549
68: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=13886
71: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=46595
81: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=48525
106: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11855
118: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11183
122: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8913
132: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=12194
140: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12643
149: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12173
154: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=15022
156: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12143
160: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=20615
173: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11470
174: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11668
177: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11249
183: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=10725
184: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11485
200: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9970
204: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11019
207: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8356
208: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8811
215: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8914
219: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10169
230: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9156
241: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=7911
244: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8785
249: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=17542
255: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9209
264: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9247
276: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8368
288: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8211
310: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=7413
317: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=17869
331: waiting for ‘latch: row cache objects'[Rcache object=700000418a9adf8,] seq=5309
352: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2625
353: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10183
361: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5948
364: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5303
366: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=5300
378: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=760
381: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4788
394: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4758
401: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=109
424: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2924
429: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=3120
432: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2426
454: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1861
457: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1579
463: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1603
468: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1016
470: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1076
486: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=501
576: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=16368
606: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=30065

还有很多session在等待row cache lock

28: waiting for ‘row cache lock’ [Rcache object=70000064a573ef8,] seq=40578
40: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=59768
93: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2938
98: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2777
193: waiting for ‘row cache lock’ [Rcache object=700000621252470,] seq=966
199: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=107
214: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3501
220: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=86
232: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1148
239: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=12734
252: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2908
259: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1134
260: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=65
266: waiting for ‘row cache lock’ [Rcache object=700000550af2690,] seq=1791
268: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1288
274: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=821
278: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1357
287: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=637
289: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
293: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3363
302: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
319: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=98
320: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=366
323: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2618
327: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=167
328: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1137
330: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=31
332: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=53
333: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
334: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=368
337: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=202
341: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=56
342: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=18
343: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
345: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=359
349: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=562
350: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=24
351: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=407
358: waiting for ‘row cache lock’ [Rcache object=70000078b3d6290,] seq=28
359: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
363: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
365: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
371: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
373: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
375: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=89
379: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=605
383: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=523
385: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=595
387: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=243
390: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
391: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=355
393: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=20
395: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
397: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1080
404: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
409: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
410: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=60
411: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1101
413: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=95
414: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=948
416: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=315
417: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=516
419: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=102
420: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=607
421: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=831
426: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=42
433: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=58
437: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=421
442: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=44
445: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=185
447: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=35
450: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=55
456: waiting for ‘row cache lock’ [Rcache object=700000789bc8cc8,] seq=349
459: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=127
465: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=234
474: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=52
475: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=25
478: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=41
490: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=49
610: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=15070

分析发生问题时候的systemstate dump trace zsiims1_ora_20971728A.trc:

50几个session在等待latch: row cache objects, 大部分是和700000768fd8ea8有关的

33: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=45472
37: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=18549
68: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=13886
71: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=46595
81: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=48525
106: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11855
118: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11183
122: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8913
132: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=12194
140: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12643
149: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12173
154: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=15022
156: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=12143
160: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=20615
173: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11470
174: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11668
177: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11249
183: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=10725
184: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11485
200: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9970
204: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=11019
207: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8356
208: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8811
215: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8914
219: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10169
230: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9156
241: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=7911
244: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8785
249: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=17542
255: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9209
264: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=9247
276: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8368
288: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=8211
310: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=7413
317: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=17869
331: waiting for ‘latch: row cache objects'[Rcache object=700000418a9adf8,] seq=5309
352: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2625
353: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=10183
361: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5948
364: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=5303
366: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=5300
378: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=760
381: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4788
394: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=4758
401: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=109
424: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2924
429: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=3120
432: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=2426
454: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1861
457: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1579
463: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1603
468: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1016
470: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=1076
486: waiting for ‘latch: row cache objects'[Latch 70000075f1cd020] seq=501
576: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=16368
606: waiting for ‘latch: row cache objects'[Latch 700000768fd8ea8] seq=30065

以其中一个比较典型的process为例:

PROCESS 576:
—————————————-
SO: 70000078326f638, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=576, calls cur/top: 70000039983fd08/70000039983f788, flag: (0) –
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 109 0 4
last post received-location: kslpsr
last process to post me: 700000786279d80 1 6
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: 700000786279d80 1 6
(latch info) wait_event=0 bits=0
Location from where call was made: kghfrunp: clatch: wait:
waiting for 700000768fd8ea8 Child row cache objects level=4 child#=9
Location from where latch is held: kghfrunp: clatch: wait: ===============> 可以看到这个latch是因为kghfrunp相关的功能发起的,而这个功能是需要从shared pool中释放空间有关
Context saved from call: 0 ===========================>kghfrunp: (KGH: Ask client to free unpinned space)
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
207 (5, 1444101815, 2)
174 (5, 1444101815, 2)
322 (5, 1444101815, 2)

还有很多session在等待row cache lock, 最多的对象70000078e9f0d28

28: waiting for ‘row cache lock’ [Rcache object=70000064a573ef8,] seq=40578
40: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=59768
93: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2938
98: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2777
193: waiting for ‘row cache lock’ [Rcache object=700000621252470,] seq=966
199: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=107
214: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3501
220: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=86
232: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1148
239: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=12734
252: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2908
259: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1134
260: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=65
266: waiting for ‘row cache lock’ [Rcache object=700000550af2690,] seq=1791
268: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1288
274: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=821
278: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1357
287: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=637
289: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
293: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=3363
302: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
319: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=98
320: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=366
323: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=2618
327: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=167
328: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1137
330: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=31
332: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=53
333: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
334: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=368
337: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=202
341: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=56
342: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=18
343: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
345: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=359
349: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=562
350: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=24
351: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=407
358: waiting for ‘row cache lock’ [Rcache object=70000078b3d6290,] seq=28
359: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
363: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=32
365: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=27
371: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
373: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
375: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=89
379: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=605
383: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=523
385: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=595
387: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=243
390: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
391: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=355
393: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=20
395: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=28
397: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1080
404: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=29
409: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=22
410: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=60
411: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=1101
413: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=95
414: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=948
416: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=315
417: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=516
419: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=102
420: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=607
421: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=831
426: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=42
433: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=58
437: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=421
442: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=44
445: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=185
447: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=35
450: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=55
456: waiting for ‘row cache lock’ [Rcache object=700000789bc8cc8,] seq=349
459: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=127
465: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=234
474: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=52
475: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=25
478: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=41
490: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=49
610: waiting for ‘row cache lock’ [Rcache object=70000078e9f0d28,] seq=15070

以其中的process 610为例:

PROCESS 610:
—————————————-
SO: 700000784271270, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=610, calls cur/top: 700000387cb3b58/700000387cb35d8, flag: (0) –
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 109 0 4
last post received-location: kslpsr
last process to post me: 700000786279d80 1 6
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: 700000786279d80 1 6
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 7000007852bdfc8
O/S info: user: oracle, term: UNKNOWN, ospid: 39321988
OSD pid info: Unix process pid: 39321988, image: oracle@zsinsraca1
Short stack dump:
ksdxfstk+002c< -ksdxcb+04e4<-sspuser+0074<-000047DC<-sskgpwwait+0034<-skgpwwait+00bc<-ksliwat+06c0<-kslwaitns_timed+0024<-kskthbwt+022c<-kslwait+00f4<-kqrigt+05e0<-kqrpre1+0960<-kqrpre+001c<-ktusmous_online_undoseg+05f4<-ktusmaus_add_us+0300<-ktubnd+04ac<-ktuchg+055c<-ktbchg2+01f0<-kdtchg+0524<-kdtwrp+2540<-kdtInsRow+0a20<-insrow+043c<-insdrv+0428<-inscovexe+02b8<-insExecStmtExecIniEngine+005c<-insexe+02f8<-opiexe+2738<-opipls+185c<-opiodr+0ae0<-rpidrus+01bc<-skgmstack+00c8<-rpidru+0088<-rpiswu2+034c<-rpidrv+095c<-psddr0+02bc<-psdnal+01d0<-pevm_EXECC+01f8<-pfrinstr_EXECC+0070<-pfrrun_no_tool+005c<-pfrrun+1014<-plsql_run+06b4<-pricar+042c<-pricbr+0258<-prient2+03a8<-prient+0b0c<-kkxrpc+0250<-kporpc+01d4<-opiodr+0ae0<-ttcpip+1020<-opitsk+1124<-opiino+0990<-opiodr+0ae0<-opidrv+0484<-sou2o+0090<-opimai_real+01bc<-main+0098<-__start+0070 ---------------------------------------- SO: 70000078f2c84a0, type: 4, owner: 700000784271270, flag: INIT/-/-/0x00 (session) sid: 236 trans: 7000007619fcf08, creator: 700000784271270, flag: (100041) USR/- BSY/-/-/-/-/- DID: 0001-025F-00004ADB, short-term DID: 0001-0262-00000005 txn branch: 700000760262cd0 oct: 2, prv: 0, sql: 70000043a39dda8, psql: 7000006feac35a8, user: 88/ZSCLMINTF service name: zsiims O/S info: user: oracle, term: , ospid: 10027038, machine: zsinsraca2 program: oracle@zsinsraca2 (TNS V1-V3) application name: oracle@zsinsraca2 (TNS V1-V3), hash value=450405937 waiting for 'row cache lock' blocking sess=0x0 seq=15070 wait_time=0 seconds since wait started=1605 cache id=3, mode=0, request=3 ============> cache id为3, 即dc_rollback_segments上的争用
Dumping Session Wait History
for ‘row cache lock’ count=1 wait_time=1388976
cache id=3, mode=0, request=3

此时我们并不能简单的认为,这个问题和rollback/undo有关
因为此时LCK0进程也在等待和kghfrunp(释放shared pool中内存)有关的latch,而在RAC环境里, row cache lock/library cache lock这样的instance lock的获取和释放必须经过LCK0进程,如果LCK0异常,那么就会发生row cache lock的问题.

PROCESS 33:
—————————————-
SO: 70000078627a560, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=33, calls cur/top: 7000007853a1520/7000007853a1520, flag: (6) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 33
last post received-location: ksrpublish
last process to post me: 700000782261488 12 0
last post sent: 0 0 21
last post sent-location: ksbria
last process posted by me: 70000078627a560 1 6
(latch info) wait_event=0 bits=0
Location from where call was made: kqrbip:
waiting for 70000075f1cd020 Child row cache objects level=4 child#=16
Location from where latch is held: kghfrunp: clatch: nowait:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
33 (3, 3, 3)
431 (3, 1444101789, 3)
486 (3, 1444101789, 3)
492 (3, 1444101789, 3)
468 (3, 1444101789, 3)
321 (3, 1444101789, 3)
470 (3, 1444101789, 3)
66 (3, 1444101789, 3)
276 (3, 1444101789, 3)
388 (3, 1444101789, 3)
207 (3, 1444101789, 3)
205 (3, 1444101789, 3)
443 (3, 1444101789, 3)
waiter count=13
gotten 1588533458 times wait, failed first 46002888 sleeps 3053343
gotten 2849791 times nowait, failed: 4267742
possible holder pid = 75 ospid=32440736
on wait list for 70000075f1cd020
Process Group: DEFAULT, pseudo proc: 7000007852bdfc8
O/S info: user: oracle, term: UNKNOWN, ospid: 6881304
OSD pid info: Unix process pid: 6881304, image: oracle@zsinsraca1 (LCK0)
Short stack dump:
ksdxfstk+002c< -ksdxcb+04e4<-sspuser+0074<-000047DC<-sskgpwwait+0034<-skgpwwait+00bc<-kslges+054c<-kslgetl+033c<-kqrbip+03c8<-kqrbfr+01ec<-kqrbtm+00c8<-ksbcti+03c8<-ksbabs+03fc<-kclabs+00d8<-ksbrdp+04b4<-opirip+03fc<-opidrv+0458<-sou2o+0090<-opimai_real+0150<-main+0098<-__start+0070 ---------------------------------------- SO: 70000078f366290, type: 4, owner: 70000078627a560, flag: INIT/-/-/0x00 (session) sid: 1062 trans: 0, creator: 70000078627a560, flag: (51) USR/- BSY/-/-/-/-/- DID: 0000-0021-00000003, short-term DID: 0000-0000-00000000 txn branch: 0 oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS service name: SYS$BACKGROUND waiting for 'latch: row cache objects' blocking sess=0x0 seq=45472 wait_time=0 seconds since wait started=0 address=70000075f1cd020, number=c8, tries=1 Dumping Session Wait History for 'latch: row cache objects' count=1 wait_time=906249 address=70000075f1cd020, number=c8, tries=0 for 'latch: row cache objects' count=1 wait_time=27503 address=70000075f1cd020, number=c8, tries=2 for 'latch: row cache objects' count=1 wait_time=292982 address=70000075f1cd020, number=c8, tries=1 综合考虑,从现在得到的信息来看,很可能发生问题时shared pool的使用出现了问题,但是因为我们没有更具体的dba_hist_active_sess_history数据,并且并没有在发生问题的第一时间收集systemstate dump,我们可能无法更进一步的确认它了. LOG FILE -------------- Filename=awrrpt_1_25919_25920.html ===> 发生问题前1个小时的AWR report
See the following:

Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 25919 06-Oct-15 09:00:26 224 6.6
End Snap: 25920 06-Oct-15 10:00:36 233 6.3
Elapsed: 60.17 (mins)
DB Time: 18.51 (mins)

Cache Sizes

Begin End
Buffer Cache: 21,776M 21,776M Std Block Size: 8K
Shared Pool Size: 8,752M 8,752M Log Buffer: 30,672K

SGA breakdown difference
ordered by Pool, Name
N/A value for Begin MB or End MB indicates the size of that Pool/Name was insignificant, or zero in that snapshot
Pool Name Begin MB End MB % Diff
shared CCursor 302.83 323.98 6.98
shared Cursor Stats 222.80 222.80 0.00
shared KGH: NO ACCESS 6,056.99 6,056.99 0.00 ===========> 将近6个GB是被KGH: NO ACCESS所使用
shared PCursor 252.15 274.83 9.00
shared free memory 5,239.86 4,815.81 -8.09
shared gcs resources 418.68 418.68 0.00
shared gcs shadows 384.49 384.49 0.00
shared kglsim heap 245.99 245.99 0.00
shared kglsim object batch 416.87 416.87 0.00
shared library cache 251.35 258.46 2.83
shared sql area 382.75 764.75 99.80

1. 发生问题前的awr report显示shared pool中大部分的内存都被KGH: NO ACCESS占用
2. 而发生问题后的systemstate dump显示非常多的进程都在等待跟kghfrunp功能有关的latch (LCK0也在等这样的latch, 导致了系统出现严重的row cache enqueue/row cache lock问题)
而这个kghfrunp功能是要从shared pool中释放空间,这说明shared pool当时不够用

8666117.8, patch 8666117 可以优化lck0在shared pool有压力下的工作,可以减缓row cache lock的问题
note 801787.1里 ,有以下的三个方式可以减少 KGH: NO ACCESS的大小:

a). 设置最小的buffer cache及shared pool的值
b). 减少shared pool resize的频率:设置 _memory_broker_stat_interval
c). 打patch 7189722

Aix clone cause the user process limit value changed

aix 5以及aix6系统在克隆以后,导致oracle用户的process数量限制初始化为128.在一切ha搭建或者主备机切换过程会导致监听以及中间件,客户端时而可连时而不可连接的状况。

在实施oracle要求中这个参数应该设置为65536或者大于65536.可以通过在root用户中进去smit chgsys修改,当然也可以用命令 chdev -l sys0 -a maxuproc ‘65536’来修改。

最近几个客户在克隆系统后,都遭遇了此类问题。希望对大伙有帮助,能引起注意。

ORA-12518: TNS: listener could not hand off client connection

ORA-12518: TNS: listener could not hand off client connection

Note:
Starting in 11gR2, the listener log is not activated by default. You have to turn on listener logging: logging_listener_name=on

Oracle docs note this about ORA-12518:

ORA-12518:
TNS:listener could not hand off client connection

Cause:
The process of handing off a client connection to another process failed.

Action:
Turn on listener tracing and re-execute the operation. Verify that the listener and database instance are properly configured for direct handoff. If problem persists, call Oracle Support.

ORA-12518 has been known to occur in Oracle Net Services version 9.2.0.5-11.2.0.1 in which attempting new connections on the database throw either ORA-12518 or TNS-12518.ORA-12518 appears in the listener log upon new user connection.

来自orace.群员的提问摘录。

微信一个朋友问的imp过程遭遇Ora-01114

Oracle process number: 8

Windows thread id: 580, image: ORACLE.EXE


*** SESSION ID:(7.10) 2015-09-10 00:15:03.000
*** 2015-09-10 00:15:03.000
ksedmp: internal or fatal error
ORA-01114: 将块写入文件 201 时出现 IO 错误 (块 # 524044)
ORA-27069: skgfdisp: 尝试在文件范围外执行 I/O
OSD-04026: 传递的参数无效。 (OS 524050)
ORA-01114: 将块写入文件 201 时出现 IO 错误 (块 # 524044)
ORA-27069: skgfdisp: 尝试在文件范围外执行 I/O
OSD-04026: 传递的参数无效。 (OS 524050)
ORA-01114: 将块写入文件 201 时出现 IO 错误 (块 # 524044)
ORA-27069: skgfdisp: 尝试在文件范围外执行 I/O
OSD-04026: 传递的参数无效。 (OS 524050)
ORA-01114: 将块写入文件 201 时出现 IO 错误 (块 # 524037)
ORA-27069: skgfdisp: 尝试在文件范围外执行 I/O
OSD-04026: 传递的参数无效。 (OS 524043)
No current SQL statement being executed.
----- Call Stack Trace -----

创建唯一索引时候报错,文件201号,temp4G大小(db中唯一的4g大小文件),524*8k/1024=4G,不能远程查验,初步猜测临时表空间不能扩展导致,而且此过程报错‘ORA-27069: skgfdisp: 尝试在文件范围外执行 I/O’,说明io超出了文件范围,验证了临时表空间的可能性。
最终通过增加temp的大小解决。

建议关注11g数据库password_life_time

Oracle 11g 之前默认的用户时是没有密码过期的限制的,在Oracle 11g 中default的profile启用了密码过期时间是180天,也就是password_life_time值为180,虽然是个小细节,但是很多客户在迁移到11g后有规律性的都在半年后出现了密码错误无法登陆的问题.

如下:

select * from dba_profiles where profile='DEFAULT' and resource_name='PASSWORD_LIFE_TIME';

PROFILE       RESOURCE_NAME       RESOURCE LIMIT
------------ -------------------- -------- -------
DEFAULT       PASSWORD_LIFE_TIME  PASSWORD 180

当过期时候系统会报错ORA-28002.当遭遇这个问题时候可以通过以下方式解决:

    1.新建profile,对用户指定新的profile.
    2.通过对用户重设密码(密码可以和原来一样).

    命令为:

    alter user username identified by password.
    

    3.针对默认的profile的password_life_time设置为unlimited

    命令为:

    alter profile default limit PASSWORD_LIFE_TIM 180.

以下是profile里的关于password的设置类目解释:

    FAILED_LOGIN_ATTEMPTS
    设定登录到Oracle 数据库时可以失败的次数。一旦某用户尝试登录数据库的达到该值时,该用户的帐户就被锁定,只能由DBA能解锁。
    PASSWORD_LIFE_TIME
    设定口令的有效时间(天数),一旦超过这一时间,必须重新设口令。缺省为180天(11g,10gUNLIMITED).
    PASSWORD_REUSE_TIME
    许多系统不许用户重新启用过去用过的口令。该资源项设定了一个失效口令要经过多少天,用户才可以重新使用该口令。缺省为UNLIMITED.
    PASSWORD_REUSE_MAX
    重新启用一个先前用过的口令前必须对该口令进行重新设置的次数(重复用的次数)。
    PASSWORD_LOCK_TIME
    设定帐户被锁定的天数(当登录失败达到FAILED_LOGIN_ATTEMPTS时)。
    PASSWORD_GRACE_TIME
    设定在口令失效前,给予的重新设该口令的宽限天。当口令失效之后回,在登录时会出现警告信息显示该天数。如果没有在宽限天内修改口令,口令将失效。
    PASSWORD_VERITY_FUNCTION
    该资源项允许调用一个PL/SQL 来验证口令。Oracle公司已提供该应用 的脚本,但是只要愿意的话,用户可以制定自己的验证脚本。该参数的设定就是PL/SQL函数的名称。缺省为NULL.