Skip to content

(1) 初始化参数设置的建议

(1.1) sec_case_sensitive_loacgon
在12.1.0.1中不建议修改。默认值是true,但是如果设置成false,启动的时候会有提示:ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance。但是设置成false仍然生效,即忽略大小写的密码可以登录。

(1.2) SQLNET.ALLOWED_LOGON_VERSION_SERVER
在12c中默认值是11,所以如10.2.0.5的JDBC连接过来,就会报错ora-28040,虽然可以设置SQLNET.ALLOWED_LOGON_VERSION_SERVER=8来解决这个问题,但是由于今后11.1以下版本的JDBC不再被oracle支持,因此还是建议升级JDBC驱动来实现。
jar file和JDBC驱动之间的关系,参考Doc ID 401934.1。 10.2版本的JDBC,使用ojdbc14.jar;11.2版本的JDBC,使用ojdbc6.jar;12.1.0的JDBC使用ojdbc7.jar。

SQLNET.ALLOWED_LOGON_VERSION_SERVER 的值有{ 8 | 10 | 11 | 12 | 12a },各个值如下含义:
12a for Oracle Database 12c authentication protocols (strongest protection)
12 for the critical patch updates CPUOct2012 and later Oracle Database 11g ,authentication protocols (recommended)
11 for Oracle Database 11g authentication protocols (default)
10 for Oracle Database 10g authentication protocols
8 for Oracle9i authentication protocol
默认值是11,推荐值是12(如果你没有小于10.2.0.5的客户端)

(1.3) AUDIT_TRAIL
取值范围是: { none | os | db [, extended] | xml [, extended] },默认值是db或者none。
和12c的新特性Unified Auditing有关。是否启用unified auditing,可以用select VALUE from V$OPTION where PARAMETER=’Unified Auditing’;检查。

Unified Auditing默认是工作在mixed模式。
建议:
如果你以前没有审计,那么你可以设置成none。
如果以前有审计,建议设置成db。
更多信息,参考 http://tinyurl.com/UnifiedAuditing

(1.4)DEFERRED_SEGMENT_CREATION
从11.2开始,这个值的默认值是true,建议12c中设置成false。

(1.5)JOB_QUEUE_PROCESSES
从11.1开始,这个值是1000,建议设置成和CPU core相等的值。

(1.6)_DATAFILE_WRITE_ERRORS_CRASH_INSTANCE
这个值默认值是true。所有datafile的IO写error,都会导致数据库crash。
在11.2.0.2之前这个值是false,即只是offline datafile(非system),而不crash instance。11.2.0.2之后,是true。
建议:注意这个从11.2.0.2之后的变化。

(1.7) MAX_STRING_SIZE
这是在12c中的新参数。默认值是standard。
这个参数用于控制varchar2,nvarchar2,raw类型的最大值。standard下,行为和12c之前一样,即varchar2和nvarchar2是4000 bytes,raw是2000 bytes,
改成extended之后,启用了32k strings新特性,varchar2、nvarchar2、raw最大长度可以达到32767 bytes。
修改方式:
1. startup upgrade
2. ALTER SYSTEM SET MAX_STRING_SIZE = EXTENDED;
3.运行@?/rdbms/admin/utl32k.sql
但是注意,单向修改,改了之后,就改不回来了。(可以flashback)

注意原生32k strings是以out-of-line的blob方式存储,且还是basic file,而如果是以modify成32k strings的是以in-line的方式行链接存储。
所以转32k strings需要考虑lob效率的问题。

(2)其他在12c中的参数:

(2.1) _OPTIMIZER_AGGR_GROUPBY_ELIM
Values: { TRUE | FALSE } 默认值是true
Recommendation: FALSE – Wrong Results with GROUP BY Clause in Nested Query (Doc ID 19567916.8)
建议值:false

(2.2) INMEMORY_FORCE
Values: { DEFAULT | OFF } 默认值是default
Explanation: In-Memory Optimization
Recommendation: OFF – Unless you have an Oracle In-Memory license
建议值为off,除非你有in-memory的license

(2.3) OPTIMIZER_DYNAMIC_SAMPLING
如果统计信息不存在,进行动态采样的比例。
Values: { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 } 默认值是2
解释:
a.0: Off
b.2: Check 不多于64个blocks – generate stats during parse
c.11:12c的新增加值,进行动态采样的block数由系统自动决定,采样结果会保留在statistics repository,供下次使用。
建议值:参考在线文档SQL Tuning Guide,最佳实践是在session级设置。

(2.4) AWR Lite Snapshots
参考Doc ID 1993045.1,建议在手工snapshot的时候,用lite模式
–默认值是bestfit
_AWR_SNAPSHOT_LEVEL = BASIC | LITE | TYPICAL | ALL | BESTFIT
–手工snapshot的时候,建议lite
SQL> exec dbms_workload_repository.create_snapshot(‘LITE’);

(2.5) _OPTIMIZER_COST_BASED_TRANSFORMATION
默认值是on,从10.2开始,默认值就是on了,但是在11.2.0.3之前,建议设置off

(2.6) SESSION_CACHED_CURSORS
默认值是50,建议设置成200,然后按照208857.1进行调整。
太大容易有shared pool碎片

(2.7) _MEMORY_IMM_MODE_WITHOUT_AUTOSGA
默认值是true,如果需要禁用偷buffer cache的属性,设置成false

(2.8) OPTIMIZER_USE_PENDING_STATISTICS
默认值是false,可以在session级设置成true后,测试新的但是还没发布的统计信息,对sql的影响。

(2.9)OPTIMIZER_USE_INVISIBLE_INDEXES
是否让优化器看见invisible的索引,默认值是false
ALTER INDEX idx_ename ON emp(ename) INVISIBLE;
ALTER SESSION SET OPTIMIZER_USE_INVISIBLE_INDEXES=TRUE;

(2.10) 其他建议值:
a. _optimizer_adaptive_plans=FALSE (需要评估,Adaptive Query Optimization Doc ID 2031605.1)
b. _optimizer_unnest_scalar_sq=FALSE (Bug 19894622 – ORA-600 [kkqcsfixfro:1 — frooutj] error occur in 12c (Doc ID 19894622.8))
c. _rowsets_enabled=FALSE (Bug 22173980 : WRONG RESULTS WHEN “_ROWSETS_ENABLED” = TRUE)
d. _optimizer_reduce_groupby_key=FALSE (Bug 20634449 – Wrong results from OUTER JOIN with a bind variable and a GROUP BY clause in 12.1.0.2)
e. _kks_obsolete_dump_threshold=0 or 8(Huge Trace Files Created Containing “—– Cursor Obsoletion Dump sql_id=%s —–” (Doc ID 1955319.1) ,该参数取值范围是0~8,默认值是1。设置0表示永远不dump,设置8表示parent cursor obsoleted 8次之后才dump。)

(2)其他在12c中的参数:

(2.1) _OPTIMIZER_AGGR_GROUPBY_ELIM
Values: { TRUE | FALSE } 默认值是true
Recommendation: FALSE – Wrong Results with GROUP BY Clause in Nested Query (Doc ID 19567916.8)
建议值:false

(2.2) INMEMORY_FORCE
Values: { DEFAULT | OFF } 默认值是default
Explanation: In-Memory Optimization
Recommendation: OFF – Unless you have an Oracle In-Memory license
建议值为off,除非你有in-memory的license

(2.3) OPTIMIZER_DYNAMIC_SAMPLING
如果统计信息不存在,进行动态采样的比例。
Values: { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 } 默认值是2
解释:
a.0: Off
b.2: Check 不多于64个blocks – generate stats during parse
c.11:12c的新增加值,进行动态采样的block数由系统自动决定,采样结果会保留在statistics repository,供下次使用。
建议值:参考在线文档SQL Tuning Guide,最佳实践是在session级设置。

(2.4) AWR Lite Snapshots
参考Doc ID 1993045.1,建议在手工snapshot的时候,用lite模式
–默认值是bestfit
_AWR_SNAPSHOT_LEVEL = BASIC | LITE | TYPICAL | ALL | BESTFIT
–手工snapshot的时候,建议lite
SQL> exec dbms_workload_repository.create_snapshot(‘LITE’);

(2.5) _OPTIMIZER_COST_BASED_TRANSFORMATION
默认值是on,从10.2开始,默认值就是on了,但是在11.2.0.3之前,建议设置off

(2.6) SESSION_CACHED_CURSORS
默认值是50,建议设置成200,然后按照208857.1进行调整。
太大容易有shared pool碎片

(2.7) _MEMORY_IMM_MODE_WITHOUT_AUTOSGA
默认值是true,如果需要禁用偷buffer cache的属性,设置成false

(2.8) OPTIMIZER_USE_PENDING_STATISTICS
默认值是false,可以在session级设置成true后,测试新的但是还没发布的统计信息,对sql的影响。

(2.9)OPTIMIZER_USE_INVISIBLE_INDEXES
是否让优化器看见invisible的索引,默认值是false
ALTER INDEX idx_ename ON emp(ename) INVISIBLE;
ALTER SESSION SET OPTIMIZER_USE_INVISIBLE_INDEXES=TRUE;

(2.10) 其他建议值:
a. _optimizer_adaptive_plans=FALSE (需要评估,Adaptive Query Optimization Doc ID 2031605.1)
b. _optimizer_unnest_scalar_sq=FALSE (Bug 19894622 – ORA-600 [kkqcsfixfro:1 — frooutj] error occur in 12c (Doc ID 19894622.8))
c. _rowsets_enabled=FALSE (Bug 22173980 : WRONG RESULTS WHEN “_ROWSETS_ENABLED” = TRUE)
d. _optimizer_reduce_groupby_key=FALSE (Bug 20634449 – Wrong results from OUTER JOIN with a bind variable and a GROUP BY clause in 12.1.0.2)
e. _kks_obsolete_dump_threshold=0 or 8(Huge Trace Files Created Containing “—– Cursor Obsoletion Dump sql_id=%s —–” (Doc ID 1955319.1) ,该参数取值范围是0~8,默认值是1。设置0表示永远不dump,设置8表示parent cursor obsoleted 8次之后才dump。)

(4) 查找补丁信息:

原来是通过dba_registry_history 查询,现在可以通过下面的方法查询:
SQL> exec dbms_qopatch.get_sqlpatch_status;

下面的语句是用来查询inventory的位置:
SQL> select xmltransform(DBMS_QOPATCH.GET_OPATCH_LSINVENTORY,DBMS_QOPATCH.GET_OPATCH_XSLT) from dual;

查询某个patch是否被安装:
SQL> select xmltransform(DBMS_QOPATCH.IS_PATCH_INSTALLED(‘19303936 ‘),DBMS_QOPATCH.GET_OPATCH_XSLT) from dual;

查询所有的patch的情况:
SQL> select xmltransform(DBMS_QOPATCH.GET_OPATCH_LIST,DBMS_QOPATCH.GET_OPATCH_XSLT) from dual;

(5)升级catalog变化:

RMAN Catalog Upgrade:
– SQL> @$ORACLE_HOME/rdbms/admin/dbmsrmansys.sql <<<<需要先运行这一步(以前没有),再运行upgrade catalog – $ rman CATALOG my_catalog_owner@catdb recovery catalog database Password: RMAN> UPGRADE CATALOG;
RMAN> UPGRADE CATALOG;
RMAN> EXIT;

(6)增量统计信息收集增强(增量统计信息收集和交换分区相结合)

–先设置使用过期比例USE_STALE_PERCENT。定义“变化”的分区不会收集新的统计信息,除非有xx%百分比的变化。
SQL> exec DBMS_STATS.SET_DATABASE_PREFS(‘INCREMENTAL_STALENESS’,’USE_STALE_PERCENT’);
–设置过期比例为12%(默认是10%)
SQL> exec DBMS_STATS.SET_DATABASE_PREFS(‘STALE_PERCENT’,’12’);

(7)统计信息情况查看:

–查看整个库的统计信息情况:
SQL> variable mystatrep2 clob;
SQL> set long 1000000
SQL> begin
2 :mystatrep2 := DBMS_STATS.REPORT_STATS_OPERATIONS(since=>SYSTIMESTAMP-
3 1,until=>SYSTIMESTAMP, detail_level=>’TYPICAL’,format=>’TEXT’);
4 end;
5 /

PL/SQL procedure successfully completed.

SQL>
SQL> set serverout on
SQL> set long 999999999
SQL> set line 10000
SQL> set pages 10000
SQL> col mystatrep2 for a200
SQL> print mystatrep2

MYSTATREP2
——————————————————————————————————————————————————————————————————–
——————————————————————————————————————————————————————————————————-
| Operation Id | Operation | Target | Start Time | End Time | Status | Total Tasks | Successful Tasks | Failed Tasks | Active Tasks |
——————————————————————————————————————————————————————————————————-
| 366 | export_stats_for_dp | TEST | 13-JUL-16 02.12.59.130000 PM +08:00 | 13-JUL-16 02.12.59.203000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 365 | export_stats_for_dp | TEST | 13-JUL-16 02.08.18.857000 PM +08:00 | 13-JUL-16 02.08.18.880000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 364 | export_stats_for_dp | TEST | 13-JUL-16 02.04.13.425000 PM +08:00 | 13-JUL-16 02.04.13.480000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 363 | export_stats_for_dp | TEST | 13-JUL-16 01.57.50.812000 PM +08:00 | 13-JUL-16 01.57.50.846000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 362 | export_stats_for_dp | TEST | 13-JUL-16 01.56.34.637000 PM +08:00 | 13-JUL-16 01.56.34.667000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 361 | import_stats_for_dp | TEST | 13-JUL-16 01.55.50.018000 PM +08:00 | 13-JUL-16 01.55.50.294000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 360 | export_stats_for_dp | TEST | 13-JUL-16 01.53.54.840000 PM +08:00 | 13-JUL-16 01.53.54.879000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 359 | import_stats_for_dp | TEST | 13-JUL-16 01.52.17.822000 PM +08:00 | 13-JUL-16 01.52.18.059000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 358 | import_stats_for_dp | TEST | 13-JUL-16 01.46.44.289000 PM +08:00 | 13-JUL-16 01.46.44.611000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 357 | export_stats_for_dp | TEST | 13-JUL-16 01.43.12.810000 PM +08:00 | 13-JUL-16 01.43.13.543000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 356 | import_stats_for_dp | TEST | 13-JUL-16 01.40.03.349000 PM +08:00 | 13-JUL-16 01.40.03.650000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 355 | import_stats_for_dp | TEST | 13-JUL-16 01.37.36.601000 PM +08:00 | 13-JUL-16 01.37.36.843000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 354 | import_stats_for_dp | TEST | 13-JUL-16 01.36.32.072000 PM +08:00 | 13-JUL-16 01.36.32.321000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 353 | import_stats_for_dp | TEST | 13-JUL-16 01.34.56.514000 PM +08:00 | 13-JUL-16 01.34.56.790000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 352 | import_stats_for_dp | TEST | 13-JUL-16 12.01.25.756000 PM +08:00 | 13-JUL-16 12.01.26.022000 PM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 351 | import_stats_for_dp | TEST | 13-JUL-16 11.45.08.131000 AM +08:00 | 13-JUL-16 11.45.08.703000 AM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-
| 350 | export_stats_for_dp | TEST | 13-JUL-16 11.42.21.767000 AM +08:00 | 13-JUL-16 11.42.24.557000 AM +08:00 | COMPLETED | 0 | 0 | 0 | 0 |
——————————————————————————————————————————————————————————————————-

SQL>
–查看某个schema的统计信息收集情况:
SQL> set serverout on
SQL> set long 999999999
SQL> set line 10000
SQL> set pages 10000
SQL> col my_report for a200
SQL> variable my_report clob;
SQL> BEGIN
2 :my_report := DBMS_STATS.REPORT_GATHER_SCHEMA_STATS(ownname => ‘TEST’,
3 detail_level => ‘TYPICAL’, format => ‘TEXT’);
4 END;
5 /

PL/SQL procedure successfully completed.

SQL>
SQL> print my_report

MY_REPORT
——————————————————————————————————————————————————————————————————–
——————————————————————————————————————————————————————————————————-
| Operation Id | Operation | Target | Start Time | End Time | Status | Total Tasks | Successful Tasks | Failed Tasks | Active Tasks |
——————————————————————————————————————————————————————————————————-
| 368 | gather_schema_stats (reporting mode) | TEST | | | | 1 | | | |
——————————————————————————————————————————————————————————————————-
——————————————————————————————————————————————————————————————————-
| |
| ————————————————————————————————————————————————————————————————— |
| T A S K S |
| ————————————————————————————————————————————————————————————————— |
| ——————————————————————————————————————————————————————————————— |
| | Target | Type | Start Time | End Time | Status | |
| ——————————————————————————————————————————————————————————————— |
| | TEST.T1 | TABLE | | | | |
| ——————————————————————————————————————————————————————————————— |
| |
| |
——————————————————————————————————————————————————————————————————-

(8)DBMS_ROLLING

Data Guard Simple Rolling Upgrade
Semi-automation of Transient Logical Standby Rolling Upgrade
Works with Data Guard Broker
Procedure DBMS_ROLLING

(9) Real-Time ADDM:

– MMON进程负责收集数据, 每隔3秒一次,不会有lock/latch
– MMON的子进程创建report,并保留在AWR中,可以通过查询 DBA_HIST_REPORTS

参考 Mike Dietrich Upgrade, Migrate & Consolidate to Oracle Database 12c/ oracleblog

升级到12c/18c以后的版本需要了解的一些方面参数(参考Upgrade, Migrate & Consolidate to Oracle Database 12c)

跟踪oracle 进程链接建立时候发生的情况

 

本方法主要使用触发器的方式对新建立的进程自动进行10046的trace并进行分析的方式,期间使用到truss,需要掌握truss的使用方法,具体如下:

 

创建用户

create user trace_user identified by trace_user account unlock;
grant create session,dba to trace_user;

创建登陆触发器

create or replace trigger sys.set_trace
after logon on database
when (user = 'trace_user')
declare
lcommand varchar(200);
begin
execute immediate 'alter session set statistics_level=all';
execute immediate 'alter session set max_dump_file_size=unlimited';
execute immediate 'alter session set events ''10046 trace name context forever, level 12''';
end set_trace;
/

 

尝试登陆

sqlplus "trace_user/trace_user" << eof
exit;
eof

这将在$ ORACLE_BASE / admin / <db_name> / udump目录下生成(10046)跟踪文件

然后通过查看“e =”部分查看数据库调用(PARSE,BIND,EXEC,FETCH)是否有大量时间,这意味着数据库调用所消耗的持续时间

通过查看“ela =”部分检查等待事件是否有大量时间,这意味着等待事件消耗的持续时间(可能通过进行系统调用)

PARSING IN CURSOR #5 len=131 dep=1 uid=0 oct=3 lid=0 tim=15687899202263 hv=1389591971 ad='ca9a7948'
select privilege#,nvl(col#,0),max(mod(nvl(option$,0),2))from objauth$ where obj#=:1 and grantee#=:2 group by privilege#,nvl(col#,0)
END OF STMT
EXEC #5:c=0,e=145,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=15687899202260
WAIT #5: nam='db file sequential read' ela= 30005 file#=1 block#=24208 blocks=1 obj#=-1 tim=15687899232346
FETCH #5:c=0,e=30075,p=1,cr=2,cu=0,mis=0,r=0,dep=1,og=4,tim=15687899232396

没有数据库调用占用大量时间

less oracle_ora_1118270.trc | grep -i ",e=" | cut -d , -f2 > karlarao.txt ; sed -n 's/e=/ /p' karlarao.txt | sort -nr | less

 358746
 183162
 61293
 44661
 32580
 30075
 28695
 26950
 25837
 24244
 ...

没有等待事件耗费大量时间

less oracle_ora_1118270.trc | grep -i "ela=" | cut -d " " -f8 | sort -nr | less

30005
28624
13253
11592
9650

没有 “os thread startup”事件!

less oracle_ora_1118270.trc | grep -i "ela=" | cut -d "=" -f2 | uniq

'SQL*Net message to client' ela
'SQL*Net message from client' ela
'SQL*Net message to client' ela
'SQL*Net message from client' ela
'SQL*Net message to client' ela
'SQL*Net message from client' ela
'SQL*Net message to client' ela
'SQL*Net message from client' ela
'db file sequential read' ela
'SQL*Net message to client' ela
'SQL*Net message from client' ela
'db file sequential read' ela
'SQL*Net message to client' ela
'db file sequential read' ela
'SQL*Net message from client' ela
'SQL*Net message to client' ela
'SQL*Net message from client' ela
'db file sequential read' ela
'SQL*Net message to client' ela
'SQL*Net message from client' ela
'SQL*Net message to client' ela
'SQL*Net message from client' ela
'SQL*Net message to client' ela
'SQL*Net message from client' ela
'SQL*Net message to client' ela
'SQL*Net message from client' ela

下来对系统调用进行检测,这里使用truss工具

这个工具可以让你做一些应用程序跟踪,这里是跟踪SQL * Plus,并显示应用程序对外部库和内核的调用

truss -c sqlplus "/ as sysdba" << EOF
exit;
EOF

truss -d sqlplus "/ as sysdba" << EOF
exit;
EOF

 

“-c”开关计算跟踪的系统调用,故障和信号

“-d”开关显示每行的时间戳(以秒为单位)。跟踪输出的第一行将显示测量各个时间戳的基准时间

这里下面可以发现是DNS服务导致sqlplus“/ as sysdba”操作减慢,并且有“os线程启动”等待,下面的文本是“truss”输出的部分,当在/etc/resolv.conf和/ etc / hosts上读取时,“(sleep……)”事件表示减速。

ENOTTY和ECONNREFUSED是看到的错误,这里此文件/usr/include/sys/errno.h上找到有关错误消息的更多详细信息。以下是它们的解释

 

#define	ENOTTY	25	/* Inappropriate I/O control operation	*/
#define ECONNREFUSED    79      /* Connection refused */

 

根据“truss -d”的记录,可以看到时间信息从2.3379秒(读取/etc/resolv.conf) – > 7.3477秒 – > 17.3489秒 – > 37.3555秒(读数) / etc / hosts) – > 70.3863秒(已连接)

0.3172:        gethostname(0x0FFFFFFFFFFF9920, 256) = 0
0.3175:        _getpid()                        = 1102056
0.3177:        open("/etc/resolv.conf", O_RDONLY) = 8
0.3180:        kioctl(8, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
0.3182:        kioctl(8, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(8, " d o m a i n     this is the domain name, i removed it".., 4096)    = 69
kread(8, " d o m a i n     this is the domain name, i removed it".., 4096)    = 0
0.3190:        close(8)                         = 0
0.3194:        socket(2, 2, 0)                  = 8
0.3197:        getsockopt(8, 65535, 4104, 0x0FFFFFFFFFFF70E4, 0x0FFFFFFFFFFF70E0) = 0
0.3199:        connext(8, 0x09001000A0018600, 16) = 0
0.3202:        send(8, 0x0FFFFFFFFFFF7FF0, 41, 0) = 41
0.3205:        _poll(0x0FFFFFFFFFFF7190, 1, 5000) = 1
0.3208:        nrecvfrom(8, 0x0FFFFFFFFFFF9320, 1024, 0, 0x0FFFFFFFFFFF7950, 0x0FFFFFFFFFFF7178) Err#79 ECONNREFUSED
0.3210:        close(8)                         = 0
0.3212:        socket(2, 2, 0)                  = 8
0.3215:        sendto(8, 0x0FFFFFFFFFFF7FF0, 41, 0, 0x09001000A0018610, 16) = 41
0.3217:        _poll(0x0FFFFFFFFFFF7190, 1, 5000) = 1
0.3220:        nrecvfrom(8, 0x0FFFFFFFFFFF9320, 1024, 0, 0x0FFFFFFFFFFF7950, 0x0FFFFFFFFFFF7178) = 108
0.3222:        close(8)                         = 0
0.3224:        socket(2, 2, 0)                  = 8
0.3227:        getsockopt(8, 65535, 4104, 0x0FFFFFFFFFFF70E4, 0x0FFFFFFFFFFF70E0) = 0
0.3229:        connext(8, 0x09001000A0018600, 16) = 0
0.3231:        send(8, 0x0FFFFFFFFFFF7FF0, 28, 0) = 28
0.3233:        _poll(0x0FFFFFFFFFFF7190, 1, 5000) = 1
0.3236:        nrecvfrom(8, 0x0FFFFFFFFFFF9320, 1024, 0, 0x0FFFFFFFFFFF7950, 0x0FFFFFFFFFFF7178) Err#79 ECONNREFUSED
0.3238:        close(8)                         = 0
0.3240:        socket(2, 2, 0)                  = 8
0.3243:        sendto(8, 0x0FFFFFFFFFFF7FF0, 28, 0, 0x09001000A0018610, 16) = 28
0.3245:        _poll(0x0FFFFFFFFFFF7190, 1, 5000) = 1
0.3248:        nrecvfrom(8, 0x0FFFFFFFFFFF9320, 1024, 0, 0x0FFFFFFFFFFF7950, 0x0FFFFFFFFFFF7178) = 28
0.3250:        close(8)                         = 0
0.3252:        socket(2, 2, 0)                  = 8
0.3373:        sendto(8, 0x0FFFFFFFFFFF7FF0, 28, 0, 0x09001000A0018600, 16) = 28
2.3379:        _poll(0x0FFFFFFFFFFF7190, 1, 5000) (sleeping...)
2.3379:        _poll(0x0FFFFFFFFFFF7190, 1, 5000) = 0
5.3468:        close(8)                         = 0
5.3470:        socket(2, 2, 0)                  = 8
5.3472:        sendto(8, 0x0FFFFFFFFFFF7FF0, 28, 0, 0x09001000A0018600, 16) = 28
7.3477:        _poll(0x0FFFFFFFFFFF7190, 1, 10000) (sleeping...)
7.3477:        _poll(0x0FFFFFFFFFFF7190, 1, 10000) = 0
15.3479:        close(8)                        = 0
15.3482:        socket(2, 2, 0)                 = 8
15.3484:        sendto(8, 0x0FFFFFFFFFFF7FF0, 28, 0, 0x09001000A0018600, 16) = 28
17.3489:        _poll(0x0FFFFFFFFFFF7190, 1, 20000) (sleeping...)
17.3489:        _poll(0x0FFFFFFFFFFF7190, 1, 20000) = 0
35.3491:        close(8)                        = 0
35.3495:        getdomainname(0x0FFFFFFFFFFF9480, 256) = 0
35.3497:        open("/etc/hosts", O_RDONLY)    = 8
35.3500:        kioctl(8, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
35.3502:        kfcntl(8, F_SETFD, 0x0000000000000001) = 0
35.3505:        kioctl(8, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(8, " #   @ ( # ) 4 7\t 1 . 1".., 4096)    = 3453
kread(8, " #   @ ( # ) 4 7\t 1 . 1".., 4096)    = 0
35.3514:        close(8)                        = 0
kwrite(9, "01 �0603 s".., 432)    = 432
37.3555:        kread(10, "16060280".., 2064) (sleeping...)
kread(10, "06 E06\b".., 2064)   = 1605
70.3762:        open("/oracle/app/product/10.2/db_1/rdbms/mesg/oraus.msb", O_RDONLY) = 8
70.3764:        kfcntl(8, F_SETFD, 0x0000000000000001) = 0
70.3767:        lseek(8, 0, 0)                  = 0
kread(8, "1513 "011303\t\t".., 256)     = 256
70.3772:        lseek(8, 512, 0)                = 512
kread(8, "1A Q 5 C [ V u �85 �9480".., 512)     = 512
70.3776:        lseek(8, 1024, 0)               = 1024
kread(8, "18 $ 4 = G S".., 512)     = 512
70.3781:        lseek(8, 50688, 0)              = 50688
kread(8, "\f05 ] P05 ^".., 512)     = 512
70.3786:        lseek(8, 512, 0)                = 512
kread(8, "1A Q 5 C [ V u �85 �9480".., 512)     = 512
70.3790:        lseek(8, 1024, 0)               = 1024
kread(8, "18 $ 4 = G S".., 512)     = 512
70.3795:        lseek(8, 50688, 0)              = 50688
kread(8, "\f05 ] P05 ^".., 512)     = 512
70.3799:        lseek(8, 512, 0)                = 512
kread(8, "1A Q 5 C [ V u �85 �9480".., 512)     = 512
70.3804:        lseek(8, 1024, 0)               = 1024
kread(8, "18 $ 4 = G S".., 512)     = 512
70.3808:        lseek(8, 50688, 0)              = 50688
kread(8, "\f05 ] P05 ^".., 512)     = 512
70.3813:        lseek(8, 512, 0)                = 512
kread(8, "1A Q 5 C [ V u �85 �9480".., 512)     = 512
70.3817:        lseek(8, 1024, 0)               = 1024
kread(8, "18 $ 4 = G S".., 512)     = 512
70.3822:        lseek(8, 51712, 0)              = 51712
kread(8, "\n0589 D058A".., 512)     = 512
70.3827:        close(8)                        = 0
kwrite(9, " <0611 k".., 60)     = 60
kread(10, " �06\b".., 2064)   = 179
kwrite(9, " U0603 h".., 85)     = 85
kread(10, "1606\b".., 2064)   = 22
kwrite(9, " U0603 h".., 85)     = 85
kread(10, "1606\b".., 2064)   = 22
70.3854:        lseek(4, 512, 0)                = 512
kread(4, "17 �".., 512)     = 512
70.3858:        lseek(4, 1024, 0)               = 1024
kread(4, "16 * R h819E".., 512)     = 512
70.3863:        lseek(4, 4096, 0)               = 4096
kread(4, "\f82 P86".., 512)     = 512

kwrite(1, "\n", 1)                              = 1
Connected to:
kwrite(1, " C o n n e c t e d   t o".., 14)     = 14
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
kwrite(1, " O r a c l e   D a t a b".., 77)     = 77
With the Partitioning, OLAP, Data Mining and Real Application Testing options
kwrite(1, " W i t h   t h e   P a r".., 78)     = 78

kwrite(1, "\n", 1)                              = 1
70.3882:        kfcntl(1, F_GETFL, 0x0000000000000008) = 2
70.3886:        __libc_sbrk(0x0000000000030020) = 0x000000001023C880
70.3892:        access("login.sql", 0)          Err#2  ENOENT
70.3895:        access("/oracle/app/product/10.2/db_1/sqlplus/admin/glogin.sql", 0) = 0
70.3898:        statfs("/oracle/app/product/10.2/db_1/sqlplus/admin/glogin.sql", 0x0FFFFFFFFFFFDA10) = 0

 

事实证明确实是dns出现了问题,在调整后迅速恢复了响应时间

3.0713:        gethostname(0x0FFFFFFFFFFF9900, 256) = 0
3.0716:        _getpid()                        = 2539762
3.0719:        open("/etc/resolv.conf", O_RDONLY) = 8
3.0721:        kioctl(8, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
3.0724:        kioctl(8, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(8, " d o m a i n     this is the domain name, i removed it".., 4096)    = 70
kread(8, " d o m a i n     this is the domain name, i removed it".., 4096)    = 0
3.0731:        close(8)                         = 0
3.0735:        socket(2, 2, 0)                  = 8
3.0740:        getsockopt(8, 65535, 4104, 0x0FFFFFFFFFFF70C4, 0x0FFFFFFFFFFF70C0) = 0
3.0742:        connext(8, 0x09001000A0018600, 16) = 0
3.0746:        send(8, 0x0FFFFFFFFFFF7FD0, 41, 0) = 41
3.0750:        _poll(0x0FFFFFFFFFFF7170, 1, 5000) = 1
3.0753:        nrecvfrom(8, 0x0FFFFFFFFFFF9300, 1024, 0, 0x0FFFFFFFFFFF7930, 0x0FFFFFFFFFFF7158) = 108
3.0756:        send(8, 0x0FFFFFFFFFFF7FD0, 28, 0) = 28
3.0758:        _poll(0x0FFFFFFFFFFF7170, 1, 5000) = 1
3.0761:        nrecvfrom(8, 0x0FFFFFFFFFFF9300, 1024, 0, 0x0FFFFFFFFFFF7930, 0x0FFFFFFFFFFF7158) = 28
3.0764:        close(8)                         = 0
3.0767:        getdomainname(0x0FFFFFFFFFFF9460, 256) = 0
3.0769:        open("/etc/hosts", O_RDONLY)     = 8
3.0771:        kioctl(8, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
3.0774:        kfcntl(8, F_SETFD, 0x0000000000000001) = 0
3.0777:        kioctl(8, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(8, " #   @ ( # ) 4 7\t 1 . 1".., 4096)    = 3453
kread(8, " #   @ ( # ) 4 7\t 1 . 1".., 4096)    = 0
3.0786:        close(8)                         = 0
kwrite(9, "01 �0603 s".., 432)    = 432
kread(10, "06 H06\b".., 2064)   = 1608
3.0973:        open("/oracle/app/product/10.2/db_1/rdbms/mesg/oraus.msb", O_RDONLY) = 8
3.0975:        kfcntl(8, F_SETFD, 0x0000000000000001) = 0
3.0978:        lseek(8, 0, 0)                   = 0
kread(8, "1513 "011303\t\t".., 256)     = 256
3.0983:        lseek(8, 512, 0)                 = 512
kread(8, "1A Q 5 C [ V u �85 �9480".., 512)     = 512
3.0988:        lseek(8, 1024, 0)                = 1024
kread(8, "18 $ 4 = G S".., 512)     = 512
3.0993:        lseek(8, 50688, 0)               = 50688
kread(8, "\f05 ] P05 ^".., 512)     = 512
3.0997:        lseek(8, 512, 0)                 = 512
kread(8, "1A Q 5 C [ V u �85 �9480".., 512)     = 512
3.1002:        lseek(8, 1024, 0)                = 1024
kread(8, "18 $ 4 = G S".., 512)     = 512
3.1006:        lseek(8, 50688, 0)               = 50688
kread(8, "\f05 ] P05 ^".., 512)     = 512
3.1011:        lseek(8, 512, 0)                 = 512
kread(8, "1A Q 5 C [ V u �85 �9480".., 512)     = 512
3.1015:        lseek(8, 1024, 0)                = 1024
kread(8, "18 $ 4 = G S".., 512)     = 512
3.1021:        lseek(8, 50688, 0)               = 50688
kread(8, "\f05 ] P05 ^".., 512)     = 512
3.1025:        lseek(8, 512, 0)                 = 512
kread(8, "1A Q 5 C [ V u �85 �9480".., 512)     = 512
3.1029:        lseek(8, 1024, 0)                = 1024
kread(8, "18 $ 4 = G S".., 512)     = 512
3.1034:        lseek(8, 51712, 0)               = 51712
kread(8, "\n0589 D058A".., 512)     = 512
3.1038:        close(8)                         = 0
kwrite(9, " <0611 k".., 60)     = 60
kread(10, " �06\b".., 2064)   = 179
kwrite(9, " U0603 h".., 85)     = 85
kread(10, "1606\b".., 2064)   = 22
kwrite(9, " U0603 h".., 85)     = 85
kread(10, "1606\b".., 2064)   = 22
3.1064:        lseek(4, 512, 0)                 = 512
kread(4, "17 �".., 512)     = 512
3.1069:        lseek(4, 1024, 0)                = 1024
kread(4, "16 * R h819E".., 512)     = 512
3.1073:        lseek(4, 4096, 0)                = 4096
kread(4, "\f82 P86".., 512)     = 512

kwrite(1, "\n", 1)                              = 1
Connected to:
kwrite(1, " C o n n e c t e d   t o".., 14)     = 14
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
kwrite(1, " O r a c l e   D a t a b".., 77)     = 77
With the Partitioning, OLAP, Data Mining and Real Application Testing options
kwrite(1, " W i t h   t h e   P a r".., 78)     = 78

kwrite(1, "\n", 1)                              = 1
3.1095:        kfcntl(1, F_GETFL, 0x0000000000000008) = 2
3.1099:        __libc_sbrk(0x0000000000030020)  = 0x000000001023C880
3.1105:        access("login.sql", 0)           Err#2  ENOENT
3.1109:        access("/oracle/app/product/10.2/db_1/sqlplus/admin/glogin.sql", 0) = 0
3.1112:        statfs("/oracle/app/product/10.2/db_1/sqlplus/admin/glogin.sql", 0x0FFFFFFFFFFFD9F0) = 0
跟踪oracle 进程链接建立时候发生的情况

从坏块中恢复未损坏的数据

前面介绍通过skip坏块(event或者dbms包)和构造rowid的方式,以及通过salvage脚本从存在坏块的表中恢复数据,同时也可以通过mdata恢复工具来unload出表中的数据,这里补充一个场景,就是坏块中未损坏的数据的恢复。

从已经损坏的块中尽可能的恢复数据的思路依旧参考的是通过rowid的方式来实现恢复,这里讨论oracle 8i以及8i以上版本的数据库.

 

rowid基本结构这里不做详细解释,前面有介绍过,这下面是官方的介绍。

       OOOOOO = is a base 64 encoding of the 32-bit dataobj# (Data object
                number was introduced in 8.0 to track versions of the same
                segment because certain operations can change the version.
                It is used to discover stale ROWIDs and stale undo records)
          FFF = is a base 64 encoding of the relative file number
       BBBBBB = is a base 64 encoding of the block number
          SSS = is a base 64 encoding of the slot (row) number

   Note that the ROWID contains the relative file number which is distinct from
   the absolute file number and new for Oracle8.  A relative file number is
   relative to the tablespace (meaning a tablespace can have a first, second,
   third file, etc.) and an absolute file number is absolute in the whole
   system.  Two different files may have the same relative number.

   例如 "SELECT ACCT_NO, ROWID from EXAMPLE;" 获得的如下结果:

   ACCT_NO    ROWID
   ---------- ------------------
        12345 AAAAh3AAGAAACJAAAA
        19283 AAAAh3AAGAAACJAAAB
        22345 AAAAh4AAFAAAAADAAA
        60372 AAAAh4AAFAAAAADAAB

     dbms_rowid构造rowid的函数:

              ROWID_CREATE(rowid_type    IN number,
                           object_number IN number,
                           relative_fno  IN number,
                           block_number  IN number,
                           row_number    IN number)
              return ROWID;

     -- rowid_type      - type (restricted=0/extended=1)
     -- object_number   - data object number
     -- relative_fno    - relative file number
     -- block_number    - block number in this file
     -- row_number      - row number in this block



    使用高低位的rowid来跳过坏块使用基于rowid的方式扫描出相关rows,这和上篇介绍的构造的rowid方式是一样的。

     The "LOW_RID" is the lowest rowid INSIDE the corrupt block:

       SELECT dbms_rowid.rowid_create(1,<DATA_OBJECT_ID>,<RFN>,<BL>,0) LOW_RID
	 from DUAL;

     The "HI_RID" is the first rowid AFTER the corrupt block:

       SELECT dbms_rowid.rowid_create(1,<DATA_OBJECT_ID>,<RFN>,<BL>+1,0) HI_RID
	 from DUAL;


	CREATE TABLE salvage_table AS
	 SELECT /*+ ROWID(A) */ * FROM <owner.tablename> A
	  WHERE rowid < '<low_rid>'
	;

	INSERT INTO salvage_table 
	 SELECT /*+ ROWID(A) */ * FROM <owner.tablename> A
	  WHERE rowid >= '<hi_rid>'
	;

    如果是分区表,则通过针对存在坏块的分区的方式即可

	CREATE TABLE salvage_table AS
	 SELECT /*+ ROWID(A) */ * 
	   FROM <owner.tablename> PARTITION (<partition_name>) A
	  WHERE rowid < '<lo_rid>'
	;

	INSERT INTO salvage_table 
	 SELECT /*+ ROWID(A) */ * 
	   FROM <owner.tablename> PARTITION (<partition_name>) A
	  WHERE rowid >= '<hi_rid>'
	;

   如果表中带有long字段需要使用exp/imp的方式来进行相关数据的导入导出,同时要带有where条件来帅选出rowid的范围。
   剩下就是要讨论的从损坏的块中恢复数据,当然如果坏块是段头那么就没必要从段头中恢复rows,可以通过如下的sql判断损坏的块是否表段头。
 
   select file_id,block_id,blocks,extent_id 
   from dba_extents 
   where owner='<owner>' 
     and segment_name='<table_name>' 
     and segment_type='TABLE'
   order by extent_id;

     FILE_ID  BLOCK_ID    BLOCKS EXTENT_ID
   --------- --------- --------- ---------
           8     94854     20780         0 <- EXTENT_ID 为0则代表坏块为段头


  Finding out about data in the Corrupt Block
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  构造rowid的方式根据块号可以构造出坏块里面row地址,前面有说明,如果表中存在索引,则可以尝试通过索引来判断坏块中row的信息,可以结合上面提到的坏块上下块的高低位rowid来扫描出坏块里的rows信息,具体如下:
 
  如果索引所在字段要求为非空,则可以通过以下sql的索引快速全扫描来实现坏块rowid的获取: 
	SELECT /*+ INDEX_FFS(X <index_name>) */ 
		<index_column1>, <index_column2> ...
	  FROM <tablename> X
	 WHERE rowid >= '<low_rid>'
	   AND rowid <  '<hi_rid>'
	;

如果索引所在的列允许null,则无法使用索引快速全扫描并且必须使用范围扫描。这需要知道确保领先索引列的最小可能值,具体sql如下:

	SELECT /*+ INDEX(X <index_name>) */ 
		<index_column1>, <index_column2> ...
	  FROM <tablename> X
	 WHERE rowid >= '<low_rid>'
	   AND rowid <  '<hi_rid>'
	   AND <index_column1> >= <min_col1_value>
	;

 

8i以上版本的恢复案例:

SQL> select * from scott.partitionexample;

	ORA-01578: ORACLE data block corrupted (file # 7, block # 12698)
	ORA-01110: data file 7: '/oracle1/oradata/V816/oradata/V816/users01.dbf'

    >>  <RFN> = 7 , <BL> = 12698 , <AFN> = 7

	SQL> SELECT tablespace_name, segment_type, owner, segment_name
               FROM dba_extents
              WHERE file_id =7
		AND 12698 between block_id AND block_id + blocks - 1 ;

        TABLESPACE_NAME  SEGMENT_TYPE       OWNER   SEGMENT_NAME
        ---------------  ------------       -----   ------------
        USERS            TABLE PARTITION    SCOTT   PARTITIONEXAMPLE

        SQL> SELECT partition_name FROM dba_extents
              WHERE file_id =7
	        AND 12698 between block_id AND block _id + blocks - 1;

	PARTITION_NAME
	------------------------------
	PARTEX2

        SQL> SELECT data_object_id
	       FROM dba_objects
	      WHERE object_name = 'PARTITIONEXAMPLE'  and owner='SCOTT'
	        AND subobject_name= 'PARTEX2';

	DATA_OBJECT_ID
 	---------------
     	88145
       
        找出高低位的rowid并扫描相关数据

	SQL> select dbms_rowid.rowid_create(1, 88145,7,12698,0) from dual;

	DBMS_ROWID.ROWID_C
	------------------
	AAAVhRAAHAAADGaAAA

	SQL>  select dbms_rowid.rowid_create(1, 88145,7,12699,0) from dual;

	DBMS_ROWID.ROWID_C
	------------------
	AAAVhRAAHAAADGbAAA

	SQL> SELECT /*+ ROWID(A) */ * 
	       FROM scott.partitionexample A
     	      WHERE rowid < 'AAAVhRAAHAAADGaAAA';

	COLUMN1    COLUMN2
	---------- ----------
	        15 a
	       ...

	SQL> SELECT /*+ INDEX_FFS(A PARTEXAM) */  column1
	       FROM  scott.partitionexample A
	      WHERE  rowid >= 'AAAVhRAAHAAADGaAAA'
		AND  rowid <  'AAAVhRAAHAAADGbAAA' ;
	
	COLUMN1
	----------
	        25
	...

 

 


 

 

REM NAME: salvage.sql
REM NOTE: This program will not handle long, long raw colums or chained rows.
SET ECHO OFF;  
SET SERVEROUTPUT ON;  
  
CREATE OR REPLACE FUNCTION dectohex(a IN NUMBER) RETURN VARCHAR2 IS  
      x VARCHAR2(8) := '';  
        y VARCHAR2(1);  
        z NUMBER;  
        w NUMBER;  
    BEGIN  
        IF a > POWER(2,32) OR a < 0  THEN  
            RAISE invalid_number;  
        END IF;  
        w := a;  
        WHILE w > 0 LOOP  
 z := w MOD 16;  
        IF z = 10 THEN y := 'A';  
        ELSIF z = 11 THEN y := 'B';  
        ELSIF z = 12 THEN y := 'C';  
        ELSIF z = 13 THEN y := 'D';  
     ELSIF z = 14 THEN y := 'E';  
        ELSIF z = 15 THEN y := 'F';  
        ELSE y := TO_CHAR(z);  
        END IF;  
            w := TRUNC(w / 16);  
            x := CONCAT(y,x);   -- build x string backwards  
        END LOOP;  
        RETURN x;  
    END;  
/  
  
  
DECLARE  
  
        table_name         VARCHAR2(30):= 'dept';  
   user_name          VARCHAR2(30):= 'scott';      
  
        file_id            NUMBER;  
        block_id           NUMBER;  
        extentid           number;  
    blocks             NUMBER;  
        file_id_hex        VARCHAR2(4);  
        block_id_hex       VARCHAR2(8);  
        row_count_hex      VARCHAR2(4);  
        row_count          NUMBER;  
        block_counter      NUMBER;  
        for_loop_counter   NUMBER;  
        corrupt_block_flag NUMBER:=0;  
        check_file NUMBER;  
        check_block        NUMBER;  
        row_id_hex         CHAR(18);  
        invalid_rowid      EXCEPTION;  
        rows_per_block     NUMBER:=100;  
  
    CURSOR c1 IS SELECT file_id, block_id, blocks,extent_id  FROM dba_extents  
       WHERE SEGMENT_NAME = UPPER(table_name) AND OWNER = UPPER(user_name)  
    ORDER BY EXTENT_ID;  
            
    CURSOR c2 IS SELECT file_id,block_id FROM corrupt_block;  
      
    PROCEDURE ins (v_rowid VARCHAR2) is  
      bad_rowid  EXCEPTION;  
      PRAGMA EXCEPTION_INIT (bad_rowid, -01410);  
      BEGIN  
   INSERT INTO salvaged_rows_table  
                SELECT *  
                FROM SCOTT.DEPT  
                WHERE rowid = v_rowid;  
  
      EXCEPTION  
         WHEN bad_rowid THEN  
         NULL;  
    END ins;  
  
BEGIN  
  OPEN c1;  
  LOOP   
    FETCH c1 INTO file_id, block_id, blocks,extentid;  
    EXIT WHEN c1%NOTFOUND;  
    dbms_output.put_line('extent: file:'||to_char(file_id)||'  
block_id:'||to_char(block_id)||' blocks:'||to_char(blocks));  
    file_id_hex := DECTOHEX(file_id);  
    block_counter:= block_id;  
    for_loop_counter:= block_id + blocks-1;  
    FOR i IN block_id..for_loop_counter LOOP  
      corrupt_block_flag:= 0;  
      OPEN c2;  
        LOOP  
          FETCH c2 INTO check_file, check_block;  
          EXIT WHEN c2%NOTFOUND;  
          IF (check_block = block_counter AND check_file = file_id) THEN  
             corrupt_block_flag :=1;  
               dbms_output.put_line('....Skipping corrupt  
block:'||to_char(block_counter)|| ' file id:'||to_char(file_id));  
          END IF;  
        END LOOP;    
      CLOSE c2;  
      block_id_hex := dectohex(block_counter);  
      IF corrupt_block_flag = 0 THEN  
        FOR row_count IN 0..rows_per_block LOOP  
            IF row_count = 0 THEN  
               row_count_hex := '0';  
            ELSE  
               row_count_hex:=dectohex(row_count);    
      END IF;  
            row_id_hex:=LPAD(block_id_hex,8,0) ||'.'||  
    LPAD(row_count_hex,4,0) ||'.'|| LPAD(file_id_hex,4,0);  
            ins(row_id_hex);  
        END LOOP;  
      END IF;  
      block_counter := block_counter+1;  
      COMMIT;  
    END LOOP;       
  END LOOP;  
CLOSE c1;  
COMMIT;  
END;  
/
SALVAGE.SQL – PL/SQL Code to Extract Data from a Corrupt Table

利用构造ROWID脚本实现无备份情况下绕过ORA-1578、8103、1410、00600 kdsgrp1 等坏块场景

利用构造rowid的方式来实现绕过坏块恢复正常的表数据库,也可以通过该脚本尝试读取坏块内部未损坏的数据,只是脚本需要修改。

 

SKIP ORA-1578 ORA-8103 ORA-1410


REM Create a new table based on the table that is producing errors with no rows:

create table <new table name>
as
select *
from   <original table name>
where  1=2;

REM Create the table to keep track of ROWIDs pointing to affected rows:

create table bad_rows (row_id rowid, oracle_error_code number);

set serveroutput on

DECLARE
TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;

CURSOR c1 IS select /*+ index_ffs(tab1 <index name>) parallel(tab1) */ rowid
from <original table name> tab1
where <indexed column> is NOT NULL
order by rowid;

r RowIDTab;
rows NATURAL := 20000;
bad_rows number := 0 ;
errors number;
error_code number;
myrowid rowid;
BEGIN
OPEN c1;
LOOP
FETCH c1 BULK COLLECT INTO r LIMIT rows;
EXIT WHEN r.count=0;
BEGIN
FORALL i IN r.FIRST..r.LAST SAVE EXCEPTIONS
insert into <new table name>
select /*+ ROWID(A) */ <list of columns from table (ie col1, col2,..)>
from <original table name> A where rowid = r(i);
EXCEPTION
when OTHERS then
BEGIN
errors := SQL%BULK_EXCEPTIONS.COUNT;
FOR err1 IN 1..errors LOOP
error_code := SQL%BULK_EXCEPTIONS(err1).ERROR_CODE;
if error_code in (1410, 8103, 1578) then
myrowid := r(SQL%BULK_EXCEPTIONS(err1).ERROR_INDEX);
bad_rows := bad_rows + 1;
insert into bad_rows values(myrowid, error_code);
else
raise;
end if;
END LOOP;
END;
END;
commit;
END LOOP;
commit;
CLOSE c1;
dbms_output.put_line('Total Bad Rows: '||bad_rows);
END;
/

Notes:

  • Replace the next values in the plsql script by the values of the affected table: <index name>, <original table name>, <indexed column>, <list of columns from table (ie col1, col2,..)>
  • <index name> should be replaced by preferrable a Primary Key index.
  • If a Primary Key index is used, remove the where condition:  “where <indexed column> is NOT NULL”
  • The “order by rowid” clause is to get the rows ordered by blocks so a batch of Inserts finds the block in the buffer cache.  Otherwise, one rowid can bring one block from the table and insert just one row, the next rowid brings a different row and insert one row as opposed of finding the same previous block.
  • The idea is to get the rowid’s from an existent index, then get all the columns from the table for each rowid and insert these rows into the new table. Using the “index” hint, allows the optimizer to choose the most appropriated index to scan the table based on the indexed column.
  • Make sure that the select in the plsql is using an index. One way to verify if the index is used is to get an execution plan from sqlplus:

explain plan for
select /*+ index_ffs(tab1 <index name>) parallel(tab1) */ rowid
from <original table name> tab1
where <indexed column> is NOT NULL
order by rowid;

set lines 200
@?/rdbms/admin/utlxplp</div>


  • Note that the plsql executes an INSERT for 20000 rows and COMMIT. If it is required to change this, adjust the value of rows. e.g.:
rows NATURAL := 50000; -> to insert 50000 rows in one execution of INSERT and commit every 50000 records.
  • If 'Total Bad Rows:' displays 0 and it is known for certain that there is a block incorrect on disk that is causing the ORA-8103, , then it means that the block is empty (no rows) and there is not data loss.

SKIP ORA-600 in a Table

 

This is useful when the ORA-600 is produced by a non-existent chained row (invalid nrid) like ORA-600 [kdsgrp1] and when event 10231 does not work.


drop table bad_rows;
create table bad_rows (row_id ROWID
,oracle_error_code number);

rem Create the new empty table:

create table &&new_table
as select *
from &&affected_table
where 1=2;


set serveroutput on
declare
n number:=0;
bad_rows number := 0;
error_code number;
ora600 EXCEPTION;
PRAGMA EXCEPTION_INIT(ora600, -600);
begin
for i in (select rowid rid from  &&affected_table)  loop
begin
insert into  &&new_table
select *
from &&affected_table
where rowid=i.rid;
n:=n+1;
exception
when ora600 then
bad_rows := bad_rows + 1;
insert into bad_rows values(i.rid,600);
commit;
when others then
error_code:=SQLCODE;
bad_rows := bad_rows + 1;
insert into bad_rows values(i.rid,error_code);
commit;
end;
end loop;
dbms_output.put_line('Total Bad Rows: '||bad_rows);
dbms_output.put_line('Total Good rows: '||n);
end;
/

SKIP ORA-600 in IOT

This is useful when the ORA-600 is produced by a non-existent chained row (invalid nrid) like ORA-600 [kdsgrp1] and when event 10231 does not work for an Index Organized Table (IOT).


drop table bad_rows;
create table bad_rows (row_id UROWID
,oracle_error_code number);

rem Create the new empty table:

create table &&new_table
as select *
from &&affected_table
where 1=2;


set serveroutput on
declare
n number:=0;
bad_rows number := 0;
error_code number;
ora600 EXCEPTION;
PRAGMA EXCEPTION_INIT(ora600, -600);
begin
for i in (select /*+ INDEX_FFS(IOT_1) */ rowid rid from  &&affected_table IOT_1)  loop
begin
insert into  &&new_table
select *
from &&affected_table
where rowid=i.rid;
n:=n+1;
exception
when ora600 then
bad_rows := bad_rows + 1;
insert into bad_rows values(i.rid,600);
commit;
when others then
error_code:=SQLCODE;
bad_rows := bad_rows + 1;
insert into bad_rows values(i.rid,error_code);
commit;
end;
end loop;
dbms_output.put_line('Total Bad Rows: '||bad_rows);
dbms_output.put_line('Total Good rows: '||n);
end;
/