

When importing data with IMPDP, we usually set a degree of parallelism, expecting the database to use parallel execution to speed up the long-running steps such as loading data and creating indexes. In 12.2, however, even though parallelism was set on the IMPDP command line, the trace shows that index creation always runs serially instead of in parallel.

SQL> conn test/test@PDB1
SQL> create table a(m number,n number) parallel 4;
SQL> create index a_ind on a(m) parallel 3;
SQL> !expdp test/test@PDB1 dumpfile=b.dmp directory=my_dir
SQL> !impdp test/test@PDB1 directory=my_dir dumpfile=b.dmp parallel=2 TRACE=480301

The DW (Data Pump worker) trace shows that the index was created with PARALLEL 1, not PARALLEL 2!

CDB2_dw00_6658.trc
=====================
PARSING IN CURSOR #140037561274968 len=170 dep=2 uid=79 oct=1 lid=79
tim=841576694 hv=1135291776 ad='61d1c0b0' sqlid='apjngud1uqbc0'
CREATE TABLE "TEST"."A" ("M" NUMBER, "N" NUMBER) SEGMENT CREATION DEFERRED
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING TABLESPACE "USERS" PARALLEL 4
END OF STMT

=====================
PARSING IN CURSOR #140037558550112 len=114 dep=2 uid=79 oct=9 lid=79
tim=842113538 hv=68235534 ad='6374e7a8' sqlid='0u96wjh212c8f'
CREATE INDEX "TEST"."A_IND" ON "TEST"."A" ("M") PCTFREE 10 INITRANS 2
MAXTRANS 255 TABLESPACE "USERS" PARALLEL 1 <======= PARALLEL 1, even though the index was created with PARALLEL 3 and impdp ran with PARALLEL=2
END OF STMT

In 12.1.0.2, however, this problem does not exist; the requested parallelism is honored:

R1201_dw00_29326.trc :
=====================
PARSING IN CURSOR #140427279060200 len=115 dep=2 uid=111 oct=9 lid=111
tim=8385394705 hv=1693801083 ad='77900cf8' sqlid='3t4ktqdkgaqmv'
CREATE INDEX "TEST"."A_IND" ON "TEST"."A" ("M") PCTFREE 10 INITRANS 2
MAXTRANS 255 TABLESPACE "SYSTEM" PARALLEL 2 <======== PARALLEL 2
END OF STMT

This was reported as an Oracle bug, but Oracle Development's explanation is that it is expected behavior, because "we found it to be faster this way"!

BUG 26091146 - IMPDP CREATE INDEX WITH PARALLEL 1 IGNORING COMMAND LINE PARALLEL=2. Development explained that this is expected behavior, and supplied the following explanation:

“General support for parallel import of most object type, including indexes, is a 12.2 feature, which led to study of parallel creation of individual indexes. What was found was that using parallel index creation was generally slower than non-parallel. That led to a decision to backport the change to not use parallel index creation.”

In other words: during development of the 12.2 new features, they studied how impdp creates indexes and found that, "in general", serial index creation is faster than parallel. So impdp now creates every index serially, and after each index is built it issues 'ALTER INDEX ... PARALLEL n' to set the index's degree, so that queries can still run in parallel.

So Oracle Development does not regard this as a bug. Keep this in mind when importing data: if time is tight, consider other ways of building the indexes in parallel (for example, run IMPDP without creating indexes, extract the CREATE INDEX statements to a SQL text file, and execute them manually in parallel).

Analysis of why Oracle 12.2 does not create indexes in parallel during an IMPDP parallel import

There has been plenty of debate about whether rebuilding indexes is useful. In general, B-tree indexes rarely need to be rebuilt, essentially because a B-tree index is largely self-managing, or self-balancing.

The most common justifications given for rebuilding an index are:
- index fragmentation keeps increasing
- the index keeps growing, and deleted space is not reused
- the index clustering factor is out of sync

In fact, most indexes remain balanced and compact, because freed leaf entries can be reused. Inserts/updates and deletes do leave fragmented free space around index blocks, but in general that space is correctly reused.
The clustering factor reflects how well the table data is ordered with respect to a given index key. Rebuilding an index has no influence on the clustering factor; changing it requires reorganizing the table's data.

Moreover, the impact of rebuilding an index is significant. Please read the following notes carefully:

1. Most scripts depend on the index_stats dynamic table. This table is populated by:

analyze index ... validate structure;

Although this is a valid way to inspect an index, it takes an exclusive table lock while analyzing the index. For large indexes in particular the impact can be huge, because no DML against the table is allowed during that time. The validation can be run online without locking the table, but it may then take additional time.

2. A direct consequence of rebuilding an index is that REDO activity may increase, and overall system performance may suffer.

Insert/update/delete operations cause an index to evolve as it splits and grows. After a rebuild the index is packed more tightly; but as DML keeps running against the table, the index has to split again until it reaches equilibrium. As a result redo activity increases, and the extra index splits are more likely to have a direct performance impact, since more I/O, CPU and so on are spent on the index growing back. After some time the index may hit the same "problems" again, be flagged for another rebuild, and fall into a vicious circle. It is therefore generally better to leave indexes in their natural equilibrium, or at least to avoid rebuilding them on a fixed schedule.

3. An index coalesce is usually preferable to an index rebuild. Coalescing has the following advantages:

- it does not need nearly twice the index's disk space
- it can be done online
- it coalesces the index leaf blocks as quickly as possible without rebuilding the index structure, avoiding the overhead explained in point 2 above.

Note: in some cases a rebuild is still required, for example to move an index to a different tablespace.

In summary, regularly scheduled index rebuilds are strongly discouraged; use a proper diagnostic tool instead. Below is an Oracle-recommended script for analyzing index structure, together with usage notes:

 

1). Create a user that will own the index statistics tables.

2). Grant the "dba" role to this user, and grant select on dba_tablespaces to <user>.

3). Run the code in the script section below.

If the script is run as a user other than SYS, ORA-942 errors may occur while creating the package body.

Roles are dropped when a procedure is invoked with definer's rights, so creating the package body will fail unless the following SELECT privileges are granted explicitly:

grant select on dba_tablespaces to <user>;
grant select on dba_indexes to <user>;
grant select on dba_tables to <user>;
grant select on dba_ind_columns to <user>;
grant select on dba_tab_cols to <user>;
grant select on dba_objects to <user>;
grant select on v_$parameter to <user>;

Also, gather optimizer statistics on the indexes before running the analysis script, for example:

 

SQL> exec dbms_stats.gather_schema_stats('SCOTT');

SQL> exec index_util.inspect_schema ('SCOTT');

 

Sample script:

 

CREATE TABLE index_log (
 owner          VARCHAR2(30),
 index_name     VARCHAR2(30),
 last_inspected DATE,
 leaf_blocks    NUMBER,    
 target_size    NUMBER,
 idx_layout     CLOB);

ALTER TABLE index_log ADD CONSTRAINT pk_index_log PRIMARY KEY (owner,index_name);

CREATE TABLE index_hist (
 owner          VARCHAR2(30),
 index_name     VARCHAR2(30),
 inspected_date DATE,
 leaf_blocks    NUMBER,    
 target_size    NUMBER,
 idx_layout     VARCHAR2(4000));

ALTER TABLE index_hist ADD CONSTRAINT pk_index_hist PRIMARY KEY  (owner,index_name,inspected_date);

--
-- Variables:
--  vMinBlks: Specifies the minimum number of leaf blocks for scanning the index
--            Indexes below this number will not be scanned/reported on
--  vScaleFactor: The scaling factor, defines the threshold of the estimated leaf block count
--                to be smaller than the supplied fraction of the current size.
--  vTargetUse : Supplied percentage utilisation. For example 90% equates to the default pctfree 10
--  vHistRet : Defines the number of records to keep in the INDEX_HIST table for each index entry
--

CREATE OR REPLACE PACKAGE index_util AUTHID CURRENT_USER IS
vMinBlks     CONSTANT POSITIVE := 1000;
vScaleFactor CONSTANT NUMBER := 0.6;
vTargetUse   CONSTANT POSITIVE := 90;  -- equates to pctfree 10  
vHistRet     CONSTANT POSITIVE := 10;  -- (#) records to keep in index_hist
 procedure inspect_schema (aSchemaName IN VARCHAR2);
 procedure inspect_index (aIndexOwner IN VARCHAR2, aIndexName IN VARCHAR2, aTableOwner IN VARCHAR2, aTableName IN VARCHAR2, aLeafBlocks IN NUMBER);
END index_util;
/

CREATE OR REPLACE PACKAGE BODY index_util IS
procedure inspect_schema (aSchemaName IN VARCHAR2) IS
 begin
 FOR r IN (select table_owner, table_name, owner index_owner, index_name, leaf_blocks
           from dba_indexes  
           where owner = upper(aSchemaname)
             and index_type in ('NORMAL','NORMAL/REV','FUNCTION-BASED NORMAL')
             and partitioned = 'NO'  
             and temporary = 'N'  
             and dropped = 'NO'  
             and status = 'VALID'  
             and last_analyzed is not null  
           order by owner, table_name, index_name) LOOP

   IF r.leaf_blocks > vMinBlks THEN
   inspect_index (r.index_owner, r.index_name, r.table_owner, r.table_name, r.leaf_blocks);
   END IF;
  END LOOP;
 commit;
end inspect_schema;
procedure inspect_index (aIndexOwner IN VARCHAR2, aIndexName IN VARCHAR2, aTableOwner IN VARCHAR2, aTableName IN VARCHAR2, aLeafBlocks IN NUMBER) IS
 vLeafEstimate number;  
 vBlockSize    number;
 vOverhead     number := 192; -- leaf block "lost" space in index_stats
 vIdxObjID     number;
 vSqlStr       VARCHAR2(4000);
 vIndxLyt      CLOB;
 vCnt          number := 0;
  TYPE IdxRec IS RECORD (rows_per_block number, cnt_blocks number);
  TYPE IdxTab IS TABLE OF IdxRec;
  l_data IdxTab;
begin  
  select a.block_size into vBlockSize
  from dba_tablespaces a, dba_indexes b
  where b.index_name = aIndexName
    and b.owner = aIndexOwner
    and a.tablespace_name = b.tablespace_name;
 select round (100 / vTargetUse *       -- assumed packing efficiency
              (ind.num_rows * (tab.rowid_length + ind.uniq_ind + 4) + sum((tc.avg_col_len) * (tab.num_rows) )  -- column data bytes  
              ) / (vBlockSize - vOverhead)  
              ) index_leaf_estimate  
   into vLeafEstimate  
 from (select  /*+ no_merge */ table_name, num_rows, decode(partitioned,'YES',10,6) rowid_length  
       from dba_tables
       where table_name  = aTableName  
         and owner       = aTableOwner) tab,  
      (select  /*+ no_merge */ index_name, index_type, num_rows, decode(uniqueness,'UNIQUE',0,1) uniq_ind  
       from dba_indexes  
       where table_owner = aTableOwner  
         and table_name  = aTableName  
         and owner       = aIndexOwner  
         and index_name  = aIndexName) ind,  
      (select  /*+ no_merge */ column_name  
       from dba_ind_columns  
       where table_owner = aTableOwner  
         and table_name  = aTableName
         and index_owner = aIndexOwner   
         and index_name  = aIndexName) ic,  
      (select  /*+ no_merge */ column_name, avg_col_len  
       from dba_tab_cols  
       where owner = aTableOwner  
         and table_name  = aTableName) tc  
 where tc.column_name = ic.column_name  
 group by ind.num_rows, ind.uniq_ind, tab.rowid_length;

 IF vLeafEstimate < vScaleFactor * aLeafBlocks THEN
  select object_id into vIdxObjID
  from dba_objects  
  where owner = aIndexOwner
    and object_name = aIndexName;
   vSqlStr := 'SELECT rows_per_block, count(*) blocks FROM (SELECT /*+ cursor_sharing_exact ' ||
             'dynamic_sampling(0) no_monitoring no_expand index_ffs(' || aTableName ||
             ',' || aIndexName || ') noparallel_index(' || aTableName ||
             ',' || aIndexName || ') */ sys_op_lbid(' || vIdxObjID ||
             ', ''L'', ' || aTableName || '.rowid) block_id, ' ||
             'COUNT(*) rows_per_block FROM ' || aTableOwner || '.' || aTableName || ' GROUP BY sys_op_lbid(' ||
             vIdxObjID || ', ''L'', ' || aTableName || '.rowid)) group by rows_per_block order by rows_per_block';
   execute immediate vSqlStr BULK COLLECT INTO l_data;
  vIndxLyt := '';

   FOR i IN l_data.FIRST..l_data.LAST LOOP
    vIndxLyt := vIndxLyt || l_data(i).rows_per_block || ' - ' || l_data(i).cnt_blocks || chr(10);
   END LOOP;

   select count(*) into vCnt from index_log where owner = aIndexOwner and index_name = aIndexName;

   IF vCnt = 0   
    THEN insert into index_log values (aIndexOwner, aIndexName, sysdate, aLeafBlocks, round(vLeafEstimate,2), vIndxLyt);
    ELSE vCnt := 0;

         select count(*) into vCnt from index_hist where owner = aIndexOwner and index_name = aIndexName;

         IF vCnt >= vHistRet THEN
           delete from index_hist
           where owner = aIndexOwner
             and index_name = aIndexName
             and inspected_date = (select MIN(inspected_date)
                                   from index_hist
                                   where owner = aIndexOwner
                                     and index_name = aIndexName);
         END IF;

          insert into index_hist select * from index_log where owner = aIndexOwner and index_name = aIndexName;

         update index_log  
         set last_inspected = sysdate,
             leaf_blocks = aLeafBlocks,
             target_size = round(vLeafEstimate,2),
             idx_layout = vIndxLyt
        where owner = aIndexOwner and index_name = aIndexName;

   END IF;
  END IF;
 END inspect_index;
END index_util;
/
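The sizing arithmetic inside inspect_index can be sanity-checked outside the database. The following Python sketch is not part of the original script; it simply mirrors the index_leaf_estimate expression, and all input figures are invented for illustration:

```python
# Mirrors the index_leaf_estimate expression from index_util.inspect_index:
#   round(100/vTargetUse * (num_rows*(rowid_length + uniq_ind + 4)
#         + sum(avg_col_len)*num_rows) / (block_size - vOverhead))

def leaf_estimate(num_rows, sum_avg_col_len, rowid_length=6, uniq_ind=1,
                  block_size=8192, overhead=192, target_use=90):
    """Estimated leaf blocks at the target packing efficiency (90% ~ pctfree 10)."""
    data_bytes = num_rows * (rowid_length + uniq_ind + 4) + sum_avg_col_len * num_rows
    return round(100 / target_use * data_bytes / (block_size - overhead))

# Invented example: 1,000,000-row non-unique index, 5 bytes of key data per row.
print(leaf_estimate(1_000_000, 5))  # -> 2222
```

The package then flags an index only when this estimate falls below vScaleFactor (0.6) times the index's current leaf_blocks, i.e. when the live index occupies well over its estimated compacted size.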

Sample output from running the script:

 

SQL> select owner, index_name, last_inspected, leaf_blocks, target_size
  from index_log;

OWNER INDEX_NAME LAST_INSP LEAF_BLOCKS TARGET_SIZE
------------------------------ ------------------------------ --------- ----------- -----------
SYS I_ARGUMENT1 17-JUN-10 432 303
SYS I_ARGUMENT2 17-JUN-10 282 186
SYS I_COL1 17-JUN-10 288 182
SYS I_DEPENDENCY1 17-JUN-10 109 103
SYS I_DEPENDENCY2 17-JUN-10 136 95
SYS I_H_OBJ#_COL# 17-JUN-10 258 104
SYS WRH$_SQL_PLAN_PK 17-JUN-10 118 59
SYS WRI$_ADV_PARAMETERS_PK 17-JUN-10 210 121
SYS I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST 17-JUN-10 2268 1313
SYS I_WRI$_OPTSTAT_H_ST 17-JUN-10 1025 963
SYS I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST 17-JUN-10 338 191

 

SQL> select idx_layout from index_log where owner='SCOTT' and index_name='T_IDX';

IDX_LAYOUT
------------
104 - 1
204 - 1
213 - 1
219 - 1
221 - 2
222 - 1
223 - 2
224 - 1
225 - 1
230 - 1
231 - 3
235 - 3
236 - 1
238 - 3
239 - 2
241 - 1
242 - 2
243 - 1
245 - 3
247 - 1
249 - 1
250 - 1
252 - 3
255 - 1
257 - 2
263 - 2
264 - 1
267 - 1
268 - 1
276 - 1
283 - 1
296 - 1
345 - 1
SQL> select to_char(inspected_date,'DD-MON-YYYY HH24:MI:SS') inspected_date,
  leaf_blocks, target_size
  from index_hist
  where index_name='T_IDX';

INSPECTED_DATE LEAF_BLOCKS TARGET_SIZE
-------------------- ----------- -----------
10-MAR-2010 10:04:04 432 303
10-APR-2010 10:04:03 435 430
10-MAY-2010 10:04:02 431 301

 

 

 

 

 

 

The necessity and impact of index rebuilds, with scripts for analyzing index structure

1.CurrentActivity.sql

 

set lines 150
set pages 999
clear col

set termout off
set trimout on
set trimspool on

REM
REM Current transactions
REM
REM Will show only last transaction by a user
REM
REM May need to use 786472.1 for better picture
REM of activity

connect / as sysdba
alter session set nls_date_format='dd-Mon-yyyy hh24:mi';

col username format a10 wrapped heading "User"
col name format a22 wrapped heading "Undo Segment Name"
col xidusn heading "Undo|Seg #"
col xidslot heading "Undo|Slot #"
col xidsqn heading "Undo|Seq #"
col ubafil heading "File #"
col ubablk heading "Block #"
col start_time format a10 word_wrapped heading "Started"
col status format a8 heading "Status"
col blk format 999,999,999 heading "KBytes"
col used_urec heading "Rows"

spool undoactivity.out

prompt
prompt ############## RUNTIME ##############
prompt

col rdate head "Run Time"

select sysdate rdate from dual;

prompt
prompt ############## Current Uncommitted Transactions ##############
prompt

select start_time, username, r.name,
ubafil, ubablk, t.status, (used_ublk*p.value)/1024 blk, used_urec
from v$transaction t, v$rollname r, v$session s, v$parameter p
where xidusn=usn
and s.saddr=t.ses_addr
and p.name='db_block_size'
order by 1;
spool off

set termout on
set trimout off
set trimspool off
clear col

 

2.LobData.sql

 

REM List table and column information for LOBs for a specific user
REM
REM UNDO handling with LOBs is not designed for frequent updates
REM Frequent updates are best handled with PCTVERSION at 100
REM This means you must have a lot of space available in the LOB
REM tablespace as all UNDO will be maintained over time.
REM
REM Using RETENTION does not work as expected
REM It is set to UNDO_RETENTION at the time of the creation of the
REM object. It does not change over time as UNDO_RETENTION
REM or auto-tuned undo retention changes.

set pages 999
set lines 110

spool lobdata.out

col column_name format a25 head "Column"
col table_name format a25 head "Table"
col tablespace_name format a25 head "Tablespace"
col pctversion format 999 head "PCTVersion %"
col segment_space_management format a30 head "Space|Mngmnt"
col retention format 999,999,999 head "Retention"

select l.table_name, l.column_name, l.tablespace_name, l.pctversion, l.retention,
t.segment_space_management
from dba_lobs l, dba_tablespaces t
where owner=upper('&USER')
and l.tablespace_name = t.tablespace_name
/

spool off

 

3. undoconfig.sql

spool UndoConfig.out

ttitle off
set pages 999
set lines 150
set verify off

set termout off
set trimout on
set trimspool on

REM
REM ------------------------------------------------------------------------

REM
REM ------------------------------------------------------------------------
REM

set space 2

REM REPORTING TABLESPACE INFORMATION:
REM
REM This looks at Tablespace Sizing – Total bytes and free bytes
REM

column tablespace_name format a30 heading 'TS Name'
column sbytes format 9,999,999,999 heading 'Total MBytes'
column fbytes format 9,999,999,999 heading 'Free MBytes'
column file_name format a30 heading 'File Name'
column kount format 999 heading 'Ext'

compute sum of fbytes on tablespace_name
compute sum of sbytes on tablespace_name
compute sum of sbytes on report
compute sum of fbytes on report

break on tablespace_name skip 2

select a.tablespace_name, a.file_name, round(a.bytes/1024/1024,0) sbytes,
round(sum(b.bytes/1024/1024),0) fbytes, count(*) kount, autoextensible
from dba_data_files a, dba_free_space b
where a.file_id = b.file_id
and a.tablespace_name in (select z.tablespace_name from dba_tablespaces z where retention like '%GUARANTEE')
group by a.tablespace_name, a.file_name, a.bytes, autoextensible
order by a.tablespace_name
/

set linesize 160

REM
REM If you can significantly reduce physical reads by adding incremental
REM data buffers…do it. To determine whether adding data buffers will
REM help, set db_block_lru_statistics = TRUE and
REM db_block_lru_extended_statistics = TRUE in the init.ora parameters.
REM You can determine how many extra hits you would get from memory as
REM opposed to physical I/O from disk. **NOTE: Turning these on will
REM impact performance. One shift of statistics gathering should be enough
REM to get the required information.
REM

REM
REM -----------------------------------------------------------------
REM

set lines 160

col tablespace_name format a30 heading "Tablespace"
col tb format a15 heading "TB Status"
col df format a10 heading "DF Status"
col extent_management format a15 heading "Extent|Management"
col allocation_type format a8 heading "Type"
col segment_space_management format a7 heading "Auto|Segment"
col retention format a11 heading "Retention|Level"
col autoextensible format a5 heading "Auto?"
col mx format 999,999,999 heading "Max Allowed"

select t.tablespace_name, t.status tb, d.status df,
extent_management, allocation_type, segment_space_management, retention,
autoextensible, (maxbytes/1024/1024) mx
from dba_tablespaces t, dba_data_files d
where t.tablespace_name = d.tablespace_name
and retention like '%GUARANTEE'
/

col status format a20 head "Status"
col cnt format 999,999,999 head "How Many?"

select status, count(*) cnt
from dba_rollback_segs
group by status
/

spool off

set termout on
set trimout off
set trimspool off

4.undodatafiles.sql

set lines 120
set pages 999
clear col

set termout off
set trimout on
set trimspool on

connect / as sysdba
alter session set nls_date_format='dd-Mon-yyyy hh24:mi';

spool undodatafiles.out

prompt
prompt ############## RUNTIME ##############
prompt

col rdate head "Run Time"

select sysdate rdate from dual;

prompt
prompt ############## DATAFILES ##############
prompt

col retention head "Retention"
col tablespace_name format a30 head "TBSP Name"
col file_id format 999 head "File #"
col a format 999,999,999,999,999 head "Bytes Alloc (MB)"
col b format 999,999,999,999,999 head "Max Bytes Used (MB)"
col autoextensible head "Auto|Ext"
col extent_management head "Ext Mngmnt"
col allocation_type head "Type"
col segment_space_management head "SSM"

select tablespace_name, file_id, sum(bytes)/1024/1024 a,
sum(maxbytes)/1024/1024 b,
autoextensible
from dba_data_files
where tablespace_name in (select tablespace_name from dba_tablespaces
where retention like '%GUARANTEE' )
group by file_id, tablespace_name, autoextensible
order by tablespace_name
/

spool off
set termout on
set trimout off
set trimspool off
clear col

5.undoextends.sql

spool UndoExts.out

ttitle off
set pages 999
set lines 150
set verify off

set termout off
set trimout on
set trimspool on

REM
REM ------------------------------------------------------------------------

REM
REM -----------------------------------------------------------------
REM

REM
REM REPORTING UNDO EXTENTS INFORMATION:
REM
REM -----------------------------------------------------------------
REM
REM Undo Extents breakdown information
REM

ttitle center "Rollback Segments Breakdown" skip 2

col status format a20
col cnt format 999,999,999 head "How Many?"

select status, count(*) cnt from dba_rollback_segs
group by status
/

ttitle center "Undo Extents" skip 2

col segment_name format a30 heading "Name"
col "ACT BYTES" format 999,999,999,999 head "Active|Extents"
col "UNEXP BYTES" format 999,999,999,999 head "Unexpired|Extents"
col "EXP BYTES" format 999,999,999,999 head "Expired|Extents"

select segment_name,
nvl(sum(act),0) "ACT BYTES",
nvl(sum(unexp),0) "UNEXP BYTES",
nvl(sum(exp),0) "EXP BYTES"
from (
select segment_name,
nvl(sum(bytes),0) act, 00 unexp, 00 exp
from DBA_UNDO_EXTENTS
where status='ACTIVE' group by segment_name
union
select segment_name,
00 act, nvl(sum(bytes),0) unexp, 00 exp
from DBA_UNDO_EXTENTS
where status='UNEXPIRED' group by segment_name
union
select segment_name,
00 act, 00 unexp, nvl(sum(bytes),0) exp
from DBA_UNDO_EXTENTS
where status='EXPIRED' group by segment_name
) group by segment_name;

ttitle center "Undo Extents Statistics" skip 2

col size format 999,999,999,999 heading "Size"
col "HOW MANY" format 999,999,999 heading "How Many?"
col st format a12 heading "Status"

select distinct status st, count(*) "HOW MANY", sum(bytes) "SIZE"
from dba_undo_extents
group by status
/

col segment_name format a30 heading "Name"
col TABLESPACE_NAME for a20
col BYTES for 999,999,999,999
col BLOCKS for 999,999,999
col status for a15 heading "Status"
col segment_name heading "Segment"
col extent_id heading "ID"

select SEGMENT_NAME, TABLESPACE_NAME, EXTENT_ID,
FILE_ID, BLOCK_ID, BYTES, BLOCKS, STATUS
from dba_undo_extents
order by 1,3,4,5
/

REM
REM -----------------------------------------------------------------
REM
REM Undo Extents Contention breakdown
REM Take out column TUNED_UNDORETENTION if customer
REM prior to 10.2.x
REM
REM The time frame can be adjusted with this query
REM By default using around 4 hour window of time
REM
REM Ex.
REM Using sysdate-.04 looking at the last hour
REM Using sysdate-.16 looking at the last 4 hours
REM Using sysdate-.32 looking at the last 8 hours
REM Using sysdate-1 looking at the last 24 hours
REM

set linesize 140

ttitle center "Undo Extents Error Conditions (Default - Last 4 Hours)" skip 2

col UNXPSTEALCNT format 999,999,999 heading "# Unexpired|Stolen"
col EXPSTEALCNT format 999,999,999 heading "# Expired|Reused"
col SSOLDERRCNT format 999,999,999 heading "ORA-1555|Error"
col NOSPACEERRCNT format 999,999,999 heading "Out-Of-space|Error"
col MAXQUERYLEN format 999,999,999 heading "Max Query|Length"
col TUNED_UNDORETENTION format 999,999,999 heading "Auto-Adjusted|Undo Retention"
col hours format 999,999 heading "Tuned|(HRs)"

select inst_id, to_char(begin_time,'MM/DD/YYYY HH24:MI') begin_time,
UNXPSTEALCNT, EXPSTEALCNT , SSOLDERRCNT, NOSPACEERRCNT, MAXQUERYLEN,
TUNED_UNDORETENTION, TUNED_UNDORETENTION/60/60 hours
from gv$undostat
where begin_time between (sysdate-.16)
and sysdate
order by inst_id, begin_time
/

spool off
set termout on
set trimout off
set trimspool off
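The sysdate offsets in the comments above are day fractions, so the window arithmetic is easy to verify. A small Python sketch (not part of the script; fractions taken from the comments):

```python
# sysdate arithmetic is in days, so "begin_time > sysdate - x" spans x*24 hours.
def window_hours(day_fraction):
    return day_fraction * 24

for f in (0.04, 0.16, 0.32, 1):
    print(f"sysdate-{f}: ~{window_hours(f):.1f} hours")
```

So sysdate-.16 is really about 3.8 hours, i.e. roughly the 4-hour window the comment describes, and sysdate-.04 is just under one hour.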

6.undoheathcheck.sql

spool Undohealth.out

ttitle off
set pages 999
set lines 150
set verify off
set termout off
set trimout on
set trimspool on

REM
REM ------------------------------------------------------------------------

col name format a30
col gets format 9,999,999
col waits format 9,999,999

PROMPT ROLLBACK HIT STATISTICS:
REM

REM GETS - # of gets on the rollback segment header
REM WAITS - # of waits for the rollback segment header

set head on;

select name, waits, gets
from v$rollstat, v$rollname
where v$rollstat.usn = v$rollname.usn
/

col pct head "< 2% ideal"

select 'The average of waits/gets is '|| round((sum(waits) / sum(gets)) * 100,2)||'%' PCT
From v$rollstat
/

PROMPT REDO CONTENTION STATISTICS:
REM
REM If the ratio of waits to gets is more than 1% or 2%, consider
REM creating more rollback segments
REM
REM Another way to gauge rollback contention is:
REM

column xn1 format 9999999
column xv1 new_value xxv1 noprint

select class, count
from v$waitstat
where class in ('system undo header', 'system undo block',
'undo header', 'undo block' )
/

set head off

select 'Total requests = '||sum(count) xn1, sum(count) xv1
from v$waitstat
/

select 'Contention for system undo header = '||
(round(count/(&xxv1+0.00000000001),4)) * 100||'%'
from v$waitstat
where class = 'system undo header'
/

select 'Contention for system undo block = '||
(round(count/(&xxv1+0.00000000001),4)) * 100||'%'
from v$waitstat
where class = 'system undo block'
/

select 'Contention for undo header = '||
(round(count/(&xxv1+0.00000000001),4)) * 100||'%'
from v$waitstat
where class = 'undo header'
/

select 'Contention for undo block = '||
(round(count/(&xxv1+0.00000000001),4)) * 100||'%'
from v$waitstat
where class = 'undo block'
/

REM
REM NOTE: Not as useful with AUM configured
REM
REM If the percentage for an area is more than 1% or 2%, consider
REM creating more rollback segments. Note: This value is usually very
REM small and has been rounded to 4 places.
REM
REM ------------------------------------------------------------------------
REM
REM The following shows how often user processes had to wait for space in
REM the redo log buffer:

select name||' = '||value
from v$sysstat
where name = 'redo log space requests'
/

REM
REM This value should be near 0. If this value increments consistently,
REM processes have had to wait for space in the redo buffer. If this
REM condition exists over time, increase the size of LOG_BUFFER in the
REM init.ora file in increments of 5% until the value nears 0.
REM ** NOTE: increasing the LOG_BUFFER value will increase total SGA size.
REM
REM -----------------------------------------------------------------------

col name format a15
col gets format 9999999
col misses format 9999999
col immediate_gets heading 'IMMED GETS' format 9999999
col immediate_misses heading 'IMMED MISS' format 9999999
col sleeps format 999999

PROMPT LATCH CONTENTION:
REM
REM GETS - # of successful willing-to-wait requests for a latch
REM MISSES - # of times an initial willing-to-wait request was unsuccessful
REM IMMEDIATE_GETS - # of successful immediate requests for each latch
REM IMMEDIATE_MISSES - # of unsuccessful immediate requests for each latch
REM SLEEPS - # of times a process waited and requests a latch after an
REM initial willing-to-wait request
REM
REM If the latch requested with a willing-to-wait request is not
REM available, the requesting process waits a short time and requests
REM again.
REM If the latch requested with an immediate request is not available,
REM the requesting process does not wait, but continues processing
REM

set head on

select name, gets, misses, immediate_gets, immediate_misses, sleeps
from v$latch
where name in ('redo allocation', 'redo copy')
/

set head off

select 'Ratio of MISSES to GETS: '||
round((sum(misses)/(sum(gets)+0.00000000001) * 100),2)||'%'
from v$latch
where name in ('redo allocation', 'redo copy')
/

select 'Ratio of IMMEDIATE_MISSES to IMMEDIATE_GETS: '||
round((sum(immediate_misses)/
(sum(immediate_misses+immediate_gets)+0.00000000001) * 100),2)||'%'
from v$latch
where name in ('redo allocation', 'redo copy')
/

set head on

REM
REM If either ratio exceeds 1%, performance will be affected.
REM
REM Decreasing the size of LOG_SMALL_ENTRY_MAX_SIZE reduces the number of
REM processes copying information on the redo allocation latch.
REM
REM Increasing the size of LOG_SIMULTANEOUS_COPIES will reduce contention
REM for redo copy latches.
REM
REM -----------------------------------------------------------------
REM This looks at overall i/o activity against individual
REM files within a tablespace
REM
REM Look for a mismatch across disk drives in terms of I/O
REM
REM Also, examine the Blocks per Read Ratio for heavily accessed
REM TSs - if this value is significantly above 1 then you may have
REM full tablescans occurring (with multi-block I/O)
REM
REM If activity on the files is unbalanced, move files around to balance
REM the load. Should see an approximately even set of numbers across files
REM

set space 1

PROMPT REPORTING I/O STATISTICS:

column pbr format 99999999 heading 'Physical|Blk Read'
column pbw format 999999 heading 'Physical|Blks Wrtn'
column pyr format 999999 heading 'Physical|Reads'
column readtim format 99999999 heading 'Read|Time'
column name format a55 heading 'DataFile Name'
column writetim format 99999999 heading 'Write|Time'

compute sum of f.phyblkrd, f.phyblkwrt on report

select fs.name name, f.phyblkrd pbr, f.phyblkwrt pbw, f.readtim, f.writetim
from v$filestat f, v$datafile fs
where f.file# = fs.file#
order by fs.name
/

REM
REM -----------------------------------------------------------------

PROMPT GENERATING WAIT STATISTICS:
REM
REM This will show wait stats for certain kernel instances. This
REM may show the need for additional rbs, wait lists, db_buffers
REM

column class heading 'Class Type'
column count heading 'Times Waited' format 99,999,999
column time heading 'Total Times' format 99,999,999

select class, count, time
from v$waitstat
where count > 0
order by class
/

REM
REM Look at the wait statistics generated above (if any). They will
REM tell you where there is contention in the system. There will
REM usually be some contention in any system – but if the ratio of
REM waits for a particular operation starts to rise, you may need to
REM add additional resource, such as more database buffers, log buffers,
REM or rollback segments
REM
REM —————————————————————–

PROMPT ROLLBACK EXTENT STATISTICS:
REM

column usn format 999 heading 'Undo #'
column extents format 999 heading 'Extents'
column rssize format 999,999,999 heading 'Size in|Bytes'
column optsize format 999,999,999 heading 'Optimal|Size'
column hwmsize format 99,999,999 heading 'High Water|Mark'
column shrinks format 9,999 heading 'Num of|Shrinks'
column wraps format 9,999 heading 'Num of|Wraps'
column extends format 999,999 heading 'Num of|Extends'
column aveactive format 999,999,999 heading 'Average size|Active Extents'
column rownum noprint

select usn, extents, rssize, optsize, hwmsize,
shrinks, wraps, extends, aveactive
from v$rollstat
order by rownum
/

spool off
set termout on
set trimout off
set trimspool off
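The ROLLBACK HIT check in this health-check script ("< 2% ideal") is plain percentage arithmetic over v$rollstat. A minimal Python sketch, with invented sample figures (not real v$rollstat output):

```python
# waits/gets across all rollback/undo segments, as the health-check query computes it.
def waits_to_gets_pct(waits, gets):
    return round(sum(waits) / sum(gets) * 100, 2)

waits = [1, 0, 2, 1]            # per-segment header waits (invented)
gets = [1000, 800, 1200, 900]   # per-segment header gets (invented)

print(waits_to_gets_pct(waits, gets))  # -> 0.1, comfortably under the 2% guideline
```

If this ratio creeps above 1-2%, the script's comments suggest header contention and more rollback segments, though with automatic undo management that tuning is largely handled by the database.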

 

7.undohistoryinfo.sql

 

set lines 120
set pages 999
clear col

set termout off
set trimout on
set trimspool on

connect / as sysdba
alter session set nls_date_format='dd-hh24:mi';

spool undohistoryinfo.out

prompt
prompt ############## RUNTIME ##############
prompt

col rdate head "Run Time"

select sysdate rdate from dual;

prompt
prompt ############## HISTORICAL DATA ##############
prompt

col x format 999,999 head "Max Concurrent|Last 7 Days"
col y format 999,999 head "Max Concurrent|Since Startup"

select max(maxconcurrency) x from v$undostat
/
select max(maxconcurrency) y from sys.wrh$_undostat
/

col i format 999,999 head "1555 Errors"
col j format 999,999 head "Undo Space Errors"

select sum(ssolderrcnt) i from v$undostat
where end_time > sysdate-2
/

select sum(nospaceerrcnt) j from v$undostat
where end_time > sysdate-2
/
clear break
clear compute

prompt
prompt ############## CURRENT STATUS OF SEGMENTS ##############
prompt ############## SNAPSHOT IN TIME INFO ##############
prompt ##############(SHOWS CURRENT UNDO ACTIVITY)##############
prompt

col segment_name format a30 head "Segment Name"
col "ACT BYTES" format 999,999,999,999 head "Active Bytes"
col "UNEXP BYTES" format 999,999,999,999 head "Unexpired Bytes"
col "EXP BYTES" format 999,999,999,999 head "Expired Bytes"

select segment_name, nvl(sum(act),0) "ACT BYTES",
nvl(sum(unexp),0) "UNEXP BYTES",
nvl(sum(exp),0) "EXP BYTES"
from (select segment_name, nvl(sum(bytes),0) act, 00 unexp, 00 exp
from dba_undo_extents where status='ACTIVE' group by segment_name
union
select segment_name, 00 act, nvl(sum(bytes),0) unexp, 00 exp
from dba_undo_extents where status='UNEXPIRED' group by segment_name
union
select segment_name, 00 act, 00 unexp, nvl(sum(bytes),0) exp
from dba_undo_extents where status='EXPIRED' group by segment_name)
group by segment_name
order by 1
/

prompt
prompt ############## UNDO SPACE USAGE ##############
prompt

col usn format 999,999 head "Segment#"
col shrinks format 999,999,999 head "Shrinks"
col aveshrink format 999,999,999 head "Avg Shrink Size"

select usn, shrinks, aveshrink from v$rollstat
/
spool off
set termout on
set trimout off
set trimspool off
clear col

8.undoparameters.sql

 

set lines 120
set pages 999
clear col

set termout off
set trimout on
set trimspool on

connect / as sysdba
alter session set nls_date_format='dd-Mon-yyyy hh24:mi';

spool undoparameters.out

prompt
prompt ############## RUNTIME ##############
prompt

col rdate head "Run Time"

select sysdate rdate from dual;

col inst_id format 999 head "Instance #"
col Parameter format a35 wrap
col "Session Value" format a25 wrapped
col "Instance Value" format a25 wrapped

prompt
prompt ############## PARAMETERS ##############
prompt

select a.inst_id, a.ksppinm "Parameter",
b.ksppstvl "Session Value",
c.ksppstvl "Instance Value"
from x$ksppi a, x$ksppcv b, x$ksppsv c
where a.indx = b.indx and a.indx = c.indx
and a.inst_id=b.inst_id and b.inst_id=c.inst_id
and a.ksppinm in ('_undo_autotune', '_smu_debug_mode',
'_highthreshold_undoretention',
'undo_tablespace','undo_retention','undo_management')
order by 2;

spool off
set termout on
set trimout off
set trimspool off
clear col

 

9. undopressure.sql

 

set lines 120
set pages 999
clear col

set termout off
set trimout on
set trimspool on

connect / as sysdba
alter session set nls_date_format='dd-hh24:mi';

spool undopressure.out

prompt
prompt ############## RUNTIME ##############
prompt

col rdate head "Run Time"

select sysdate rdate from dual;

prompt
prompt ############## WAITS FOR UNDO (Since Startup) ##############
prompt

col inst_id head "Instance#"
col eq_type format a3 head "Enq"
col total_req# format 999,999,999,999,999,999 head "Total Requests"
col total_wait# format 999,999 head "Total Waits"
col succ_req# format 999,999,999,999,999,999 head "Successes"
col failed_req# format 999,999,999,999 head "Failures"
col cum_wait_time format 999,999,999 head "Cumulative|Time"

select * from v$enqueue_stat where eq_type='US'
union
select * from v$enqueue_stat where eq_type='HW'
/

prompt
prompt ############## LOCKS FOR UNDO ##############
prompt

col addr head "ADDR"
col KADDR head "KADDR"
col sid head "Session"
col osuser format a10 head "OS User"
col machine format a15 head "Machine"
col program format a17 head "Program"
col process format a7 head "Process"
col lmode head "Lmode"
col request head "Request"
col ctime format 9,999 head "Time|(Mins)"
col block head "Blocking?"

select /*+ RULE */ a.SID, b.process,
b.OSUSER, b.MACHINE, b.PROGRAM,
addr, kaddr, lmode, request, round(ctime/60/60,0) ctime, block
from
v$lock a,
v$session b
where
a.sid=b.sid
and a.type='US'
/

prompt
prompt ############## TUNED RETENTION HISTORY (Last 2 Days) ##############
prompt ############## LOWEST AND HIGHEST DATA ##############
prompt

col low format 999,999,999,999 head "Undo Retention|Lowest Tuned Value"
col high format 999,999,999,999 head "Undo Retention|Highest Tuned Value"

select end_time, tuned_undoretention from v$undostat where tuned_undoretention = (
select min(tuned_undoretention) low
from v$undostat
where end_time > sysdate-2)
/

select end_time, tuned_undoretention from v$undostat where tuned_undoretention = (
select max(tuned_undoretention) high
from v$undostat
where end_time > sysdate-2)
/

prompt
prompt ############## CURRENT TRANSACTIONS ##############
prompt

col sql_text format a40 word_wrapped head "SQL Code"

select a.start_date, a.start_scn, a.status, c.sql_text
from v$transaction a, v$session b, v$sqlarea c
where b.saddr=a.ses_addr and c.address=b.sql_address
and b.sql_hash_value=c.hash_value
/

select current_scn from v$database
/

col a format 999,999 head "UnexStolen"
col b format 999,999 head "ExStolen"
col c format 999,999 head "UnexReuse"
col d format 999,999 head "ExReuse"

prompt
prompt ############## WHO'S STEALING WHAT? (Last 2 Days) ##############
prompt

select unxpstealcnt a, expstealcnt b,
unxpblkreucnt c, expblkreucnt d
from v$undostat
where (unxpstealcnt > 0 or expstealcnt > 0)
and end_time > sysdate-2
/

spool off
set termout on
set trimout off
set trimspool off
clear col

 

10. undostatistics.sql

 

set lines 120
set pages 999
clear col

set termout off
set trimout on
set trimspool on

connect / as sysdba
alter session set nls_date_format='dd-hh24:mi';

spool undostatistics.out

prompt
prompt ############## RUNTIME ##############
prompt

col rdate head "Run Time"

select sysdate rdate from dual;

col current_scn head "SCN Now"
col start_date head "Trans Started"
col start_scn head "SCN for Trans"
col ses_addr head "ADDR"

prompt
prompt ############## Historical V$UNDOSTAT (Last 2 Days) ##############
prompt

col end_time format a18 head "Date/Time"
col maxq format 999,999 head "Query|Maximum|Minutes"
col maxquerysqlid head "SqlID"
col undotsn format 999,999 head "TBS"
col undoblks format 999,999,999 head "Undo|Blocks"
col txncount format 999,999,999 head "# of|Trans"
col unexpiredblks format 999,999,999 head "# of Unexpired"
col expiredblks format 999,999,999 head "# of Expired"
col tuned format 999,999 head "Tuned Retention|(Minutes)"

select end_time, round(maxquerylen/60,0) maxq, maxquerysqlid,
undotsn, undoblks, txncount, unexpiredblks, expiredblks,
round(tuned_undoretention/60,0) Tuned
from dba_hist_undostat
where end_time > sysdate-2
order by 1
/

prompt
prompt ############## RECENT MISSES FOR UNDO (Last 2 Days) ##############
prompt

clear col
set lines 500
select * from v$undostat where maxquerylen > tuned_undoretention
and end_time > sysdate-2
order by 2
/

select * from sys.wrh$_undostat where maxquerylen > tuned_undoretention
and end_time > sysdate-2
order by 2
/

prompt
prompt ############## AUTO-TUNING TUNE-DOWN DATA ##############
prompt ############## ROLLBACK DATA (Since Startup) ##############
prompt

col name format a60 head "Name"
col value format 999,999,999 head "Counters"

select name, value from v$sysstat
where name like '%down retention%' or name like '%tune down%'
or name like '%undo segment%' or name like '%rollback%'
or name like '%undo record%'
/

prompt
prompt ############## Long Running Query History ##############
prompt

col end_time head "Date"
col maxquerysqlid head "SQL ID"
col runawayquerysqlid format a15 head "Runaway SQL ID"
col results format a35 word_wrapped head "Space Issues"
col status head "Status"
col newret head "Tuned Down|Retention"

select end_time, maxquerysqlid, runawayquerysqlid, status,
decode(status,1,'Slot Active',4,'Reached Best Retention',5,'Reached Best Retention',
8,'Runaway Query',9,'Runaway Query-Active',10,'Space Pressure',
11,'Space Pressure Currently',
16,'Tuned Down (to undo_retention) due to Space Pressure',
17,'Tuned Down (to undo_retention) due to Space Pressure-Active',
18,'Tuning Down due to Runaway',19,'Tuning Down due to Runaway-Active',
28,'Runaway tuned down to last tune down value',
29,'Runaway tuned down to last tune down value',
32,'Max Tuned Down - Not Auto-Tuning',
33,'Max Tuned Down - Not Auto-Tuning (Active)',
37,'Max Tuned Down - Not Auto-Tuning (Active)',
38,'Max Tuned Down - Not Auto-Tuning',
39,'Max Tuned Down - Not Auto-Tuning (Active)',
40,'Max Tuned Down - Not Auto-Tuning',
41,'Max Tuned Down - Not Auto-Tuning (Active)',
42,'Max Tuned Down - Not Auto-Tuning',
44,'Max Tuned Down - Not Auto-Tuning',
45,'Max Tuned Down - Not Auto-Tuning (Active)',
'Other ('||status||')') Results, spcprs_retention NewRet
from sys.wrh$_undostat
where status > 1
/

prompt
prompt ############## Details on Long Run Queries ##############
prompt

col sql_fulltext head "SQL Text"
col sql_id heading "SQL ID"

select sql_id, sql_fulltext, last_load_time "Last Load",
round(elapsed_time/1000000/60/60/24,0) "Elapsed Days"
from v$sql where sql_id in
(select maxquerysqlid from sys.wrh$_undostat
where status > 1)
/

spool off
set termout on
set trimout off
set trimspool off
clear col

 

11. undousage.sql

 

set lines 120
set pages 999
clear col

set termout off
set trimout on
set trimspool on

connect / as sysdba
alter session set nls_date_format='dd-Mon-yyyy hh24:mi';

spool undousage.out

prompt
prompt ############## RUNTIME ##############
prompt

col rdate head "Run Time"

select sysdate rdate from dual;

prompt
prompt ############## IN USE Undo Data ##############
prompt

select
((select (nvl(sum(bytes),0))
from dba_undo_extents
where tablespace_name in (select tablespace_name from dba_tablespaces
where retention like '%GUARANTEE' )
and status in ('ACTIVE','UNEXPIRED')) *100) /
(select sum(bytes)
from dba_data_files
where tablespace_name in (select tablespace_name from dba_tablespaces
where retention like '%GUARANTEE' )) "PCT_INUSE"
from dual;

select tablespace_name, extent_management, allocation_type,
segment_space_management, retention
from dba_tablespaces where retention like '%GUARANTEE'
/

col c format 999,999,999,999 head "Sum of Free"

select (nvl(sum(bytes),0)) c from dba_free_space
where tablespace_name in
(select tablespace_name from dba_tablespaces where retention like '%GUARANTEE')
/

col d format 999,999,999,999 head "Total Bytes"

select sum(bytes) d from dba_data_files
where tablespace_name in
(select tablespace_name from dba_tablespaces where retention like '%GUARANTEE')
/

PROMPT
PROMPT ############## UNDO SEGMENTS ##############
PROMPT

col status head "Status"
col z format 999,999 head "Total Extents"
break on report
compute sum on report of z

select status, count(*) z from dba_undo_extents
group by status
/

col z format 999,999 head "Undo Segments"

select status, count(*) z from dba_rollback_segs
group by status
/

clear break
clear compute

prompt
prompt ############## CURRENT STATUS OF SEGMENTS ##############
prompt ############## SNAPSHOT IN TIME INFO ##############
prompt ##############(SHOWS CURRENT UNDO ACTIVITY)##############
prompt

col segment_name format a30 head "Segment Name"
col "ACT BYTES" format 999,999,999,999 head "Active Bytes"
col "UNEXP BYTES" format 999,999,999,999 head "Unexpired Bytes"
col "EXP BYTES" format 999,999,999,999 head "Expired Bytes"

select segment_name, nvl(sum(act),0) "ACT BYTES",
nvl(sum(unexp),0) "UNEXP BYTES",
nvl(sum(exp),0) "EXP BYTES"
from (select segment_name, nvl(sum(bytes),0) act, 00 unexp, 00 exp
from dba_undo_extents where status='ACTIVE' group by segment_name
union
select segment_name, 00 act, nvl(sum(bytes),0) unexp, 00 exp
from dba_undo_extents where status='UNEXPIRED' group by segment_name
union
select segment_name, 00 act, 00 unexp, nvl(sum(bytes),0) exp
from dba_undo_extents where status='EXPIRED' group by segment_name)
group by segment_name
order by 1
/

prompt
prompt ############## UNDO SPACE USAGE ##############
prompt

col usn format 999,999 head "Segment#"
col shrinks format 999,999,999 head "Shrinks"
col aveshrink format 999,999,999 head "Avg Shrink Size"

select usn, shrinks, aveshrink from v$rollstat
/
spool off
set termout on
set trimout off
set trimspool off
clear col

Common scripts for analyzing undo errors in Automatic Undo Management mode

Archived Log Gaps (Archive Gap)

 

An archive gap is a range of redo logs missing on the standby side that prevents the log apply services from continuing. It typically arises when the standby cannot receive redo from the primary database, or the redo is not available on the standby database. Common causes:

  • The network connection dropped, or the log transport services were stopped
  • The standby database was unavailable
  • The log transport services were misconfigured
  • I/O problems on the standby side
  • Archived logs were manually deleted before being applied to the standby
  • Insufficient network bandwidth between the primary and the standby

Once an archive gap exists on a standby database, the log apply services stall until the gap is resolved, i.e. the missing redo is fetched again and made available on the standby. The log apply services can then pick it up and continue.

 

Ways to resolve a log gap

 

There are four ways to resolve a log gap on a standby database; they are discussed below.

Automatic gap resolution

Automatic gap resolution is performed by the log transport services themselves. The sequence of the log currently being shipped is compared with the one most recently received; if there is a mismatch, the RFS process on the standby detects it and automatically issues an ARCH-RFS heartbeat ping requesting the missing logs. This type of gap resolution uses the services defined in log_archive_dest_n on the primary database. The ARCH-RFS heartbeat ping can also detect a gap by querying the current log sequence number; if one exists, it is likewise resolved through the ARCH-RFS heartbeat ping, and once resolved, the log transport process (ARCH or LGWR) is notified. Automatic gap resolution requires no additional setup or monitoring.
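Although no setup is required, automatic gap resolution can still be observed. As a minimal sketch (run on the standby; v$dataguard_status is a standard view, but the exact message text varies by release):

```sql
-- Run on the standby: gap-related messages logged by the transport/apply services
select timestamp, message
from v$dataguard_status
where lower(message) like '%gap%'
order by timestamp;
```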

 

FAL (Fetch Archive Log) gap resolution

When an archived log is received or archived on a standby database, it is registered in the standby control file. (You can query these registrations in v$archived_log on a physical standby database or dba_logstdby_log on a logical standby database.) If such a file is lost or corrupted for some reason (for example, it was accidentally deleted), FAL is invoked to resolve the gap, because these missing log files are typically detected by the log apply services on the standby database, which are independent of the log transport services and have no direct link to the primary. To use FAL, one or two (in versions before 11.2.0) initialization parameters must be set on the standby database:

FAL_SERVER: an Oracle Net service name (TNS alias or connect descriptor) pointing to the database from which the missing archived logs can be fetched. This can be the primary database of the Data Guard environment, another standby database, an archive log repository, or a Far Sync standby (> Oracle 12.1.0) database. Multiple service names can be specified (comma-separated); FAL tries these databases in order to resolve the gap.

FAL_CLIENT (< Oracle 11.2.0): an Oracle Net service name (TNS alias or connect descriptor) that, resolved on the FAL_SERVER database, points to the standby database that is to receive the redo (i.e. the destination the FAL_SERVER database has to send the redo to). Make sure this TNS alias exists in the TNSNAMES.ORA of the FAL_SERVER database. As of Oracle 11.2.0 this parameter is no longer required, but make sure the FAL_SERVER database has a log_archive_dest_n pointing to the standby database whose gap is to be resolved.

When the log apply services detect a gap, they send a FAL request carrying the FAL_CLIENT information (the db_unique_name in versions above 11.1.0) to the FAL_SERVER. An ARCH process on the FAL_SERVER database then tries to fetch that log and send it back to the FAL_CLIENT (or to the destination matching that db_unique_name). If the first FAL_SERVER cannot resolve the gap, the next FAL_SERVER in the list is tried. If none of the FAL_SERVERs can resolve it, the FAL request fails and a corresponding error message is written to the ALERT.LOG of the standby database.

For gap resolution to succeed, the required archived logs must exist on the FAL_SERVER database (present on disk, with the corresponding information in the control file).

FAL is available for physical standby databases since Oracle 9.2.0 and for logical standby databases since Oracle 10.1.0.
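As a minimal sketch of the FAL setup described above, assuming hypothetical TNS aliases PRIM (the primary) and STBY (this standby):

```sql
-- On the standby; PRIM and STBY are hypothetical TNS aliases
alter system set fal_server='PRIM' scope=both;
-- Only needed before 11.2.0; from 11.2.0 onwards a matching
-- log_archive_dest_n on the FAL_SERVER database is used instead
alter system set fal_client='STBY' scope=both;
```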

 

Manual gap resolution

If the gap cannot be resolved by the methods above, you can try to resolve it manually.

You can determine the current archive gap by querying v$archive_gap on a physical standby database or dba_logstdby_log on a logical standby database, for example:

 

Physical standby database

SQL> select * from v$archive_gap;

Logical standby database

SQL> select thread#, sequence# from dba_logstdby_log l where next_change# not in

(select first_change# from dba_logstdby_log where l.thread#=thread#)

order by thread#, sequence#;

 

Now copy the missing redo logs with those sequence numbers to the corresponding location on the standby database. If the missing logs are not yet registered on the standby database, they must be registered before the log apply services can process them. They can be registered with the following commands:

Physical standby:

SQL> alter database register logfile '<File-Specification>';

Logical standby:

SQL> alter database register logical logfile '<File-Specification>';

 

Once they are registered, the log apply services can process them.

 

Roll forward with an incremental backup (physical standby only)

If the gap cannot be resolved by the methods above, is so large that resolving it would take too long, or the missing logs cannot be found, you can still roll a physical standby database forward using an SCN-based incremental backup. This feature is available since Oracle 10.2.0. It works by noting the last SCN applied to the standby database, then using RMAN to take an incremental backup of the primary starting from that SCN, together with a backup of the current control file.

The standby control file is then first replaced with the control file from the backup, after which the incremental backup is applied to the standby database. This is the fastest and simplest way to synchronize a standby database with the current state of the primary. Since the exact steps differ between versions, follow the procedure documented for your release.
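As a rough sketch of the procedure (an 11g-style outline; the file names are placeholders and the exact steps differ by version, so follow the note for your release):

```sql
-- 1. On the standby: find the SCN from which the incremental backup must start
select min(checkpoint_change#) from v$datafile_header;

-- 2. On the primary, in RMAN, using the SCN found above:
--    RMAN> backup incremental from scn <SCN> database format '/backup/fwd_%U';
--    RMAN> backup current controlfile for standby format '/backup/stby_ctl_%U';

-- 3. Copy the backup pieces to the standby, then on the standby:
--    RMAN> restore standby controlfile from '/backup/stby_ctl_<piece>';
--    RMAN> catalog start with '/backup/';
--    RMAN> recover database noredo;
```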

How to resolve missing logs in Oracle Data Guard

Overview

 

At a customer site, a storage vendor engineer mapped newly carved LUNs to the EMR database host. During this work, improper operations left the database cluster in an abnormal state, after which database host node 2 was rebooted. After the reboot, the GRID clusterware could not start normally on node 2, while node 1 continued to provide database service. We connected to the database environment remotely to troubleshoot; in-depth analysis of the relevant logs identified the root cause, it was dealt with, and the database cluster returned to normal.

 

Cluster and database log analysis

1. Errors before the host reboot at 15:36

The logs at this point show that at 15:36 the database instance reported I/O read/write errors, errors also appeared in the clusterware logs, and the host was then rebooted.

 

2. Logs after node 2 was rebooted

After node 2 was rebooted, the cluster startup was found to stop at the point of starting the ASM instance. The ASM alert log contained the following:

Clusterware alert log:

2021-12-12 15:48:34.081:

[client(10295)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxemrdb02/client/crsctl_oracle.log.

[client(10405)]CRS-10001:12-Dec-19 15:48 ACFS-9391: Checking for existing ADVM/ACFS installation.

[client(10410)]CRS-10001:12-Dec-19 15:48 ACFS-9392: Validating ADVM/ACFS installation files for operating system.

[client(10412)]CRS-10001:12-Dec-19 15:48 ACFS-9393: Verifying ASM Administrator setup.

[client(10415)]CRS-10001:12-Dec-19 15:48 ACFS-9308: Loading installed ADVM/ACFS drivers.

[client(10418)]CRS-10001:12-Dec-19 15:48 ACFS-9154: Loading 'oracleoks.ko' driver.

[client(10446)]CRS-10001:12-Dec-19 15:48 ACFS-9154: Loading 'oracleadvm.ko' driver.

[client(10473)]CRS-10001:12-Dec-19 15:48 ACFS-9154: Loading 'oracleacfs.ko' driver.

[client(10560)]CRS-10001:12-Dec-19 15:48 ACFS-9327: Verifying ADVM/ACFS devices.

[client(10564)]CRS-10001:12-Dec-19 15:48 ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.

[client(10568)]CRS-10001:12-Dec-19 15:48 ACFS-9156: Detecting control device '/dev/ofsctl'.

[client(10573)]CRS-10001:12-Dec-19 15:48 ACFS-9322: completed

2021-12-12 15:48:43.930:

[client(10374)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxemrdb02/client/crsctl_oracle.log.

2021-12-12 15:48:53.698:

[client(10598)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxemrdb02/client/crsctl_oracle.log.

2021-12-12 15:49:03.129:

[client(10905)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxemrdb02/client/crsctl_oracle.log.

2021-12-12 15:50:00.511:

[client(12458)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/cxemrdb02/client/crsctl_oracle.log.

2021-12-12 15:51:07.276:


Node 2: ASM alert log

Thu Dec 12 16:22:11 2021

MMNL started with pid=21, OS id=26248

lmon registered with NM – instance number 2 (internal mem no 1)

Thu Dec 12 16:24:08 2021

LMON received an instance eviction notification from instance 1

The instance eviction reason is 0x20000000

The instance eviction map is 2

Thu Dec 12 16:24:11 2021

PMON (ospid: 26206): terminating the instance due to error 481

Thu Dec 12 16:24:11 2021

ORA-1092 : opitsk aborting process

Thu Dec 12 16:24:13 2021

System state dump requested by (instance=2, osid=26206 (PMON)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_diag_26216_20211212162413.trc

Dumping diagnostic data in directory=[cdmp_20211212162411], requested by (instance=2, osid=26206 (PMON)), summary=[abnormal instance termination].

Instance terminated by PMON, pid = 26206

Check the corresponding LMON and DIAG trace files:

[grid@cxemrdb02 trace]$ more ./+ASM2_diag_26216.trc

Trace file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_diag_26216.trc

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 – 64bit Production

With the Real Application Clusters and Automatic Storage Management options

ORACLE_HOME = /u01/app/11.2.0/grid

System name:    Linux

Node name:      cxemrdb02

Release:        2.6.32-431.el6.x86_64

Version:        #1 SMP Sun Nov 10 22:19:54 EST 2013

Machine:        x86_64

Instance name: +ASM2

Redo thread mounted by this instance: 0 <none>

Oracle process number: 6

Unix process pid: 26216, image: oracle@cxemrdb02 (DIAG)

*** 2021-12-12 16:22:10.788

*** SESSION ID:(373.1) 2021-12-12 16:22:10.788

*** CLIENT ID:() 2021-12-12 16:22:10.788

*** SERVICE NAME:() 2021-12-12 16:22:10.788

*** MODULE NAME:() 2021-12-12 16:22:10.788

*** ACTION NAME:() 2021-12-12 16:22:10.788

Node id: 1

List of nodes: 0, 1,

*** 2021-12-12 16:22:10.790

Reconfiguration starts [incarn=0]

*** 2021-12-12 16:22:10.790

I'm the voting node

Group reconfiguration cleanup

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

 

3. Problem analysis

 

Analysis of the cluster logs shows that the improper operation at 15:36 caused the database and clusterware errors and the abnormal cluster state, at which point the database host was rebooted. After the reboot, the cluster logs showed ASM failing to start. Based on the ASM alert log and trace files, together with the Oracle note "ASM on Non-First Node (Second or Others) Fails to Start: PMON (ospid: nnnn): terminating the instance due to error 481 (Doc ID 1383737.1)", we concluded that the cluster private interconnect communication was broken. After analyzing and checking the interconnect-related configuration, we confirmed that the clusterware communication socket files (/var/tmp/.oracle) had been left inconsistent by the abnormal cluster shutdown, which is why node 2 could not start after the reboot. After the socket files were cleaned up, the cluster started normally and returned to service.
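The cleanup applied here can be sketched as follows (run as root on the affected node; the clusterware stack must be fully stopped before the socket files are removed, otherwise further damage can result):

```shell
# On the affected node, as root
crsctl stop crs -f           # make sure the clusterware stack is fully down
rm -rf /var/tmp/.oracle      # remove the stale clusterware socket files
crsctl start crs             # the socket files are recreated at startup
```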


III. Summary and follow-up recommendations

1. Problem summary

 

Analysis of the cluster logs shows that the improper operation at 15:36 caused the database and clusterware errors and the abnormal cluster state, at which point the database host was rebooted. After the reboot, the cluster logs showed ASM failing to start. Based on the ASM alert log and trace files, together with the Oracle note "ASM on Non-First Node (Second or Others) Fails to Start: PMON (ospid: nnnn): terminating the instance due to error 481 (Doc ID 1383737.1)", we concluded that the cluster private interconnect communication was broken; analysis and checking of the interconnect-related configuration confirmed that the clusterware communication socket files (/var/tmp/.oracle) had been left inconsistent by the abnormal cluster shutdown, so after the reboot node 2 could not communicate with node 1 and failed to start. After the socket files were cleaned up, the cluster started normally and the database cluster returned to normal.


4. Recommendations for adding disks to the cluster

For a production database cluster, adding disks to ASM is an important and somewhat risky operation. All database data resides in the ASM disk groups, so adding disks to an existing disk group incorrectly can cause read/write errors on the disk group or leave it unable to mount, disrupting normal database service. Based on Oracle's standard procedure "How to add new disks to an existing Diskgroup in a RAC cluster or standalone ASM environment (best practice) (Doc ID 1946668.1)" and our years of database maintenance experience, we recommend following these steps when adding ASM disks:

1. Check the current cluster status, ASM disk status, space usage, and disk sizes.
2. Carve the LUNs at the storage layer; the new LUN sizes should match the disks already in the ASM disk group.
3. Map the new LUNs from the storage layer to the host operating system; confirm every host in the cluster can recognize, read, and write the LUNs.
4. Configure aliases or multipath mappings consistent with the existing ASM disks; verify the cluster's kfed command can read them.
5. Create a temporary test disk group on the new LUNs to confirm they can be used by ASM.
6. Drop the test disk group.
7. Add the new LUNs to the target ASM disk group.
8. Check that the ASM disk group rebalance has completed; check the cluster and database status.
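Steps 5 through 8 can be sketched in SQL as follows (the disk group names DATA and TESTDG and the device path are hypothetical placeholders):

```sql
-- 5/6. Validate the new LUN with a throwaway disk group, then drop it
create diskgroup TESTDG external redundancy disk '/dev/asm-disk05';
drop diskgroup TESTDG including contents;

-- 7. Add the LUN to the production disk group with an explicit rebalance power
alter diskgroup DATA add disk '/dev/asm-disk05' rebalance power 4;

-- 8. Monitor the rebalance; no rows returned means it has finished
select group_number, operation, state, est_minutes from v$asm_operation;
```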
Oracle ASM: troubleshooting record of ASM failing to start after an improper disk-add operation