Index Maintenance

Pinning Execution Plans with SPM, and a Bug Hit Along the Way

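Two SQL statements, each joining several tables, kept picking the wrong execution plan. Table statistics and indexing were already up to date (there is a messy story behind the statistics that I will not go into here), but the statements also carry six bind variables, and together with column histograms these were skewing the plan. In this tuning exercise I did not remove the histograms or deal with the bind-variable effect (bind variables are not recommended for star joins). Given the delicate situation on site, I chose to pin the plans of both statements using a SQL profile and SPM. I tried the SQL profile first; after forcing a re-parse it had not taken effect, so I then pinned the plans with SPM.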

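Take the statement with sql_id bwwnw7r1gzhdf as the example. Below is the SQL report (sqlrpt) collected for the relevant hour. The plan with plan_hash_value 711942702 is the correct one; the report shows that the statement was running with a wrong plan (3052678239) and that it has several plans. How I determined which plan is correct is not covered here.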
 

SQL ID: bwwnw7r1gzhdf

#  Plan Hash Value  Total Elapsed Time(ms)  Executions  1st Capture Snap ID  Last Capture Snap ID
-  ---------------  ----------------------  ----------  -------------------  --------------------
1       3052678239              13,512,877          10                25060                 25060
2       3392573872                       0           0                25060                 25060
3       4134955434                       0           0                25060                 25060
4       1564064893                       0           0                25060                 25060
5       2504448979                       0           0                25060                 25060
6        147966509                       0           0                25060                 25060
7        711942702                       0           0                25060                 25060

 
I used the coe_xfr_sql_profile.sql script to pin plan 711942702 for sql_id bwwnw7r1gzhdf; the generated SQL profile is named coe_bwwnw7r1gzhdf_711942702.
(For this part, see:
1. Using Sqltxplain to create a ‘SQL Profile’ to consistently reproduce a good plan (Doc ID 1487302.1)
2. Automatic SQL Tuning and SQL Profiles (Doc ID 271196.1)
3. Correcting Optimizer Cost Estimates to Encourage Good Execution Plans Using the COE XFR SQL Profile Script (Doc ID 1955195.1))

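After the statement was re-parsed, the sql_profile column of v$sql showed no sign that the profile was in use. The reason is that in coe_xfr_sql_profile.sql the option that controls when the generated profile applies defaults to FALSE (this most likely refers to force_match => FALSE in the generated script), so the profile it created did not take effect and the plan seen in monitoring did not change. (On site I dropped this profile.)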

 

SQL>  select name,created,status from dba_sql_profiles;

NAME                           CREATED                        STATUS
------------------------------ ------------------------------ --------
coe_bwwnw7r1gzhdf_711942702    26-JUN-15 02.09.30.000000 PM   ENABLED
coe_g87an0j5djjpm_334801256    26-JUN-15 11.30.25.000000 AM   ENABLED

SQL>  select SQL_ID, SQL_PROFILE,PLAN_HASH_VALUE from V$SQL where SQL_ID='bwwnw7r1gzhdf' and sql_profile is not null;

no rows

SQL>  select sql_profile,EXECUTIONS,PLAN_HASH_VALUE,parse_calls,ELAPSED_TIME/1000000,
ELAPSED_TIME/1000000/EXECUTIONS,LAST_LOAD_TIME,ROWS_PROCESSED
from v$sql where EXECUTIONS>0 and sql_id='bwwnw7r1gzhdf' order by LAST_LOAD_TIME desc;
...

So I disabled the profiles and then dropped them:

=====disable profile==============
BEGIN
DBMS_SQLTUNE.ALTER_SQL_PROFILE(
name => 'coe_bwwnw7r1gzhdf_711942702',
attribute_name => 'STATUS',
value => 'DISABLED');
END;
/

BEGIN
DBMS_SQLTUNE.ALTER_SQL_PROFILE(
name => 'coe_g87an0j5djjpm_334801256',
attribute_name => 'STATUS',
value => 'DISABLED');
END;
/

=====drop profile=================
begin
DBMS_SQLTUNE.DROP_SQL_PROFILE(name => 'coe_bwwnw7r1gzhdf_711942702');
end;
/

begin
DBMS_SQLTUNE.DROP_SQL_PROFILE(name => 'coe_g87an0j5djjpm_334801256');
end;
/

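Because the correct plan already exists in the cursor cache, I created the baseline directly with DBMS_SPM and used the same package to set the baseline's ENABLED, ACCEPTED and FIXED attributes to YES.

For this part, see:
Plan Stability Features (Including SQL Plan Management (SPM)) (Doc ID 1359841.1)

Create the baseline for the statement: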

variable cnt number;
execute :cnt :=DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(SQL_ID => 'bwwnw7r1gzhdf', PLAN_HASH_VALUE => 711942702) ;

Verify that the baseline has been created:

SQL> set linesize 200
SQL> Select Sql_Handle, Plan_Name, Origin, Enabled, Accepted,Fixed,Optimizer_Cost,Sql_Text
From Dba_Sql_Plan_Baselines
Where Sql_Text Like '%FROM P1EDBADM.MES_PROCESSOPERATIONSPEC%' Order By Last_Modified;


SQL_HANDLE                     PLAN_NAME                      ORIGIN         ENA ACC FIX OPTIMIZER_COST SQL_TEXT
------------------------------ ------------------------------ -------------- --- --- --- -------------- --------------------------------------------------------------------------------
SQL_995463d3d1edd710           SQL_PLAN_9kp33ug8yvpsh4af503b5 MANUAL-LOAD    YES YES NO              69 SELECT D.LOTNAME LOT, D.PRODUCTNAME GLASS, TO_CHAR(D.CREATETIME, 'YYYY-MM-DD HH2

Set the baseline's FIXED attribute to YES:

variable cnt number;
execute :cnt :=DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(SQL_ID => 'bwwnw7r1gzhdf', PLAN_HASH_VALUE => 711942702,fixed => 'yes') ;
Verify the change:
SQL> set linesize 200
SQL> Select Sql_Handle, Plan_Name, Origin, Enabled, Accepted,Fixed,Optimizer_Cost,Sql_Text
  2  From Dba_Sql_Plan_Baselines
  3  Where Sql_Text Like '%FROM P1EDBADM.MES_PROCESSOPERATIONSPEC%' Order By Last_Modified;

SQL_HANDLE                     PLAN_NAME                      ORIGIN         ENA ACC FIX OPTIMIZER_COST SQL_TEXT
------------------------------ ------------------------------ -------------- --- --- --- -------------- --------------------------------------------------------------------------------
SQL_995463d3d1edd710           SQL_PLAN_9kp33ug8yvpsh4af503b5 MANUAL-LOAD    YES YES YES            574 SELECT D.LOTNAME LOT, D.PRODUCTNAME GLASS, TO_CHAR(D.CREATETIME, 'YYYY-MM-DD HH2
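
As an alternative to reloading the plan from the cursor cache, the FIXED attribute of an existing baseline can normally be changed with DBMS_SPM.ALTER_SQL_PLAN_BASELINE. The following is only a sketch that reuses the sql_handle and plan_name from the output above; it is not what was run on site.

```sql
-- Sketch: change FIXED on an existing baseline instead of reloading the plan.
-- sql_handle and plan_name are taken from the Dba_Sql_Plan_Baselines output above.
variable cnt number;
begin
  :cnt := DBMS_SPM.ALTER_SQL_PLAN_BASELINE(
            sql_handle      => 'SQL_995463d3d1edd710',
            plan_name       => 'SQL_PLAN_9kp33ug8yvpsh4af503b5',
            attribute_name  => 'fixed',
            attribute_value => 'YES');
end;
/
-- The function returns the number of plans that were altered.
print cnt
```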

Final verification that the baselines for both statements are in place:

SQL> Select Sql_Handle, Plan_Name, Origin, Enabled, Accepted,Fixed,Optimizer_Cost,Sql_Text
  2  From Dba_Sql_Plan_Baselines
  3  Where Sql_Text Like '%FROM P1EDBADM.MES_PROCESSOPERATIONSPEC%' Order By Last_Modified;

SQL_HANDLE                     PLAN_NAME                      ORIGIN         ENA ACC FIX OPTIMIZER_COST SQL_TEXT
------------------------------ ------------------------------ -------------- --- --- --- -------------- --------------------------------------------------------------------------------
SQL_995463d3d1edd710           SQL_PLAN_9kp33ug8yvpsh4af503b5 MANUAL-LOAD    YES YES YES            574 SELECT D.LOTNAME LOT, D.PRODUCTNAME GLASS, TO_CHAR(D.CREATETIME, 'YYYY-MM-DD HH2
SQL_2e1c8025edb165b3           SQL_PLAN_2w7404rqv2tdm56eb6fa8 MANUAL-LOAD    YES YES YES            311 SELECT 1 " ", D.LOTNAME LOT, D.PRODUCTNAME GLASS, TO_CHAR(MAX(H.EVENTTIME), 'YYY
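
Dba_Sql_Plan_Baselines only shows that the baselines exist. To check that new cursors are really picking one up, the SQL_PLAN_BASELINE column of v$sql can be queried after the statement has been re-parsed. This is a sketch and was not part of the original session:

```sql
-- Sketch: a non-null SQL_PLAN_BASELINE means the child cursor was built from that baseline.
select sql_id, child_number, plan_hash_value, sql_plan_baseline, last_load_time
from   v$sql
where  sql_id = 'bwwnw7r1gzhdf'
order  by last_load_time desc;
```

The Note section of dbms_xplan.display_cursor output also states when a SQL plan baseline was used for the statement.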

SPM is governed mainly by two parameters. optimizer_use_sql_plan_baselines makes the optimizer use baselines (a plan is only used when its ACCEPTED attribute is YES; otherwise it just gets in the way), and optimizer_capture_sql_plan_baselines automatically captures statements and creates baselines for them. On my databases I usually leave capture off and baseline use on.
To turn them on:

alter system set optimizer_use_sql_plan_baselines=true scope=both;
alter system set optimizer_capture_sql_plan_baselines=true scope=both;

To turn them off:

alter system set optimizer_use_sql_plan_baselines=false scope=both;
alter system set optimizer_capture_sql_plan_baselines=false scope=both;
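
Before changing either parameter it is worth checking the current values. A minimal sketch using the standard v$parameter view:

```sql
-- Sketch: show the current SPM-related settings and whether they are still at their defaults.
select name, value, isdefault
from   v$parameter
where  name in ('optimizer_use_sql_plan_baselines',
                'optimizer_capture_sql_plan_baselines');
```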

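With capture enabled, some 11g releases hit this bug:
Bug 9910484 – SQL Plan Management Capture uses excessive space in SYSAUX (Doc ID 9910484.8)
The bug makes the SYSAUX tablespace grow rapidly, mainly in SQLOBJ$DATA. In my case it went from 2 GB to 4 GB within a day. Once capture was turned off, the growth stopped.
After deleting the baselines you do not need, the space in SYSAUX can be reclaimed with a shrink; for details see
Reducing the Space Usage of the SQL Management Base in the SYSAUX Tablespace (Doc ID 1499542.1)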
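
To see how much of SYSAUX the SQL Management Base is taking, and what its space budget and retention are set to, the following queries can be used. This is a sketch against standard dictionary views; the figures quoted above did not come from these queries.

```sql
-- Sketch: space used by the SQL Management Base inside SYSAUX.
select occupant_name, schema_name, round(space_usage_kbytes / 1024) as space_mb
from   v$sysaux_occupants
where  occupant_name = 'SQL_MANAGEMENT_BASE';

-- Current SPM configuration (space budget percentage, plan retention in weeks).
select parameter_name, parameter_value
from   dba_sql_management_config;
```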

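Recommendations for the skip_unusable_indexes Parameter

SKIP_UNUSABLE_INDEXES is about how the database reacts to unusable indexes. The parameter was introduced in 10g, and in 11g it defaults to TRUE.
When it is TRUE and an index is in the UNUSABLE state, the optimizer silently ignores that index and generates a new plan (the index is not used and no error is raised). When it is FALSE, the statement fails with an error. On some of the critical systems I look after, I set this parameter to FALSE so that a broken index is noticed immediately and can be repaired promptly.
Environments differ, so choose the value that suits yours. The parameter has no effect when the statement forces the index with a hint, or for DML against a table with a unique index.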
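
With the parameter left at TRUE, unusable indexes go unnoticed unless something looks for them. A minimal sketch of a check that could be scheduled; it uses only standard dictionary views:

```sql
-- Sketch: list unusable non-partitioned indexes.
select owner, index_name, table_name, status
from   dba_indexes
where  status = 'UNUSABLE';

-- Partitioned indexes show N/A in dba_indexes, so check the partitions separately.
select index_owner, index_name, partition_name, status
from   dba_ind_partitions
where  status = 'UNUSABLE';
```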

Some scenarios for this parameter are illustrated by the following test.

Create the test table and index:

SQL> conn test/test
Connected.
SQL> drop table a;
Table dropped.
SQL> create table a(id number);
Table created.
SQL> create unique index idx_a_id on a(id);
Index created.
SQL> declare
  2  begin
  3  for a in 1..1000 loop
  4  insert into a(id) values(a);
  5  end loop;
  6  end;
  7  /
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
SQL> show parameter SKIP_UNUSABLE_INDEXES;
NAME                                 TYPE        VALUE
------------------------------------ ----------- -------------------
skip_unusable_indexes                boolean     TRUE
SQL> select * from a where id=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 277080427
------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |     1 |    13 |     1   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| IDX_A_ID |     1 |    13 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("ID"=1)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
        124  redo size
        402  bytes sent via SQL*Net to client
        385  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Set skip_unusable_indexes to false:

SQL> alter system set skip_unusable_indexes=false scope=memory;
System altered.
Mark the index unusable:
SQL> alter index idx_a_id unusable;
Index altered.
The query now fails, reporting that the index is unusable:
SQL> select * from a where id=1;
select * from a where id=1
*
ERROR at line 1:
ORA-01502: index 'TEST.IDX_A_ID' or partition of such index is in unusable state

Set skip_unusable_indexes back to true:

SQL> alter system set skip_unusable_indexes=true scope=memory;
System altered.

The query now runs normally, but it does a full table scan:

SQL> select * from a where id=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2248738933
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     4 |    52 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| A    |     4 |    52 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"=1)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
        402  bytes sent via SQL*Net to client
        385  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Forcing the index with a hint fails with the unusable-index error:

SQL> select /*+index(a)*/ * from a where id=1;
select /*+index(a)*/ * from a where id=1
*
ERROR at line 1:
ORA-01502: index 'TEST.IDX_A_ID' or partition of such index is in unusable state
-- Inserts and deletes fail as well
SQL> insert into a values(1002);
insert into a values(1002)
*
ERROR at line 1:
ORA-01502: index 'TEST.IDX_A_ID' or partition of such index is in unusable state
SQL> delete from a where id=1;
delete from a where id=1
*
ERROR at line 1:
ORA-01502: index 'TEST.IDX_A_ID' or partition of such index is in unusable state

SQL>

The fix is to rebuild the index:

SQL> alter index test.idx_a_id rebuild;
Index altered.
SQL> select /*+index(a)*/ * from a where id=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 277080427
------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |     1 |    13 |     1   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| IDX_A_ID |     1 |    13 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("ID"=1)

Statistics
----------------------------------------------------------
         15  recursive calls
          0  db block gets
          5  consistent gets
          1  physical reads
          0  redo size
        402  bytes sent via SQL*Net to client
        385  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

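SQL> drop index test.idx_a_id;
Index dropped.
SQL> create index test.idx_a_id on a(id);
Index created.
SQL> alter index test.idx_a_id unusable;
Index altered.
SQL> insert into a values(1002);
1 row created.
SQL> commit;

The test shows that SKIP_UNUSABLE_INDEXES has no effect for statements that force an index with a hint, or for inserts and deletes against a table whose unique index is unusable. With a non-unique index, as the last step shows, the insert succeeds.

This test was adapted from material found on the internet, with some modifications.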

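Identifying Contention on the ITL Slot Reserved for Recursive Transactions in Index Blocks

This follows on from "Root Causes of and Fixes for the enq: TX - allocate ITL entry Wait Event in Oracle 10g and Later".

ITL contention is mostly caused by an INITRANS setting that is too low (MAXTRANS is ignored from 10g onward, so it no longer plays a part) or by a lack of free space in the block. There is, however, a special case that is also worth being able to recognise: ITL contention caused by recursive transactions on an index. Heavy recursive activity is uncommon, but telling it apart from the two causes above is somewhat more involved.

The approach is to read the ITL information in the index blocks. As the previous post explained, an index branch block has exactly one ITL slot, and it is used by the recursive transaction that performs a node split. In a leaf block, the first ITL slot is likewise used by the recursive split transaction. Looking at how that slot is being used in the relevant index blocks is enough to judge whether it is the point of contention.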

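The steps are:
1. Find the blocked transaction.
2. From the blocking transaction, find its undo block and the starting undo record number.
3. From the contents of the undo block, identify the index blocks involved.
4. Dump those index blocks and examine their ITL slots for contention.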
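
Before walking through undo records, it can help to see which segments have accumulated ITL waits at all, so that the block-level analysis is aimed at the right index. A minimal sketch using the standard v$segment_statistics view; it is not part of the original investigation:

```sql
-- Sketch: segments that have recorded ITL waits since instance startup, worst first.
select owner, object_name, object_type, value as itl_waits
from   v$segment_statistics
where  statistic_name = 'ITL waits'
and    value > 0
order  by value desc;
```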

Example:

1. Set up the blocking scenario

Session 1 updates row 100 of block 61080

SQL> update luda set a=a
  2  where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61080
  3  and dbms_rowid.ROWID_ROW_NUMBER(rowid)=100;

1 row updated.

commit;

Session 2 updates row 200 of block 61080

SQL> update luda set a=a
  2  where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61080
  3  and dbms_rowid.ROWID_ROW_NUMBER(rowid)=200;

1 row updated.

commit;

Session 3 updates row 300 of block 61080

SQL> update luda set a=a
  2  where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61080
  3  and dbms_rowid.ROWID_ROW_NUMBER(rowid)=300;

1 row updated.

Session 4 updates row 400 of block 61080

SQL> update luda set a=a
  2  where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61080
  3  and dbms_rowid.ROWID_ROW_NUMBER(rowid)=400;

1 row updated.

Session 5 updates row 500 of block 61080 and hangs.

2. Find the blocked transaction and its undo information

Confirm the ITL wait:

SQL> select s.sid, s.event, s.row_wait_obj#
  2  from v$session s where s.sid=148;

       SID EVENT                          ROW_WAIT_OBJ#
---------- ------------------------------ -------------
       148 enq: TX - allocate ITL entry              -1

Look up the undo information of the blocking transaction. Its undo is in block 2109 of datafile 2, and the starting undo record is 0x2e (46 in hexadecimal):

SQL> select l.sid req_session, s.sid lock_session, l.lmode, l.request, t.xidusn, t.xidslot, t.start_ubafil, t.start_ubablk, t.start_ubarec
  2  from v$lock l, v$transaction t, v$session s
  3  where l.type = 'TX'
  4  and trunc(id1/power(2,16)) = t.xidusn
  5  and l.id2 = t.xidsqn
  6  and id1 - power(2,16)*trunc(id1/power(2,16)) = t.xidslot
  7  and t.addr = s.taddr
  8  and l.request = 4;

REQ_SESSION LOCK_SESSION      LMODE    REQUEST     XIDUSN    XIDSLOT START_UBAFIL START_UBABLK START_UBAREC
----------- ------------ ---------- ---------- ---------- ---------- ------------ ------------ ------------
        148          159          0          4          5         26            2         2109           46

Dump block 2109 of datafile 2:

SQL> alter system dump datafile 2 block 2109;

System altered.

Look up the object_id of the objects involved:

SQL> select object_name,object_id,data_object_id from dba_objects where object_name in ('LUDA','IDX_TEST') and owner='SYS';

OBJECT_NAME                               OBJECT_ID DATA_OBJECT_ID
---------------------------------------- ---------- --------------
IDX_TEST                                      51980          51980
LUDA                                          51978          51978

In the dump of block 2109 of datafile 2, from Rec #0x2e to the end of the block, record 0x2e is the only undo record that relates to object numbers 51980 or 51978:

*-----------------------------
* Rec #0x2e  slt: 0x1a  objn: 51978(0x0000cb0a)  objd: 51978  tblspc: 0(0x00000000)
*       Layer:  11 (Row)   opc: 1   rci 0x00
Undo type:  Regular undo    Begin trans    Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
rdba: 0x00000000
*-----------------------------
uba: 0x0080083d.014b.2d ctl max scn: 0x0000.000bb362 prv tx scn: 0x0000.000bb37b
txn start scn: scn: 0x0000.000bcdc1 logon user: 0
 prev brb: 8388917 prev bcl: 0
KDO undo record:
KTB Redo
op: 0x04  ver: 0x01
op: L  itl: xid:  0x0002.010.0000013b uba: 0x00800051.0100.3c
                      flg: C---    lkc:  0     scn: 0x0000.000bcd90
KDO Op code: URP row dependencies Disabled
  xtype: XAxtype KDO_KDOM2 flags: 0x00000080  bdba: 0x0040ee98  hdba: 0x0040ee91
itli: 1  ispac: 0  maxfr: 4863
tabn: 0 slot: 400(0x190) flag: 0x2c lock: 0 ckix: 71
ncol: 1 nnew: 1 size: 0
Vector content:
col  0: [ 3]  c2 31 06

End dump data blocks tsn: 1 file#: 2 minblk 2109 maxblk 2109

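Decoding the bdba address confirms that the object being updated is LUDA: the address resolves to block 61080 of LUDA, the block touched by the updates above.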

SQL> select dbms_utility.data_block_address_file(TO_NUMBER('0040ee98', 'XXXXXXXX')) file_id,
  2  dbms_utility.data_block_address_block(TO_NUMBER('0040ee98', 'XXXXXXXX')) block_id from dual;

   FILE_ID   BLOCK_ID
---------- ----------
         1      61080

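The undo records contain no index-block entries; the only object the transaction touched is the table LUDA (objn 51978). Contention on an index object is analysed in the same way.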

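Do Statistics Need to Be Regathered After an Index Rebuild?

Today a DBA asked me whether statistics need to be regathered after rebuilding an index. It is a question that occurs to any DBA with some experience, and the answer is no, neither after a rebuild nor after creating an index.

See the following note in the SQL Reference: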

COMPUTE STATISTICS Clause

This clause has been deprecated. Oracle Database now automatically collects statistics during index creation and rebuild. This clause is supported for backward compatibility and will not cause errors.
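
This is easy to confirm on a test system by rebuilding an index and looking at LAST_ANALYZED straight afterwards. A minimal sketch; the owner TEST and index IDX_A_ID are assumptions borrowed from the skip_unusable_indexes test above:

```sql
-- Sketch: rebuild, then check that the index statistics were refreshed by the rebuild itself.
-- TEST.IDX_A_ID is a hypothetical index; substitute one of your own.
alter index test.idx_a_id rebuild;

select index_name, status, num_rows, leaf_blocks, last_analyzed
from   dba_indexes
where  owner = 'TEST'
and    index_name = 'IDX_A_ID';
```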

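A Brief Look at How a B-tree Index Changes Over Time

Indexes account for a very large share of day-to-day database performance work, and how well you maintain them largely determines how the database performs. Indexes are of course only one part of the data model within performance tuning. This post is just a light description, based on my own understanding and a few scripts, of how a B-tree index changes. Treat it as after-dinner conversation; corrections and additions are welcome.

I will not explain what a B-tree index is; if you are unsure, see the Oracle Concepts guide or search Baidu. As most DBAs know, index maintenance comes down to the following questions:

1. Whether the index exists
2. Whether the index is well designed
3. Whether the index is too fragmented
4. Whether the index has become bloated
5. Whether the index's clustering factor is out of sync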
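
For points 3 to 5, a starting point is to look at the index structure and its clustering factor. The sketch below is not one of the author's scripts; TEST.IDX_A_ID is a hypothetical index name. Note that ANALYZE ... VALIDATE STRUCTURE locks the underlying table against DML while it runs, and that INDEX_STATS holds a single row visible only to the session that ran the analyze.

```sql
-- Sketch: populate INDEX_STATS for one index (blocks DML on the table while it runs).
analyze index test.idx_a_id validate structure;

-- Deleted leaf rows and space usage hint at fragmentation and bloat.
select name, height, lf_rows, del_lf_rows, pct_used
from   index_stats;

-- A clustering factor near the table's block count is good; near its row count is poor.
select i.index_name, i.clustering_factor, t.blocks, t.num_rows
from   dba_indexes i
join   dba_tables  t
on     t.owner = i.table_owner
and    t.table_name = i.table_name
where  i.owner = 'TEST'
and    i.index_name = 'IDX_A_ID';
```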

Annoyingly, an incident has just come up: a RAC diskgroup will not come up, so I am off to troubleshoot.

To be continued.