1.死事务概念
死事务出现在异常关闭数据库或者事务进程不正常结束,比如KILL -9,shutdown abort的情况下。
由于死事务已经无法通过v$transaction来观察,所以必须通过内部表来进行判断。
这个内部表是x$ktuxe,该表会记录Dead事务的恢复进度:
死事务的回滚进程数可以通过参数fast_start_parallel_rollback来设置是否启用并行恢复。
此参数有3个值:
FALSE –>Parallel rollback is disabled 即FALSE时关闭并行回滚,使用SMON的串行恢复;
LOW –>Limits the maximum degree of parallelism to 2 * CPU_COUNT LOW是11GR2默认值,
HIGH –>Limits the maximum degree of parallelism to 4 * CPU_COUNT
对应的回滚进程是:ora_p025_orcl ora_p022_orcl 这种都是后台启动的并行回滚的进程。
修改此参数值,
2.关于并行回滚:
并行回滚是oracle的一个特性,初衷是为了加速失败事务的回滚,缩小回滚时间,Oracle设计这个特性的初衷大多时候都是实现了。
但是在一些情况下,并行回滚的进程间会产生资源的争用,
如buffer busy waits/wait for a undo record等,严重时会导致并行回滚互相阻塞,导致回滚恢复非常非常慢。
因为这个特性可能导致的数据库hang或者部分业务hang的情况。
根据经验总结一翻并行回滚的情况怕遇见2种情况:
1.并行回滚相关的bug,此类bug有不少
2.大事务的回滚
事务恢复相关的MOS文档:Master Note: Troubleshooting Database Transaction Recovery (文档 ID 1494886.1)。
对此并行回滚类问题的分析步骤:
1.分析alert.log
2.分析block=1的进程是否存在,是否存在死事务
3.分析发生情况时的awr以及ash
4.分析发生情况时的io/cpu情况
5.分析相关操作时的等待
而并发度如果是high或low,4倍或2倍的cpu数,也会受到另外一些参数的影响,如PARALLEL_MAX_SERVERS,这个才是真正最大的并发度设置。
PARALLEL_MAX_SERVERS这个参数的默认值为
PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 5。
PARALLEL_THREADS_PER_CPU 和CPU_COUNT都是初始化参数。
concurrent_parallel_users 根据是否启用automatic memory management ,如果禁用automatic memory management 则这个值为1,如果pga自动管理则这个值是2,如果是automatic memory management 启用,则这个值是4。
_cleanup_rollback_entries –>number of undo entries to apply per transaction clean,调大会增快回滚速度,但是是静态参数,需要重启设置。
当并行回滚HANG住时,可以关闭并行回滚:
Master Note: Troubleshooting Database Transaction Recovery (文档 ID 1494886.1)中说法:
If you change the value of this parameter, then transaction recovery will be stopped and restarted with the new implied degree of parallelism.
alter system set fast_start_parallel_rollback=false;
How to Disable Parallel Transaction Recovery When Parallel Txn Recovery is Active (文档 ID 238507.1)中做法是先禁用SMON恢复(可不进行而直接KILL并行恢复的进程),OS上KILL 并行恢复的slave进程,再修改fast_start_parallel_rollback=false,再开启SMON恢复。
过程如下:
SQL> select pid, program from v$process where program like ‘%SMON%’;
PID PROGRAM
———- ————————————————
6 oracle@stsun7 (SMON)
SQL> oradebug setorapid ‘SMON’s Oracle PID’;
SQL> oradebug event 10513 trace name context forever, level 2
select SPID from V$PROCESS where PID in (select PID from V$FAST_START_SERVERS);
then kill these processes from the OS level with : kill -9 n
alter system set fast_start_parallel_roll.lback=false;
SQL> oradebug setorapid ‘SMON’s Oracle PID’;
SQL> oradebug event 10513 trace name context off
跟踪smon进程
event = ‘10500 trace name context forever, level 1’
3.关于关闭事务回滚:
—是不建议的,一般用在UNDO段损坏数据库无法打开时。
Disable Transaction Recovery
Disabling Transaction recovery could temporarily open up the database.
SMON will stop performing all the transaction recovery.
There is no way to stop only one Transaction recovery / few transaction recoveries.
When the transaction recovery is disabled, any data modified by this dead transaction but required by other transactions will be recovered on “as needed” basis.
We always suggest to disable the transaction recovery only with the help of Oracle Support.
NOTE: Oracle does not recommend to Disable the Transaction recovery and open the Database for Business Use.
4.大事务回滚的常用脚本
1.查询死事务及其大小:
select ktuxeusn USN, ktuxeslt Slot, ktuxesqn Seq, ktuxesta State, ktuxesiz Undo,to_char(sysdate,’DD-MON-YYYY HH24:MI:SS’) “Current_Time”
from x$ktuxe
where ktuxesta <> ‘INACTIVE’
and ktuxecfl like ‘%DEAD%’
order by ktuxesiz asc;
select distinct KTUXECFL,count(*) from x$ktuxe group by KTUXECFL;
KTUXECFL COUNT(*)
———————— ———-
DEAD 1
NONE 2393
SCO|COL 8
===========
ktuxeusn – Undo Segment Number
ktuxeslt – Slot number
ktuxesqn – Sequence
ktuxesta – State
ktuxesiz – Undo Blocks Remaining
ktuxecfl – Flag
===========
USN SLOT SEQ STATE UNDO
——- ———- ——- ———- ———-
9 9 335 ACTIVE 10337
2.查询包含死事务的UNDO段的信息:
select useg.segment_name, useg.segment_id, useg.tablespace_name, useg.status
from dba_rollback_segs useg
where useg.segment_id in (select unique ktuxeusn
from x$ktuxe
where ktuxesta <> ‘INACTIVE’
and ktuxecfl like ‘%DEAD%’);
SEGMENT_NAME SEGMENT_ID TABLESPACE_NAME STATUS
——————– —————– ————————— ————
_SYSSMU9$ 9 UNDOTBS1 ONLINE
select usn,name
from v$rollname
where usn in (select unique ktuxeusn
from x$ktuxe
where ktuxesta <> ‘INACTIVE’
and ktuxecfl like ‘%DEAD%’);
3.大事务恢复所需的时间
alter session set nls_date_format=’yyyymmdd hh24:mi:ss’;
select usn, state, undoblockstotal “Total”, undoblocksdone “Done”, undoblockstotal-undoblocksdone “ToDo”,
decode(cputime,0,’unknown’,sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) “Estimated time to complete”
from v$fast_start_transactions;
USN STATE Total Done ToDo Estimated time to Complete
——– —————- ——– ——– ——– ———————————
5 RECOVERING 16207 14693 1514 11-may-2012 08:05:40
4.对于死事务回滚所需时间的评估:
select distinct KTUXECFL,count(*) from x$ktuxe group by KTUXECFL;
KTUXECFL COUNT(*)
———————— ———-
DEAD 1
NONE 2393
SCO|COL 8
select ADDR,KTUXEUSN,KTUXESLT,KTUXESQN,KTUXESIZ from x$ktuxe where KTUXECFL=’DEAD’;
ADDR KTUXEUSN KTUXESLT KTUXESQN KTUXESIZ
—————- ———- ———- ———- ———-
00002B92FF5D5F68 15 12 314961 43611
–KTUXESIZ代表需要回滚的回滚块数。
declare
l_start number;
l_end number;
begin
select ktuxesiz
into l_start
from x$ktuxe
where KTUXEUSN = 10
and KTUXESLT = 39; ———这里根据实际数字来填写
dbms_lock.sleep(60); ———单位是秒,建议适当增大一些,评估出的时间更准确。
select ktuxesiz
into l_end
from x$ktuxe
where KTUXEUSN = 10
and KTUXESLT = 39; ———这里根据实际数字来填写
dbms_output.put_line(‘time cost Day:’ ||
round(l_end / (l_start – l_end) / 60, 2));
end;
/
(l_start – l_end) –>dbms_lock.sleep(60);指定的60秒间隔恢复的块数
l_end –>还需要恢复的块数
l_end / (l_start – l_end) /得出所需恢复的分钟数;
——————————————得出所需小时
declare
l_start number;
l_end number;
begin
select ktuxesiz
into l_start
from x$ktuxe
where KTUXEUSN = 10
and KTUXESLT = 39;
dbms_lock.sleep(60);
select ktuxesiz
into l_end
from x$ktuxe
where KTUXEUSN = 10
and KTUXESLT = 39;
dbms_output.put_line(‘time cost Day:’ ||
round(l_end / (l_start – l_end)/60, 2));
end;
/
5.查询并行恢复时Slaves进程数量
select * from v$fast_start_servers;
STATE UNDOBLOCKSDONE PID XID
———– ————————– ———- —————-
RECOVERING 54 18 001F005C00001BD6
RECOVERING 0 20 001F005C00001BD6
如果只有1个进程STATE 是RECOVERING ,其它都是IDLE; 建议关闭并行回滚–>FAST_START_PARALLEL_ROLLBACK=FALSE。