
Redo and Undo

1. Finding the object being rolled back from an undo record

The transactions being rolled back can be found with the following query:

select * from v$fast_start_transactions;

or

select * from x$ktuxe where KTUXECFL='DEAD' AND KTUXESTA!='INACTIVE';

From the output of these queries we get the transaction's undo segment id (USN, or KTUXEUSN), undo slot (SLT, or KTUXESLT), and undo sequence (SEQ, or KTUXESQN).
Given the USN, we can look up the undo segment name:

select * from v$rollname where usn=xxx;

With that segment name we can dump the undo blocks of the transaction:

alter system dump undo block "<undo segment name>" XID <USN> <SLT> <SEQ>;

alter system dump undo block "_SYSSMU33$" XID 33 56 7463;

Then, from the trace file produced by the dump, we can determine the object number (objn) and object id (objd):


cat xxx.trc | grep objn

* Rec #0x45 slt:0x1f objn:223312(0x00092c2a) objd:223312 tblspc:20(0x00000014)
* Rec #0x44 slt:0x1f objn:223312(0x00092c2a) objd:223312 tblspc:20(0x00000014)
* Rec #0x43 slt:0x1f objn:223312(0x00092c2a) objd:223312 tblspc:20(0x00000014)
* Rec #0x42 slt:0x1f objn:223312(0x00092c2a) objd:223312 tblspc:20(0x00000014)
* Rec #0x41 slt:0x1f objn:223312(0x00092c2a) objd:223312 tblspc:20(0x00000014)
* Rec #0x40 slt:0x1f objn:223312(0x00092c2a) objd:223312 tblspc:20(0x00000014)
* Rec #0x3f slt:0x1f objn:223312(0x00092c2a) objd:223312 tblspc:20(0x00000014)
...

Now, using the object id, we can locate the object being rolled back in the database:

select * from dba_objects where object_id = 223312;
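
The whole chain can also be collapsed into a single step. Below is a minimal sketch (assuming SYS access to x$ktuxe, and the same column names used above) that joins the dead transactions to v$rollname and generates the dump commands directly:

-- Generate the undo block dump command for every dead, non-inactive transaction.
select 'alter system dump undo block "' || r.name || '" XID '
       || t.ktuxeusn || ' ' || t.ktuxeslt || ' ' || t.ktuxesqn || ';' as dump_cmd
  from x$ktuxe t, v$rollname r
 where t.ktuxecfl = 'DEAD'
   and t.ktuxesta != 'INACTIVE'
   and r.usn = t.ktuxeusn;

Each row of the output is a ready-to-run alter system command like the "_SYSSMU33$" example above.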

2. A quick script for checking recent redo generation and waits, to decide whether the redo size needs increasing

We usually decide whether a database's redo logs need to grow by watching how often the online redo logs switch, checking the alert log for "checkpoint not complete" messages, and checking whether the log file wait times in AWR look normal, and from these deciding whether to increase the log size or the number of log groups. Here is a very nice script that reports the average hourly redo generated over the past few days together with LGWR wait statistics; it also serves as a handy way to gauge redo generation.

The output looks something like the figure below:
[figure: sample output of the script (redo_adjust)]

The last column is the suggested amount of redo (MB) to add; the other columns cover checkpoint waits, overall CPU usage, physical reads and writes, log wait events, process counts, and related statistics. The suggestion is simply the redo generation rate over the snapshot interval multiplied by the seconds spent waiting on 'log file switch (checkpoint incomplete)', i.e. roughly the redo that was generated while sessions sat in those waits.
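
As a worked example of that formula: an instance generating 3,600 MB of redo in a one-hour snapshot writes at 1 MB/s; if 120 of those seconds were spent waiting on 'log file switch (checkpoint incomplete)', the "Redo to Add" column suggests roughly 1 MB/s * 120 s = 120 MB.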

The script itself is as follows:


set pagesize 50000 linesize 300

col instance_number             format 99                   head 'In|st'
col tim                                                     head 'Period end'
col cpu_sec                     format 999,999,999.9        head 'CPU used|sec'
col phy_reads                   format 999,999,999          head 'Physical|reads'
col phy_writes                  format 999,999,999          head 'Physical|writes'
col cr_served                   format 999,999,999          head 'CR blocks|served'
col current_served              format 999,999,999          head 'CUR blocks|served'
col redo_mb                     format 999,999,999.9        head 'Redo, MB'
col processes                   format 999,999              head 'Proc|esses'
col avg_df_seq                  format 9,999.9              head 'Avg 1|read'
col avg_df_scat                 format 9,999.9              head 'Avg N|read'
col redo_diff_to_md_pct         format 999,999              head 'Redo Diff|to median, %'
col avg_lfpw                    format 999.99               head 'Avg|LFPW'
col avg_log_sync                format 9,999.99             head 'Avg Log|Sync, ms'
col log_ckpt_sec                format 999,999              head 'CKPT|waits, s'
col redo_needed                 format 999,999              head 'Redo to|Add, MB'

compute max of cpu_sec          on instance_number
compute max of phy_reads        on instance_number
compute max of phy_writes       on instance_number
compute max of cr_served        on instance_number
compute max of current_served   on instance_number
compute max of redo_needed      on instance_number
compute max of log_ckpt_sec     on instance_number
compute max of avg_log_sync     on instance_number
compute max of avg_lfpw         on instance_number
compute max of redo_mb          on instance_number
compute max of processes        on instance_number
compute max of avg_df_seq       on instance_number
compute max of avg_df_scat      on instance_number

break on instance_number skip page

with t_interval as
(
 select /*+ inline */ sysdate-30 begin, sysdate as end from dual
)
select
  stats.dbid                                                                dbid,
  stats.instance_number                                                     instance_number,
  to_char(stats.begin_interval_time, 'YYYYMMDD HH24MI')                     tim,
  stats.cpu_used / 100                                                      cpu_sec,
  stats.phy_reads                                                           phy_reads,
  stats.phy_writes                                                          phy_writes,
  stats.cr_served                                                           cr_served,
  stats.current_served                                                      current_served,
  stats.redo_size / 1024 / 1024                                             redo_mb,
  procs.current_utilization                                                 processes,
--
  waits.df_seq_micro / 1000 / nullif(waits.df_seq_waits,0)                  avg_df_seq,
  waits.df_scat_micro / 1000 / nullif(waits.df_scat_waits,0)                avg_df_scat,
  (stats.redo_size - stats.md_redo_size) * 100 / stats.md_redo_size         redo_diff_to_md_pct,
  stats.redo_write_time*10/stats.redo_writes                                avg_lfpw,
  waits.log_sync_micro/nullif(waits.log_sync_waits, 0) / 1000               avg_log_sync,
  waits.log_ckpt_micro/1e6                                                  log_ckpt_sec,
  ( stats.redo_size /
     ( waits.snap_interval * 86400 ) ) *
   ( waits.log_ckpt_micro/1e6 ) / 1024 / 1024                               redo_needed,
  stats.is_restart
from
  (
   select
     snap_id,
     begin_interval_time,
     snap_interval,
     instance_number,
     dbid,
     log_sync_micro,
     log_sync_waits,
     log_ckpt_micro,
     log_ckpt_waits,
     df_seq_micro,
     df_seq_waits,
     df_scat_micro,
     df_scat_waits,
     direct_micro,
     direct_waits,
     median(log_sync_micro/nullif(log_sync_waits, 0)) over (partition by dbid, instance_number) md_log_sync_micro
   from
   (
      select
        snap_id,
        begin_interval_time,
        instance_number,
        dbid,
        max(snap_interval) snap_interval,
        max(decode(event_name, 'log file sync',                            wait_micro))    log_sync_micro,
        max(decode(event_name, 'log file sync',                            total_waits))   log_sync_waits,
        max(decode(event_name, 'log file switch (checkpoint incomplete)',  wait_micro))    log_ckpt_micro,
        max(decode(event_name, 'log file switch (checkpoint incomplete)',  total_waits))   log_ckpt_waits,
        max(decode(event_name, 'db file sequential read',                  wait_micro))    df_seq_micro,
        max(decode(event_name, 'db file sequential read',                  total_waits))   df_seq_waits,
        max(decode(event_name, 'db file scattered read',                   wait_micro))    df_scat_micro,
        max(decode(event_name, 'db file scattered read',                   total_waits))   df_scat_waits,
        max(decode(event_name, 'direct path read',                         wait_micro))    direct_micro,
        max(decode(event_name, 'direct path read',                         total_waits))   direct_waits
      from
      (
        select
          e.snap_id,
          e.instance_number,
          e.dbid,
          sn.begin_interval_time,
          cast(begin_interval_time as date) - cast(lag(begin_interval_time) over (partition by e.dbid, e.instance_number, e.event_name order by sn.begin_interval_time) as date) snap_interval,
          sn.startup_time,
          e.event_name,
          case when (sn.begin_interval_time >= sn.startup_time and lag(sn.begin_interval_time) over (partition by e.dbid, e.instance_number, e.event_name order by sn.begin_interval_time) < sn.startup_time)
            then e.time_waited_micro
            else e.time_waited_micro - lag(e.time_waited_micro) over (partition by e.dbid, e.instance_number, e.event_name order by sn.begin_interval_time)
          end wait_micro,
          case when (sn.begin_interval_time >= sn.startup_time and lag(sn.begin_interval_time) over (partition by e.dbid, e.instance_number, e.event_name order by sn.begin_interval_time) < sn.startup_time)
            then e.total_waits
            else e.total_waits - lag(e.total_waits) over (partition by e.dbid, e.instance_number, e.event_name order by sn.begin_interval_time)
          end total_waits
        from
          dba_hist_system_event e,
          dba_hist_snapshot     sn,
          t_interval            t
        where
          sn.snap_id = e.snap_id and
          sn.dbid = e.dbid and
          sn.instance_number = e.instance_number and
          sn.begin_interval_time between t.begin and t.end and
          e.event_name in (
            'log file sync',
            'log file switch (checkpoint incomplete)',
            'db file sequential read',
            'db file scattered read',
            'direct path read'
          )
      )
      group by dbid, instance_number, begin_interval_time, snap_id
    )
  ) waits,
  (
    select
      snap_id,
      begin_interval_time,
      instance_number,
      dbid,
      redo_size,
      redo_write_time,
      redo_writes,
      is_restart,
      cpu_used,
      phy_reads,
      phy_reads_cache,
      phy_writes,
      phy_writes_cache,
      cr_served,
      current_served,
      median(redo_size) over (partition by dbid, instance_number) md_redo_size
    from
    (
      select
        snap_id,
        begin_interval_time,
        instance_number,
        dbid,
        max(is_restart) is_restart,
        max(decode(stat_name, 'redo size',                       stat_diff)) redo_size,
        max(decode(stat_name, 'redo write time',                 stat_diff)) redo_write_time,
        max(decode(stat_name, 'redo writes',                     stat_diff)) redo_writes,
        max(decode(stat_name, 'CPU used by this session',        stat_diff)) cpu_used,
        max(decode(stat_name, 'physical read total IO requests', stat_diff)) phy_reads,
        max(decode(stat_name, 'physical reads cache',            stat_diff)) phy_reads_cache,
        max(decode(stat_name, 'physical write total IO requests',stat_diff)) phy_writes,
        max(decode(stat_name, 'physical writes from cache',      stat_diff)) phy_writes_cache,
        max(decode(stat_name, 'gc cr blocks served',             stat_diff)) cr_served,
        max(decode(stat_name, 'gc current blocks served',        stat_diff)) current_served
      from
      (
        select
          stats.snap_id,
          stats.instance_number,
          stats.dbid,
          sn.begin_interval_time,
          sn.startup_time,
          stats.stat_name,
          case when (sn.begin_interval_time >= sn.startup_time and lag(sn.begin_interval_time) over (partition by stats.dbid, stats.instance_number, stats.stat_id order by sn.begin_interval_time) < sn.startup_time)
            then stats.value
            else stats.value - lag(stats.value) over (partition by stats.dbid, stats.instance_number, stats.stat_id order by stats.snap_id)
          end stat_diff,
          case when (sn.begin_interval_time >= sn.startup_time and lag(sn.begin_interval_time) over (partition by stats.dbid, stats.instance_number, stats.stat_id order by sn.begin_interval_time) < sn.startup_time)
            then 'Yes'
          end is_restart
        from
          dba_hist_sysstat   stats,
          dba_hist_snapshot  sn,
          t_interval         t
        where
          sn.snap_id = stats.snap_id and
          sn.dbid = stats.dbid and
          sn.instance_number = stats.instance_number and
          sn.begin_interval_time between t.begin and t.end and
          stats.stat_name in (
            'redo size',
            'redo write time',
            'redo writes',
            'CPU used by this session',
            'physical read total IO requests',
            'physical reads cache',
            'physical write total IO requests',
            'physical writes from cache',
            'gc cr blocks served',
            'gc current blocks served'
          )
      )
      group by dbid, instance_number, begin_interval_time, snap_id
    )
  ) stats,
  (
    select
      stats.snap_id,
      stats.instance_number,
      stats.dbid,
      stats.resource_name,
      stats.current_utilization
    from
      dba_hist_resource_limit stats,
      dba_hist_snapshot       sn,
      t_interval              t
    where
      sn.snap_id = stats.snap_id and
      sn.dbid = stats.dbid and
      sn.instance_number = stats.instance_number and
      sn.begin_interval_time between t.begin and t.end and
      stats.resource_name = 'processes'
  ) procs
where
  waits.dbid = stats.dbid and
  waits.instance_number = stats.instance_number and
  waits.snap_id = stats.snap_id and
  waits.dbid = procs.dbid and
  waits.instance_number = procs.instance_number and
  waits.snap_id = procs.snap_id
order by
 stats.dbid, stats.instance_number, stats.begin_interval_time
;

The script is adapted from Pythian.
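
To report on a different window, only the t_interval factored subquery at the top needs changing; for example, for the last 7 days (a drop-in replacement for the version above):

with t_interval as
(
 select /*+ inline */ sysdate-7 begin, sysdate as end from dual
)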

3. Understanding the redo log file header, redo block header, and redo record header

Yesterday someone asked me about the difference between the redo log file header, the redo block header, and the redo record header. The three are indeed easy to confuse, so here is a brief discussion of each; the distinction is important.
The analysis below works in hexadecimal, with the decimal dump format interleaved here and there.
Part 1: the redo log file header

The redo log file header is the header of a redo log file. It actually comprises two blocks: a file header block and a redo log header block. If that sounds vague, look at the figure below:

[figure: the two blocks at the head of a redo log file: the file header block and the redo log header block]

As the figure shows, the first two blocks of a redo log file are the file header block and the redo log header block. Below is a rough analysis of what each block stores.
The contents of the first block are shown below (opened in UltraEdit as a hex view):

[figure: hex dump of the file header block, opened in UltraEdit (log1)]

The same header in the decimal dump format:

DUMP OF REDO FROM FILE '/oradata/ora10g/lu10g/redo03.log'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 169870080=0xa200300
	Db ID=2561616225=0x98af2961, Db Name='LU10G'
	Activation ID=2561579617=0x98ae9a61
	Control Seq=564=0x234, File size=102400=0x19000
	File Number=3, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000000009, SCN 0x00000099c24d-0x0000009a40c4"
 thread: 1 nab: 0x16910 seq: 0x00000009 hws: 0x4 eot: 0 dis: 0
 resetlogs count: 0x31bca923 scn: 0x0000.000716f7 (464631)
 resetlogs terminal rcv count: 0x0 scn: 0x0000.00000000
 prev resetlogs count: 0x268ea86b scn: 0x0000.00000001 (1)
 prev resetlogs terminal rcv count: 0x0 scn: 0x0000.00000000
 Low  scn: 0x0000.0099c24d (10076749) 04/15/2014 12:08:04
 Next scn: 0x0000.009a40c4 (10109124) 04/16/2014 06:56:19
 Enabled scn: 0x0000.000716f7 (464631) 12/17/2013 23:00:51
 Thread closed scn: 0x0000.0099c24d (10076749) 04/15/2014 12:08:04
 Disk cksum: 0xeb7d Calc cksum: 0xeb7d
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000
 Largest LWN: 2134 blocks
 End-of-redo stream : No
 Unprotected mode
 Miscellaneous flags: 0x0
 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000

In the UltraEdit hex view each row is 16 bytes and there are 32 rows in all; this is the first block, the file header block. It contains limited information, analyzed piece by piece below.
The first two bytes are 0x22. These two bytes are Oracle's file type signature: a data file begins with 0xA2 here, while 0x22 identifies a redo log file. The rest of the first row carries nothing meaningful, though the 0xFFC0 in it appears to relate to the base portion of the SCN. Looking at the 16 bytes of the second row: bytes 21 and 22 hold 0x0200, which is 512 in decimal, the block size. Bytes 25 through 28 hold 0x00019000, which is 102400 in decimal; this is the number of redo blocks, i.e. the blocks in this file, which can be read as the file's size. The 0x7A7B7C7D that follows is the file magic number, a marker Oracle uses to quickly recognize its own files. And that is all the information in the first block.
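
The block size and block count decoded above can be cross-checked from inside the database. A minimal sketch, assuming SYS access and my reading of the x$kccle fixed table (lebsz = log block size, lesiz = file size in blocks):

-- Cross-check the decoded header values against the controlfile's log entries.
select lenum log#, lebsz block_size, lesiz blocks,
       lebsz * lesiz / 1024 / 1024 size_mb
  from x$kccle;

For the file dumped above this should show 512-byte blocks and 102400 blocks, i.e. a 50 MB log file.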

Next comes the second block, the redo log header block, whose contents start at row 33 of the hex view:
[figure: hex dump of the redo log header block (log2)]

The rest will have to wait for a later update; this is rather time-consuming.

How to Dump Redo Log File Information

The following article is excerpted from MOS note id 28989.1, essential reading for the redo-parsing part of this series; it is reproduced here untranslated.
====================================================================
You are working with Oracle Technical Support. As part of the diagnostic
process, you have been asked to take a dump of the redo log files. The
information in the logs is often used to help diagnose corruption issues.

The following commands will be used in this process:

1. The 'alter session' command is used to dump redo headers.

2. Use the 'alter system dump logfile' to dump log file contents.

This command requires 'ALTER SYSTEM' system privilege. The database can be in
mount, nomount or open state when the command is issued. An online log file
or an archived log file can be dumped. It is even possible to dump a
file from another database, as long as the operating systems are the same.

Output from the command is put into the session's trace file.

The following ways of dumping a redo log file are covered:

1. To dump records based on DBA (Data Block Address)
2. To dump records based on RBA (Redo Block Address)
3. To dump records based on SCN
4. To dump records based on time
5. To dump records based on layer and opcode
6. Dump the file header information
7. Dump an entire log file:

1. To dump records based on DBA (Data Block Address)
————————————————–

This will dump all redo records for the range of data
blocks specified for a given file # and block # range.

From sqlplus (sqldba or svrmgr for older versions), issue the following command:

ALTER SYSTEM DUMP LOGFILE 'filename'
DBA MIN fileno . blockno
DBA MAX fileno . blockno;

Example:
========
ALTER SYSTEM DUMP LOGFILE 'u01/oracle/V7323/dbs/arch1_76.dbf'
DBA MIN 5 . 31125
DBA MAX 5 . 31150;

This will cause all the changes to the specified range of data blocks to be
dumped to the trace file. In the example given, all redo records for file #5,
blocks 31125 thru 31150 are dumped.

Note
====
For 10g:
ALTER SYSTEM DUMP LOGFILE 'u01/oracle/V7323/dbs/arch1_76.dbf'
DBA MIN 5 . 31125 DBA MAX 5 . 31150;

will raise:
ORA-01963: Must specify a block number

In 10g we need to skip the dot '.' while doing the redo dumps:
ALTER SYSTEM DUMP LOGFILE 'u01/oracle/V7323/dbs/arch1_76.dbf'
DBA MIN 5 31125 DBA MAX 5 31150;

@ As per Bug 3536989

2. To dump records based on RBA (Redo Block Address)
————————————————-

This will dump all redo records for the range of redo
addresses specified for the given sequence number and block number.

Syntax:
ALTER SYSTEM DUMP LOGFILE 'filename'
RBA MIN seqno . blockno
RBA MAX seqno . blockno;

Example:
ALTER SYSTEM DUMP LOGFILE 'u01/oracle/V7323/dbs/arch1_76.dbf'
RBA MIN 2050 . 13255
RBA MAX 2255 . 15555;

3. To dump records based on SCN
—————————-

Using this option will cause redo records owning changes within the SCN range
specified to be dumped to the trace file.

ALTER SYSTEM DUMP LOGFILE 'filename'
SCN MIN minscn
SCN MAX maxscn;

Example:
ALTER SYSTEM DUMP LOGFILE 'u01/oracle/V7323/dbs/arch1_76.dbf'
SCN MIN 103243
SCN MAX 103294;

If the purpose is to check the dumpfile you can rather do the following:
SQL> ALTER SYSTEM DUMP LOGFILE 'filename' SCN MIN 1 SCN MAX 1;

If the above completes successfully it ensures no issues with the archivelog.

4. To dump records based on time.
——————————

Using this option will cause redo records created within the time range
specified to be dumped to the trace file.

From sqlplus (sqldba or svrmgr for older versions), issue the following command:

ALTER SYSTEM DUMP LOGFILE 'filename'
TIME MIN value
TIME MAX value;

Example:
========
ALTER SYSTEM DUMP LOGFILE 'u01/oracle/V7323/dbs/arch1_76.dbf'
TIME MIN 299425687
TIME MAX 299458800;

Please Note: the time value is given in REDO DUMP TIME

@See Note 34026.1 for SQL script on converting date/time to REDO DUMP time.
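
For reference, the conversion that Note 34026.1 refers to packs the date into one number using a 1988 epoch and 31-day months; the sketch below is my transcription of that formula, so verify it against the note before relying on it:

-- Convert a date to REDO DUMP TIME (assumed formula: 1988 epoch, 31-day months).
select (((((to_number(to_char(d, 'YYYY')) - 1988) * 12
        + to_number(to_char(d, 'MM')) - 1) * 31
        + to_number(to_char(d, 'DD')) - 1) * 24
        + to_number(to_char(d, 'HH24'))) * 60
        + to_number(to_char(d, 'MI'))) * 60
        + to_number(to_char(d, 'SS')) as redo_dump_time
  from (select sysdate d from dual);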

5. To dump records based on layer and opcode.
——————————————

LAYER and OPCODE are used to dump all log records for a particular type of
redo record, such as all dropped row pieces.

From sqlplus (sqldba or svrmgr for older versions), issue the following command:

ALTER SYSTEM DUMP LOGFILE 'filename'
LAYER value
OPCODE value;

Example:
========
ALTER SYSTEM DUMP LOGFILE 'u01/oracle/V7323/dbs/arch1_76.dbf'
LAYER 11
OPCODE 3;

(Layer 11 covers row-level operations and opcode 3 is a delete row piece, so this example dumps all dropped row pieces.)

@Note: See Note 29733.1 for the list of LAYER and OPCODE values and their
@meaning.

6. Dump the file header information:
———————————

This will dump file header information for every
online redo log file.

From sqlplus (sqldba or svrmgr for older versions), issue the following command:

alter session set events 'immediate trace name redohdr level 10';

For dumping the archivelog header, issue the following command:

ALTER SYSTEM DUMP LOGFILE ‘filename’ RBA MIN 1 1 RBA MAX 1 1;

7. Dump an entire log file:
————————

From sqlplus (sqldba or svrmgr for older versions), issue the following command:

ALTER SYSTEM DUMP LOGFILE 'filename';

Please note:
Fully qualify the filename, and include the single quotes.

Example:
========
ALTER SYSTEM DUMP LOGFILE 'u01/oracle/V7323/dbs/arch1_76.dbf';

The dump of the logfile will be written into a trace file in the udump destination.
Use the command 'show parameters dump' within an sqlplus session.
The output will show the location for the udump destination where
the trace file exists.

@ References:
@ ===========
@ Note 28989.1

4. Parsing redo: change vectors

Disclaimer: I am writing this series purely as personal-interest research; it represents only my own views, and corrections are welcome.
Before getting to change vectors, one more pass at distinguishing the redo log file header, the redo block header, and the redo record header.

As the name suggests, a change vector describes a change: apart from its header, what a redo record stores is change vectors. It