恢复被删除(损坏、丢失)的goldengate trailfile
1.1恢复目标被删除(损坏、丢失)的trailfile
原理:根据目标端trailfile最后应用的scn来去源端找到准确位置后重传数据
测试方法:手动删除目标端的最后一个trail file后进行测试(并不是完全恢复这个文件,只是将删除文件中未应用的数据找出,产生包含丢失数据的文件而已)
关于trailfile存在源、目标方式:
1、源端、目标拥有独立的trailfile.extract抽取后经过datapump传输到目标端
2、源端、目标共享trailfile.extract抽取被replicat进程直接读取.这种不需要配置datapump,这种情形使用不如第一种情形多.
测试场景:
采用第一种,源端和目标端都是拥有独立的trailfile.是否想过源、目的trailfile文件大小或文件个数能否对应上,恢复直接从源端复制过来就可以使用?
具体如下.
Ex开头是源端,dp是目标端.初步对比来看个数以及文件大小都存在差异,初步设想不成功.
操作步骤:
1、先停止目标端的replicat进程.
2、源端:对一个空表插入100条数据.然后手动rename目标端未应用的trailfile一个.
验证源端数据:
SQL> select count(*) from leo_rows; COUNT(*) ---------- 100
验证目标端数据:
SQL> select count(*) from leo_rows; COUNT(*) ---------- 0
通过这个信息对比,发现2边数据不同步.
3、rename目标端未应用的trailfile。例如dp000014变成dp000014.old
mv dp000014 dp000014.old
4、根据目标端分析上次成功checkpoint信息位置找出目标端实际应用位置.
从checkpoint信息中查出实际应用到位置是extseqno=14、extrba=8751283.
但是此时dp000014文件已经丢失(只是rename而已).10g以后版本trailfile记录csn信息,不需要通过@genenv(“ORATRANSACTION”,”SCN”)强制写个token到trailfile.SCN是指Oracle transaction id的对应数值.可以通过Oracle v$transaction视图查询.查看进程的checkpoint里面记录了
CSN stateinformation: CRC: 55-A2-8E-E4 Latest CSN: 1963641—这个就是对应Oraclescn. Latest TXN: 10.23.846 –这个是transactionid Latest CSN of finished TXNs: 1963641:这里面提示已经完成.不如此事务是不完整的,需要从重新同步这个事物. Completed TXNs: 10.23.846 GGSCI (ludatou) 4> inforepleo showch REPLICAT REPLEO Last Started 2014-04-22 20:27 Status STOPPED Checkpoint Lag 00:00:00 (updated 00:09:17 ago) Log Read Checkpoint File ./dirdat/dp000014 2014-04-22 20:32:05.801351 RBA 8751283 Current Checkpoint Detail: Read Checkpoint #1 GGS Log Trail Startup Checkpoint (starting position in thedata source): Sequence #: 14 RBA: 8748614 Timestamp: 2014-04-22 13:01:47.962664 Extract Trail: ./dirdat/dp Current Checkpoint (position of last recordread in the data source): Sequence #: 14 RBA: 8751283 Timestamp: 2014-04-22 20:32:05.801351 Extract Trail: ./dirdat/dp CSN state information: CRC: 55-A2-8E-E4 Latest CSN: 1963641 Latest TXN: 10.23.846 Latest CSN of finished TXNs: 1963641 Completed TXNs: 10.23.846 Header: Version = 2 Record Source = A Type = 1 # Input Checkpoints = 1 # Output Checkpoints = 0 File Information: Block Size = 2048 Max Blocks = 100 Record Length = 2048 Current Offset = 0 Configuration: Data Source = 0 Transaction Integrity = -1 Task Type = 0 Database Checkpoint: Checkpoint table = goldengate.ogg_checkpoint Key = 296275922 (0x11a8cfd2) Create Time = 2014-04-22 07:56:50 Status: Start Time = 2014-04-22 20:27:58 Last Update Time = 2014-04-22 20:37:41 Stop Status = G Last Result = 400
从上述看已经最后被应用的事务信息包括如下:
1、 csn=1963641
2、 TXN=10.23.846
根据如上信息登陆源端系统找到对应trailfile的rba信息.然后根据这个位置,重新修改datapump从这个位置传输trailfile到目标端.
开始恢复数据:
登陆源端主机:
[ludatou ~]$ cd $OGG [ludatou112101]$cd dirdat [ludatoudirdat]$ ls -lrt total 126000 -rw-rw-rw- 1oracle oinstall 15198 Apr 27 12:24ex000000 -rw-rw-rw- 1oracle oinstall 9999850 Apr 27 12:28 ex000001 -rw-rw-rw- 1oracle oinstall 9999928 Apr 27 12:28 ex000002 -rw-rw-rw- 1oracle oinstall 9999824 Apr 27 12:33 ex000003 -rw-rw-rw- 1oracle oinstall 9999838 Apr 27 12:33 ex000005 -rw-rw-rw- 1oracle oinstall 9999495 Apr 27 12:33 ex000004 -rw-rw-rw- 1oracle oinstall 9999904 Apr 27 12:33 ex000007 -rw-rw-rw- 1oracle oinstall 9999928 Apr 27 12:33 ex000006 -rw-rw-rw- 1oracle oinstall 9999863 Apr 27 12:41 ex000010 -rw-rw-rw- 1oracle oinstall 9999714 Apr 27 12:41 ex000009 -rw-rw-rw- 1oracle oinstall 9999792 Apr 27 12:41 ex000008 -rw-rw-rw- 1oracle oinstall 9999867 Apr 27 12:44 ex000012 -rw-rw-rw- 1oracle oinstall 9999804 Apr 27 12:44 ex000011 -rw-rw-rw- 1oracle oinstall 8762807 Apr 27 21:01 ex000013
从根据目标段的时间来源端寻找时机最靠近的trailfile.即ex000013是第一选择,如果对应csn信息不在000013,则相前反向推找.
[ludatou dirdat]$ cd $OGG [ludatou 112101]$./logdump –使用logdump对trailfile进行操作 Logdump 236>open ./dirdat/ex000013 Current LogTrailis /goldengate/112101/dirdat/ex000013 Logdump 237>ghdr on Logdump 238>detail data Logdump 239>usertoken detail Logdump 240>ggstoken detail Logdump 241>filter include csn = 1963641 --来自目标端的csn Logdump 242>n --查看满足条件的数据 Scanned 10000 records, RBA 3160036, 2014/04/27 12:44:51.000.000 Scanned 20000 records, RBA 6320505, 2014/04/27 12:44:51.000.000 ___________________________________________________________________ Hdr-Ind : E (x45) Partition : . (x00) UndoFlag : . (x00) BeforeAfter: A (x41) RecLength : 1204 (x04b4) IO Time : 2014/04/27 20:53:21.000.000 IOType : 160 (xa0) OrigNode : 0 (x00) TransInd : . (x03) FormatType : R (x52) SyskeyLen : 0 (x00) Incomplete : . (x00) AuditRBA : 0 AuditPos : 0 Continued : N (x00) RecCount : 1 (x01) 2014/04/2720:53:21.000.000 DDLOP Len 1204 RBA 8749251 Name: After Image: Partition 0 G s 2c43 353d 2732 3533 3127 2c2c 4237 3d27 32353331 | ,C5='2531',,B7='2531 272c 2c42 323d 2727 2c2c 4233 3d27 4c45 4f272c2c | ',,B2='',,B3='LEO',, 4234 3d27 4c45 4f5f 524f 5753 272c 2c43 31323d27 | B4='LEO_ROWS',,C12=' 272c 2c43 3133 3d27 272c 2c42 353d 2754 41424c45 | ',,C13='',,B5='TABLE 272c 2c42 363d 2743 5245 4154 4527 2c2c 42383d27 | ',,B6='CREATE',,B8=' 474f 4c44 454e 4741 5445 2e47 4753 5f44 444c5f48 | GOLDENGATE.GGS_DDL_H 4953 5427 2c2c 4239 3d27 4c45 4f27 2c2c 43373d27 | IST',,B9='LEO',,C7=' GGS tokens: TokenID x52 'R'ORAROWID Info x00 Length 20 4141 4156 5849 4141 4541 4141 4146 6741 41460001 | AAAVXIAAEAAAAFgAAF.. TokenID x44 'D'DDL Info x00 Length 36 4c45 4f00 4c45 4f5f 524f 5753 0031 312e 322e302e | LEO.LEO_ROWS.11.2.0. 342e 3000 3131 2e32 2e30 2e34 2e30 004e | 4.0.11.2.0.4.0.N TokenID x4c 'L'LOGCSN Info x00 Length 7 3139 3633 3634 31 | 1963641 –与目标csn符合 TokenID x36 '6'TRANID Info x00 Length 9 3130 2e32 332e 3834 36 | 10.23.846 –与目标tranid符合 Filteringsuppressed 27661 records
根据上述信息已经找到目标端应用的数据信息.
Datapump需要从满足条件下一条记录开始传输.所以继续输入N捕获下一条数据信息的rba
但是输入N却发现没有任何数据返回,是到达文件结尾了吗?根据判断这个属于中间数据.经过分析发现,原来是filter include csn = 1963641搞的鬼.现在清楚这个过滤条件.根据下面输出可以获得正确rba信息是:
Logdump 243 >n Filteringsuppressed 100 records Logdump 244>filter clear Logdump 246>pos 8749251 –来自上面的rba信息,是目标端已经应用的数据. Reading forwardfrom RBA 8749251 Logdump 247>n –表示输出这个数据的详细信息.下n才是我们需要的数据 ___________________________________________________________________ Hdr-Ind : E (x45) Partition : . (x00) UndoFlag : . (x00) BeforeAfter: A (x41) RecLength : 1204 (x04b4) IO Time : 2014/04/27 20:53:21.000.000 IOType : 160 (xa0) OrigNode : 0 (x00) TransInd : . (x03) FormatType : R (x52) SyskeyLen : 0 (x00) Incomplete : . (x00) AuditRBA : 0 AuditPos : 0 Continued : N (x00) RecCount : 1 (x01) 2014/04/2720:53:21.000.000 DDLOP Len 1204 RBA 8749251 Name: After Image: Partition 0 G s 2c43 353d 2732 3533 3127 2c2c 4237 3d27 32353331 | ,C5='2531',,B7='2531 272c 2c42 323d 2727 2c2c 4233 3d27 4c45 4f272c2c | ',,B2='',,B3='LEO',, 4234 3d27 4c45 4f5f 524f 5753 272c 2c43 31323d27 | B4='LEO_ROWS',,C12=' 272c 2c43 3133 3d27 272c 2c42 353d 2754 41424c45 | ',,C13='',,B5='TABLE 272c 2c42 363d 2743 5245 4154 4527 2c2c 4238 3d27| ',,B6='CREATE',,B8=' 474f 4c44 454e 4741 5445 2e47 4753 5f44 444c5f48 | GOLDENGATE.GGS_DDL_H 4953 5427 2c2c 4239 3d27 4c45 4f27 2c2c 43373d27 | IST',,B9='LEO',,C7=' GGS tokens: TokenID x52 'R'ORAROWID Info x00 Length 20 4141 4156 5849 4141 4541 4141 4146 6741 41460001 | AAAVXIAAEAAAAFgAAF.. TokenID x44 'D'DDL Info x00 Length 36 4c45 4f00 4c45 4f5f 524f 5753 0031 312e 322e302e | LEO.LEO_ROWS.11.2.0. 342e 3000 3131 2e32 2e30 2e34 2e30 004e | 4.0.11.2.0.4.0.N TokenID x4c 'L'LOGCSN Info x00 Length 7 3139 3633 3634 31 | 1963641 TokenID x36 '6'TRANID Info x00 Length 9 3130 2e32 332e 3834 36 | 10.23.846 Logdump 248>n --这里是正确的rba信息,同时判断这个条记录是否开始位置.根据transind后面数字x00表示是开始位置,同时也确认我们找的rba是正确的.否则就是错误的. TransInd : . (x00) 表示记录开始位置 TransInd : . (x01) 表示记录中间位置 TransInd : . (x02) 表示记录结尾位置 TransInd : . (x03) 表示这个事务只有1条记录 ___________________________________________________________________ Hdr-Ind : E (x45) Partition : . (x04) UndoFlag : . (x00) BeforeAfter: A (x41) RecLength : 31 (x001f) IO Time : 2014/04/27 21:01:11.000.000 IOType : 5 (x05) OrigNode : 255 (xff) TransInd : . (x00) FormatType : R (x52) SyskeyLen : 0 (x00) Incomplete : . (x00) AuditRBA : 161 AuditPos : 11343376 Continued : N (x00) RecCount : 1 (x01) 2014/04/2721:01:11.000.000 Insert Len 31 RBA 8750598 NameEO.LEO_ROWS After Image: Partition 4 G b 0000 0005 0000 0001 3100 0100 0700 0000 036c656f | ........1........leo 0002 0007 0000 0003 3130 30 | ........100 Column 0 (x0000), Len 5 (x0005) 0000 0001 31 | ....1 Column 1 (x0001), Len 7 (x0007) 0000 0003 6c65 6f | ....leo Column 2(x0002), Len 7 (x0007) 0000 0003 3130 30 | ....100 GGS tokens: TokenID x52 'R'ORAROWID Info x00 Length 20 4141 4156 5976 4141 4541 4141 4146 3141 41410001 | AAAVYvAAEAAAAF1AAA.. TokenID x4c 'L'LOGCSN Info x00 Length 7 3139 3635 3636 33 | 1965663 TokenID x36 '6'TRANID Info x00 Length 8 382e 342e 3130 3638 | 8.4.1068
停止源端的pump,修改配置
Logdump 249>exit [ludatou112101]$ ./ggsci OracleGoldenGate Command Interpreter for Oracle Version11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO Linux, x64,64bit (optimized), Oracle 11g on Apr 23 2012 08:32:14 Copyright (C)1995, 2012, Oracle and/or its affiliates. All rights reserved. GGSCI (ludatou)1> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING EXTRACT RUNNING DPLEO 00:00:00 00:00:06 EXTRACT RUNNING EXTLEO 00:00:03 00:00:02 GGSCI (cifpay02)2> stop DPLEO Sending STOPrequest to EXTRACT DPLEO ... Requestprocessed. GGSCI (ludatou)3> alter extract dpleo,extrba 8750598 EXTRACT altered. GGSCI (cifpay02)4> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING EXTRACT STOPPED DPLEO 00:00:00 00:00:01 EXTRACT RUNNING EXTLEO 00:00:02 00:00:07 GGSCI (cifpay02)5> start dpleo Sending STARTrequest to MANAGER ... EXTRACT DPLEOstarting GGSCI (ludatou)14> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING EXTRACT ABENDED DPLEO 00:00:00 00:02:38 EXTRACT RUNNING EXTLEO 00:00:02 00:00:07
从上面看到dumpabend,主要目标端dp000014被手动删除,导致dump传输数据无法lock trailfile,当目标端达到设置大小时候才会切换到下一个或者我们手动切换到下一个.下面手动切换dump从新文件开始写,即dp000015.其实dump指定写目标端extseqno.
GGSCI (cifpay02)15> alter DPLEO,etrollover 2014-04-2803:43:52 INFO OGG-01520 Rollover performed. For eachaffected output trail of Version 10 or higher format, after starting the sourceextract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT orREPLICAT) to move the reader's scan to the new trail file; it will not happen automatically. EXTRACT altered.
目标端确认配置
GGSCI (ludatou)1> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT STOPPED REPLEO 00:00:00 06:56:12
因为dp000014已经不存在了,修改从dp000015开始读取.
GGSCI (cifpay01)2> shell ls -lrt ./dirdat/* -rw-rw-rw- 1oracle oinstall 13135 May 2 10:08 ./dirdat/dp000000 -rw-rw-rw- 1oracle oinstall 3120 May 2 12:03 ./dirdat/dp000001 -rw-rw-rw- 1oracle oinstall 9999708 May 2 12:07./dirdat/dp000002 -rw-rw-rw- 1oracle oinstall 9999967 May 2 12:07./dirdat/dp000003 -rw-rw-rw- 1oracle oinstall 9999491 May 2 12:13./dirdat/dp000005 -rw-rw-rw- 1oracle oinstall 9999677 May 2 12:13./dirdat/dp000004 -rw-rw-rw- 1oracle oinstall 9999879 May 2 12:13./dirdat/dp000006 -rw-rw-rw- 1oracle oinstall 9999975 May 2 12:13./dirdat/dp000007 -rw-rw-rw- 1oracle oinstall 9999943 May 2 12:13./dirdat/dp000008 -rw-rw-rw- 1oracle oinstall 9999749 May 2 12:21./dirdat/dp000009 -rw-rw-rw- 1oracle oinstall 9999908 May 2 12:21./dirdat/dp000011 -rw-rw-rw- 1oracle oinstall 9999743 May 2 12:21./dirdat/dp000010 -rw-rw-rw- 1oracle oinstall 9999841 May 2 12:24./dirdat/dp000012 -rw-rw-rw- 1oracle oinstall 9999908 May 2 12:24./dirdat/dp000013 -rw-rw-rw- 1 oracleoinstall 8763492 May 2 20:40./dirdat/dp000014.old -rw-rw-rw- 1oracle oinstall 13269 May 3 03:23 ./dirdat/dp000015 GGSCI (ludatou)4> alter REPLEO,extseqno 15,extrba 0 REPLICATaltered. GGSCI (ludatou)5> start repleo Sending STARTrequest to MANAGER ... REPLICAT REPLEOstarting GGSCI (ludatou)6> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT RUNNING REPLEO 00:00:00 00:00:09
目标端验证数据是否同步
经过查看源端leo_rows表中100条数据已经成功同步到目标端.由此可以确认恢复被删除的trailfile成功.
GGSCI (ludatou)8> stats REPLEO Sending STATSrequest to REPLICAT REPLEO ... Start ofStatistics at 2014-05-03 03:36:13. Replicating fromLEO.LEO_ROWS to LEO.LEO_ROWS: *** Totalstatistics since 2014-05-03 03:35:23 *** Total inserts 100.00 Total updates 0.00 Total deletes 0.00 Total discards 0.00 Total operations 100.00 *** Dailystatistics since 2014-05-03 03:35:23 *** Total inserts 100.00 Total updates 0.00 Total deletes 0.00 Total discards 0.00 Total operations 100.00 *** Hourlystatistics since 2014-05-03 03:35:23 *** Total inserts 100.00 Total updates 0.00 Total deletes 0.00 Total discards 0.00 Total operations 100.00 *** Lateststatistics since 2014-05-03 03:35:23 *** Total inserts 100.00 Total updates 0.00 Total deletes 0.00 Total discards 0.00 Total operations 100.00 End ofStatistics.
至此目标端被删除的trailfile中数据已经恢复成功.