
An Introduction to Automatic PGA Management

PGA stands for Process Global Area and is often also called the Program Global Area.
In Oracle8i, the DBA had to tune the PGA by adjusting parameters such as
SORT_AREA_SIZE, HASH_AREA_SIZE,
BITMAP_MERGE_AREA_SIZE, and CREATE_BITMAP_AREA_SIZE, etc.
Starting with Oracle9i, automatic PGA management is available and only the PGA_AGGREGATE_TARGET parameter needs to be set.
The value of PGA_AGGREGATE_TARGET can be changed dynamically while the instance is running.

Automatic SQL execution memory management is enabled by setting WORKAREA_SIZE_POLICY to AUTO and specifying a value for PGA_AGGREGATE_TARGET. Both parameters can be set dynamically.
Starting with 9i Release 2, the PGA Advisory can be used to choose a more appropriate PGA size.
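
For example, automatic management could be enabled along these lines (a minimal sketch; the 2G target is purely illustrative):

SQL> ALTER SYSTEM SET workarea_size_policy = AUTO;
SQL> ALTER SYSTEM SET pga_aggregate_target = 2G;
SQL> show parameter pga_aggregate_target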

The PGA_AGGREGATE_TARGET parameter does not set a hard limit on PGA size; areas such as PL/SQL work memory are not affected by it.
The PGA is divided into the work areas and the memory outside the work areas.
A work area is memory allocated inside the database, for a session executing SQL, to hold intermediate results.
For example, the sort area is memory used for sorting and the hash area is memory used for hash joins. This part is directly tied to SQL execution and affects its efficiency; for instance, a larger hash area makes a hash join faster.
In practice, pga_aggregate_target only limits the size of the work areas. When the work areas hit the pga_aggregate_target limit and more memory still has to be allocated, an ORA-4030 error can be raised.
The hidden parameter _pga_max_size only limits the work area size used by a single process; it is a smaller limit that sits within the pga_aggregate_target limit.
Memory outside the work areas is not limited by pga_aggregate_target or _pga_max_size.
That is why you often see the PGA grow beyond pga_aggregate_target, usually because PL/SQL variables and collections have been loaded with huge amounts of data.
A typical example is BULK COLLECT; tools such as imp/exp and SQL*Loader can cause similar problems.

In 9iR2, PGA_AGGREGATE_TARGET only applies to dedicated server connections.
In 10g, PGA_AGGREGATE_TARGET applies to both dedicated and shared server connections.
In 11g, when AMM is used, setting PGA_AGGREGATE_TARGET makes AMM keep the PGA no lower than that value.
There is no database parameter that limits the size of a single process; it can only be limited through OS kernel limits or user shell limits.
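
For example, the shell limits in effect for the oracle OS user can be inspected as follows (illustrative; which limits to raise is platform specific):

$ su - oracle
$ ulimit -a     ## list all shell limits for the current user
$ ulimit -v     ## maximum virtual memory (address space) in KB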

Reference Oracle MOS documents:

Automatic PGA Memory Management (Doc ID 223730.1)

FAQ: ORA-4030 [Video] (Doc ID 399497.1)
Keep in mind that the PGA_AGGREGATE_TARGET does not limit the amount of PGA memory usage. It is only a target and is used to dynamically size the process work areas. It does not affect other areas of the PGA that are allowed to grow beyond this limit.

There are certain areas of PGA that cannot be controlled by initialization parameters. Such areas include PL/SQL memory collections such as PL/SQL tables and VARRAYs, and local PL/SQL variables. Depending on the programming code and amount of data being handled these areas can grow very large (up to 20G internal limit on 10) and can consume large amounts of memory. This memory growth can be controlled by good programming practices. As an example, use LIMIT clause with BULK COLLECT.
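
As an illustration of the LIMIT clause mentioned above, a PL/SQL loop might fetch in bounded batches instead of loading the entire result set into a collection at once (a sketch; the table name and batch size are hypothetical):

DECLARE
  CURSOR c IS SELECT * FROM big_table;            -- big_table is a hypothetical table
  TYPE t_rows IS TABLE OF big_table%ROWTYPE;
  l_rows t_rows;
BEGIN
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO l_rows LIMIT 1000;  -- bounds the PGA used per fetch
    EXIT WHEN l_rows.COUNT = 0;
    -- process l_rows here
  END LOOP;
  CLOSE c;
END;
/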

PGA-related query scripts

## Check PGA growth between AWR snapshots
SQL> set linesize 180
SQL> SELECT SNAP_ID,INSTANCE_NUMBER,name,value FROM DBA_HIST_PGASTAT where name in('max processes count','total PGA allocated') and snap_id between 24943 and 24946;

SNAP_ID INSTANCE_NUMBER NAME VALUE
———- ————— —————————— ————————
24943 1 max processes count 126
24943 1 total PGA allocated 608,856,064
24944 1 max processes count 126
24944 1 total PGA allocated 630,258,688
24945 1 max processes count 126
24945 1 total PGA allocated 678,053,888
24946 1 max processes count 126
24946 1 total PGA allocated 618,394,624

8 rows selected.

SELECT a.SNAP_ID,a.INSTANCE_NUMBER,a.name,a.value,b.BEGIN_INTERVAL_TIME,b.END_INTERVAL_TIME FROM DBA_HIST_PGASTAT a,DBA_HIST_SNAPSHOT b
where a.name in('max processes count','total PGA allocated') and a.snap_id=b.snap_id and b.BEGIN_INTERVAL_TIME>sysdate-1;

#####################################################
2. Sizing and tuning the PGA
1. PGA sizing rules of thumb:
– For OLTP systems: PGA_AGGREGATE_TARGET = (physical memory * 80%) * 20%, i.e. about 16% of physical memory
– For DSS systems: PGA_AGGREGATE_TARGET = (physical memory * 80%) * 50%, i.e. about 40% of physical memory
For example, on a server with 64 GB of RAM this suggests roughly 10 GB for OLTP and about 26 GB for DSS.

2. Check PGA work area efficiency --
select
trunc (
(sum(case when name like 'workarea executions - optimal'
then value else 0 end) *100) /
(
sum(case when name like 'workarea executions - optimal'
then value else 0 end) +
sum(case when name like 'workarea executions - onepass'
then value else 0 end) +
sum(case when name like 'workarea executions - multipass'
then value else 0 end)
)
) optimal_percent
from v$sysstat
where name like 'workarea executions - %'
/

workarea memory allocated: total PGA allocated to work areas, in KB
workarea executions – optimal: executions that completed entirely in memory (the best case)
workarea executions – onepass: executions that had to spill to disk once (a single, unavoidable pass)
workarea executions – multipass: executions that spilled to disk multiple times; these should be avoided

3. Instance-level PGA usage statistics
SELECT * FROM V$PGASTAT;
In Oracle 9iR2, 10g and 11g, the PGA cache hit ratio is computed as:
PGA Cache Hit Ratio = (total bytes processed * 100) / (total bytes processed + total extra bytes read/written)
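
The same ratio is exposed directly in V$PGASTAT, so it can be read or recomputed with queries along these lines (a sketch):

SELECT value FROM v$pgastat WHERE name = 'cache hit percentage';

SELECT round(bp * 100 / (bp + eb), 2) pga_cache_hit_pct
FROM (SELECT sum(decode(name, 'total bytes processed', value, 0)) bp,
             sum(decode(name, 'extra bytes read/written', value, 0)) eb
        FROM v$pgastat);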

4. Check the distribution and usage of work-area size buckets in the PGA (fragmentation)
SELECT LOW_OPTIMAL_SIZE/1024 low_kb,(HIGH_OPTIMAL_SIZE+1)/1024 high_kb,
optimal_executions, onepass_executions, multipasses_executions
FROM v$sql_workarea_histogram
WHERE total_executions != 0;

SELECT optimal_count, round(optimal_count*100/total, 2) optimal_perc,
onepass_count, round(onepass_count*100/total, 2) onepass_perc,
multipass_count, round(multipass_count*100/total, 2) multipass_perc
FROM
(SELECT decode(sum(total_executions), 0, 1, sum(total_executions)) total,
sum(OPTIMAL_EXECUTIONS) optimal_count,
sum(ONEPASS_EXECUTIONS) onepass_count,
sum(MULTIPASSES_EXECUTIONS) multipass_count
FROM v$sql_workarea_histogram
WHERE low_optimal_size > 64*1024); -- for a 64 KB optimal size

5. Check which operations are currently using PGA work areas and whether they spill to temporary segments
SELECT to_number(decode(SID, 65535, NULL, SID)) sid,
operation_type OPERATION,trunc(EXPECTED_SIZE/1024) ESIZE,
trunc(ACTUAL_MEM_USED/1024) MEM, trunc(MAX_MEM_USED/1024) “MAX MEM”,
NUMBER_PASSES PASS, trunc(TEMPSEG_SIZE/1024) TSIZE
FROM V$SQL_WORKAREA_ACTIVE
ORDER BY 1,2;

6. Using the v$pga_target_advice view
Prerequisites:
– PGA_AGGREGATE_TARGET must be set;
– STATISTICS_LEVEL must be TYPICAL (the default) or ALL.
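
The prerequisites can be checked, and adjusted if needed, along these lines (illustrative):

SQL> show parameter pga_aggregate_target
SQL> show parameter statistics_level
SQL> ALTER SYSTEM SET statistics_level = TYPICAL;   -- only needed if it was set to BASIC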

SELECT round(PGA_TARGET_FOR_ESTIMATE/1024/1024) target_mb,
ESTD_PGA_CACHE_HIT_PERCENTAGE cache_hit_perc,
ESTD_OVERALLOC_COUNT
FROM v$pga_target_advice;

#######################################
1.
Top 10 work areas
Finding the top ten work areas requiring the most cache memory:

select *
from
(select workarea_address, operation_type, policy, estimated_optimal_size
from v$sql_workarea
order by estimated_optimal_size DESC)
where ROWNUM <=10;

2. Percentage of work areas using maximum (optimal) memory
Finding the percentage of work areas using maximum memory:

select operation_type, total_executions * 100 / optimal_executions "%cache"
From v$sql_workarea
Where policy='AUTO'
And optimal_executions > 0
Order By operation_type;

3. The top ten biggest work areas currently allocated in the system
Finding the top ten biggest work areas currently allocated in the system:

select c.sql_text, w.operation_type, top_ten.wasize
From (Select *
From (Select workarea_address, actual_mem_used wasize
from v$sql_workarea_active
Order by actual_mem_used)
Where ROWNUM <=10) top_ten,
v$sql_workarea w,
v$sql c
Where w.workarea_address=top_ten.workarea_address
And c.address=w.address
And c.child_number = w.child_number
And c.hash_value=w.hash_value;

4. Finding the percentage of memory that is over and under allocated:

select total_used,
under*100/(total_used+1) percent_under_use,
over*100/(total_used+1) percent_over_used
From
( Select
sum(case when expected_size > actual_mem_used
then actual_mem_used else 0 end) under,
sum(case when expected_size < actual_mem_used
then actual_mem_used else 0 end) over,
sum(actual_mem_used) total_used
From v$sql_workarea_active
Where policy='AUTO') usage;

5.List largest process:
/* Do NOT eliminate all background process because certain background processes do need to be monitored at times */

SELECT pid, spid, substr(username,1,20) “USER”, program, pga_used_mem, pga_alloc_mem, pga_freeable_mem, pga_max_mem
FROM v$process
WHERE pga_alloc_mem = (SELECT max(pga_alloc_mem)
FROM v$process
WHERE program NOT LIKE '%LGWR%');

Oracle: an introduction to automatic PGA management and related queries

1. The concept of dead transactions

Dead transactions appear when the database is shut down abnormally or a transaction's process terminates abnormally, for example kill -9 or shutdown abort.
Because a dead transaction can no longer be observed through v$transaction, it has to be identified through an internal table.
That internal table is x$ktuxe, which records the recovery progress of dead transactions.

The number of processes used to roll back dead transactions is controlled by the parameter fast_start_parallel_rollback, which determines whether parallel recovery is enabled.
This parameter has three values:
FALSE –> Parallel rollback is disabled; SMON performs serial recovery;
LOW –> Limits the maximum degree of parallelism to 2 * CPU_COUNT (LOW is the default in 11gR2);
HIGH –> Limits the maximum degree of parallelism to 4 * CPU_COUNT.
The corresponding rollback processes look like ora_p025_orcl and ora_p022_orcl; they are parallel-rollback slave processes started in the background.
The parameter can be changed with ALTER SYSTEM, as shown below.
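
For instance (a sketch; which value to choose depends on the situation):

SQL> alter system set fast_start_parallel_rollback = low;
SQL> show parameter fast_start_parallel_rollback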

2. About parallel rollback:

Parallel rollback is an Oracle feature whose purpose is to speed up the rollback of failed transactions and shorten rollback time, and in most cases it achieves that.
In some situations, however, the parallel rollback processes contend with each other for resources,
with waits such as buffer busy waits / wait for a undo record; in severe cases the parallel-rollback slaves block each other and recovery becomes extremely slow.
Because of this, the feature can cause the whole database, or part of the workload, to hang.
From experience, two situations around parallel rollback are the most troublesome:
1. bugs related to parallel rollback, of which there are quite a few;
2. rollback of very large transactions.
MOS document on transaction recovery: Master Note: Troubleshooting Database Transaction Recovery (Doc ID 1494886.1).

Steps for analysing this kind of parallel-rollback problem:
1. Analyse alert.log.
2. Check whether a blocking session (block=1) exists and whether there is a dead transaction.
3. Analyse the AWR and ASH reports from the time of the incident.
4. Analyse I/O and CPU usage at the time of the incident.
5. Analyse the waits of the related operations.

Even when the degree of parallelism is HIGH or LOW (4x or 2x the CPU count), it is still bounded by other parameters such as PARALLEL_MAX_SERVERS, which is the real upper limit on parallelism.
The default value of PARALLEL_MAX_SERVERS is
PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 5.
PARALLEL_THREADS_PER_CPU and CPU_COUNT are both initialization parameters.
concurrent_parallel_users depends on memory management: it is 1 if automatic memory management is disabled, 2 if automatic PGA management is used, and 4 if automatic memory management is enabled. For example, with CPU_COUNT=8, PARALLEL_THREADS_PER_CPU=2 and automatic PGA management, the default is 2 * 8 * 2 * 5 = 160.

_cleanup_rollback_entries –> number of undo entries to apply per transaction cleanup; increasing it speeds up rollback, but it is a static parameter and requires a restart to take effect.
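
A sketch of changing it (the value 400 is purely illustrative; hidden parameters should normally only be changed under the guidance of Oracle Support):

SQL> alter system set "_cleanup_rollback_entries" = 400 scope=spfile;
-- static parameter: restart the instance for the new value to take effect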

When parallel rollback hangs, it can be disabled:
According to Master Note: Troubleshooting Database Transaction Recovery (Doc ID 1494886.1):
If you change the value of this parameter, then transaction recovery will be stopped and restarted with the new implied degree of parallelism.
alter system set fast_start_parallel_rollback=false;

How to Disable Parallel Transaction Recovery When Parallel Txn Recovery is Active (Doc ID 238507.1) describes first disabling SMON transaction recovery (this step can be skipped and the parallel-recovery slaves killed directly), killing the parallel-recovery slave processes at the OS level, then setting fast_start_parallel_rollback=false, and finally re-enabling SMON recovery.
The procedure is as follows:
SQL> select pid, program from v$process where program like '%SMON%';
PID PROGRAM
———- ————————————————
6 oracle@stsun7 (SMON)

SQL> oradebug setorapid <SMON's Oracle PID from the query above>
SQL> oradebug event 10513 trace name context forever, level 2

select SPID from V$PROCESS where PID in (select PID from V$FAST_START_SERVERS);
then kill these processes from the OS level with : kill -9 n

alter system set fast_start_parallel_rollback=false;

SQL> oradebug setorapid <SMON's Oracle PID from the query above>
SQL> oradebug event 10513 trace name context off
To trace the SMON process, the following event can be set:
event = '10500 trace name context forever, level 1'

3. About disabling transaction recovery:

— This is not recommended; it is generally only used when undo segments are corrupted and the database cannot be opened.
Disable Transaction Recovery
Disabling Transaction recovery could temporarily open up the database.
SMON will stop performing all the transaction recovery.
There is no way to stop only one Transaction recovery / few transaction recoveries.
When the transaction recovery is disabled, any data modified by this dead transaction but required by other transactions will be recovered on “as needed” basis.
We always suggest to disable the transaction recovery only with the help of Oracle Support.
NOTE: Oracle does not recommend to Disable the Transaction recovery and open the Database for Business Use.

 

4. Common scripts for large transaction rollback

1. Query dead transactions and their size:
select ktuxeusn USN, ktuxeslt Slot, ktuxesqn Seq, ktuxesta State, ktuxesiz Undo,to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Current_Time"
from x$ktuxe
where ktuxesta <> 'INACTIVE'
and ktuxecfl like '%DEAD%'
order by ktuxesiz asc;

select distinct KTUXECFL,count(*) from x$ktuxe group by KTUXECFL;

KTUXECFL COUNT(*)
———————— ———-
DEAD 1
NONE 2393
SCO|COL 8

===========
ktuxeusn – Undo Segment Number
ktuxeslt – Slot number
ktuxesqn – Sequence
ktuxesta – State
ktuxesiz – Undo Blocks Remaining
ktuxecfl – Flag
===========
USN SLOT SEQ STATE UNDO
——- ———- ——- ———- ———-

9 9 335 ACTIVE 10337

2. Query information about the undo segments that contain dead transactions:
select useg.segment_name, useg.segment_id, useg.tablespace_name, useg.status
from dba_rollback_segs useg
where useg.segment_id in (select unique ktuxeusn
from x$ktuxe
where ktuxesta <> 'INACTIVE'
and ktuxecfl like '%DEAD%');

SEGMENT_NAME SEGMENT_ID TABLESPACE_NAME STATUS
——————– —————– ————————— ————
_SYSSMU9$ 9 UNDOTBS1 ONLINE

select usn,name
from v$rollname
where usn in (select unique ktuxeusn
from x$ktuxe
where ktuxesta <> 'INACTIVE'
and ktuxecfl like '%DEAD%');

3. Estimating the time needed to recover a large transaction
alter session set nls_date_format='yyyymmdd hh24:mi:ss';
select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo",
decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
from v$fast_start_transactions;
USN STATE Total Done ToDo Estimated time to Complete
——– —————- ——– ——– ——– ———————————
5 RECOVERING 16207 14693 1514 11-may-2012 08:05:40

4. Estimating the time needed to roll back a dead transaction:
select distinct KTUXECFL,count(*) from x$ktuxe group by KTUXECFL;
KTUXECFL COUNT(*)
———————— ———-
DEAD 1
NONE 2393
SCO|COL 8

select ADDR,KTUXEUSN,KTUXESLT,KTUXESQN,KTUXESIZ from x$ktuxe where KTUXECFL='DEAD';
ADDR KTUXEUSN KTUXESLT KTUXESQN KTUXESIZ
—————- ———- ———- ———- ———-
00002B92FF5D5F68 15 12 314961 43611
–KTUXESIZ is the number of undo blocks remaining to be rolled back.

declare
l_start number;
l_end number;
begin
select ktuxesiz
into l_start
from x$ktuxe
where KTUXEUSN = 10
and KTUXESLT = 39; --------- fill in the actual USN and slot number here
dbms_lock.sleep(60); --------- the unit is seconds; a somewhat longer interval gives a more accurate estimate
select ktuxesiz
into l_end
from x$ktuxe
where KTUXEUSN = 10
and KTUXESLT = 39; --------- fill in the actual USN and slot number here
dbms_output.put_line('time cost Minutes:' ||
round(l_end / (l_start - l_end), 2));
end;
/
(l_start - l_end) –> the number of blocks recovered during the 60-second interval specified in dbms_lock.sleep(60)
l_end –> the number of blocks still to be recovered
l_end / (l_start - l_end) therefore gives the number of minutes needed to finish;
—————————————— to get the number of hours instead:
declare
l_start number;
l_end number;
begin
select ktuxesiz
into l_start
from x$ktuxe
where KTUXEUSN = 10
and KTUXESLT = 39;
dbms_lock.sleep(60);
select ktuxesiz
into l_end
from x$ktuxe
where KTUXEUSN = 10
and KTUXESLT = 39;
dbms_output.put_line('time cost Hours:' ||
round(l_end / (l_start - l_end)/60, 2));
end;
/


5. Query the number of slave processes during parallel recovery
select * from v$fast_start_servers;
STATE UNDOBLOCKSDONE PID XID
———– ————————– ———- —————-
RECOVERING 54 18 001F005C00001BD6
RECOVERING 0 20 001F005C00001BD6

If only one process has STATE = RECOVERING and the others are IDLE, it is recommended to disable parallel rollback: set FAST_START_PARALLEL_ROLLBACK=FALSE.

Oracle: large transaction rollback, an introduction and case analysis

1. kfod - the Kernel Files OSM Disk utility

KFOD (Kernel Files OSM Disk) is used to simulate disk discovery at the operating-system level.

The tool can be found under $GRID_HOME/bin (or under ASM_HOME/bin in 10gR2).

Notes:

1) KFOD is also used during installation (by OUI, DBCA or ASMCA) to perform disk discovery.

2) In the case of a failed installation (for example, $GRID_HOME/bin does not exist), KFOD can be found in the stage area under /grid/stage/ext/bin/; LD_LIBRARY_PATH must then be set to /grid/stage/ext/lib.

 

Does kfod need GI (CSS) to be running? If CSS is not up, kfod reports:

[grid@luda5 ~]$ kfod asm_diskstring='/dev/asmdisk*'

KFOD-00301: Unable to contact Cluster Synchronization Services (CSS). Return code 2 from kgxgncin.

 

Reference document:

ASM tools used by Support : KFOD, KFED, AMDU (Doc ID 1485597.1)

 

 

[grid@luda5 ~]$  kfod di=all group=diskgroup ds=true

——————————————————————————–

 Disk          Size Path                                     Disk Group   User     Group   

================================================================================

   1:       1024 Mb /dev/asm-diskb                           OCR          grid     asmadmin

   2:       4096 Mb /dev/asm-diskc                           DATA         grid     asmadmin

KFOD returned no data

[grid@luda5 ~]$ kfod di=all op=groups

——————————————————————————–

Group          Size          Free Redundancy Name           

================================================================================

   1:       1024 Mb        628 Mb     EXTERN OCR            

   2:       4096 Mb       3376 Mb     EXTERN DATA

[grid@luda5 ~]$ kfod disks=all status=true

——————————————————————————–

 Disk          Size Header    Path                                     User     Group   

================================================================================

   1:       1024 Mb MEMBER    /dev/asm-diskb                           grid     asmadmin

   2:       4096 Mb MEMBER    /dev/asm-diskc                           grid     asmadmin

——————————————————————————–

ORACLE_SID ORACLE_HOME                                                          

================================================================================

     +ASM1 /u01/11.2.0/grid

 

kfod status=TRUE asm_diskstring='/dev/asmdisk*' disk=all dscvgroup=TRUE

 

2. kfed - the Kernel Files metadata EDitor utility

KFED can be used to analyse the ASM disk header, or to repair it, when an ASM disk group cannot be mounted.

KFED is installed by default from 11.1 onwards; for older versions it has to be built manually:

cd $ORACLE_HOME/rdbms/lib

make -f ins_rdbms.mk ikfed

It is not shipped inside the GI installation package itself, however, so it is not available until the GI installation has completed.

 

2.1 Automatic backup of the ASM disk header:

Starting with version 10.2.0.5, ASM keeps an extra backup of the disk header: a copy of the KFBTYP_DISKHEAD block is stored in the second-to-last block of AU 1. When the real KFBTYP_DISKHEAD is accidentally overwritten or damaged, the kfed tool's repair option can use this copy to restore the disk header.

Because all sorts of failures can easily corrupt the disk header block, being able to back it up and restore it is particularly important.

Block location of the automatic ASM disk header backup; it can be verified with kfed read /dev/asm-diskb blkn=510

It is always stored in the second-to-last block of the second AU (the PST AU).

For a default disk group with a 1 MB AU size, each AU holds 256 blocks, numbered 0-255.

The first AU holds 256 blocks; the last block of the second AU is block 255 within that AU, so the second-to-last block is block 254 of the second AU, which is block 510 overall (counting from block 0 of the first AU).

For a disk group with an 8 MB AU size, each AU holds 2048 blocks, numbered 0-2047. The first AU holds 2048 blocks; the last block of the second AU is block 2047 within that AU, the second-to-last is block 2046, which is block 4094 overall (counting from block 0 of the first AU).
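
Assuming the device names below are only examples, the backup copy can be read with kfed at those offsets, e.g.:

## 1 MB AU: block 510 overall, i.e. block 254 of AU 1
kfed read /dev/asm-diskb blkn=510
kfed read /dev/asm-diskb aun=1 blkn=254
## 8 MB AU: block 2046 of AU 1 (block 4094 overall); pass the AU size explicitly
kfed read /dev/asm-diskX aun=1 blkn=2046 ausz=8388608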

 

2.2 Manually backing up the ASM disk header:

Other ways of manually backing up and restoring RAC ASM disk headers; methods 1 and 2 are the most useful.

1. Use dd to back up the first 4 KB of the disk; if the disk header information is lost, dd it back.

Backup:  dd if=/dev/asm-diskb of=/home/oracle/asmheader-b.dd bs=4096 count=1

Restore: dd if=/home/oracle/asmheader-b.dd of=/dev/asm-diskb

2. Use Oracle's kfed tool to back up the disk header as plain text; to restore, merge the text back with kfed.

Backup:  kfed read /dev/asm-diskb aunum=0 blknum=0 text=raw1.txt

Restore: kfed write /dev/asm-diskb aunum=0 blknum=0 text=raw1.txt

3. In 11g, asmcmd adds the md_backup and md_restore commands, but these only back up the metadata structure of the disk group. Restoring requires the DG to already be mounted, after which the structure information is imported, similar to exp/imp, so this approach does not help when the DG itself is damaged.

Backup:  asmcmd md_backup -b /home/oracle/asmheader-b.dd.txt

Restore: asmcmd md_restore -t full -g ocr -i -b /home/oracle/asmheader-b.dd.txt

3. Using kfed and the automatic backup to recover a damaged ASM disk header

Simulate ASM disk header corruption that prevents the cluster from starting, on Linux x86-64 with 11.2.0.4 RAC.

Commands used:

[grid@luda5 ~]$ kfed read /dev/asm-diskb

[grid@luda5 oracle]$ kfed read /dev/asm-diskb blkn=510

[grid@luda5 ~]$ kfed repair /dev/asm-diskb

A raw dd copy of the backup block (block 510) back to the start of the disk is shown below as well; kfed repair is the preferred method:

[grid@luda5 ~]$ dd if=/dev/asm-diskb of=/dev/asm-diskb bs=4096 count=1 skip=510

1. Simulate corruption of the header of the disk holding OCR/voting

/dev/asm-diskb is the disk that holds the OCR/voting files.

[root@luda5 ~]# dd if=/dev/zero of=/dev/asm-diskb bs=4096 count=1   ## simulate disk header corruption

1+0 records in

1+0 records out

4096 bytes (4.1 kB) copied, 0.006142 s, 667 kB/s

[root@luda5 ~]# crsctl start has

CRS-4123: Oracle High Availability Services has been started.

 

[root@luda5 ~]# ps -ef|grep css

root      5642     1  0 09:57 ?        00:00:00 /u01/11.2.0/grid/bin/cssdmonitor

root      5666     1  0 09:57 ?        00:00:00 /u01/11.2.0/grid/bin/cssdagent

grid      5677     1  0 09:57 ?        00:00:00 /u01/11.2.0/grid/bin/ocssd.bin

root      5949  2285  0 09:58 pts/0    00:00:00 grep css

[root@luda5 ~]# su – grid

[grid@luda5 ~]$ cd /u01/11.2.0/grid/log/luda5/

 

[grid@luda5 luda5]$ tail alertluda5.log

2017-04-13 09:57:24.948:

[cssd(5677)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/11.2.0/grid/log/luda5/cssd/ocssd.log

2017-04-13 09:57:40.000:

[cssd(5677)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/11.2.0/grid/log/luda5/cssd/ocssd.log

2017-04-13 09:57:55.052:

[

 

[grid@luda5 cssd]$ gpnptool get

Warning: some command line parameters were defaulted. Resulting command line:

         /u01/11.2.0/grid/bin/gpnptool.bin get -o-

 

<?xml version=”1.0″ encoding=”UTF-8″?><gpnp:GPnP-Profile Version=”1.0″ xmlns=”http://www.grid-pnp.org/2005/11/gpnp-profile” xmlns:gpnp=”http://www.grid-pnp.org/2005/11/gpnp-profile” xmlns:orcl=”http://www.oracle.com/gpnp/2005/11/gpnp-profile” xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance” xsi:schemaLocation=”http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd” ProfileSequence=”4″ ClusterUId=”0a9dcd2199ec4faaff5f633c29682816″ ClusterName=”luda-cluster1″ PALocation=””><gpnp:Network-Profile><gpnp:HostNetwork id=”gen” HostName=”*”><gpnp:Network id=”net1″ IP=”192.168.57.0″ Adapter=”eth0″ Use=”public”/><gpnp:Network id=”net2″ IP=”192.168.56.0″ Adapter=”eth1″ Use=”cluster_interconnect”/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id=”css” DiscoveryString=”+asm” LeaseDuration=”400″/><orcl:ASM-Profile id=”asm” DiscoveryString=”/dev/asm-disk*” SPFile=”+OCR/luda-cluster1/asmparameterfile/registry.253.939414867″/><ds:Signature xmlns:ds=”http://www.w3.org/2000/09/xmldsig#”><ds:SignedInfo><ds:CanonicalizationMethod Algorithm=”http://www.w3.org/2001/10/xml-exc-c14n#”/><ds:SignatureMethod Algorithm=”http://www.w3.org/2000/09/xmldsig#rsa-sha1″/><ds:Reference URI=””><ds:Transforms><ds:Transform Algorithm=”http://www.w3.org/2000/09/xmldsig#enveloped-signature”/><ds:Transform Algorithm=”http://www.w3.org/2001/10/xml-exc-c14n#”> <InclusiveNamespaces xmlns=”http://www.w3.org/2001/10/xml-exc-c14n#” PrefixList=”gpnp orcl xsi”/></ds:Transform></ds:Transforms><ds:DigestMethod Algorithm=”http://www.w3.org/2000/09/xmldsig#sha1″/><ds:DigestValue>nrC6yv5F/a9GJYQH/+LQMk0vN88=</ds:DigestValue></ds:Reference></ds:SignedInfo><ds:SignatureValue>gad/sRBw75TF/RMoZfUG7owOxxSndeUjdm4gPjBir42T0RR8qsQowSKPeJEp+ouBjAeVIrI+RyCb8roy8/28hrsQg/xJsbNoZGR6EFK8ZBZsos85kaIufwoTakiWCBPerDwSH6J2vZE9XeZqhYGwffSyF2WDKX7aafJhK+5qh/Y=</ds:SignatureValue></ds:Signature></gpnp:GPnP-Profile>

Success.

 

########################################################

########################################################

2. Troubleshooting the cluster failing to start:

Reading the /dev/asm-diskb disk header with kfed shows it can no longer be read normally.

[grid@luda5 ~]$ kfed read /dev/asm-diskb

kfbh.endian:                          0 ; 0x000: 0x00

kfbh.hard:                            0 ; 0x001: 0x00

kfbh.type:                            0 ; 0x002: KFBTYP_INVALID

kfbh.datfmt:                          0 ; 0x003: 0x00

kfbh.block.blk:                       0 ; 0x004: blk=0

kfbh.block.obj:                       0 ; 0x008: file=0

kfbh.check:                           0 ; 0x00c: 0x00000000

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

7F3310D66400 00000000 00000000 00000000 00000000  […………….]

  Repeat 255 times

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

 

—–>> stop HAS

[root@luda5 ~]# crsctl stop has

CRS-2796: The command may not proceed when Cluster Ready Services is not running

CRS-4687: Shutdown command has completed with errors.

CRS-4000: Command Stop failed, or completed with errors.

[root@luda5 ~]# ps -ef|grep has

root      1165     1  0 09:49 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run

root      5455     1  0 09:56 ?        00:00:03 /u01/11.2.0/grid/bin/ohasd.bin reboot

root      7020  2285  0 10:02 pts/0    00:00:00 grep has

[root@luda5 ~]# kill -9 5455

[root@luda5 ~]# ps -ef|grep css

root      5642     1  0 09:57 ?        00:00:00 /u01/11.2.0/grid/bin/cssdmonitor

root      5666     1  0 09:57 ?        00:00:00 /u01/11.2.0/grid/bin/cssdagent

grid      5677     1  0 09:57 ?        00:00:01 /u01/11.2.0/grid/bin/ocssd.bin

root      7094  2285  0 10:02 pts/0    00:00:00 grep css

[root@luda5 ~]# kill -9 5677

 

[root@luda5 ~]# ps -ef|grep has

root      1165     1  0 09:49 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run

root      7160  2285  0 10:02 pts/0    00:00:00 grep has

 

 

3. Recovering a damaged ASM disk header from the automatic backup:

Verify that the automatic backup of the ASM disk header is intact.

First use the following command to verify that the automatic backup is intact; with a 1 MB AU size it sits in block 510 overall.

Location of the ASM disk header copy: the second-to-last block of AU 1 holds a copy of KFBTYP_DISKHEAD,

that is, block blkn=254 within the PST AU when AUSIZE=1M (or blkn=1022 when AUSIZE=4M); note that ASM generally counts from 0.

[grid@luda5 oracle]$ kfed read /dev/asm-diskb blkn=510

kfbh.endian:                          1 ; 0x000: 0x01

kfbh.hard:                          130 ; 0x001: 0x82

kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                     254 ; 0x004: blk=254

kfbh.block.obj:              2147483648 ; 0x008: disk=0

kfbh.check:                  2208422217 ; 0x00c: 0x83a1d949

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8

kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000

kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000

kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000

kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000

kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000

kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000

kfdhdb.compat:                186646528 ; 0x020: 0x0b200000

kfdhdb.dsknum:                        0 ; 0x024: 0x0000

kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL

kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname:                OCR_0000 ; 0x028: length=8

kfdhdb.grpname:                     OCR ; 0x048: length=3

kfdhdb.fgname:                 OCR_0000 ; 0x068: length=8

kfdhdb.capname:                         ; 0x088: length=0

kfdhdb.crestmp.hi:             33050356 ; 0x0a8: HOUR=0x14 DAYS=0x17 MNTH=0x3 YEAR=0x7e1

kfdhdb.crestmp.lo:           2295894016 ; 0x0ac: USEC=0x0 MSEC=0x224 SECS=0xd MINS=0x22

kfdhdb.mntstmp.hi:             33051050 ; 0x0b0: HOUR=0xa DAYS=0xd MNTH=0x4 YEAR=0x7e1

kfdhdb.mntstmp.lo:            372738048 ; 0x0b4: USEC=0x0 MSEC=0x1e2 SECS=0x23 MINS=0x5

kfdhdb.secsize:                     512 ; 0x0b8: 0x0200

kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000       ————->> block size

kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000   ————->> AU size

kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80

kfdhdb.dsksize:                    1024 ; 0x0c4: 0x00000400

kfdhdb.pmcnt:                         2 ; 0x0c8: 0x00000002

kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001

kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002

kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002

kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000

kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000

kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000

kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000

kfdhdb.dbcompat:              168820736 ; 0x0e0: 0x0a100000

kfdhdb.grpstmp.hi:             33050356 ; 0x0e4: HOUR=0x14 DAYS=0x17 MNTH=0x3 YEAR=0x7e1

kfdhdb.grpstmp.lo:           2295721984 ; 0x0e8: USEC=0x0 MSEC=0x17c SECS=0xd MINS=0x22

kfdhdb.vfstart:                     352 ; 0x0ec: 0x00000160

kfdhdb.vfend:                       384 ; 0x0f0: 0x00000180

kfdhdb.spfile:                       58 ; 0x0f4: 0x0000003a

kfdhdb.spfflg:                        1 ; 0x0f8: 0x00000001

kfdhdb.ub4spare[0]:                   0 ; 0x0fc: 0x00000000

kfdhdb.ub4spare[1]:                   0 ; 0x100: 0x00000000

kfdhdb.ub4spare[2]:                   0 ; 0x104: 0x00000000

kfdhdb.ub4spare[3]:                   0 ; 0x108: 0x00000000

kfdhdb.ub4spare[4]:                   0 ; 0x10c: 0x00000000

kfdhdb.ub4spare[5]:                   0 ; 0x110: 0x00000000

kfdhdb.ub4spare[6]:                   0 ; 0x114: 0x00000000

kfdhdb.ub4spare[7]:                   0 ; 0x118: 0x00000000

kfdhdb.ub4spare[8]:                   0 ; 0x11c: 0x00000000

kfdhdb.ub4spare[9]:                   0 ; 0x120: 0x00000000

kfdhdb.ub4spare[10]:                  0 ; 0x124: 0x00000000

kfdhdb.ub4spare[11]:                  0 ; 0x128: 0x00000000

kfdhdb.ub4spare[12]:                  0 ; 0x12c: 0x00000000

kfdhdb.ub4spare[13]:                  0 ; 0x130: 0x00000000

kfdhdb.ub4spare[14]:                  0 ; 0x134: 0x00000000

kfdhdb.ub4spare[15]:                  0 ; 0x138: 0x00000000

kfdhdb.ub4spare[16]:                  0 ; 0x13c: 0x00000000

kfdhdb.ub4spare[17]:                  0 ; 0x140: 0x00000000

kfdhdb.ub4spare[18]:                  0 ; 0x144: 0x00000000

kfdhdb.ub4spare[19]:                  0 ; 0x148: 0x00000000

kfdhdb.ub4spare[20]:                  0 ; 0x14c: 0x00000000

kfdhdb.ub4spare[21]:                  0 ; 0x150: 0x00000000

kfdhdb.ub4spare[22]:                  0 ; 0x154: 0x00000000

kfdhdb.ub4spare[23]:                  0 ; 0x158: 0x00000000

kfdhdb.ub4spare[24]:                  0 ; 0x15c: 0x00000000

kfdhdb.ub4spare[25]:                  0 ; 0x160: 0x00000000

kfdhdb.ub4spare[26]:                  0 ; 0x164: 0x00000000

kfdhdb.ub4spare[27]:                  0 ; 0x168: 0x00000000

kfdhdb.ub4spare[28]:                  0 ; 0x16c: 0x00000000

kfdhdb.ub4spare[29]:                  0 ; 0x170: 0x00000000

kfdhdb.ub4spare[30]:                  0 ; 0x174: 0x00000000

kfdhdb.ub4spare[31]:                  0 ; 0x178: 0x00000000

kfdhdb.ub4spare[32]:                  0 ; 0x17c: 0x00000000

kfdhdb.ub4spare[33]:                  0 ; 0x180: 0x00000000

kfdhdb.ub4spare[34]:                  0 ; 0x184: 0x00000000

kfdhdb.ub4spare[35]:                  0 ; 0x188: 0x00000000

kfdhdb.ub4spare[36]:                  0 ; 0x18c: 0x00000000

kfdhdb.ub4spare[37]:                  0 ; 0x190: 0x00000000

kfdhdb.ub4spare[38]:                  0 ; 0x194: 0x00000000

kfdhdb.ub4spare[39]:                  0 ; 0x198: 0x00000000

kfdhdb.ub4spare[40]:                  0 ; 0x19c: 0x00000000

kfdhdb.ub4spare[41]:                  0 ; 0x1a0: 0x00000000

kfdhdb.ub4spare[42]:                  0 ; 0x1a4: 0x00000000

kfdhdb.ub4spare[43]:                  0 ; 0x1a8: 0x00000000

kfdhdb.ub4spare[44]:                  0 ; 0x1ac: 0x00000000

kfdhdb.ub4spare[45]:                  0 ; 0x1b0: 0x00000000

kfdhdb.ub4spare[46]:                  0 ; 0x1b4: 0x00000000

kfdhdb.ub4spare[47]:                  0 ; 0x1b8: 0x00000000

kfdhdb.ub4spare[48]:                  0 ; 0x1bc: 0x00000000

kfdhdb.ub4spare[49]:                  0 ; 0x1c0: 0x00000000

kfdhdb.ub4spare[50]:                  0 ; 0x1c4: 0x00000000

kfdhdb.ub4spare[51]:                  0 ; 0x1c8: 0x00000000

kfdhdb.ub4spare[52]:                  0 ; 0x1cc: 0x00000000

kfdhdb.ub4spare[53]:                  0 ; 0x1d0: 0x00000000

kfdhdb.acdb.aba.seq:                  0 ; 0x1d4: 0x00000000

kfdhdb.acdb.aba.blk:                  0 ; 0x1d8: 0x00000000

kfdhdb.acdb.ents:                     0 ; 0x1dc: 0x0000

kfdhdb.acdb.ub2spare:                 0 ; 0x1de: 0x0000

[root@luda5 ~]# su – grid

[grid@luda5 ~]$ kfed repair -help

as/mlib         ASM Library [asmlib=’lib’]

aun/um          AU number to examine or update [AUNUM=number]

aus/z           Allocation Unit size in bytes [AUSZ=number]

blkn/um         Block number to examine or update [BLKNUM=number]

blks/z          Metadata block size in bytes [BLKSZ=number]

ch/ksum         Update checksum before each write [CHKSUM=YES/NO]

cn/t            Count of AUs to process [CNT=number]

de/v            ASM device to examine or update [DEV=string]

dm/pall         Don’t suppress repeated lines when dumping corrupt blocks [DMPALL=YES/NO]

o/p             KFED operation type [OP=READ/WRITE/MERGE/REPAIR/NEW/FORM/FIND/STRUCT]

p/rovnm         Name for provisioning purposes [PROVNM=string]

s/eek           AU number to seek to [SEEK=number]

te/xt           File name for translated block text [TEXT=string]

ty/pe           ASM metadata block type number [TYPE=number]

 

kfed command to repair the damaged disk header:

[grid@luda5 ~]$ kfed repair /dev/asm-diskb

[grid@luda5 ~]$ kfed read /dev/asm-diskb

kfbh.endian:                          1 ; 0x000: 0x01

kfbh.hard:                          130 ; 0x001: 0x82

kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                       0 ; 0x004: blk=0

kfbh.block.obj:              2147483648 ; 0x008: disk=0

kfbh.check:                  1504642484 ; 0x00c: 0x59af05b4

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8

kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000

kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000

kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000

kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000

kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000

kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000

kfdhdb.compat:                186646528 ; 0x020: 0x0b200000

kfdhdb.dsknum:                        0 ; 0x024: 0x0000

kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL

kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname:                OCR_0000 ; 0x028: length=8

kfdhdb.grpname:                     OCR ; 0x048: length=3

kfdhdb.fgname:                 OCR_0000 ; 0x068: length=8

kfdhdb.capname:                         ; 0x088: length=0

kfdhdb.crestmp.hi:             33050356 ; 0x0a8: HOUR=0x14 DAYS=0x17 MNTH=0x3 YEAR=0x7e1

kfdhdb.crestmp.lo:           2295894016 ; 0x0ac: USEC=0x0 MSEC=0x224 SECS=0xd MINS=0x22

kfdhdb.mntstmp.hi:             33051049 ; 0x0b0: HOUR=0x9 DAYS=0xd MNTH=0x4 YEAR=0x7e1

kfdhdb.mntstmp.lo:           3426309120 ; 0x0b4: USEC=0x0 MSEC=0x255 SECS=0x3 MINS=0x33

kfdhdb.secsize:                     512 ; 0x0b8: 0x0200

kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000

kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000

kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80

kfdhdb.dsksize:                    1024 ; 0x0c4: 0x00000400

kfdhdb.pmcnt:                         2 ; 0x0c8: 0x00000002

kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001

kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002

kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002

kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000

kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000

kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000

kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000

kfdhdb.dbcompat:              168820736 ; 0x0e0: 0x0a100000

kfdhdb.grpstmp.hi:             33050356 ; 0x0e4: HOUR=0x14 DAYS=0x17 MNTH=0x3 YEAR=0x7e1

kfdhdb.grpstmp.lo:           2295721984 ; 0x0e8: USEC=0x0 MSEC=0x17c SECS=0xd MINS=0x22

kfdhdb.vfstart:                     352 ; 0x0ec: 0x00000160

kfdhdb.vfend:                       384 ; 0x0f0: 0x00000180

kfdhdb.spfile:                       58 ; 0x0f4: 0x0000003a

kfdhdb.spfflg:                        1 ; 0x0f8: 0x00000001

kfdhdb.ub4spare[0]:                   0 ; 0x0fc: 0x00000000

kfdhdb.ub4spare[1]:                   0 ; 0x100: 0x00000000

kfdhdb.ub4spare[2]:                   0 ; 0x104: 0x00000000

kfdhdb.ub4spare[3]:                   0 ; 0x108: 0x00000000

kfdhdb.ub4spare[4]:                   0 ; 0x10c: 0x00000000

kfdhdb.ub4spare[5]:                   0 ; 0x110: 0x00000000

kfdhdb.ub4spare[6]:                   0 ; 0x114: 0x00000000

kfdhdb.ub4spare[7]:                   0 ; 0x118: 0x00000000

kfdhdb.ub4spare[8]:                   0 ; 0x11c: 0x00000000

kfdhdb.ub4spare[9]:                   0 ; 0x120: 0x00000000

kfdhdb.ub4spare[10]:                  0 ; 0x124: 0x00000000

kfdhdb.ub4spare[11]:                  0 ; 0x128: 0x00000000

kfdhdb.ub4spare[12]:                  0 ; 0x12c: 0x00000000

kfdhdb.ub4spare[13]:                  0 ; 0x130: 0x00000000

kfdhdb.ub4spare[14]:                  0 ; 0x134: 0x00000000

kfdhdb.ub4spare[15]:                  0 ; 0x138: 0x00000000

kfdhdb.ub4spare[16]:                  0 ; 0x13c: 0x00000000

kfdhdb.ub4spare[17]:                  0 ; 0x140: 0x00000000

kfdhdb.ub4spare[18]:                  0 ; 0x144: 0x00000000

kfdhdb.ub4spare[19]:                  0 ; 0x148: 0x00000000

kfdhdb.ub4spare[20]:                  0 ; 0x14c: 0x00000000

kfdhdb.ub4spare[21]:                  0 ; 0x150: 0x00000000

kfdhdb.ub4spare[22]:                  0 ; 0x154: 0x00000000

kfdhdb.ub4spare[23]:                  0 ; 0x158: 0x00000000

kfdhdb.ub4spare[24]:                  0 ; 0x15c: 0x00000000

kfdhdb.ub4spare[25]:                  0 ; 0x160: 0x00000000

kfdhdb.ub4spare[26]:                  0 ; 0x164: 0x00000000

kfdhdb.ub4spare[27]:                  0 ; 0x168: 0x00000000

kfdhdb.ub4spare[28]:                  0 ; 0x16c: 0x00000000

kfdhdb.ub4spare[29]:                  0 ; 0x170: 0x00000000

kfdhdb.ub4spare[30]:                  0 ; 0x174: 0x00000000

kfdhdb.ub4spare[31]:                  0 ; 0x178: 0x00000000

kfdhdb.ub4spare[32]:                  0 ; 0x17c: 0x00000000

kfdhdb.ub4spare[33]:                  0 ; 0x180: 0x00000000

kfdhdb.ub4spare[34]:                  0 ; 0x184: 0x00000000

kfdhdb.ub4spare[35]:                  0 ; 0x188: 0x00000000

kfdhdb.ub4spare[36]:                  0 ; 0x18c: 0x00000000

kfdhdb.ub4spare[37]:                  0 ; 0x190: 0x00000000

kfdhdb.ub4spare[38]:                  0 ; 0x194: 0x00000000

kfdhdb.ub4spare[39]:                  0 ; 0x198: 0x00000000

kfdhdb.ub4spare[40]:                  0 ; 0x19c: 0x00000000

kfdhdb.ub4spare[41]:                  0 ; 0x1a0: 0x00000000

kfdhdb.ub4spare[42]:                  0 ; 0x1a4: 0x00000000

kfdhdb.ub4spare[43]:                  0 ; 0x1a8: 0x00000000

kfdhdb.ub4spare[44]:                  0 ; 0x1ac: 0x00000000

kfdhdb.ub4spare[45]:                  0 ; 0x1b0: 0x00000000

kfdhdb.ub4spare[46]:                  0 ; 0x1b4: 0x00000000

kfdhdb.ub4spare[47]:                  0 ; 0x1b8: 0x00000000

kfdhdb.ub4spare[48]:                  0 ; 0x1bc: 0x00000000

kfdhdb.ub4spare[49]:                  0 ; 0x1c0: 0x00000000

kfdhdb.ub4spare[50]:                  0 ; 0x1c4: 0x00000000

kfdhdb.ub4spare[51]:                  0 ; 0x1c8: 0x00000000

kfdhdb.ub4spare[52]:                  0 ; 0x1cc: 0x00000000

kfdhdb.ub4spare[53]:                  0 ; 0x1d0: 0x00000000

kfdhdb.acdb.aba.seq:                  0 ; 0x1d4: 0x00000000

kfdhdb.acdb.aba.blk:                  0 ; 0x1d8: 0x00000000

kfdhdb.acdb.ents:                     0 ; 0x1dc: 0x0000

kfdhdb.acdb.ub2spare:                 0 ; 0x1de: 0x0000

 

[grid@luda5 ~]$ exit

logout

[root@luda5 ~]# crsctl start has

CRS-4123: Oracle High Availability Services has been started.

Verify that the cluster can start normally

 [grid@luda5 luda5]$ crsctl stat res -t

——————————————————————————–

NAME           TARGET  STATE        SERVER                   STATE_DETAILS       

——————————————————————————–

Local Resources

——————————————————————————–

ora.DATA.dg

               ONLINE  ONLINE       luda5                                         

ora.LISTENER.lsnr

               ONLINE  ONLINE       luda5                                         

ora.OCR.dg

               ONLINE  ONLINE       luda5                                         

ora.asm

               ONLINE  ONLINE       luda5                     Started             

ora.gsd

               OFFLINE OFFLINE      luda5                                         

ora.net1.network

               ONLINE  ONLINE       luda5                                         

ora.ons

               ONLINE  ONLINE       luda5                                         

——————————————————————————–

Cluster Resources

——————————————————————————–

ora.LISTENER_SCAN1.lsnr

      1        ONLINE  ONLINE       luda5                                         

ora.luda5.vip

      1        ONLINE  ONLINE       luda5                                         

ora.luda6.vip

      1        ONLINE  INTERMEDIATE luda5                     FAILED OVER         

ora.ludarac.db

      1        ONLINE  ONLINE       luda5                     Open                

      2        ONLINE  OFFLINE                                                   

ora.cvu

      1        ONLINE  ONLINE       luda5                                         

ora.oc4j

      1        ONLINE  ONLINE       luda5                                         

ora.scan1.vip

      1        ONLINE  ONLINE       luda5          

Oracle: an introduction to kfod and kfed, and repairing a damaged ASM disk header

1. amdu - the ASM Metadata Dump Utility

AMDU is a utility that extracts all available metadata from one or more ASM disks, or produces a formatted printout of individual blocks; it does not require the ASM disk group to be mounted.

Oracle Support may ask you to provide AMDU output, especially when you hit internal errors related to ASM metadata.

Note: the AMDU program is installed from 11.1 onwards; it can also be used with 10g, but has to be downloaded separately: Note 553639.1 – Placeholder for AMDU binaries and using with ASM 10g

AMDU has three main functions:

  1. Dump the metadata on ASM disks to the file system for analysis
  2. Extract the contents of ASM files and write them to OS files, whether or not the diskgroup is mounted
  3. Print block metadata, either as the C structures stored in the block or in hexadecimal

 

Reference documents:

ASM tools used by Support : KFOD, KFED, AMDU (Doc ID 1485597.1)

Note 553639.1 – Placeholder for AMDU binaries and using with ASM 10g

https://blogs.oracle.com/askmaclean/entry/使用amdu工具从无法mount的diskgroup中抽取数据文件

 

2. Using amdu to extract ASM disk metadata

How to find the ASM disk group name that an ASM disk belongs to:

kfed read /dev/asm-diskb |grep name

2.1 Extract the metadata of a specific disk group:

[grid@luda5 ~]$ amdu -diskstring '/dev/asm-disk*'  -dump  'DATA'

amdu_2017_04_13_16_02_22/

AMDU-00204: Disk N0002 is in currently mounted diskgroup DATA

AMDU-00201: Disk N0002: ‘/dev/asm-diskc’

[grid@luda5 ~]$ ls

amdu_2017_04_13_16_02_22  ocm_temp.rsp  oradiag_grid  PatchSearch.xml

[grid@luda5 ~]$ cd amdu_2017_04_13_16_02_22/

[grid@luda5 amdu_2017_04_13_16_02_22]$ ls

DATA_0001.img  DATA.map  report.txt

[grid@luda5 amdu_2017_04_13_16_02_22]$ du -sh ./*

95M     ./DATA_0001.img

8.0K    ./DATA.map

4.0K    ./report.txt

 

Output of the amdu command:

Each AMDU run creates a directory in the current location named amdu_YYYY_MM_DD_HH24_MI_SS; the -directory parameter can be used to choose a different location.

By default the following files are generated in that directory:

<diskgroup>_0001.img – a dump of the disk group's contents; each file is limited to 2 GB, but there can be several of them

<diskgroup>.map – can be used to find the exact location of the ASM metadata on disk

report.txt – details about the disks that were scanned

2.2 View the amdu report

[grid@luda5 amdu_2017_04_13_16_02_22]$ cat report.txt

-*-amdu-*-

 

******************************* AMDU Settings ********************************

ORACLE_HOME = /u01/11.2.0/grid

System name:    Linux

Node name:      luda5.luda.com

Release:        3.8.13-16.2.1.el6uek.x86_64

Version:        #1 SMP Thu Nov 7 17:01:44 PST 2013

Machine:        x86_64

amdu run:       13-APR-17 16:02:22

Endianess:      1

=====>> information about the current host

——————————— Operations ———————————

-dump DATA

=====>> the operation performed

——————————- Disk Selection ——————————-

-diskstring ‘/dev/asm-disk*’

=====>> the ASM disk string used

—————————— Reading Control ——————————-

 

——————————- Output Control ——————————-

 

********************************* DISCOVERY **********************************

 

—————————– DISK REPORT N0001 ——————————

Disk Path: /dev/asm-diskb

Unique Disk ID:

Disk Label:

Physical Sector Size: 512 bytes

Disk Size: 1024 megabytes

** NOT A VALID ASM DISK HEADER. BAD VALUE IN FIELD blk_kfbl **

=====>> a corrupted disk header is reported here; this disk belongs to the other disk group (OCR).

—————————– DISK REPORT N0002 ——————————

Disk Path: /dev/asm-diskc   =====>> asm disk path

Unique Disk ID:

Disk Label:

Physical Sector Size: 512 bytes   =====>> physical sector size

Disk Size: 4096 megabytes  =====>> disk size

Group Name: DATA  =====>> disk group name

Disk Name: DATA_0000

Failure Group Name: DATA_0000

Disk Number: 0

Header Status: 3

Disk Creation Time: 2017/03/23 20:51:44.637000

Last Mount Time: 2017/04/13 14:56:19.845000

Compatibility Version: 0x0b200000(11020000)   =====>> version information

Disk Sector Size: 512 bytes

Disk size in AUs: 4096 Aus

Group Redundancy: 1

Metadata Block Size: 4096 bytes

AU Size: 1048576 bytes  =====>> the AU size is 1 MB

Stride: 113792 AUs

Group Creation Time: 2017/03/23 20:51:44.428000

File 1 Block 1 location: AU 2

OCR Present: NO  =====>> not an OCR disk

 

***************** Slept for 6 seconds waiting for heartbeats *****************

 

************************** SCANNING DISKGROUP DATA ***************************

Creation Time: 2017/03/23 20:51:44.428000

Disks Discovered: 1

Redundancy: 1

AU Size: 1048576 bytes

Metadata Block Size: 4096 bytes

Physical Sector Size: 512 bytes

Metadata Stride: 113792 AU

Duplicate Disk Numbers: 0

 

 

—————————- SCANNING DISK N0002 —————————–

Disk N0002: ‘/dev/asm-diskc’

AMDU-00204: Disk N0002 is in currently mounted diskgroup DATA

AMDU-00201: Disk N0002: ‘/dev/asm-diskc’

** HEARTBEAT DETECTED **

Allocated AU’s: 720

Free AU’s: 3376

AU’s read for dump: 102

Block images saved: 24070

Map lines written: 102

Heartbeats seen: 1

Corrupt metadata blocks: 0

Corrupt AT blocks: 0

 

 

————————- SUMMARY FOR DISKGROUP DATA ————————-

Allocated AU’s: 720

Free AU’s: 3376

AU’s read for dump: 102

Block images saved: 24070

Map lines written: 102

Heartbeats seen: 1

Corrupt metadata blocks: 0

Corrupt AT blocks: 0

 

 

******************************* END OF REPORT ********************************

3. Using amdu to extract datafiles from ASM disks that cannot be mounted, and opening the database

In a Linux x86-64 11.2.0.4 RAC environment, simulate an ASM disk that cannot be mounted normally, extract the datafiles from the ASM disk with amdu, and then restore them in another location and open the database.

1. Stop the HAS clusterware and the database;

CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'luda5' has completed

CRS-4133: Oracle High Availability Services has been stopped.

[root@luda5 ~]#

1.1 Simulate ASM disk header corruption; at this point amdu can no longer read the disk information.

[grid@luda5 ~]$ dd if=/dev/zero of=/dev/asm-diskc bs=4096 count=1

1+0 records in

1+0 records out

4096 bytes (4.1 kB) copied, 0.00422028 s, 971 kB/s

 

[grid@luda5 ~]$ kfed read /dev/asm-diskc

kfbh.endian:                          0 ; 0x000: 0x00

kfbh.hard:                            0 ; 0x001: 0x00

kfbh.type:                            0 ; 0x002: KFBTYP_INVALID

kfbh.datfmt:                          0 ; 0x003: 0x00

kfbh.block.blk:                       0 ; 0x004: blk=0

kfbh.block.obj:                       0 ; 0x008: file=0

kfbh.check:                           0 ; 0x00c: 0x00000000

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

7F0F828DA400 00000000 00000000 00000000 00000000  […………….]

Repeat 255 times

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

 

[grid@luda5 ~]$ amdu -diskstring '/dev/asm-diskc' -extract 'DATA.261'

amdu_2017_04_13_13_31_05/

AMDU-00210: No disks found in diskgroup DATA

AMDU-00210: No disks found in diskgroup DATA

 

 

1.2 Repair the damaged ASM disk header

[grid@luda5 ~]$ kfed repair /dev/asm-diskc

[grid@luda5 ~]$ kfed read /dev/asm-diskc

kfbh.endian:                          1 ; 0x000: 0x01

kfbh.hard:                          130 ; 0x001: 0x82

kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                       0 ; 0x004: blk=0

kfbh.block.obj:              2147483648 ; 0x008: disk=0

kfbh.check:                  1931960167 ; 0x00c: 0x73275f67

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8

kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000

kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000

kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000

kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000

kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000

kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000

kfdhdb.compat:                186646528 ; 0x020: 0x0b200000

kfdhdb.dsknum:                        0 ; 0x024: 0x0000

kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL

kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname:               DATA_0000 ; 0x028: length=9

kfdhdb.grpname:                    DATA ; 0x048: length=4

kfdhdb.fgname:                DATA_0000 ; 0x068: length=9

kfdhdb.capname:                         ; 0x088: length=0

kfdhdb.crestmp.hi:             33050356 ; 0x0a8: HOUR=0x14 DAYS=0x17 MNTH=0x3 YEAR=0x7e1

kfdhdb.crestmp.lo:           3469341696 ; 0x0ac: USEC=0x0 MSEC=0x27d SECS=0x2c MINS=0x33

kfdhdb.mntstmp.hi:             33051050 ; 0x0b0: HOUR=0xa DAYS=0xd MNTH=0x4 YEAR=0x7e1

kfdhdb.mntstmp.lo:           2813672448 ; 0x0b4: USEC=0x0 MSEC=0x14f SECS=0x3b MINS=0x29

kfdhdb.secsize:                     512 ; 0x0b8: 0x0200

kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000

kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000

kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80

kfdhdb.dsksize:                    4096 ; 0x0c4: 0x00001000

kfdhdb.pmcnt:                         2 ; 0x0c8: 0x00000002

kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001

kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002

kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002

kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000

kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000

kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000

kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000

kfdhdb.dbcompat:              168820736 ; 0x0e0: 0x0a100000

kfdhdb.grpstmp.hi:             33050356 ; 0x0e4: HOUR=0x14 DAYS=0x17 MNTH=0x3 YEAR=0x7e1

kfdhdb.grpstmp.lo:           3469127680 ; 0x0e8: USEC=0x0 MSEC=0x1ac SECS=0x2c MINS=0x33

kfdhdb.vfstart:                       0 ; 0x0ec: 0x00000000

kfdhdb.vfend:                         0 ; 0x0f0: 0x00000000

kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000

kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000

kfdhdb.ub4spare[0]:                   0 ; 0x0fc: 0x00000000

kfdhdb.ub4spare[1]:                   0 ; 0x100: 0x00000000

kfdhdb.ub4spare[2]:                   0 ; 0x104: 0x00000000

kfdhdb.ub4spare[3]:                   0 ; 0x108: 0x00000000

kfdhdb.ub4spare[4]:                   0 ; 0x10c: 0x00000000

kfdhdb.ub4spare[5]:                   0 ; 0x110: 0x00000000

kfdhdb.ub4spare[6]:                   0 ; 0x114: 0x00000000

kfdhdb.ub4spare[7]:                   0 ; 0x118: 0x00000000

kfdhdb.ub4spare[8]:                   0 ; 0x11c: 0x00000000

kfdhdb.ub4spare[9]:                   0 ; 0x120: 0x00000000

kfdhdb.ub4spare[10]:                  0 ; 0x124: 0x00000000

kfdhdb.ub4spare[11]:                  0 ; 0x128: 0x00000000

kfdhdb.ub4spare[12]:                  0 ; 0x12c: 0x00000000

kfdhdb.ub4spare[13]:                  0 ; 0x130: 0x00000000

kfdhdb.ub4spare[14]:                  0 ; 0x134: 0x00000000

kfdhdb.ub4spare[15]:                  0 ; 0x138: 0x00000000

kfdhdb.ub4spare[16]:                  0 ; 0x13c: 0x00000000

kfdhdb.ub4spare[17]:                  0 ; 0x140: 0x00000000

kfdhdb.ub4spare[18]:                  0 ; 0x144: 0x00000000

kfdhdb.ub4spare[19]:                  0 ; 0x148: 0x00000000

kfdhdb.ub4spare[20]:                  0 ; 0x14c: 0x00000000

kfdhdb.ub4spare[21]:                  0 ; 0x150: 0x00000000

kfdhdb.ub4spare[22]:                  0 ; 0x154: 0x00000000

kfdhdb.ub4spare[23]:                  0 ; 0x158: 0x00000000

kfdhdb.ub4spare[24]:                  0 ; 0x15c: 0x00000000

kfdhdb.ub4spare[25]:                  0 ; 0x160: 0x00000000

kfdhdb.ub4spare[26]:                  0 ; 0x164: 0x00000000

kfdhdb.ub4spare[27]:                  0 ; 0x168: 0x00000000

kfdhdb.ub4spare[28]:                  0 ; 0x16c: 0x00000000

kfdhdb.ub4spare[29]:                  0 ; 0x170: 0x00000000

kfdhdb.ub4spare[30]:                  0 ; 0x174: 0x00000000

kfdhdb.ub4spare[31]:                  0 ; 0x178: 0x00000000

kfdhdb.ub4spare[32]:                  0 ; 0x17c: 0x00000000

kfdhdb.ub4spare[33]:                  0 ; 0x180: 0x00000000

kfdhdb.ub4spare[34]:                  0 ; 0x184: 0x00000000

kfdhdb.ub4spare[35]:                  0 ; 0x188: 0x00000000

kfdhdb.ub4spare[36]:                  0 ; 0x18c: 0x00000000

kfdhdb.ub4spare[37]:                  0 ; 0x190: 0x00000000

kfdhdb.ub4spare[38]:                  0 ; 0x194: 0x00000000

kfdhdb.ub4spare[39]:                  0 ; 0x198: 0x00000000

kfdhdb.ub4spare[40]:                  0 ; 0x19c: 0x00000000

kfdhdb.ub4spare[41]:                  0 ; 0x1a0: 0x00000000

kfdhdb.ub4spare[42]:                  0 ; 0x1a4: 0x00000000

kfdhdb.ub4spare[43]:                  0 ; 0x1a8: 0x00000000

kfdhdb.ub4spare[44]:                  0 ; 0x1ac: 0x00000000

kfdhdb.ub4spare[45]:                  0 ; 0x1b0: 0x00000000

kfdhdb.ub4spare[46]:                  0 ; 0x1b4: 0x00000000

kfdhdb.ub4spare[47]:                  0 ; 0x1b8: 0x00000000

kfdhdb.ub4spare[48]:                  0 ; 0x1bc: 0x00000000

kfdhdb.ub4spare[49]:                  0 ; 0x1c0: 0x00000000

kfdhdb.ub4spare[50]:                  0 ; 0x1c4: 0x00000000

kfdhdb.ub4spare[51]:                  0 ; 0x1c8: 0x00000000

kfdhdb.ub4spare[52]:                  0 ; 0x1cc: 0x00000000

kfdhdb.ub4spare[53]:                  0 ; 0x1d0: 0x00000000

kfdhdb.acdb.aba.seq:                  0 ; 0x1d4: 0x00000000

kfdhdb.acdb.aba.blk:                  0 ; 0x1d8: 0x00000000

kfdhdb.acdb.ents:                     0 ; 0x1dc: 0x0000

kfdhdb.acdb.ub2spare:                 0 ; 0x1de: 0x0000

########################

2. Restore the database control file and identify the datafiles

After the disk header has been repaired, point amdu at the ASM disk and extract the control file, identified by its ASM file number, from the disk.

2.1 Find the control file information

First find the parameter settings from the database's previous startup in the alert log:

Starting up:

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 – 64bit Production

With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,

Data Mining, Oracle Database Vault and Real Application Testing options.

ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1

System name:    Linux

Node name:      luda5.luda.com

Release:        3.8.13-16.2.1.el6uek.x86_64

Version:        #1 SMP Thu Nov 7 17:01:44 PST 2013

Machine:        x86_64

Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/dbhome_1/dbs/initludarac1.ora

System parameters with non-default values:

processes                = 300

sessions                 = 472

spfile                   = “+DATA/ludarac/spfileludarac.ora”

sga_target               = 600M

control_files            = “+DATA/ludarac/controlfile/current.261.939458967”

control_files            = “+DATA/ludarac/controlfile/current.260.939458967”

 

2.2 Restore the control file with amdu

The listing above shows that the control file is mirrored; there are two copies.

Here we restore file number 261 with the following command:

Note: the -diskstring '/dev/asm-diskc' argument can also be written as -diskstring '/dev/asm-disk*' and so on, i.e. the asm_diskstring value.

[grid@luda5 ~]$ amdu -diskstring '/dev/asm-diskc' -extract 'DATA.261'

amdu_2017_04_13_13_32_02/

[grid@luda5 ~]$ cd amdu_2017_04_13_13_32_02/

[grid@luda5 amdu_2017_04_13_13_32_02]$ ls

DATA_261.f  report.txt

[grid@luda5 amdu_2017_04_13_13_32_02]$ mv DATA_261.f /u01/app/oracle/oradata/

After the extraction, the file is moved to /u01/app/oracle/oradata/, the directory newly chosen to hold the Oracle datafiles.

 

 

2.3 Find the datafile information from the control file

There are two methods:

1. Start the database to MOUNT and query v$dbfile;

Note that in a RAC environment the cluster must be started before the database instance can be started; otherwise the following error is raised:

[oracle@luda5 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Thu Apr 13 13:39:42 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

 

SQL> startup nomount pfile='/home/oracle/restore.ora';

ORA-29702: error occurred in Cluster Group Service operation

Here, after the GI cluster has been started, the parameter information from the previous database startup found in the alert log is used to build the following parameter file:

[oracle@luda5 ~]$ cat restore.ora

processes                = 300

sessions                 = 472

sga_target               = 600M

control_files            = “/u01/app/oracle/oradata/DATA_261.f”

db_block_size            = 8192

compatible               = “11.2.0.4.0”

log_archive_format       = “%t_%s_%r.dbf”

db_recovery_file_dest_size= 4407M

undo_tablespace          = “UNDOTBS1”

remote_login_passwordfile= “EXCLUSIVE”

audit_file_dest          = “/u01/app/oracle/admin/ludarac/adump”

audit_trail              = “DB”

db_name                  = “ludarac”

open_cursors             = 300

pga_aggregate_target     = 100M

diagnostic_dest          = “/u01/app/oracle”

[oracle@luda5 ~]$ sqlplus / as sysdba

 

SQL*Plus: Release 11.2.0.4.0 Production on Thu Apr 13 14:19:23 2017

 

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

 

Connected to an idle instance.

 

SQL> startup nomount pfile='/home/oracle/recover2.ora'

ORACLE instance started.

 

Total System Global Area  626327552 bytes

Fixed Size                  2255832 bytes

Variable Size             339739688 bytes

Database Buffers          281018368 bytes

Redo Buffers                3313664 bytes

SQL> show parameter cont

 

NAME                                 TYPE        VALUE

———————————— ———– ——————————

control_file_record_keep_time        integer     7

control_files                        string      /u01/app/oracle/oradata/DATA_2

61.f

control_management_pack_access       string      DIAGNOSTIC+TUNING

global_context_pool_size             string

SQL> alter database mount;

alter database mount

*

ERROR at line 1:

ORA-00221: error on write to control file

ORA-00206: error in writing (block 1, # blocks 1) of control file

ORA-00202: control file: ‘/u01/app/oracle/oradata/DATA_261.f’

ORA-27041: unable to open file

Linux-x86_64 Error: 13: Permission denied

Additional information: 3

 

Fix the permissions accordingly:

[oracle@luda5 ~]$ ls -al /u01/app/oracle/oradata/DATA_261.f

-rw-r–r– 1 grid oinstall 18497536 Apr 13 13:32 /u01/app/oracle/oradata/DATA_261.f

 

[root@luda5 ~]# cd /u01/app/oracle/oradata/

[root@luda5 oradata]# chown -R oracle:oinstall DATA_2*

Re-run the database MOUNT command:

SQL> alter database mount;

 

Database altered.

Query the datafile information:

SQL> col name for a60

SQL> select * from v$dbfile;

 

FILE# NAME

———- ————————————————————

4 +DATA/ludarac/datafile/users.259.939458821

3 +DATA/ludarac/datafile/undotbs1.258.939458821

2 +DATA/ludarac/datafile/sysaux.257.939458821

1 +DATA/ludarac/datafile/system.256.939458821

5 +DATA/ludarac/datafile/undotbs2.267.939459205

 

 

2. Use the strings command on the control file to find the datafile information:

[oracle@luda5 oradata]$ cp DATA_261.f DATA_261.fbak

[oracle@luda5 oradata]$ strings DATA_261.fbak

………….. the following useful information appears:

+DATA/ludarac/datafile/users.259.939458821

+DATA/ludarac/datafile/undotbs1.258.939458821

+DATA/ludarac/datafile/sysaux.257.939458821

+DATA/ludarac/datafile/system.256.939458821

+DATA/ludarac/tempfile/temp.266.939458999

+DATA/ludarac/datafile/undotbs2.267.939459205

……………….

 

 

3. Restore the datafiles

Using the datafile information obtained from the control file in the previous step, extract the corresponding datafiles directly from the ASM disks with amdu:

The argument to -extract takes the form dg_name.asmfile_id.

Reading the ASM disk header with kfed (fields such as kfdhdb.dskname and kfdhdb.grpname) identifies the disk and the disk group it belongs to, so the correct disk path can be used;

and the datafile names recorded in the control file contain both the disk group name and each datafile's file number within that disk group.
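
To double-check the disk group from the disk header itself, something like the following can be used (a minimal sketch; kfed is the standard ASM metadata dump tool, and the disk path is the one used throughout this note):

[grid@luda5 ~]$ kfed read /dev/asm-diskc | grep -E 'kfdhdb.dskname|kfdhdb.grpname'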

3.1 Extract the datafiles with amdu

 

[grid@luda5 ~]$ amdu -diskstring '/dev/asm-diskc' -extract 'DATA.256'

amdu_2017_04_13_13_48_58/

[grid@luda5 ~]$ cd amdu_2017_04_13_13_48_58/

[grid@luda5 amdu_2017_04_13_13_48_58]$ ls

DATA_256.f  report.txt

[grid@luda5 amdu_2017_04_13_13_48_58]$ du -sh ./*

751M    ./DATA_256.f

4.0K    ./report.txt

[grid@luda5 amdu_2017_04_13_13_48_58]$ mv DATA_256.f /u01/app/oracle/oradata/

[grid@luda5 amdu_2017_04_13_13_48_58]$ cd

[grid@luda5 ~]$ amdu -diskstring '/dev/asm-diskc' -extract 'DATA.257'

amdu_2017_04_13_13_51_22/

[grid@luda5 ~]$ amdu -diskstring '/dev/asm-diskc' -extract 'DATA.258'

amdu_2017_04_13_13_53_45/

[grid@luda5 ~]$ amdu -diskstring '/dev/asm-diskc' -extract 'DATA.259'

amdu_2017_04_13_13_55_45/

[grid@luda5 ~]$ amdu -diskstring '/dev/asm-diskc' -extract 'DATA.267'

amdu_2017_04_13_13_56_20/

 

3.2 Move the datafiles and fix permissions

Move the datafiles extracted by amdu to the restore directory /u01/app/oracle/oradata/ and change their ownership as follows:

[root@luda5 oradata]# chown -R oracle:oinstall DATA_2*

[root@luda5 oradata]# ls -al

total 1444184

drwxrwxr-x  2 oracle oinstall      4096 Apr 13 13:58 .

drwxrwxr-x 12 oracle oinstall      4096 Apr 13 13:35 ..

-rw-r--r--  1 oracle oinstall 786440192 Apr 13 13:50 DATA_256.f

-rw-r--r--  1 oracle oinstall 545267712 Apr 13 13:52 DATA_257.f

-rw-r--r--  1 oracle oinstall  78651392 Apr 13 13:53 DATA_258.f

-rw-r--r--  1 oracle oinstall   5251072 Apr 13 13:55 DATA_259.f

-rw-r--r--  1 oracle oinstall  18497536 Apr 13 13:32 DATA_261.f

-rw-r--r--  1 oracle oinstall  18497536 Apr 13 13:45 DATA_261.fbak

-rw-r--r--  1 oracle oinstall  26222592 Apr 13 13:56 DATA_267.f

 

########################

4. Open the database

4.1 Update the datafile locations

First update the datafile locations recorded in the control file, using the FILE#-to-name mapping from v$dbfile (a sketch that generates these rename statements is shown below):
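
As a convenience, something along these lines could generate the rename statements instead of writing them by hand, assuming the DATA_<asm_file_number>.f naming produced by amdu (the regexp pulls the ASM file number out of the OMF file name):

select 'alter database rename file '''||name||''' to ''/u01/app/oracle/oradata/DATA_'||
       regexp_substr(name, '\.([0-9]+)\.', 1, 1, null, 1)||'.f'';' as rename_cmd
  from v$dbfile;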

alter database rename file '+DATA/ludarac/datafile/users.259.939458821'    to '/u01/app/oracle/oradata/DATA_259.f';

alter database rename file '+DATA/ludarac/datafile/undotbs1.258.939458821' to '/u01/app/oracle/oradata/DATA_258.f';

alter database rename file '+DATA/ludarac/datafile/sysaux.257.939458821'   to '/u01/app/oracle/oradata/DATA_257.f';

alter database rename file '+DATA/ludarac/datafile/system.256.939458821'   to '/u01/app/oracle/oradata/DATA_256.f';

alter database rename file '+DATA/ludarac/datafile/undotbs2.267.939459205' to '/u01/app/oracle/oradata/DATA_267.f';

Verify the result:

SQL> select * from v$dbfile order by 1;

 

FILE# NAME

---------- ------------------------------------------------------------

1 /u01/app/oracle/oradata/DATA_256.f

2 /u01/app/oracle/oradata/DATA_257.f

3 /u01/app/oracle/oradata/DATA_258.f

4 /u01/app/oracle/oradata/DATA_259.f

5 /u01/app/oracle/oradata/DATA_267.f

 

Check whether the restored datafiles will need recovery at OPEN time; this depends on whether the database was shut down cleanly beforehand:

SQL> select ctl.FILE#,ctl.CHECKPOINT_CHANGE# as "ctl-CHECKPOINT_CHANGE#",ctl.LAST_CHANGE# as "ctl-LAST_CHANGE#",

2  dbf.CHECKPOINT_CHANGE# as "dbf-CHECKPOINT_CHANGE#",dbf.status,dbf.fuzzy from v$datafile ctl,v$datafile_header dbf where ctl.file#=dbf.file#;

 

 

FILE# ctl-CHECKPOINT_CHANGE# ctl-LAST_CHANGE# dbf-CHECKPOINT_CHANGE# STATUS  FUZ

---------- ---------------------- ---------------- ---------------------- ------- ---

1                1153651                                 1153651 ONLINE  YES

2                1153651                                 1153651 ONLINE  YES

3                1153651                                 1153651 ONLINE  YES

4                1153651                                 1153651 ONLINE  YES

5                1153651                                 1153651 ONLINE  YES

The FUZZY=YES flag in v$datafile_header shows that the database was shut down abnormally (e.g. shutdown abort, a killed critical background process, or power loss). The datafile header checkpoints are consistent with one another, so instance (crash) recovery is what would normally be required at OPEN.

 

4.2 Recreate the control file

We do not simulate extracting the redo log files from the disks with amdu here; instead we demonstrate how to open the database via incomplete recovery when no redo logs are available.

The control file is recreated with the RESETLOGS option and the database is then opened with RESETLOGS:

-- After the RESETLOGS open, steps such as adding a tempfile to the temporary tablespace are still needed; they are not shown here (a sketch follows).
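
For example, a minimal sketch of the tempfile step (the path and size here are hypothetical):

alter tablespace TEMP add tempfile '/u01/app/oracle/oradata/temp01.dbf' size 200M autoextend on;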

SQL>  startup mount pfile='/home/oracle/restore.ora';

ORACLE instance started.

 

Total System Global Area  626327552 bytes

Fixed Size                  2255832 bytes

Variable Size             184550440 bytes

Database Buffers          436207616 bytes

Redo Buffers                3313664 bytes

Database mounted.

 

SQL> alter database backup controlfile to trace as '/home/oracle/ctl.bak';

 

Database altered.

[oracle@luda5 ~]$ sqlplus / as sysdba

 

SQL*Plus: Release 11.2.0.4.0 Production on Thu Apr 13 15:02:35 2017

 

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

 

Connected to an idle instance.

 

SQL>  startup nomount pfile='/home/oracle/restore.ora';

ORACLE instance started.

 

Total System Global Area  626327552 bytes

Fixed Size                  2255832 bytes

Variable Size             184550440 bytes

Database Buffers          436207616 bytes

Redo Buffers                3313664 bytes

SQL> CREATE CONTROLFILE REUSE DATABASE "ludaRAC" RESETLOGS  ARCHIVELOG

2      MAXLOGFILES 192

3      MAXLOGMEMBERS 3

4      MAXDATAFILES 1024

5      MAXINSTANCES 32

6      MAXLOGHISTORY 292

7  LOGFILE

8    GROUP 1 (

9      '/u01/app/oracle/oradata/red01.log'

10    ) SIZE 50M BLOCKSIZE 512,

11    GROUP 2 (

12      '/u01/app/oracle/oradata/red02.log'

13    ) SIZE 50M BLOCKSIZE 512

14  -- STANDBY LOGFILE

15  DATAFILE

16    '/u01/app/oracle/oradata/DATA_256.f',

17    '/u01/app/oracle/oradata/DATA_257.f',

18    '/u01/app/oracle/oradata/DATA_258.f',

19    '/u01/app/oracle/oradata/DATA_259.f',

20    '/u01/app/oracle/oradata/DATA_267.f'

21  CHARACTER SET AL32UTF8

22  ;

 

Control file created.

 

 

4.3 Open the database with RESETLOGS

SQL> alter database open resetlogs;

 

Database altered.

 

 

SQL> select open_mode from v$database;

 

OPEN_MODE

--------------------

READ WRITE

 

Alert log entries for this phase:

Thu Apr 13 15:03:35 2017

alter database open resetlogs

RESETLOGS after incomplete recovery UNTIL CHANGE 1181942

.....

Thu Apr 13 15:03:38 2017

Checker run found 1 new persistent data failures

Clearing online redo logfile 2 complete

Online log /u01/app/oracle/oradata/red01.log: Thread 1 Group 1 was previously cleared

Online log /u01/app/oracle/oradata/red02.log: Thread 1 Group 2 was previously cleared

Thu Apr 13 15:03:41 2017

Setting recovery target incarnation to 2

Initializing SCN for created control file

Database SCN compatibility initialized to 1

Thu Apr 13 15:03:41 2017

Assigning activation ID 2696147166 (0xa0b3f0de)

LGWR: STARTING ARCH PROCESSES

Thu Apr 13 15:03:41 2017

ARC0 started with pid=24, OS id=26972

ARC0: Archival started

LGWR: STARTING ARCH PROCESSES COMPLETE

ARC0: STARTING ARCH PROCESSES

Thu Apr 13 15:03:41 2017

ARC1 started with pid=25, OS id=26974

Thread 1 opened at log sequence 1

Current log# 1 seq# 1 mem# 0: /u01/app/oracle/oradata/red01.log

Successful open of redo thread 1

MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set

Thu Apr 13 15:03:41 2017

SMON: enabling cache recovery

Thu Apr 13 15:03:41 2017

ARC2 started with pid=26, OS id=26976

ARC1: Archival started

ARC2: Archival started

ARC1: Becoming the 'no FAL' ARCH

ARC1: Becoming the 'no SRL' ARCH

ARC2: Becoming the heartbeat ARCH

Thu Apr 13 15:03:41 2017

ARC3 started with pid=27, OS id=26978

ARC3: Archival started

ARC0: STARTING ARCH PROCESSES COMPLETE

[26786] Successfully onlined Undo Tablespace 2.

Undo initialization finished serial:0 start:18610244 end:18611614 diff:1370 (13 seconds)

Dictionary check beginning

Tablespace 'TEMP' #3 found in data dictionary,

but not in the controlfile. Adding to controlfile.

Dictionary check complete

Verifying file header compatibility for 11g tablespace encryption..

Verifying 11g file header compatibility for tablespace encryption completed

SMON: enabling tx recovery

Database Characterset is AL32UTF8

No Resource Manager plan active

Thu Apr 13 15:03:46 2017

replication_dependency_tracking turned off (no async multimaster replication found)

Starting background process QMNC

Thu Apr 13 15:03:48 2017

QMNC started with pid=28, OS id=26989

LOGSTDBY: Validating controlfile with logical metadata

LOGSTDBY: Validation complete

Thu Apr 13 15:04:02 2017

Completed: alter database open resetlogs

Oracle amdu tool: introduction and extracting datafiles from ASM disks

SCN synchronization between database nodes in RAC
Starting with version 10.2, the BOC (broadcast-on-commit) scheme is the default; under this scheme the SCN is propagated only when a node's SCN changes (for example, on commit).
BOC propagates the SCN in one of two ways: direct and indirect.

1. The user session commits the transaction (COMMIT); LGWR starts writing the redo, and the session waits on the log file sync event.
2. LGWR sends the commit SCN to the other nodes in one of two ways:
   LGWR sends the commit SCN to a local LMS process (indirect mode; when there are multiple LMS processes, LGWR hashes to pick one, for load balancing);
   LGWR sends the commit SCN directly to an LMS process on the remote node (direct mode; again a hash decides which remote LMS process is used).
3. LGWR writes the transaction's redo entries to the redo log file (this happens concurrently with step 2).
4. The remote node receives the commit SCN and acknowledges it:
   direct mode: the remote LMS process receives the SCN and replies to the local LMS process;
   indirect mode: the local LMS process forwards the SCN to all remote nodes, whose LMS processes receive it and reply to the local LMS process.
5. The I/O that writes the redo entries to the redo log completes and is reported back to LGWR (the order of steps 4 and 5 depends on network and I/O speed and can vary).
6. The local LMS process notifies LGWR once all remote nodes have acknowledged the commit SCN.
7. The local LGWR posts the user process that the transaction is complete.

#################
Step 1: test statements
SQL> alter system set events='trace[GCS_BSCN] disk=low';   -- must be run on both nodes

System altered.

insert into t2 select dbms_flashback.get_system_change_number from dual;
commit;

SQL> select * from t2;

ID
----------
1536519
1536528
1538467
1538851
1540593

SQL> select to_number('1781f2','xxxxxxxxx') from dual;

TO_NUMBER('1781F2','XXXXXXXXX')
-------------------------------
1540594
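
The LGWR and LMS trace files examined in the following steps can be located with a query along these lines (a sketch; on 11g V$PROCESS exposes the PNAME and TRACEFILE columns):

select pname, tracefile from v$process where pname in ('LGWR', 'LMS0', 'LMS1');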
2. Examine the trace file of the LGWR process on node 1:
1.lgwr
*** 2016-12-17 20:42:03.068
2016-12-17 20:42:03.068586 : kjbbcastscn[0x0.1781f2][2][4]    -- direct broadcast of the SCN; hex 1781f2 = 1540594, matching the value captured in the test above
2016-12-17 20:42:03.068650 : kjbsentscn[0x0.1781f3][to 2]
2016-12-17 20:42:03.068785 : kjbmscscn(to 2.1)(nrcvr 1)(ack 0)(scn 0x0.1781f2)

3. Examine the trace file of the LMS process on node 2:
2.lms
*** 2016-12-17 20:42:03.069
2016-12-17 20:42:03.069106 : kjbrcvdscn[0x0.1781f3][from 1][idx
2016-12-17 20:42:03.069265 : kjbrcvdscn[no bscn <= rscn 0x0.1781f3][from 1]
2016-12-17 20:42:03.069282 : kjbmpscn(nnodes 2)(from 1)(scn 0x0.1781f2)    -- SCN 0x1781f2 received from node 1
2016-12-17 20:42:03.069327 : kjbmscscn(to 1.1)(nrcvr 1)(ack 1)(scn 0x0.1781f2)    -- the message is acknowledged (ACK)
2016-12-17 20:42:03.069370 : kjbsentscn[0x0.1781f3][to 1]

4. Examine the trace file of the LMS process on node 1 (in indirect mode the LMS trace would also contain kjbsendscn entries):

*** 2016-12-17 20:42:03.069
2016-12-17 20:42:03.069810 : kjbrcvdscn[0x0.1781f3][from 2][idx
2016-12-17 20:42:03.069872 : kjbrcvdscn[no bscn <= rscn 0x0.1781f3][from 2]
2016-12-17 20:42:03.069886 : kjbmpscnack(nnodes 2)(from 2)(scn 0x0.1781f2)    -- ACK for SCN 0x1781f2 received from node 2

*** 2016-12-17 20:42:12.459

5. Examine the trace file of the LGWR process on node 1 again:
*** 2016-12-17 20:42:03.154
2016-12-17 20:42:03.154085 : kjbmaxackscn[mscn 0x0.1781f3][nnodes 2][pend -2115384694]    -- can be read as: node 2's ACK for the SCN has been received (?)
2016-12-17 20:42:03.154140 : kcrfw_post[0x0.1781f2][slot 0][mscn 0x0.1781f3][real 1]    -- LGWR write complete
2016-12-17 20:42:03.154161 : kcrfw_post[lgwr post][lscn 0x0.1781f3][oods 0x0.1781f1][nods 0x0.1781f3]

-------------------------------------
1.lgwr    -- in this test run the LGWR trace does not show the remote node's SCN acknowledgement being received before the post; note the 'scn not acked' entry below:
*** 2016-12-17 20:08:00.032
2016-12-17 20:08:00.032857 : kjbbcastscn[0x0.177b24][2][4]
2016-12-17 20:08:00.032916 : kjbsentscn[0x0.177b25][to 2]
2016-12-17 20:08:00.033108 : kjbmscscn(to 2.1)(nrcvr 1)(ack 0)(scn 0x0.177b24)
2016-12-17 20:08:00.033644 : kjbmaxackscn[mscn 0x0.177b23][nnodes 2][pend 1]
2016-12-17 20:08:00.033665 : kcrfw_post[0x0.177b24][slot 0][mscn 0x0.177b23][real 1]
2016-12-17 20:08:00.033674 : kcrfw_post[scn not acked][lscn 0x0.177b25][oods 0x0.177b23][nods 0x0.177b24]    -- here it is 'scn not acked'

2. 2.lms
*** 2016-12-17 20:08:00.033
2016-12-17 20:08:00.033787 : kjbmpscn(nnodes 2)(from 1)(scn 0x0.177b24)
2016-12-17 20:08:00.033813 : kjbmscscn(to 1.1)(nrcvr 1)(ack 1)(scn 0x0.177b24)
2016-12-17 20:08:00.033839 : kjbsentscn[0x0.177b25][to 1]

3. 1.lms
*** 2016-12-17 20:08:00.034
2016-12-17 20:08:00.034879 : kjbrcvdscn[0x0.177b25][from 2][idx
2016-12-17 20:08:00.034933 : kjbmaxackscn[mscn 0x0.177b25][nnodes 2][pend 1]
2016-12-17 20:08:00.034941 : kjbrcvdscn[old ods 0x0.177b24][new 0x0.177b25][mscn 0x0.177b25]
2016-12-17 20:08:00.034962 : kjb_post_fg[kjbrcvdscn][scn 0x0.177b24][wait time 1319us]
2016-12-17 20:08:00.034971 : kjbmpscnack(nnodes 2)(from 2)(scn 0x0.177b24)

4. 1.lgwr
2016-12-17 20:08:00.033665 : kcrfw_post[0x0.177b24][slot 0][mscn 0x0.177b23][real 1]
2016-12-17 20:08:00.033674 : kcrfw_post[scn not acked][lscn 0x0.177b25][oods 0x0.177b23][nods 0x0.177b24]

Oracle RAC: introduction and testing of SCN synchronization between nodes