Skip to content

Oracle - 22. page

使用kfed修复损坏的asm disk header以及恢复原理测试.(disk header backup in au no.1)

In ASM versions 11.1.0.7 and later, the ASM disk header block is backed up in the second last ASM metadata block in the allocation unit 1.
Kfed parameters

aun – Allocation Unit (AU) number to read from. Default is AU0, or the very beginning of the ASM disk.
aus – AU size. Default is 1048576 (1MB). Specify the aus when reading from a disk group with non-default AU size.
blkn – block number to read. Default is block 0, or the very first block of the AU.
dev – ASM disk or device name. Note that the keyword dev can be omitted, but the ASM disk name is mandatory.
Understanding ASM disk layout

Read ASM disk header block from AU[0]

[root@grac41 Desktop]# kfed read  /dev/asm_test_1G_disk1 | egrep 'name|size|type'
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD   <-- ASM disk header
kfdhdb.dskname:               TEST_0000 ; 0x028: length=9          <-- ASM disk name
kfdhdb.grpname:                    TEST ; 0x048: length=4          <-- ASM DG name
kfdhdb.fgname:                TEST_0000 ; 0x068: length=9          <-- ASM Failgroup
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200            <-- Disk sector size
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000            <-- ASM block size
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000        <-- AU size : 1 Mbyte
kfdhdb.dsksize:                    1023 ; 0x0c4: 0x000003ff        <-- ASM disk size : 1 GByte

Check ASM block types for the first 2 AUs
AU[0] :

[root@grac41 Desktop]# kfed find /dev/asm_test_1G_disk1
Block 0 has type 1
Block 1 has type 2
Block 2 has type 3
Block 3 has type 3
Block 4 has type 3
Block 5 has type 3
Block 6 has type 3
Block 7 has type 3
Block 8 has type 3
Block 9 has type 3
Block 10 has type 3
..
Block 252 has type 3
Block 253 has type 3
Block 254 has type 3
Block 255 has type 3

AU[1] :

[root@grac41 Desktop]#  kfed find /dev/asm_test_1G_disk1 aun=1
Block 256 has type 17
Block 257 has type 17
Block 258 has type 13
Block 259 has type 18
Block 260 has type 13
..
Block 508 has type 13
Block 509 has type 13
Block 510 has type 1
Block 511 has type 19

Summary :

–> Disk header size is 512 bytes
AU size = 1Mbyte –> AU block size = 4096
This translates to 1048576 / 4096 = 256 blocks to read an AU ( start with block 0 – 255 )
Block 510 and block 0 storing an ASM disk header ( == type 1 )

Run the kfed command below if you interested in a certain ASM block type ( use output from kfed find to the type info )
[root@grac41 Desktop]# kfed read /dev/asm_test_1G_disk1 aun=1 blkn=255 | egrep ‘type’
kfbh.type: 19 ; 0x002: KFBTYP_HBEAT

Some ASM block types

[root@grac41 Desktop]# kfed read  /dev/asm_test_1G_disk1 aun=0 blkn=0  | egrep 'type'
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.type:                            5 ; 0x002: KFBTYP_LISTHEAD
kfbh.type:                           13 ; 0x002: KFBTYP_PST_NONE
kfbh.type:                           18 ; 0x002: KFBTYP_PST_DTA
kfbh.type:                           19 ; 0x002: KFBTYP_HBEAT

Repair ASM disk header block in AU[0] with kfed repair

In ASM versions 11.1.0.7 and later, the ASM disk header block is backed up in the second last ASM metadata block in the allocation unit 1.
Verify ASM DISK Header block located in AU[0] and AU[1]
AU[0] :

[root@grac41 Desktop]# kfed read  /dev/asm_test_1G_disk1 aun=0 blkn=0 | egrep 'name|size|type'
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname:               TEST_0000 ; 0x028: length=9
kfdhdb.grpname:                    TEST ; 0x048: length=4
kfdhdb.fgname:                TEST_0000 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize:                    1023 ; 0x0c4: 0x000003ff

AU[1] :

[root@grac41 Desktop]# kfed read  /dev/asm_test_1G_disk1 aun=1 blkn=254  | egrep 'name|size|type'
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname:               TEST_0000 ; 0x028: length=9
kfdhdb.grpname:                    TEST ; 0x048: length=4
kfdhdb.fgname:                TEST_0000 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize:                    1023 ; 0x0c4: 0x000003ff

Erase Disk header block in first AU ( aun=0 blkn=0 )

# dd if=/dev/zero of=/dev/asm_test_1G_disk1  bs=4096 count=1

Verify ASM disk header

# kfed read /dev/asm_test_1G_disk1 aun=0 blkn=0
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
--> Corrupted ASM disk header detected in AU [0]

Repair disk header in AU[0] with kfed

[grid@grac41 ASM]$ kfed repair  /dev/asm_test_1G_disk1
[grid@grac41 ASM]$ kfed read /dev/asm_test_1G_disk1 aun=0 blkn=0
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname:               TEST_0000 ; 0x028: length=9
kfdhdb.grpname:                    TEST ; 0x048: length=4
kfdhdb.fgname:                TEST_0000 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize:                    1023 ; 0x0c4: 0x000003ff
--> kfed repair worked - Disk header restored

Can kfed repair the Disk header block stored in the 2.nd AU ?

Delete Disk header block in AU[1]
First use dd to figure out whether we are getting the correct block

[grid@grac41 ASM]$  dd if=/dev/asm_test_1G_disk1 of=-  bs=4096 count=1 skip=510 ; strings block1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000464628 s, 8.8 MB/s
ORCLDISK
TEST_0000
TEST
TEST_0000
--> looks like an ASM disk header - go ahead and erase that block

[grid@grac41 ASM]$  dd if=/dev/zero of=/dev/asm_test_1G_disk1  bs=4096 count=1  seek=510
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00644028 s, 636 kB/s

Verify ASM disk header block in AU[1]

[grid@grac41 ASM]$ kfed read /dev/asm_test_1G_disk1 aun=1 blkn=254
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
--> Corrupted ASM disk header detected

[grid@grac41 ASM]$ kfed repair  /dev/asm_test_1G_disk1
KFED-00320: Invalid block num1 = [0], num2 = [1], error = [endian_kfbh]
--> kfed repair doesn' work

Repair block with dd

grid@grac41 ASM]$ dd if=/dev/asm_test_1G_disk1  bs=4096  count=1 of=/dev/asm_test_1G_disk1  bs=4096 count=1  seek=510
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.0306682 s, 134 kB/s
[grid@grac41 ASM]$ kfed read /dev/asm_test_1G_disk1 aun=0 blkn=0
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname:               TEST_0000 ; 0x028: length=9
kfdhdb.grpname:                    TEST ; 0x048: length=4
kfdhdb.fgname:                TEST_0000 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize:                    1023 ; 0x0c4: 0x000003ff

# kfed read /dev/asm_test_1G_disk1 aun=1 blkn=254
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname:               TEST_0000 ; 0x028: length=9
kfdhdb.grpname:                    TEST ; 0x048: length=4
kfdhdb.fgname:                TEST_0000 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize:                    1023 ; 0x0c4: 0x000003ff

Summary:

to fix the backup block or the ASM disk header in AU 1 block you need to use dd

Reference:

http://laurent-leturgez.com/2012/11/12/how-asm-disk-header-block-repair-works/
http://asmsupportguy.blogspot.fr/2010/04/kfed-asm-metadata-editor.html
http://asmsupportguy.blogspot.co.uk/2011/08/asm-disk-header.html

_use_adaptive_log_file_sync参数的使用建议

_use_adaptive_log_file_sync该参数在commit次数较多的系统中,post/wait and polling特性对性能影响明显,主要是在写日志上的等待上影响了事务的提交速度.建议在11g中关闭(设置为false).

该参数可以参考以下解述:

Adaptive Log File sync was introduced in 11.2. the feature is exactly enabled since release 11.2.0.3 , It’s enabled through an underscore parameter called _use_adaptive_log_file_sync and the description of this parameter is: adaptively switch between post/wait and polling.

Oracle can switches between the 2 methods:
Post/wait, traditional method for posting completion of writes to redo log

LGWR explicitly posts all processes waiting for the commit to complete.
The advantage of the post/wait method is that sessions should find out almost immediately when the redo has been flushed to disk.

Polling, a new method where the foreground process checks if the LGWR has completed the write.

Foreground processes sleep and poll to see if the commit is complete. The advantage of this new method is to free LGWR from having to inform many processes waiting on commit to complete thereby freeing high CPU usage by the LGWR.If post/wait is selected and the foreground processes fail to receive a post from LGWR, an incident is recorded.diagnostic traces are performed, and polling is used instead of post/wait.

Oracle uses semaphores extensively, If you look for references to “commit” in the Oracle docs, you’ll find the word “post” everywhere when they talk about communication between the foreground processes and LGWR. Now, remember that a COMMIT has two options: first, IMMEDIATE or BATCH and second, WAIT or NOWAIT. It looks like this to me:

Immediate: FG process will post to LGWR, triggering I/O (default)
Batch: FG process will not post LGWR
Wait: LGWR will post FG process when I/O is complete (default)
NoWait: LGWR will not post FG process when I/O is complete

There are five more hidden parameters with that string

select a.ksppinm name, b.ksppstvl value, a.ksppdesc description
from sys.x$ksppi a, sys.x$ksppcv b
where a.indx = b.indx and a.ksppinm like ‘\_%adaptive\_log%’ escape ‘\’
order by name;

sys@ANBOB>/

NAME VALUE DESCRIPTION
————————————————– ———- ———————————————————————-
_adaptive_log_file_sync_high_switch_freq_threshold 3 Threshold for frequent log file sync mode switches (per minute)
_adaptive_log_file_sync_poll_aggressiveness 0 Polling interval selection bias (conservative=0, aggressive=100)
_adaptive_log_file_sync_sched_delay_window 60 Window (in seconds) for measuring average scheduling delay
_adaptive_log_file_sync_use_polling_threshold 200 Ratio of redo synch time to expected poll time as a percentage
_adaptive_log_file_sync_use_postwait_threshold 50 Percentage of foreground load from when post/wait was last used
_use_adaptive_log_file_sync TRUE Adaptively switch between post/wait and polling

So how do you know if you’re using this feature? The most obvious sign will be in the LGWR trace file.
There will be messages each time that a switch happens, looking something like this:
in the case my db name and instance name both is kaoshi .

[oracle@anbob trace]$ pwd
/oracle/diag/rdbms/kaoshi/kaoshi/trace

[oracle@anbob trace]$ grep -n -B 4 “kcrfw_update_adaptive_sync_mode” *_lgwr_*

Adaptive Switching Between Log Write Methods can Cause ‘log file sync’ Waits (文档 ID 1462942.1)

kaoshi_lgwr_3823.trc-147-*** 2013-11-23 12:06:23.288
kaoshi_lgwr_3823.trc-148-Log file sync switching to polling
kaoshi_lgwr_3823.trc-149-Current scheduling delay is 1 usec
kaoshi_lgwr_3823.trc-150-Current approximate redo synch write rate is 30 per sec
kaoshi_lgwr_3823.trc:151:kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=1 current_sync_count_delta=0 switch_sync_count_delta=90
kaoshi_lgwr_3823.trc-152-
kaoshi_lgwr_3823.trc-153-*** 2013-11-23 12:56:23.567
kaoshi_lgwr_3823.trc-154-Log file sync switching to post/wait
kaoshi_lgwr_3823.trc-155-Current approximate redo synch write rate is 0 per sec
kaoshi_lgwr_3823.trc:156:kcrfw_update_adaptive_sync_mode: post->poll long#=36 sync#=131 sync=390 poll=1058 rw=217 ack=0 min_sleep=1058

kaoshi_lgwr_3823.trc-160-Current scheduling delay is 1 usec
kaoshi_lgwr_3823.trc-161-Current approximate redo synch write rate is 43 per sec
kaoshi_lgwr_3823.trc-162-
kaoshi_lgwr_3823.trc-163-*** 2013-11-23 14:37:46.978
kaoshi_lgwr_3823.trc:164:kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=1 current_sync_count_delta=25 switch_sync_count_delta=131
kaoshi_lgwr_3823.trc-165-
kaoshi_lgwr_3823.trc-166-*** 2013-11-23 14:37:46.978
kaoshi_lgwr_3823.trc-167-Log file sync switching to post/wait
kaoshi_lgwr_3823.trc-168-Current approximate redo synch write rate is 8 per sec
kaoshi_lgwr_3823.trc:169:kcrfw_update_adaptive_sync_mode: post->poll long#=86 sync#=121 sync=2 poll=1058 rw=152 ack=0 min_sleep=1058
Disable adaptive log file sync

If there is a need to disable adaptive log file sync this can be done by setting _use_adaptive_log_file_sync = false either in the spfile and restarting the database or dynamically in memory.

“_use_adaptive_log_file_sync” is a dynamic parameter that can be changed at system level.

ALTER SYSTEM SET “_use_adaptive_log_file_sync”= false;
Known Issues with “_use_adaptive_log_file_sync” set to TRUE

See the following documents with information on bugs involving Adaptive Log File Sync:
Document 1462942.1 Adaptive Switching Between Log Write Methods can Cause ‘log file sync’ Waits
Document 13707904.8 Bug 13707904 – LGWR sometimes uses polling, sometimes post/wait
Document 13074706.8 Bug 13074706 – Long “log file sync” waits in RAC not correlated with slow writes

References
Waits for “log file sync” with Adaptive Polling vs Post/Wait Choice Enabled (文档 ID 1541136.1)

关于11g密码大小写验证以及密码延迟验证特性引发的血案所思

在Oracle的11g之前的版本中密码是不区分大小写的(使用双引号强制除外)。在Oracle的11g版本中对此有所增强。从此密码有了大小写的区分,这个大小写敏感特性是通过SEC_CASE_SENSITIVE_LOGON参数来控制的。该参数默认设置为true。

前阵子客户数据从10g迁往11g后,这个参数未修改,新上线第二天全国业务人员输入密码时候参照以前习惯密码全部小写,导致部分业务无法登陆,由于反复尝试登陆导致触发了著名的登陆延迟验证的特性延伸出了library cache lock(Delay after three failed login attempts),导致全线应用响应缓慢假死。当然解决过程当时根据现象,进程达到限制值,做了等待事件模型的分析,发现大量的session都没有成功登陆,很多信息都为null逐判断因为密码延迟验证的原因导致,而密码错误的事情再经过现场验证后发现了大小写敏感的问题。后面就是很熟套的剧情,领导抽了根烟,痛骂了原厂一顿,把此2个特性都关掉后,没有再出现过进程累积的现象。由此问题引发了对密码大小写验证以及密码延迟验证特性的思考.

一:关于sec_case_sensitive_logon的设置

关于sec_case_sensitive_logon参数所关联的密码大小写敏感,我建议在升级数据库系统时候关闭,在新建数据库系统时候关闭。可以配合应用密码策略使用,默认时候可以商讨关闭。

二:密码延迟验证的特性使用与否
关于Delay after three failed login attempts的特性参考yangtingkun的http://blog.itpub.net/4227/viewspace-672925

这个密码延迟特性特性的解决办法有几种,这里我大概描述下:

    1.设置登陆密码验证失败次数超过几次后锁定用户,该办法可以防止用户反复登陆验证,但是也会增加一定的维护工作
    2.通过诊断事件28401关闭密码延迟验证

同时我并不建议用第二种办法,密码延迟验证的特性可以有效遏制恶意破解密码的行为,因此从数据库安全的角度我建议设置密码失败超数锁用户。

关于event 28401

ALTER SYSTEM SET EVENT = '28401 TRACE NAME CONTEXT FOREVER, LEVEL 1' SCOPE = SPFILE

[oracle@ludatou ~]$ oerr ora 28401
28401, 00000, "Event to disable delay after three failed login attempts"
// *Document: NO
// *Cause: N/A
// *Action: Set this event in your environment to disable the login delay
//          which will otherwise take place after three failed login attempts.
// *Note: THIS IS NOT A USER ERROR NUMBER/MESSAGE. THIS DOES NOT NEED TO BE
//        TRANSLATED OR DOCUMENTED.