Skip to content

Grid and Cluster

使用kfed修复损坏的asm disk header以及恢复原理测试.(disk header backup in au no.1)

In ASM versions 11.1.0.7 and later, the ASM disk header block is backed up in the second last ASM metadata block in the allocation unit 1.
Kfed parameters

aun – Allocation Unit (AU) number to read from. Default is AU0, or the very beginning of the ASM disk.
aus – AU size. Default is 1048576 (1MB). Specify the aus when reading from a disk group with non-default AU size.
blkn – block number to read. Default is block 0, or the very first block of the AU.
dev – ASM disk or device name. Note that the keyword dev can be omitted, but the ASM disk name is mandatory.
Understanding ASM disk layout

Read ASM disk header block from AU[0]

[root@grac41 Desktop]# kfed read  /dev/asm_test_1G_disk1 | egrep 'name|size|type'
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD   <-- ASM disk header
kfdhdb.dskname:               TEST_0000 ; 0x028: length=9          <-- ASM disk name
kfdhdb.grpname:                    TEST ; 0x048: length=4          <-- ASM DG name
kfdhdb.fgname:                TEST_0000 ; 0x068: length=9          <-- ASM Failgroup
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200            <-- Disk sector size
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000            <-- ASM block size
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000        <-- AU size : 1 Mbyte
kfdhdb.dsksize:                    1023 ; 0x0c4: 0x000003ff        <-- ASM disk size : 1 GByte

Check ASM block types for the first 2 AUs
AU[0] :

[root@grac41 Desktop]# kfed find /dev/asm_test_1G_disk1
Block 0 has type 1
Block 1 has type 2
Block 2 has type 3
Block 3 has type 3
Block 4 has type 3
Block 5 has type 3
Block 6 has type 3
Block 7 has type 3
Block 8 has type 3
Block 9 has type 3
Block 10 has type 3
..
Block 252 has type 3
Block 253 has type 3
Block 254 has type 3
Block 255 has type 3

AU[1] :

[root@grac41 Desktop]#  kfed find /dev/asm_test_1G_disk1 aun=1
Block 256 has type 17
Block 257 has type 17
Block 258 has type 13
Block 259 has type 18
Block 260 has type 13
..
Block 508 has type 13
Block 509 has type 13
Block 510 has type 1
Block 511 has type 19

Summary :

–> Disk header size is 512 bytes
AU size = 1Mbyte –> AU block size = 4096
This translates to 1048576 / 4096 = 256 blocks to read an AU ( start with block 0 – 255 )
Block 510 and block 0 storing an ASM disk header ( == type 1 )

Run the kfed command below if you interested in a certain ASM block type ( use output from kfed find to the type info )
[root@grac41 Desktop]# kfed read /dev/asm_test_1G_disk1 aun=1 blkn=255 | egrep ‘type’
kfbh.type: 19 ; 0x002: KFBTYP_HBEAT

Some ASM block types

[root@grac41 Desktop]# kfed read  /dev/asm_test_1G_disk1 aun=0 blkn=0  | egrep 'type'
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.type:                            5 ; 0x002: KFBTYP_LISTHEAD
kfbh.type:                           13 ; 0x002: KFBTYP_PST_NONE
kfbh.type:                           18 ; 0x002: KFBTYP_PST_DTA
kfbh.type:                           19 ; 0x002: KFBTYP_HBEAT

Repair ASM disk header block in AU[0] with kfed repair

In ASM versions 11.1.0.7 and later, the ASM disk header block is backed up in the second last ASM metadata block in the allocation unit 1.
Verify ASM DISK Header block located in AU[0] and AU[1]
AU[0] :

[root@grac41 Desktop]# kfed read  /dev/asm_test_1G_disk1 aun=0 blkn=0 | egrep 'name|size|type'
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname:               TEST_0000 ; 0x028: length=9
kfdhdb.grpname:                    TEST ; 0x048: length=4
kfdhdb.fgname:                TEST_0000 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize:                    1023 ; 0x0c4: 0x000003ff

AU[1] :

[root@grac41 Desktop]# kfed read  /dev/asm_test_1G_disk1 aun=1 blkn=254  | egrep 'name|size|type'
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname:               TEST_0000 ; 0x028: length=9
kfdhdb.grpname:                    TEST ; 0x048: length=4
kfdhdb.fgname:                TEST_0000 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize:                    1023 ; 0x0c4: 0x000003ff

Erase Disk header block in first AU ( aun=0 blkn=0 )

# dd if=/dev/zero of=/dev/asm_test_1G_disk1  bs=4096 count=1

Verify ASM disk header

# kfed read /dev/asm_test_1G_disk1 aun=0 blkn=0
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
--> Corrupted ASM disk header detected in AU [0]

Repair disk header in AU[0] with kfed

[grid@grac41 ASM]$ kfed repair  /dev/asm_test_1G_disk1
[grid@grac41 ASM]$ kfed read /dev/asm_test_1G_disk1 aun=0 blkn=0
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname:               TEST_0000 ; 0x028: length=9
kfdhdb.grpname:                    TEST ; 0x048: length=4
kfdhdb.fgname:                TEST_0000 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize:                    1023 ; 0x0c4: 0x000003ff
--> kfed repair worked - Disk header restored

Can kfed repair the Disk header block stored in the 2.nd AU ?

Delete Disk header block in AU[1]
First use dd to figure out whether we are getting the correct block

[grid@grac41 ASM]$  dd if=/dev/asm_test_1G_disk1 of=-  bs=4096 count=1 skip=510 ; strings block1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000464628 s, 8.8 MB/s
ORCLDISK
TEST_0000
TEST
TEST_0000
--> looks like an ASM disk header - go ahead and erase that block

[grid@grac41 ASM]$  dd if=/dev/zero of=/dev/asm_test_1G_disk1  bs=4096 count=1  seek=510
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00644028 s, 636 kB/s

Verify ASM disk header block in AU[1]

[grid@grac41 ASM]$ kfed read /dev/asm_test_1G_disk1 aun=1 blkn=254
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
--> Corrupted ASM disk header detected

[grid@grac41 ASM]$ kfed repair  /dev/asm_test_1G_disk1
KFED-00320: Invalid block num1 = [0], num2 = [1], error = [endian_kfbh]
--> kfed repair doesn' work

Repair block with dd

grid@grac41 ASM]$ dd if=/dev/asm_test_1G_disk1  bs=4096  count=1 of=/dev/asm_test_1G_disk1  bs=4096 count=1  seek=510
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.0306682 s, 134 kB/s
[grid@grac41 ASM]$ kfed read /dev/asm_test_1G_disk1 aun=0 blkn=0
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname:               TEST_0000 ; 0x028: length=9
kfdhdb.grpname:                    TEST ; 0x048: length=4
kfdhdb.fgname:                TEST_0000 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize:                    1023 ; 0x0c4: 0x000003ff

# kfed read /dev/asm_test_1G_disk1 aun=1 blkn=254
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname:               TEST_0000 ; 0x028: length=9
kfdhdb.grpname:                    TEST ; 0x048: length=4
kfdhdb.fgname:                TEST_0000 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize:                    1023 ; 0x0c4: 0x000003ff

Summary:

to fix the backup block or the ASM disk header in AU 1 block you need to use dd

Reference:

http://laurent-leturgez.com/2012/11/12/how-asm-disk-header-block-repair-works/
http://asmsupportguy.blogspot.fr/2010/04/kfed-asm-metadata-editor.html
http://asmsupportguy.blogspot.co.uk/2011/08/asm-disk-header.html

由繁化简:快速解析asm disk header结构

在asm instance crash,特定的一些场景下无法启动asm,为了恢复数据或者其他需要则需要直接从disk中读取数据,此时asm disk的number则为必需获取的信息,特别是在一些不规范的disk命名环境,因此需要用kfed读取磁盘头依次获取信息。

在disk中

kfdhdb.dsknum 代表磁盘号
kfdhdb.grptyp 代表磁盘类型
kfdhdb.dskname 代表磁盘名字
kfdhdb.grpname 代表磁盘所在asm diskgroup 名字
kfdhdb.blksize 代表磁盘的块单元大小
kfdhdb.dsksize 代表磁盘的全部大小

因此识别asm disk所属的asm diskgroup number以及asm disk number只需要在用kfed read asm disk时候加上dsknum以及grpname来区分即可
例如

 kfed read /dev/oracleasm/disks/VOL01   | grep dsknum

或者写成批量处理的脚本,效率会更高。
通过此种方式找到0号磁盘的情况下,通过识别2号au的的file 1,就可以找到想恢复的文件了。

案例:

[oracle@oradb bin]$ kfed read /dev/oracleasm/disks/VOL01
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check:                  3091711072 ; 0x00c: 0xb847c460
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:    ORCLDISKVOL01 ; 0x000: length=13
kfdhdb.driver.reserved[0]:    810307414 ; 0x008: 0x304c4f56
kfdhdb.driver.reserved[1]:           49 ; 0x00c: 0x00000031
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000                  //这里是磁盘号。
kfdhdb.grptyp:                        3 ; 0x026: KFDGTP_HIGH             //这里是磁盘类型。
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:                   VOL01 ; 0x028: length=5                //磁盘名称。
kfdhdb.grpname:                    DATA ; 0x048: length=4                //磁盘所属的磁盘组。
kfdhdb.fgname:                    VOL01 ; 0x068: length=5
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             32942006 ; 0x0a8: HOUR=0x16 DAYS=0x1d MNTH=0x9 YEAR=0x7da
kfdhdb.crestmp.lo:            449689600 ; 0x0ac: USEC=0x0 MSEC=0x36e SECS=0x2c MINS=0x6
kfdhdb.mntstmp.hi:             32942646 ; 0x0b0: HOUR=0x16 DAYS=0x11 MNTH=0xa YEAR=0x7da
kfdhdb.mntstmp.lo:           1573951488 ; 0x0b4: USEC=0x0 MSEC=0x26 SECS=0x1d MINS=0x17
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000                    //这里指每个block的大小。
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                   10236 ; 0x0c4: 0x000027fc             //这里是磁盘的大小。
kfdhdb.pmcnt:                         2 ; 0x0c8: 0x00000002
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002
kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000
kfdhdb.dbcompat:              168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi:             32942006 ; 0x0e4: HOUR=0x16 DAYS=0x1d MNTH=0x9 YEAR=0x7da
kfdhdb.grpstmp.lo:            448217088 ; 0x0e8: USEC=0x0 MSEC=0x1d0 SECS=0x2b MINS=0x6
kfdhdb.ub4spare[0]:                   0 ; 0x0ec: 0x00000000
kfdhdb.ub4spare[1]:                   0 ; 0x0f0: 0x00000000
kfdhdb.ub4spare[2]:                   0 ; 0x0f4: 0x00000000
kfdhdb.ub4spare[3]:                   0 ; 0x0f8: 0x00000000
kfdhdb.ub4spare[4]:                   0 ; 0x0fc: 0x00000000
kfdhdb.ub4spare[5]:                   0 ; 0x100: 0x00000000
kfdhdb.ub4spare[6]:                   0 ; 0x104: 0x00000000
kfdhdb.ub4spare[7]:                   0 ; 0x108: 0x00000000
kfdhdb.ub4spare[8]:                   0 ; 0x10c: 0x00000000
kfdhdb.ub4spare[9]:                   0 ; 0x110: 0x00000000
kfdhdb.ub4spare[10]:                  0 ; 0x114: 0x00000000
kfdhdb.ub4spare[11]:                  0 ; 0x118: 0x00000000
kfdhdb.ub4spare[12]:                  0 ; 0x11c: 0x00000000
kfdhdb.ub4spare[13]:                  0 ; 0x120: 0x00000000
kfdhdb.ub4spare[14]:                  0 ; 0x124: 0x00000000
kfdhdb.ub4spare[15]:                  0 ; 0x128: 0x00000000
kfdhdb.ub4spare[16]:                  0 ; 0x12c: 0x00000000
kfdhdb.ub4spare[17]:                  0 ; 0x130: 0x00000000
kfdhdb.ub4spare[18]:                  0 ; 0x134: 0x00000000
kfdhdb.ub4spare[19]:                  0 ; 0x138: 0x00000000
kfdhdb.ub4spare[20]:                  0 ; 0x13c: 0x00000000
kfdhdb.ub4spare[21]:                  0 ; 0x140: 0x00000000
kfdhdb.ub4spare[22]:                  0 ; 0x144: 0x00000000
kfdhdb.ub4spare[23]:                  0 ; 0x148: 0x00000000
kfdhdb.ub4spare[24]:                  0 ; 0x14c: 0x00000000
kfdhdb.ub4spare[25]:                  0 ; 0x150: 0x00000000
kfdhdb.ub4spare[26]:                  0 ; 0x154: 0x00000000
kfdhdb.ub4spare[27]:                  0 ; 0x158: 0x00000000
kfdhdb.ub4spare[28]:                  0 ; 0x15c: 0x00000000
kfdhdb.ub4spare[29]:                  0 ; 0x160: 0x00000000
kfdhdb.ub4spare[30]:                  0 ; 0x164: 0x00000000
kfdhdb.ub4spare[31]:                  0 ; 0x168: 0x00000000
kfdhdb.ub4spare[32]:                  0 ; 0x16c: 0x00000000
kfdhdb.ub4spare[33]:                  0 ; 0x170: 0x00000000
kfdhdb.ub4spare[34]:                  0 ; 0x174: 0x00000000
kfdhdb.ub4spare[35]:                  0 ; 0x178: 0x00000000
kfdhdb.ub4spare[36]:                  0 ; 0x17c: 0x00000000
kfdhdb.ub4spare[37]:                  0 ; 0x180: 0x00000000
kfdhdb.ub4spare[38]:                  0 ; 0x184: 0x00000000
kfdhdb.ub4spare[39]:                  0 ; 0x188: 0x00000000
kfdhdb.ub4spare[40]:                  0 ; 0x18c: 0x00000000
kfdhdb.ub4spare[41]:                  0 ; 0x190: 0x00000000
kfdhdb.ub4spare[42]:                  0 ; 0x194: 0x00000000
kfdhdb.ub4spare[43]:                  0 ; 0x198: 0x00000000
kfdhdb.ub4spare[44]:                  0 ; 0x19c: 0x00000000
......

通过x$KFDAT确认asm file的au信息

X$KFDAT (metadata, disk-to-AU mapping table)

该视图的结构图,以及字段含义
x$kfdat

example:

查找spfile的au信息:

sys@+ASM1> select GROUP_KFDAT,NUMBER_KFDAT,AUNUM_KFDAT from x$kfdat where
   fnum_kfdat=(select file_number from v$asm_alias where name='spfiletest1.ora');
GROUP_KFDAT NUMBER_KFDAT AUNUM_KFDAT
----------- ------------ -----------
          1            3         101
          1           20         379


通过以上可以知道,spfile存在在1号diskgroup的3号磁盘和20号磁盘的2个au,文件所在位置在3号磁盘的相对硬盘第一个au的位置为101号,在20号磁盘相对的位置为379号.x$kfdat的AUNUM_KFDAT字段与x$kffxp视图中的au_kffxp为同一信息.我的diskgroup的为normal冗余模式,所以3.101au和20.379au为mirror关系,可以通过x$kffxp.xnum_kffxp验证,mirror的2个au相关的extent number 一致.x$kfdat与x$kffxp有些字段是关联的.

在12c的asm中,oracle在asmcmd中新增2个命令来确认asm file相关au信息,分别为mapextent以及map au:

example:

ASMCMD>  mapextent '+ORCL_MYTEST/ORCL/DATAFILE/mytest.256.8332901607' 1
Disk_Num         AU      Extent_Size
1                211     1
0                211     1

ASMCMD> mapau
usage: mapau [--suppressheader] <dg number> <disk number> <au>
help:  help mapau
ASMCMD> mapau 1 1 107
File_Num         Extent          Extent_Set
261              1273            636

ORA-63999 data file suffered media failure 导致实例Crash

KCF: read, write or open error, block=0xb79ab online=1
        file=85 '/dev/vgpmesdb12/rLV_FEM_PRD_I02'
        error=27063 txt: 'HPUX-ia64 Error: 11: Resource temporarily unavailable
Additional information: -1
Additional information: 8192'
Encountered write error

*** 2015-06-17 20:06:22.714
DDE rules only execution for: ORA 1110
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
----- START DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----
Successfully dispatched
----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (SUCCESS, 0 csec) -----
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
error 63999 detected in background process
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 85 (block # 752043)
ORA-01110: data file 85: '/dev/vgpmesdb12/rLV_FEM_PRD_I02'
ORA-27063: number of bytes read/written is incorrect
HPUX-ia64 Error: 11: Resource temporarily unavailable
Additional information: -1
Additional information: 8192
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+544<-kjzdssdmp()+400<-kjzduptcctx()+432<-kjzdicrshnfy()+128<-$cold_ksuitm()+5872<-$cold_ksbrdp()+2704<-opirip()+1296<-opidrv()+1152<-sou2o()+256<-opimai_real()+352<-ssthrdmain()+576<-main()+336<-main_opd_entry()+80
----- End of Abridged Call Stack Trace -----

*** 2015-06-17 20:06:23.172
DBW1 (ospid: 5833): terminating the instance due to error 63999
ksuitm: waiting up to [5] seconds before killing DIAG(5807)

数据库文件所在设备的部分io错误,导致实例宕机。由于报错ORA-27063: number of bytes read/written is incorrect,跟踪下来是初步怀疑坏块导致,通过效验后未发现坏块,HPUX-ia64 Error: 11: Resource temporarily unavailable的错误引入考虑,初步怀疑是hp的系统内部在做一些操作时候导致/dev/vgpmesdb12/rLV_FEM_PRD_I02设备无法被访问到。
随后在官方找到相似bug 16884689 : DATABASE CRASH DUE TO ORA-27063 HPUX-IA64 ERROR: 11.从整体的诊断看,问题的原因还是因为出现了io问题导致的,而且集群内部是在发生io问题后才发现数据库资源的问题,所以需判断是否hp系统或者io系统各模块的问题.与发现的bug不同场景.

文件io错误时候实例重启受_datafile_write_errors_crash_instance控制影响。

参考:
1.


Description

This fix introduces a notable change in behaviour in that
from 11.2.0.2 onwards an I/O write error to a datafile will
now crash the instance.

Before this fix I/O errors to datafiles not in the system tablespace
offline the respective datafiles when the database is in archivelog mode.
This behavior is not always desirable. Some customers would prefer
that the instance crash due to a datafile write error.

This fix introduces a new hidden parameter to control if the instance
should crash on a write error or not:
 _datafile_write_errors_crash_instance



With this fix:
 If _datafile_write_errors_crash_instance = TRUE (default) then
  any write to a datafile which fails due to an IO error causes
  an instance crash.

 If _datafile_write_errors_crash_instance = FALSE then the behaviour
  reverts to the previous behaviour (before this fix) such that
  a write error to a datafile offlines the file (provided the DB is
  in archivelog mode and the file is not in SYSTEM tablespace in
  which case the instance is aborted)

2.

This is due to a problem with the I/O subsystem.
Issues of this nature are common when there is a problem in the I/O subsystem.
This can include, but is not limited to:

2.1 A bad sector on disk
2.2 An I/O card that is starting to fail
2.3 A bad array cable
2.4 An interruption in network connectivity, in the case of NFS mounts
2.5 Could also be caused by a OS level bug.
etc.
Review the OS Messages file as this will almost certainly reflect errors (for example   Error for Command: write(10) )