Skip to content

未分类 - 8. page

正常恢复控制文件并MOUNT数据库
环境:
生产环境进行备份数据库、控制文件
测试环境在之前执行过异机恢复任务,并进行过RESETLOGS操作,并遗留下每次RESETLOGS后的归档日志。

在使用生产环境的备份进行恢复时,报错:
RMAN-06026: some targets not found – aborting restore
RMAN-06100: no channel to

故障现象:
RMAN> restore database; —>此时进行恢复,报错检测此时RMAN-06026

Starting restore at 12-JUN-16
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=157 devtype=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: sid=156 devtype=DISK

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 06/12/2016 14:27:13
RMAN-06026: some targets not found – aborting restore
RMAN-06100: no channel to restore a backup or copy of datafile 7
RMAN-06100: no channel to restore a backup or copy of datafile 6
RMAN-06100: no channel to restore a backup or copy of datafile 5
RMAN-06100: no channel to restore a backup or copy of datafile 4
RMAN-06100: no channel to restore a backup or copy of datafile 3
RMAN-06100: no channel to restore a backup or copy of datafile 2
RMAN-06100: no channel to restore a backup or copy of datafile 1

使用生产环境的备份手动传输到本地时遇到此问题,使用带库(保存15天备份集)时则去查找5/29号的备份。

—————-排查过程
1.查看此时ALERT日志:
Sun Jun 12 13:18:38 2016
Starting ORACLE instance (normal)
……………………省略实例启动的输出
Sun Jun 12 13:18:39 2016
alter database mount —>RMAN中进行恢复控制文件restore controlfile from ‘/oracle/ctlbak’并发出MOUNT命令
Sun Jun 12 13:18:43 2016
Setting recovery target incarnation to 2 —>控制文件中记录的incarnation信息
Sun Jun 12 13:18:43 2016
Successful mount of redo thread 1, with mount id 2622485423
Sun Jun 12 13:18:43 2016
Database mounted in Exclusive Mode
Completed: alter database mount
Sun Jun 12 13:20:52 2016
New incarnation branch detected in ArchiveLog, filename /oracle/product/10.2.0/flash_recovery_area/TESTA/archivelog/2016_05_30/o1_mf_
1_1_cnsfxbfv_.arc
Inspection of file changed rdi from 2 to 3 —>检测到闪回区中的归档信息,incarnation信息增加
Setting recovery target incarnation to 3
Sun Jun 12 13:20:52 2016
Setting recovery target incarnation to 3
Setting recovery target incarnation to 3
Sun Jun 12 13:20:52 2016
New incarnation branch detected in ArchiveLog, filename /oracle/product/10.2.0/flash_recovery_area/TESTA/archivelog/2016_06_09/o1_mf_
1_1_comr5bsb_.arc
Inspection of file changed rdi from 3 to 4—>检测到闪回区中的归档信息,incarnation信息增加
Setting recovery target incarnation to 4
Sun Jun 12 13:20:52 2016

2.RMAN中信息的查询:
RMAN> list incarnation; —>检测此时incarnation信息

using target database control file instead of recovery catalog

List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
——- ——- ——– —————- — ———- ———-
1 1 TESTA 2458714232 PARENT 1 22-JUL-05
2 2 TESTA 2458714232 PARENT 551299 07-MAY-11
3 3 TESTA 2458714232 PARENT 100554368532 30-MAY-16
4 4 TESTA 2458714232 CURRENT 100913181743 09-JUN-16

3.对RESTORE命令进行TRACE并分析:
rman target / debug trace=/oracle/olm/rman_debug
登陆后执行:RMAN> restore database;
TRACE文件内容分析:–259行
DBGMISC: ENTERED krmice [14:34:31.468]
DBGMISC: command to be compiled and executed is: restore [14:34:31.468] (krmice)
DBGMISC: command after this command is: NONE [14:34:31.468] (krmice)
DBGMISC: current incarnation does not matter for restore [14:34:31.468] (krmice)
DBGMISC: ENTERED krmkcrsr [14:34:31.468]

解决办法:
1.第一步MOVE或删除闪回区的归档
2.删除已经恢复的控制文件,重新对控制文件进行恢复。
此时观察ALRET日志中信息,未出现有“检测到闪回区中的归档信息,incarnation信息增加”的情况。
Sun Jun 12 14:44:26 2016
alter database mount
Sun Jun 12 14:44:30 2016
Setting recovery target incarnation to 2
Sun Jun 12 14:44:30 2016
Successful mount of redo thread 1, with mount id 2622505418
Sun Jun 12 14:44:30 2016
Database mounted in Exclusive Mode
Completed: alter database mount
Sun Jun 12 14:46:34 2016
Full restore complete of datafile 2

3.此时进行RECOVER操作,可以正常进行。
后续实验中发现,归档放在非闪回区时,没有这个问题。

因incarnation导致恢复时出现RMAN-06026错误

1. 11gR2 RAC 角色分离信息
11gR2中安装oracle 集群件和数据库软件中的一些group进行简单的介绍。
oinstall : 这个group是GI 和RDBMS软件的拥有者。
dba : 这个group是数据库的dba group, 对数据库具有最高权限。
asmdba : 这个group是asm实例的dba group, 可以启动/关闭实例,挂载/卸载asm 磁盘组。
asmadmin: 这个group是asm的管理员group,它包含asmdba的全部权限,同时还可以增加/删除 asm 磁盘,磁盘组等。

2. ASM共享磁盘及orale/grid用户及GI_HOME/RDBMS_HOME bin目录的oracle程序正确权限
————GI及RDBMS软件安装完成,未DBCA创建数据库时的权限:
[root@bys1 ~]# su – oracle
[oracle@bys1 ~]$ cd $ORACLE_HOME/bin
[oracle@bys1 bin]$ ls -al oracle –RDBMS_HOME的
-rwsr-s–x 1 oracle oinstall 239626665 Nov 9 21:31 oracle
[grid@bys1 ~]$ cd $ORACLE_HOME/bin
[grid@bys1 bin]$ ls -al oracle –GRID_HOME的
-rwsr-s–x 1 grid oinstall 209914471 Nov 9 19:07 oracle
[grid@bys1 bin]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.4.0]

————DBCA创建数据库后的正确权限示例, 即正常的权限:
[oracle@bys1 bin]$ ls -al oracle
-rwsr-s–x 1 oracle asmadmin 239626665 Nov 9 21:31 oracle

[grid@bys1 ~]$ cd $ORACLE_HOME/bin
[grid@bys1 bin]$ ls -al oracle
-rwsr-s–x 1 grid oinstall 209914471 Nov 9 19:07 oracle
[grid@bys1 bin]$ id oracle
uid=502(oracle) gid=501(oinstall) groups=501(oinstall),502(dba),506(asmdba)
[grid@bys1 bin]$ id grid
uid=501(grid) gid=501(oinstall) groups=501(oinstall),504(asmadmin),506(asmdba),507(asmoper)

3.RAC中常见的因GI/RDBMS HOME中oracle程序权限或者ASM使用的磁盘权限有问题引起的问题汇总:

3.1 sqlplus登陆时报错:ORA-12547: TNS:lost contact
在RAC中常见的还有oracle程序的权限不对,
ORA-12547 Errors
The error ORA-12547 indicates that the communication channel has been broken. It’s most often thrown because the other end of the process went away unexpectedly.

参考MOS文档:
Note 1307075.1 Oracle Database Fails to Start with Error ORA-12547
Note 381566.1 connect / as sysdba Fails with Ora-12547 And Tns-12514
ORA-12537 / ORA-12547 or TNS-12518 if Listener (including SCAN Listener) and Database are Owned by Different OS User (文档 ID 1069517.1)
Note 744512.1 Ora-12547: Tns:Lost Contact Creating Database After Clean Installation
导致 Scan VIP 和 Scan Listener(监听程序)出现故障的最常见的 5 个问题 (文档 ID 1602038.1)

3.2 安装完GI与RDBMS软件,未使用DBCA创建数据库。使用手动恢复数据库方法,此时rdmbs_home下oracle程序权限问题引发的错误
RMAN> restore controlfile from ‘/home/oracle/fulldb_SCTTEST_900418795_84′;

Starting restore at 06-JAN-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=98 instance=scttest1 device type=DISK

channel ORA_DISK_1: restoring control file
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 01/06/2016 13:20:29
ORA-19870: error while restoring backup piece /home/oracle/fulldb_SCTTEST_900418795_84
ORA-19504: failed to create file “+DATA/scttest/controlfile/control01.ctl”
ORA-17502: ksfdcre:3 Failed to create file +DATA/scttest/controlfile/control01.ctl
ORA-15001: diskgroup “DATA” does not exist or is not mounted
ORA-15055: unable to connect to ASM instance
ORA-01034: ORACLE not available
ORA-27123: unable to attach to shared memory segment
其它类似报错类型:
SQL> create spfile=’+DATA/jdedb/spfilejdedb1.ora’ from pfile;
create spfile=’+DATA/jdedb/spfilejdedb1.ora’ from pfile
*
ERROR at line 1:
ORA-17502: ksfdcre:4 Failed to create file +DATA/jdedb/spfilejdedb1.ora
ORA-15001: diskgroup “DATA” does not exist or is not mounted
ORA-15040: diskgroup is incomplete

查看alert日志:
Mon Nov 07 14:12:19 2016
Decreasing number of real time LMS from 2 to 0
Mon Nov 07 14:17:50 2016
ORA-15025: could not open disk “/dev/raw/raw2”
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
Mon Nov 07 14:17:50 2016
SUCCESS: diskgroup DATA was dismounted
ERROR: diskgroup DATA was not mounted

3.3 ASM使用的磁盘权限问题导致的报错
11.2.0.3中报错信息如下:
ORA-15045:ASM file name ‘+DATA1’ is not in reference form
ORA-17502:ksfdcre:5 Failed to create file +DATA1
ORA-15081:failed to submit an I/0 operation to a disk;

[oracle@bys1 bin]$ ls -al /dev/sdc /dev/sdd –这可以看到只有GRID用户可读写,要修改。
brw-r—– 1 grid asmadmin 8, 32 Apr 11 19:24 /dev/sdc

4.LINUX “suid”和“sgid”权限简介
Linux 权限模型有两个专门的位,叫做“suid”和“sgid”。当设置了一个可执行程序的“suid”这一位时,它将代表可执行文件的所有者运行,而不是代表启动程序的人运行。
关于6751权限的说明:
6751分别指定了ugoa的权限:
第一位6是suid+sgid权限
第二位7代表g(组)有读、写、执行权限
第三位5代表o(其它用户)有读、执行权限
第四位1代表a(所有者、组、其它用户)有执行权限
详细可以参考其它BLOG:

http://www.cnblogs.com/snake-hand/p/3161511.html

http://blog.csdn.net/xiaocainiaoshangxiao/article/details/17378611
—————————-本文内容参考MOS文档:
Oracle Database Fails to Start with Error ORA-12547 (文档 ID 1307075.1)
“Connected to an Idle Instance” Message when Connecting Bequeath to a Running Instance (文档 ID 435044.1)
Troubleshooting when srvctl can’t start RAC instance, but sqlplus can start it (文档 ID 844272.1)
10gR2 Database Creation Fails with 11gR2 ASM storage: ORA-15045, ORA-17502, ORA-15081 [ID 1384180.1]
Database Creation on 11.2 Grid Infrastructure with Role Separation ( ORA-15025, KFSG-00312, ORA-15081 ) (文档 ID 1084186.1)
ORA-00600 [kfioTranslateIO03] [17090] (Doc ID 1336846.1)
ORA-15183 Unable to Create Database on Server using 11.2 ASM and Grid Infrastructure (文档 ID 1054033.1)
Startup Instance Failed with ORA-27140 ORA-27300 ORA-27301 ORA-27302 and ORA-27303 on skgpwinit6 (文档 ID 1274030.1)
https://blogs.oracle.com/Database4CN/entry/%E4%BB%BB%E5%8A%A1%E8%A7%92%E8%89%B2%E5%88%86%E7%A6%BB_job_role_separation_%E7%AE%80%E4%BB%8B

11gR2 rac中用户角色分离及常见oracle bin程序及ASM磁盘权限问题汇总

如果授予用户connect,resource时,此时用户已经有了UNLIMITED TABLESPACE权限。
此时如果授予用户DBA权限,然后再回收DBA角色;此时会收回UNLIMITED TABLESPACE权限。
近期遇到过此种情况,导致业务用户无法使用表空间,造成较为严重的事故,记录一下。

测试如下;
1.创建用户并授予connect,resource、dba权限并验证
SQL> create user test identified by test;

User created.

SQL> grant connect,resource to test;

Grant succeeded.

SQL>
SQL> select * from dba_role_privs where grantee=’TEST’;

GRANTEE GRANTED_ROLE ADM DEF
—————————— —————————— — —
TEST RESOURCE NO YES
TEST CONNECT NO YES

SQL> select * from dba_sys_privs where grantee=’TEST’;

GRANTEE PRIVILEGE ADM
—————————— —————————————- —
TEST UNLIMITED TABLESPACE NO

SQL> grant dba to test;

Grant succeeded.

SQL> select * from dba_role_privs where grantee=’TEST’;

GRANTEE GRANTED_ROLE ADM DEF
—————————— —————————— — —
TEST RESOURCE NO YES
TEST DBA NO YES
TEST CONNECT NO YES

SQL> select * from dba_sys_privs where grantee=’TEST’;

GRANTEE PRIVILEGE ADM
—————————— —————————————- —
TEST UNLIMITED TABLESPACE NO

2.回收dba权限并检查权限
SQL> revoke dba from test;

Revoke succeeded.

SQL> select * from dba_role_privs where grantee=’TEST’;

GRANTEE GRANTED_ROLE ADM DEF
—————————— —————————— — —
TEST RESOURCE NO YES
TEST CONNECT NO YES

SQL> select * from dba_sys_privs where grantee=’TEST’;

no rows selected

SQL> grant connect,resource to test;

Grant succeeded.

SQL> select * from dba_role_privs where grantee=’TEST’;

GRANTEE GRANTED_ROLE ADM DEF
—————————— —————————— — —
TEST RESOURCE NO YES
TEST CONNECT NO YES

SQL> select * from dba_sys_privs where grantee=’TEST’;

GRANTEE PRIVILEGE ADM
—————————— —————————————- —
TEST UNLIMITED TABLESPACE NO

用户有connect,resource,dba角色权限后回收dba权限导致无UNLIMITED TABLESPACE权限造成业务中断

从OS层面上,建议您监控内存的使用是否逐步在增加,是否达到某一个阈值的时候就会触发该错误。
svmon -G -i 2 2
ipcs -a

按照内存使用对进程进行排序,以发现是哪些进程使用了较多的内存,
su –
#ps avx |head -1 ;ps avx |grep -v PID |sort -rn +6 > ps_avx.output

对于这些进程,查看它们究竟为什么使用了较多内存:
svmon -P <PID>

相关文档可以参考:
AIX: Determining Oracle Memory Usage On AIX [ID 123754.1]
Diagnosing Oracle Memory on AIX using SVMON ( Doc ID 166491.1 )
AIX: Database performance gets slower the longer the database is running [ID 316533.1]

运行oracle数据库的AIX系统内存使用率高时的排查思路

遇到11gr2 rac WRH$_ACTIVE_SESSION_HISTORY未自动清理导致SYSAUX空间过度增长;但是和网上其它的不太一样,查询SNAP快照信息,和select a.snap_interval,a.retention,a.topnsql from dba_hist_wr_control a; 中配置未发生变化,分区和SNAP对应信息也是最近7天的,没问题;查询基线的信息,也没问题。

只有WRH$_ACTIVE_SESSION_HISTORY的一个WRH$_ACTIVE_SES_MXDB_MXSN分区异常,是6GB。但是查不出此分区对应的SNAP信息。

使用MOS文档:WRH$_ACTIVE_SESSION_HISTORY Does Not Get Purged Based Upon the Retention Policy (文档 ID 387914.1)
中的方法alter session set “_swrf_test_action” = 72;解决了此问题。

附一下此MOS文档的解决方法:

Cause
Oracle decides what rows need to be purged based on the retention policy. There is a special mechanism which is used in the case of the large AWR tables where we store the snapshot data in partitions. One method of purging data from these tables is by removing partitions that only contain rows that have exceeded the retention criteria. During the nightly purge task, we only drop the partition if all the data in the partition has expired. If the partition contains at least one row which, according to the retention policy shouldn’t be removed, then the partition won’t be dropped and as such the table will contain old data.

If partition splits do not occur (for whatever reason), then we can end up with a situation where we have to wait for the latest entries to expire before the partition that they sit in can be removed. This can mean that some of the older entries can be retained significantly past their expiry date. The result of this is that the data is not purged as expected.

Solution
A potential solution to this issue is to manually split the partitions of the partitioned AWR objects such that there is more chance of the split partition being purged.You will still have to wait for all the rows in the new partitions to reach their retention time but with split partitions there is more chance of this happening. you can manually split the partitions using the following undocumented command:

alter session set “_swrf_test_action” = 72;
To perform a single split of all the AWR partitions.

Check the partition details for the offending table before the split:
SELECT owner,
segment_name,
partition_name,
segment_type,
bytes/1024/1024/1024 Size_GB
FROM dba_segments
WHERE segment_name=’WRH$_ACTIVE_SESSION_HISTORY’;
Split the partitions so that there is more chance of the smaller partition being purged:
alter session set “_swrf_test_action” = 72;
NOTE: This command will split partitions for ALL partitioned AWR objects. It also initiates a single split; it does not need to be disabled and will need to be repeated if multiple splits are required.

Check the partition details for the offending table after the split:
SELECT owner,
segment_name,
partition_name,
segment_type,
bytes/1024/1024/1024 Size_GB
FROM dba_segments
WHERE segment_name=’WRH$_ACTIVE_SESSION_HISTORY’;

With smaller partitions it is expected that some will be automatically removed when the retention period of all the rows within each partition is reached.

As an alternative, you could purge data based upon a snapshot range. Depending on the snapshots chosen, this may remove data that has not yet reached the retention limit so this may not be suitable for all cases. The following output shows the min and max snapshot_id in each partition.

set serveroutput on
declare
CURSOR cur_part IS
SELECT partition_name from dba_tab_partitions
WHERE table_name = ‘WRH$_ACTIVE_SESSION_HISTORY’;

query1 varchar2(200);
query2 varchar2(200);

TYPE partrec IS RECORD (snapid number, dbid number);
TYPE partlist IS TABLE OF partrec;

Outlist partlist;
begin
dbms_output.put_line(‘PARTITION NAME SNAP_ID DBID’);
dbms_output.put_line(‘————————— ——- ———-‘);

for part in cur_part loop
query1 := ‘select min(snap_id), dbid from sys.WRH$_ACTIVE_SESSION_HISTORY partition (‘||part.partition_name||’) group by dbid’;
execute immediate query1 bulk collect into OutList;

if OutList.count > 0 then
for i in OutList.first..OutList.last loop
dbms_output.put_line(part.partition_name||’ Min ‘||OutList(i).snapid||’ ‘||OutList(i).dbid);
end loop;
end if;

query2 := ‘select max(snap_id), dbid from sys.WRH$_ACTIVE_SESSION_HISTORY partition (‘||part.partition_name||’) group by dbid’;
execute immediate query2 bulk collect into OutList;

if OutList.count > 0 then
for i in OutList.first..OutList.last loop
dbms_output.put_line(part.partition_name||’ Max ‘||OutList(i).snapid||’ ‘||OutList(i).dbid);
dbms_output.put_line(‘—‘);
end loop;
end if;

end loop;
end;
/

Once you have split the partitions and identified a partition with a range of snap ids that can be deleted, you can free up the memory by dropping a snapshot range than matches the high and low snap_ids for the partition:

DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE(
low_snap_id IN NUMBER,
high_snap_id IN NUMBER
dbid IN NUMBER DEFAULT NULL);

11gr2 rac WRH$_ACTIVE_SESSION_HISTORY未自动清理导致SYSAUX空间过度增长