Skip to content

Database - 49. page

浅谈ORA-12545 / TNS-12545故障诊断思路

前一个刚经历rac安装完后遭遇的ORA-12545错误,这里就顺便把这个错误的诊断思路理出来,毕竟这个错误在10g后还是比较常见,本文只是对这个故障的处理诊断思路做一些经验上的讨论,纯为有趣。

12545的报错提示为如下:

ORA-12545 / TNS-12545 Connect failed because target host or object does not exist

先往下聊吧,这类错误通常是和hostname相关配置不正确有关,举个例子

[ora10g@ludatou ~]$ tnsping lu10g
TNS Ping Utility for Linux: Version 10.2.0.4.0 - Production on 15-APR-2014 12:08:27
Copyright (c) 1997,  2007, Oracle.  All rights reserved.
Used parameter files:

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = ludatou)(PORT = 1523)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = lu10g)))
OK (10 msec)

[ora10g@ludatou ~]$ sqlplus luda/luda@lu10g
SQL*Plus: Release 10.2.0.4.0 - Production on Tue Apr 15 12:12:15 2014
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

=====这里我的tnsnames配置的解析主机名是ludatou,而且采用网络验证方式登录成功
[ora10g@ludatou ~]$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
192.168.102.128         ludatou   luda
=====在这边我把/etc/hosts文件的主机名ludatou变更为ludatouxx,变更结果如下
[ora10g@ludatou ~]$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
192.168.102.128         ludatouxx   luda
 

再次在客户端(另外一个主机)远程登录,在响应了很久之后报错

[root@ludatou ~]# su - oracle
[oracle@ludatou ~]$ sqlplus luda/luda@lu10g
SQL*Plus: Release 11.1.0.7.0 - Production on Tue Apr 15 12:13:40 2014
Copyright (c) 1982, 2008, Oracle.  All rights reserved.

ERROR:
ORA-12545 Connect failed because target host or object does not exist
 

下来以分类场景的方式来描述这类故障的诊断:
场景1:
当连接数据库不通过listener的时候,而是使用BEQ协议来连接数据库,这个时候如果$ORACLE_HOME/bin目录下oracle文件丢失或者损坏,用户执行权限不足,以及Oracle_home(多个安装版本)的路径设置错误的情况下,使用beq协议连接oracle的用户就会遭遇ORA/TNS-12545的错误。
关于BEQ协议可以参考如下解释:

This is the bequeth oracle process. The bequeth process starts first. Later all the processes related to particular database are controlled by its bequeth process. 

场景2:
这个场景就是上面我所做的测试场景,由客户端通过listner连接服务端oracle数据库,这个时候如果客户端的tnsnames解析文件中对应service name部分的hostname和服务端的hostname不匹配,就会报错ora-12545.
在这个场景中可以使用nslookup命令来搜索对应的主机,按照上面的例子,这里使用nslookup命令搜索ludatou主机时候则会出现以下情况:

$ nslookup ludatou
Server: 192.168.102.128
Address: 192.168.1.102#53
** server can't find ludatou: ==>Indicates ludatou is not resolvable on the machine

这个场景的解决办法就是把客户端的tnsnames中的hostname部分写为ip,避免服务端主机名变更导致配置不符的情况出现。

场景3:
参考 RAC实施完遭遇ORA-12545

Cause: One of the hostname (which corresponds to public IP or VIP) is not reachable from this client machine.
When the server side load balancing is enabled in the RAC setup, the listener will redirect the connection to the least loaded node.While doing so, the server sends the packet NSPTRD containing the hostname of the corresponding machine.

The Remote Service Handler value registered with the remote Listener process via the REMOTE_LISTENER parameter is built by the LOCAL_LISTENER value on the local server.So, its necessary to check whether the local_listener / remote listener information are reflected properly in the listener services output as well.

Diagnosis:
Enable the oracle sqlnet client tracing at support level, and reproduce the issue.In the generated client trace, you would see the below information:
After NSPTCN Connect to the listener , listener sends

场景4:
listerner.ora的hostname配置错误,导致监听无法找寻到正确的监听地址,也会产生ora/tns-12545的错误。具体监听日志报错如下:

Error listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=abc)(PORT=1522)))
TNS-12545: Connect failed because target host or object does not exist
TNS-12560: TNS:protocol adapter error
TNS-00515: Connect failed because target host or object does not exist

如果打开了sqlnet的trace跟踪,可以发现类似如下的报错:

nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=abc)(PORT=1522)))
nttbnd2addr: entry
snlinGetAddrInfo: entry
snlinGetAddrInfo: Invalid IP address string abc
snlinFreeAddrInfo: entry
snlinFreeAddrInfo: exit
snlinGetAddrInfo: exit
nttbnd2addr: looking up IP addr for host: abc
snlinGetAddrInfo: entry
snlinGetAddrInfo: Name resolution failed for abc
snlinFreeAddrInfo: entry
snlinFreeAddrInfo: exit
snlinGetAddrInfo: exit
nttbnd2addr: *** hostname lookup failure! ***
nttbnd2addr: exit
nserror: entry
nserror: nsres: id=0, op=78, ns=12545, ns2=12560; nt[0]=515, nt[1]=0, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0

这类问题解决办法就是检测监听的配置是否和主机的ip,hostname配置吻合,不吻合的情况下会造成12545的错误。

场景5:
操作系统或者硬件问题也会导致产生ora/tns-12545的产生。
5.1windows 2000 with Service Pack 3 会导致这个ora/tns-12545 的产生
http://support.microsoft.com/default.aspx?scid=kb;en-us;329405
5.2Solaris系统在配置不正确情况下也会导致ora/tns-12545的产生

以上的场景描述是对1254错误的故障做一些分类分析,大概的诊断思路也是这样。

当然以上只是针对12545,oracle网络的错误有很多,对于诊断oracle网络错误,我个人的习惯诊断流程是如下
1.check listener.log确认是否有报错信息
2.check syslog是否有设备或者系统的报错信息
3.check $ORACLE_HOME/bin/oracle文件是否存在以及权限正确与否
4.check tnsnames,sqlnet,listener的配置是否与主机对应
5.在rac环境下,我会优先检查节点和client的通信情况,网络配置文件是否配置正确,最后才是taf和ld的影响
6.必要时候开启trace进一步跟踪详细信息

ORA-01628: max # extents (32765) for rollback seg

这个故障的情形是这样的,不停报错01628,业务挂起,回滚表空间从10g扩到32g后,也会产生同样的报错,而且回滚使用率100%。
Oracle版本: 10201
系统平台: HP-UN
这个错误我曾经在9i的版本上遭遇,但是不是bug,当时是更改回滚的storage参数设置extent限制为unlimited解决,而这里10201版本下设置为unlimited则无用,与官方文档反复效验后确认和bug 6499872吻合。
最终通过升级数据库的办法规避此bug,同时也再次说明,选择一个稳定的数据库版本是多么的重要。

官方描述如下:

Applies to:
Oracle Database - Enterprise Edition - Version 9.2.0.1 to 11.1.0.7 [Release 9.2 to 11.1]
Information in this document applies to any platform.
Details
Bug 6499872  ORA-01628: max # extents (32765) for rollback seg

 This note gives a brief overview of bug 6499872.
 The content was last updated on: 12-DEC-2008
 Click here for details of each of the sections below.
Affects:

    Product (Component) 	Oracle Server (Rdbms)
    Range of versions believed to be affected 	Versions >= 9.2 but BELOW 11.2
    Versions confirmed as being affected

        10.2.0.3
        10.2.0.2
        10.1.0.5
        9.2.0.8

    Platforms affected 	Generic (all / most platforms affected)

Fixed:

    This issue is fixed in

        11.1.0.7 (Server Patch Set)
        10.2.0.4 (Server Patch Set)

Symptoms:

Related To:

    ORA-1628



    System Managed Undo (SMU)

Description

    Once an undo segment was filled up with maxextents and the transaction got an
    ORA-1628, other transactions that bound to the same undo segment also will get same
    error code.

    REDISCOVERY INFORMATION:
    Repeated ORA-1628 errors with AUM and

    * many offline undo segments.  More than the average number of transactions
    * The undo segment referenced on error ORA-1628 will have high number of unexpired extents

    The following queries can be used to identify the bug:

    alter session set nls_date_format='mm/dd/yy hh24:mi:ss';
    set pagesize 200
    set linesize 200
    set echo on

    select sysdate from dual;

    COL undots NOPRINT NEW_VALUE p_undots FORMAT A20;
    SELECT value FROM gv$parameter WHERE name='undo_tablespace' and inst_id=sys_context('userenv','instance');

    select (nvl(sum(bytes),0)) from dba_free_space  where  tablespace_name = '&&p_undots';

    select sum(bytes)  from dba_data_files where tablespace_name = (select value from gv$parameter
    where name='undo_tablespace' and inst_id=sys_context('userenv','instance') );

    select segment_name,  nvl(sum(act),0) "ACT BYTES",    nvl(sum(unexp),0) "UNEXP BYTES", nvl(sum(exp),0) "EXP BYTES"
    from ( select segment_name,          nvl(sum(bytes),0) act,00 unexp, 00 exp
           from    DBA_UNDO_EXTENTS
           where status='ACTIVE'  and tablespace_name =   '&&p_undots'
           group by segment_name
           union
           select segment_name,00 act, nvl(sum(bytes),0) unexp, 00 exp from    DBA_UNDO_EXTENTS
           where status='UNEXPIRED'  and tablespace_name =  '&&p_undots'
           group by segment_name
           union
           select segment_name,            00 act, 00 unexp, nvl(sum(bytes),0) exp
           from    DBA_UNDO_EXTENTS
           where status='EXPIRED'  and tablespace_name =   '&&p_undots'
           group by segment_name
          ) group by segment_name;

    select status,count(*)  from dba_rollback_segs  group by status;



Please note: The above is a summary description only. Actual symptoms can vary. Matching to any symptoms here does not confirm that you are encountering this problem. For questions about this bug please consult Oracle Support.
References

    Bug:6499872 (This link will only work for PUBLISHED bugs)
    Note:245840.1 Information on the sections in this article

RAC安装完成之后客户端连接数据库报错ORA-12545

日前一次一个朋友装完一套rac后,客户端连接rac数据库时候报错,错误如下:

ORA-12545: Connect failed because target host or object does not exist./pre>

Rac的监听机制我就不解释了,12545的错误基本和Tnsnames里的连接通配符里的host设置有关,刚装完的rac报错,大部分都是因为采用了节点的hostname作为解析字符串中的host,而客户端一般从业务的连续冗余性角度应该要用vip对应的ip或者hostname来匹配网络连接字符串,这2者不一致导致了12545的出现;当然也有可能是设置了VIP,但是客户端与服务端的VIP无法通信,也会造成ora-12545的错误,这种情况是因为负载均衡打开来后,listener会分配新的请求进程到负载较低的节点,如果负载低的节点正好和客户端无法通信,就造成了12545的错误。现在问题基本上解决思路就有一些了,遇到rac的这个错误,首先就是检查你本机的local_listener,remote_listener是否设置的对应host为其他的错误的ip或者主机名,一般新装的rac这个一般都为localhost。明白了为题所在,解决思路也就出来了,只要把实例的local_listener设置为对应vip,客户端采用vip的解析方式即可解决,也应配合检查rac的配置以及TAF和LOAD BLANCE的开启情况。

这个情况一般出现在刚装完rac之后。报错如果采用trace level 16的跟踪可以发现类似如下的日志:

[05-APR-2004 11:32:55] nttbnd2addr: looking up IP addr for host: myhost.oracle.com

[05-APR-2014 11:32:55] nttbnd2addr: *** hostname lookup failure! ***
[05-APR-2014 11:32:55] nttbnd2addr: exit
[05-APR-2014 11:32:55] nserror: nsres: id=0, op=77, ns=12545, ns2=12560; nt[0]=515, nt[1]=145,
nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
[05-APR-2014 11:32:55] nsmfr: 207 bytes at 0xf2a18
[05-APR-2014 11:32:55] nsmfr: 140 bytes at 0xef078
[05-APR-2014 11:32:55] nladtrm: entry
[05-APR-2014 11:32:55] nladtrm: exit
[05-APR-2014 11:32:55] nioqper: error from nscall
[05-APR-2014 11:32:55] nioqper: nr err code: 0
[05-APR-2014 11:32:55] nioqper: ns main err code: 12545
[05-APR-2014 11:32:55] nioqper: ns (2) err code: 12560
[05-APR-2014 11:32:55] nioqper: nt main err code: 515
[05-APR-2014 11:32:55] nioqper: nt (2) err code: 145
[05-APR-2014 11:32:55] nioqper: nt OS err code: 0
[05-APR-2014 11:32:55] niomapnserror: entry
[05-APR-2014 11:32:55] niqme: entry
[05-APR-2014 11:32:55] niqme: reporting NS-12545 error as ORA-12545
[05-APR-2014 11:32:55] niqme: exit
[05-APR-2014 11:32:55] niomapnserror: returning error 12545
[05-APR-2014 11:32:55] niomapnserror: exit
[05-APR-2014 11:32:55] niotns: Couldn't connect, returning 12545
...

正确的local_listener配置为如下:

alter system set local_listener='(ADDRESS =(PROTOCOL=TCP)(HOST=vip_host1)(PORT=1521))' scope=both sid='luda1';
alter system set local_listener='(ADDRESS =(PROTOCOL=TCP)(HOST=vip_host2)(PORT=1521))' scope=both sid='luda2';

客户端采用VIP的解析方式:

LUDA =
  (DESCRIPTION =
     (ADDRESS = (PROTOCOL = TCP)(HOST = vip_host1)(PORT = 1521))
     (ADDRESS = (PROTOCOL = TCP)(HOST = vip_host2)(PORT = 1521))
     (LOAD_BALANCE=YES)
      (CONNECT_DATA=
        (SERVER=DEDICATED)
         (SERVICE_NAME=LUDA)
      )
  )

OUI-0094 安装11g和10g同平台安装遇错

在安装11g数据库后,再另外一个用户下安装oracle 10g数据库报错OUI0094错误。这个错误是因为oraInst.log文件已经写入了11g数据库的信息导致,既然知道了问题所在那解决起来就水到渠成了,具体解决方法如下:

标注:以下步骤都是在安装10g oracle用户下操作

1. 创建新的oraInst.loc文件在 $ORACLE_HOME,并更新为如下

inventory_loc=$ORACLE_HOME/oraInventory  -- 必须用绝对路径
inst_group=oinstall

2. 创建完$ORACLE_HOME/oraInst.loc后启动OUI具体如下:

./runInstaller -invPtrLoc $ORACLE_HOME/oraInst.loc

The -invPtrLoc flag is used to locate the oraInst.loc file.

3.经过以上步骤以后就可以顺利安装oracle 10g

statement suspended, wait error to be cleared

impdp 时候遭遇等待事件statement suspended, wait error to be cleared。

问题分析:

经过查验为导入设定的表空间空间不足造成的,只需要把表空间扩大这个错误就自然消失,导入不会终端。

具体的日志如下:

21:18:00 (2.0 min) 1,204 statement suspended, wait erro 939 8.21
CPU + Wait for CPU 264 2.31
db file sequential read 1 0.01
21:20:00 (2.0 min) 1,201 statement suspended, wait erro 951 8.32
CPU + Wait for CPU 249 2.18
control file sequential read 1 0.01
21:22:00 (2.0 min) 1,197 statement suspended, wait erro 942 8.24
CPU + Wait for CPU 255 2.23
21:24:00 (2.0 min) 1,201 statement suspended, wait erro 960 8.40
CPU + Wait for CPU 241 2.11
21:26:00 (2.0 min) 1,215 statement suspended, wait erro 963 8.42
CPU + Wait for CPU 250 2.19
log file parallel write 2 0.02
21:28:00 (2.0 min) 1,201 statement suspended, wait erro 952 8.33
CPU + Wait for CPU 249 2.18
21:30:00 (2.0 min) 1,202 statement suspended, wait erro 956 8.36
CPU + Wait for CPU 246 2.15
21:32:00 (2.0 min) 1,202 statement suspended, wait erro 950 8.31
CPU + Wait for CPU 252 2.20
21:34:00 (2.0 min) 1,199 statement suspended, wait erro 944 8.26
CPU + Wait for CPU 254 2.22
db file sequential read 1 0.01
21:36:00 (1 secs) 10 statement suspended, wait erro 8 0.07
CPU + Wait for CPU 2 0.02
-------------------------------------------------------------

End of Report
Report written to awrrpt_1_4_5.txt
SQL> SQL>
SQL>
SQL> select event from v$session_wait where wait_class#<>6;

EVENT
----------------------------------------------------------------
SQL*Net message to client
statement suspended, wait error to be cleared
db file scattered read
statement suspended, wait error to be cleared
statement suspended, wait error to be cleared
statement suspended, wait error to be cleared
statement suspended, wait error to be cleared
statement suspended, wait error to be cleared
db file sequential read
statement suspended, wait error to be cleared
statement suspended, wait error to be cleared

Completed: alter tablespace lisdata add datafile '+DATA01' size 4116m
Mon Dec 30 21:40:17 2013
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_02.1' was resumed
Mon Dec 30 21:40:17 2013
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_02.1' was resumed
Mon Dec 30 21:40:18 2013
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_02.1' was resumed
Mon Dec 30 21:40:18 2013
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_02.1' was resumed
Mon Dec 30 21:40:18 2013
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_02.1' was resumed
Mon Dec 30 21:40:18 2013
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_02.1' was resumed
Mon Dec 30 21:40:18 2013
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_02.1' was resumed
Mon Dec 30 21:40:18 2013
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_02.1' was resumed
Mon Dec 30 21:40:50 2013
alter tablespace lisdata add datafile '+DATA01' size 4116m
Mon Dec 30 21:41:02 2013
Completed: alter tablespace lisdata add datafile '+DATA01' size 4116m
Mon Dec 30 21:41:44 2013
alter tablespace lisdata add datafile '+DATA01' size 4116m
Mon Dec 30 21:41:59 2013
Completed: alter tablespace lisdata add datafile '+DATA01' size 4116m
Mon Dec 30 21:44:25 2013
alter tablespace lisdata add datafile '+DATA01' size 4116m
Mon Dec 30 21:44:31 2013
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_02.1' was suspended due to
ORA-01652: unable to extend temp segment by 1024 in tablespace LISDATA
Mon Dec 30 21:44:31 2013
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_02.1' was suspended due to
ORA-01652: unable to extend temp segment by 1024 in tablespace LISDATA
Mon Dec 30 21:44:31 2013
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_02.1' was suspended due to
ORA-01652: unable to extend temp segment by 1024 in tablespace LISDATA
Completed: alter tablespace lisdata add datafile '+DATA01' size 4116m
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_02.1' was resumed
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_02.1' was resumed
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_02.1' was resumed