Skip to content

Dataguard - 2. page

遭遇ORA-00603/27102|Linux-x86_64 Error: 12

这个问题来的比较急,来邮件时候是下午4点多,但是数据库运行是正常的,从告警日志上看是进程无法获取内存了,导致客户端进程宕掉。

27102错误一般是oracle用户的系统资源限制设置过小导致,但是这里的情况却是不一样。

具体先看如下告警日志:
 

Fri Jul 25 12:51:08 CST 2014
Thread 1 advanced to log sequence 5862 (LGWR switch)
  Current log# 3 seq# 5862 mem# 0: +DG1/wasudb/onlinelog/group_3.263.842714473
Fri Jul 25 12:51:10 CST 2014
ARC1: Standby redo logfile selected for thread 1 sequence 5861 for destination LOG_ARCHIVE_DEST_2
Fri Jul 25 12:51:57 CST 2014
Thread 1 cannot allocate new log, sequence 5863
Checkpoint not complete
  Current log# 3 seq# 5862 mem# 0: +DG1/wasudb/onlinelog/group_3.263.842714473
Fri Jul 25 12:51:59 CST 2014
Thread 1 advanced to log sequence 5863 (LGWR switch)
  Current log# 1 seq# 5863 mem# 0: +DG1/wasudb/onlinelog/group_1.261.842714471
Fri Jul 25 12:52:02 CST 2014
ARC1: Standby redo logfile selected for thread 1 sequence 5862 for destination LOG_ARCHIVE_DEST_2
Fri Jul 25 13:01:45 CST 2014
Errors in file /oracle/admin/wasudb/udump/wasudb_ora_13870.trc:
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 108
Additional information: 3080192
Fri Jul 25 13:01:48 CST 2014
Errors in file /oracle/admin/wasudb/udump/wasudb_ora_13870.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 108
Additional information: 3080192

告警的分析在第一段话就已经带出,在此时我们通过ulimitd -a查看oracle用户的分配信息,都是非常大的,不存在过小的设置。
在这种情况下非常有必要看下这个ora_13870.trc文件,此文件部分信息如下:

/oracle/admin/wasudb/udump/wasudb_ora_13870.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/10.2.0/db_1
System name:	Linux
Node name:	localhost
Release:	3.8.13-16.2.1.el6uek.x86_64
Version:	#1 SMP Thu Nov 7 17:01:44 PST 2013
Machine:	x86_64
Instance name: wasudb
Redo thread mounted by this instance: 1
Oracle process number: 51
Unix process pid: 13870, image: oracle@localhost

*** 2014-07-25 13:01:44.015
*** SERVICE NAME:(SYS$USERS) 2014-07-25 13:01:43.995
*** SESSION ID:(6209.59205) 2014-07-25 13:01:43.995
=======================================
TOP 10 MEMORY USES FOR THIS PROCESS
---------------------------------------
81% 3328 MB, 213622 chunks: "pmuccst: adt/record       "
         koh-kghu call   ds=0x7f8e128632f8  dsprt=0x6ae9740
13%  544 MB, 8868 chunks: "permanent memory          "  SQL
         sort subheap    ds=0x7f8e121f7a00  dsprt=0x7f8e12891058
 5%  209 MB, 13443 chunks: "pmucalm coll              "
         koh-kghu call   ds=0x7f8e128632f8  dsprt=0x6ae9740
 0% 7467 KB, 8948 chunks: "free memory               "
         session heap    ds=0x7f8e12866290  dsprt=0x6aed2c0
 0% 1284 KB,  22 chunks: "permanent memory          "
         pga heap        ds=0x6ae9740  dsprt=(nil)
 0%  424 KB,   6 chunks: "free memory               "
         top call heap   ds=0x6aed0a0  dsprt=(nil)
 0%  331 KB,  82 chunks: "kxsFrame4kPage            "
         session heap    ds=0x7f8e12866290  dsprt=0x6aed2c0
 0%  274 KB,  10 chunks: "static frame of inst      "
         koh-kghu sessi  ds=0x7f8e11fabe48  dsprt=0x7f8e12866290
 0%  224 KB,   7 chunks: "kxsFrame32kPage           "
         session heap    ds=0x7f8e12866290  dsprt=0x6aed2c0
 0%  128 KB,   1 chunk : "free memory               "
         KSFD vec heap   ds=0x6aeed80  dsprt=0x6ae9740

=======================================
PRIVATE MEMORY SUMMARY FOR THIS PROCESS
---------------------------------------
******************************************************
PRIVATE HEAP SUMMARY DUMP
4103 MB total:
  4101 MB commented, 1286 KB permanent
   537 KB free (0 KB in empty extents),
    3546 MB,   3 heaps:   "koh-kghu call  "            51 KB free held
     554 MB,   1 heap:    "session heap   "
------------------------------------------------------
Summary of subheaps at depth 1
4091 MB total:
  4084 MB commented, 87 KB permanent
  7603 KB free (132 KB in empty extents),
     2 KB uncommented freeable with mark,     3328 MB, 213622 chunks:  "pmuccst: adt/record       "
     552 MB,   1 heap:    "kxs-heap-w     "            7408 KB free held

=========================================
REAL-FREE ALLOCATOR DUMP FOR THIS PROCESS
-----------------------------------------

Dump of Real-Free Memory Allocator Heap [0x7f8e1283e000]
mag=0xfefe0001 flg=0x5000003 fds=0x7 blksz=65536
blkdstbl=0x7f8e1283e010, iniblk=68608 maxblk=262144 numsegs=115
In-use num=65638 siz=8978432, Freeable num=0 siz=0, Free num=1 siz=6619136

==========================================
INSTANCE-WIDE PRIVATE MEMORY USAGE SUMMARY
------------------------------------------

Dumping Work Area Table (level=1)
=====================================

  Global SGA Info
  ---------------

    global target:    40000 MB
    auto target:      31163 MB
    max pga:           2048 MB
    pga limit:         4096 MB
    pga limit known:  0
    pga limit errors:     0

    pga inuse:         6534 MB
    pga alloc:         7384 MB
    pga freeable:       707 MB
    pga freed:        5776451 MB
    pga to free:          0 %

    pga auto:          1160 MB
    pga manual:           0 MB

    pga alloc  (max): 15183 MB
    pga auto   (max):  4383 MB
    pga manual (max):    10 MB

    # workareas     :     0
    # workareas(max):   115



================================
PER-PROCESS PRIVATE MEMORY USAGE
--------------------------------

Private memory usage per Oracle process

-------------------------
Top 10 processes:
-------------------------
(percentage is of 8065 MB total allocated memory)
51% pid 51: 4102 MB used of 4105 MB allocated  < = CURRENT PROC
 7% pid 101: 536 MB used of 537 MB allocated
 3% pid 127: 52 MB used of 211 MB allocated (158 MB freeable)
 2% pid 94: 37 MB used of 138 MB allocated (101 MB freeable)
 1% pid 215: 101 MB used of 120 MB allocated (18 MB freeable)
 1% pid 216: 101 MB used of 119 MB allocated (18 MB freeable)
 1% pid 254: 101 MB used of 119 MB allocated (18 MB freeable)
 1% pid 278: 101 MB used of 119 MB allocated (18 MB freeable)
 1% pid 155: 100 MB used of 116 MB allocated (15 MB freeable)
 1% pid 156: 111 MB used of 111 MB allocated

-------------------------
All processes:
-------------------------
(session detail when over 403 MB allocated)

pid 2: 295 KB used of 374 KB allocated
pid 3: 294 KB used of 374 KB allocated
pid 4: 388 KB used of 502 KB allocated
pid 5: 8205 KB used of 14 MB allocated (3136 KB freeable)
pid 6: 8409 KB used of 13 MB allocated (2112 KB freeable)
pid 7: 8356 KB used of 15 MB allocated (4160 KB freeable)
....................(省略很多)

通过trace中的信息我们可以看到PER-PROCESS PRIVATE MEMORY USAGE的TOP 10 process以及all process的信息,其中PID为51的进程(TOP 1)马上就入了大头的法眼~进程开销PGA已经4个G了,基本可以联想到肯定是超大DML事务带有排序等没有批量提交导致把内存撑爆!经过检查发现是一个存储过程在跑大量数据的处理,验证了我的判断,协助客户发给应用开发处理,并建议缩小事务,尽量采用批量的提交方式。

从以上也可以看出,硬件资源是有限的,即使给oracle用户分配的资源再多,只要应用开发设计得不合理,你永远都不会满足这应用的需求!

浅谈ORA-12545 / TNS-12545故障诊断思路

前一个刚经历rac安装完后遭遇的ORA-12545错误,这里就顺便把这个错误的诊断思路理出来,毕竟这个错误在10g后还是比较常见,本文只是对这个故障的处理诊断思路做一些经验上的讨论,纯为有趣。

12545的报错提示为如下:

ORA-12545 / TNS-12545 Connect failed because target host or object does not exist

先往下聊吧,这类错误通常是和hostname相关配置不正确有关,举个例子

[ora10g@ludatou ~]$ tnsping lu10g
TNS Ping Utility for Linux: Version 10.2.0.4.0 - Production on 15-APR-2014 12:08:27
Copyright (c) 1997,  2007, Oracle.  All rights reserved.
Used parameter files:

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = ludatou)(PORT = 1523)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = lu10g)))
OK (10 msec)

[ora10g@ludatou ~]$ sqlplus luda/luda@lu10g
SQL*Plus: Release 10.2.0.4.0 - Production on Tue Apr 15 12:12:15 2014
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

=====这里我的tnsnames配置的解析主机名是ludatou,而且采用网络验证方式登录成功
[ora10g@ludatou ~]$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
192.168.102.128         ludatou   luda
=====在这边我把/etc/hosts文件的主机名ludatou变更为ludatouxx,变更结果如下
[ora10g@ludatou ~]$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
192.168.102.128         ludatouxx   luda
 

再次在客户端(另外一个主机)远程登录,在响应了很久之后报错

[root@ludatou ~]# su - oracle
[oracle@ludatou ~]$ sqlplus luda/luda@lu10g
SQL*Plus: Release 11.1.0.7.0 - Production on Tue Apr 15 12:13:40 2014
Copyright (c) 1982, 2008, Oracle.  All rights reserved.

ERROR:
ORA-12545 Connect failed because target host or object does not exist
 

下来以分类场景的方式来描述这类故障的诊断:
场景1:
当连接数据库不通过listener的时候,而是使用BEQ协议来连接数据库,这个时候如果$ORACLE_HOME/bin目录下oracle文件丢失或者损坏,用户执行权限不足,以及Oracle_home(多个安装版本)的路径设置错误的情况下,使用beq协议连接oracle的用户就会遭遇ORA/TNS-12545的错误。
关于BEQ协议可以参考如下解释:

This is the bequeth oracle process. The bequeth process starts first. Later all the processes related to particular database are controlled by its bequeth process. 

场景2:
这个场景就是上面我所做的测试场景,由客户端通过listner连接服务端oracle数据库,这个时候如果客户端的tnsnames解析文件中对应service name部分的hostname和服务端的hostname不匹配,就会报错ora-12545.
在这个场景中可以使用nslookup命令来搜索对应的主机,按照上面的例子,这里使用nslookup命令搜索ludatou主机时候则会出现以下情况:

$ nslookup ludatou
Server: 192.168.102.128
Address: 192.168.1.102#53
** server can't find ludatou: ==>Indicates ludatou is not resolvable on the machine

这个场景的解决办法就是把客户端的tnsnames中的hostname部分写为ip,避免服务端主机名变更导致配置不符的情况出现。

场景3:
参考 RAC实施完遭遇ORA-12545

Cause: One of the hostname (which corresponds to public IP or VIP) is not reachable from this client machine.
When the server side load balancing is enabled in the RAC setup, the listener will redirect the connection to the least loaded node.While doing so, the server sends the packet NSPTRD containing the hostname of the corresponding machine.

The Remote Service Handler value registered with the remote Listener process via the REMOTE_LISTENER parameter is built by the LOCAL_LISTENER value on the local server.So, its necessary to check whether the local_listener / remote listener information are reflected properly in the listener services output as well.

Diagnosis:
Enable the oracle sqlnet client tracing at support level, and reproduce the issue.In the generated client trace, you would see the below information:
After NSPTCN Connect to the listener , listener sends

场景4:
listerner.ora的hostname配置错误,导致监听无法找寻到正确的监听地址,也会产生ora/tns-12545的错误。具体监听日志报错如下:

Error listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=abc)(PORT=1522)))
TNS-12545: Connect failed because target host or object does not exist
TNS-12560: TNS:protocol adapter error
TNS-00515: Connect failed because target host or object does not exist

如果打开了sqlnet的trace跟踪,可以发现类似如下的报错:

nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=abc)(PORT=1522)))
nttbnd2addr: entry
snlinGetAddrInfo: entry
snlinGetAddrInfo: Invalid IP address string abc
snlinFreeAddrInfo: entry
snlinFreeAddrInfo: exit
snlinGetAddrInfo: exit
nttbnd2addr: looking up IP addr for host: abc
snlinGetAddrInfo: entry
snlinGetAddrInfo: Name resolution failed for abc
snlinFreeAddrInfo: entry
snlinFreeAddrInfo: exit
snlinGetAddrInfo: exit
nttbnd2addr: *** hostname lookup failure! ***
nttbnd2addr: exit
nserror: entry
nserror: nsres: id=0, op=78, ns=12545, ns2=12560; nt[0]=515, nt[1]=0, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0

这类问题解决办法就是检测监听的配置是否和主机的ip,hostname配置吻合,不吻合的情况下会造成12545的错误。

场景5:
操作系统或者硬件问题也会导致产生ora/tns-12545的产生。
5.1windows 2000 with Service Pack 3 会导致这个ora/tns-12545 的产生
http://support.microsoft.com/default.aspx?scid=kb;en-us;329405
5.2Solaris系统在配置不正确情况下也会导致ora/tns-12545的产生

以上的场景描述是对1254错误的故障做一些分类分析,大概的诊断思路也是这样。

当然以上只是针对12545,oracle网络的错误有很多,对于诊断oracle网络错误,我个人的习惯诊断流程是如下
1.check listener.log确认是否有报错信息
2.check syslog是否有设备或者系统的报错信息
3.check $ORACLE_HOME/bin/oracle文件是否存在以及权限正确与否
4.check tnsnames,sqlnet,listener的配置是否与主机对应
5.在rac环境下,我会优先检查节点和client的通信情况,网络配置文件是否配置正确,最后才是taf和ld的影响
6.必要时候开启trace进一步跟踪详细信息

Dataguard 跨平台支持列表

oracle支持dataguard 跨平台,但是有不少限制,具体参考以下列表。

 

PLATFORM_ID PLATFORM_NAME
Release name
PLATFORM_IDs supported within the same Data Guard configuration when using Data Guard Redo Apply (Physical Standby)
2 Solaris[tm] OE (64-bit)
Solaris Operating System (SPARC) (64-bit)
2
6 – This is not supported due to issues reported in Bug 12702521
3 HP-UX (64-bit)
HP-UX PA-RISC
3
4 – Oracle 10g onward, see Support Note: 395982.1 and Note:414043.1
4 HP-UX IA (64-bit)
HP-UX Itanium
4
3 – Oracle 10g onward, see Support Notes Note: 395982.1 and Note:414043.1
5 HP Tru64 UNIX
HP Tru64 UNIX
5
6 IBM AIX on POWER Systems (64-bit) 2 – This is not supported due to issues reported in Bug 12702521
6
7 Microsoft Windows (32-bit)
Microsoft Windows (x86)
7
8, 12  – Oracle 10g onward, see Support Note: 414043.1
10 – Oracle 11g onward, requires Patch 13104881
11, 13 – Oracle 11g onward, see Support Note: 414043.1, also requires Patch 13104881
8 Microsoft Windows IA (64-bit)
Microsoft Windows (64-bit Itanium)
7 – Oracle 10g onward, see Support Note: 414043.1
8
12 – Oracle 10g onward
11, 13 – Oracle 11g onward, requires Patch 13104881
9 IBM zSeries Based Linux
z/Linux
9
18 (64-bit zSeries only)
10 Linux (32-bit)
Linux x86
7 – Oracle 11g onward, requires Patch 13104881
10
11, 13 – Oracle 10g onward, see Support Note: 414043.1
11 Linux IA (64-bit)
Linux Itanium
10 – Oracle 10g onward, see Support Note: 414043.1
11
13 – Oracle 10g onward
7 – Oracle 11g onward, see Support Note: 414043.1, also requires Patch 13104881
8, 12 – Oracle 11g onward, requires Patch 13104881
12 Microsoft Windows 64-bit for AMD
Microsoft Windows (x86-64)
7 – Oracle 10g onward, see Support Note Note: 414043.1
8 – Oracle 10g onward
12
11, 13 – Oracle 11g onward, requires Patch 13104881
13 Linux 64-bit for AMD
Linux x86-64
7 – Oracle 11g onward, see Support Note: 414043.1, also requires Patch 13104881
10 – Oracle 10g onward, see Support Note Note: 414043.1
11 – Oracle 10g onward
8, 12 – Oracle 11g onward, requires Patch 13104881
13
20 – Oracle 11g onward
15 HP Open VMS
HP OpenVMS Alpha
HP IA OpenVMS
OpenVMS Itanium
15
16 Apple Mac OS
Mac OS X Server
16
17 Solaris Operating System (x86)
Solaris Operating System (x86)
17
20 – Oracle 10g onward, see Support Note: 414043.1
18 IBM Power Based Linux
Linux on Power
9 (64-bit zSeries only)
18
20 Solaris Operating System (AMD64)
Solaris Operating System (x86-64)
13 – Oracle 11g onward
17 – Oracle 10g onward, see Support Note: 414043.1
20