ADG无法同步:TT00进程报错 Error 12514

adg,无法,同步,tt00,进程,报错,error · 浏览次数 : 298

小编点评

**ADG配置链路defer后再enable** * alter system set log_archive_dest_state_2=defer; * alter system set log_archive_dest_state_2=enable; **密码文件同步** * scp orapwdemo 192.168.1.11:/u01/app/oracle/product/19.3.0/db_1/dbs/orapwjydb1[oracle@bogon dbs]$ scp orapwdemo 192.168.1.12:/u01/app/oracle/product/19.3.0/db_1/dbs/orapwjydb23 **重启MRP进程** * RECOVER MANAGED STANDBY DATABASE CANCEL; * RECOVER MANAGED STANDBY DATABASE DISCONNECT; **检查基本参数配置** * set linesize 500col value for a70col name for a50 select name, valuefrom v$parameterwhere name in ('db_name','db_unique_name','log_archive_config', 'log_archive_dest_1','log_archive_dest_2', 'log_archive_dest_state_1','log_archive_dest_state_2', 'remote_login_passwordfile', 'log_archive_format','log_archive_max_processes','fal_server','db_file_name_convert', 'log_file_name_convert', 'standby_file_management'); **了解主要进程** *遇到过给生产库配置ADG,但无论如何折腾备库也无法实现实时同步,最终发现是主库有异常,重启后就自动解决了。但很多情况申请重启主库很难,其实可以先申请尝试重启相关进程,比如主库的arc,tt 等进程。这些进程被杀掉通常不会导致主库crash,而是会重新启动新的进程。

正文

环境: Oracle 19.16 ADG (Single Instance -> RAC)
在配置ADG的场景,发现ADG不能同步。

1.查看报错信息

1)主库alert日志:

2023-01-29T11:18:41.489164+08:00
TT00 (PID:27840): Error 12514 received logging on to the standby
TT00 (PID:27840): Attempting LAD:2 network reconnect (12514)
TT00 (PID:27840): LAD:2 network reconnect abandoned
2023-01-29T11:18:41.489623+08:00
Errors in file /u01/app/oracle/diag/rdbms/demo/demo/trace/demo_tt00_27840.trc:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
TT00 (PID:27840): krsg_check_connection: Error 12514 connecting to standby 'demorac'

2)ADG配置链路的error信息:

SQL> select error from v$archive_dest where dest_id = 2;

ERROR
-----------------------------------------------------------------
ORA-12514: TNS:listener does not currently know of service
requested in connect descriptor

错误号一致,均为12514。

2.oerr查看该错误说明

[oracle@bogon 2023_01_28]$ oerr ora 12514
12514, 00000, "TNS:listener does not currently know of service requested in connect descriptor"
// *Cause:  The listener received a request to establish a connection to a
// database or other service. The connect descriptor received by the listener
// specified a service name for a service (usually a database service)
// that either has not yet dynamically registered with the listener or has
// not been statically configured for the listener.  This may be a temporary
// condition such as after the listener has started, but before the database
// instance has registered with the listener.
// *Action:
//  - Wait a moment and try to connect a second time.
//  - Check which services are currently known by the listener by executing:
//    lsnrctl services <listener name>
//  - Check that the SERVICE_NAME parameter in the connect descriptor of the
//    net service name used specifies a service known by the listener.
//  - If an easy connect naming connect identifier was used, check that
//    the service name specified is a service known by the listener.
//  - Check for an event in the listener.log file.

3.尝试sqlplus连接到standby

[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac as sysdba
...
ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
descriptor

分别连接两个节点:

[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac1 as sysdba
...
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0

SQL> 

[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac2 as sysdba
...
ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
descriptor

4.尝试relocate监听

看来是因为RAC的实例2连接有问题?
嗯,现在实例2还不存在,那直接把scan配置到节点1机器上试试吧:

[grid@db01rac1 ~]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node db01rac2
[grid@db01rac1 ~]$ srvctl relocate scan_listener -scannumber 1 -node db01rac1
[grid@db01rac1 ~]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node db01rac1

结果依然报错,说明并不是relocate的问题。
其实在relocate之前,就发现scan在节点2时的状态有问题,没有监听到任何服务:

[grid@db01rac2 ~]$ lsnrctl status LISTENER_SCAN1

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 28-JAN-2023 18:54:31

Copyright (c) 1991, 2022, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                28-JAN-2023 18:53:51
Uptime                    0 days 0 hr. 0 min. 40 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19.3.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/db01rac2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.15)(PORT=1521)))
The listener supports no services
The command completed successfully

5.继续排查发现是参数问题

为什么LISTENER_SCAN1监听不到任何服务呢?尝试使用手工alter system register;也不行。
与scan有关的数据库配置就是那个remote参数了,难道设置有问题,oracle用户登陆到数据中查看确认:

SQL> show parameter remote_listener

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
remote_listener 		     string

为什么会是空呢?实际是因为该环境搭建ADG,主库是单实例,参数中没有设置remote_listener参数,备库是RAC,虽然配置了其他RAC独有参数,但漏掉了这个参数。
知道原因后,解决很简单,直接根据当前环境在备库配置正确的 remote_listener 参数值:

alter system set remote_listener='db01rac-scan:1521';

再次查看LISTENER_SCAN1监听状态,已经正常有服务注册了:

[grid@db01rac1 admin]$ lsnrctl status LISTENER_SCAN1

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 28-JAN-2023 19:26:13

Copyright (c) 1991, 2022, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                28-JAN-2023 19:15:24
Uptime                    0 days 0 hr. 10 min. 49 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19.3.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/db01rac1/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.15)(PORT=1521)))
Services Summary...
Service "demorac" has 1 instance(s).
  Instance "jydb1", status READY, has 1 handler(s) for this service...
The command completed successfully

此时再次测试连接,已恢复正常,ADG链路的报错信息也已经随之解决。

[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac as sysdba
...
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0

SQL>

6.总结和延伸

本次问题定位后发现很简单,只是因为备库的一个参数设置导致:

# primary 查看配置链路的错误信息:
SQL> select error from v$archive_dest where dest_id = 2;

ERROR
-----------------------------------------------------------------
ORA-12514: TNS:listener does not currently know of service
requested in connect descriptor

# standby 修正参数设置:
alter system set remote_listener='db01rac-scan:1521';

延伸:其实很多ADG同步问题,日志中都会抛出各类错误,除了按照错误排查之外,还有一些经验性的思路,可以在troubleshooting过程中选择尝试,下面给出示例,希望能对大家有所帮助。

1)ADG配置链路defer后再enable:

SQL> alter system set log_archive_dest_state_2=defer;
SQL> alter system set log_archive_dest_state_2=enable;

2)密码文件同步:

[oracle@bogon dbs]$ scp orapwdemo 192.168.1.11:/u01/app/oracle/product/19.3.0/db_1/dbs/orapwjydb1
[oracle@bogon dbs]$ scp orapwdemo 192.168.1.12:/u01/app/oracle/product/19.3.0/db_1/dbs/orapwjydb2

3)重启MRP进程:

RECOVER MANAGED STANDBY DATABASE CANCEL;
RECOVER MANAGED STANDBY DATABASE DISCONNECT;

注意:有时在备库open read only状态下无法启动MRP的情况,可以选择到mount状态下尝试,遇到过很多情况下都有效。

4)尝试重建SRLs:
遇到过个别情况,实时同步有问题是因为SRLs有问题,按照你的环境要求重建即可:

#recreate:drop and create SRLs:
SQL> recover managed standby database cancel;

alter database drop standby logfile group 14;
alter database drop standby logfile group 13;
alter database drop standby logfile group 12;
alter database drop standby logfile group 11;

#确认asm磁盘组中清理,注意千万核对清楚,别误删除其他的:
ASMCMD> rm standby_group_*
ASMCMD> pwd
+datadg/DEMORAC

#新建SRLs:
SQL> select * from v$standby_log;

alter database add standby logfile thread 1 group 11 '+DATADG' size 209715200;
alter database add standby logfile thread 1 group 12 '+DATADG' size 209715200;
alter database add standby logfile thread 1 group 13 '+DATADG' size 209715200;
alter database add standby logfile thread 1 group 14 '+DATADG' size 209715200;

SQL> recover managed standby database disconnect;

5)检查基本参数配置:

很多时候,环境有可能被人修改过,要检查基本参数,确认和你的环境要求匹配:

set linesize 500
col value for a70
col name for a50
 
select name, value
from v$parameter
where name in ('db_name','db_unique_name','log_archive_config', 'log_archive_dest_1','log_archive_dest_2',
               'log_archive_dest_state_1','log_archive_dest_state_2', 'remote_login_passwordfile',
               'log_archive_format','log_archive_max_processes','fal_server','db_file_name_convert',
                     'log_file_name_convert', 'standby_file_management');

6)了解主要进程:

遇到过给生产库配置ADG,但无论如何折腾备库也无法实现实时同步,最终发现是主库有异常,重启后就自动解决了。
但很多情况申请重启主库很难,其实可以先申请尝试重启相关进程,比如主库的arc,tt 等进程。
这些进程被杀掉通常不会导致主库crash,而是会重新启动新的进程。

与ADG无法同步:TT00进程报错 Error 12514相似的内容:

ADG无法同步:TT00进程报错 Error 12514

环境: Oracle 19.16 ADG (Single Instance -> RAC) 在配置ADG的场景,发现ADG不能同步。 1.查看报错信息 2.oerr查看该错误说明 3.尝试sqlplus连接到standby 4.尝试relocate监听 5.继续排查发现是参数问题 6.总结和延伸 1

ADG无法切换:报错 ORA-16467

现象: ADG无法切换:验证时就报错 ORA-16467 记录问题,顺便展现一次troubleshooting的心路历程。 具体查询: 在主库操作, @primary 切换验证: alter database switchover to demorac verify; 报错ORA-16467: SQ

Oracle ADG环境下的RMAN备份策略

作为IT运维人员,尤其是数据库岗位,数据的备份重于一切。 现在很多用户会有一个普遍误区,认为现在类似ADG这类灾备已经很完善,且实时性也更佳,往往就忽略了传统的备份效用。 但实际上,我们千万不能因为有了容灾建设就盲目忽略备份的作用,二者其实有着本质区别。很多场景,灾备都是无法替代传统备份的,二者是缺

ADG备库中某个PDB缺失temp文件

之前认为缺失的temp文件在开库时会自动创建,但其实也有不能自动创建的场景,alert会有类似如下提示: 2023-05-11T20:35:35.974983+08:00 AWR(6):*********************************************************

ADG级联备库环境PSU应用验证

上篇文章 - [源端为备库的场景下Duplicate失败问题](https://www.cnblogs.com/jyzhao/p/17420831.html) 我只在中间备库环境应用了PSU,解决了级联备库从中间备库duplicate数据库的问题: 细心的朋友已经发现,因为是备库环境,并没有做数据库

19c ADG Switchover 切换测试

背景: 环境未配置DG Broker,手工切换ADG,19c也要比11g时代的切换更简单。 使用自己的测试环境,具体可参见: 单实例Primary快速搭建Standby RAC参考手册(19.16 ADG) 1.主库demo切换到RAC环境demorac: 在主库demo执行命令: SQL> alt

验证ADG的坏块检测和自动修复

环境: Oracle 19c ADG(主库:单实例;备库:RAC) 1.主库新建测试文件 2.主库创建测试表 3.查询表对应数据文件信息 4.模拟数据文件物理坏块 5.查询对应测试表 6.进一步查询日志信息 7.确认当前参数设置 1.主库新建测试文件 主库在AWR的PDB中做测试,为了不影响其他测试

11g ADG级联备库基础测试环境准备

客户通过duplicate生产备库的方式创建cascade备库。 发现每次都会遇到两个文件报错,ORA-17628: Oracle error 19505错误,且每一次跑,报错文件不一样。 现在想帮客户验证,这属于是正常现象还是bug; 本文需要先模拟客户11.2.0.3环境,构建备库、级联备库环境

部署19c ADG过程中的问题处理

回忆起来也是有些年没亲自动手搭建ADG了,今天正好有个机会重温,客户环境是19.16,恍惚记得上一次搭ADG还是在11.2.0.4的时代,时光荏苒啊。 正好看下19c的ADG和11g的ADG在部署方面有啥不同? 主备库都是RAC架构,数据库是CDB架构,包含有4个PDB,整个搭建过程还是遇到很多小问

记录一则ADG备库报错ORA-29771的案例

有客户找到我这边咨询,说他们的一套核心ADG库在业务高峰期报错,因为业务做了读写分离,其备库也实际承担读业务,所以备库故障也会对业务产生影响。 这里也要提醒大家,做读写分离,如果读库出现故障的情况,要有切换到主库的应急方案考虑进去。 客户这里自己通过重启备库暂时解决,但担心故障再现,所以非常着急要分