环境: Oracle 19.16 ADG (Single Instance -> RAC)
在配置ADG的场景,发现ADG不能同步。
1)主库alert日志:
2023-01-29T11:18:41.489164+08:00 TT00 (PID:27840): Error 12514 received logging on to the standby TT00 (PID:27840): Attempting LAD:2 network reconnect (12514) TT00 (PID:27840): LAD:2 network reconnect abandoned 2023-01-29T11:18:41.489623+08:00 Errors in file /u01/app/oracle/diag/rdbms/demo/demo/trace/demo_tt00_27840.trc: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor TT00 (PID:27840): krsg_check_connection: Error 12514 connecting to standby 'demorac'
复制
2)ADG配置链路的error信息:
SQL> select error from v$archive_dest where dest_id = 2;
ERROR
-----------------------------------------------------------------
ORA-12514: TNS:listener does not currently know of service
requested in connect descriptor
复制
错误号一致,均为12514。
[oracle@bogon 2023_01_28]$ oerr ora 12514 12514, 00000, "TNS:listener does not currently know of service requested in connect descriptor" // *Cause: The listener received a request to establish a connection to a // database or other service. The connect descriptor received by the listener // specified a service name for a service (usually a database service) // that either has not yet dynamically registered with the listener or has // not been statically configured for the listener. This may be a temporary // condition such as after the listener has started, but before the database // instance has registered with the listener. // *Action: // - Wait a moment and try to connect a second time. // - Check which services are currently known by the listener by executing: // lsnrctl services <listener name> // - Check that the SERVICE_NAME parameter in the connect descriptor of the // net service name used specifies a service known by the listener. // - If an easy connect naming connect identifier was used, check that // the service name specified is a service known by the listener. // - Check for an event in the listener.log file.
复制
[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac as sysdba ... ERROR: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
复制
分别连接两个节点:
[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac1 as sysdba
...
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0
SQL>
[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac2 as sysdba
...
ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
descriptor
复制
看来是因为RAC的实例2连接有问题?
嗯,现在实例2还不存在,那直接把scan配置到节点1机器上试试吧:
[grid@db01rac1 ~]$ srvctl status scan SCAN VIP scan1 is enabled SCAN VIP scan1 is running on node db01rac2 [grid@db01rac1 ~]$ srvctl relocate scan_listener -scannumber 1 -node db01rac1 [grid@db01rac1 ~]$ srvctl status scan SCAN VIP scan1 is enabled SCAN VIP scan1 is running on node db01rac1
复制
结果依然报错,说明并不是relocate的问题。
其实在relocate之前,就发现scan在节点2时的状态有问题,没有监听到任何服务:
[grid@db01rac2 ~]$ lsnrctl status LISTENER_SCAN1 LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 28-JAN-2023 18:54:31 Copyright (c) 1991, 2022, Oracle. All rights reserved. Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1))) STATUS of the LISTENER ------------------------ Alias LISTENER_SCAN1 Version TNSLSNR for Linux: Version 19.0.0.0.0 - Production Start Date 28-JAN-2023 18:53:51 Uptime 0 days 0 hr. 0 min. 40 sec Trace Level off Security ON: Local OS Authentication SNMP OFF Listener Parameter File /u01/app/19.3.0/grid/network/admin/listener.ora Listener Log File /u01/app/grid/diag/tnslsnr/db01rac2/listener_scan1/alert/log.xml Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1))) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.15)(PORT=1521))) The listener supports no services The command completed successfully
复制
为什么LISTENER_SCAN1监听不到任何服务呢?尝试使用手工alter system register;
也不行。
与scan有关的数据库配置就是那个remote参数了,难道设置有问题,oracle用户登陆到数据中查看确认:
SQL> show parameter remote_listener
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
remote_listener string
复制
为什么会是空呢?实际是因为该环境搭建ADG,主库是单实例,参数中没有设置remote_listener参数,备库是RAC,虽然配置了其他RAC独有参数,但漏掉了这个参数。
知道原因后,解决很简单,直接根据当前环境在备库配置正确的 remote_listener 参数值:
alter system set remote_listener='db01rac-scan:1521';
复制
再次查看LISTENER_SCAN1监听状态,已经正常有服务注册了:
[grid@db01rac1 admin]$ lsnrctl status LISTENER_SCAN1 LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 28-JAN-2023 19:26:13 Copyright (c) 1991, 2022, Oracle. All rights reserved. Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1))) STATUS of the LISTENER ------------------------ Alias LISTENER_SCAN1 Version TNSLSNR for Linux: Version 19.0.0.0.0 - Production Start Date 28-JAN-2023 19:15:24 Uptime 0 days 0 hr. 10 min. 49 sec Trace Level off Security ON: Local OS Authentication SNMP OFF Listener Parameter File /u01/app/19.3.0/grid/network/admin/listener.ora Listener Log File /u01/app/grid/diag/tnslsnr/db01rac1/listener_scan1/alert/log.xml Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1))) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.15)(PORT=1521))) Services Summary... Service "demorac" has 1 instance(s). Instance "jydb1", status READY, has 1 handler(s) for this service... The command completed successfully
复制
此时再次测试连接,已恢复正常,ADG链路的报错信息也已经随之解决。
[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac as sysdba
...
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0
SQL>
复制
本次问题定位后发现很简单,只是因为备库的一个参数设置导致:
# primary 查看配置链路的错误信息:
SQL> select error from v$archive_dest where dest_id = 2;
ERROR
-----------------------------------------------------------------
ORA-12514: TNS:listener does not currently know of service
requested in connect descriptor
# standby 修正参数设置:
alter system set remote_listener='db01rac-scan:1521';
复制
延伸:其实很多ADG同步问题,日志中都会抛出各类错误,除了按照错误排查之外,还有一些经验性的思路,可以在troubleshooting过程中选择尝试,下面给出示例,希望能对大家有所帮助。
1)ADG配置链路defer后再enable:
SQL> alter system set log_archive_dest_state_2=defer;
SQL> alter system set log_archive_dest_state_2=enable;
复制
2)密码文件同步:
[oracle@bogon dbs]$ scp orapwdemo 192.168.1.11:/u01/app/oracle/product/19.3.0/db_1/dbs/orapwjydb1 [oracle@bogon dbs]$ scp orapwdemo 192.168.1.12:/u01/app/oracle/product/19.3.0/db_1/dbs/orapwjydb2
复制
3)重启MRP进程:
RECOVER MANAGED STANDBY DATABASE CANCEL; RECOVER MANAGED STANDBY DATABASE DISCONNECT;
复制
注意:有时在备库open read only状态下无法启动MRP的情况,可以选择到mount状态下尝试,遇到过很多情况下都有效。
4)尝试重建SRLs:
遇到过个别情况,实时同步有问题是因为SRLs有问题,按照你的环境要求重建即可:
#recreate:drop and create SRLs:
SQL> recover managed standby database cancel;
alter database drop standby logfile group 14;
alter database drop standby logfile group 13;
alter database drop standby logfile group 12;
alter database drop standby logfile group 11;
#确认asm磁盘组中清理,注意千万核对清楚,别误删除其他的:
ASMCMD> rm standby_group_*
ASMCMD> pwd
+datadg/DEMORAC
#新建SRLs:
SQL> select * from v$standby_log;
alter database add standby logfile thread 1 group 11 '+DATADG' size 209715200;
alter database add standby logfile thread 1 group 12 '+DATADG' size 209715200;
alter database add standby logfile thread 1 group 13 '+DATADG' size 209715200;
alter database add standby logfile thread 1 group 14 '+DATADG' size 209715200;
SQL> recover managed standby database disconnect;
复制
5)检查基本参数配置:
很多时候,环境有可能被人修改过,要检查基本参数,确认和你的环境要求匹配:
set linesize 500 col value for a70 col name for a50 select name, value from v$parameter where name in ('db_name','db_unique_name','log_archive_config', 'log_archive_dest_1','log_archive_dest_2', 'log_archive_dest_state_1','log_archive_dest_state_2', 'remote_login_passwordfile', 'log_archive_format','log_archive_max_processes','fal_server','db_file_name_convert', 'log_file_name_convert', 'standby_file_management');
复制
6)了解主要进程:
遇到过给生产库配置ADG,但无论如何折腾备库也无法实现实时同步,最终发现是主库有异常,重启后就自动解决了。
但很多情况申请重启主库很难,其实可以先申请尝试重启相关进程,比如主库的arc,tt 等进程。
这些进程被杀掉通常不会导致主库crash,而是会重新启动新的进程。