agctl start goldengate hangs and fails

In this post I will discuss one of the reasons why golden gate startup may hang and fail on both nodes:

Problem:

[root@primrac1 tmp]# /u01/app/grid/xag/bin/agctl start goldengate gg_replicate
^[[ACRS-2672: Attempting to start 'xag.gg_replicate.goldengate' on 'primrac2'
CRS-2674: Start of 'xag.gg_replicate.goldengate' on 'primrac2' failed
CRS-2679: Attempting to clean 'xag.gg_replicate.goldengate' on 'primrac2'
CRS-2681: Clean of 'xag.gg_replicate.goldengate' on 'primrac2' succeeded
CRS-2563: Attempt to start resource 'xag.gg_replicate.goldengate' on 'primrac2' has failed. Will re-retry on 'primrac1' now.
CRS-2672: Attempting to start 'xag.gg_replicate.goldengate' on 'primrac1'
CRS-2674: Start of 'xag.gg_replicate.goldengate' on 'primrac1' failed
CRS-2679: Attempting to clean 'xag.gg_replicate.goldengate' on 'primrac1'
CRS-2681: Clean of 'xag.gg_replicate.goldengate' on 'primrac1' succeeded
CRS-2632: There are no more servers to try to place resource 'xag.gg_replicate.goldengate' on that would satisfy its placement policy
CRS-4000: Command Start failed, or completed with errors.

Troubleshooting:

Check crsd_scriptagent_oracle.trc trace file on each database node. For my environment the full path of this file is /u01/app/grid/diag/crs/primrac2/crs/trace/crsd_scriptagent_oracle.trc

I found that the following error was repeating in the trace file:

2020-03-05 23:36:30.467 : CLSDADR:3560113920: ERRORdynamic component [R2530613] mapped to [CLSDYNAM]

On the metalink found only this PRCR-1079 : Failed to start resource ora.oc4j, the problem is not the same, but glanced a sentence containing config file.. so here I realized it could be related to the config file. And I was correct.

Solution:

Delete existing GG resource:

# /u01/app/grid/xag/bin/agctl remove goldengate gg_replicate

Readd using correct config file location:
--config_home /GG_HOME/sm/etc/conf --var_home /GG_HOME/sm/var

# /u01/app/grid/xag/bin/agctl add goldengate gg_replicate --gg_home /GG_HOME/ma --service_manager --config_home /GG_HOME/sm/etc/conf --var_home /GG_HOME/sm/var --port 9001 --adminuser oggadmin --user oracle --group oinstall --filesystems ora.ggdg.acfsgg.acfs --db_services ora.primorcl.orclservice.svc --use_local_services

Start Golden Gate, relocate and check:

[root@primrac1 ~]# /u01/app/grid/xag/bin/agctl start goldengate gg_replicate
[root@primrac1 ~]# /u01/app/grid/xag/bin/agctl status goldengate gg_replicate
Goldengate instance 'gg_replicate' is running on primrac2
[root@primrac1 ~]# /u01/app/grid/xag/bin/agctl relocate goldengate gg_replicate
[root@primrac1 ~]# /u01/app/grid/xag/bin/agctl status goldengate gg_replicate
Goldengate instance 'gg_replicate' is running on primrac1

Presentation: Oracle GoldenGate Microservices Overview (with DEMO)

Webinar: Oracle GoldenGate Microservices Overview (with DEMO)

XAG Integration mode global setting of true does not match command line setting of false

Problem:

Not able to start Service Manager in Oracle Golden Gate Microservices architecture:

$ export OGG_CONF_HOME=/GG_HOME/sm/etc/conf 
$ /GG_HOME/ma/bin/ServiceManager

Errors in ServiceManager.log:

2019-03-23T21:59:50.237+0000 INFO | Read SharedContext from store for length 1372 of file '/GG_HOME/deploy/var/lib/conf/ServiceManager-config.dat'. (main)
2019-03-23T21:59:50.242+0000 INFO | Read SharedContext from store for length 245 of file '/GG_HOME/deploy/var/lib/conf/global.dat'. (main)
2019-03-23T21:59:50.245+0000 INFO | Read SharedContext from store for length 19 of file '/GG_HOME/deploy/var/lib/conf/ServiceManager-resources.dat'. (main)
2019-03-23T21:59:50.247+0000 ERROR| XAG Integration mode global setting of true does not match command line setting of false. (main)
2019-03-23T21:59:50.249+0000 ERROR| XAG Configuration failed (main)

Solution:

Pass `--xagEnabled` option when starting ServiceManager:

$ export OGG_CONF_HOME=/GG_HOME/sm/etc/conf
$ /GG_HOME/ma/bin/ServiceManager --xagEnabled

Oracle GoldenGate Service Manager for Oracle
Version 18.1.0.0.0 OGGCORE_18.1.0.0.0_PLATFORMS_180928.0432

Copyright (C) 1995, 2018, Oracle and/or its affiliates. All rights reserved.

Linux, x64, 64bit (optimized) on Sep 28 2018 17:31:51
Operating system character set identified as UTF-8.

Oracle Golden Gate: Passive Extract Configuration

When do we need passive extract?

When you have untrusted source and trusted target. Or in simple words, your untrusted source should not have IP address and manager port details of the trusted target system in the configuration. And the trusted target is allowed to know VIP or Host IP of the untrusted source.

How does it work?

1) Start command must be issued on the Trusted Target for the Alias Extract, that is intercepted by Manager.
2) Start command is forwarded to the Manager on the Untrusted Source.
3) Manager on the Untrusted Source starts Passive Extract.
4) Passive Extract retrieves data from source database.
5) After security validation, Manager on the Untrusted Source knows where to send Trails (initiator server IP). And sends captured data.
6) Trail files are read by Replicat.
7) Data from the Trails are applied to the database at the Trusted Target.

Environment Details:

Source host: primrac1.example.com
Target host: stbyrac1.example.com

Create Manager

On the source:

GGSCI (primrac1.example.com)> edit params mgr
PORT 7809
ACCESSRULE, PROG *, IPADDR *, PRI 1, ALLOW
AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60
AUTOSTART ER *

On the target:

GGSCI (stbyrac1.example.com)> edit params mgr
PORT 7809
ACCESSRULE, PROG *, IPADDR *, PRI 1, ALLOW
AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60
AUTOSTART ER *

ACCESSRULE is used to control connection access to the Manager process and the processes under its control. By default there is only deny rule.

For testing purposes I created a rule that grants access to all programs and all IP addresses. In production environment you have to filter incomming IPs. For more information about ACCESSRULE, please click here.

Modify Globals

To allow a remote Director server connection, add the following parameter in ./GLOBALS:

GGSCI (primrac1.example.com)> edit params ./GLOBALS
_DISABLEFIX21427144

Start Manager

GGSCI (primrac1.example.com)> start mgr
Manager started.
GGSCI (stbyrac1.example.com)> start mgr
Manager started.

Create Passive Extract

On the source:

# su - oracle
$ cd /GG_HOME/home_1
$ export ORACLE_HOME=/u01/app/oracle/product/18.3.0/dbhome_1
$ export ORACLE_SID=orclgg
$ ./ggsci

GGSCI (primrac1.example.com as ggcw@orclgg)> DBLOGIN USERID ggcw@ORCLGG, PASSWORD ggcw
Successfully logged into database.

GGSCI (primrac1.example.com as ggcw@orclgg)> edit param PassExtr
extract PassExtr
USERID ggcw@ORCLGG, PASSWORD ggcw
rmttrail ./dirdat/pe
table HR.SALARY;

GGSCI (primrac1.example.com as ggcw@orclgg)> add extract PassExtr, passive, integrated tranlog, begin now
EXTRACT (Integrated) added.

GGSCI (primrac1.example.com as ggcw@orclgg)> register extract PassExtr database
2019-03-21 21:45:23 INFO OGG-02003 Extract PASSEXTR successfully registered with database at SCN 8228334.

GGSCI (primrac1.example.com as ggcw@orclgg)> add rmttrail ./dirdat/pe, extract PassExtr, megabytes 1000
RMTTRAIL added.

Please note that in add extract command I have indicated keyword passive.

Create Alias Extract

On the target:

GGSCI (stbyrac1.example.com  as ggcw@orclgg)> DBLOGIN USERID ggcw@ORCLGG, PASSWORD ggcw
Successfully logged into database.

GGSCI (stbyrac1.example.com as ggcw@orclgg)> add extract AlExtr, rmthost primrac1.example.com, mgrport 7809, rmtname PassExtr
EXTRACT added.

Start Extract

START and STOP commands must be run on Alias Extract. Any GGSCI command executed against the Alias Extract is forwarded to the untrusted source for execution against the Passive Extract.

If you try to STOP or START Passive Extract, you will get the following error:

GGSCI (primrac1.example.com as ggcw@orclgg)> start PASSEXTR
ERROR: START is not allowed against a passive group.
GGSCI (primrac1.example.com as ggcw@orclgg)> stop PASSEXTR
ERROR: STOP is not allowed against a passive group.

Run start on only Alias Extract:

GGSCI (stbyrac1.example.com as ggcw@orclgg)> start AlExtr
Sending START request to MANAGER@PRIMRAC1.EXAMPLE.COM:7809 …
Sending START request to MANAGER@stbyrac1.example.com:7809 …

Check Status

GGSCI (stbyrac1.example.com as ggcw@orclgg)> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING ALEXTR
GGSCI (primrac1.example.com as ggcw@orclgg)> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING PASSEXTR 00:00:00 00:18:14

For any possible problems you can check ggserr.log file under GG home.

GGSCI: error while loading shared libraries: libclntshcore.so.18.1: cannot open shared object file: No such file or directory

Problem:

Oracle GoldenGate Software Command Interface (GGSCI) 18c raises error:

$ /GG_HOME/home_1/ggsci
/GG_HOME/home_1/ggsci: error while loading shared libraries: libclntshcore.so.18.1: cannot open shared object file: No such file or directory

Cause: 

GoldenGate put all necessary shared objects (.so files) under its own home directory, but several shared objects may not be found there. They should be borrowed from RDBMS or GI home.

Solution:

# find / -name libclntshcore.so.18.1

/u01/app/18.3.0/grid/.patch_storage/28090523_Jul_14_2018_00_03_50/files/lib/libclntshcore.so.18.1
/u01/app/18.3.0/grid/.patch_storage/28655784_Oct_8_2018_21_27_28/files/lib/libclntshcore.so.18.1
/u01/app/18.3.0/grid/lib/libclntshcore.so.18.1
/u01/app/18.3.0/grid/inventory/Scripts/ext/lib/libclntshcore.so.18.1

Create symbolic link:

$ cd /GG_HOME/home_1
$ ln -s /u01/app/18.3.0/grid/lib/libclntshcore.so.18.1 libclntshcore.so.18.1

Rerun ggsci.

 

GGSCI: error while loading shared libraries: libclntsh.so.18.1: cannot open shared object file: No such file or directory

Problem:

Oracle GoldenGate Software Command Interface (GGSCI) 18c raises error:

$ /GG_HOME/home_1/ggsci
/GG_HOME/home_1/ggsci: error while loading shared libraries: libclntsh.so.18.1: cannot open shared object file: No such file or directory

Cause: 

GoldenGate put all necessary shared objects (.so files) under its own home directory, but several shared objects may not be found there. They should be borrowed from RDBMS or GI home.

Solution:

# find / -name libclntsh.so.18.1

/u01/app/18.3.0/grid/.patch_storage/28090523_Jul_14_2018_00_03_50/files/lib/libclntsh.so.18.1
/u01/app/18.3.0/grid/.patch_storage/28655784_Oct_8_2018_21_27_28/files/lib/libclntsh.so.18.1
/u01/app/18.3.0/grid/lib/libclntsh.so.18.1
/u01/app/18.3.0/grid/inventory/Scripts/ext/lib/libclntsh.so.18.1

Create symbolic link:

$ cd /GG_HOME/home_1
$ ln -s /u01/app/18.3.0/grid/lib/libclntsh.so.18.1 libclntsh.so.18.1

Rerun ggsci.

 

OGG-02022 Oracle GoldenGate Capture for Oracle, ext1.prm: Logmining server does not exist on this Oracle database

Problem:

The extract EXT1 stops immediatelly in Golden Gate environment:

GGSCI (stbyrac1.example.com as ggcw@orcl1) 4> start ext1

Sending START request to MANAGER …
EXTRACT EXT1 starting

GGSCI (stbyrac1.example.com as ggcw@orcl1) 5> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING
EXTRACT STOPPED EXT1 00:00:00 00:52:17
REPLICAT RUNNING REP1 00:00:00 00:00:02

Troubleshoot:

/GG_HOME/home_1/ggserr.log file contains the following error:

2019-01-08T23:05:11.364+0000 ERROR OGG-02022 Oracle GoldenGate Capture for Oracle, ext1.prm: Logmining server does not exist on this Oracle database.
2019-01-08T23:05:11.364+0000 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, ext1.prm: PROCESS ABENDING.

Solution:

Register the extract:

GGSCI (stbyrac1.example.com as ggcw@orcl1) 8> register extract ext1 database

2019-01-08 23:10:54 INFO OGG-02003 Extract EXT1 successfully registered with database at SCN 2126943.

Verify:

GGSCI (stbyrac1.example.com as ggcw@orcl1) 9> start ext1

Sending START request to MANAGER …
EXTRACT EXT1 starting

GGSCI (stbyrac1.example.com as ggcw@orcl1) 10> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING
EXTRACT RUNNING EXT1 00:00:00 00:58:38
REPLICAT RUNNING REP1 00:00:00 00:00:04

GGSCI: error while loading shared libraries: libnnz18.so: cannot open shared object file: No such file or directory

Problem:

Oracle GoldenGate Software Command Interface (GGSCI) 18c raises error:

$ /GG_HOME/home_1/ggsci
/GG_HOME/home_1/ggsci: error while loading shared libraries: libnnz18.so: cannot open shared object file: No such file or directory

Cause: 

GoldenGate put all necessary shared objects (.so files) under its own home directory, but several shared objects may not be found there. They should be borrowed from RDBMS or GI home.

Solution:

# find / -name libnnz18.so
/u01/app/18.3.0/grid/lib/libnnz18.so
/u01/app/18.3.0/grid/inventory/Scripts/ext/lib/libnnz18.so

Create symbolic link of the specidifed library in the Golden Gate home:

$ cd /GG_HOME/home_1
$ ln -s /u01/app/18.3.0/grid/lib/libnnz18.so libnnz18.so

Rerun ggsci.

 

ERROR: This EXTRACT EXT1 is already registered with the database.

Problem:

Trying to register ext1 extract and getting the error:

GGSCI> register extract ext1 database

2019-01-03 20:48:26 ERROR OGG-08241 ERROR: This EXTRACT EXT1 is already registered with the database.

Troubleshooting:

GGSCI> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING

Solution:

Unregister extract first:

GGSCI> unregister extract ext1 database

2019-01-03 20:51:24 INFO OGG-01750 Successfully unregistered EXTRACT EXT1 from database.

Reregister again:

GGSCI> register extract ext1 database

2019-01-03 20:52:15 INFO OGG-02003 Extract EXT1 successfully registered with database at SCN 1643626.