ORA-15477: cannot communicate with the volume driver (DBD ERROR: OCIStmtExecute)

Problem:

I had a GI Standalone installation, which I deconfigured and then reconfigured as a one-node RAC, which was successful. Then I tried to create an ACFS volume, which failed with ORA-15477:

[root@host1 dbs]# asmcmd volcreate -G OGG -s 10G ACFSGG
ORA-15032: not all alterations performed
ORA-15477: cannot communicate with the volume driver (DBD ERROR: OCIStmtExecute)

Reason:

It seems the ACFS/ADVM modules are not loaded:

[root@host1 dbs]# lsmod | grep oracle
oracleacfs           5921415  0
oracleadvm           1236257  0
oracleoks             750688  2 oracleacfs,oracleadvm

Solution:

First, I will share two possible solutions that have helped others but did not help me, followed by a third solution that did:

  1. Start the modules manually and make sure they are enabled:
# acfsload start
# acfsload enable

Check whether the modules are loaded using lsmod | grep oracle and retry the volume creation.
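For example, the check and the retry could look like this (same disk group and volume name as in the problem above):

[root@host1 dbs]# lsmod | grep oracle     # oracleoks, oracleadvm and oracleacfs should all be listed
[root@host1 dbs]# asmcmd volcreate -G OGG -s 10G ACFSGG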

2. Reinstall the ACFS/ADVM modules manually:

[root@host1 dbs]# acfsroot install
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9314: Removing previous ADVM/ACFS installation.
depmod: ERROR: fstatat(6, uds.ko): No such file or directory
depmod: ERROR: fstatat(6, kvdo.ko): No such file or directory
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
depmod: ERROR: fstatat(6, uds.ko): No such file or directory
depmod: ERROR: fstatat(6, kvdo.ko): No such file or directory
ACFS-9390: The command 'echo '/lib/modules/3.10.0-862.el7.x86_64/extra/usm/oracleadvm.ko
/lib/modules/3.10.0-862.el7.x86_64/extra/usm/oracleoks.ko
/lib/modules/3.10.0-862.el7.x86_64/extra/usm/oracleacfs.ko
' | /sbin/weak-modules --no-initramfs --add-modules 3.10.0-1127.18.2.el7.x86_64 2>&1 |' returned unexpected output that may be important for system configuration:
depmod: ERROR: fstatat(6, kvdo.ko): No such file or directory

depmod: ERROR: fstatat(6, uds.ko): No such file or directory

depmod: ERROR: fstatat(6, uds.ko): No such file or directory

depmod: ERROR: fstatat(6, kvdo.ko): No such file or directory

ACFS-9154: Loading 'oracleoks.ko' driver.
ACFS-9154: Loading 'oracleadvm.ko' driver.
ACFS-9154: Loading 'oracleacfs.ko' driver.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
ACFS-9309: ADVM/ACFS installation correctness verified.

Retry volume creation.

If none of the above helps, try the third solution, which I could not find anywhere on the internet; it was my own idea:

3. Rebuild initramfs

[root@host1 ~]# cp -p /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
[root@host1 ~]# dracut -f
[root@host1 ~]# reboot

After the restart, you should be able to create the volume.
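To confirm the drivers came up cleanly after the reboot, before retrying the volume creation, a quick check can be done; acfsdriverstate lives in the GI home, and the path below is only an assumption for this environment:

[root@host1 ~]# lsmod | grep oracle
[root@host1 ~]# /u01/app/19.3.0/grid/bin/acfsdriverstate loaded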


ACFS-05913: unable to contact the standby node stbyrac1

Problem:

I was trying to set up ACFS replication, where one of the steps is to validate the user keys using acfsutil, which failed with an ACFS-05913 error:

[root@rac1 .ssh]# acfsutil repl info -c -u oggrepl stbyrac1 stbyrac2 /GG
acfsutil repl info: ACFS-05913: unable to contact the standby node stbyrac1
acfsutil repl info: ACFS-05913: unable to contact the standby node stbyrac2

Cause: 

An attempt to use the ping utility to contact a standby node failed.
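The failing check can be reproduced directly from the primary node (node names as in the example above); if these time out, ICMP is being blocked between the sites, for example by a firewall or a cloud security group:

[root@rac1 ~]# ping -c 3 stbyrac1
[root@rac1 ~]# ping -c 3 stbyrac2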

Solution:

Enable ICMP traffic between these nodes and retry validation:

[root@rac1 .ssh]# acfsutil repl info -c -u oggrepl stbyrac1 stbyrac2 /GG
A valid 'ssh' connection was detected for standby node stbyrac1 as user oggrepl.
A valid 'ssh' connection was detected for standby node stbyrac2 as user oggrepl.

srvctl start filesystem hangs

The title of this post is general; there can be many reasons why srvctl start filesystem hangs. The aim of this blog post is to share just one of them.

Problem:

I’ve created an ACFS volume and added it to srvctl:

$ asmcmd volcreate -G OGG -s 10G ACFSGG
# srvctl add filesystem -device /dev/asm/acfsgg-11 -path /GG_HOME -volume acfsgg -diskgroup OGG -user oracle -fstype ACFS

then tried to start the filesystem using:

# srvctl start filesystem -device /dev/asm/acfsgg-11

The command hung.

Troubleshooting:

I’ve checked the logs under the trace folder under the GI base, but could not find any clue. Even worse, stopping the filesystem was also hanging.

But let me stop here: the file that should have been checked was actually there; I simply missed it and checked the wrong files. The file that shows the relevant error is named mount_<process id>.trc and is located under the trace folder. So instead of manually mounting the filesystem to see the error, you can just open that mount_<process id>.trc file and find the reason there.
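For reference, a quick way to locate and read the newest mount trace file could look like this; it is only a sketch, assuming the Grid Infrastructure base is /u01/app/grid, so adjust the path to your environment:

[root@stbyrac1 ~]# cd /u01/app/grid/diag/crs/$(hostname -s)/crs/trace
[root@stbyrac1 trace]# ls -ltr mount_*.trc
[root@stbyrac1 trace]# tail -50 $(ls -t mount_*.trc | head -1)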

Then I tried to mount the filesystem manually, without srvctl:

[root@stbyrac1 trace]# /bin/mount -t acfs  /dev/asm/acfsgg-11 /GG_HOME
mount.acfs: ACFS-03037: not an ACFS file system

I saw the error, which explained what was happening: my volume had not been formatted with an ACFS filesystem. I had somehow missed that step on the standby cluster, so it was just a human error, but srvctl should at least have reported that instead of hanging and leaving the information in a trace file.

Solution:

Format ACFS volume:

[root@stbyrac1 trace]# mkfs -t acfs /dev/asm/acfsgg-11
mkfs.acfs: version                   = 19.0.0.0.0
mkfs.acfs: on-disk version           = 46.0
mkfs.acfs: volume                    = /dev/asm/acfsgg-11
mkfs.acfs: volume size               = 10737418240  (  10.00 GB )
mkfs.acfs: Format complete.

Because the start and stop operations hang, you need to mount the filesystem on all database nodes manually:

[root@stbyrac1 ~]# /bin/mount -t acfs  /dev/asm/acfsgg-11 /GG_HOME
[root@stbyrac2 ~]# /bin/mount -t acfs  /dev/asm/acfsgg-11 /GG_HOME

Now try to stop and start the filesystem, to make sure srvctl is able to do its job without any manual intervention:

[root@stbyrac1 ~]# srvctl stop filesystem -device /dev/asm/acfsgg-11
[root@stbyrac1 ~]# srvctl start filesystem -device /dev/asm/acfsgg-11

ACFS Input/output error on OS kernel 3.10.0-957

Problem:

My ACFS volume has intermittent problems: when I run the ls command I get "ls: cannot access sm: Input/output error" and the user/group ownership shows question marks.

[root@primrac1 GG_HOME]# ll
ls: cannot access ma: Input/output error
ls: cannot access sm: Input/output error
ls: cannot access deploy: Input/output error
total 64
d????????? ? ? ? ? ? deploy
drwx------ 2 oracle oinstall 65536 Sep 28 21:05 lost+found
d????????? ? ? ? ? ? ma
d????????? ? ? ? ? ? sm

Reason:

Doc ID 2561145.1 mentions that kernel 3.10.0-957 changed the behaviour of the d_splice_alias interface, which is used by the ACFS driver.

My kernel is the same:

[root@primrac1 GG_HOME]# uname -r
3.10.0-957.21.3.el7.x86_64

Solution:

Download and apply patch 29963428 to the GI home.

[root@primrac1 29963428]# /u01/app/19.3.0/grid/OPatch/opatchauto apply -oh /u01/app/19.3.0/grid
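Once opatchauto finishes, you can confirm that the patch is in the inventory (run as the grid user):

[grid@primrac1 ~]$ /u01/app/19.3.0/grid/OPatch/opatch lspatches | grep 29963428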

Patch Planner to check and request conflict patches

Problem:

Recently, I was applying p29963428_194000ACFSRU_Linux-x86-64.zip on top of 19.4 GI home and got the following error:

==Following patches FAILED in analysis for apply:
 Patch: /u01/swtmp/29963428/29963428
 Log: /u01/app/19.3.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-08-07_10-07-56AM_1.log
 Reason: Failed during Analysis: CheckConflictAgainstOracleHome Failed, [ Prerequisite Status: FAILED, Prerequisite output: 
 Summary of Conflict Analysis:
 There are no patches that can be applied now.
 Following patches have conflicts. Please contact Oracle Support and get the merged patch of the patches : 
 29851014, 29963428
 Conflicts/Supersets for each patch are:
 Patch : 29963428
 Bug Conflict with 29851014 Conflicting bugs are: 29039918, 27494830, 29338628, 29031452, 29264772, 29760083, 28855761 ... 
 After fixing the cause of failure Run opatchauto resume

Solution:

Oracle MOS note 1317012.1 describes how to check for such conflicts and how to request conflict/merged patches:

1. Run lsinventory from the target home:

[grid@rac1 ~]$ /u01/app/19.3.0/grid/OPatch/opatch lsinventory > GI_lsinventory.txt

2. Log on to support.oracle.com -> click the "Patch and Updates" tab -> enter the patch number you want to apply:

3. Click "Analyze with OPatch":

4. Attach the GI_lsinventory.txt file created in the first step and click "Analyze for Conflict":

5. Wait for a while and you will see the result. According to it, patch 29963428 conflicts with my current patches:

From the same screen I can click "Request Patch".

6. After clicking the "Request Patch" button I got the following error:

Click “See Details”:

The message means that the fix for the same bug is already included in the currently installed 19.4.0.0.190716ACFSRU.

So I do not have to apply patch 29963428 at all. I wanted to share these steps with you because the Patch Planner tool is really useful.

Oracle Golden Gate Setup on FlashGrid enabled clusters

Network Access:

  1. Setup VPC peering between sites.

2. Make sure that port UDP/4801 is open via the security group settings for each side.

Preparing the System:

  1. The /etc/hosts file on each side must contain entries for the other side:

10.30.0.5 rac1-hq-ext.example.com rac1-hq-ext
10.30.0.6 rac2-hq-ext.example.com rac2-hq-ext
10.30.0.4 racq-hq-ext.example.com racq-hq-ext
10.40.0.6 rac1-dr-ext.example.com rac1-dr-ext
10.40.0.5 rac2-dr-ext.example.com rac2-dr-ext
10.40.0.4 racq-dr-ext.example.com racq-dr-ext

2. On the first node of the source cluster, add entries for the target cluster to the nodes section of /etc/flashgrid-clan.cfg:

nodes = {'rac1-hq': {'address': 'rac1-hq-ext', 'id': 101, 'role': 'database'},
'rac2-hq': {'address': 'rac2-hq-ext', 'id': 102, 'role': 'database'},
'racq-hq': {'address': 'racq-hq-ext', 'id': 103, 'role': 'quorum'},
'rac1-dr': {'address': 'rac1-dr-ext', 'id': 201, 'role': 'database'},
'rac2-dr': {'address': 'rac2-dr-ext', 'id': 202, 'role': 'database'},
'racq-dr': {'address': 'racq-dr-ext', 'id': 203, 'role': 'quorum'}}

3. Modify the /home/fg/.ssh/authorized_keys file on each cluster node and add the public key of the other side.

4. Run the following command from the first node of the source cluster to deploy clan config:

 # flashgrid-clan-cfg deploy-config -f


Create shared folder for Golden Gate on ACFS

Create a separate disk group called GGDG for the Golden Gate software and create the shared ACFS filesystem on it.

  1. Log on to the first node of the RAC cluster as the grid user.
  2. Create a volume:

$ sudo su - grid

$ asmcmd

ASMCMD> volcreate -G GGDG -s 10G ACFSGG

3. Determine the device name of the volume:

ASMCMD> volinfo --all

Diskgroup Name: GGDG

Volume Name: ACFSGG
Volume Device: /dev/asm/acfsgg-458
State: ENABLED
Size (MB): 10240
Resize Unit (MB): 512
Redundancy: MIRROR
Stripe Columns: 8
Stripe Width (K): 1024
Usage:
Mountpath:

4. Create the filesystem on the volume:

[grid@rac1-hq ~]$ mkfs -t acfs /dev/asm/acfsgg-458

5. As root, create an empty directory to be used as the mount point and set its ownership:

# mkdir /GG_HOME
# chmod 775 /GG_HOME
# chown oracle:oinstall /GG_HOME
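Before registering the file system with Clusterware, a quick manual mount confirms that the volume was really formatted with ACFS (this avoids the srvctl hang described earlier); unmount it again afterwards so that Clusterware can manage it:

# /bin/mount -t acfs /dev/asm/acfsgg-458 /GG_HOME
# df -h /GG_HOME
# umount /GG_HOME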

Set up the file system to be automounted by Clusterware:

  1. Register the file system using srvctl add filesystem:

# srvctl add filesystem -device /dev/asm/acfsgg-458 -path /GG_HOME -volume acfsgg -diskgroup GGDG -user oracle -fstype ACFS -description "ACFS for Golden Gate"

2. Start filesystem service:

# srvctl start filesystem -device /dev/asm/acfsgg-458
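To verify that the file system resource is up on the cluster nodes, check its status:

# srvctl status filesystem -device /dev/asm/acfsgg-458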

3. Repeat the same steps on the target cluster to create the shared ACFS filesystem for GG.

Installing the GoldenGate software

  1. Download and install the Golden Gate software on the shared ACFS mount point. The software binaries (123012_fbo_ggs_Linux_x64_shiphome.zip) can be downloaded from http://www.oracle.com/technetwork/middleware/goldengate/downloads/index.html

2. Place the downloaded software in the /tmp/Install directory. Give the Oracle Database owner the necessary permissions and install the software as the oracle user:

# chown oracle:oinstall /tmp/Install/123012_fbo_ggs_Linux_x64_shiphome.zip

# su - oracle
$ cd /tmp/Install/

$ unzip 123012_fbo_ggs_Linux_x64_shiphome.zip
$ cd fbo_ggs_Linux_x64_shiphome/Disk1

$ ./runInstaller -silent -nowait -showProgress INSTALL_OPTION=ORA12c SOFTWARE_LOCATION=/GG_HOME/home_1 START_MANAGER=false MANAGER_PORT= DATABASE_LOCATION= INVENTORY_LOCATION=/u01/app/oraInventory UNIX_GROUP_NAME=oinstall

3. Do the same steps on the target cluster also.

Install Oracle Grid Infrastructure Standalone Agents

  1. Download the software from http://www.oracle.com/technetwork/database/database-technologies/clusterware/downloads/index.html
  2. Place the downloaded software in /tmp/Install and give the grid user the necessary permissions. From the first node of the cluster, run the following as the grid user:

# chown grid:dba /tmp/Install/xagpack81b.zip
# su - grid
$ cd /tmp/Install
$ unzip xagpack81b.zip
$ ./xag/xagsetup.sh --install --directory /u01/app/grid/xag --all_nodes
Installing Oracle Grid Infrastructure Agents on: rac1-hq
Installing Oracle Grid Infrastructure Agents on: rac2-hq
Done.

3. Do the same steps on the target cluster also.
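Once the agents are installed on both clusters, a quick sanity check is to ask agctl for its release version (install directory as above; assuming this subcommand is available in your XAG version):

[grid@rac1-hq ~]$ /u01/app/grid/xag/bin/agctl query releaseversion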


Create application VIP

  1. From the first node of the target cluster, as root, create the application VIP:

# $GRID_HOME/bin/appvipcfg create -network=1 -ip=192.168.1.250 -vipname=gg_vip_dr -user=root

2. Start VIP:

# crsctl start resource gg_vip_dr

3. Allow the oracle user to start the VIP:

# crsctl setperm resource gg_vip_dr -u user:oracle:r-x

The application VIP is necessary on the target side only. If you plan to have a dual setup where each cluster can act as a target, then create a VIP on each side.
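To confirm the VIP resource is registered and running, check its status with crsctl:

# $GRID_HOME/bin/crsctl status resource gg_vip_dr -t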

XAG Registration

  1. From the first node, as the oracle user, register the Golden Gate instance with XAG.

On source:

$ /u01/app/grid/xag/bin/agctl add goldengate gg_replicate --gg_home /GG_HOME/home_1 --instance_type source --nodes rac1-hq,rac2-hq --filesystems ora.ggdg.acfsgg.acfs --databases ora.orcl.db --oracle_home /u01/app/oracle/product/12.2.0/dbhome_1

On target:

$ /u01/app/grid/xag/bin/agctl add goldengate gg_replicate --gg_home /GG_HOME/home_1 --instance_type target --nodes rac1-dr,rac2-dr --vip_name gg_vip_dr --filesystems ora.ggdg.acfsgg.acfs --databases ora.orcl.db --oracle_home /u01/app/oracle/product/12.2.0/dbhome_1
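To review what has been registered on either side, agctl can display the configured attributes of the resource:

$ /u01/app/grid/xag/bin/agctl config goldengate gg_replicate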

ArchiveLog mode setup for source database

The steps to convert the database to archivelog mode:

  1. Stop the database if it is running:

$ srvctl stop database -db orcl

2. Do the following on the first node only:

SQL> startup mount;

SQL> alter system set DB_RECOVERY_FILE_DEST_SIZE=10G;

SQL> alter system set db_recovery_file_dest='+FRA';

SQL> alter database archivelog;

SQL> alter database add supplemental log data;

SQL> shutdown immediate;

3. Start the database:

$ srvctl start database -db orcl
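To confirm that archiving and minimal supplemental logging are now enabled, a quick check from any node (run as the oracle user) could be:

$ srvctl status database -db orcl
$ sqlplus -s / as sysdba <<'EOF'
select log_mode, supplemental_log_data_min from v$database;
EOF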

Schema setup on source and target

Set the following parameters on each side:

  1. Set the enable_goldengate_replication parameter to true:

SQL> alter system set enable_goldengate_replication=true;

2. For integrated capture, set the streams_pool_size parameter:

SQL> alter system set streams_pool_size=4G scope=spfile;

3. Create the schemas on the source and target databases:

SQL> create tablespace ggcw datafile '+DATA' size 100m autoextend on next 5m maxsize unlimited;

SQL> create user ggcw identified by ggcw default tablespace ggcw temporary tablespace temp quota unlimited on ggcw;

SQL> grant connect, resource to ggcw;
SQL> grant select any dictionary, select any table to ggcw;
SQL> grant create table to ggcw;
SQL> grant flashback any table to ggcw;
SQL> grant execute on dbms_flashback to ggcw;
SQL> grant execute on utl_file to ggcw;
SQL> grant create any table to ggcw;
SQL> grant insert any table to ggcw;
SQL> grant update any table to ggcw;
SQL> grant delete any table to ggcw;
SQL> grant drop any table to ggcw;

SQL> @/GG_HOME/home_1/marker_setup
Enter Oracle GoldenGate schema name:ggcw

SQL> @/GG_HOME/home_1/ddl_setup
Enter Oracle GoldenGate schema name:ggcw
SQL> @/GG_HOME/home_1/role_setup
Enter Oracle GoldenGate schema name:ggcw
SQL> grant ggs_ggsuser_role to ggcw;
SQL> @/GG_HOME/home_1/ddl_enable
SQL> @/GG_HOME/home_1/ddl_pin ggcw
SQL> @/GG_HOME/home_1/sequence.sql ggcw

 

Set up a basic configuration of OGG

  1. To be able to run the GGSCI command, create the following symbolic links in /GG_HOME/home_1 as the oracle user on both sides:

$ cd /GG_HOME/home_1
$ ln -s /u01/app/oracle/product/12.2.0/dbhome_1/lib/libnnz12.so libnnz12.so
$ ln -s /u01/app/oracle/product/12.2.0/dbhome_1/lib/libclntsh.so.12.1 libclntsh.so.12.1
$ ln -s /u01/app/oracle/product/12.2.0/dbhome_1/lib/libons.so libons.so
$ ln -s /u01/app/oracle/product/12.2.0/dbhome_1/lib/libclntshcore.so.12.1 libclntshcore.so.12.1
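To confirm that the Oracle client libraries now resolve for the ggsci binary, ldd can be used; ideally it should report no "not found" lines:

$ ldd /GG_HOME/home_1/ggsci | grep "not found"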

2. Create the subdirectories from the GGSCI command-line interface on the source and target:

[oracle@rac1-hq ~]$ . oraenv
ORACLE_SID = [orcl1] ?
ORACLE_HOME = [/home/oracle] ? /u01/app/oracle/product/12.2.0/dbhome_1
[oracle@rac1-hq ~]$ /GG_HOME/home_1/ggsci

GGSCI> create subdirs

3. Enable supplemental logging on the source database tables:

GGSCI> DBLOGIN USERID ggcw PASSWORD ggcw

GGSCI> ADD TRANDATA HR.*

4. Edit the global parameters on the source and target and create the checkpoint table:

GGSCI> edit params ./GLOBALS

GGSCHEMA ggcw
CHECKPOINTTABLE ggcw.CKPTAB

GGSCI> ADD CHECKPOINTTABLE

5. Create the manager parameter file on the source and target:

GGSCI > edit params mgr

PORT 7809
AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60
AUTOSTART ER *

6. Create Integrated and Classic extracts.

6.1 Integrated capture:

Make sure you have set the correct value for ORACLE_SID.

GGSCI> DBLOGIN USERID ggcw PASSWORD ggcw

GGSCI> register extract ext1 database

GGSCI> add extract ext1, integrated tranlog, begin now

GGSCI> ADD EXTTRAIL ./dirdat/rt, EXTRACT ext1, MEGABYTES 100

GGSCI> EDIT PARAMS ext1

EXTRACT ext1
USERID ggcw, PASSWORD ggcw
RMTHOST 192.168.1.250, MGRPORT 7809
RMTTRAIL ./dirdat/rt
TABLE hr.emp;

Note: The RMTHOST parameter must reference the application VIP of the target.

6.2 Classic capture:

6.2.1 Apply the patch for bug 27379190 before using classic capture; see Doc ID 2360874.1.

6.2.2 If the archived logs are located on ASM, you need to add the following entries to $ORACLE_HOME/network/admin/tnsnames.ora:

Cluster Node 1:

ASM =
(ADDRESS=(PROTOCOL=BEQ)
(PROGRAM=/u01/app/12.2.0/grid/bin/oracle)
(ARGV0=oracle+ASM1)
(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=BEQ)))')
(ENVS='ORACLE_HOME=/u01/app/12.2.0/grid,ORACLE_SID=+ASM1'))

Cluster Node 2:

ASM =
(ADDRESS=(PROTOCOL=BEQ)
(PROGRAM=/u01/app/12.2.0/grid/bin/oracle)
(ARGV0=oracle+ASM2)
(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=BEQ)))')
(ENVS='ORACLE_HOME=/u01/app/12.2.0/grid,ORACLE_SID=+ASM2'))

6.2.3 Set the correct value for ORACLE_SID and log in to the database using the GGSCI command-line interface:

GGSCI> DBLOGIN USERID ggcw PASSWORD ggcw

6.2.4 Add the extract with the number of threads:

GGSCI> ADD EXTRACT ext2, TRANLOG, THREADS 2, BEGIN NOW

GGSCI> ADD EXTTRAIL ./dirdat/et, EXTRACT ext2, MEGABYTES 100

GGSCI> EDIT PARAMS ext2

EXTRACT ext2
USERID ggcw, PASSWORD ggcw
TRANLOGOPTIONS ASMUSER SYS@ASM, ASMPASSWORD MyPassword2017
RMTHOST 192.168.1.250, MGRPORT 7809
RMTTRAIL ./dirdat/et
TABLE hr.sal;

6.2.5 Create the replicats on the target:

GGSCI> DBLOGIN USERID ggcw, PASSWORD ggcw

GGSCI> ADD REPLICAT rep1, EXTTRAIL ./dirdat/rt

GGSCI> ADD REPLICAT rep2, EXTTRAIL ./dirdat/et

At this point, if the ADD REPLICAT command fails with the error "ERROR: No checkpoint table specified for ADD REPLICAT", simply exit that GGSCI session and start another one before issuing ADD REPLICAT. The ADD REPLICAT command fails if it is issued from the same session in which the GLOBALS file was created.

GGSCI> EDIT PARAMS rep1

REPLICAT rep1
ASSUMETARGETDEFS
USERID ggcw, PASSWORD ggcw
MAP hr.emp, TARGET hr.emp;

GGSCI> EDIT PARAMS rep2

REPLICAT rep2
ASSUMETARGETDEFS
USERID ggcw, PASSWORD ggcw
MAP hr.sal, TARGET hr.sal;

7. Start Golden Gate on the source and target clusters as the oracle user:

$ /u01/app/grid/xag/bin/agctl start goldengate gg_replicate
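Finally, the deployment can be checked from each cluster: agctl shows on which node the GoldenGate resource is running, and "info all" inside GGSCI shows the state of the manager, extract and replicat processes:

$ /u01/app/grid/xag/bin/agctl status goldengate gg_replicate
$ /GG_HOME/home_1/ggsci

GGSCI> info all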