ORA-15477: cannot communicate with the volume driver (DBD ERROR: OCIStmtExecute)

Problem:

I had GI Standalone installation, which I’ve deconfigured and configured one node RAC which was successful. Then I’ve tried to create ACFS volume which failed with ORA-15477:

[root@host1 dbs]# asmcmd volcreate -G OGG -s 10G ACFSGG
ORA-15032: not all alterations performed
ORA-15477: cannot communicate with the volume driver (DBD ERROR: OCIStmtExecute)

Reason:

It seems the ACFS/ADMV modules are not loaded:

[root@host1 dbs]# lsmod | grep oracle
oracleacfs           5921415  0
oracleadvm           1236257  0
oracleoks             750688  2 oracleacfs,oracleadvm

Solution:

First of all, I will share two possible solutions, that helped others but not me and one possible solution (3rd) that helped me:

  1. Start module manualy and make sure it’s enabled:
# acfsload start
# acfsload enable

Check if modules is loaded using lsmod | grep oracle and retry volume creation.

2. Reinstall acfs/admv modules manually:

[root@host1 dbs]# acfsroot install
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9314: Removing previous ADVM/ACFS installation.
depmod: ERROR: fstatat(6, uds.ko): No such file or directory
depmod: ERROR: fstatat(6, kvdo.ko): No such file or directory
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
depmod: ERROR: fstatat(6, uds.ko): No such file or directory
depmod: ERROR: fstatat(6, kvdo.ko): No such file or directory
ACFS-9390: The command 'echo '/lib/modules/3.10.0-862.el7.x86_64/extra/usm/oracleadvm.ko
/lib/modules/3.10.0-862.el7.x86_64/extra/usm/oracleoks.ko
/lib/modules/3.10.0-862.el7.x86_64/extra/usm/oracleacfs.ko
' | /sbin/weak-modules --no-initramfs --add-modules 3.10.0-1127.18.2.el7.x86_64 2>&1 |' returned unexpected output that may be important for system configuration:
depmod: ERROR: fstatat(6, kvdo.ko): No such file or directory

depmod: ERROR: fstatat(6, uds.ko): No such file or directory

depmod: ERROR: fstatat(6, uds.ko): No such file or directory

depmod: ERROR: fstatat(6, kvdo.ko): No such file or directory

ACFS-9154: Loading 'oracleoks.ko' driver.
ACFS-9154: Loading 'oracleadvm.ko' driver.
ACFS-9154: Loading 'oracleacfs.ko' driver.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
ACFS-9309: ADVM/ACFS installation correctness verified.

Retry volume creation.

If none of the above helps, do the 3rd solution (which is not available on the internet, it was my decision):

3. Rebuild initramfs

[root@host1 ~]# cp -p /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
[root@host1 ~]# dracut -f
[root@host1 ~]# reboot

After restart, you should be able to create volume.

ACFS-05913: unable to contact the standby node stbyrac1

Problem:

I was trying to setup ACFS replication, where one of the steps is to validate keys using acfsutil, which failed with ACFS-05913 error:

[root@rac1 .ssh]# acfsutil repl info -c -u oggrepl stbyrac1 stbyrac2 /GG
acfsutil repl info: ACFS-05913: unable to contact the standby node stbyrac1
acfsutil repl info: ACFS-05913: unable to contact the standby node stbyrac2

Cause: 

An attempt to use the ping utility to contact a standby node failed.

Solution:

Enable ICMP traffic between these nodes and retry validation:

[root@rac1 .ssh]# acfsutil repl info -c -u oggrepl stbyrac1 stbyrac2 /GG
A valid 'ssh' connection was detected for standby node stbyrac1 as user oggrepl.
A valid 'ssh' connection was detected for standby node stbyrac2 as user oggrepl.

srvctl start filesystem hangs

The title of this post is general, there can be a lot of reasons why srvctl start filesystem hangs. The aim of this blog post is to share one of the reasons only.

Problem:

I’ve created ACFS volume and added it to srvctl:

$ asmcmd volcreate -G OGG -s 10G ACFSGG
# srvctl add filesystem -device /dev/asm/acfsgg-11 -path /GG_HOME -volume acfsgg -diskgroup OGG -user oracle -fstype ACFS

then tried to start the filesystem using:

# srvctl start filesystem -device /dev/asm/acfsgg-11

Which hanged.

Troubleshooting:

I’ve checked logs under trace folder under GI base, but could not find any clue. Even worse, stopping filesystem was also hanging.

But let’s stop here, the file that should have been checked was really there, but I missed it and checked wrong files. The file name that shows the necessary error is mount_<process id>.trc and is definitely located under trace folder. So instead of manually mounting filesystem to see the error, you can just open that mount_<process id>.trc and you will see the reason there.

Then I tried manual mounting of the filesystem, without srvctl:

[root@stbyrac1 trace]# /bin/mount -t acfs  /dev/asm/acfsgg-11 /GG_HOME
mount.acfs: ACFS-03037: not an ACFS file system

saw the error, which explained what was happening. My volume was not formatted with acfs filesystem. Somehow I missed that step on the standby cluster, so just a human error, but srvctl at least should have said that instead of hanging and placing info in trace file.

Solution:

Format ACFS volume:

[root@stbyrac1 trace]# mkfs -t acfs /dev/asm/acfsgg-11
mkfs.acfs: version                   = 19.0.0.0.0
mkfs.acfs: on-disk version           = 46.0
mkfs.acfs: volume                    = /dev/asm/acfsgg-11
mkfs.acfs: volume size               = 10737418240  (  10.00 GB )
mkfs.acfs: Format complete.

Because the start and stop operations are hanged, you need to mount filesystem on all database nodes manually:

[root@stbyrac1 ~]# /bin/mount -t acfs  /dev/asm/acfsgg-11 /GG_HOME
[root@stbyrac1 ~]# /bin/mount -t acfs  /dev/asm/acfsgg-11 /GG_HOME

Now try to stop and start filesystem, to make sure srvctl is able to do it’s job without any manual interaction:

[root@stbyrac1 ~]# srvctl stop filesystem -device /dev/asm/acfsgg-11
[root@stbyrac1 ~]# srvctl start filesystem -device /dev/asm/acfsgg-11

Resize ASM disks in GCP (FG enabled cluster)

Increasing disks in GCP is an online procedure and you don’t have to stop the VM.

1. If the node is a database node, stop all local database instances running on the node.

2. Go to the Disks page -> click the name of the disk that you want to resize -> click Edit -> enter the new size in Size field -> Save.

Please note that all data disks (not quorum disk) must be increased under the same diskgroup, otherwise ASM will not let you to have different sized disks.

Choose another data disks and repeat the same steps.

3. Run the following on database nodes via root user:

# for i in /sys/block/*/device/rescan; do echo 1 > $i; done

4. Check new disk sizes:

If it is Fg cluster, Phys_GiB column must show increased size:

[root@rac1 ~]# flashgrid-dg show -G DATA
...
------------------------------------------------------------
FailGroup ASM_Disk_Name Drive Phys_GiB  ASM_GiB  Status
------------------------------------------------------------
RAC1 RAC1$SHARED_2 /dev/flashgrid/rac1.shared-2 15 10 ONLINE
RAC2 RAC2$SHARED_2 /dev/flashgrid/rac2.shared-2 15 10 ONLINE
RACQ RACQ$SHARED_3 /dev/flashgrid/racq.shared-3 1  1  ONLINE
------------------------------------------------------------

In case it is a normal cluster, OS_MB must show increased size:

# su - grid
$ sqlplus / as sysasm
SQL> select TOTAL_MB/1024,OS_MB/1024 from v$asm_disk where GROUP_NUMBER=2;

TOTAL_MB/1024 OS_MB/1024
------------- ----------
	   10	      15
	   10	      15
	    1	       1

5. Connect to the ASM from any database node and run:

# su - grid
$ sqlplus / as sysasm
SQL> ALTER DISKGROUP DATA RESIZE ALL; 

The above command will resize all disks in the specified diskgroup based on their size returned by OS.

6. Check new sizes:

Fg cluster:

[root@rac1 ~]# flashgrid-dg show -G DATA
...
------------------------------------------------------------
RAC1 RAC1$SHARED_2 /dev/flashgrid/rac1.shared-2 15 15 ONLINE
RAC2 RAC2$SHARED_2 /dev/flashgrid/rac2.shared-2 15 15 ONLINE
RACQ RACQ$SHARED_3 /dev/flashgrid/racq.shared-3 1  1  ONLINE
------------------------------------------------------------

Normal cluster:

SQL> select TOTAL_MB/1024,OS_MB/1024 from v$asm_disk where GROUP_NUMBER=2 ;

TOTAL_MB/1024 OS_MB/1024
------------- ----------
	   15	      15
	   15	      15
	    1	       1

Phys_GiB and ASM_GiB should have the same increased size, which means disks are resized and you can use extended space.

Resize ASM disks in Azure (FG enabled cluster)

1. If the node is a database node, stop all local database instances running on the node.

2. Stop database VM from Azure console. In azure you are not able to resize disks while VM is running, so we need to stop it first.

3. Increase all database disks belonging to the same diskgroup to the desired size. Make sure disks in the same diskgroup have the same sizes.

To resize disk, click VM -> Disks -> choose data disk (in my case 10GB disk is a DATA disk)

After clicking the above disk, you will be redirected to the following screen, choose Configuration -> enter desired disk size (in my case I’ve changed from 10 to 15) -> Save

4. Start the database node.

5. Repeat 1-4 steps for the next database nodes (no need to increase disks for quorum, it is only necessary for the database nodes)

6. Check new disk sizes:

If it is Fg cluster, Phys_GiB column must show increased size:

[root@rac1 ~]# flashgrid-dg show -G DATA
...
------------------------------------------------------------
FailGroup ASM_Disk_Name Drive Phys_GiB  ASM_GiB  Status
------------------------------------------------------------
RAC1    RAC1$LUN2     /dev/flashgrid/rac1.lun2 15  10 ONLINE
RAC2    RAC2$LUN2     /dev/flashgrid/rac2.lun2 15  10 ONLINE
RACQ    RACQ$LUN3     /dev/flashgrid/racq.lun3  1  1  ONLINE
------------------------------------------------------------

In case it is a normal cluster, OS_MB must show increased size:

# su - grid
$ sqlplus / as sysasm
SQL> select TOTAL_MB/1024,OS_MB/1024 from v$asm_disk where GROUP_NUMBER=2 ;

TOTAL_MB/1024 OS_MB/1024
------------- ----------
	   10	      15
	   10	      15
	    1	       1

7. Connect to the ASM from any database node and run:

# su - grid
$ sqlplus / as sysasm
SQL> ALTER DISKGROUP DATA RESIZE ALL; 

The above command will resize all disks in the specified diskgroup based on their size returned by OS.

8. Check new sizes:

Fg cluster:

[root@rac1 ~]# flashgrid-dg show -G DATA
...
------------------------------------------------------------
FailGroup ASM_Disk_Name Drive Phys_GiB  ASM_GiB  Status
------------------------------------------------------------
RAC1    RAC1$LUN2     /dev/flashgrid/rac1.lun2 15  15 ONLINE
RAC2    RAC2$LUN2     /dev/flashgrid/rac2.lun2 15  15 ONLINE
RACQ    RACQ$LUN3     /dev/flashgrid/racq.lun3  1  1  ONLINE
------------------------------------------------------------

Normal cluster:

SQL> select TOTAL_MB/1024,OS_MB/1024 from v$asm_disk where GROUP_NUMBER=2 ;

TOTAL_MB/1024 OS_MB/1024
------------- ----------
	   15	      15
	   15	      15
	    1	       1

Phys_GiB and ASM_GiB should have the same increased size, which means disks are resized and you can use extended space.

‘udev’ rules continuously being reloaded resulted in ASM nvme disks going offline

Environment:

Linux Server release 7.2
kernel-3.10.0-514.26.2.el7.x86_64

Problem:

When Oracle processes are opening the device for writing and then closing it, this synthesizes a change event. And udev rules having  ACTION=="add|change" gets reloaded. This behavior causes ASM nvme disks to go offline:

Thu Jul 09 16:33:16 2020
WARNING: Disk 18 (rac1$disk1) in group 2 mode 0x7f is now being offlined

Fri Jul 10 10:04:34 2020
WARNING: Disk 19 (rac1$disk5) in group 2 mode 0x7f is now being offlined

Fri Jul 10 13:45:45 2020
WARNING: Disk 15 (rac1$disk8) in group 2 mode 0x7f is now being offlined

Solution:

To suppress the false positive change events disable the inotify watch for devices used for Oracle ASM using following steps:

  1.  Create /etc/udev/rules.d/96-nvme-nowatch.rules file with just one line in it:
ACTION=="add|change", KERNEL=="nvme*", OPTIONS:="nowatch"

2. After creating the file please run the following to activate the rule:

# udevadm control --reload-rules
# udevadm trigger --type=devices --action=change

The above command will reload the complete udev configuration and will trigger all the udev rules. On a busy production system this could disrupt ongoing operations, applications running on the server. Please use the above command during a scheduled maintenance window only.

Source: https://access.redhat.com/solutions/1465913 + our experience with customers.

Resize ASM disks in AWS (FG enabled cluster)

  1. Connect to AWS console https://console.aws.amazon.com
  2. On the left side -> under the section ELASTIC BLOCK STORE -> choose Volumes
  3. Choose necessary disk -> click Actions button -> choose Modify Volume -> change Size
    Please note that all data disks (not quorum disk) must be increased under the same diskgroup, otherwise ASM will not let you to have different sized disks.

Choose another data disks and repeat the same steps.

4. Run the following on database nodes via root user:

# for i in /sys/block/*/device/rescan; do echo 1 > $i; done

5. Check that disks have correct sizes:

# flashgrid-node

6. Connect to the ASM instance from any database node and run:

[grid@rac1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 19.0.0.0.0 - Production on Fri Aug 23 10:17:50 2019
Version 19.4.0.0.0
Copyright (c) 1982, 2019, Oracle.  All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.4.0.0.0

SQL> alter diskgroup GRID resize all; 
Diskgroup altered.

UDEV rules for configuring ASM disks

Problem:

During my previous installations I used the following udev rule on multipath devices:

KERNEL=="dm-[0-9]*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="360050768028200a9a40000000000001c", NAME="oracleasm/asm-disk1", OWNER="oracle", GROUP="asmadmin", MODE="0660"

So to identify the exact disk I used PROGRAM option. The above script looks through `/dev/dm-*` devices and if any of them satisfy the condition, for example:

# scsci_id -gud /dev/dm-3
360050768028200a9a40000000000001c 

then device name will be changed to /dev/oracleasm/asm-disk1, owner:group to grid:asmadmin and permission to 0660

But on my new servers same udev rule was not working anymore. (Of course, it needs more investigation, but our time is really valuable and never enough and if we know another solution that works and is acceptable- let’s just use it)

Solution:

I used udevadm command to identify other properties of these devices and wrote new udev rule (to see all properties, just remove grep):

# udevadm info --query=property --name /dev/mapper/asm1 | grep DM_UUID
DM_UUID=mpath-360050768028200a9a40000000000001c

New udev rule looks like this:

# cat /etc/udev/rules.d/99-oracle-asmdevices.rules
ENV{DM_UUID}=="mpath-360050768028200a9a40000000000001c",  SUBSYSTEM=="block", NAME="oracleasm/asm-disk1", OWNER="grid", GROUP="asmadmin", MODE="0660"

Trigger udev rules:

# udevadm trigger

Verify that name, owner, group and permissions are changed:

# ll /dev/oracleasm/
total 0
brw-rw---- 1 grid asmadmin 253, 3 Jul 17 17:33 asm-disk1

ASM Diskstring default values

In case you have trouble getting the newly added disks to your ASM instance check for the ASM_DISKSTRING init paramter. If ASM_DISKSTRING is NOT SET then the following default is used

Default ASM_DISKSTRING per Operating System

Operating System Default            Search String
=======================================
Solaris (32/64 bit)                        /dev/rdsk/*
Windows NT/XP                          .orcldisk*
Linux (32/64 bit)                          /dev/raw/*

LINUX (ASMLIB)                         ORCL:*
LINUX (ASMLIB)                        /dev/oracleasm/disks/*

HPUX                                             /dev/rdsk/*
HP-UX(Tru 64)                            /dev/rdisk/*
AIX                                                 /dev/*

IF ASM_DISKSTRING is SET  then we should verify that the setting includes the disks that are needed to be seen by ASM.

 

Source from :  https://abchatur.wordpress.com/2011/09/01/asm-diskstring-default-values/

 

Add/Drop ASM disks to DISKGROUP on RAC(or Standalone)

Note: The steps are described for RAC, but you can easily guess what are the steps for the standalone database.

1. First of all find the disk or partition name, that should be added to the ASM.

fdisk -l

My disk partition name is /dev/sdi1.

2. Assign the disk to ORACLEASM.

–On node1

/etc/init.d/oracleasm createdisk DISK7 /dev/sdi1

3. Scan disks in ALL NODES and list them to check if is presented.

–On node1

/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks

–On node2

/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks

4. Change the environment to the grid infrastructure, by setting ORACLE_SID to +ASM and so on :

$ . oraenv
ORACLE_SID = [media1] ? +ASM1
The Oracle base for ORACLE_HOME=/u01/app/11.2.0/grid is /u01/app/oracle

# Connect as an SYSASM

sqlplus / as sysasm

Note: If you don’t remember the password for the sysasm user see How to reset SYSASM password.

# Find the diskgroup name

SQL> select name from v$asm_diskgroup;

NAME
——————————
DATA01

# Increase power limit, if you want, to complete rebalance operation in a short time.

SQL>  alter system set asm_power_limit=10

# Indicate disks location by the parameter asm_diskstring

SQL> alter system set asm_diskstring=’ORCL:DISK*’

SQL> alter diskgroup DATA01 add disk ‘ORCL:DISK7’;

It will do the rebalance automatically.

# To drop the disk , do the following:

SQL >  alter diskgroup DATA01 drop disk DISK7;

It will rebalance first and then drops the disk automatically.

You can see the current operation in v$asm_operation view.

Note: Until the view v$asm_operation contains a record you are able to undrop the disks by the following way:

SQL> alter diskgroup DATA01 undrop disks;

If the operation is already completed , you are not able to undrop the disk . But you can re-add the disk , if you want.

That is all.