TCPS configuration: ora.LISTENER.lsnr is in INTERMEDIATE state

Problem

After configuring TCPS endpoint in listener configuration using the following way:

In srvctl:

[grid@rac1 ~]$ srvctl modify listener -p "TCP:1522/TCPS:1524"

In listener.ora file:

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac1.example.com)(PORT = 1522))
      (ADDRESS = (PROTOCOL = IPC)(KEY = LISTENER))
      (ADDRESS = (PROTOCOL = TCPS)(HOST =rac1.example.com)(PORT = 1524))
    ))

So from the above, we see that I’ve configured as static endpoint registration as well as dynamic.

Restarted listener and saw that ora.LISTENER.lsnr resource left in the INTERMEDIATE state:

# srvctl stop listener;srvctl start listener

# crsctl status res -t|head
...
ora.LISTENER.lsnr 
ONLINE INTERMEDIATE rac1 Not All Endpoints Registered,STABLE
ONLINE INTERMEDIATE rac2 Not All Endpoints Registered,STABLE

Troubelshooting

After checking listener.log file, found the following messages:

Dynamic address is already listened on (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=rac1.example.com)(PORT=1524)))
Dynamic address is already listened on (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac1.example.com)(PORT=1522)))
Dynamic address is already listened on (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac1.example.com)(PORT=1522)))

So, due to the static registration, the dynamic failed and Agent thought it could not register endpoints, which left resource in the INTERMEDIATE state (which should not be normal I think) 

I tried to disable dynamic registration by setting ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER to OFF in listener.ora file, but after modifying and restarting listener using srvctl Oracle Agent changed the value back to ON. I could not find a way to disable it so far.

The second option was to remove static entries from listener.ora file and leave only dynamic registration, but after that, the important endpoint on TCPS/1524 was not registered:

(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=192.168.1.11)(PORT=1524)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.1)(PORT=1522)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.11)(PORT=1522)))

The endpoint that I want is 192.168.1.1:1524 and not 192.168.1.11:1524

Workaround

Configured dynamic registration in srvctl for TCP/1522 port only:

[grid@rac1 ~]$ srvctl modify listener -p "TCP:1522"

Configured static registration in listener.ora file for TCPS/1524:

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = LISTENER))
      (ADDRESS = (PROTOCOL = TCPS)(HOST =rac1.example.com)(PORT = 1524))
    )
  )

After restarting listener, the status became ONLINE:

ora.LISTENER.lsnr
              ONLINE  ONLINE   rac1   STABLE
              ONLINE  ONLINE   rac2   STABLE

Another workaround that my colleage found is the following:

Set ENDPOINTS to null in crsctl (in srvctl it’s not possible):

# crsctl modify resource ora.LISTENER.lsnr -attr "ENDPOINTS=" –unsupported

And register all endpoints statically in listener.ora. Using that way, no dynamic registration is triggered, only static endpoints will be visible.

Please comment bellow, if you have a better way. Thank you!

Listener on TCPS, TNS-12557: TNS:protocol adapter not loadable

Error

Listener configured on TCPS protocol cannot be started from 11g SE home:

[oracle@skybase ~]$ lsnrctl start LISTENER_TCPS
...
TNSLSNR for Linux: Version 11.2.0.4.0 - Production
System parameter file is /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Log messages written to /u01/app/oracle/diag/tnslsnr/skybase/listener_tcps/alert/log.xml
Error listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(HOST=skybase.example.com)(PORT=1522)))
TNS-12557: TNS:protocol adapter not loadable
 TNS-12560: TNS:protocol adapter error
  TNS-00527: Protocol Adapter not loadable

Reason

TCPS Support for Oracle SE not enabled.

Solution

Enable TCPS support using the following steps:

1. In $ORACLE_HOME/lib, backup libntcps11.a:

[oracle@skybase ~]$ cd $ORACLE_HOME/lib

[oracle@skybase lib]$ cp libntcps11.a libntcps11.a.backup

2. Replace libntcps11.a with libntcps11_ee.a.dbl

[oracle@skybase lib]$ cp libntcps11_ee.a.dbl libntcps11.a

[oracle@skybase lib]$ ls -al libntcps*

-rw-r--r-- 1 oracle oinstall 306314 Aug 14 16:31 libntcps11.a
-rw-r--r-- 1 oracle oinstall 296986 Aug 14 16:30 libntcps11.a.backup
-rw-r--r-- 1 oracle oinstall 306314 Jul 15  2013 libntcps11_ee.a.dbl

3. Stop all running processes from Oracle home:

$ srvctl stop database -d orcl

4. Relink

[oracle@skybase lib]$ pwd
/u01/app/oracle/product/11.2.0/dbhome_1/lib

[oracle@skybase lib]$ relink all
writing relink log to: /u01/app/oracle/product/11.2.0/dbhome_1/install/relink.log

5. Start all previously stopped services and try to start listener again:

[oracle@skybase lib]$ srvctl start database -d orcl

[oracle@skybase lib]$ lsnrctl start LISTENER_TCPS
...
TNSLSNR for Linux: Version 11.2.0.4.0 - Production
System parameter file is /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Log messages written to /u01/app/oracle/diag/tnslsnr/skybase/listener_tcps/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=skybase.example.com)(PORT=1522)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(HOST=skybase.example.com)(PORT=1522)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_TCPS
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                14-AUG-2020 16:35:11
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/skybase/listener_tcps/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=skybase.example.com)(PORT=1522)))
The listener supports no services
The command completed successfully

ORA-15477: cannot communicate with the volume driver (DBD ERROR: OCIStmtExecute)

Problem:

I had GI Standalone installation, which I’ve deconfigured and configured one node RAC which was successful. Then I’ve tried to create ACFS volume which failed with ORA-15477:

[root@host1 dbs]# asmcmd volcreate -G OGG -s 10G ACFSGG
ORA-15032: not all alterations performed
ORA-15477: cannot communicate with the volume driver (DBD ERROR: OCIStmtExecute)

Reason:

It seems the ACFS/ADMV modules are not loaded:

[root@host1 dbs]# lsmod | grep oracle
oracleacfs           5921415  0
oracleadvm           1236257  0
oracleoks             750688  2 oracleacfs,oracleadvm

Solution:

First of all, I will share two possible solutions, that helped others but not me and one possible solution (3rd) that helped me:

  1. Start module manualy and make sure it’s enabled:
# acfsload start
# acfsload enable

Check if modules is loaded using lsmod | grep oracle and retry volume creation.

2. Reinstall acfs/admv modules manually:

[root@host1 dbs]# acfsroot install
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9314: Removing previous ADVM/ACFS installation.
depmod: ERROR: fstatat(6, uds.ko): No such file or directory
depmod: ERROR: fstatat(6, kvdo.ko): No such file or directory
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
depmod: ERROR: fstatat(6, uds.ko): No such file or directory
depmod: ERROR: fstatat(6, kvdo.ko): No such file or directory
ACFS-9390: The command 'echo '/lib/modules/3.10.0-862.el7.x86_64/extra/usm/oracleadvm.ko
/lib/modules/3.10.0-862.el7.x86_64/extra/usm/oracleoks.ko
/lib/modules/3.10.0-862.el7.x86_64/extra/usm/oracleacfs.ko
' | /sbin/weak-modules --no-initramfs --add-modules 3.10.0-1127.18.2.el7.x86_64 2>&1 |' returned unexpected output that may be important for system configuration:
depmod: ERROR: fstatat(6, kvdo.ko): No such file or directory

depmod: ERROR: fstatat(6, uds.ko): No such file or directory

depmod: ERROR: fstatat(6, uds.ko): No such file or directory

depmod: ERROR: fstatat(6, kvdo.ko): No such file or directory

ACFS-9154: Loading 'oracleoks.ko' driver.
ACFS-9154: Loading 'oracleadvm.ko' driver.
ACFS-9154: Loading 'oracleacfs.ko' driver.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
ACFS-9309: ADVM/ACFS installation correctness verified.

Retry volume creation.

If none of the above helps, do the 3rd solution (which is not available on the internet, it was my decision):

3. Rebuild initramfs

[root@host1 ~]# cp -p /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
[root@host1 ~]# dracut -f
[root@host1 ~]# reboot

After restart, you should be able to create volume.

ACFS-05913: unable to contact the standby node stbyrac1

Problem:

I was trying to setup ACFS replication, where one of the steps is to validate keys using acfsutil, which failed with ACFS-05913 error:

[root@rac1 .ssh]# acfsutil repl info -c -u oggrepl stbyrac1 stbyrac2 /GG
acfsutil repl info: ACFS-05913: unable to contact the standby node stbyrac1
acfsutil repl info: ACFS-05913: unable to contact the standby node stbyrac2

Cause: 

An attempt to use the ping utility to contact a standby node failed.

Solution:

Enable ICMP traffic between these nodes and retry validation:

[root@rac1 .ssh]# acfsutil repl info -c -u oggrepl stbyrac1 stbyrac2 /GG
A valid 'ssh' connection was detected for standby node stbyrac1 as user oggrepl.
A valid 'ssh' connection was detected for standby node stbyrac2 as user oggrepl.

srvctl start filesystem hangs

The title of this post is general, there can be a lot of reasons why srvctl start filesystem hangs. The aim of this blog post is to share one of the reasons only.

Problem:

I’ve created ACFS volume and added it to srvctl:

$ asmcmd volcreate -G OGG -s 10G ACFSGG
# srvctl add filesystem -device /dev/asm/acfsgg-11 -path /GG_HOME -volume acfsgg -diskgroup OGG -user oracle -fstype ACFS

then tried to start the filesystem using:

# srvctl start filesystem -device /dev/asm/acfsgg-11

Which hanged.

Troubleshooting:

I’ve checked logs under trace folder under GI base, but could not find any clue. Even worse, stopping filesystem was also hanging.

But let’s stop here, the file that should have been checked was really there, but I missed it and checked wrong files. The file name that shows the necessary error is mount_<process id>.trc and is definitely located under trace folder. So instead of manually mounting filesystem to see the error, you can just open that mount_<process id>.trc and you will see the reason there.

Then I tried manual mounting of the filesystem, without srvctl:

[root@stbyrac1 trace]# /bin/mount -t acfs  /dev/asm/acfsgg-11 /GG_HOME
mount.acfs: ACFS-03037: not an ACFS file system

saw the error, which explained what was happening. My volume was not formatted with acfs filesystem. Somehow I missed that step on the standby cluster, so just a human error, but srvctl at least should have said that instead of hanging and placing info in trace file.

Solution:

Format ACFS volume:

[root@stbyrac1 trace]# mkfs -t acfs /dev/asm/acfsgg-11
mkfs.acfs: version                   = 19.0.0.0.0
mkfs.acfs: on-disk version           = 46.0
mkfs.acfs: volume                    = /dev/asm/acfsgg-11
mkfs.acfs: volume size               = 10737418240  (  10.00 GB )
mkfs.acfs: Format complete.

Because the start and stop operations are hanged, you need to mount filesystem on all database nodes manually:

[root@stbyrac1 ~]# /bin/mount -t acfs  /dev/asm/acfsgg-11 /GG_HOME
[root@stbyrac1 ~]# /bin/mount -t acfs  /dev/asm/acfsgg-11 /GG_HOME

Now try to stop and start filesystem, to make sure srvctl is able to do it’s job without any manual interaction:

[root@stbyrac1 ~]# srvctl stop filesystem -device /dev/asm/acfsgg-11
[root@stbyrac1 ~]# srvctl start filesystem -device /dev/asm/acfsgg-11

How to identify OS is Oracle Linux or RHEL?

There are several ways to identify that, I will suggest one of them using rpm -qf, that finds out what package a file belongs to:

Oracle Linux:

#  rpm -qf /etc/redhat-release
oraclelinux-release-7.8-1.0.7.el7.x86_64

RHEL:

# rpm -qf /etc/redhat-release
redhat-release-server-7.8-2.el7.x86_64

Identify whether database is RAC or SI using crsctl

Environment:

GI – configured as a cluster
orclsingle – configured as a Single Instance
orclrac – configured as RAC
orclone – configured as RACOneNode

Q1: How to identify whether database is RAC or SI?

A1: crsctl stat res shows parameter CLUSTER_DATABASE. In case of RAC or RACOneNode the value is TRUE otherwise FALSE

# crsctl stat res ora.orclrac.db -p|grep CLUSTER_DATABASE
CLUSTER_DATABASE=true

# crsctl stat res ora.orclone.db -p|grep CLUSTER_DATABASE
CLUSTER_DATABASE=true

# crsctl stat res ora.orclsingle.db -p|grep CLUSTER_DATABASE
CLUSTER_DATABASE=false

Please note that the same can be identified from sqlplus by selecting CLUSTER_DATABASE initialization parameter.

Q2: How to identify whether my database is RAC or RACOneNode?

A2: srvctl config database shows Type, that can have the following values SINGLE, RAC, or RACOneNode.

$ srvctl config database -db orclrac|grep Type
Type: RAC

$ srvctl config database -db orclone|grep Type
Type: RACOneNode

$ srvctl config database -db orclsingle|grep Type
Type: SINGLE

Resize ASM disks in GCP (FG enabled cluster)

Increasing disks in GCP is an online procedure and you don’t have to stop the VM.

1. If the node is a database node, stop all local database instances running on the node.

2. Go to the Disks page -> click the name of the disk that you want to resize -> click Edit -> enter the new size in Size field -> Save.

Please note that all data disks (not quorum disk) must be increased under the same diskgroup, otherwise ASM will not let you to have different sized disks.

Choose another data disks and repeat the same steps.

3. Run the following on database nodes via root user:

# for i in /sys/block/*/device/rescan; do echo 1 > $i; done

4. Check new disk sizes:

If it is Fg cluster, Phys_GiB column must show increased size:

[root@rac1 ~]# flashgrid-dg show -G DATA
...
------------------------------------------------------------
FailGroup ASM_Disk_Name Drive Phys_GiB  ASM_GiB  Status
------------------------------------------------------------
RAC1 RAC1$SHARED_2 /dev/flashgrid/rac1.shared-2 15 10 ONLINE
RAC2 RAC2$SHARED_2 /dev/flashgrid/rac2.shared-2 15 10 ONLINE
RACQ RACQ$SHARED_3 /dev/flashgrid/racq.shared-3 1  1  ONLINE
------------------------------------------------------------

In case it is a normal cluster, OS_MB must show increased size:

# su - grid
$ sqlplus / as sysasm
SQL> select TOTAL_MB/1024,OS_MB/1024 from v$asm_disk where GROUP_NUMBER=2;

TOTAL_MB/1024 OS_MB/1024
------------- ----------
	   10	      15
	   10	      15
	    1	       1

5. Connect to the ASM from any database node and run:

# su - grid
$ sqlplus / as sysasm
SQL> ALTER DISKGROUP DATA RESIZE ALL; 

The above command will resize all disks in the specified diskgroup based on their size returned by OS.

6. Check new sizes:

Fg cluster:

[root@rac1 ~]# flashgrid-dg show -G DATA
...
------------------------------------------------------------
RAC1 RAC1$SHARED_2 /dev/flashgrid/rac1.shared-2 15 15 ONLINE
RAC2 RAC2$SHARED_2 /dev/flashgrid/rac2.shared-2 15 15 ONLINE
RACQ RACQ$SHARED_3 /dev/flashgrid/racq.shared-3 1  1  ONLINE
------------------------------------------------------------

Normal cluster:

SQL> select TOTAL_MB/1024,OS_MB/1024 from v$asm_disk where GROUP_NUMBER=2 ;

TOTAL_MB/1024 OS_MB/1024
------------- ----------
	   15	      15
	   15	      15
	    1	       1

Phys_GiB and ASM_GiB should have the same increased size, which means disks are resized and you can use extended space.

Resize ASM disks in Azure (FG enabled cluster)

1. If the node is a database node, stop all local database instances running on the node.

2. Stop database VM from Azure console. In azure you are not able to resize disks while VM is running, so we need to stop it first.

3. Increase all database disks belonging to the same diskgroup to the desired size. Make sure disks in the same diskgroup have the same sizes.

To resize disk, click VM -> Disks -> choose data disk (in my case 10GB disk is a DATA disk)

After clicking the above disk, you will be redirected to the following screen, choose Configuration -> enter desired disk size (in my case I’ve changed from 10 to 15) -> Save

4. Start the database node.

5. Repeat 1-4 steps for the next database nodes (no need to increase disks for quorum, it is only necessary for the database nodes)

6. Check new disk sizes:

If it is Fg cluster, Phys_GiB column must show increased size:

[root@rac1 ~]# flashgrid-dg show -G DATA
...
------------------------------------------------------------
FailGroup ASM_Disk_Name Drive Phys_GiB  ASM_GiB  Status
------------------------------------------------------------
RAC1    RAC1$LUN2     /dev/flashgrid/rac1.lun2 15  10 ONLINE
RAC2    RAC2$LUN2     /dev/flashgrid/rac2.lun2 15  10 ONLINE
RACQ    RACQ$LUN3     /dev/flashgrid/racq.lun3  1  1  ONLINE
------------------------------------------------------------

In case it is a normal cluster, OS_MB must show increased size:

# su - grid
$ sqlplus / as sysasm
SQL> select TOTAL_MB/1024,OS_MB/1024 from v$asm_disk where GROUP_NUMBER=2 ;

TOTAL_MB/1024 OS_MB/1024
------------- ----------
	   10	      15
	   10	      15
	    1	       1

7. Connect to the ASM from any database node and run:

# su - grid
$ sqlplus / as sysasm
SQL> ALTER DISKGROUP DATA RESIZE ALL; 

The above command will resize all disks in the specified diskgroup based on their size returned by OS.

8. Check new sizes:

Fg cluster:

[root@rac1 ~]# flashgrid-dg show -G DATA
...
------------------------------------------------------------
FailGroup ASM_Disk_Name Drive Phys_GiB  ASM_GiB  Status
------------------------------------------------------------
RAC1    RAC1$LUN2     /dev/flashgrid/rac1.lun2 15  15 ONLINE
RAC2    RAC2$LUN2     /dev/flashgrid/rac2.lun2 15  15 ONLINE
RACQ    RACQ$LUN3     /dev/flashgrid/racq.lun3  1  1  ONLINE
------------------------------------------------------------

Normal cluster:

SQL> select TOTAL_MB/1024,OS_MB/1024 from v$asm_disk where GROUP_NUMBER=2 ;

TOTAL_MB/1024 OS_MB/1024
------------- ----------
	   15	      15
	   15	      15
	    1	       1

Phys_GiB and ASM_GiB should have the same increased size, which means disks are resized and you can use extended space.

‘udev’ rules continuously being reloaded resulted in ASM nvme disks going offline

Environment:

Linux Server release 7.2
kernel-3.10.0-514.26.2.el7.x86_64

Problem:

When Oracle processes are opening the device for writing and then closing it, this synthesizes a change event. And udev rules having  ACTION=="add|change" gets reloaded. This behavior causes ASM nvme disks to go offline:

Thu Jul 09 16:33:16 2020
WARNING: Disk 18 (rac1$disk1) in group 2 mode 0x7f is now being offlined

Fri Jul 10 10:04:34 2020
WARNING: Disk 19 (rac1$disk5) in group 2 mode 0x7f is now being offlined

Fri Jul 10 13:45:45 2020
WARNING: Disk 15 (rac1$disk8) in group 2 mode 0x7f is now being offlined

Solution:

To suppress the false positive change events disable the inotify watch for devices used for Oracle ASM using following steps:

  1.  Create /etc/udev/rules.d/96-nvme-nowatch.rules file with just one line in it:
ACTION=="add|change", KERNEL=="nvme*", OPTIONS:="nowatch"

2. After creating the file please run the following to activate the rule:

# udevadm control --reload-rules
# udevadm trigger --type=devices --action=change

The above command will reload the complete udev configuration and will trigger all the udev rules. On a busy production system this could disrupt ongoing operations, applications running on the server. Please use the above command during a scheduled maintenance window only.

Source: https://access.redhat.com/solutions/1465913 + our experience with customers.