TNS-12518: TNS:listener could not hand off client connection | TNS-12547: TNS:lost contact

Problem:

In a two-node cluster, clients were not able to connect to the second node, while connections to the first node were successful.

A connection from SQL Developer threw the error: Status: Failure - Test failed: IO Error: Got minus one from a read call, connect lapse 16ms, Authentication lapse 0ms

A connection from sqlplus using the TNS string showed:

[oracle@rac02 ~]$ sqlplus "sys@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac02.example.com)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=orcl)))" as sysdba

ORA-12537: TNS:connection closed

Listener.log showed:

 2019-07-18T11:19:23.568231+00:00
 TNS-12518: TNS:listener could not hand off client connection
  TNS-12547: TNS:lost contact
   TNS-12560: TNS:protocol adapter error
    TNS-00517: Lost contact
     Linux Error: 32: Broken pipe

Solution:

This problem can occur for other reasons as well (for example, problematic entries in sqlnet.ora, which in our case were fine). We could also have suspected a network problem, because initially we were trying to connect remotely from the application server and from SQL Developer. But after getting the ORA-12537: TNS:connection closed error while connecting via sqlplus from the local server, only a local, non-network-related cause remained.

The cause of the problem was that the setuid bit was not set on the /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle binary:

Problematic node:

[root@rac02 ~]# ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
 -rwxr-s--x 1 oracle asmadmin 408607040 Apr  4 19:51 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

Healthy node:

[oracle@rac01 ~]$ ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle asmadmin 408607040 Apr  4 19:48 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

We set the setuid bit on the oracle binary in the RDBMS home:

[root@rac02 ~]# chmod u+s /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

[root@rac02 ~]# ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle asmadmin 408607040 Apr  4 19:51 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

The problem was resolved without restarting the database instance, and clients were immediately able to connect to the second node. But because this was a staging cluster, I restarted the database anyway, just to make sure the instance came up with the corrected binary.
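To double-check the fix, the failing connection test from the beginning of the post can simply be repeated from the problematic node (same connect string as before); it should now open a SQL*Plus session instead of failing with ORA-12537:

[oracle@rac02 ~]$ sqlplus "sys@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac02.example.com)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=orcl)))" as sysdba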

PRCS-1007 : Server pool RPPCOMS already exists

Problem:

While adding a database using the srvctl add database command, we got PRCS-1007 and PRCR-1086 errors:

$ srvctl add database -db RPPCOMS -oraclehome /u01/app/oracle/product/12.1.0/dbhome_1 -spfile +DATA/RPPCOMS/spfileRPPCOMS.ora -pwfile +DATA/RPPCOMS/PASSWORD/pwdrppcoms.256.1005727427 -dbname RPPCOMS  -diskgroup FRA,DATA

PRCS-1007 : Server pool RPPCOMS already exists
PRCR-1086 : server pool ora.RPPCOMS is already registered

Solution:

Remove the mentioned server pool as the root user:

# crsctl delete serverpool ora.RPPCOMS

Then retry adding the database.
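To verify, you can re-run the original srvctl add database command and check the registration afterwards (a hedged sketch; the names and paths are the same ones used in the failing command above):

$ srvctl add database -db RPPCOMS -oraclehome /u01/app/oracle/product/12.1.0/dbhome_1 -spfile +DATA/RPPCOMS/spfileRPPCOMS.ora -pwfile +DATA/RPPCOMS/PASSWORD/pwdrppcoms.256.1005727427 -dbname RPPCOMS -diskgroup FRA,DATA

$ srvctl config database -db RPPCOMS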

Detach diskgroup from 12c GI and attach to 19c GI

Task:

We have two separate Real Application Clusters, one running 12c and the other running 19c. We decided to migrate data from 12c to 19c by simply detaching the ASM disks from the source and attaching them to the destination.

Steps:

1. Connect to the 12c GI as the grid user and dismount the FRA diskgroup on all nodes:

[grid@rac1 ~]$ sqlplus  / as sysasm
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> alter diskgroup FRA dismount;
Diskgroup altered. 
[grid@rac2 ~]$ sqlplus  / as sysasm
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> alter diskgroup FRA dismount;
Diskgroup altered.

You can also use srvctl to stop the diskgroup with one command.
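For example, the following should dismount the diskgroup on all nodes at once (a hedged equivalent of the two SQL*Plus sessions above):

[grid@rac1 ~]$ srvctl stop diskgroup -diskgroup FRA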

2. Detach disks belonging to the specific diskgroup from 12c cluster and attach to 19c cluster.
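Before mounting, it is worth verifying that the moved disks are actually visible to the 19c ASM instances. A simple check (a hedged example; disk paths depend on your storage configuration) is to query v$asm_disk; disks belonging to the dismounted FRA diskgroup should show HEADER_STATUS = MEMBER:

[grid@rac1 ~]$ sqlplus / as sysasm
SQL> select path, header_status, name from v$asm_disk;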

3. After the ASM disks are visible on the 19c cluster, connect as sysasm via the grid user and mount the diskgroup:

# Check that there is no FRA resource registered with CRS:

[root@rac1 ~]# crsctl status res -t |grep FRA

# Mount the diskgroup on all nodes

[grid@rac1 ~]$ sqlplus / as sysasm
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL> alter diskgroup FRA mount;
Diskgroup altered.
[grid@rac2 ~]$ sqlplus / as sysasm
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL> alter diskgroup FRA mount;
Diskgroup altered.

# The FRA diskgroup resource will automatically be registered with CRS:

[root@rac1 ~]# crsctl status res -t |grep FRA
ora.FRA.dg(ora.asmgroup)

And the data will be there…
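A quick way to sanity-check the contents after the move is to list the diskgroup with asmcmd (the directory layout, of course, depends on what was stored in FRA):

[grid@rac1 ~]$ asmcmd ls +FRA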

Rollback RU patches from 12c GI home using opatchauto

Junior DBAs will find these steps useful 🙂

Environment details:

Two-node Real Application Cluster.
Database version: 12.2.0.1
Applied RU: 16-04-2019

1. Check existing patches

[grid@rac1 ~]$  /u01/app/12.2.0/grid/OPatch/opatch lspatches
29314424;OCW APR 2019 RELEASE UPDATE 12.2.0.1.190416 (29314424)
29314339;Database Apr 2019 Release Update : 12.2.0.1.190416 (29314339)
29301676;ACFS APR 2019 RELEASE UPDATE 12.2.0.1.190416 (29301676)
28566910;TOMCAT RELEASE UPDATE 12.2.0.1.0(ID:180802.1448.S) (28566910)
26839277;DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277)
OPatch succeeded.

Note that all these patches are part of RU 16-04-2019.

2. Stop all database instances on that node:

# srvctl stop instance -db orclA -i orclA1

3. Download Release Update 16-04-2019 (p29301687_122010_Linux-x86-64.zip), unzip it, and go to the unzipped patch location:

To roll back all these patches, it is easiest to have the unzipped Release Update 16-04-2019 patch on the server (all of the installed patches are part of it).

If you cannot download the zipped RU, you need to list all patch IDs explicitly with opatchauto rollback -id 29314424,29314339,29301676,28566910,26839277 (see the sketch below).
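In that case the full command would look roughly like this (a sketch only; the -oh value is the same GI home used in the next step):

[root@rac1 ~]# /u01/app/12.2.0/grid/OPatch/opatchauto rollback -id 29314424,29314339,29301676,28566910,26839277 -oh /u01/app/12.2.0/grid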

Since I have the unzipped RU on rac1, I will do it the following way:

[root@rac1 ~]# cd /u01/app/sw/29301687

[root@rac1 29301687]# ll
 total 132
 drwxr-x--- 4 grid oinstall     48 Mar 25 01:09 26839277
 drwxr-x--- 4 grid oinstall     48 Mar 25 01:08 28566910
 drwxr-x--- 5 grid oinstall     62 Mar 25 01:03 29301676
 drwxr-x--- 4 grid oinstall     67 Mar 25 01:08 29314339
 drwxr-x--- 5 grid oinstall     62 Mar 25 01:06 29314424
 drwxr-x--- 2 grid oinstall   4096 Mar 25 01:03 automation
 -rw-rw-r-- 1 grid oinstall   5828 Mar 25 01:29 bundle.xml
 -rw-r--r-- 1 grid oinstall 120219 Apr 10 18:07 README.html
 -rw-r----- 1 grid oinstall      0 Mar 25 01:03 README.txt

4. Rollback patches using opatchauto:

[root@rac1 29301687]# /u01/app/12.2.0/grid/OPatch/opatchauto rollback -oh /u01/app/12.2.0/grid
 ….
 ==Following patches were SUCCESSFULLY rolled back:
 Patch: /u01/app/sw/29301687/29314424
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log
 Patch: /u01/app/sw/29301687/29301676
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log
 Patch: /u01/app/sw/29301687/26839277
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log
 Patch: /u01/app/sw/29301687/28566910
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log
 Patch: /u01/app/sw/29301687/29314339
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log

5. Start the database instance on the first node and shut down the one on the second:

# srvctl start instance -db orclA -i orclA1
# srvctl stop instance -db orclA -i orclA2

6. Connect to the second node and repeat the same steps:

[root@rac2 ~]# cd /u01/app/sw/29301687

[root@rac2 29301687]# /u01/app/12.2.0/grid/OPatch/opatchauto rollback -oh /u01/app/12.2.0/grid

7. Start the database instance on rac2:

# srvctl start instance -db orclA -i orclA2
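At this point both instances should be back up; a quick check using the same database name as above:

# srvctl status database -db orclA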

8. Check the inventory:

$  /u01/app/12.2.0/grid/OPatch/opatch lspatches

There are no Interim patches installed in this Oracle Home "/u01/app/12.2.0/grid".
 OPatch succeeded.

How to delete MGMTDB?

Introduction

MGMTDB is a repository database that stores Cluster Health Monitor (CHM) data. In Oracle 11g this information was kept in a Berkeley DB database (.bdb files), but starting with Oracle Database 12c it is configured as an Oracle single-instance database.

In Oracle 12.1.0.1 GIMR is optional, whereas in Oracle 12.1.0.2 it is mandatory and turning it off is not supported, with the exception of Exadata.
I have searched a lot to find out why turning it off is not supported, but I still do not have an answer. I only know that TFA collects some information from MGMTDB, so if we turn it off, TFA will not be able to retrieve that information. In 19c GIMR is optional again.

The reason I want to turn it off is that there are several bugs related to MGMTDB. We have seen several customers hit performance issues because of it: the repository database was able to consume almost 100% of CPU resources. In addition, one customer noticed that MGMTDB grew to 60GB and exhausted the GRID diskgroup where the OCR and voting files were located (this size is not normal for a 3-node cluster).

More information about the Grid Infrastructure Management Repository (GIMR) can be found in MOS note 1568402.1.

Steps

Keep in mind that for 12.1.0.2 deleting it is not supported by Oracle (it is not clear why), so it is better to ask Oracle Support before doing this.

Instead of deleting the repository, it is better to apply all bug fixes related to it and try to use its intelligence to proactively tune your database. But if you still want to delete it, or at least want to know how, then let's do that; it is not harmful.

1. Stop and disable CRF

# crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded 

# crsctl modify res ora.crf -attr ENABLED=0 -init

# crsctl status res ora.crf -init
NAME=ora.crf
TYPE=ora.crf.type
TARGET=OFFLINE
STATE=OFFLINE 

2. Delete MGMTDB using dbca

Database must be up and running to do this step.
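If the repository happens to be down, it can be checked and, if needed, started first as the grid user (a hedged example; srvctl manages the repository under the name mgmtdb):

$ srvctl status mgmtdb
$ srvctl start mgmtdb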

# su - grid

$ dbca -silent -deleteDatabase -sourceDB -MGMTDB
Connecting to database
4% complete
9% complete
14% complete
19% complete
23% complete
28% complete
47% complete
Updating network configuration files
52% complete
Deleting instance and datafiles
76% complete
100% complete    

3. Make sure that MGMTDB was deleted

# srvctl status mgmtdb
PRCD-1120 : The resource for database _mgmtdb could not be found.
PRCR-1001 : Resource ora.mgmtdb does not exist

What is Flex ASM and how to check if it is enabled?

In versions prior to 12c, an ASM instance had to run on each node of the cluster. If ASM failed to start, the database instance located on the same node could not come up either; there was a hard dependency between the database and ASM instances.

With Oracle Flex ASM, database instances are able to connect to a remote ASM instance over a network connection (the ASM network). If an ASM instance fails, the database instance reconnects to another ASM instance on another node.

Check whether you are using this feature with the following command:

[grid@rac1 ~]$ asmcmd
ASMCMD> showclustermode
ASM cluster : Flex mode enabled
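Another way to confirm the setup is srvctl config asm (a hedged additional check; with Flex ASM it reports an ASM instance count/cardinality rather than one fixed instance per node):

[grid@rac1 ~]$ srvctl config asm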

 

ORA-01103: when creating a Standby Database on the same Host as the Primary Database

Typically the standby and primary databases are located on different hosts to provide full DR capability. However, there are situations when you want to have the primary and standby databases on the same host.

Problem #1: You are not able to start two databases with the same SID on the same server.

Problem #2: You cannot change db_name, because it is recorded in the controlfile; if you try to duplicate the standby database from the primary using a different db_name, you will get the following error:

ORA-01103: database name 'orcldgst' in control file is not 'orcldg'

Assume db_name=orcldg and the ORACLE_SID of the primary is orcldg1. To solve problems #1 and #2, do the following:

db_name must stay the same for both databases, but during the startup nomount of the standby database you need to set ORACLE_SID to a different value:

$ export ORACLE_SID=orcldgst1
$ sqlplus / as sysdba
SQL> startup nomount pfile='/tmp/mypfile.ora'

After that you will be able to run the RMAN duplicate command to create the standby database on the same host as the primary.
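A minimal sketch of that duplicate, assuming the auxiliary instance is still in nomount as shown above, a password file is in place, and orcldg_prim is a hypothetical TNS alias pointing to the primary:

$ export ORACLE_SID=orcldgst1
$ rman target sys@orcldg_prim auxiliary /
RMAN> duplicate target database for standby from active database;

The exact clauses (db_unique_name, file name conversion, and so on) depend on your setup.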