LGWR: Primary database is in MAXIMUM AVAILABILITY mode | ORA-16072: a minimum of one standby database destination is required

Problem:

One of our customers cloned a database from a Data Guard environment to a different host and tried to open it as a standalone database. The controlfile and datafiles still considered the database to be in maximum availability mode.

Errors after trying to open the database:

LGWR: Primary database is in MAXIMUM AVAILABILITY mode
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
LGWR: Minimum of 1 LGWR standby database required
Thu Jul 18 18:43:14 2019
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lgwr_39735_39805.trc:
ORA-16072: a minimum of one standby database destination is required

Solution:

SQL> startup mount;
SQL> alter database set standby database to maximize performance;
SQL> shutdown immediate;

$ srvctl start database -d orcl 
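
To double-check that the conversion worked before handing the database back, you can query v$database once the instance is open; a minimal check, with illustrative output:

SQL> select protection_mode, protection_level from v$database;

PROTECTION_MODE      PROTECTION_LEVEL
-------------------- --------------------
MAXIMUM PERFORMANCE  MAXIMUM PERFORMANCE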

TNS-12518: TNS:listener could not hand off client connection | TNS-12547: TNS:lost contact

Problem:

In two-node cluster, client was not able to connect to the second node, but connection to the first node was successful.

Connecting from SQL Developer threw the error: Status: Failure - Test failed: IO Error: Got minus one from a read call, connect lapse 16ms, Authentication lapse 0ms

Connecting from sqlplus using a TNS string showed:

[oracle@rac02 ~]$ sqlplus "sys@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac02.example.com)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=orcl)))" as sysdba

ORA-12537: TNS:connection closed

Listener.log showed:

 2019-07-18T11:19:23.568231+00:00
 TNS-12518: TNS:listener could not hand off client connection
  TNS-12547: TNS:lost contact
   TNS-12560: TNS:protocol adapter error
    TNS-00517: Lost contact
     Linux Error: 32: Broken pipe

Solution:

This problem can have other causes as well (for example, restrictive entries in sqlnet.ora; in our case that file was fine). At first a network problem seemed likely, because initially we were connecting remotely from the application server and from SQL Developer. But after getting ORA-12537: TNS:connection closed while connecting via sqlplus from the local server itself, only a local, non-network-related problem made sense.

The reason for this problem was that the setuid bit was not set on the /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle binary:

Problematic node:

[root@rac02 ~]# ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
 -rwxr-s--x 1 oracle asmadmin 408607040 Apr  4 19:51 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

Healthy node:

[oracle@rac01 ~]$ ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle asmadmin 408607040 Apr  4 19:48 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

We set the setuid bit on the oracle binary in the RDBMS home:

[root@rac02 ~]# chmod u+s /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

[root@rac02 ~]# ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle asmadmin 408607040 Apr  4 19:51 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

The problem was resolved without restarting the database instance, and clients were immediately able to connect to the second node. But because this was a staging cluster, I restarted the database anyway, just to make sure it was started with the correct binary.
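
If you prefer to bounce the instance on the affected node through the clusterware (as I did here), a minimal sketch using srvctl — the database name orcl and instance name orcl2 are assumptions based on this environment:

[oracle@rac02 ~]$ srvctl stop instance -db orcl -instance orcl2
[oracle@rac02 ~]$ srvctl start instance -db orcl -instance orcl2
[oracle@rac02 ~]$ srvctl status database -db orcl
Instance orcl1 is running on node rac01
Instance orcl2 is running on node rac02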

PRCS-1007 : Server pool RPPCOMS already exists

Problem:

While adding a database using the srvctl add database command, we got PRCS-1007 and PRCR-1086 errors:

$ srvctl add database -db RPPCOMS -oraclehome /u01/app/oracle/product/12.1.0/dbhome_1 -spfile +DATA/RPPCOMS/spfileRPPCOMS.ora -pwfile +DATA/RPPCOMS/PASSWORD/pwdrppcoms.256.1005727427 -dbname RPPCOMS  -diskgroup FRA,DATA

PRCS-1007 : Server pool RPPCOMS already exists
PRCR-1086 : server pool ora.RPPCOMS is already registered

Solution:

Remove the mentioned server pool as the root user:

# crsctl delete serverpool ora.RPPCOMS

Then retry adding the database.
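
If you want to see which server pools are registered (before deleting or afterwards), crsctl can list them; the output below is illustrative and will differ per cluster:

# crsctl status serverpool
NAME=Free
ACTIVE_SERVERS=

NAME=ora.RPPCOMS
ACTIVE_SERVERS=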

Detach diskgroup from 12c GI and attach to 19c GI

Task:

We have two separate Real Application Clusters, one 12c and the other 19c. We decided to migrate data from 12c to 19c by simply detaching the ASM disks from the source and attaching them to the destination.

Steps:

1. Connect to the 12c GI as the grid user and dismount the FRA diskgroup on all nodes:

[grid@rac1 ~]$ sqlplus  / as sysasm
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> alter diskgroup FRA dismount;
Diskgroup altered. 
[grid@rac2 ~]$ sqlplus  / as sysasm
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> alter diskgroup FRA dismount;
Diskgroup altered.

You can also use srvctl to stop the diskgroup with a single command, as shown below.
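
A sketch of that single-command variant (the diskgroup name FRA is from this example; run it as the grid user):

[grid@rac1 ~]$ srvctl stop diskgroup -diskgroup FRA
[grid@rac1 ~]$ srvctl status diskgroup -diskgroup FRA
Disk Group FRA is not running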

2. Detach the disks belonging to this diskgroup from the 12c cluster and attach them to the 19c cluster.

3. After the ASM disks are visible on the 19c cluster, connect as sysasm via the grid user and mount the diskgroup:

# Check that there is no FRA resource registered with CRS:

[root@rac1 ~]# crsctl status res -t |grep FRA

# Mount the diskgroup on all nodes

[grid@rac1 ~]$ sqlplus / as sysasm
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL> alter diskgroup FRA mount;
Diskgroup altered.
[grid@rac2 ~]$ sqlplus / as sysasm
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL> alter diskgroup FRA mount;
Diskgroup altered.

# FRA diskgroup resource will automatically be registered with CRS:

[root@rac1 ~]# crsctl status res -t |grep FRA
ora.FRA.dg(ora.asmgroup)
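
If you want an additional sanity check from the ASM side, the mount state can be queried as well (illustrative output):

SQL> select name, state from v$asm_diskgroup where name='FRA';

NAME       STATE
---------- -----------
FRA        MOUNTED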

And data will be there…

Rollback RU patches from 12c GI home using opatchauto

Junior DBAs will find these steps useful 🙂

Environment details:

Two-node Real Application Cluster.
Database version: 12.2.0.1
Applied RU: 16-04-2019

1. Check existing patches

[grid@rac1 ~]$  /u01/app/12.2.0/grid/OPatch/opatch lspatches
29314424;OCW APR 2019 RELEASE UPDATE 12.2.0.1.190416 (29314424)
29314339;Database Apr 2019 Release Update : 12.2.0.1.190416 (29314339)
29301676;ACFS APR 2019 RELEASE UPDATE 12.2.0.1.190416 (29301676)
28566910;TOMCAT RELEASE UPDATE 12.2.0.1.0(ID:180802.1448.S) (28566910)
26839277;DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277)
OPatch succeeded.

Note that all these patches are part of RU 16-04-2019.

2. Stop all database instances on that node:

# srvctl stop instance -db orclA -i orclA1
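
You can verify which instances are up before and after the stop; a quick check, assuming the database name orclA used in this example:

# srvctl status database -db orclA
Instance orclA1 is not running on node rac1
Instance orclA2 is running on node rac2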

3. Download Release Update 16-04-2019 (p29301687_122010_Linux-x86-64.zip), unzip it, and go to the unzipped patch location:

To roll back all these patches it is easier to have the unzipped Release Update 16-04-2019 patch on the server, since all the installed patches are part of it.

If you cannot download the zipped RU, then you need to list all the patch IDs explicitly: opatchauto rollback -id 29314424,29314339,29301676,28566910,26839277

Since I have the unzipped RU on rac1, I will proceed as follows:

[root@rac1 ~]# cd /u01/app/sw/29301687

[root@rac1 29301687]# ll
 total 132
 drwxr-x--- 4 grid oinstall     48 Mar 25 01:09 26839277
 drwxr-x--- 4 grid oinstall     48 Mar 25 01:08 28566910
 drwxr-x--- 5 grid oinstall     62 Mar 25 01:03 29301676
 drwxr-x--- 4 grid oinstall     67 Mar 25 01:08 29314339
 drwxr-x--- 5 grid oinstall     62 Mar 25 01:06 29314424
 drwxr-x--- 2 grid oinstall   4096 Mar 25 01:03 automation
 -rw-rw-r-- 1 grid oinstall   5828 Mar 25 01:29 bundle.xml
 -rw-r--r-- 1 grid oinstall 120219 Apr 10 18:07 README.html
 -rw-r----- 1 grid oinstall      0 Mar 25 01:03 README.txt

4. Rollback patches using opatchauto:

[root@rac1 29301687]# /u01/app/12.2.0/grid/OPatch/opatchauto rollback -oh /u01/app/12.2.0/grid
 ….
 ==Following patches were SUCCESSFULLY rolled back:
 Patch: /u01/app/sw/29301687/29314424
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log
 Patch: /u01/app/sw/29301687/29301676
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log
 Patch: /u01/app/sw/29301687/26839277
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log
 Patch: /u01/app/sw/29301687/28566910
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log
 Patch: /u01/app/sw/29301687/29314339
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log

5. Start the database instance on the first node and shut down the one on the second:

# srvctl start instance -db orclA -i orclA1
# srvctl stop instance -db orclA -i orclA2

6. Connect to the second node and repeat the same steps:

[root@rac2 ~]# cd /u01/app/sw/29301687

[root@rac2 29301687]# /u01/app/12.2.0/grid/OPatch/opatchauto rollback -oh /u01/app/12.2.0/grid

7. Start database instance on rac2

# srvctl start instance -db orclA -i orclA2

8. Check inventory

$  /u01/app/12.2.0/grid/OPatch/opatch lspatches

There are no Interim patches installed in this Oracle Home "/u01/app/12.2.0/grid".
 OPatch succeeded.

How to delete MGMTDB?

Introduction

MGMTDB is a repository database that stores Cluster Health Monitor (CHM) data. In Oracle 11g this information was stored in a Berkeley DB database (.bdb files), but starting with Oracle Database 12c it is configured as an Oracle single-instance database.

In Oracle 12.1.0.1 the GIMR is optional, whereas in Oracle 12.1.0.2 it is mandatory and turning it off is not supported, with the exception of Exadata.
I have searched a lot to find out why turning it off is not supported, but I still do not have that answer. I only know that TFA collects some information from MGMTDB, so if we turn it off, TFA will not be able to retrieve that information. In 19c the GIMR is optional again.

The reason I want to turn it off is that there are several bugs related to MGMTDB. We have seen several customers hit performance issues because of it: the repository database was able to consume almost 100% of the CPU resources. In addition, one customer noticed that MGMTDB grew to 60GB and exhausted the GRID diskgroup where the OCR and voting files were located (this size is not normal for a 3-node cluster).

More information about the Grid Infrastructure Management Repository (GIMR) can be found in Doc ID 1568402.1.

Steps

Please consider that for 12.1.0.2 deleting it is not supported by Oracle (it is not clear why), so it is better to ask Oracle Support before doing this.

Instead of deleting the repository, it is better to apply all the bug fixes related to it and try to use its intelligence to proactively tune your database. But if you still want to delete it, or at least want to know how to delete it, then let's do that; it is not harmful.

1. Stop and disable CRF

# crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded 

# crsctl modify res ora.crf -attr ENABLED=0 -init

# crsctl status res ora.crf -init
NAME=ora.crf
TYPE=ora.crf.type
TARGET=OFFLINE
STATE=OFFLINE 
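
ora.crf is a node-local (-init) resource, so the same stop/disable needs to be done on every cluster node; a sketch for the second node:

[root@rac2 ~]# crsctl stop res ora.crf -init
[root@rac2 ~]# crsctl modify res ora.crf -attr ENABLED=0 -init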

2. Delete MGMTDB using dbca

The MGMTDB database must be up and running for this step.

# su - grid

$ dbca -silent -deleteDatabase -sourceDB -MGMTDB
Connecting to database
4% complete
9% complete
14% complete
19% complete
23% complete
28% complete
47% complete
Updating network configuration files
52% complete
Deleting instance and datafiles
76% complete
100% complete    

3. Make sure that MGMTDB was deleted

# srvctl status mgmtdb
PRCD-1120 : The resource for database _mgmtdb could not be found.
PRCR-1001 : Resource ora.mgmtdb does not exist
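
As an extra check, you can make sure no MGMTDB-related resources are left registered in CRS (no output means nothing is left):

# crsctl status res -t | grep -i mgmt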

What is a Flex ASM and how to check if it is enabled?

In versions prior to 12c, an ASM instance had to run on each node of the cluster. If ASM was not able to start, the database instance located on the same node was not able to come up either. There was a hard dependency between the database and ASM instances.

With Oracle Flex ASM, databases are able to connect to a remote ASM instance over a network connection (the ASM network). If an ASM instance fails, the database instance reconnects to another ASM instance on another node. This feature is called Oracle Flex ASM.

Check whether you are using this great feature with the following command:

[grid@rac1 ~]$ asmcmd
ASMCMD> showclustermode
ASM cluster : Flex mode enabled
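
Another quick way to check is srvctl config asm; with Flex ASM the output includes an ASM instance count line (the cardinality, 3 by default), which is absent with standard ASM. The output below is illustrative:

[grid@rac1 ~]$ srvctl config asm | grep -i count
ASM instance count: 3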

 

ORA-01103: when creating a Standby Database on the same Host as the Primary Database

Typically the standby and primary databases are located on different hosts to ensure full DR capability. However, there are situations when you want to have the primary and the standby database on the same host.

Problem #1: You are not able to start two databases with the same SID on the same server.

Problem #2: You cannot change db_name, because it is recorded in the controlfile; if you try to duplicate the standby database from the primary using a different db_name, you will get the following error:

ORA-01103: database name 'orcldgst' in control file is not 'orcldg'

Assume db_name=orcldg and the ORACLE_SID of the primary is orcldg1. To solve both problems, do the following:

db_name must be the same for both databases, but during the startup nomount of the standby database you need to set ORACLE_SID to a different value:

$ export ORACLE_SID=orcldgst1
$ sqlplus / as sysdba
SQL> startup nomount pfile='/tmp/mypfile.ora'

After that you will be able to run the RMAN duplicate command to create the standby database on the same host as the primary; a minimal sketch is shown below.
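
The TNS aliases orcldg_prim and orcldg_stb are hypothetical, and it is assumed that the auxiliary instance is already started nomount under ORACLE_SID=orcldgst1 and that file name conversion (db_file_name_convert/log_file_name_convert) is handled in the auxiliary pfile, since both databases share the host:

$ rman target sys@orcldg_prim auxiliary sys@orcldg_stb
RMAN> duplicate target database for standby from active database;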

Add filegroup fails with ORA-15067: command or option incompatible with diskgroup redundancy

Problem:

I was trying to add a filegroup to the FRA diskgroup:

SQL> alter diskgroup FRA add filegroup high_filegroup database orcl set 'datafile.redundancy' = 'HIGH';

Error:

ORA-15067: command or option incompatible with diskgroup redundancy

Troubleshooting:

Checking diskgroup type:

SQL> select name,type,compatibility,database_compatibility from v$asm_diskgroup where name='FRA';

NAME       TYPE     COMPATIBILITY   DATABASE_COMPATIBILITY
---------- -------- --------------- ----------------------
FRA        NORMAL   18.0.0.0.0      12.2.0.1.0

Solution:

Change diskgroup type to FLEX:

SQL> alter diskgroup FRA convert redundancy to flex;
Diskgroup altered.

Check that type was changed:

SQL> select name,type,compatibility,database_compatibility from v$asm_diskgroup where name='FRA';

NAME       TYPE     COMPATIBILITY   DATABASE_COMPATIBILITY
---------- -------- --------------- ----------------------
FRA        FLEX     18.0.0.0.0      12.2.0.1.0

Adding filegroup succeeds:

SQL> alter diskgroup FRA add filegroup high_filegroup database orcl set 'datafile.redundancy' = 'HIGH';
Diskgroup altered.
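
To confirm the result, the new filegroup should be visible in v$asm_filegroup (illustrative output; the group number and any other filegroups, such as the default one, will differ):

SQL> select group_number, name from v$asm_filegroup;

GROUP_NUMBER NAME
------------ ------------------------------
           2 DEFAULT_FILEGROUP
           2 HIGH_FILEGROUP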

SRVCTL: CRS-2678, CRS-0267, CRS-5802: Unable to start the agent process

We had the following problem at one of our customers:

srvctl start database -db dbname was failing on one of the cluster nodes with the following error:

[oracle@node1 ~]$ srvctl start database -db dbname
PRCR-1079 : Failed to start resource ora.dbname.db
CRS-2674: Start of 'ora.dbname.db' on 'rac1' failed
CRS-2678: 'ora.dbname.db' on 'rac1' has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability.
CRS-5802: Unable to start the agent process

But at the same time we were able to start up the database using sqlplus:

[oracle@rac1 ~]$ sqlplus / as sysdba

SQL> startup

ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size 8621136 bytes
Variable Size 805307312 bytes
Database Buffers 754974720 bytes
Redo Buffers 8155136 bytes
Database mounted.
Database opened.

It was strange, and it took me a long time to troubleshoot this issue.

I tried many things:
* removed the srvctl configuration using srvctl remove database -db orcl
* re-added it with srvctl add database -db orcl
* re-added the instances
* also tried to restart CRS and even the servers
but with no luck.

Then I found Doc ID 1957360.1 on the Oracle support site and was able to reproduce the same problem on my lab servers.

I changed the ownership of these files on my test cluster, on one node only:

[root@rac1 ~]# ll /u01/app/grid/crsdata/rac1/output/crsd_oraagent_oracleOUT.trc
-rw-r--r-- 1 oracle oinstall 1085 Sep 5 20:17 /u01/app/grid/crsdata/rac1/output/crsd_oraagent_oracleOUT.trc
[root@rac1 ~]# ll /u01/app/grid/crsdata/rac1/output/crsd_oraagent_oracle.pid
-rw-r--r-- 1 oracle oinstall 6 Sep 5 20:17 /u01/app/grid/crsdata/rac1/output/crsd_oraagent_oracle.pid
[root@rac1 ~]# chown root:root /u01/app/grid/crsdata/rac1/output/crsd_oraagent_oracle.pid
[root@rac1 ~]# chown root:root /u01/app/grid/crsdata/rac1/output/crsd_oraagent_oracleOUT.trc

I tried to start up the instance using sqlplus, and it was successful:

[oracle@rac1 ~]$ sqlplus / as sysdba

SQL> startup

ORACLE instance started
Database mounted.
Database opened.

Then I stopped the database and tried with srvctl.

After a long wait it failed:

[oracle@rac1 ~]$ srvctl start database -db orcl
PRCR-1079 : Failed to start resource ora.orcl.db
CRS-2674: Start of 'ora.orcl.db' on 'rac1' failed
CRS-2678: 'ora.orcl.db' on 'rac1' has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability.
CRS-5802: Unable to start the agent process

I also checked the customer's logs and found that the files crsd_oraagent_oracle.pid and crsd_oraagent_oracleOUT.trc had not been updated for a long time; they were older than the other files.

So to solve this problem you need to assign the correct owner, group, and access permissions to the above two files, after which you will be able to start the database using srvctl.

[root@rac1 ~]# chown oracle:oinstall /u01/app/grid/crsdata/rac1/output/crsd_oraagent_oracle.pid
[root@rac1 ~]# chown oracle:oinstall /u01/app/grid/crsdata/rac1/output/crsd_oraagent_oracleOUT.trc
[root@rac1 ~]# chmod 644 /u01/app/grid/crsdata/rac1/output/crsd_oraagent_oracle.pid
[root@rac1 ~]# chmod 644 /u01/app/grid/crsdata/rac1/output/crsd_oraagent_oracleOUT.trc
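
After fixing the ownership and permissions, the simplest verification is to retry through the clusterware (database and node names are from the lab example):

[oracle@rac1 ~]$ srvctl start database -db orcl
[oracle@rac1 ~]$ srvctl status database -db orcl
Instance orcl1 is running on node rac1
Instance orcl2 is running on node rac2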

You may never hit these errors, but if you do, you now know how to solve them.