Forceful startup of CRS, when minority VMs are down

If a minority of database nodes are down because of cloud maintenance, those nodes may not be startable. If CRS is also down on the remaining working nodes, manual intervention is required.

Before proceeding, confirm that the cluster still has majority quorum.

Majority formula = TRUNC((number of database nodes + number of quorum nodes) / 2) + 1

The cluster can only be started when the majority of voting members are available. If the majority of database nodes or quorum nodes are down, the steps below will not work.

Use the following procedure on each database node where CRS fails to start.



Procedure 1: Restart CRS cleanly

1. Temporarily disable CRS autostart

crsctl disable crs

2. Stop any running CRS processes

crsctl stop crs -f

It is normal to see errors such as CRS-4639 or CRS-4000 when running this command. You can continue with the next steps.

3. Kill any remaining ohasd.bin reboot processes

ps -ef | grep "ohasd.bin reboot" | grep -v grep | awk '{print $2}' | xargs kill -9 > /dev/null 2>&1

4. [Only if using FlashGrid cluster] Stop flashgrid_wait service

flashgrid-node stop-waiting

Expected output may look similar to this:

pkill -USR1 -f flashgrid_wait ... OK

5. Restart the ohasd services

systemctl restart ohasd
systemctl restart oracle-ohasd

6. Monitor CRS startup

First, check whether the Clusterware daemons are running:

crsctl status res -t -init

If the Clusterware daemons started successfully, check the cluster resources:

crsctl status res -t

If CRS does not start automatically, start it manually:

crsctl start crs -wait

If startup hangs on ora.storage, check the ASM alert log (alert_+ASM?.log).

Look for errors such as: ORA-15042, ORA-15040

If these errors are present, cancel the CRS startup, skip step 7, and continue with Procedure 2 below.

7. Re-enable CRS autostart

crsctl enable crs

Procedure 2: If CRS still does not start

Use this procedure if CRS did not start successfully and some CRS resources remain failed.

Repeat the following steps on each database node where CRS still fails to start.

1. Stop any running CRS processes

crsctl stop crs -f

2. Kill any remaining ohasd.bin reboot processes

ps -ef | grep "ohasd.bin reboot" | grep -v grep | awk '{print $2}' | xargs kill -9 > /dev/null 2>&1

3. Restart the ohasd services

systemctl restart ohasd
systemctl restart oracle-ohasd

4. Start only HAS

crsctl start has

5. Start ASM in nomount mode

Connect as the Grid Infrastructure owner, for example grid:

su - grid
sqlplus / as sysasm

Then start ASM in nomount mode:

startup nomount;

6. Try to mount all ASM diskgroups

alter diskgroup all mount;

7. If mounting all diskgroups fails, mount them one by one using force

For example:

alter diskgroup GRID mount force;
alter diskgroup DATA mount force;

Sometimes ASM delays background operations after an unclean shutdown. In that case, you may see a message similar to this in alert_+ASM?.log:

WARNING: Background operations delayed until 08/08/23 21:22:21 because ASM was not stopped cleanly and there could be disconnected client(s)

Do not cancel the running command. Wait until the time shown in the message. The diskgroup should mount after that delay.

8. Re-enable CRS autostart

crsctl enable crs

9. Check cluster status

crsctl status res -t

[WARNING] [INS-08102] Unexpected error occurred while transitioning from state ‘DBIdentification’

Problem:

While using dbca to create a database I get the following error:

[oracle@rac1 ~]$ dbca -silent -createDatabase \
>   -responseFile NO_VALUE \
>   -templateName New_Database.dbt \
>   -dbOptions "OMS:false,JSERVER:false,SPATIAL:false,IMEDIA:false,ORACLE_TEXT:false,SAMPLE_SCHEMA:false,CWMLITE:false,APEX:false,DV:false" \
>   -gdbname racdb \
>   -characterSet AL32UTF8 \
>   -sysPassword Oracle123 \
>   -systemPassword Oracle123 \
>   -databaseType MULTIPURPOSE \
>   -automaticMemoryManagement false \
>   -totalMemory 6144 \
>   -redoLogFileSize 512 \
>   -nodelist rac1,rac2 \
>   -storageType ASM \
>   -diskGroupName DATA01 \
>   -recoveryGroupName DATA01 \
>   -recoveryAreaSize 20000 \
>   -initParams "log_buffer=128M,processes=3000" \
>   -asmsnmpPassword Oracle123
[WARNING] [INS-08102] Unexpected error occurred while transitioning from state 'DBIdentification'
   CAUSE: No additional information available.
   ACTION: Contact Oracle Support Services or refer to the software manual.

On DBCA GUI it looks like this:

Solution:

On the Oracle site or on the Internet, there is no useful information that can help so far. The only place you should search for the reason is in dbca logs generated under /u01/app/oracle/cfgtoollogs/dbca

[root@rac1 dbca]# pwd
/u01/app/oracle/cfgtoollogs/dbca

[root@rac1 dbca]# ll
total 1344
drwxr-xr-x 2 root   root        4096 Jul 14 18:48 old
drwxr-x--- 2 oracle oinstall       6 Apr 28 16:25 orcl
-rw-r----- 1 oracle oinstall 1370046 Jul 14 18:51 trace.log_2023-07-14_06-51-03PM
-rw-r----- 1 oracle oinstall       0 Jul 14 18:51 trace.log_2023-07-14_06-51-03PM.lck

The latest trace file contains the following error messages:

[Thread-355] [ 2023-07-14 18:51:35.375 UTC ] [StreamReader.run:66]  OUTPUT> DIA-49802: missing read, write, or execute permission on specified ADR home directory [/u01/app/grid/diag/crs/rac1/crs/log]
[Thread-355] [ 2023-07-14 18:51:35.375 UTC ] [StreamReader.run:66]  OUTPUT>DIA-49801: actual permissions [rwxr-xr-x], expected minimum permissions [rwxrwx---] for effective user [oracle]
[Thread-355] [ 2023-07-14 18:51:35.375 UTC ] [StreamReader.run:66]  OUTPUT>DIA-48188: user missing read, write, or exec permission on specified directory
....
[Thread-363] [ 2023-07-14 19:01:53.417 UTC ] [StreamReader.run:66]  OUTPUT> DIA-48141: error creating directory during ADR initialization [/u01/app/grid/diag/crs/rac1/crs/trace]
[Thread-363] [ 2023-07-14 19:01:53.417 UTC ] [StreamReader.run:66]  OUTPUT>DIA-48189: OS command to create directory failed
[Thread-363] [ 2023-07-14 19:01:53.417 UTC ] [StreamReader.run:66]  OUTPUT>Linux-x86_64 Error: 1: Operation not permitted

Let’s check permissions on folders under /u01/app/grid/diag/crs/rac1/crs

[root@rac1 dbca]# ll /u01/app/grid/diag/crs/rac1/crs/

drwxr-xr-x 2 grid oinstall   21 Jul 14 16:09 alert
drwxr-xr-x 2 grid oinstall    6 Jul 12 21:33 cdump
drwxr-xr-x 2 grid oinstall    6 Jul 12 21:33 incident
drwxr-xr-x 2 grid oinstall    6 Jul 12 21:33 incpkg
drwxr-xr-x 2 grid oinstall 4096 Jul 12 21:33 lck
drwxrwxr-x 5 grid oinstall   48 Jul 12 21:33 log
drwxr-xr-x 2 grid oinstall 4096 Jul 12 21:33 metadata
drwxr-xr-x 2 grid oinstall    6 Jul 12 21:33 metadata_dgif
drwxr-xr-x 2 grid oinstall    6 Jul 12 21:33 metadata_pv
drwxr-xr-x 2 grid oinstall    6 Jul 12 21:33 stage
drwxr-xr-x 2 grid oinstall    6 Jul 12 21:33 sweep
drwxr-xr-x 2 grid oinstall 8192 Jul 14 18:43 trace

When I have the same case, I always check the same folder on a healthy server, let’s verify permissions on a healthy node:

[root@rac2 oraInventory]# ll /u01/app/grid/diag/crs/rac2/crs

drwxrwxr-x 2 grid oinstall    21 Apr 26 17:53 alert
drwxrwxr-x 5 grid oinstall    57 Jul 13 21:43 cdump
drwxrwxr-x 7 grid oinstall    89 Jul 13 21:43 incident
drwxrwxr-x 2 grid oinstall     6 Apr 26 17:53 incpkg
drwxrwxr-x 2 grid oinstall  4096 Jul 13 21:43 lck
drwxrwxr-x 5 grid oinstall    65 Jun  1 00:36 log
drwxrwxr-x 2 grid oinstall  4096 Apr 26 17:53 metadata
drwxrwxr-x 2 grid oinstall     6 Apr 26 17:53 metadata_dgif
drwxrwxr-x 2 grid oinstall     6 Apr 26 17:53 metadata_pv
drwxrwxr-x 2 grid oinstall   119 Jul 13 21:43 stage
drwxrwxr-x 2 grid oinstall   124 Jul 13 21:43 sweep
drwxrwxr-x 2 grid oinstall 32768 Jul 14 18:43 trace

Now it’s time to correct permissions on the problematic node:

[root@rac1 dbca]# chmod -R 775 /u01/app/grid/diag/crs/rac1/crs/*

[root@rac1 dbca]# ll /u01/app/grid/diag/crs/rac1/crs/

drwxrwxr-x 2 grid oinstall   21 Jul 14 16:09 alert
drwxrwxr-x 2 grid oinstall    6 Jul 12 21:33 cdump
drwxrwxr-x 2 grid oinstall    6 Jul 12 21:33 incident
drwxrwxr-x 2 grid oinstall    6 Jul 12 21:33 incpkg
drwxrwxr-x 2 grid oinstall 4096 Jul 12 21:33 lck
drwxrwxr-x 5 grid oinstall   48 Jul 12 21:33 log
drwxrwxr-x 2 grid oinstall 4096 Jul 12 21:33 metadata
drwxrwxr-x 2 grid oinstall    6 Jul 12 21:33 metadata_dgif
drwxrwxr-x 2 grid oinstall    6 Jul 12 21:33 metadata_pv
drwxrwxr-x 2 grid oinstall    6 Jul 12 21:33 stage
drwxrwxr-x 2 grid oinstall    6 Jul 12 21:33 sweep
drwxrwxr-x 2 grid oinstall 8192 Jul 14 18:43 trace

It is now possible for DBCA to proceed without any issues.
Good Luck!

RMAN spread a backup job between many RAC instances in parallel to increase throughput

There are two options to allocate RMAN channels on different RAC instances to increase the throughput.

I will start with the option, that assures all RAC instances get one channel. Regarding the other option, it does load balance but in a random way, so with a small number of channels, you may see that all of them are allocated in one instance. So you decide which option is better for you.

The test is done on a 2-node cluster.

  1. Configure parallelism and two channels in RMAN. Indicate a connect string, one per instance:
$ RMAN target /
RMAN> CONFIGURE DEVICE TYPE disk PARALLELISM 2;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL1 as sysdba';
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL2 as sysdba';

2. Define ORCL1 and ORCL2 aliases on each database node under $ORACLE_HOME/network/admin/tnsnames.ora:

ORCL1=
(DESCRIPTION=
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac1.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )
 
ORCL2=
(DESCRIPTION=
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac2.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )

Please note, in my case 1522 is a local listener port.

3. Run backup:

RMAN> backup database;
 
Starting backup at 22-MAR-22
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=507 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=753 instance=orcl2 device type=DISK
channel ORA_DISK_2: SID=753 instance=orcl2 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=+DATA/ORCL/DATAFILE/system.257.1098460673
input datafile file number=00004 name=+DATA/ORCL/DATAFILE/undotbs1.259.1098460743
channel ORA_DISK_1: starting piece 1 at 22-MAR-22
channel ORA_DISK_2: starting full datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00003 name=+DATA/ORCL/DATAFILE/sysaux.258.1098460717
input datafile file number=00005 name=+DATA/ORCL/DATAFILE/undotbs2.265.1098461311
input datafile file number=00007 name=+DATA/ORCL/DATAFILE/users.260.1098460743

As you see, two files were backed up by the 1st channel (1st instance) and the other three files by the 2nd channel (2nd instance).

Now let’s explain another possible variant:

  1. Configure one TNS string with load balance parameter:
ORCL_BALANCE=
     (DESCRIPTION=
           (TRANSPORT_CONNECT_TIMEOUT=3) (RETRY_COUNT=6)(LOAD_BALANCE=on)
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac1.example.com) (PORT=1522))
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac2.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )

Both node addresses are defined and Oracle will pick each address randomly.

2. Configure parallelism and one channel with ORCL_BALANCE string:

Please note, I did this test case on the same server where I’ve already defined CHANNEL 1 and CHANNEL 2, so I had to clear them:

RMAN> CONFIGURE CHANNEL DEVICE TYPE DISK clear;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK clear;
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE DISK clear;

Define channel and parallelism:

RMAN> CONFIGURE DEVICE TYPE disk PARALLELISM 2;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL_BALANCE as sysdba';

RMAN> backup database;
 
Starting backup at 22-MAR-22
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=752 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=138 instance=orcl1 device type=DISK

Both channels were allocated on orcl1. Try one more time, or better configure parallelism 3, CHANNEL parameter is already defined and it is permanent until changed:

RMAN>  CONFIGURE DEVICE TYPE disk PARALLELISM 3;
 
RMAN> backup database;
 
Starting backup at 22-MAR-22
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=752 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=138 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=753 instance=orcl2 device type=DISK

As you see it really did a random choice. But two channels were allocated on the 1st node and the last one on the 2nd node.

I think here random algorithm is not a good option, but you better know which variant is appropriate in your case.

ORA-39701: database must be mounted EXCLUSIVE for UPGRADE or DOWNGRADE

Problem:

In cluster environment, I was not able to start database in upgrade mode:

SQL> startup upgrade

ORACLE instance started.
Total System Global Area 1996486272 bytes
Fixed Size		    8898176 bytes
Variable Size		  704643072 bytes
Database Buffers	 1275068416 bytes
Redo Buffers		    7876608 bytes
Database mounted.
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-39701: database must be mounted EXCLUSIVE for UPGRADE or DOWNGRADE

The following also did not work:

SQL> startup mount exclusive

ORACLE instance started.
Total System Global Area 1996486272 bytes
Fixed Size		    8898176 bytes
Variable Size		  704643072 bytes
Database Buffers	 1275068416 bytes
Redo Buffers		    7876608 bytes
Database mounted.

SQL> alter database open upgrade;
alter database open upgrade
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-39701: database must be mounted EXCLUSIVE for UPGRADE or DOWNGRADE

Solution:

Change parameter cluster_database to FALSE, and startup in upgrade mode:

$ sqlplus / as sysdba

SQL> startup nomount;

SQL> alter system set cluster_database=FALSE scope=spfile sid='*';

SQL> shutdown immediate;

SQL> startup upgrade

ORACLE instance started.
Total System Global Area 1996486272 bytes
Fixed Size		    8898176 bytes
Variable Size		  704643072 bytes
Database Buffers	 1275068416 bytes
Redo Buffers		    7876608 bytes
Database mounted.
Database opened.

After finishing your work, don’t forget to return cluster_database parameter to TRUE and restart your database:

SQL> alter system set cluster_database=TRUE scope=spfile sid='*';

TCPS configuration: ora.LISTENER.lsnr is in INTERMEDIATE state

Problem

After configuring TCPS endpoint in listener configuration using the following way:

In srvctl:

[grid@rac1 ~]$ srvctl modify listener -p "TCP:1522/TCPS:1524"

In listener.ora file:

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac1.example.com)(PORT = 1522))
      (ADDRESS = (PROTOCOL = IPC)(KEY = LISTENER))
      (ADDRESS = (PROTOCOL = TCPS)(HOST =rac1.example.com)(PORT = 1524))
    ))

So from the above, we see that I’ve configured as static endpoint registration as well as dynamic.

Restarted listener and saw that ora.LISTENER.lsnr resource left in the INTERMEDIATE state:

# srvctl stop listener;srvctl start listener

# crsctl status res -t|head
...
ora.LISTENER.lsnr 
ONLINE INTERMEDIATE rac1 Not All Endpoints Registered,STABLE
ONLINE INTERMEDIATE rac2 Not All Endpoints Registered,STABLE

Troubelshooting

After checking listener.log file, found the following messages:

Dynamic address is already listened on (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=rac1.example.com)(PORT=1524)))
Dynamic address is already listened on (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac1.example.com)(PORT=1522)))
Dynamic address is already listened on (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac1.example.com)(PORT=1522)))

So, due to the static registration, the dynamic failed and Agent thought it could not register endpoints, which left resource in the INTERMEDIATE state (which should not be normal I think) 

I tried to disable dynamic registration by setting ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER to OFF in listener.ora file, but after modifying and restarting listener using srvctl Oracle Agent changed the value back to ON. I could not find a way to disable it so far.

The second option was to remove static entries from listener.ora file and leave only dynamic registration, but after that, the important endpoint on TCPS/1524 was not registered:

(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=192.168.1.11)(PORT=1524)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.1)(PORT=1522)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.11)(PORT=1522)))

The endpoint that I want is 192.168.1.1:1524 and not 192.168.1.11:1524

Workaround

Configured dynamic registration in srvctl for TCP/1522 port only:

[grid@rac1 ~]$ srvctl modify listener -p "TCP:1522"

Configured static registration in listener.ora file for TCPS/1524:

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = LISTENER))
      (ADDRESS = (PROTOCOL = TCPS)(HOST =rac1.example.com)(PORT = 1524))
    )
  )

After restarting listener, the status became ONLINE:

ora.LISTENER.lsnr
              ONLINE  ONLINE   rac1   STABLE
              ONLINE  ONLINE   rac2   STABLE

Another workaround that my colleage found is the following:

Set ENDPOINTS to null in crsctl (in srvctl it’s not possible):

# crsctl modify resource ora.LISTENER.lsnr -attr "ENDPOINTS=" –unsupported

And register all endpoints statically in listener.ora. Using that way, no dynamic registration is triggered, only static endpoints will be visible.

Please comment bellow, if you have a better way. Thank you!

Identify whether database is RAC or SI using crsctl

Environment:

GI – configured as a cluster
orclsingle – configured as a Single Instance
orclrac – configured as RAC
orclone – configured as RACOneNode

Q1: How to identify whether database is RAC or SI?

A1: crsctl stat res shows parameter CLUSTER_DATABASE. In case of RAC or RACOneNode the value is TRUE otherwise FALSE

# crsctl stat res ora.orclrac.db -p|grep CLUSTER_DATABASE
CLUSTER_DATABASE=true

# crsctl stat res ora.orclone.db -p|grep CLUSTER_DATABASE
CLUSTER_DATABASE=true

# crsctl stat res ora.orclsingle.db -p|grep CLUSTER_DATABASE
CLUSTER_DATABASE=false

Please note that the same can be identified from sqlplus by selecting CLUSTER_DATABASE initialization parameter.

Q2: How to identify whether my database is RAC or RACOneNode?

A2: srvctl config database shows Type, that can have the following values SINGLE, RAC, or RACOneNode.

$ srvctl config database -db orclrac|grep Type
Type: RAC

$ srvctl config database -db orclone|grep Type
Type: RACOneNode

$ srvctl config database -db orclsingle|grep Type
Type: SINGLE

INS-45511: Installer has detected that an Oracle Grid Infrastructure home is marked incorrectly as configured

Problem:

After deconfiguring Oracle Restart stack using:

[root@rac1 ~]# /u01/app/19.3.0/grid/root.sh -deconfig
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/19.3.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/19.3.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/crsdata/rac1/crsconfig/hadeconfig.log
2020/07/04 10:49:21 CLSRSC-332: CRS resources for listeners are still configured
2020/07/04 10:49:49 CLSRSC-337: Successfully deconfigured Oracle Restart stack

Tried to configure GI as clusterware stack and got the following error:

INS-45511: Installer has detected that an Oracle Grid Infrastructure home is marked incorrectly  as configured

Solution:

Remove CRS="true" accross GI home entry in /u01/app/oraInventory/ContentsXML/inventory.xml

Original:

<HOME NAME="OraGI19Home1" LOC="/u01/app/19.3.0/grid" TYPE="O" IDX="1" CRS="true"/>

After modification:

<HOME NAME="OraGI19Home1" LOC="/u01/app/19.3.0/grid" TYPE="O" IDX="1"/>

Retry configuration.

CRS-6706: Oracle Clusterware Release Patch Level (‘xxxx’) Does Not Match Software Patch Level

Problem:

We have restored OCR from backup, started CRS, and rolled back patch using opatchauto. After restarting CRS we’ve got:

CRS-6706: Oracle Clusterware Release Patch Level ('xxxx') Does Not Match Software Patch Level ('yyyy')

Solution:

1. Unlock GI home from root user:

# cd $GI_HOME
# $GI_HOME/crs/install/rootcrs.sh -unlock

2. Complete the patching setup from root user (local means it will correct OLR, not OCR):

# $ORACLE_HOME/bin/clscfg -localpatch

3. As the root user lock the GI home:

# $ORACLE_HOME/crs/install/rootcrs.sh -lock

4. Start CRS:

# crsctl start crs -wait


Restore OCR and correct Software Patch Level

Software patch level crsctl query crs softwarepatch and active patch level crsctl query crs activersion -f are introduced to ensure GI home has identical patches on all nodes.

It’s highly recommended to manually backup OCR using # ocrconfig -manualbackup before and after applying a patch or making configurtion changes (adding resources into CRS).

When you apply a patch to GI home, the software patch level will be updated on each node. On the last node, the active patch level will be updated for the cluster.

The patch level is stored in OCR, so if we restore old OCR backup, we need to update the patch level inside it using Oracle provided commands.

  1. Try to find patchlevel in the OCR backup that matches to activeversion:
[root@rac1 ~]# ocrdump -stdout -keyname SYSTEM.version.activeversion.patchlevel -backupfile +GRID:/marirac/OCRBACKUP/backup_20200425_171539.ocr.266.1038676539
...
[SYSTEM.version.activeversion.patchlevel]
UB4 (10) : 2701864972
...

If backup file is located on filesystem, then need to replace +GRID:/marirac/OCRBACKUP/backup_20200425_171539.ocr.266.1038676539 with the full path of the file.

Keep copies of OCR backups on filesystem also using asmcmd cp command.

2. Restore OCR backup

a) Stop crs services on all nodes
b) On one of the node, start cluster in exclusive and restore OCR backup:

# crsctl stop crs -f 
# crsctl start crs -excl -nocrs
# ocrconfig -restore /tmp/OCR/OCR_BACKUP_FILE
# crsctl stop crs -f

3. Correct software and active patch levels

a) Keep CRS down on all nodes
b) Start CRS on the 1st node, run clscfg, and check softwarepatch

# crsctl start crs -wait
# clscfg -patch 
# crsctl query crs softwarepatch

c) On the last node (do not run on other node, only from last node), issue the following command:

# crsctl stop rollingpatch

d) Verify that active patch level was updated

# crsctl query crs activeversion -f

Query the cluster active patchlevel in the OCR backup

To identify OCR backup where the cluster active patch level matches, you need to run the following:

Identify cluster active patch level:

[root@rac1 ~]# crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. 
The cluster upgrade state is [NORMAL]. 
The cluster active patch level is [2701864972].

If backup is located on diskgroup:

[root@rac1 ~]# ocrdump -stdout -keyname SYSTEM.version.activeversion.patchlevel -backupfile +GRID:/marirac/OCRBACKUP/backup_20200425_171539.ocr.266.1038676539
04/26/2020 12:28:20
+GRID:/marirac/OCRBACKUP/backup_20200425_171539.ocr.266.1038676539
/u01/app/19.3.0/grid/bin/ocrdump.bin -stdout -keyname SYSTEM.version.activeversion.patchlevel -backupfile +GRID:/marirac/OCRBACKUP/backup_20200425_171539.ocr.266.1038676539
[SYSTEM.version.activeversion.patchlevel]
UB4 (10) : 2701864972
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}

If backup is located on filesystem:

[root@rac1 ~]# ocrdump -stdout -keyname SYSTEM.version.activeversion.patchlevel -backupfile /tmp/backup_07_13_47_25
04/26/2020 12:26:57
/tmp/backup_07_13_47_25
/u01/app/19.3.0/grid/bin/ocrdump.bin -stdout -keyname SYSTEM.version.activeversion.patchlevel -backupfile /tmp/backup_07_13_47_25
[SYSTEM.version.activeversion.patchlevel]
UB4 (10) : 2701864972
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}