ORA-15137: The ASM cluster is in rolling patch state, CRS does not start

Problem:

The customer was not able to start cluster because ASM disks were offline and could not become online because the cluster was in ROLLING patch state.

SQL> ALTER DISKGROUP "GRID" ONLINE REGULAR DISK "RAC2$XVDN"
2020-04-23T17:28:10.851065+08:00ORA-15032: not all alterations performed
ORA-15137: The ASM cluster is in rolling patch state.

Troubleshooting:

  1. Check cluster activeversion
[root@rac1 ~]# crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [12.2.0.1.0].
The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [2250904419].

2. Check softwarepatch of each node:

[root@rac1 ~]# crsctl query crs softwarepatch
Oracle Clusterware patch level on node rac1 is [2269302628]

[root@rac2 ~]# crsctl query crs softwarepatch
Oracle Clusterware patch level on node rac2 is [2269302628]

[root@rac3 ~]# crsctl query crs softwarepatch
Oracle Clusterware patch level on node rac3 is [2269302628]

[root@rac4 ~]# crsctl query crs softwarepatch
Oracle Clusterware patch level on node rac4 is [3074559134]

As we see software version on rac4 is different then others. In addition to this activeversion [2250904419] is older, this is normal in ROLLING mode, when ROLLING mode is finished then this value in OCR is also updated.

3. From the output of the 2nd step we see that additional troubleshooting on installed patches are necessary, compare patches on each node:

Inventory showed the same output on all 4 nodes:

$ $GRID_HOME/OPatch/opatch lspatches
30593149;Database Jan 2020 Release Update : 12.2.0.1.200114 (30593149)
30591794;TOMCAT RELEASE UPDATE 12.2.0.1.0(ID:RELEASE) (30591794)
30586063;ACFS JAN 2020 RELEASE UPDATE 12.2.0.1.200114 (30586063)
30585969;OCW JAN 2020 RELEASE UPDATE 12.2.0.1.200114 (30585969)
26839277;DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277)

But after checking kfod op=patches we’ve found that 4 additional patches existed on rac4 node only, the difference was not visible in inventory because they were inactive, left from old RU (they were not rolled back during applying superset patches, but they were made inactive).

The output from rac1, rac2 and rac3:

$ $GRID_HOME/bin/kfod op=patches
List of Patches
26710464
26737232
26839277
...
<extracted>

The output from rac4:

$ $GRID_HOME/bin/kfod op=patches
---------------
List of Patches
===============
26710464
26737232
26839277
...
<extracted>
28566910<<<<<Bug 28566910:TOMCAT RELEASE UPDATE 12.2.0.1.0
29757449<<<<<Bug 29757449:DATABASE JUL 2019 RELEASE UPDATE 12.2.0.1.190716
29770040<<<< Bug 29770040:OCW JUL 2019 RELEASE UPDATE 12.2.0.1.190716
29770090<<<<< Bug 29770090:ACFS JUL 2019 RELEASE UPDATE 12.2.0.1.190716

To rollback inactive patches you need to use patchgen, using opatch rollback -id it is not possible.

As root:

# $GRID_HOME/crs/install/rootcrs.sh -prepatch

As grid:

$ $GRID_HOME/bin/patchgen commit -rb 28566910
$ $GRID_HOME/bin/patchgen commit -rb 29757449
$ $GRID_HOME/bin/patchgen commit -rb 29770040
$ $GRID_HOME/bin/patchgen commit -rb 29770090

Validate patch level again on rac4, should be 2269302628:

$ $GRID_HOME/bin/kfod op=PATCHLVL
-------------------
Current Patch level
===================
2269302628

As root:

# $GRID_HOME/crs/install/rootcrs.sh -postpatch

4. If CRS was up at least on one node, then we would be able to stop rollingpatch state using crsctl stop rollingpatch (although, if CRS is up, postpatch should stop rolling also). But now our CRS is down, ASM is not able to access OCR file (this was caused by some other problem, which I consider as a bug, but will not mention here otherwise blog will become too long). We need to do extra steps to start CRS.

The only solution that I used here was to restore OCR from old backup. I needed OCR which was not in ROLLING mode. I needed NORMAL OCR backup, which was located on filesystem. I knew after restore my OCR activeversion would be lower than the current software version on nodes, but this is solvable (we will see this step also)

a) Stop crs services on all nodes
b) On one of the node, start cluster in exclusive and restore OCR backup:

# crsctl stop crs -f 
# crsctl start crs -excl -nocrs
# ocrconfig -restore /tmp/OCR/OCR_BACKUP_FILE
# crsctl stop crs -f

c) If you check softwarepatch here, it will not be 2269302628, but it will be something old (because OCR is old), so we need to correct it on each node:

# crsctl start crs -wait
# clscfg -patch
# crsctl query crs softwarepatch

Sofwarepatch will be 2269302628 on all nodes.

d) Note even correcting softwarepatch crsctl query crs activeversion -f still shows something old, which is normal. To correct it, run the following only on last node:

# crsctl stop rollingpatch
# crsctl query crs activeversion -f

e) If you had custom services or configuration in CRS after OCR backup was taken then you need to add them manually. For example, if you had service added srvctl add service, need to readd. That’s is why it’s is highly recommended to backup OCR manually before and after patching or any custom configuration changes using # ocrconfig -manualbackup

Configuring TCPS for database client connections on Flashgrid-enabled clusters

In this document we assume the following port assignments (all four ports must be different):

  • TCP SCAN Listener: 1521
  • TCP Local Listener: 1522
  • TCPS SCAN Listener: 1523
  • TCPS Local Listener: 1524

1. Add TCPS port to SCAN Proxy configuration

Do the following on all database nodes.

1.1 Modify /etc/flashgrid-scan.cfg

#scan_port: 1521
scan_port_list: [1521, 1523]

1.2 Restart SCAN Proxy service

# systemctl restart flashgrid-scan-proxy.service

2. Add iptables forwarding rule for Local Listener TCPS port

Do the following on all database nodes.

2.1 Remove immutable flag from /etc/sysconfig/iptables

# chattr -i /etc/sysconfig/iptables

2.2 In /etc/sysconfig/iptables add the following forwarding rule under similar rule for TCP port (replace X with address corresponding to the node):

-A PREROUTING -i eth0 -p tcp -m tcp --dport 1524 -j DNAT --to-destination 192.168.1.X

2.3 Add immutable flag to /etc/sysconfig/iptables

# chattr +i /etc/sysconfig/iptables

2.4 Restart iptables service

# systemctl restart iptables.service

3. Add TCPS port to Local Listener configuration

On the first node:

[grid@rac1 ~]$ srvctl modify listener -p "TCP:1522/TCPS:1524";

On each database node, modify LISTENER parameter in listener.ora under GI home (replace rac1.example.com with corresponding host name):

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac1.example.com)(PORT = 1522))
      (ADDRESS = (PROTOCOL = IPC)(KEY = LISTENER))
      (ADDRESS = (PROTOCOL = TCPS)(HOST =rac1.example.com)(PORT = 1524))
    )
  )

SECURE_REGISTER_LISTENER_SCAN1 = (IPC,TCPS)
SECURE_REGISTER_LISTENER_SCAN2 = (IPC,TCPS)
SECURE_REGISTER_LISTENER_SCAN3 = (IPC,TCPS)
SECURE_REGISTER_LISTENER = (IPC,TCPS)

From the first node:

[grid@rac1 ~]$ srvctl stop listener
[grid@rac1 ~]$ srvctl start listener

[grid@rac1 ~]$ srvctl config listener
Name: LISTENER
Type: Database Listener
Network: 1, Owner: grid
Home: <CRS home>
End points: TCP:1522/TCPS:1524
Listener is enabled.
Listener is individually enabled on nodes: 
Listener is individually disabled on nodes: 

[grid@rac1 ~]$ lsnrctl status
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac1.example.com)(PORT=1522)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=rac1.example.com)(PORT=1524)))

From the second node:

[grid@rac2 ~]$ lsnrctl status
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac2.example.com)(PORT=1522)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=rac2.example.com)(PORT=1524)))

4. Add TCPS port to SCAN Listener configuration

On the first database node:

[grid@rac1 ~]$ srvctl stop scan_listener
[grid@rac1 ~]$ srvctl stop scan
[grid@rac1 ~]$ srvctl modify scan_listener -p TCP:1521/TCPS:1523
[grid@rac1 ~]$ srvctl start scan
[grid@rac1 ~]$ srvctl start scan_listener

[grid@rac1 ~]$ srvctl config scan_listener
SCAN Listeners for network 1:
Registration invited nodes: 
Registration invited subnets: 
Endpoints: TCP:1521/TCPS:1523
SCAN Listener LISTENER_SCAN1 exists
SCAN Listener is enabled.
SCAN Listener LISTENER_SCAN2 exists
SCAN Listener is enabled.
SCAN Listener LISTENER_SCAN3 exists
SCAN Listener is enabled.

[grid@rac1 ~]$ lsnrctl status listener_scan3
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN3)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=192.168.1.23)(PORT=1523)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.23)(PORT=1521)))
The listener supports no services
The command completed successfully

[grid@rac1 ~]$ crsctl stat res -p |grep ENDPOINTS
ENDPOINTS=TCP:1525
ENDPOINTS=TCP:1522 TCPS:1524
ENDPOINTS=TCP:1521 TCPS:1523
ENDPOINTS=TCP:1521 TCPS:1523
ENDPOINTS=TCP:1521 TCPS:1523

5. Create SSL Certificates and wallets for DB nodes and client

5.1 Create a self-signed CA (You can use your own CA and safely skip this step)

On the first database node:

[oracle@rac1 ~]$ mkdir -p /u01/app/oracle/CA
[oracle@rac1 ~]$ export CA_HOME=/u01/app/oracle/CA
[oracle@rac1 ~]$ orapki wallet create -wallet  $CA_HOME  -auto_login -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet remove -trusted_cert_all -wallet $CA_HOME -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet add -wallet $CA_HOME -self_signed -dn "CN=TEST-CA" -keysize 1024 -validity 3650 -sign_alg sha256 -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet export -wallet $CA_HOME -dn "CN=TEST-CA" -cert /u01/app/oracle/CA/testCA.cer -pwd Oracle123

[oracle@rac1 ~]$ orapki wallet display -wallet $CA_HOME -summary
Oracle PKI Tool Release 19.0.0.0.0 - Production
Version 19.4.0.0.0
Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

Requested Certificates:
User Certificates:
Subject:        CN=TEST-CA
Trusted Certificates:
Subject:        CN=TEST-CA

5.2 Create Cluster Wallet and Certificate

On the first database node:

[oracle@rac1 ~]$ mkdir -p /u01/app/oracle/wallet
[oracle@rac1 ~]$ export W_HOME=/u01/app/oracle/wallet
[oracle@rac1 ~]$ orapki wallet create -wallet $W_HOME -pwd Oracle123 -auto_login -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet add -wallet $W_HOME -trusted_cert -cert /u01/app/oracle/CA/testCA.cer -pwd Oracle123

[oracle@rac1 ~]$ orapki wallet display -wallet $W_HOME -summary
Oracle PKI Tool Release 19.0.0.0.0 - Production
Version 19.4.0.0.0
Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

Requested Certificates:
User Certificates:
Trusted Certificates:
Subject:        CN=TEST-CA

Find out what is your scan name (from oracle or root user):

[oracle@rac1 ~]$ srvctl config scan|grep "SCAN name"
SCAN name: marihcheck-scan.example.com, Network: 1

Instead of marihcheck-scan indicate scan name returned from the previous command:

[oracle@rac1 ~]$ export W_HOME=/u01/app/oracle/wallet
[oracle@rac1 ~]$ export CA_HOME=/u01/app/oracle/CA
[oracle@rac1 ~]$ orapki wallet add -wallet $W_HOME -dn "CN=marihcheck-scan" -keysize 1024 -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet export -wallet $W_HOME -dn "CN=marihcheck-scan" -request $W_HOME/marihcheck.req -pwd Oracle123
[oracle@rac1 ~]$ orapki cert create -wallet $CA_HOME -request $W_HOME/marihcheck.req -cert $W_HOME/marihcheck.cer -validity 3650 -sign_alg sha256 -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet add -wallet $W_HOME -user_cert -cert $W_HOME/marihcheck.cer -pwd Oracle123

[oracle@rac1 ~]$ orapki wallet display -wallet $W_HOME -summary
Oracle PKI Tool Release 19.0.0.0.0 - Production
Version 19.4.0.0.0
Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

Requested Certificates:
User Certificates:
Subject:        CN=marihcheck-scan
Trusted Certificates:
Subject:        CN=TEST-CA

5.3 Create Client Wallet and Certificate

From the first database node:

Instead of clientvm indicate hostname of the client:

[oracle@rac1 ~]$ export W_HOME=/u01/app/oracle/wallet/client
[oracle@rac1 ~]$ orapki wallet create -wallet $W_HOME -pwd Oracle123 -auto_login -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet add -wallet $W_HOME -trusted_cert -cert /u01/app/oracle/CA/testCA.cer -pwd Oracle123

[oracle@rac1 ~]$ orapki wallet add -wallet $W_HOME -dn "CN=clientvm" -keysize 1024 -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet export -wallet $W_HOME -dn "CN=clientvm" -request $W_HOME/clientvm.req -pwd Oracle123
[oracle@rac1 ~]$ orapki cert create -wallet $CA_HOME -request $W_HOME/clientvm.req -cert $W_HOME/clientvm.cer -validity 3650 -sign_alg sha256 -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet add -wallet $W_HOME -user_cert -cert $W_HOME/clientvm.cer -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet display -wallet $W_HOME -summary
Oracle PKI Tool Release 19.0.0.0.0 - Production
Version 19.4.0.0.0
Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

Requested Certificates:
User Certificates:
Subject:        CN=clientvm
Trusted Certificates:
Subject:        CN=TEST-CA

Change permission on server and client cwallet.sso file:

[oracle@rac1 wallet]$ chmod 640 /u01/app/oracle/wallet/cwallet.sso
[oracle@rac1 wallet]$ chmod 640 /u01/app/oracle/wallet/client/cwallet.sso

Create wallet location on the clientvm and copy generated client wallet from rac1 to the client:

[root@clientvm ~]# mkdir /usr/lib/oracle/19.5/client64/wallet
[root@clientvm ~]# cd /usr/lib/oracle/19.5/client64/wallet
[root@clientvm ~]# scp oracle@rac1:/u01/app/oracle/wallet/client/* .

Delete client folder from rac1:

[oracle@rac1 ~]$ rm -rf /u01/app/oracle/wallet/client/

Copy generated server wallet from rac1 to rac2:

[oracle@rac1 ~]$ scp -r /u01/app/oracle/wallet oracle@rac2:/u01/app/oracle/
cwallet.sso              100% 2445   638.3KB/s   00:00
cwallet.sso.lck          100%    0     0.0KB/s   00:00
ewallet.p12              100% 2400   540.6KB/s   00:00
ewallet.p12.lck          100%    0     0.0KB/s   00:00

6. Configure Oracle network files

6.1 Add the following lines to the server side sqlnet.ora file in RDBMS and GI homes accordingly (on all database nodes)

[oracle@rac1 ~]$ cat /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/sqlnet.ora
WALLET_LOCATION =
   (SOURCE =
     (METHOD = FILE)
     (METHOD_DATA =
       (DIRECTORY = /u01/app/oracle/wallet)
     )
   )

SSL_VERSION = 0
SQLNET.AUTHENTICATION_SERVICES = (TCPS,BEQ)
SSL_CLIENT_AUTHENTICATION = FALSE
[grid@rac1 ~]$ cat /u01/app/19.3.0/grid/network/admin/sqlnet.ora
WALLET_LOCATION =
   (SOURCE =
     (METHOD = FILE)
     (METHOD_DATA =
       (DIRECTORY = /u01/app/oracle/wallet)
     )
   )

SSL_CLIENT_AUTHENTICATION = FALSE

6.2 Add the following entries to the client side sqlnet.ora file:

[root@clientvm admin]# cat sqlnet.ora
WALLET_LOCATION =
   (SOURCE =
     (METHOD = FILE)
     (METHOD_DATA =
       (DIRECTORY = /usr/lib/oracle/19.5/client64/wallet)
     )
   )

6.3 Add the following lines to the listener.ora in GI home (/u01/app/19.3.0/grid/network/admin/listener.ora) on all database nodes

SSL_CLIENT_AUTHENTICATION = FALSE

WALLET_LOCATION =
  (SOURCE =
    (METHOD = FILE)
    (METHOD_DATA =
      (DIRECTORY = /u01/app/oracle/wallet)
    )
  )

6.4 Update NODEFQDN entry to TCPS/1524 in tnsnames.ora in RDBMS home on all database nodes

[oracle@rac1 ~]$ cat /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/tnsnames.ora

DONOTDELETE,NODEFQDN =
  (ADDRESS = (PROTOCOL = TCPS)(Host = rac1.example.com)(Port = 1524))
[oracle@rac2 ~]$ cat /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/tnsnames.ora

DONOTDELETE,NODEFQDN =
  (ADDRESS = (PROTOCOL = TCPS)(Host = rac2.example.com)(Port = 1524))

6.5 Reset local_listener parameter to NODEFQND

[oracle@rac1 ~]$ export ORACLE_SID=orclcdb1
[oracle@rac1 ~]$ sqlplus / as sysdba

SQL> alter system set local_listener='NODEFQDN';

6.6 Restart listeners:

[root@rac1 ~]# srvctl stop listener; srvctl start listener
[root@rac1 ~]# srvctl stop scan_listener; srvctl start scan_listener

6.7 Add the following entries to the client side tnsnames.ora

[root@clientvm admin]# cat tnsnames.ora
TEST_SSL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCPS)(HOST = marihcheck-scan.example.com)(PORT = 1523))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orclcdb)
    )
  )

TEST_SSL1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCPS)(HOST = rac1.example.com)(PORT = 1524))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orclcdb)
    )
  )

TEST_SSL2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCPS)(HOST = rac2.example.com)(PORT = 1524))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orclcdb)
    )
  )

7. Test connections

[root@clientvm admin]# export ORACLE_HOME=/usr/lib/oracle/19.5/client64
[root@clientvm admin]# export PATH=$ORACLE_HOME/bin:$PATH
[root@clientvm admin]# export TNS_ADMIN=$ORACLE_HOME/network/admin

[root@clientvm admin]# sqlplus system/oracle@TEST_SSL

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Mar 19 11:52:33 2020
Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Sat Mar 14 2020 19:54:53 +00:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.6.0.0.0

SQL> SELECT SYS_CONTEXT('USERENV', 'network_protocol') FROM DUAL;

SYS_CONTEXT('USERENV','NETWORK_PROTOCOL')
--------------------------------------------------------------------------------
tcps
[root@clientvm admin]# sqlplus system/oracle@TEST_SSL1

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Mar 19 11:53:23 2020
Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Thu Mar 19 2020 11:53:20 +00:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.6.0.0.0

SQL>
[root@clientvm admin]# sqlplus system/oracle@TEST_SSL2

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Mar 19 11:53:23 2020
Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Thu Mar 19 2020 11:53:20 +00:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.6.0.0.0

SQL>

Note: The same steps are described https://kb.flashgrid.io/configuring-tcps-for-client-connections#6-configure-oracle-network-files, which is also written by me.

Creating an Oracle RAC on Azure cloud with FlashGrid SkyCluster

ACFS Input/output error on OS kernel 3.10.0-957

Problem:

My ACFS volume has intermittent problems, when I run ls command I get ls: cannot access sm: Input/output error and user/group ownership contains question marks.

[root@primrac1 GG_HOME]# ll
ls: cannot access ma: Input/output error
ls: cannot access sm: Input/output error
ls: cannot access deploy: Input/output error
total 64
d????????? ? ? ? ? ? deploy
drwx------ 2 oracle oinstall 65536 Sep 28 21:05 lost+found
d????????? ? ? ? ? ? ma
d????????? ? ? ? ? ? sm

Reason:

Doc ID 2561145.1 mentions that kernel 3.10.0-957 changed behaviour of d_splice_alias interface which is used by ACFS driver.

My kernel is also the same:

[root@primrac1 GG_HOME]# uname -r
3.10.0-957.21.3.el7.x86_64

Solution:

Download and apply patch 29963428 on GI home.

[root@primrac1 29963428]# /u01/app/19.3.0/grid/OPatch/opatchauto apply -oh /u01/app/19.3.0/grid

PRCS-1007 : Server pool RPPCOMS already exists

Problem:

While adding database using srvctl add database command got PRCS-1007 and PRCR-1086 errors:

$ srvctl add database -db RPPCOMS -oraclehome /u01/app/oracle/product/12.1.0/dbhome_1 -spfile +DATA/RPPCOMS/spfileRPPCOMS.ora -pwfile +DATA/RPPCOMS/PASSWORD/pwdrppcoms.256.1005727427 -dbname RPPCOMS  -diskgroup FRA,DATA

PRCS-1007 : Server pool RPPCOMS already exists
PRCR-1086 : server pool ora.RPPCOMS is already registered

Solution:

Remove mentioned server pool via root user:

# crsctl delete serverpool ora.RPPCOMS

And retry adding database service.

What is a Flex ASM and how to check if it is enabled?

In versions prior to 12c, the ASM instance needed to be run on each of the nodes of the cluster. In case ASM was not able to start, the database instance located on the same node was not able to come up also. There were a hard dependency between database and ASM instances.

With Oracle Flex ASM, databases are able to connect remote ASM using network connection(ASM network). In case of ASM instance fails, the database instance will reconnect to another ASM instance on another node. This feature is called Oracle Flex ASM.

Check if you are using such a great feature using the following command:

[grid@rac1 ~]$ asmcmd
ASMCMD> showclustermode
ASM cluster : Flex mode enabled

 

Backup best practices for Oracle Clusterware

I recommend you to backup clusterware related files after initial setup and at any change. The backup files can save you from OCR, OLR corruption during GI patch. If any of the files become corrupted you will be able to recover it in several minutes (or seconds). Depends on the failure, you may lose several hours to recover your cluster to the state it was before something happened.

Here are the steps to protect your cluster:

1. Backup ASM spfile initially and at any change.

There are several ways to backup ASM spfile using spcopy, spbackup or create pfile=<backup location> from spfile.

To locate the Oracle ASM SPFILE, use the ASMCMD spget command:

ASMCMD> spget
+GRID/myrac/ASMPARAMETERFILE/registry.253.974466047

Copy the Oracle ASM SPFILE to the backup location:

ASMCMD> spbackup +GRID/myrac/ASMPARAMETERFILE/registry.253.974466047 /backup/spfileasm.ora

2. Backing up ASM password file once should be enough. If you change password for pwfile users or add another user into the list, then make a new backup.

Locate the password file using the ASMCMD pwget command.

ASMCMD> pwget --asm
+GRID/orapwASM

Back up the password file to another location with the pwcopy command.

ASMCMD> pwcopy +GRID/orapwASM  /backup/orapwASM 
copying +GRID/orapwASM -> /backup/orapwASM

3. Use md_backup command to create a backup file containing metadata for one or more disk groups.

To backup metadata for all disk groups, do the following:

ASMCMD> md_backup /tmp/dgmetabackup

Disk group metadata to be backed up: DATA
Disk group metadata to be backed up: FRA
Disk group metadata to be backed up: GRID

In case you need to backup metadata only for a specific disk group, use -G option.

4. Backup OLR on each node.

If OLR is missing or corrupted, clusterware can’t be started on that node. So make manual backup initially and after any change:

Do the following on each node:

# ocrconfig -local -manualbackup

Copy generated file to the backup location:

# cp /u01/app/12.2.0/grid/cdata/rac1/backup_20180510_230359.olr /backup/

Or change default backup location to /backup before making the actual backup:

# ocrconfig -local -backuploc /backup

# ocrconfig -local -manualbackup

5. Mirror and Backup OCR.

You should configure OCR in two independent disk groups. Typically, this is the work area and the recovery area. At least two OCR locations should be configured.

# ocrconfig -add +FRA

There are automatic OCR backups that are taken in the past 4 hours, 8 hours, 12 hours, and in the last day and week.

You can also manually backup OCR before applying any patch or upgrade GI home:

# ocrconfig -manualbackup

Regularly save taken backup to another location using the following way:

Identify the latest backup (manual or automatic):

[grid@rac1 ~]$ ocrconfig -showbackup
rac1 2018/05/10 13:06:18 +GRID:/myrac/OCRBACKUP/backup00.ocr.289.975762375 830990544
..

Copy it to the backup location:

$ ocrconfig -copy +GRID:/myrac/OCRBACKUP/backup00.ocr.289.975762375 /backup/backup00.ocr

Or change default backup locations to another diskgroup other than GRID:

# ocrconfig -backuploc +FRA

 

 

Add filegroup fails with ORA-15067: command or option incompatible with diskgroup redundancy

Problem:

I was trying to add filegroup to the FRA diskgroup:

SQL> alter diskgroup FRA add filegroup high_filegroup database orcl set ‘datafile.redundancy’ = ‘HIGH’;

Error:

ORA-15067: command or option incompatible with diskgroup redundancy

Troubleshooting:

Checking diskgroup type:

SQL> select name,type,compatibility,database_compatibility from v$asm_diskgroup where name=’FRA’;

NAME      TYPE   COMPATIBILITY    DATABASE_COMPATIBILITY
————- —— ————————– ————————————————————
FRA        NORMAL 18.0.0.0.0    12.2.0.1.0

Solution:

Change diskgroup type to FLEX:

SQL> alter diskgroup FRA convert redundancy to flex;
Diskgroup altered.

Check that type was changed:

SQL> select name,type,compatibility,database_compatibility from v$asm_diskgroup where name=’FRA’;

NAME      TYPE   COMPATIBILITY    DATABASE_COMPATIBILITY
————- —— ————————– ————————————————————
FRA        FLEX   18.0.0.0.0    12.2.0.1.0

Adding filegroup succeeds:

SQL> alter diskgroup FRA add filegroup high_filegroup database orcl set ‘datafile.redundancy’ = ‘HIGH’;
Diskgroup altered.

Daylight saving time support in Oracle CRS

Dear readers,

I am glad to announce that my blog has been entered in Top 50 Oracle Blogs. For more information about Top 100 Oracle Blogs And Websites for Oracle DBAs To Follow in 2018 please visit https://blog.feedspot.com/oracle_blogs. You will improve your knowledge and experience by following them. 

In this post, I want to share my experience of how I solved the daylight saving time problem with Oracle CRS. With the default setup, in case timezone changes on your system, the client/application who connects to the database remotely(local/BEQ connections have correct timezone) will still have old timezone information and will enter wrong data.

Some countries,  that are not affected by daylight saving time are lucky and does not have to worry about it. But if your servers are not located in lucky countries then you must make CRS DTS aware.

During the GI installation, Oracle saves Timezone information in $CRS_HOME/crs/install/s_crsconfig_hostname_env.txt file, that makes TZ not to change for CRS even it is changed on OS level.

Please note that timezone can be changed for the database using srvctl:

srvctl setenv database -env 'TZ=time zone'

But I do not recommend to do that, because you must do the same everytime you create a new database.
Better to change TZ globally at CRS level.

In simple words just commenting out the TZ variable in $CRS_HOME/crs/install/s_crsconfig_hostname_env.txt and restarting the CRS on each node just one time is enough to do that, but let’s check it.

1.  List the current timezone settings:

[root@rac1 ~]# timedatectl status|grep zone
Time zone: UTC (UTC, +0000)
[root@rac2 ~]#  timedatectl status|grep zone
Time zone: UTC (UTC, +0000)

2. Change timezone at OS level:

[root@rac1 ~]# timedatectl set-timezone Europe/Bratislava
[root@rac2 ~]# timedatectl set-timezone Europe/Bratislava

3. Check local and scan connections:

[oracle@rac1 ~]$ sqlplus / as sysdba

SQL> select to_char(sysdate,'HH24:MI:SS AM')  dbtime from dual;

DBTIME
-----------
18:50:05 PM     <<<<<<<<<<<<Correct , same as OS

[oracle@rac1 ~]$ sqlplus marik/123@ORCL

SQL> select to_char(sysdate,'HH24:MI:SS AM') dbtime from dual;

DBTIME
-----------
16:50:10 PM     <<<<<<<<<<<<Incorrect

4. Comment TZ in the config file:

[root@rac1 ~]# cat /u01/app/18.3.0/grid/crs/install/s_crsconfig_rac1_env.txt|grep TZ=
#   the appropriate time zone name. For example, TZ=America/New_York
#TZ=UTC

[root@rac2 ~]# cat /u01/app/18.3.0/grid/crs/install/s_crsconfig_rac2_env.txt|grep TZ=
#   the appropriate time zone name. For example, TZ=America/New_York
#TZ=UTC

5. Restart CRS on both nodes:

[root@rac1 ~]#  crsctl stop crs
[root@rac1 ~]#  crsctl start crs -wait
[root@rac2 ~]#  crsctl stop crs
[root@rac2 ~]#  crsctl start crs -wait

6. Change timezone on OS level several times and check local & scan connections:

[root@rac1 ~]# timedatectl set-timezone Africa/Conakry
[root@rac2 ~]# timedatectl set-timezone Africa/Conakry

Important: You need to reconnect to the database(so consider that sessions must be disconnected and reconnected again, old connections have old settings)

[oracle@rac1 ~]$ sqlplus / as sysdba

SQL> Select to_char(sysdate,'HH24:MI:SS AM') dbtime from dual;

DBTIME
-----------
17:15:56 PM <<<<<<<<<<<<Correct


[oracle@rac1 ~]$ sqlplus marik/123@ORCL

SQL> Select to_char(sysdate,'HH24:MI:SS AM') dbtime from dual;

DBTIME
-----------
17:15:27 PM <<<<<<<<<<<<Correct

Change one more time:

[root@rac1 ~]# timedatectl set-timezone America/Aruba
[root@rac2 ~]# timedatectl set-timezone America/Aruba

Exit connections and reconnect:

[oracle@rac1 ~]$ sqlplus / as sysdba

SQL> Select to_char(sysdate,'HH24:MI:SS AM') dbtime from dual;

DBTIME
-----------
13:17:47 PM <<<<<<<<<<<<Correct

[oracle@rac1 ~]$ sqlplus marik/123@ORCL

SQL> Select to_char(sysdate,'HH24:MI:SS AM') dbtime from dual;

DBTIME
-----------
13:17:31 PM <<<<<<<<<<<<Correct

Exadata: Rebuild RAC clusterware without deleting data Version 2

This post is differrent from my previous post Rebuild RAC clusterware without deleting data . Because two days ago I was upgrading grid infrastructure from 12.1 to 12.2 that was successfull on first node , but failed on second node. I will not describe why this happend, but the whole process was something complicated instead of being simple. We have installed several patches before the installation(gridSetup has this option to indicate patches before installation)… Seems the 12.2 software has many bugs even during upgrade process.(But I agree  with other DBA-s that 12.2 database is very stable itself).

So what happend now is that during first node upgrade OCR files was changed. I tried deconfigure from 12.2 home and it was also failed. So now I am with my clusterware that has corrupted OCR and voting disks(it belongs 12.2 version). In my presious post I was starting clusterware in exclusive mode with nocrs and restoring OCR from backup, but now because of voting disks are different version  it does not starting in even exclusive mode.

So I have followed the steps that recreate diskgroup , where OCR and voting disks are saved. Because it is Exadata Cell Storage disks , it was more complicated than with ordinary disks, where you can cleanup header using “dd”. Instead of dd you use cellcli.

So let’s start:

Connect to each cell server(I have three of them) and drop grid disks that belong to DBFS(it contains OCR and Voting disks). Be careful dropping griddisk causes data to be erased. So DBFS must contain only OCR and Vdisks not !DATA!

#Find the name, celldisk and size of the grid disk:

CellCLI> list griddisk where name like ‘DBFS_.*’ attributes name, cellDisk, size
DBFS_CD_02_lbcel01_dr_adm CD_02_lbcel01_dr_adm 33.796875G
DBFS_CD_03_lbcel01_dr_adm CD_03_lbcel01_dr_adm 33.796875G
DBFS_CD_04_lbcel01_dr_adm CD_04_lbcel01_dr_adm 33.796875G
DBFS_CD_05_lbcel01_dr_adm CD_05_lbcel01_dr_adm 33.796875G
DBFS_CD_06_lbcel01_dr_adm CD_06_lbcel01_dr_adm 33.796875G
DBFS_CD_07_lbcel01_dr_adm CD_07_lbcel01_dr_adm 33.796875G
DBFS_CD_08_lbcel01_dr_adm CD_08_lbcel01_dr_adm 33.796875G
DBFS_CD_09_lbcel01_dr_adm CD_09_lbcel01_dr_adm 33.796875G
DBFS_CD_10_lbcel01_dr_adm CD_10_lbcel01_dr_adm 33.796875G
DBFS_CD_11_lbcel01_dr_adm CD_11_lbcel01_dr_adm 33.796875G

 

#Drop

CellCLI> drop griddisk DBFS_CD_02_lbcel01_dr_adm
drop griddisk DBFS_CD_03_lbcel01_dr_adm
drop griddisk DBFS_CD_04_lbcel01_dr_adm
drop griddisk DBFS_CD_05_lbcel01_dr_adm
drop griddisk DBFS_CD_06_lbcel01_dr_adm
drop griddisk DBFS_CD_07_lbcel01_dr_adm
drop griddisk DBFS_CD_08_lbcel01_dr_adm
drop griddisk DBFS_CD_09_lbcel01_dr_adm
drop griddisk DBFS_CD_10_lbcel01_dr_adm
drop griddisk DBFS_CD_11_lbcel01_dr_adm

#Create

cellcli> create griddisk DBFS_CD_02_lbcel01_dr_adm celldisk=CD_02_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_03_lbcel01_dr_adm celldisk=CD_03_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_04_lbcel01_dr_adm celldisk=CD_04_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_05_lbcel01_dr_adm celldisk=CD_05_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_06_lbcel01_dr_adm celldisk=CD_06_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_07_lbcel01_dr_adm celldisk=CD_07_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_08_lbcel01_dr_adm celldisk=CD_08_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_09_lbcel01_dr_adm celldisk=CD_09_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_10_lbcel01_dr_adm celldisk=CD_10_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_11_lbcel01_dr_adm celldisk=CD_11_lbcel01_dr_adm, size=33.796875G

Do the same steps on other cells.

2.  Deconfigure root.sh on each node

# Run deconfig

/u01/app/12.1.0.2/grid/crs/install/rootcrs.sh -deconfig -force

#rename gpnp profile

mv /u01/app/12.1.0.2/grid/gpnp/profiles/peer/profile.xml /tmp/profile_backup.xml

3. Run root.sh on first node

/u01/app/12.1.0.2/grid/root.sh

It will fail because will not find the disk group DBFS for mounting and of course OCR inside.  But now asm is started in nomount mode and we are able to recreate diskgroup

4. Create DBFS diskgroup

sqlplus / as sysasm

SQL> create diskgroup DBFS
failgroup LBCEL01_DR_ADM disk ‘o/*/DBFS_CD_02_lbcel01_dr_adm’,’o/*/DBFS_CD_03_lbcel01_dr_adm’,’o/*/DBFS_CD_04_lbcel01_dr_adm’,’o/*/DBFS_CD_05_lbcel01_dr_adm’,’o/*/DBFS_CD_06_lbcel01_dr_adm’,’o/*/DBFS_CD_07_lbcel01_dr_adm’,’o/*/DBFS_CD_08_lbcel01_dr_adm’,’o/*/DBFS_CD_09_lbcel01_dr_adm’,’o/*/DBFS_CD_10_lbcel01_dr_adm’,’o/*/DBFS_CD_11_lbcel01_dr_adm’
failgroup LBCEL02_DR_ADM disk ‘o/*/DBFS_CD_02_lbcel02_dr_adm’,’o/*/DBFS_CD_03_lbcel02_dr_adm’,’o/*/DBFS_CD_04_lbcel02_dr_adm’,’o/*/DBFS_CD_05_lbcel02_dr_adm’,’o/*/DBFS_CD_06_lbcel02_dr_adm’,’o/*/DBFS_CD_07_lbcel02_dr_adm’,’o/*/DBFS_CD_08_lbcel02_dr_adm’,’o/*/DBFS_CD_09_lbcel02_dr_adm’,’o/*/DBFS_CD_10_lbcel02_dr_adm’,’o/*/DBFS_CD_11_lbcel02_dr_adm’
failgroup lbcel03_dr_adm disk ‘o/*/DBFS_CD_02_lbcel03_dr_adm’,’o/*/DBFS_CD_03_lbcel03_dr_adm’,’o/*/DBFS_CD_04_lbcel03_dr_adm’,’o/*/DBFS_CD_05_lbcel03_dr_adm’,’o/*/DBFS_CD_06_lbcel03_dr_adm’,’o/*/DBFS_CD_07_lbcel03_dr_adm’,’o/*/DBFS_CD_08_lbcel03_dr_adm’,’o/*/DBFS_CD_09_lbcel03_dr_adm’,’o/*/DBFS_CD_10_lbcel03_dr_adm’,’o/*/DBFS_CD_11_lbcel03_dr_adm’
ATTRIBUTE
‘compatible.asm’=’12.1.0.2.0’,
‘compatible.rdbms’=’11.2.0.2.0’,
‘au_size’=’4194304’,
‘cell.smart_scan_capable’=’TRUE’;

5. Do the following steps:

* Deconfigure root.sh again from first node
* remove gpnp profile
* run root.sh again on first node

At this time root.sh should be successful.

6. Restore OCR

/u01/app/12.1.0.2/grid/cdata/<clustername> directory contans OCR backups by default

crsctl stop crs -f
crsctl start crs -excl -nocrs
ocrconfig -restore /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup00.ocr
crsctl stop crs -f
crsctl start crs

7. Run root.sh on the second node

/u01/app/12.1.0.2/grid/root.sh