‘udev’ rules continuously being reloaded resulted in ASM nvme disks going offline

Environment:

Linux Server release 7.2
kernel-3.10.0-514.26.2.el7.x86_64

Problem:

When an Oracle process opens a device for writing and then closes it, the kernel synthesizes a change event, and udev rules with ACTION=="add|change" are triggered again. This behavior causes ASM NVMe disks to go offline:

Thu Jul 09 16:33:16 2020
WARNING: Disk 18 (rac1$disk1) in group 2 mode 0x7f is now being offlined

Fri Jul 10 10:04:34 2020
WARNING: Disk 19 (rac1$disk5) in group 2 mode 0x7f is now being offlined

Fri Jul 10 13:45:45 2020
WARNING: Disk 15 (rac1$disk8) in group 2 mode 0x7f is now being offlined

Solution:

To suppress these false-positive change events, disable the inotify watch for the devices used by Oracle ASM, using the following steps:

  1.  Create /etc/udev/rules.d/96-nvme-nowatch.rules file with just one line in it:
ACTION=="add|change", KERNEL=="nvme*", OPTIONS:="nowatch"

2. After creating the file, run the following to activate the rule:

# udevadm control --reload-rules
# udevadm trigger --type=devices --action=change

The above commands reload the complete udev configuration and trigger all udev rules. On a busy production system this could disrupt ongoing operations and applications running on the server, so run these commands during a scheduled maintenance window only.
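A possible way to confirm that the new rule is picked up for a given device is a udev dry run; the device path below is just an example, and the exact output format varies by udev version:

# udevadm test --action=change /sys/block/nvme0n1 2>&1 | grep 96-nvme-nowatch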

Source: https://access.redhat.com/solutions/1465913 + our experience with customers.

Change AHF home from /opt/oracle.ahf to /u01/oracle.ahf

Problem:

One of our customers had a 2 GB /opt mount point. After running the root.sh script during GI configuration, a 926 MB /opt/oracle.ahf folder was created, which later caused free-space problems in /opt.
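To see how much space AHF actually consumes and how much is left on /opt, standard tools are enough (the device name and sizes below are illustrative):

# du -sh /opt/oracle.ahf
926M    /opt/oracle.ahf
# df -h /opt
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2      2.0G  1.9G  101M  95% /opt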

Note that root.sh runs the TFA installation in the following way:

2020-07-07 09:41:10: CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2020-07-07 09:41:10: Executed stage SetupTFA in 0 seconds
2020-07-07 09:41:10: Executing cmd: /u01/app/19.3.0/grid/crs/install/tfa_setup -silent -crshome /u01/app/19.3.0/grid

tfa_setup has an -ahf_loc option, which sets the Autonomous Health Framework (AHF) home; its default value is /opt/oracle.ahf.

The question is: how can we avoid exhausting the /opt space used by AHF?

Solution:

Choose only one of the options below (1, 2, or 3).

1. Increase the /opt mount point size.
2. Or, uninstall TFA (which deletes the /opt/oracle.ahf folder and releases the space) and reinstall it, specifying the -ahf_loc option:

# tfactl uninstall
# mkdir /u01/oracle.ahf
# chmod 755 /u01/oracle.ahf
# /u01/app/19.3.0/grid/crs/install/tfa_setup -ahf_loc /u01/oracle.ahf
...
AHF Location : /u01/oracle.ahf
Choose Data Directory from below options :
1. /u01/oracle.ahf [Free Space : 41347 MB]
2. /u01/app [Free Space : 41347 MB]
3. Enter a different Location

Choose Option [1 - 3] : 1
AHF Data Directory : /u01/oracle.ahf/data

Do you want to add AHF Notification Email IDs ? [Y]|N : N
...

3. Or, change the default location for the AHF home (AHF_HOME) before running the root.sh script:

# mkdir /u01/oracle.ahf
# chmod 755 /u01/oracle.ahf
# export AHF_HOME=/u01/oracle.ahf
# /u01/app/19.3.0/grid/root.sh

Check that the AHF home was created under /u01/oracle.ahf instead of /opt/oracle.ahf:

# ll /opt|grep oracle.ahf

# ll /u01|grep oracle.ahf
drwxr-xr-x 10 root root 134 Jul 7 12:46 oracle.ahf

INS-45511: Installer has detected that an Oracle Grid Infrastructure home is marked incorrectly as configured

Problem:

After deconfiguring Oracle Restart stack using:

[root@rac1 ~]# /u01/app/19.3.0/grid/root.sh -deconfig
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/19.3.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/19.3.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/crsdata/rac1/crsconfig/hadeconfig.log
2020/07/04 10:49:21 CLSRSC-332: CRS resources for listeners are still configured
2020/07/04 10:49:49 CLSRSC-337: Successfully deconfigured Oracle Restart stack

Then we tried to configure GI as a clusterware stack and got the following error:

INS-45511: Installer has detected that an Oracle Grid Infrastructure home is marked incorrectly  as configured

Solution:

Remove CRS="true" accross GI home entry in /u01/app/oraInventory/ContentsXML/inventory.xml

Original:

<HOME NAME="OraGI19Home1" LOC="/u01/app/19.3.0/grid" TYPE="O" IDX="1" CRS="true"/>

After modification:

<HOME NAME="OraGI19Home1" LOC="/u01/app/19.3.0/grid" TYPE="O" IDX="1"/>

Retry configuration.

How to change listener ports after cluster setup (FG enabled clusters)

This document describes the steps required to modify listener port numbers after a Flashgrid cluster has been set up.

How to change SCAN listener port number

1) On the first node only, modify SCAN listener port number in CRS:

# su - grid
$ srvctl modify scan_listener -p <new_scan_listener_port>
$ srvctl stop scan_listener
$ srvctl start scan_listener
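To confirm the new endpoint before moving on, you can check the SCAN listener configuration (on 19c the matching line looks like "Endpoints: TCP:<port>"):

$ srvctl config scan_listener | grep -i endpoints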

2) On each database node, update the following entry in /etc/flashgrid-scan.cfg (as root):

From:
scan_port: 1521

To:
scan_port: <new_scan_listener_port>

3) On each database node, restart the Flashgrid SCAN Proxy service and make sure it is running:

# systemctl restart flashgrid-scan-proxy.service
# systemctl status flashgrid-scan-proxy.service

How to change local listener port number

1) On each database node, modify the port of the NODEFQDN entry in <DATABASE_HOME>/network/admin/tnsnames.ora (as the oracle user):

DONOTDELETE,NODEFQDN = (ADDRESS = (PROTOCOL = TCP)(Host = <database node hostname>)(Port = <new_local_listener_port>))

2) On the first node only, modify local listener port number in CRS:

# su - grid
$ srvctl modify listener -p <new_local_listener_port>
$ srvctl stop listener
$ srvctl start listener

3) On each database node, update /etc/sysconfig/iptables (as root):

  • Remove the immutable flag from the iptables file:
  # chattr -i /etc/sysconfig/iptables
  • Modify the value in /etc/sysconfig/iptables: From: --dport 1522 To: --dport <new_local_listener_port>
  • Add the immutable flag back to the iptables file:
  # chattr +i /etc/sysconfig/iptables

4) On each database node, restart iptables:

# systemctl restart iptables.service

5) If you have already created database(s), connect to each database and run:

# su - oracle
$ sqlplus / as sysdba
SQL> alter system set local_listener='NODEFQDN';
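To confirm the change, you can check the parameter and the listening endpoints (a quick sketch; the exact output depends on your environment):

SQL> show parameter local_listener

# su - grid
$ lsnrctl status | grep PORT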

CRS-6706: Oracle Clusterware Release Patch Level (‘xxxx’) Does Not Match Software Patch Level

Problem:

We restored OCR from a backup, started CRS, and rolled back a patch using opatchauto. After restarting CRS we got:

CRS-6706: Oracle Clusterware Release Patch Level ('xxxx') Does Not Match Software Patch Level ('yyyy')

Solution:

1. Unlock the GI home as the root user:

# cd $GI_HOME
# $GI_HOME/crs/install/rootcrs.sh -unlock

2. As the root user, complete the patching setup (-localpatch means it corrects the OLR, not the OCR):

# $ORACLE_HOME/bin/clscfg -localpatch

3. As the root user, lock the GI home:

# $ORACLE_HOME/crs/install/rootcrs.sh -lock

4. Start CRS:

# crsctl start crs -wait
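Once CRS is up, it is worth confirming that the software and release patch levels now match, using the same queries shown later in this post:

# crsctl query crs softwarepatch
# crsctl query crs activeversion -f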


Restore OCR and correct Software Patch Level

The software patch level (crsctl query crs softwarepatch) and the active patch level (crsctl query crs activeversion -f) were introduced to ensure that the GI home has identical patches on all nodes.

It is highly recommended to manually back up OCR using ocrconfig -manualbackup before and after applying a patch or making configuration changes (such as adding resources to CRS).

When you apply a patch to the GI home, the software patch level is updated on each node. On the last node, the active patch level is updated for the cluster.

The patch level is stored in OCR, so if we restore an old OCR backup, we need to update the patch level inside it using Oracle-provided commands.

1. Try to find an OCR backup whose patch level matches the active version:
[root@rac1 ~]# ocrdump -stdout -keyname SYSTEM.version.activeversion.patchlevel -backupfile +GRID:/marirac/OCRBACKUP/backup_20200425_171539.ocr.266.1038676539
...
[SYSTEM.version.activeversion.patchlevel]
UB4 (10) : 2701864972
...

If the backup file is located on a filesystem, replace +GRID:/marirac/OCRBACKUP/backup_20200425_171539.ocr.266.1038676539 with the full path of the file.

Keep copies of OCR backups on a filesystem as well, using the asmcmd cp command.
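For example (the backup name is taken from the ocrdump example above; the /tmp/OCR target directory is an assumption):

[grid@rac1 ~]$ asmcmd cp +GRID/marirac/OCRBACKUP/backup_20200425_171539.ocr.266.1038676539 /tmp/OCR/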

2. Restore OCR backup

a) Stop CRS on all nodes.
b) On one of the nodes, start the cluster in exclusive mode and restore the OCR backup:

# crsctl stop crs -f 
# crsctl start crs -excl -nocrs
# ocrconfig -restore /tmp/OCR/OCR_BACKUP_FILE
# crsctl stop crs -f

3. Correct software and active patch levels

a) Keep CRS down on all nodes
b) Start CRS on the first node, run clscfg -patch, and check the software patch level:

# crsctl start crs -wait
# clscfg -patch 
# crsctl query crs softwarepatch

c) On the last node only (do not run it on any other node), issue the following command:

# crsctl stop rollingpatch

d) Verify that the active patch level was updated:

# crsctl query crs activeversion -f

Query the cluster active patchlevel in the OCR backup

To identify an OCR backup whose patch level matches the cluster active patch level, run the following:

Identify cluster active patch level:

[root@rac1 ~]# crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. 
The cluster upgrade state is [NORMAL]. 
The cluster active patch level is [2701864972].

If backup is located on diskgroup:

[root@rac1 ~]# ocrdump -stdout -keyname SYSTEM.version.activeversion.patchlevel -backupfile +GRID:/marirac/OCRBACKUP/backup_20200425_171539.ocr.266.1038676539
04/26/2020 12:28:20
+GRID:/marirac/OCRBACKUP/backup_20200425_171539.ocr.266.1038676539
/u01/app/19.3.0/grid/bin/ocrdump.bin -stdout -keyname SYSTEM.version.activeversion.patchlevel -backupfile +GRID:/marirac/OCRBACKUP/backup_20200425_171539.ocr.266.1038676539
[SYSTEM.version.activeversion.patchlevel]
UB4 (10) : 2701864972
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}

If backup is located on filesystem:

[root@rac1 ~]# ocrdump -stdout -keyname SYSTEM.version.activeversion.patchlevel -backupfile /tmp/backup_07_13_47_25
04/26/2020 12:26:57
/tmp/backup_07_13_47_25
/u01/app/19.3.0/grid/bin/ocrdump.bin -stdout -keyname SYSTEM.version.activeversion.patchlevel -backupfile /tmp/backup_07_13_47_25
[SYSTEM.version.activeversion.patchlevel]
UB4 (10) : 2701864972
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}
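If you have several candidate backup files, a small loop can dump the patch level of each one; the file names below are placeholders, and only the ocrdump call shown above is reused:

# for f in /tmp/backup_07_13_47_25 /tmp/backup_06_09_12_11; do echo "== $f"; ocrdump -stdout -keyname SYSTEM.version.activeversion.patchlevel -backupfile "$f" | grep UB4; done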

ORA-15137: The ASM cluster is in rolling patch state, CRS does not start

Problem:

The customer was not able to start the cluster: the ASM disks were offline and could not be brought online because the cluster was in ROLLING patch state.

SQL> ALTER DISKGROUP "GRID" ONLINE REGULAR DISK "RAC2$XVDN"
2020-04-23T17:28:10.851065+08:00
ORA-15032: not all alterations performed
ORA-15137: The ASM cluster is in rolling patch state.

Troubleshooting:

1. Check the cluster active version:
[root@rac1 ~]# crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [12.2.0.1.0].
The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [2250904419].

2. Check softwarepatch of each node:

[root@rac1 ~]# crsctl query crs softwarepatch
Oracle Clusterware patch level on node rac1 is [2269302628]

[root@rac2 ~]# crsctl query crs softwarepatch
Oracle Clusterware patch level on node rac2 is [2269302628]

[root@rac3 ~]# crsctl query crs softwarepatch
Oracle Clusterware patch level on node rac3 is [2269302628]

[root@rac4 ~]# crsctl query crs softwarepatch
Oracle Clusterware patch level on node rac4 is [3074559134]

As we can see, the software patch level on rac4 differs from the other nodes. In addition, the active patch level [2250904419] is older; this is normal in ROLLING mode, and once rolling patching finishes this value in OCR is also updated.

3. The output of step 2 shows that further troubleshooting of the installed patches is necessary, so compare the patches on each node.
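One quick way to compare them, assuming GRID_HOME is set locally, the path is identical on all nodes, and passwordless SSH is configured for the grid user (host names are from this cluster):

$ for h in rac1 rac2 rac3 rac4; do ssh $h "$GRID_HOME/OPatch/opatch lspatches" | sort > /tmp/patches_$h.txt; done
$ diff /tmp/patches_rac1.txt /tmp/patches_rac4.txt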

Inventory showed the same output on all 4 nodes:

$ $GRID_HOME/OPatch/opatch lspatches
30593149;Database Jan 2020 Release Update : 12.2.0.1.200114 (30593149)
30591794;TOMCAT RELEASE UPDATE 12.2.0.1.0(ID:RELEASE) (30591794)
30586063;ACFS JAN 2020 RELEASE UPDATE 12.2.0.1.200114 (30586063)
30585969;OCW JAN 2020 RELEASE UPDATE 12.2.0.1.200114 (30585969)
26839277;DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277)

But after checking kfod op=patches we found that four additional patches existed on the rac4 node only. The difference was not visible in the inventory because those patches were inactive, left over from an old RU (they were not rolled back when the superset patches were applied; they were only made inactive).

The output from rac1, rac2 and rac3:

$ $GRID_HOME/bin/kfod op=patches
List of Patches
26710464
26737232
26839277
...
<extracted>

The output from rac4:

$ $GRID_HOME/bin/kfod op=patches
---------------
List of Patches
===============
26710464
26737232
26839277
...
<extracted>
28566910 <<< Bug 28566910: TOMCAT RELEASE UPDATE 12.2.0.1.0
29757449 <<< Bug 29757449: DATABASE JUL 2019 RELEASE UPDATE 12.2.0.1.190716
29770040 <<< Bug 29770040: OCW JUL 2019 RELEASE UPDATE 12.2.0.1.190716
29770090 <<< Bug 29770090: ACFS JUL 2019 RELEASE UPDATE 12.2.0.1.190716

To roll back inactive patches you need to use patchgen; it is not possible with opatch rollback -id.

As root:

# $GRID_HOME/crs/install/rootcrs.sh -prepatch

As grid:

$ $GRID_HOME/bin/patchgen commit -rb 28566910
$ $GRID_HOME/bin/patchgen commit -rb 29757449
$ $GRID_HOME/bin/patchgen commit -rb 29770040
$ $GRID_HOME/bin/patchgen commit -rb 29770090

Validate the patch level on rac4 again; it should now be 2269302628:

$ $GRID_HOME/bin/kfod op=PATCHLVL
-------------------
Current Patch level
===================
2269302628

As root:

# $GRID_HOME/crs/install/rootcrs.sh -postpatch

4. If CRS had been up on at least one node, we would have been able to clear the rolling patch state using crsctl stop rollingpatch (and if CRS is up, postpatch should stop rolling as well). But in our case CRS was down and ASM was not able to access the OCR file (this was caused by another problem, which I consider a bug, but will not describe here, otherwise the post would become too long). So we needed extra steps to start CRS.

The solution I used here was to restore OCR from an old backup. I needed an OCR that was not in ROLLING mode, i.e. a NORMAL OCR backup, which was located on the filesystem. I knew that after the restore the OCR active version would be lower than the current software version on the nodes, but this is solvable (we will see that step as well).

a) Stop CRS on all nodes.
b) On one of the nodes, start the cluster in exclusive mode and restore the OCR backup:

# crsctl stop crs -f 
# crsctl start crs -excl -nocrs
# ocrconfig -restore /tmp/OCR/OCR_BACKUP_FILE
# crsctl stop crs -f

c) If you check the software patch level here, it will not be 2269302628 but an old value (because the restored OCR is old), so correct it on each node:

# crsctl start crs -wait
# clscfg -patch
# crsctl query crs softwarepatch

The software patch level will now be 2269302628 on all nodes.

d) Note that even after correcting the software patch level, crsctl query crs activeversion -f still shows the old value, which is normal. To correct it, run the following on the last node only:

# crsctl stop rollingpatch
# crsctl query crs activeversion -f

e) If you added custom services or configuration to CRS after the OCR backup was taken, you need to add them again manually. For example, if you added a service with srvctl add service, you need to re-add it. That is why it is highly recommended to back up OCR manually, using ocrconfig -manualbackup, before and after patching or any custom configuration changes.

Migrate Azure resources between subscriptions

Problem

As the MS support team mentions, migrating third-party-image-based VMs between subscriptions is not possible.

When I try to migrate the resources I get the following error:

{"code":"ResourceMoveFailed","message":"Resource move is not supported for resources that have plan with different subscriptions. Resources are 'Microsoft.Compute/virtualMachines/rac1,Microsoft.Compute/virtualMachines/rac2,Microsoft.Compute/virtualMachines/racq' and correlation id is '14c65b8d-9ca5-4305-98fa-ce9b2d7e82b1'."}

As the MS support team suggests, we would need to move the resources using a storage account and then recreate all of them manually in the new subscription, which is very complicated. I found the following workaround instead.


Workaround

During the migration, I found that the problem existed only for the VM and public IP (PIP) resources, while the NSG, VNet, and disks did not have any issue (although they cannot be migrated while dependent resources such as VMs exist).

Warning: Do not use this workaround on production systems. If you encounter the same issue, contact MS support and get a recommendation from them.

  • I decided to save the VM characteristics and delete the VMs from the old subscription. Don’t worry, data will not be lost: the disks are not deleted, and you can recreate each VM from its OS disk and then attach the additional disks.
    Save:
    > disk lun # and attached disk names
    > VM size
    > attached NICs
    > Publisher, Product, and Name for the image: Click VM link -> Export template (on the left side panel) -> find storage profile section inside template script.
  • I deleted the PIPs because they cannot be moved (we will recreate them in the new subscription). If you don’t have PIPs, ignore this step. These are test servers, so PIPs are used.
1. Migrate the remaining resources (NSG, VNet, disks, NICs, …) using the Move option in the Azure portal.

2. Choose the destination Subscription and Resource group, then click OK.

3. When the migration finishes, go to the destination subscription and run the following PowerShell commands:

#Select destination subscription:
Select-AzureRmSubscription -SubscriptionId '<your destination subscription id goes here>'

#######For rac1#######
#Define variables, use the same resource names that were migrated
$pipname = "rac1-pip"
$nicname = "rac1-nic1"
$vnetName = "maritestan3-vnet"
$rg = "maritestan3"
$loc = "Central US"

#Create Public IP
$pip = New-AzureRmPublicIpAddress -Name $pipname -ResourceGroupName $rg -Location $loc -AllocationMethod Dynamic
$pip = Get-AzureRmPublicIpAddress -Name $pipname -ResourceGroupName $rg

#Identify VNet, subnet, nic names that were migrated. And assign PIP to nic
$vnet = get-AzureRmVirtualNetwork -Name $vnetName -ResourceGroupName $rg
$subnet = Get-AzVirtualNetworkSubnetConfig -Name "default" -VirtualNetwork $vnet
$nic = get-AzureRmNetworkInterface -Name $nicname -ResourceGroupName $rg
$nic | Set-AzNetworkInterfaceIpConfig -Name ipconfig1 -PublicIPAddress $pip -Subnet $subnet
$nic | Set-AzNetworkInterface

#Define VM size and attach nic
$vm = New-AzureRmVMConfig -VMName "rac1" -VMSize "Standard_D8s_v3"
$vm = Add-AzureRmVMNetworkInterface -VM $vm -Id $nic.Id

#Define your plan, for this you will need Publisher, Product and Name saved from old subscription
Set-AzureRmVMPlan -VM $vm -Publisher "flashgrid-inc" -Product "flashgrid-skycluster" -Name "skycluster-ol-priv-byol"
Get-AzureRmMarketPlaceTerms -Publisher "flashgrid-inc" -Product "flashgrid-skycluster" -Name "skycluster-ol-priv-byol" | Set-AzureRmMarketPlaceTerms -Accept

#Provide the name of the OS disk from where VM will be created
$osDiskName = "rac1-root"
$disk = Get-AzureRmDisk -DiskName $osDiskName -ResourceGroupName $rg
$vm = Set-AzVMOSDisk -VM $vm -ManagedDiskId $disk.Id -CreateOption Attach -Linux

#Create new VM
New-AzureRmVM -ResourceGroupName $rg -Location $loc -VM $vm

I repeated the same steps for the other VMs.

#######For rac2#######
#Define variables, use the same resource names that were migrated
$pipname = "rac2-pip"
$nicname = "rac2-nic1"
$vnetName = "maritestan3-vnet"
$rg = "maritestan3"
$loc = "Central US"

#Create Public IP
$pip = New-AzureRmPublicIpAddress -Name $pipname -ResourceGroupName $rg -Location $loc -AllocationMethod Dynamic
$pip = Get-AzureRmPublicIpAddress -Name $pipname -ResourceGroupName $rg

#Identify VNet, subnet, nic names that were migrated. And assign PIP to nic
$vnet = get-AzureRmVirtualNetwork -Name $vnetName -ResourceGroupName $rg
$subnet = Get-AzVirtualNetworkSubnetConfig -Name "default" -VirtualNetwork $vnet
$nic = get-AzureRmNetworkInterface -Name $nicname -ResourceGroupName $rg
$nic | Set-AzNetworkInterfaceIpConfig -Name ipconfig1 -PublicIPAddress $pip -Subnet $subnet
$nic | Set-AzNetworkInterface

#Define VM size and attach nic
$vm = New-AzureRmVMConfig -VMName "rac2" -VMSize "Standard_D8s_v3"
$vm = Add-AzureRmVMNetworkInterface -VM $vm -Id $nic.Id

#Define your plan, for this you will need Publisher, Product and Name
Set-AzureRmVMPlan -VM $vm -Publisher "flashgrid-inc" -Product "flashgrid-skycluster" -Name "skycluster-ol-priv-byol"
Get-AzureRmMarketPlaceTerms -Publisher "flashgrid-inc" -Product "flashgrid-skycluster" -Name "skycluster-ol-priv-byol" | Set-AzureRmMarketPlaceTerms -Accept

#Provide the name of the OS disk from where VM will be created
$osDiskName = "rac2-root"
$disk = Get-AzureRmDisk -DiskName $osDiskName -ResourceGroupName $rg
$vm = Set-AzVMOSDisk -VM $vm -ManagedDiskId $disk.Id -CreateOption Attach -Linux

#Create new VM
New-AzureRmVM -ResourceGroupName $rg -Location $loc -VM $vm

#######For racq#######
#Define variables, use the same resource names that were migrated
$pipname = "racq-pip"
$nicname = "racq-nic1"
$vnetName = "maritestan3-vnet"
$rg = "maritestan3"
$loc = "Central US"

#Create Public IP
$pip = New-AzureRmPublicIpAddress -Name $pipname -ResourceGroupName $rg -Location $loc -AllocationMethod Dynamic
$pip = Get-AzureRmPublicIpAddress -Name $pipname -ResourceGroupName $rg

#Identify VNet, subnet, nic names that were migrated. And assign PIP to nic
$vnet = get-AzureRmVirtualNetwork -Name $vnetName -ResourceGroupName $rg
$subnet = Get-AzVirtualNetworkSubnetConfig -Name "default" -VirtualNetwork $vnet
$nic = get-AzureRmNetworkInterface -Name $nicname -ResourceGroupName $rg
$nic | Set-AzNetworkInterfaceIpConfig -Name ipconfig1 -PublicIPAddress $pip -Subnet $subnet
$nic | Set-AzNetworkInterface

#Define VM size and attach nic
$vm = New-AzureRmVMConfig -VMName "racq" -VMSize "Standard_D8s_v3"
$vm = Add-AzureRmVMNetworkInterface -VM $vm -Id $nic.Id

#Define your plan, for this you will need Publisher, Product and Name
Set-AzureRmVMPlan -VM $vm -Publisher "flashgrid-inc" -Product "flashgrid-skycluster" -Name "skycluster-ol-priv-byol"
Get-AzureRmMarketPlaceTerms -Publisher "flashgrid-inc" -Product "flashgrid-skycluster" -Name "skycluster-ol-priv-byol" | Set-AzureRmMarketPlaceTerms -Accept

#Provide the name of the OS disk from where VM will be created
$osDiskName = "racq-root"
$disk = Get-AzureRmDisk -DiskName $osDiskName -ResourceGroupName $rg
$vm = Set-AzVMOSDisk -VM $vm -ManagedDiskId $disk.Id -CreateOption Attach -Linux

#Create new VM
New-AzureRmVM -ResourceGroupName $rg -Location $loc -VM $vm

4. Attach the additional disks and start the VMs, as sketched below.
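A minimal sketch of step 4 in the same PowerShell session, assuming the variables from the rac1 block are still set; the data disk name and LUN are hypothetical, so repeat this per disk with the names and LUNs you saved earlier:

#Attach a migrated data disk to rac1 at its original LUN (disk name and LUN are examples)
$vm = Get-AzureRmVM -ResourceGroupName $rg -Name "rac1"
$disk = Get-AzureRmDisk -ResourceGroupName $rg -DiskName "rac1-data01"
$vm = Add-AzureRmVMDataDisk -VM $vm -Name $disk.Name -ManagedDiskId $disk.Id -Lun 0 -CreateOption Attach
Update-AzureRmVM -ResourceGroupName $rg -VM $vm

#Start the VM
Start-AzureRmVM -ResourceGroupName $rg -Name "rac1"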

Configuring TCPS for database client connections on Flashgrid-enabled clusters

In this document we assume the following port assignments (all four ports must be different):

  • TCP SCAN Listener: 1521
  • TCP Local Listener: 1522
  • TCPS SCAN Listener: 1523
  • TCPS Local Listener: 1524

1. Add TCPS port to SCAN Proxy configuration

Do the following on all database nodes.

1.1 Modify /etc/flashgrid-scan.cfg

#scan_port: 1521
scan_port_list: [1521, 1523]

1.2 Restart SCAN Proxy service

# systemctl restart flashgrid-scan-proxy.service
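As in the port-change procedure earlier in this post, confirm that the service came back up:

# systemctl status flashgrid-scan-proxy.service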

2. Add iptables forwarding rule for Local Listener TCPS port

Do the following on all database nodes.

2.1 Remove immutable flag from /etc/sysconfig/iptables

# chattr -i /etc/sysconfig/iptables

2.2 In /etc/sysconfig/iptables, add the following forwarding rule under the similar rule for the TCP port (replace X so that 192.168.1.X matches the node's address):

-A PREROUTING -i eth0 -p tcp -m tcp --dport 1524 -j DNAT --to-destination 192.168.1.X

2.3 Add immutable flag to /etc/sysconfig/iptables

# chattr +i /etc/sysconfig/iptables

2.4 Restart iptables service

# systemctl restart iptables.service
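Optionally, confirm that the DNAT rule for the TCPS port is active (a quick check; 1524 is the TCPS Local Listener port assumed above):

# iptables -t nat -nL PREROUTING | grep 1524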

3. Add TCPS port to Local Listener configuration

On the first node:

[grid@rac1 ~]$ srvctl modify listener -p "TCP:1522/TCPS:1524";

On each database node, modify the LISTENER parameter in listener.ora under the GI home (replace rac1.example.com with the corresponding host name):

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac1.example.com)(PORT = 1522))
      (ADDRESS = (PROTOCOL = IPC)(KEY = LISTENER))
      (ADDRESS = (PROTOCOL = TCPS)(HOST = rac1.example.com)(PORT = 1524))
    )
  )

SECURE_REGISTER_LISTENER_SCAN1 = (IPC,TCPS)
SECURE_REGISTER_LISTENER_SCAN2 = (IPC,TCPS)
SECURE_REGISTER_LISTENER_SCAN3 = (IPC,TCPS)
SECURE_REGISTER_LISTENER = (IPC,TCPS)

From the first node:

[grid@rac1 ~]$ srvctl stop listener
[grid@rac1 ~]$ srvctl start listener

[grid@rac1 ~]$ srvctl config listener
Name: LISTENER
Type: Database Listener
Network: 1, Owner: grid
Home: <CRS home>
End points: TCP:1522/TCPS:1524
Listener is enabled.
Listener is individually enabled on nodes: 
Listener is individually disabled on nodes: 

[grid@rac1 ~]$ lsnrctl status
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac1.example.com)(PORT=1522)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=rac1.example.com)(PORT=1524)))

From the second node:

[grid@rac2 ~]$ lsnrctl status
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac2.example.com)(PORT=1522)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=rac2.example.com)(PORT=1524)))

4. Add TCPS port to SCAN Listener configuration

On the first database node:

[grid@rac1 ~]$ srvctl stop scan_listener
[grid@rac1 ~]$ srvctl stop scan
[grid@rac1 ~]$ srvctl modify scan_listener -p TCP:1521/TCPS:1523
[grid@rac1 ~]$ srvctl start scan
[grid@rac1 ~]$ srvctl start scan_listener

[grid@rac1 ~]$ srvctl config scan_listener
SCAN Listeners for network 1:
Registration invited nodes: 
Registration invited subnets: 
Endpoints: TCP:1521/TCPS:1523
SCAN Listener LISTENER_SCAN1 exists
SCAN Listener is enabled.
SCAN Listener LISTENER_SCAN2 exists
SCAN Listener is enabled.
SCAN Listener LISTENER_SCAN3 exists
SCAN Listener is enabled.

[grid@rac1 ~]$ lsnrctl status listener_scan3
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN3)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=192.168.1.23)(PORT=1523)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.23)(PORT=1521)))
The listener supports no services
The command completed successfully

[grid@rac1 ~]$ crsctl stat res -p |grep ENDPOINTS
ENDPOINTS=TCP:1525
ENDPOINTS=TCP:1522 TCPS:1524
ENDPOINTS=TCP:1521 TCPS:1523
ENDPOINTS=TCP:1521 TCPS:1523
ENDPOINTS=TCP:1521 TCPS:1523

5. Create SSL Certificates and wallets for DB nodes and client

5.1 Create a self-signed CA (You can use your own CA and safely skip this step)

On the first database node:

[oracle@rac1 ~]$ mkdir -p /u01/app/oracle/CA
[oracle@rac1 ~]$ export CA_HOME=/u01/app/oracle/CA
[oracle@rac1 ~]$ orapki wallet create -wallet  $CA_HOME  -auto_login -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet remove -trusted_cert_all -wallet $CA_HOME -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet add -wallet $CA_HOME -self_signed -dn "CN=TEST-CA" -keysize 1024 -validity 3650 -sign_alg sha256 -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet export -wallet $CA_HOME -dn "CN=TEST-CA" -cert /u01/app/oracle/CA/testCA.cer -pwd Oracle123

[oracle@rac1 ~]$ orapki wallet display -wallet $CA_HOME -summary
Oracle PKI Tool Release 19.0.0.0.0 - Production
Version 19.4.0.0.0
Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

Requested Certificates:
User Certificates:
Subject:        CN=TEST-CA
Trusted Certificates:
Subject:        CN=TEST-CA

5.2 Create Cluster Wallet and Certificate

On the first database node:

[oracle@rac1 ~]$ mkdir -p /u01/app/oracle/wallet
[oracle@rac1 ~]$ export W_HOME=/u01/app/oracle/wallet
[oracle@rac1 ~]$ orapki wallet create -wallet $W_HOME -pwd Oracle123 -auto_login -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet add -wallet $W_HOME -trusted_cert -cert /u01/app/oracle/CA/testCA.cer -pwd Oracle123

[oracle@rac1 ~]$ orapki wallet display -wallet $W_HOME -summary
Oracle PKI Tool Release 19.0.0.0.0 - Production
Version 19.4.0.0.0
Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

Requested Certificates:
User Certificates:
Trusted Certificates:
Subject:        CN=TEST-CA

Find out what your SCAN name is (as the oracle or root user):

[oracle@rac1 ~]$ srvctl config scan|grep "SCAN name"
SCAN name: marihcheck-scan.example.com, Network: 1

Use the SCAN name returned by the previous command instead of marihcheck-scan:

[oracle@rac1 ~]$ export W_HOME=/u01/app/oracle/wallet
[oracle@rac1 ~]$ export CA_HOME=/u01/app/oracle/CA
[oracle@rac1 ~]$ orapki wallet add -wallet $W_HOME -dn "CN=marihcheck-scan" -keysize 1024 -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet export -wallet $W_HOME -dn "CN=marihcheck-scan" -request $W_HOME/marihcheck.req -pwd Oracle123
[oracle@rac1 ~]$ orapki cert create -wallet $CA_HOME -request $W_HOME/marihcheck.req -cert $W_HOME/marihcheck.cer -validity 3650 -sign_alg sha256 -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet add -wallet $W_HOME -user_cert -cert $W_HOME/marihcheck.cer -pwd Oracle123

[oracle@rac1 ~]$ orapki wallet display -wallet $W_HOME -summary
Oracle PKI Tool Release 19.0.0.0.0 - Production
Version 19.4.0.0.0
Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

Requested Certificates:
User Certificates:
Subject:        CN=marihcheck-scan
Trusted Certificates:
Subject:        CN=TEST-CA

5.3 Create Client Wallet and Certificate

From the first database node:

Use the hostname of the client instead of clientvm:

[oracle@rac1 ~]$ export W_HOME=/u01/app/oracle/wallet/client
[oracle@rac1 ~]$ orapki wallet create -wallet $W_HOME -pwd Oracle123 -auto_login -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet add -wallet $W_HOME -trusted_cert -cert /u01/app/oracle/CA/testCA.cer -pwd Oracle123

[oracle@rac1 ~]$ orapki wallet add -wallet $W_HOME -dn "CN=clientvm" -keysize 1024 -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet export -wallet $W_HOME -dn "CN=clientvm" -request $W_HOME/clientvm.req -pwd Oracle123
[oracle@rac1 ~]$ orapki cert create -wallet $CA_HOME -request $W_HOME/clientvm.req -cert $W_HOME/clientvm.cer -validity 3650 -sign_alg sha256 -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet add -wallet $W_HOME -user_cert -cert $W_HOME/clientvm.cer -pwd Oracle123
[oracle@rac1 ~]$ orapki wallet display -wallet $W_HOME -summary
Oracle PKI Tool Release 19.0.0.0.0 - Production
Version 19.4.0.0.0
Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

Requested Certificates:
User Certificates:
Subject:        CN=clientvm
Trusted Certificates:
Subject:        CN=TEST-CA

Change permissions on the server and client cwallet.sso files:

[oracle@rac1 wallet]$ chmod 640 /u01/app/oracle/wallet/cwallet.sso
[oracle@rac1 wallet]$ chmod 640 /u01/app/oracle/wallet/client/cwallet.sso

Create the wallet location on clientvm and copy the generated client wallet from rac1 to the client:

[root@clientvm ~]# mkdir /usr/lib/oracle/19.5/client64/wallet
[root@clientvm ~]# cd /usr/lib/oracle/19.5/client64/wallet
[root@clientvm ~]# scp oracle@rac1:/u01/app/oracle/wallet/client/* .

Delete client folder from rac1:

[oracle@rac1 ~]$ rm -rf /u01/app/oracle/wallet/client/

Copy generated server wallet from rac1 to rac2:

[oracle@rac1 ~]$ scp -r /u01/app/oracle/wallet oracle@rac2:/u01/app/oracle/
cwallet.sso              100% 2445   638.3KB/s   00:00
cwallet.sso.lck          100%    0     0.0KB/s   00:00
ewallet.p12              100% 2400   540.6KB/s   00:00
ewallet.p12.lck          100%    0     0.0KB/s   00:00

6. Configure Oracle network files

6.1 Add the following lines to the server-side sqlnet.ora file in the RDBMS and GI homes accordingly (on all database nodes):

[oracle@rac1 ~]$ cat /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/sqlnet.ora
WALLET_LOCATION =
   (SOURCE =
     (METHOD = FILE)
     (METHOD_DATA =
       (DIRECTORY = /u01/app/oracle/wallet)
     )
   )

SSL_VERSION = 0
SQLNET.AUTHENTICATION_SERVICES = (TCPS,BEQ)
SSL_CLIENT_AUTHENTICATION = FALSE

[grid@rac1 ~]$ cat /u01/app/19.3.0/grid/network/admin/sqlnet.ora
WALLET_LOCATION =
   (SOURCE =
     (METHOD = FILE)
     (METHOD_DATA =
       (DIRECTORY = /u01/app/oracle/wallet)
     )
   )

SSL_CLIENT_AUTHENTICATION = FALSE

6.2 Add the following entries to the client side sqlnet.ora file:

[root@clientvm admin]# cat sqlnet.ora
WALLET_LOCATION =
   (SOURCE =
     (METHOD = FILE)
     (METHOD_DATA =
       (DIRECTORY = /usr/lib/oracle/19.5/client64/wallet)
     )
   )

6.3 Add the following lines to the listener.ora in GI home (/u01/app/19.3.0/grid/network/admin/listener.ora) on all database nodes

SSL_CLIENT_AUTHENTICATION = FALSE

WALLET_LOCATION =
  (SOURCE =
    (METHOD = FILE)
    (METHOD_DATA =
      (DIRECTORY = /u01/app/oracle/wallet)
    )
  )

6.4 Update NODEFQDN entry to TCPS/1524 in tnsnames.ora in RDBMS home on all database nodes

[oracle@rac1 ~]$ cat /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/tnsnames.ora

DONOTDELETE,NODEFQDN =
  (ADDRESS = (PROTOCOL = TCPS)(Host = rac1.example.com)(Port = 1524))

[oracle@rac2 ~]$ cat /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/tnsnames.ora

DONOTDELETE,NODEFQDN =
  (ADDRESS = (PROTOCOL = TCPS)(Host = rac2.example.com)(Port = 1524))

6.5 Reset the local_listener parameter to NODEFQDN

[oracle@rac1 ~]$ export ORACLE_SID=orclcdb1
[oracle@rac1 ~]$ sqlplus / as sysdba

SQL> alter system set local_listener='NODEFQDN';

6.6 Restart listeners:

[root@rac1 ~]# srvctl stop listener; srvctl start listener
[root@rac1 ~]# srvctl stop scan_listener; srvctl start scan_listener

6.7 Add the following entries to the client side tnsnames.ora

[root@clientvm admin]# cat tnsnames.ora
TEST_SSL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCPS)(HOST = marihcheck-scan.example.com)(PORT = 1523))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orclcdb)
    )
  )

TEST_SSL1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCPS)(HOST = rac1.example.com)(PORT = 1524))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orclcdb)
    )
  )

TEST_SSL2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCPS)(HOST = rac2.example.com)(PORT = 1524))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orclcdb)
    )
  )

7. Test connections

[root@clientvm admin]# export ORACLE_HOME=/usr/lib/oracle/19.5/client64
[root@clientvm admin]# export PATH=$ORACLE_HOME/bin:$PATH
[root@clientvm admin]# export TNS_ADMIN=$ORACLE_HOME/network/admin

[root@clientvm admin]# sqlplus system/oracle@TEST_SSL

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Mar 19 11:52:33 2020
Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Sat Mar 14 2020 19:54:53 +00:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.6.0.0.0

SQL> SELECT SYS_CONTEXT('USERENV', 'network_protocol') FROM DUAL;

SYS_CONTEXT('USERENV','NETWORK_PROTOCOL')
--------------------------------------------------------------------------------
tcps
[root@clientvm admin]# sqlplus system/oracle@TEST_SSL1

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Mar 19 11:53:23 2020
Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Thu Mar 19 2020 11:53:20 +00:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.6.0.0.0

SQL>
[root@clientvm admin]# sqlplus system/oracle@TEST_SSL2

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Mar 19 11:53:23 2020
Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Thu Mar 19 2020 11:53:20 +00:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.6.0.0.0

SQL>

Note: The same steps are described at https://kb.flashgrid.io/configuring-tcps-for-client-connections#6-configure-oracle-network-files, which was also written by me.