ORA-15137: The ASM cluster is in rolling patch state, CRS does not start

Problem:

The customer was not able to start the cluster: the ASM disks were offline and could not be brought online because the cluster was in ROLLING patch state.

SQL> ALTER DISKGROUP "GRID" ONLINE REGULAR DISK "RAC2$XVDN"
2020-04-23T17:28:10.851065+08:00
ORA-15032: not all alterations performed
ORA-15137: The ASM cluster is in rolling patch state.

Troubleshooting:

1. Check the cluster activeversion:
[root@rac1 ~]# crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [12.2.0.1.0].
The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [2250904419].

2. Check the softwarepatch on each node:

[root@rac1 ~]# crsctl query crs softwarepatch
Oracle Clusterware patch level on node rac1 is [2269302628]

[root@rac2 ~]# crsctl query crs softwarepatch
Oracle Clusterware patch level on node rac2 is [2269302628]

[root@rac3 ~]# crsctl query crs softwarepatch
Oracle Clusterware patch level on node rac3 is [2269302628]

[root@rac4 ~]# crsctl query crs softwarepatch
Oracle Clusterware patch level on node rac4 is [3074559134]

As we can see, the software patch level on rac4 differs from the other nodes. In addition, the active patch level [2250904419] is older than both; this is normal in ROLLING mode, and once rolling patching finishes, this value in the OCR is updated as well.

3. The output of the 2nd step shows that additional troubleshooting of the installed patches is necessary, so compare the patches on each node:

Inventory showed the same output on all 4 nodes:

$ $GRID_HOME/OPatch/opatch lspatches
30593149;Database Jan 2020 Release Update : 12.2.0.1.200114 (30593149)
30591794;TOMCAT RELEASE UPDATE 12.2.0.1.0(ID:RELEASE) (30591794)
30586063;ACFS JAN 2020 RELEASE UPDATE 12.2.0.1.200114 (30586063)
30585969;OCW JAN 2020 RELEASE UPDATE 12.2.0.1.200114 (30585969)
26839277;DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277)

But after checking kfod op=patches we found that 4 additional patches existed on the rac4 node only. The difference was not visible in the inventory because these patches were inactive, left over from an old RU: they were not rolled back when the superset patches were applied, only made inactive.

The output from rac1, rac2 and rac3:

$ $GRID_HOME/bin/kfod op=patches
List of Patches
26710464
26737232
26839277
...
<extracted>

The output from rac4:

$ $GRID_HOME/bin/kfod op=patches
---------------
List of Patches
===============
26710464
26737232
26839277
...
<extracted>
28566910 <<<<< Bug 28566910: TOMCAT RELEASE UPDATE 12.2.0.1.0
29757449 <<<<< Bug 29757449: DATABASE JUL 2019 RELEASE UPDATE 12.2.0.1.190716
29770040 <<<<< Bug 29770040: OCW JUL 2019 RELEASE UPDATE 12.2.0.1.190716
29770090 <<<<< Bug 29770090: ACFS JUL 2019 RELEASE UPDATE 12.2.0.1.190716
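
To spot such a difference quickly, you can collect the kfod output on every node and diff the files; a minimal sketch, assuming the per-node files are copied to one host (paths and filenames are illustrative):

$ $GRID_HOME/bin/kfod op=patches > /tmp/patches_$(hostname -s).txt   # run on each node
$ diff /tmp/patches_rac1.txt /tmp/patches_rac4.txt                   # lines marked '>' exist on rac4 only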

To roll back inactive patches you need to use patchgen; it is not possible with opatch rollback -id.

As root:

# $GRID_HOME/crs/install/rootcrs.sh -prepatch

As grid:

$ $GRID_HOME/bin/patchgen commit -rb 28566910
$ $GRID_HOME/bin/patchgen commit -rb 29757449
$ $GRID_HOME/bin/patchgen commit -rb 29770040
$ $GRID_HOME/bin/patchgen commit -rb 29770090

Validate the patch level again on rac4; it should now be 2269302628:

$ $GRID_HOME/bin/kfod op=PATCHLVL
-------------------
Current Patch level
===================
2269302628

As root:

# $GRID_HOME/crs/install/rootcrs.sh -postpatch

4. If CRS were up on at least one node, we would be able to end the rolling patch state using crsctl stop rollingpatch (although, if CRS is up, the postpatch step should end rolling as well). But in our case CRS is down and ASM is not able to access the OCR file (this was caused by another problem, which I consider a bug, but will not cover here, otherwise the post would become too long). We need extra steps to start CRS.

The only solution I used here was to restore the OCR from an old backup. I needed an OCR that was not in ROLLING mode, i.e. a NORMAL-state OCR backup, which was located on the filesystem. I knew that after the restore the OCR activeversion would be lower than the current software version on the nodes, but this is solvable (we will see this step as well).
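
If you need to locate a suitable backup first, ocrconfig -showbackup lists the available automatic and manual OCR backups with their timestamps; pick one taken while the cluster was still in NORMAL state. As root:

# ocrconfig -showbackup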

a) Stop CRS on all nodes.
b) On one of the nodes, start the cluster in exclusive mode and restore the OCR backup:

# crsctl stop crs -f 
# crsctl start crs -excl -nocrs
# ocrconfig -restore /tmp/OCR/OCR_BACKUP_FILE
# crsctl stop crs -f

c) If you check softwarepatch at this point, it will not be 2269302628 but some old value (because the OCR is old), so we need to correct it on each node:

# crsctl start crs -wait
# clscfg -patch
# crsctl query crs softwarepatch

Softwarepatch will be 2269302628 on all nodes.

d) Note that even after correcting softwarepatch, crsctl query crs activeversion -f still shows the old value, which is normal. To correct it, run the following on the last node only:

# crsctl stop rollingpatch
# crsctl query crs activeversion -f

e) If you had custom services or configuration added to CRS after the OCR backup was taken, you need to re-add them manually. For example, a service created with srvctl add service must be re-added. That is why it is highly recommended to back up the OCR manually, before and after patching or any custom configuration change, using # ocrconfig -manualbackup
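
For example, taking a manual backup before patching and verifying it afterwards is just two commands. As root:

# ocrconfig -manualbackup
# ocrconfig -showbackup manual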

Migrate Azure resources between subscriptions

Problem

As the MS team mentions, migrating VMs based on third-party (Marketplace plan) images between subscriptions is not possible.

When I try to migrate the resources, I get the following error:

{"code":"ResourceMoveFailed","message":"Resource move is not supported for resources that have plan with different subscriptions. Resources are 'Microsoft.Compute/virtualMachines/rac1,Microsoft.Compute/virtualMachines/rac2,Microsoft.Compute/virtualMachines/racq' and correlation id is '14c65b8d-9ca5-4305-98fa-ce9b2d7e82b1'."}

As the MS support team mentions, we would need to move the resources using a storage account and then create all of them manually in the new subscription, which is very complicated. I found the following workaround.


Workaround

During the migration, I found that the problem existed for the VM and PIP (public IP) resources only, while the NSG, VNet, and disks had no issue (although they cannot be migrated while dependent resources, such as VMs, still exist).

WARNING: Please do not consider this workaround for production systems. If you encounter the same issue, contact MS support and get a recommendation from them.

  • I decided to save the VM characteristics and delete the VMs from the old subscription. Don’t worry, data will not be lost: the disks are not deleted, and you can recreate each VM from its OS disk and then attach the additional disks (see the PowerShell sketch after this list).
    Save:
    > disk LUN #s and attached disk names
    > VM size
    > attached NICs
    > Publisher, Product, and Name for the image: click the VM link -> Export template (on the left side panel) -> find the storageProfile section inside the template script.
  • I deleted the PIP because it cannot be moved (we will recreate it in the new subscription). If you don’t have a PIP, ignore this step. These are test servers, so they use a PIP.
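
A minimal PowerShell sketch for capturing those values and deleting a VM, shown here for rac1 with the same AzureRM cmdlets used below (the JSON file name is illustrative). Remove-AzureRmVM deletes only the VM object; its disks and NICs remain:

#Capture the VM definition before deleting it
$rg = "maritestan3"
$vm = Get-AzureRmVM -ResourceGroupName $rg -Name "rac1"
$vm.HardwareProfile.VmSize                                 #VM size
$vm.NetworkProfile.NetworkInterfaces                       #attached NICs
$vm.StorageProfile.DataDisks | Select-Object Lun, Name     #disk LUNs and names
$vm.Plan                                                   #Publisher, Product, Name of the image plan
#Keep a full copy for later reference
$vm | ConvertTo-Json -Depth 10 | Out-File "rac1-vm-config.json"
#Delete the VM object only; managed disks and NICs are kept
Remove-AzureRmVM -ResourceGroupName $rg -Name "rac1"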
1. Migrate the NSG, VNet, disk, and NIC resources: in the Azure portal, select them in the source resource group and click Move -> Move to another subscription.

2. Choose the destination subscription and resource group, then click OK.

3. When the migration finishes, go to the destination subscription and run the following commands using PowerShell:

#Select destination subscription:
Select-AzureRmSubscription -SubscriptionId '<your destination subscription id goes here>'

#######For rac1#######
#Define variables, use the same resource names that were migrated
$pipname = "rac1-pip"
$nicname = "rac1-nic1"
$vnetName = "maritestan3-vnet"
$rg = "maritestan3"
$loc = "Central US"

#Create Public IP
$pip = New-AzureRmPublicIpAddress -Name $pipname -ResourceGroupName $rg -Location $loc -AllocationMethod Dynamic
$pip = Get-AzureRmPublicIpAddress -Name $pipname -ResourceGroupName $rg

#Identify the migrated VNet, subnet, and NIC, and assign the PIP to the NIC
$vnet = Get-AzureRmVirtualNetwork -Name $vnetName -ResourceGroupName $rg
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name "default" -VirtualNetwork $vnet
$nic = Get-AzureRmNetworkInterface -Name $nicname -ResourceGroupName $rg
$nic | Set-AzureRmNetworkInterfaceIpConfig -Name ipconfig1 -PublicIPAddress $pip -Subnet $subnet
$nic | Set-AzureRmNetworkInterface

#Define VM size and attach nic
$vm = New-AzureRmVMConfig -VMName "rac1" -VMSize "Standard_D8s_v3"
$vm = Add-AzureRmVMNetworkInterface -VM $vm -Id $nic.Id

#Define your plan, for this you will need Publisher, Product and Name saved from old subscription
Set-AzureRmVMPlan -VM $vm -Publisher "flashgrid-inc" -Product "flashgrid-skycluster" -Name "skycluster-ol-priv-byol"
Get-AzureRmMarketplaceTerms -Publisher "flashgrid-inc" -Product "flashgrid-skycluster" -Name "skycluster-ol-priv-byol" | Set-AzureRmMarketplaceTerms -Accept

#Provide the name of the OS disk from where VM will be created
$osDiskName = "rac1-root"
$disk = Get-AzureRmDisk -DiskName $osDiskName -ResourceGroupName $rg
$vm = Set-AzureRmVMOSDisk -VM $vm -ManagedDiskId $disk.Id -CreateOption Attach -Linux

#Create new VM
New-AzureRmVM -ResourceGroupName $rg -Location $loc -VM $vm

I repeated the same steps for the other VMs:

#######For rac2#######
#Define variables, use the same resource names that were migrated
$pipname = "rac2-pip"
$nicname = "rac2-nic1"
$vnetName = "maritestan3-vnet"
$rg = "maritestan3"
$loc = "Central US"

#Create Public IP
$pip = New-AzureRmPublicIpAddress -Name $pipname -ResourceGroupName $rg -Location $loc -AllocationMethod Dynamic
$pip = Get-AzureRmPublicIpAddress -Name $pipname -ResourceGroupName $rg

#Identify the migrated VNet, subnet, and NIC, and assign the PIP to the NIC
$vnet = Get-AzureRmVirtualNetwork -Name $vnetName -ResourceGroupName $rg
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name "default" -VirtualNetwork $vnet
$nic = Get-AzureRmNetworkInterface -Name $nicname -ResourceGroupName $rg
$nic | Set-AzureRmNetworkInterfaceIpConfig -Name ipconfig1 -PublicIPAddress $pip -Subnet $subnet
$nic | Set-AzureRmNetworkInterface

#Define VM size and attach nic
$vm = New-AzureRmVMConfig -VMName "rac2" -VMSize "Standard_D8s_v3"
$vm = Add-AzureRmVMNetworkInterface -VM $vm -Id $nic.Id

#Define your plan, for this you will need Publisher, Product and Name
Set-AzureRmVMPlan -VM $vm -Publisher "flashgrid-inc" -Product "flashgrid-skycluster" -Name "skycluster-ol-priv-byol"
Get-AzureRmMarketplaceTerms -Publisher "flashgrid-inc" -Product "flashgrid-skycluster" -Name "skycluster-ol-priv-byol" | Set-AzureRmMarketplaceTerms -Accept

#Provide the name of the OS disk from where VM will be created
$osDiskName = "rac2-root"
$disk = Get-AzureRmDisk -DiskName $osDiskName -ResourceGroupName $rg
$vm = Set-AzureRmVMOSDisk -VM $vm -ManagedDiskId $disk.Id -CreateOption Attach -Linux

#Create new VM
New-AzureRmVM -ResourceGroupName $rg -Location $loc -VM $vm

#######For racq#######
#Define variables, use the same resource names that were migrated
$pipname = "racq-pip"
$nicname = "racq-nic1"
$vnetName = "maritestan3-vnet"
$rg = "maritestan3"
$loc = "Central US"

#Create Public IP
$pip = New-AzureRmPublicIpAddress -Name $pipname -ResourceGroupName $rg -Location $loc -AllocationMethod Dynamic
$pip = Get-AzureRmPublicIpAddress -Name $pipname -ResourceGroupName $rg

#Identify the migrated VNet, subnet, and NIC, and assign the PIP to the NIC
$vnet = Get-AzureRmVirtualNetwork -Name $vnetName -ResourceGroupName $rg
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name "default" -VirtualNetwork $vnet
$nic = Get-AzureRmNetworkInterface -Name $nicname -ResourceGroupName $rg
$nic | Set-AzureRmNetworkInterfaceIpConfig -Name ipconfig1 -PublicIPAddress $pip -Subnet $subnet
$nic | Set-AzureRmNetworkInterface

#Define VM size and attach nic
$vm = New-AzureRmVMConfig -VMName "racq" -VMSize "Standard_D8s_v3"
$vm = Add-AzureRmVMNetworkInterface -VM $vm -Id $nic.Id

#Define your plan, for this you will need Publisher, Product and Name
Set-AzureRmVMPlan -VM $vm -Publisher "flashgrid-inc" -Product "flashgrid-skycluster" -Name "skycluster-ol-priv-byol"
Get-AzureRmMarketplaceTerms -Publisher "flashgrid-inc" -Product "flashgrid-skycluster" -Name "skycluster-ol-priv-byol" | Set-AzureRmMarketplaceTerms -Accept

#Provide the name of the OS disk from where VM will be created
$osDiskName = "racq-root"
$disk = Get-AzureRmDisk -DiskName $osDiskName -ResourceGroupName $rg
$vm = Set-AzureRmVMOSDisk -VM $vm -ManagedDiskId $disk.Id -CreateOption Attach -Linux

#Create new VM
New-AzureRmVM -ResourceGroupName $rg -Location $loc -VM $vm

4. Attach the additional disks and start the VMs; a sketch for rac1 follows.
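
A minimal sketch for rac1, reattaching a data disk at its saved LUN and starting the VM (the disk name rac1-data1 is illustrative; use the LUN numbers and disk names you saved earlier):

#Attach a saved data disk at its original LUN, then apply the change
$vm = Get-AzureRmVM -ResourceGroupName $rg -Name "rac1"
$disk = Get-AzureRmDisk -DiskName "rac1-data1" -ResourceGroupName $rg
$vm = Add-AzureRmVMDataDisk -VM $vm -Name $disk.Name -ManagedDiskId $disk.Id -Lun 0 -CreateOption Attach
Update-AzureRmVM -ResourceGroupName $rg -VM $vm
#Start the VM once all disks are attached
Start-AzureRmVM -ResourceGroupName $rg -Name "rac1"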