Exadata: Rebuild RAC clusterware without deleting data Version 2

This post is differrent from my previous post Rebuild RAC clusterware without deleting data . Because two days ago I was upgrading grid infrastructure from 12.1 to 12.2 that was successfull on first node , but failed on second node. I will not describe why this happend, but the whole process was something complicated instead of being simple. We have installed several patches before the installation(gridSetup has this option to indicate patches before installation)… Seems the 12.2 software has many bugs even during upgrade process.(But I agree  with other DBA-s that 12.2 database is very stable itself).

So what happend now is that during first node upgrade OCR files was changed. I tried deconfigure from 12.2 home and it was also failed. So now I am with my clusterware that has corrupted OCR and voting disks(it belongs 12.2 version). In my presious post I was starting clusterware in exclusive mode with nocrs and restoring OCR from backup, but now because of voting disks are different version  it does not starting in even exclusive mode.

So I have followed the steps that recreate diskgroup , where OCR and voting disks are saved. Because it is Exadata Cell Storage disks , it was more complicated than with ordinary disks, where you can cleanup header using “dd”. Instead of dd you use cellcli.

So let’s start:

Connect to each cell server(I have three of them) and drop grid disks that belong to DBFS(it contains OCR and Voting disks). Be careful dropping griddisk causes data to be erased. So DBFS must contain only OCR and Vdisks not !DATA!

#Find the name, celldisk and size of the grid disk:

CellCLI> list griddisk where name like ‘DBFS_.*’ attributes name, cellDisk, size
DBFS_CD_02_lbcel01_dr_adm CD_02_lbcel01_dr_adm 33.796875G
DBFS_CD_03_lbcel01_dr_adm CD_03_lbcel01_dr_adm 33.796875G
DBFS_CD_04_lbcel01_dr_adm CD_04_lbcel01_dr_adm 33.796875G
DBFS_CD_05_lbcel01_dr_adm CD_05_lbcel01_dr_adm 33.796875G
DBFS_CD_06_lbcel01_dr_adm CD_06_lbcel01_dr_adm 33.796875G
DBFS_CD_07_lbcel01_dr_adm CD_07_lbcel01_dr_adm 33.796875G
DBFS_CD_08_lbcel01_dr_adm CD_08_lbcel01_dr_adm 33.796875G
DBFS_CD_09_lbcel01_dr_adm CD_09_lbcel01_dr_adm 33.796875G
DBFS_CD_10_lbcel01_dr_adm CD_10_lbcel01_dr_adm 33.796875G
DBFS_CD_11_lbcel01_dr_adm CD_11_lbcel01_dr_adm 33.796875G



CellCLI> drop griddisk DBFS_CD_02_lbcel01_dr_adm
drop griddisk DBFS_CD_03_lbcel01_dr_adm
drop griddisk DBFS_CD_04_lbcel01_dr_adm
drop griddisk DBFS_CD_05_lbcel01_dr_adm
drop griddisk DBFS_CD_06_lbcel01_dr_adm
drop griddisk DBFS_CD_07_lbcel01_dr_adm
drop griddisk DBFS_CD_08_lbcel01_dr_adm
drop griddisk DBFS_CD_09_lbcel01_dr_adm
drop griddisk DBFS_CD_10_lbcel01_dr_adm
drop griddisk DBFS_CD_11_lbcel01_dr_adm


cellcli> create griddisk DBFS_CD_02_lbcel01_dr_adm celldisk=CD_02_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_03_lbcel01_dr_adm celldisk=CD_03_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_04_lbcel01_dr_adm celldisk=CD_04_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_05_lbcel01_dr_adm celldisk=CD_05_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_06_lbcel01_dr_adm celldisk=CD_06_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_07_lbcel01_dr_adm celldisk=CD_07_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_08_lbcel01_dr_adm celldisk=CD_08_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_09_lbcel01_dr_adm celldisk=CD_09_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_10_lbcel01_dr_adm celldisk=CD_10_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_11_lbcel01_dr_adm celldisk=CD_11_lbcel01_dr_adm, size=33.796875G

Do the same steps on other cells.

2.  Deconfigure root.sh on each node

# Run deconfig

/u01/app/ -deconfig -force

#rename gpnp profile

mv /u01/app/ /tmp/profile_backup.xml

3. Run root.sh on first node


It will fail because will not find the disk group DBFS for mounting and of course OCR inside.  But now asm is started in nomount mode and we are able to recreate diskgroup

4. Create DBFS diskgroup

sqlplus / as sysasm

SQL> create diskgroup DBFS
failgroup LBCEL01_DR_ADM disk ‘o/*/DBFS_CD_02_lbcel01_dr_adm’,’o/*/DBFS_CD_03_lbcel01_dr_adm’,’o/*/DBFS_CD_04_lbcel01_dr_adm’,’o/*/DBFS_CD_05_lbcel01_dr_adm’,’o/*/DBFS_CD_06_lbcel01_dr_adm’,’o/*/DBFS_CD_07_lbcel01_dr_adm’,’o/*/DBFS_CD_08_lbcel01_dr_adm’,’o/*/DBFS_CD_09_lbcel01_dr_adm’,’o/*/DBFS_CD_10_lbcel01_dr_adm’,’o/*/DBFS_CD_11_lbcel01_dr_adm’
failgroup LBCEL02_DR_ADM disk ‘o/*/DBFS_CD_02_lbcel02_dr_adm’,’o/*/DBFS_CD_03_lbcel02_dr_adm’,’o/*/DBFS_CD_04_lbcel02_dr_adm’,’o/*/DBFS_CD_05_lbcel02_dr_adm’,’o/*/DBFS_CD_06_lbcel02_dr_adm’,’o/*/DBFS_CD_07_lbcel02_dr_adm’,’o/*/DBFS_CD_08_lbcel02_dr_adm’,’o/*/DBFS_CD_09_lbcel02_dr_adm’,’o/*/DBFS_CD_10_lbcel02_dr_adm’,’o/*/DBFS_CD_11_lbcel02_dr_adm’
failgroup lbcel03_dr_adm disk ‘o/*/DBFS_CD_02_lbcel03_dr_adm’,’o/*/DBFS_CD_03_lbcel03_dr_adm’,’o/*/DBFS_CD_04_lbcel03_dr_adm’,’o/*/DBFS_CD_05_lbcel03_dr_adm’,’o/*/DBFS_CD_06_lbcel03_dr_adm’,’o/*/DBFS_CD_07_lbcel03_dr_adm’,’o/*/DBFS_CD_08_lbcel03_dr_adm’,’o/*/DBFS_CD_09_lbcel03_dr_adm’,’o/*/DBFS_CD_10_lbcel03_dr_adm’,’o/*/DBFS_CD_11_lbcel03_dr_adm’

5. Do the following steps:

* Deconfigure root.sh again from first node
* remove gpnp profile
* run root.sh again on first node

At this time root.sh should be successful.

6. Restore OCR

/u01/app/<clustername> directory contans OCR backups by default

crsctl stop crs -f
crsctl start crs -excl -nocrs
ocrconfig -restore /u01/app/
crsctl stop crs -f
crsctl start crs

7. Run root.sh on the second node