Restart Exadata storage cell service without affecting ASM

Brief history:

One week ago on our DR Exadata cell service hanged, which caused all databases located on Exadata to become inaccessible.

CellCLI> LIST ALERTHISTORY
9 2017-10-13T11:56:05+04:00 critical “RS-7445 [Serv CELLSRV hang detected] [It will be restarted] [] [] [] [] [] [] [] [] [] []”

In cell’s alert history there was written that the service would be restarted itself , but it did not and I restarted it by the following way:

CellCLI> ALTER CELL RESTART SERVICES CELLSRV

The databases started to work correctly.

Today, the same problem happend on the HQ side which of course caused to stop everything for a while until I’ve restarted the service.

But identifying which cell was problematic was a little bit difficult, because there was no error in alerthistory.

BUT when I entered the following command on the third cell node – it hanged, other cells were OK.

CellCLI> LIST ACTIVEREQUEST

So I restarted the same service on that node and problem was resolved.

CellCLI> ALTER CELL RESTART SERVICES CELLSRV

Of course, this is not a solution and cell service must not hang! , but this is the simple workaround when you have stopped PRODUCTION database.

I have created SR and waiting answer from them , if there is any usefull news will update this post.

===================================================================================================

Writing down the correct steps of restarting Cell Services without affecting ASM:

1.  Run the following command to check if there are offline disks on other cells that are mirrored with disks on this cell:

CellCLI > LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != ‘Yes’

Warning : If any grid disks are listed in the returned output, then it is not safe to stop or re-start the CELLSRV process because proper Oracle ASM disk group redundancy will not be intact and will cause Oracle ASM to dismount the affected disk group, causing the databases to shut down abruptly.

If no grid disks are listed in the returned output, you can safely restart cellsrv or all services in step #2 below.

2.  Re-start the cell services using either of the following commands:

CellCLI> ALTER CELL RESTART SERVICES CELLSRV

CellCLI> ALTER CELL RESTART SERVICES ALL

BUT what is good news cell has self-defence on reduced redundancy, if you try to restart it when redundancy check is not satisfied you get:

CellCLI> ALTER CELL RESTART SERVICES ALL;

Stopping the RS, CELLSRV, and MS services…
The SHUTDOWN of ALL services was not successful.
CELL-01548: Unable to shut down CELLSRV because disk group DATA, RECO may be forced to dismount due to reduced redundancy.
Getting the state of CELLSRV services… running
Getting the state of MS services… running
Getting the state of RS services… running