Part 2: ora.storage fails to start, ORA-01017

Problem:

One of our customers changed ASM password file by mistake and regarding other actions, we are not sure. After node restart, they encountered ora.storage startup issue on the second node.

CRS-2672: Attempting to start 'ora.storage' on 'orcl02'
ORA-01017: invalid username/password; logon denied
CRS-5055: unable to connect to an ASM instance because no ASM instance is running in the cluster
CRS-2883: Resource 'ora.storage' failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-41053: checking Oracle Grid Infrastructure for file permission issues
CRS-4000: Command Start failed, or completed with errors.

I have followed my blog post to recover ASM passwordfile and add CRSUSER__ASM_001. The CRS started successfully on the first node but it still was not able to start on the second.

Reason:

When we checked password for CRSUSER__ASM_001 on both nodes, we got different results:

[grid@orcl01 ~]$ crsctl get credmaint -path ASM/Self/0b5330fe4bdf6f6ebffb09beab078d6e -credtype userpass -id 0 -attr passwd -local 
zSZDts1PQx8v7gRrdmH1EjIpSBsAt
[grid@orcl02 ~]$ crsctl get credmaint -path ASM/Self/0b5330fe4bdf6f6ebffb09beab078d6e -credtype userpass -id 0 -attr passwd -local 
rHgulYGfY17Uxbb9Tbd9VF3yr2Kvr

Which is not normal and they must be the same. This was the reason CRS was not able to start on the second node, because ASM passwordfile for CRSUSER__ASM_001 had value zSZDts1PQx8v7gRrdmH1EjIpSBsAt

Solution:

Verify and fix the credentials:

If you are not able to set up root ssh passwordless connectivity, you can run the following command as grid. Note in that case you will get “credfix: could not delete crs credentials for jxrucJl3”, this is because the command was not run as root and old credentials were not deleted. But new credentials are successfully created.

[grid@orcl01 ~]$ asmcmd --nocp credverify
credverify: More than one credential in password file, please run 'credfix' to fix the credentials.
‚Äč
[grid@orcl01 ~]$ asmcmd --nocp credfix
credfix: Credentials for JXRUCJL3 not in password file, trying next credential.
op=addcrscreds wrap=/tmp/creds0.xml
credfix: Creating new credentials, no valid credentials in OCR.
credfix: New user CRSUSER__ASM_004 created.
op=credimport wrap=/tmp/creds0.xml olr=true force=true
credfix: OLR for orcl01 has been fixed if credentials were created incorrectly.
credfix: Starting SSH session on node orcl02.
credfix: OLR for orcl02 has been fixed if credentials were created incorrectly. Exiting SSH session.
op=delcrscreds crs_user=jxrucJl3
ASMCMD-8202: internal error:
credfix: could not delete crs credentials for jxrucJl3

It is recommended to setup passwordless ssh connectivity for root user and then run credfix as root to have clean configuration without old entries:

[root@rac1 ~]# asmcmd --nocp credfix
..

Monitoring ASM disk performance using IOSTAT

iostat in asmcmd displays I/O statistics for Oracle ASM disks in mounted disk groups.

Connect to the database node via GI owner:

# su - grid

Run iostat with the following options (Reads & Writes are in bytes):

# asmcmd
ASMCMD> iostat -t -G FRA 5
Group_Name  Dsk_Name   Reads      Writes    Read_Time  Write_Time
FRA         RAC1$LUN3  585083392  98942464  94.659862  4.03044
FRA         RAC2$LUN3  1847296    98942464  .054822    4.134049
FRA         RACQ$LUN4  57344      24576     .035944    .018594

Group_Name  Dsk_Name   Reads      Writes  Read_Time  Write_Time
FRA         RAC1$LUN3  368640.00  0.00    0.01       0.00
FRA         RAC2$LUN3  0.00       0.00    0.00       0.00
FRA         RACQ$LUN4  0.00       0.00    0.00       0.00

Where
-t displays time statistics (Read_Time, Write_Time)
-G FRA displays statistics for the FRA diskgroup, change the diskgroup name according to your needs.
5 is a refresh interval. When the interval is specified then the value displayed (bytes or I/Os) is the difference between the previous and current values, not the total value. But if a refresh interval is not specified, the number displayed represents the total number of bytes or I/Os.

For synopsis and description about all available iostat options, run help:

ASMCMD> help iostat
iostat
        Displays I/O statistics for Oracle ASM disks in mounted disk groups.

Synopsis
        iostat [-et][--io] [--suppressheader] [--region] [-G <diskgroup>] [<interval>]

Description
        iostat lists disk group statistics using the V$ASM_DISK_STAT view.
        The options for the iostat command are described below.
        -e		- Displays error statistics (Read_Err, Write_Err).
        -G diskgroup	- Displays statistics for the disk group name.
        --suppressheader	- Suppresses column headings.
        --io		- Displays information in number of I/Os, instead
                          of bytes.
        -t		- Displays time statistics (Read_Time, Write_Time).
        --region	- Displays information for cold and hot disk regions
                          (Cold_Reads, Cold_Writes, Hot_Reads, Hot_Writes).
        interval	- Refreshes the statistics display based on the
                          interval value (seconds).
        The attribute descriptions for iostat command output are described
	below. To view the complete set of statistics for a disk group,
	use the V$ASM_DISK_STAT view.
        Group_Name	        Name of the disk group.
        Dsk_Name	        Name of the disk.
        Reads	        	Total number of bytes read from the disk.
				If the --io option is entered, then the value
				is displayed as number of I/Os.
        Writes	        	Total number of bytes written to the disk.
				If the --io option is entered, then the value
				is displayed as number of I/Os.
        Cold_Reads	        Total number of bytes read from the cold disk
				region. If the --io option is entered, then
				the value is displayed as number of I/Os.
        Writes	        	Total number of bytes written to the disk.
        Cold_Writes	        Total number of bytes written to the cold
				disk region. If the --io option is entered,
				then the value is displayed as number of I/Os.
        Hot_Reads	        Total number of bytes read from the hot
				disk region. If the --io option is entered,
				then the value is displayed as number of I/Os.
        Writes	        	Total number of bytes written to the disk.
        Cold_Writes	        Total number of bytes written to the cold
        Hot_Writes	        Total number of bytes written to the hot disk
				region. If the --io option is entered, then the
				value is displayed as number of I/Os.
        Read_Err	        Total number of failed I/O read requests for
				the disk.
        Write_Err	        Total number of failed I/O write requests for
				the disk.
        Read_Time	        Total I/O time (in seconds) for
				read requests for the disk if the
				TIMED_STATISTICS initialization parameter is
				set to TRUE (0 if set to FALSE).
        Write_Time	        Total I/O time (in seconds) for
				write requests for the disk if the
				TIMED_STATISTICS initialization parameter is
				set to TRUE (0 if set to FALSE).
        Writes	        	Total number of bytes written to the disk.
        Cold_Writes	        Total number of bytes written to the cold
        Hot_Writes	        Total number of bytes written to the hot disk
        If a refresh interval is not specified, the number displayed represents
        the total number of bytes or I/Os.  Ifa refresh interval is specified,
        then the value displayed (bytes or I/Os) is the difference between the
        previous and current values, not the total value.

Examples
        The following are examples of the iostat command. The first example
        displays disk I/O statistics for the data disk group in total number
        of bytes. The second example displays disk I/O statistics for the data
        disk group in total number of I/O operations.
        ASMCMD [+] > iostat -G data
        Group_Name  Dsk_Name   Reads       Writes
        DATA        DATA_0000  180488192   473707520
        DATA        DATA_0001  1089585152  469538816
        DATA        DATA_0002  191648256   489570304
        DATA        DATA_0003  175724032   424845824
        DATA        DATA_0004  183421952   781429248
        DATA        DATA_0005  1102540800  855269888
        DATA        DATA_0006  171290624   447662592
        DATA        DATA_0007  172281856   361337344
        DATA        DATA_0008  173225472   390840320
        DATA        DATA_0009  288497152   838680576
        DATA        DATA_0010  196657152   375764480
        DATA        DATA_0011  436420096   356003840
        ASMCMD [+] > iostat --io -G data
        Group_Name  Dsk_Name   Reads  Writes
        DATA        DATA_0000  2801   34918
        DATA        DATA_0001  58301  35700
        DATA        DATA_0002  3320   36345
        DATA        DATA_0003  2816   10629
        DATA        DATA_0004  2883   34850
        DATA        DATA_0005  59306  38097
        DATA        DATA_0006  2151   10129
        DATA        DATA_0007  2686   10376
        DATA        DATA_0008  2105   8955
        DATA        DATA_0009  9121   36713
        DATA        DATA_0010  3557   8596
        DATA        DATA_0011  17458  9269

asmcmd does not return error text for root user

Problem:

While running asmcmd commands via root user, error messages are not displayed. But if we run the same command via grid – it returns messages. In the following example, ORACLE_HOME is set to RDBMS home (instead of GI) and we are trying to list the file which does not exist:

[root@rac1 ~]# echo $ORACLE_HOME
/u01/app/oracle/product/19.3.0/dbhome_1

[root@rac1 ~]# echo $PATH
/u01/app/19.3.0/grid/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin

[root@rac1 ~]# asmcmd ls +demodg/orcl/tempfile/TEMP.263.10617958
<<<<<<Nothing is displayed here

When a file does not exist, ls command should display ASMCMD-8002: entry 'TEMP.263.10617958' does not exist in directory '+demodg/orcl/tempfile/' error. But in our example, it does not return anything.

In the following example, if we try to copy a file, we see a message about copying a file but actually, the file is not copied because grid user does not have permission under /u01:

[root@rac1 ~]# asmcmd cp +demodg/orcl/tempfile/TEMP.263.1061795851 /u01
copying +demodg/orcl/tempfile/TEMP.263.1061795851 -> /u01/TEMP.263.1061795851

If we run the same command via grid, we get understandable error message:

[grid@rac1 ~]$  asmcmd cp +demodg/orcl/tempfile/TEMP.263.1061795851 /u01
 ASMCMD-9463: operation failed due to lack of write permissions

Reason:

Environment variables are not set correctly. ORACLE_HOME should be pointing to GI home.

Solution:

[root@rac1 ~]# export ORACLE_HOME=/u01/app/19.3.0/grid

[root@rac1 ~]# asmcmd ls +demodg/orcl/tempfile/TEMP.263.10617958
ASMCMD-8002: entry 'TEMP.263.10617958' does not exist in directory '+demodg/orcl/tempfile/'