Part 2: ora.storage fails to start, ORA-01017

Problem:

One of our customers changed the ASM password file by mistake; we are not sure what other actions were taken. After a node restart, ora.storage failed to start on the second node:

CRS-2672: Attempting to start 'ora.storage' on 'orcl02'
ORA-01017: invalid username/password; logon denied
CRS-5055: unable to connect to an ASM instance because no ASM instance is running in the cluster
CRS-2883: Resource 'ora.storage' failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-41053: checking Oracle Grid Infrastructure for file permission issues
CRS-4000: Command Start failed, or completed with errors.

I followed my blog post to recover the ASM password file and add CRSUSER__ASM_001. CRS then started successfully on the first node, but it still could not start on the second.

Reason:

When we checked the password for CRSUSER__ASM_001 on both nodes, we got different results:

[grid@orcl01 ~]$ crsctl get credmaint -path ASM/Self/0b5330fe4bdf6f6ebffb09beab078d6e -credtype userpass -id 0 -attr passwd -local 
zSZDts1PQx8v7gRrdmH1EjIpSBsAt
[grid@orcl02 ~]$ crsctl get credmaint -path ASM/Self/0b5330fe4bdf6f6ebffb09beab078d6e -credtype userpass -id 0 -attr passwd -local 
rHgulYGfY17Uxbb9Tbd9VF3yr2Kvr

This is not normal: the two values must be the same. CRS could not start on the second node because the ASM password file held the value zSZDts1PQx8v7gRrdmH1EjIpSBsAt for CRSUSER__ASM_001, while the second node had a different password stored locally.
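The comparison above can be scripted. The following is only a minimal sketch, assuming passwordless SSH for the grid user and that crsctl is on the PATH of the remote shell. The names CRED_PATH, fetch_cred and creds_match are hypothetical helpers, and the GUID in the path must be replaced with your own.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the locally stored CRSUSER password across nodes.

# Credential path taken from the example above; the GUID differs per cluster.
CRED_PATH="ASM/Self/0b5330fe4bdf6f6ebffb09beab078d6e"

# Fetch the password stored locally on one node (assumes passwordless SSH
# and crsctl being available in the remote shell's PATH).
fetch_cred() {
  local node="$1"
  ssh "$node" "crsctl get credmaint -path $CRED_PATH -credtype userpass -id 0 -attr passwd -local"
}

# Print "match" if all arguments are equal; otherwise print "mismatch" and return 1.
creds_match() {
  local first="$1" value
  for value in "$@"; do
    if [ "$value" != "$first" ]; then
      echo "mismatch"
      return 1
    fi
  done
  echo "match"
}

# Example usage (not executed here):
#   creds_match "$(fetch_cred orcl01)" "$(fetch_cred orcl02)"
```

With the two values shown above, creds_match reports a mismatch, which is the condition that kept the second node from starting.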

Solution:

Verify and fix the credentials:

If you cannot set up passwordless SSH connectivity for root, you can run the following commands as grid. Note that in that case you will get "credfix: could not delete crs credentials for jxrucJl3", because the command was not run as root and the old credentials were not deleted. The new credentials are still created successfully.

[grid@orcl01 ~]$ asmcmd --nocp credverify
credverify: More than one credential in password file, please run 'credfix' to fix the credentials.

[grid@orcl01 ~]$ asmcmd --nocp credfix
credfix: Credentials for JXRUCJL3 not in password file, trying next credential.
op=addcrscreds wrap=/tmp/creds0.xml
credfix: Creating new credentials, no valid credentials in OCR.
credfix: New user CRSUSER__ASM_004 created.
op=credimport wrap=/tmp/creds0.xml olr=true force=true
credfix: OLR for orcl01 has been fixed if credentials were created incorrectly.
credfix: Starting SSH session on node orcl02.
credfix: OLR for orcl02 has been fixed if credentials were created incorrectly. Exiting SSH session.
op=delcrscreds crs_user=jxrucJl3
ASMCMD-8202: internal error:
credfix: could not delete crs credentials for jxrucJl3

It is recommended to set up passwordless SSH connectivity for the root user and then run credfix as root, so that the configuration is clean and contains no old entries:

[root@rac1 ~]# asmcmd --nocp credfix
..

ora.storage fails, Error 4 querying length of attr ASM_DISCOVERY_ADDRESS, ORA-01017

Problem:

CRS on the 1st node is able to start, but not on the 2nd node.

CRS on the 2nd node hangs and later fails:

CRS-2672: Attempting to start 'ora.storage' on 'rac2'
ORA-01017: invalid username/password; logon denied
CRS-5055: unable to connect to an ASM instance because no ASM instance is running in the cluster

During that time, the CRS alert.log shows:

2022-03-15 20:15:23.722 [ORAROOTAGENT(63477)]CRS-5019: All OCR locations are on ASM disk groups [GRID], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/grid/diag/crs/rac2/crs/trace/ohasd_orarootagent_root.trc".

ohasd_orarootagent_root.trc shows:

2022-03-15 20:23:35.108 : USRTHRD:1769867008: [     INFO] {0:5:3} [ora.storage] 9788 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS

2022-03-15 20:23:35.110 : USRTHRD:1769867008: [     INFO] {0:5:3} [ora.storage] 9788 Error 4 querying length of attr ASM_STATIC_DISCOVERY_ADDRESS

2022-03-15 20:23:35.136 : USRTHRD:1769867008: [     INFO] {0:5:3} [ora.storage] 9506 Error 4 opening dom root in 0x7fa3100013a0
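To check quickly whether a trace file contains the same discovery-address messages, something like the following could be used. This is a sketch only: count_discovery_errors is a hypothetical helper name, and the messages on their own do not prove that the password file is the cause.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the ASM discovery-address errors in an
# ohasd_orarootagent trace file. Prints the number of matching lines.
count_discovery_errors() {
  local trace_file="$1"
  grep -c -E 'Error 4 querying length of attr ASM_(STATIC_)?DISCOVERY_ADDRESS' "$trace_file"
}

# Example usage (not executed here):
#   count_discovery_errors /u01/app/grid/diag/crs/rac2/crs/trace/ohasd_orarootagent_root.trc
```

Note that grep -c exits with status 1 when the count is zero, so the function's exit status can also be used in a condition.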

Reason:

Either the password file is corrupted or it does not exist. In our case, the GRID diskgroup was created after the disk headers had been cleared, and we forgot to copy the ASM password file into it.

Solution:

1. If you have a backup of the ASM password file, you can simply copy it into the ASM diskgroup:
$ asmcmd pwcopy --asm /tmp/asm_passwordfile +GRID/orapwASM -f

Then stop and start CRS.

2. If you don't have a password file backup, you need to create a new one and add the necessary users to it:

[grid@rac1 ~]$ asmcmd pwcreate --asm +GRID/orapwasm -f
Enter password: **********

Check existing users:

[grid@rac1 ~]$ asmcmd lspwusr
Username sysdba sysoper sysasm
SYS TRUE TRUE FALSE

Add necessary users and grant permissions:

$ asmcmd orapwusr --grant sysasm SYS
$ asmcmd orapwusr --add ASMSNMP
Enter password: *********
$ asmcmd orapwusr --grant sysdba ASMSNMP

Check permissions again:

$ asmcmd lspwusr
Username sysdba sysoper sysasm
     SYS   TRUE    TRUE   TRUE
 ASMSNMP   TRUE   FALSE  FALSE

Find the user name and password that CRSD uses to connect. GI uses the internal user CRSUSER__ASM_001, with an internally generated password, to access ASM during startup.

Find the string SYSTEM.ASM.CREDENTIALS.USERS.CRSUSER__ASM_001 in the following output and save the ORATEXT value:

# ocrdump -stdout | less
...
[SYSTEM.ASM.CREDENTIALS.USERS.CRSUSER__ASM_001]
ORATEXT : d68aec9585136fa8ff8f79f483e4ae64:grid
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_NONE, USER_NAME : grid, GROUP_NAME : oinstall}
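Instead of paging through the dump with less, the GUID part of the ORATEXT value can be pulled out with awk. This is a minimal sketch, assuming the dump has the layout shown above; extract_crsuser_guid is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ocrdump -stdout` output on stdin and print the
# GUID part of the ORATEXT value for the given CRS user
# (default: CRSUSER__ASM_001).
extract_crsuser_guid() {
  local user="${1:-CRSUSER__ASM_001}"
  awk -v key="[SYSTEM.ASM.CREDENTIALS.USERS.${user}]" '
    $1 == key { found = 1; next }        # section header for this user
    found && $1 == "ORATEXT" {
      split($3, parts, ":")              # value looks like <guid>:<owner>
      print parts[1]
      exit
    }
  '
}

# Example usage (as root, not executed here):
#   ocrdump -stdout | extract_crsuser_guid
```

For the output shown above this prints d68aec9585136fa8ff8f79f483e4ae64, which is the value to use in the credmaint path.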

Query the password for the GUID user. The GUID will be different in your case; take the value from your own ocrdump output:

# crsctl get credmaint -path /ASM/Self/d68aec9585136fa8ff8f79f483e4ae64 -credtype userpass -id 0 -attr passwd -local
mB28wSM4AVFAVEYamUIvrMjEo2Nfa

Add this user to ASM password file:

$ asmcmd orapwusr --add CRSUSER__ASM_001
>>>>> provide <password> you retrieved earlier

Grant the necessary privileges to this user:

$ asmcmd orapwusr --grant sysdba CRSUSER__ASM_001
$ asmcmd orapwusr --grant sysasm CRSUSER__ASM_001

Check the list again:

$ asmcmd lspwusr
        Username sysdba sysoper sysasm
             SYS   TRUE    TRUE   TRUE
         ASMSNMP   TRUE   FALSE  FALSE
CRSUSER__ASM_001   TRUE   FALSE   TRUE
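The final check can also be automated by parsing the lspwusr output. The sketch below assumes the column layout shown above (a header row followed by one row per user); has_priv is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `asmcmd lspwusr` output on stdin and succeed only
# if the given user has TRUE in the given privilege column.
has_priv() {
  local user="$1" priv="$2"
  awk -v user="$user" -v priv="$priv" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # map header names to columns
    $1 == user { ok = ($(col[priv]) == "TRUE") }
    END { exit (ok ? 0 : 1) }
  '
}

# Example usage (as grid, not executed here):
#   asmcmd lspwusr | has_priv CRSUSER__ASM_001 sysasm && echo "sysasm granted"
```

A missing user or an unknown privilege name makes the function fail, so it is safe to use as a guard before restarting CRS.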

Stop and start CRS on the remaining node.
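For completeness, the restart can be wrapped in a small function. This is a sketch to be run as root on the affected node; restart_crs and the CRSCTL variable are hypothetical names, and depending on the state of the stack a forced stop may be needed instead.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around crsctl; run as root on the node that failed.
# CRSCTL can point to the full path of crsctl in the Grid home.
CRSCTL="${CRSCTL:-crsctl}"

restart_crs() {
  # A failed stop is reported but does not abort: the stack may already be down.
  "$CRSCTL" stop crs || echo "warning: 'crsctl stop crs' failed, continuing" >&2
  "$CRSCTL" start crs || return 1
  # Report the state of the stack after the start command returns.
  "$CRSCTL" check crs
}

# Example usage (not executed here):
#   restart_crs
```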

RAC: root.sh | CRS-2672: Attempting to start 'ora.storage' | ORA-01017: invalid username/password

I was configuring clusterware on node1 and got the following error:

CRS-2672: Attempting to start 'ora.storage' on 'node1'
ORA-01017: invalid username/password; logon denied
CRS-5017: The resource action "ora.storage start" encountered the following error:
Storage agent start action aborted. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/node1/crs/trace/ohasd_orarootagent_root.trc".
CRS-2883: Resource 'ora.storage' failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-4000: Command Start failed, or completed with errors.
2016/09/27 05:41:01 CLSRSC-117: Failed to start Oracle Clusterware stack

Died at /u01/app/12.1.0.2/grid/crs/install/crsinstall.pm line 930.
The command '/u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2/grid/perl/lib -I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2/grid/crs/install/rootcrs.pl ' execution failed

The /u01/app/oracle/diag/crs/node1/crs/trace/ohasd_orarootagent_root.trc file says:

2016-09-27 05:40:56.787330*:kgfn.c@6018: kgfnConnect2Int: sysasm=0 envflags=0x10 srvrflags=0x3 unam=NULL password is NULL pstr=_ocr
2016-09-27 05:40:56.787330*:kgfn.c@6194: kgfnConnect2Int: cstr=(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/12.1.0.2/grid/bin/oracle)(ARGV0=oracle+ASM1_ocr)(ENVS='ORACLE_HOME=/u01/app/12.1.0.2/grid,ORACLE_SID=+ASM1')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')(PRIVS=(USER=root)(GROUP=root)))(enable=setuser))
2016-09-27 05:40:57.273302 : AGENT:2583111424: {0:9:3} {0:9:3} Created alert : (:CRSAGF00113:) : Aborting the command: start for resource: ora.storage 1 1

So why is the connection made as user root?

Indeed, when I connect to ASM as root, I get ORA-01017:

[root@node1 ~]# . oraenv
ORACLE_SID = [+ASM1] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[root@node1 ~]# sqlplus / as sysasm

SQL*Plus: Release 12.1.0.2.0 Production on Tue Sep 27 05:59:01 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.

ERROR:
ORA-01017: invalid username/password; logon denied

If I connect as the oracle user, it works:

su - oracle

[oracle@node1 ~]$ . oraenv
ORACLE_SID = [LBTCI1] ? +ASM1

[oracle@node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Sep 27 05:59:45 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL>

Look at the connect string again: it contains "PROGRAM=/u01/app/12.1.0.2/grid/bin/oracle", so let's check the permissions on that file.

[oracle@node1 ~]$ ll /u01/app/12.1.0.2/grid/bin/oracle
-rwsr-s--x 1 root root 295054213 Sep 27 05:26 /u01/app/12.1.0.2/grid/bin/oracle

The owner must be oracle:oinstall, not root:root:

chown oracle:oinstall /u01/app/12.1.0.2/grid/bin/oracle
chmod 6751 /u01/app/12.1.0.2/grid/bin/oracle

Then deconfigure CRS (rootcrs.pl -deconfig -verbose) and configure it again (run root.sh).
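Before rerunning root.sh, the ownership and mode of the binary can be verified with a small check. This is a sketch that assumes GNU stat (Linux); check_owner_mode is a hypothetical helper, and the expected values oracle:oinstall and 6751 are the ones used in the fix above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify owner:group and octal mode of a file.
# Relies on GNU stat's -c format option.
check_owner_mode() {
  local file="$1" want_owner="$2" want_mode="$3"
  local actual
  actual="$(stat -c '%U:%G %a' "$file")" || return 2
  if [ "$actual" = "$want_owner $want_mode" ]; then
    echo "OK: $file is $actual"
  else
    echo "MISMATCH: $file is $actual, expected $want_owner $want_mode"
    return 1
  fi
}

# Example usage (not executed here):
#   check_owner_mode /u01/app/12.1.0.2/grid/bin/oracle oracle:oinstall 6751
```

Running the check after both chown and chmod is useful because changing the owner of an executable can clear its setuid and setgid bits on Linux.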