DBT-06103 The port (1,521) is already used

Thanks to Tornike Kupatadze for this test case!

Problem:

During my OCA class, after a successful 19c database software installation, we were creating a database with dbca and got the DBT-06103 error shown in the title.

If the listener had already been configured, we would instead have got DBT-06103 The port (5,500) is already used while configuring EM Express. Either way, the solution is the same.

We checked, and the port was not in use:

# netstat -an | grep 1521

Reason:

The hostname could not be resolved, so it was not reachable. In our case, /etc/hosts did not contain an entry for this server.

Solution:

Add an entry for the server (IP address, fully qualified hostname, short hostname) to /etc/hosts:

After adding the above entry, we were able to continue.
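Before re-running dbca, you can confirm that the server's name is now listed in /etc/hosts. The function below is a minimal sketch; host_in_hosts_file is a hypothetical helper name, and it only inspects a hosts-format file (it does not consult DNS).

```shell
#!/bin/sh
# host_in_hosts_file NAME [FILE]  (hypothetical helper, not an Oracle tool)
# Exit status 0 if NAME appears as a hostname or alias in FILE (default /etc/hosts).
host_in_hosts_file() {
  name=$1
  file=${2:-/etc/hosts}
  awk -v n="$name" '
    { sub(/#.*/, "") }                 # drop comments before splitting fields
    { for (i = 2; i <= NF; i++)        # field 1 is the IP address, skip it
        if ($i == n) found = 1 }
    END { exit !found }' "$file"
}
```

For example, host_in_hosts_file "$(hostname)" || echo "missing from /etc/hosts". For a resolver-level check, getent hosts "$(hostname)" should also return the new entry.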

ORA-15120: ASM file name ‘ORA-27090: Unable to reserve kernel resources f’ does not begin with the ASM prefix character

Problem:

The customer had created 36 databases on the same server, and while creating the 37th with dbca, they got the ORA-15120/ORA-27090 error shown in the title.

Reason:

The fs.aio-max-nr value in /etc/sysctl.conf was too low. The value set in their case, 3145728, suits many environments, but as the number of databases on the server grows, this parameter has to be increased accordingly.

Solution:

The formula for calculating the value of this parameter is:

aio-max-nr = (processes per database) * (number of databases) * 4096

In their case, each database had 1000 processes and 80 databases were planned, so the value should be:

aio-max-nr = 1000 * 80 * 4096 = 327680000
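The same arithmetic as a small shell function, handy when re-planning for a different number of databases. aio_max_nr is a hypothetical helper name; the 4096 multiplier is the one from the formula above.

```shell
#!/bin/sh
# aio_max_nr PROCS_PER_DB NUM_DBS  (hypothetical helper)
# Applies: aio-max-nr = processes per DB * number of databases * 4096
aio_max_nr() {
  procs_per_db=$1
  num_dbs=$2
  echo $(( procs_per_db * num_dbs * 4096 ))
}

aio_max_nr 1000 80   # prints 327680000 (the customer's case)
```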
  1. Add or update the value in /etc/sysctl.conf:
# vim /etc/sysctl.conf

fs.aio-max-nr = 327680000


2. Run /sbin/sysctl -p to apply the change immediately:

# sysctl -p 
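After sysctl -p, you can confirm that the kernel picked up the new limit by reading /proc/sys/fs/aio-max-nr, which holds the live value (/proc/sys/fs/aio-nr shows how many AIO requests are currently allocated). A minimal sketch, with check_aio_limit as a hypothetical helper name:

```shell
#!/bin/sh
# check_aio_limit REQUIRED [CURRENT]  (hypothetical helper)
# Compares the live fs.aio-max-nr (or CURRENT, if given) with REQUIRED.
check_aio_limit() {
  required=$1
  current=${2:-$(cat /proc/sys/fs/aio-max-nr)}
  if [ "$current" -ge "$required" ]; then
    echo "OK: fs.aio-max-nr=$current (required $required)"
  else
    echo "TOO LOW: fs.aio-max-nr=$current (required $required)"
    return 1
  fi
}
```

On the server itself, check_aio_limit 327680000 reads the live value.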

Delete the files left over from the failed attempt and re-create the database; this time it will succeed.

RMAN: spread a backup job across several RAC instances in parallel to increase throughput

There are two ways to allocate RMAN channels on different RAC instances to increase throughput.

I will start with the option that guarantees each RAC instance gets one channel. The other option does load balance, but randomly, so with a small number of channels all of them may end up on one instance. You can decide which option suits you better.
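To put a number on "may end up on one instance": if we assume each channel connection independently picks one of n addresses with equal probability (a simplifying assumption about LOAD_BALANCE=on, not a documented Oracle formula), the chance that all k channels land on the same instance is n * (1/n)^k. A small sketch, with p_all_same as a hypothetical helper name:

```shell
#!/bin/sh
# p_all_same N K  (hypothetical helper)
# Probability that K channels all land on the same one of N instances,
# assuming independent, uniformly random address selection.
p_all_same() {
  awk -v n="$1" -v k="$2" 'BEGIN { printf "%.4f\n", n * (1 / n) ^ k }'
}

p_all_same 2 2   # prints 0.5000
p_all_same 2 3   # prints 0.2500
```

With two nodes and two channels this is a coin flip, which is consistent with the ORCL_BALANCE test later in this post.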

The test is done on a 2-node cluster.

  1. Configure parallelism and two channels in RMAN, specifying one connect string per instance:
$ rman target /
RMAN> CONFIGURE DEVICE TYPE disk PARALLELISM 2;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL1 as sysdba';
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL2 as sysdba';

2. Define ORCL1 and ORCL2 aliases on each database node under $ORACLE_HOME/network/admin/tnsnames.ora:

ORCL1=
(DESCRIPTION=
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac1.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )
 
ORCL2=
(DESCRIPTION=
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac2.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )

Note that in my case 1522 is the local listener port.

3. Run backup:

RMAN> backup database;
 
Starting backup at 22-MAR-22
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=507 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=753 instance=orcl2 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=+DATA/ORCL/DATAFILE/system.257.1098460673
input datafile file number=00004 name=+DATA/ORCL/DATAFILE/undotbs1.259.1098460743
channel ORA_DISK_1: starting piece 1 at 22-MAR-22
channel ORA_DISK_2: starting full datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00003 name=+DATA/ORCL/DATAFILE/sysaux.258.1098460717
input datafile file number=00005 name=+DATA/ORCL/DATAFILE/undotbs2.265.1098461311
input datafile file number=00007 name=+DATA/ORCL/DATAFILE/users.260.1098460743

As you can see, two files were backed up by the 1st channel (1st instance) and the other three by the 2nd channel (2nd instance).
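The same pattern extends to clusters with more nodes: one channel per instance, each with its own alias. A minimal sketch that prints the CONFIGURE commands, assuming aliases named PREFIX1..PREFIXN are defined as in step 2 (gen_rman_channels is a hypothetical helper; pass your own credentials rather than the example ones):

```shell
#!/bin/sh
# gen_rman_channels PREFIX N CREDENTIALS  (hypothetical helper)
# Prints RMAN commands that pin channel i to alias PREFIX<i> (one per RAC instance).
gen_rman_channels() {
  prefix=$1
  n=$2
  creds=$3
  echo "CONFIGURE DEVICE TYPE DISK PARALLELISM $n;"
  i=1
  while [ "$i" -le "$n" ]; do
    echo "CONFIGURE CHANNEL $i DEVICE TYPE DISK CONNECT '$creds@$prefix$i as sysdba';"
    i=$((i + 1))
  done
}
```

gen_rman_channels ORCL 2 sys/Oracle123 reproduces the commands from step 1; after reviewing the output, you could feed it to RMAN, e.g. gen_rman_channels ORCL 4 sys/Oracle123 | rman target /.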

Now let’s look at the other variant:

  1. Configure a single TNS alias with the LOAD_BALANCE parameter:
ORCL_BALANCE=
     (DESCRIPTION=
           (TRANSPORT_CONNECT_TIMEOUT=3) (RETRY_COUNT=6)(LOAD_BALANCE=on)
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac1.example.com) (PORT=1522))
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac2.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )

Both node addresses are defined, and Oracle Net picks one of them at random for each connection.

2. Configure parallelism and a channel that uses the ORCL_BALANCE alias:

Note that I ran this test on the same server where CHANNEL 1 and CHANNEL 2 were already defined, so I had to clear them first:

RMAN> CONFIGURE CHANNEL DEVICE TYPE DISK clear;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK clear;
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE DISK clear;

Define channel and parallelism:

RMAN> CONFIGURE DEVICE TYPE disk PARALLELISM 2;
RMAN> CONFIGURE CHANNEL DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL_BALANCE as sysdba';

RMAN> backup database;
 
Starting backup at 22-MAR-22
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=752 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=138 instance=orcl1 device type=DISK

Both channels were allocated on orcl1. Let’s try again, this time with parallelism 3; the CHANNEL setting is already defined and persists until it is changed:

RMAN>  CONFIGURE DEVICE TYPE disk PARALLELISM 3;
 
RMAN> backup database;
 
Starting backup at 22-MAR-22
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=752 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=138 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=753 instance=orcl2 device type=DISK

As you can see, the choice really is random: two channels were allocated on the 1st node and the third on the 2nd node.
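When comparing runs, it helps to count channels per instance from the RMAN log rather than by eye. A minimal sketch (channels_per_instance is a hypothetical helper) that reads RMAN output, counts each ORA_DISK_n channel once even if its line is repeated, and prints the instance name with its channel count:

```shell
#!/bin/sh
# channels_per_instance [FILE...]  (hypothetical helper; reads stdin if no FILE)
# Parses "channel ORA_DISK_n: SID=... instance=X device type=DISK" lines
# and prints "<instance> <number of channels>" sorted by instance name.
channels_per_instance() {
  awk '
    $1 == "channel" && $3 ~ /^SID=/ {
      ch = $2; sub(/:$/, "", ch)          # ORA_DISK_1: -> ORA_DISK_1
      inst = ""
      for (i = 4; i <= NF; i++)
        if ($i ~ /^instance=/) inst = substr($i, 10)
      if (!(ch in seen)) { seen[ch] = 1; count[inst]++ }
    }
    END { for (k in count) print k, count[k] }' "$@" | sort
}
```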

In my opinion, the random approach is not a good choice here, but you know best which variant suits your environment.

Change the default kernel using the grubby tool

There are several ways to accomplish this task; here is one of them.

  1. Check which kernel is currently loaded:
# uname -r
5.4.17-2036.101.2.el7uek.x86_64

2. List all kernels available on the system and note their index numbers:

# grubby --info=ALL
index=0
kernel=/boot/vmlinuz-5.4.17-2036.101.2.el7uek.x86_64
args="ro console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 numa=off transparent_hugepage=never net.ifnames=0"
root=/dev/mapper/rootvg-rootlv
initrd=/boot/initramfs-5.4.17-2036.101.2.el7uek.x86_64.img
title=Oracle Linux Server 7.9, with Unbreakable Enterprise Kernel 5.4.17-2036.101.2.el7uek.x86_64

index=1
kernel=/boot/vmlinuz-3.10.0-1160.42.2.el7.x86_64
args="ro console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 numa=off transparent_hugepage=never net.ifnames=0"
root=/dev/mapper/rootvg-rootlv
initrd=/boot/initramfs-3.10.0-1160.42.2.el7.x86_64.img
title=Oracle Linux Server 7.9, with Linux 3.10.0-1160.42.2.el7.x86_64

index=2
kernel=/boot/vmlinuz-0-rescue-d3dd3af16fd242cebb997c6041d68ad3
args="ro console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 numa=off transparent_hugepage=never net.ifnames=0"
root=/dev/mapper/rootvg-rootlv
initrd=/boot/initramfs-0-rescue-d3dd3af16fd242cebb997c6041d68ad3.img

3. Check the index of the current default kernel with grubby (we could already infer it from steps 1 and 2, but let’s confirm):

# grubby --default-index
0

4. Change the default kernel. In my case, I want to set it to vmlinuz-3.10.0-1160.42.2.el7.x86_64, whose index number is 1:

# grubby --set-default-index=1

5. Reboot the system and check the kernel again:

# reboot
# uname -r
3.10.0-1160.42.2.el7.x86_64
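Instead of reading the index off the listing by eye, you can look it up from the grubby --info=ALL output. A minimal sketch, assuming the key=value format shown above (kernel_index is a hypothetical helper; it also strips quotes in case your grubby version prints kernel="..."):

```shell
#!/bin/sh
# kernel_index KERNEL_PATH  (hypothetical helper)
# Reads `grubby --info=ALL` output on stdin and prints the index of KERNEL_PATH.
# Exit status 1 if the kernel is not listed.
kernel_index() {
  awk -F= -v t="$1" '
    { v = $2; gsub(/"/, "", v) }            # value after the first "=", unquoted
    $1 == "index" { idx = v }               # remember the current entry index
    $1 == "kernel" && v == t { print idx; found = 1; exit }
    END { exit !found }'
}
```

For example: idx=$(grubby --info=ALL | kernel_index /boot/vmlinuz-3.10.0-1160.42.2.el7.x86_64) && grubby --set-default-index="$idx". Alternatively, grubby --set-default accepts the kernel path directly, which avoids the index altogether.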

OPATCHAUTO-72050: System instance creation failed, cluvfy fails Verifying ‘/tmp/’ …FAILED (PRVF-7546)

Problem:

Running cluvfy fails with the following error:

[grid@rac1 grid]$ ./runcluvfy.sh stage -pre crsinst -n rac1,rac2 -verbose
...
Failures were encountered during execution of CVU verification request "stage -pre crsinst".
...
Verifying '/tmp/' ...FAILED
Cannot run program "/usr/bin/scp": error=13, Permission denied

You should not continue patching when runcluvfy fails; it is strongly recommended to resolve all failed checks first and only then run opatchauto. If you insist on patching anyway, you will get the following:

[root@rac1 33509923]# /u01/app/19.3.0/grid/OPatch/opatchauto apply /u01/app/patchinstall/33509923 -oh /u01/app/19.3.0/grid
OPatchauto session is initiated at Sun Mar 20 06:27:17 2022

System initialization log file is /u01/app/19.3.0/grid/cfgtoollogs/opatchautodb/systemconfig2022-03-20_06-27-42AM.log.
...
OPATCHAUTO-72050: System instance creation failed.
OPATCHAUTO-72050: Failed while retrieving system information.
OPATCHAUTO-72050: Please check log file for more details.

Reason:

In our case, /usr/bin/scp had incorrect permissions: 600 instead of 755. We don’t know why this happened; it should not.

Solution:

Set the correct permissions on the scp binary on both nodes:

# chmod 755 /usr/bin/scp
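To verify the fix (or catch the problem before running cluvfy), a minimal sketch that compares a file's octal mode with the expected one. check_mode is a hypothetical helper, and it relies on GNU stat (-c %a), which is what Oracle Linux ships:

```shell
#!/bin/sh
# check_mode FILE EXPECTED_OCTAL  (hypothetical helper; uses GNU stat)
# Exit 0 if FILE has exactly the expected permission bits, 1 if not, 2 if stat fails.
check_mode() {
  file=$1
  want=$2
  have=$(stat -c %a "$file") || return 2
  if [ "$have" = "$want" ]; then
    echo "OK: $file mode $have"
  else
    echo "WRONG: $file mode $have (expected $want)"
    return 1
  fi
}
```

Run check_mode /usr/bin/scp 755 on each node, or from one node over the already configured SSH equivalence, e.g. ssh rac2 stat -c %a /usr/bin/scp.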

Retry the patching; in our case it succeeded.

ora.storage fails, Error 4 querying length of attr ASM_DISCOVERY_ADDRESS, ORA-01017

Problem:

CRS starts on the 1st node, but not on the 2nd node.

CRS on the 2nd node hangs and later fails:

CRS-2672: Attempting to start 'ora.storage' on 'rac2'
ORA-01017: invalid username/password; logon denied
CRS-5055: unable to connect to an ASM instance because no ASM instance is running in the cluster

During that time, the CRS alert.log shows:

2022-03-15 20:15:23.722 [ORAROOTAGENT(63477)]CRS-5019: All OCR locations are on ASM disk groups [GRID], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/grid/diag/crs/rac2/crs/trace/ohasd_orarootagent_root.trc".

ohasd_orarootagent_root.trc shows:

2022-03-15 20:23:35.108 : USRTHRD:1769867008: [     INFO] {0:5:3} [ora.storage] 9788 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS

2022-03-15 20:23:35.110 : USRTHRD:1769867008: [     INFO] {0:5:3} [ora.storage] 9788 Error 4 querying length of attr ASM_STATIC_DISCOVERY_ADDRESS

2022-03-15 20:23:35.136 : USRTHRD:1769867008: [     INFO] {0:5:3} [ora.storage] 9506 Error 4 opening dom root in 0x7fa3100013a0

Reason:

Either the ASM password file is corrupted or it does not exist. In our case, the GRID disk group had been re-created after clearing the disk headers, and we forgot to copy the ASM password file back.

Solution:

  1. If you have a backup of the ASM password file, simply copy it into the ASM disk group:
$ asmcmd pwcopy --asm /tmp/asm_passwordfile +GRID/orapwASM -f

Then stop and start CRS.

2. If you don’t have a backup, create a new password file and add the necessary users to it:

[grid@rac1 ~]$ asmcmd pwcreate --asm +GRID/orapwasm -f
Enter password: **********

Check existing users:

[grid@rac1 ~]$ asmcmd lspwusr
Username sysdba sysoper sysasm
     SYS   TRUE    TRUE  FALSE

Add necessary users and grant permissions:

$ asmcmd orapwusr --grant sysasm SYS
$ asmcmd orapwusr --add ASMSNMP
Enter password: *********
$ asmcmd orapwusr --grant sysdba ASMSNMP

Check permissions again:

$ asmcmd lspwusr
Username sysdba sysoper sysasm
     SYS   TRUE    TRUE   TRUE
 ASMSNMP   TRUE   FALSE  FALSE

Next, find the user name and password that CRSD uses to connect. During startup, GI accesses ASM as the internal user CRSUSER__ASM_001 with an internally generated password.

Find the key SYSTEM.ASM.CREDENTIALS.USERS.CRSUSER__ASM_001 in the following output and save its ORATEXT value; the part before ":grid" is the GUID:

# ocrdump -stdout | less
...
[SYSTEM.ASM.CREDENTIALS.USERS.CRSUSER__ASM_001]
ORATEXT : d68aec9585136fa8ff8f79f483e4ae64:grid
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_NONE, USER_NAME : grid, GROUP_NAME : oinstall}

Query the password for this GUID user. The GUID will be different in your case, so take it from your own output:

# crsctl get credmaint -path /ASM/Self/d68aec9585136fa8ff8f79f483e4ae64 -credtype userpass -id 0 -attr passwd -local
mB28wSM4AVFAVEYamUIvrMjEo2Nfa
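The GUID lookup can be scripted so the crsctl call does not depend on copying a long string by hand. A minimal sketch, assuming the ocrdump layout shown above (key in square brackets, then an "ORATEXT : GUID:owner" line) and the user name CRSUSER__ASM_001; crsuser_guid is a hypothetical helper:

```shell
#!/bin/sh
# crsuser_guid  (hypothetical helper)
# Reads `ocrdump -stdout` output on stdin and prints the GUID part of the
# ORATEXT value under [SYSTEM.ASM.CREDENTIALS.USERS.CRSUSER__ASM_001].
crsuser_guid() {
  awk '
    { sub(/[ \t\r]+$/, "") }                          # ignore trailing blanks
    $0 == "[SYSTEM.ASM.CREDENTIALS.USERS.CRSUSER__ASM_001]" { in_key = 1; next }
    substr($0, 1, 1) == "[" { in_key = 0 }            # another OCR key starts
    in_key && $1 == "ORATEXT" { split($3, part, ":"); print part[1]; exit }'
}
```

As root, guid=$(ocrdump -stdout | crsuser_guid) can then be passed to the query above as -path "/ASM/Self/$guid".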

Add this user to the ASM password file:

$ asmcmd orapwusr --add CRSUSER__ASM_001
Enter password: (type the password retrieved in the previous step)

Grant the necessary privileges to this user:

$ asmcmd orapwusr --grant sysdba CRSUSER__ASM_001
$ asmcmd orapwusr --grant sysasm CRSUSER__ASM_001

Check the list again:

$ asmcmd lspwusr
        Username sysdba sysoper sysasm
             SYS   TRUE    TRUE   TRUE
         ASMSNMP   TRUE   FALSE  FALSE
CRSUSER__ASM_001   TRUE   FALSE   TRUE
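Before restarting CRS, you can check the lspwusr listing programmatically. A minimal sketch (has_privs is a hypothetical helper) that reads asmcmd lspwusr output on stdin and succeeds only if the given user shows TRUE in every requested column; column positions are taken from the header line shown above:

```shell
#!/bin/sh
# has_privs USER PRIV...  (hypothetical helper)
# Reads `asmcmd lspwusr` output on stdin; exit 0 if USER has TRUE for every PRIV
# (sysdba, sysoper or sysasm), using the "Username ..." header to locate columns.
has_privs() {
  user=$1; shift
  awk -v u="$user" -v want="$*" '
    $1 == "Username" { for (i = 2; i <= NF; i++) col[$i] = i; next }
    $1 == u {
      n = split(want, p, " ")
      for (j = 1; j <= n; j++)
        if (!(p[j] in col) || $(col[p[j]]) != "TRUE") exit 1
      ok = 1
    }
    END { exit !ok }'
}
```

For example: asmcmd lspwusr | has_privs CRSUSER__ASM_001 sysdba sysasm && echo "password file looks complete".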

Stop and start CRS on the remaining node.