RMAN backup on NFS v3 takes too much time

Problem:

I was trying to test RMAN backup on Azure blob storage with NFS v3 access. I have noticed a huge delay even when backing up a small controlfile.

The same backup on disk took 7sec, but on NFS 15min. While watching IO on NFS mountpoint using nfsiostat I saw that initial 13min there was no IO at all and actual backup was taken only at the last minute.

Another interesting thing while watching netstat output was the following:

[root@rac1 dbbackup2]# netstat -na|grep 10.0.0.
tcp 0 0 10.0.0.5:875 10.0.0.16:2048 ESTABLISHED
tcp 0 1 10.0.0.5:54997 10.0.0.16:2049 SYN_SENT

You can also turn debug on before running backup in RMAN and check the output:

RMAN> RUN
{
  ALLOCATE CHANNEL disk1 DEVICE TYPE DISK FORMAT '/dataload/%U';
  debug on;
  BACKUP current controlfile;
  debug off;
}

Solution:

Disable dNFS.

$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk dnfs_off

Retry RMAN backup, in my case it took only 1sec:

...
piece handle=/dbbackup/0q0rruhd_26_1_1 tag=TAG20220425T202309 comment=NONE
channel disk1: backup set complete, elapsed time: 00:00:01
Finished backup at 25-APR-22

RMAN spread a backup job between many RAC instances in parallel to increase throughput

There are two options to allocate RMAN channels on different RAC instances to increase the throughput.

I will start with the option, that assures all RAC instances get one channel. Regarding the other option, it does load balance but in a random way, so with a small number of channels, you may see that all of them are allocated in one instance. So you decide which option is better for you.

The test is done on a 2-node cluster.

  1. Configure parallelism and two channels in RMAN. Indicate a connect string, one per instance:
$ RMAN target /
RMAN> CONFIGURE DEVICE TYPE disk PARALLELISM 2;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL1 as sysdba';
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL2 as sysdba';

2. Define ORCL1 and ORCL2 aliases on each database node under $ORACLE_HOME/network/admin/tnsnames.ora:

ORCL1=
(DESCRIPTION=
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac1.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )
 
ORCL2=
(DESCRIPTION=
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac2.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )

Please note, in my case 1522 is a local listener port.

3. Run backup:

RMAN> backup database;
 
Starting backup at 22-MAR-22
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=507 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=753 instance=orcl2 device type=DISK
channel ORA_DISK_2: SID=753 instance=orcl2 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=+DATA/ORCL/DATAFILE/system.257.1098460673
input datafile file number=00004 name=+DATA/ORCL/DATAFILE/undotbs1.259.1098460743
channel ORA_DISK_1: starting piece 1 at 22-MAR-22
channel ORA_DISK_2: starting full datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00003 name=+DATA/ORCL/DATAFILE/sysaux.258.1098460717
input datafile file number=00005 name=+DATA/ORCL/DATAFILE/undotbs2.265.1098461311
input datafile file number=00007 name=+DATA/ORCL/DATAFILE/users.260.1098460743

As you see, two files were backed up by the 1st channel (1st instance) and the other three files by the 2nd channel (2nd instance).

Now let’s explain another possible variant:

  1. Configure one TNS string with load balance parameter:
ORCL_BALANCE=
     (DESCRIPTION=
           (TRANSPORT_CONNECT_TIMEOUT=3) (RETRY_COUNT=6)(LOAD_BALANCE=on)
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac1.example.com) (PORT=1522))
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac2.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )

Both node addresses are defined and Oracle will pick each address randomly.

2. Configure parallelism and one channel with ORCL_BALANCE string:

Please note, I did this test case on the same server where I’ve already defined CHANNEL 1 and CHANNEL 2, so I had to clear them:

RMAN> CONFIGURE CHANNEL DEVICE TYPE DISK clear;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK clear;
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE DISK clear;

Define channel and parallelism:

RMAN> CONFIGURE DEVICE TYPE disk PARALLELISM 2;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL_BALANCE as sysdba';

RMAN> backup database;
 
Starting backup at 22-MAR-22
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=752 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=138 instance=orcl1 device type=DISK

Both channels were allocated on orcl1. Try one more time, or better configure parallelism 3, CHANNEL parameter is already defined and it is permanent until changed:

RMAN>  CONFIGURE DEVICE TYPE disk PARALLELISM 3;
 
RMAN> backup database;
 
Starting backup at 22-MAR-22
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=752 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=138 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=753 instance=orcl2 device type=DISK

As you see it really did a random choice. But two channels were allocated on the 1st node and the last one on the 2nd node.

I think here random algorithm is not a good option, but you better know which variant is appropriate in your case.

RMAN restore on ASM fails ORA-12547: TNS:lost contact, WARNING: ASMB0 exiting with error

Problem:

Our client was not able to restore controlfile on +FRA diskgroup:

RMAN> restore controlfile to '+FRA' from '<backup file location>';
....
RMAN-03002 failure of restore command at 2021/05/24 19:12:19
ORA-19870: error while restoring backup piece 
ORA-19504: failed to create file '+FRA'
ORA-17502: ksfdcre:4 Failed to create file +FRA
ORA-15001: diskgroup "FRA" does not exist or is not mounted
ORA-12547: TNS:lost contact
ORA-12547: TNS:lost contact

Reason:

oracle binary under GI home did not have correct permissions:

[grid@rac1 bin]$ ll oracle
-rwxr-x--x 1 grid oinstall 420332360 Oct 13  2020 oracle

Solution:

Change permissions to 6751 and retry the restore:

[grid@rac1 bin]$ chmod 6751 oracle

[grid@rac1 bin]$ ll oracle
-rwsr-s--x 1 grid oinstall 420332360 Oct 13  2020 oracle

ORA-01103: when creating a Standby Database on the same Host as the Primary Database

Typically the standby and the primary databases are located on the different hosts to ensure the full DR capabilities. However, there are some situations when you want to have the primary and standby database on the same Host.

Problem #1: You are not able to start two databases with the same SID on the same server.

Problem #2: You cannot change db_name, because it is used in the controlfile and if you try to duplicate the standby database from the primary using different db_name, you will get the following error:

ORA-01103: database name 'orcldgst' in control file is not 'orcldg'

Assume db_name=orcldg and ORACLE_SID for the primary is orcldg1. To solve problem #1 and problem #2, you need to the following steps:

db_name must be the same for both databases. But during startup nomount of the standby database, you need to set ORACLE_SID to the different value:

$ export ORACLE_SID=orcldgst1
$ sqlplus / as sysdba
SQL> startup nomount pfile='/tmp/mypfile.ora'

After that you will be able to run RMAN duplicate command to create the standby database on the same host as the primary.

RMAN: Displaying current backup progress

To check the progress of your current RMAN backup use this script:

#This is my favorite script

select recid
 , output_device_type
 , dbsize_mbytes
 , input_bytes/1024/1024 input_mbytes
 , output_bytes/1024/1024 output_mbytes
 , (output_bytes/input_bytes*100) compression
 , (mbytes_processed/dbsize_mbytes*100) complete
 , to_char(start_time + (sysdate-start_time)/(mbytes_processed/dbsize_mbytes),'DD-MON-YYYY HH24:MI:SS') est_complete
 from v$rman_status rs
 , (select sum(bytes)/1024/1024 dbsize_mbytes from v$datafile) 
 where status='RUNNING'
 and output_device_type is not null