RMAN: spread a backup job across multiple RAC instances in parallel to increase throughput

There are two options to allocate RMAN channels on different RAC instances to increase the throughput.

I will start with the option that guarantees each RAC instance gets one channel. The other option also spreads channels across instances, but randomly, so with a small number of channels you may find all of them allocated on one instance. You decide which option suits you better.

The test is done on a 2-node cluster.

1. Configure parallelism and two channels in RMAN, specifying one connect string per instance:
$ rman target /
RMAN> CONFIGURE DEVICE TYPE disk PARALLELISM 2;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL1 as sysdba';
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL2 as sysdba';

2. Define ORCL1 and ORCL2 aliases on each database node under $ORACLE_HOME/network/admin/tnsnames.ora:

ORCL1=
(DESCRIPTION=
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac1.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )
 
ORCL2=
(DESCRIPTION=
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac2.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )

Please note that in my case 1522 is the local listener port.

3. Run backup:

RMAN> backup database;
 
Starting backup at 22-MAR-22
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=507 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=753 instance=orcl2 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=+DATA/ORCL/DATAFILE/system.257.1098460673
input datafile file number=00004 name=+DATA/ORCL/DATAFILE/undotbs1.259.1098460743
channel ORA_DISK_1: starting piece 1 at 22-MAR-22
channel ORA_DISK_2: starting full datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00003 name=+DATA/ORCL/DATAFILE/sysaux.258.1098460717
input datafile file number=00005 name=+DATA/ORCL/DATAFILE/undotbs2.265.1098461311
input datafile file number=00007 name=+DATA/ORCL/DATAFILE/users.260.1098460743

As you can see, two files were backed up by the 1st channel (1st instance) and the other three files by the 2nd channel (2nd instance).
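On a longer log it is easier to count channel placement with a script than by eye. The following is a minimal Python sketch, assuming the RMAN output has been saved as text and that allocation lines look like the ones shown above; the function name channels_per_instance is hypothetical.

```python
import re
from collections import Counter

# Matches allocation lines such as:
#   channel ORA_DISK_1: SID=507 instance=orcl1 device type=DISK
CHANNEL_RE = re.compile(r"^channel (\S+): SID=\d+ instance=(\S+) device type=\S+")


def channels_per_instance(log_text):
    """Return {instance: number of distinct channels allocated on it}."""
    placement = {}  # channel name -> instance (dedupes repeated lines)
    for line in log_text.splitlines():
        match = CHANNEL_RE.match(line.strip())
        if match:
            placement[match.group(1)] = match.group(2)
    return dict(Counter(placement.values()))


# Excerpt of the output shown above.
sample = """\
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=507 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=753 instance=orcl2 device type=DISK
"""

print(channels_per_instance(sample))
```

With one channel per instance the counts should all be 1; anything else means a connect string did not reach the intended instance.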

Now let’s look at the other variant:

1. Configure one TNS alias with the LOAD_BALANCE parameter:
ORCL_BALANCE=
     (DESCRIPTION=
           (TRANSPORT_CONNECT_TIMEOUT=3) (RETRY_COUNT=6)(LOAD_BALANCE=on)
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac1.example.com) (PORT=1522))
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac2.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )

Both node addresses are listed, and Oracle Net picks one of them at random for each connection.

2. Configure parallelism and a channel that uses the ORCL_BALANCE alias:

Note that I ran this test on the same server where I had already configured CHANNEL 1 and CHANNEL 2, so I had to clear them first:

RMAN> CONFIGURE CHANNEL DEVICE TYPE DISK clear;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK clear;
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE DISK clear;

Define channel and parallelism:

RMAN> CONFIGURE DEVICE TYPE disk PARALLELISM 2;
RMAN> CONFIGURE CHANNEL DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL_BALANCE as sysdba';

RMAN> backup database;
 
Starting backup at 22-MAR-22
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=752 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=138 instance=orcl1 device type=DISK

Both channels were allocated on orcl1. Try one more time or, better, set parallelism to 3. The CHANNEL setting is already configured and persists until changed:

RMAN> CONFIGURE DEVICE TYPE disk PARALLELISM 3;
 
RMAN> backup database;
 
Starting backup at 22-MAR-22
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=752 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=138 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=753 instance=orcl2 device type=DISK

As you can see, the choice really was random: two channels were allocated on the 1st node and the last one on the 2nd node.

I think the random algorithm is not a good option here, but you know best which variant is appropriate in your case.
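To put a number on the risk, here is a small Python sketch of the arithmetic. It assumes each channel picks an instance independently and with equal probability, which is a simplification and not something the tests above prove; the function name prob_all_on_one_instance is hypothetical.

```python
from fractions import Fraction


def prob_all_on_one_instance(channels, instances):
    """Probability that every channel lands on the same instance.

    Assumption: each channel chooses independently and uniformly.
    There are `instances` ways for all channels to agree, out of
    instances ** channels equally likely placements.
    """
    if channels < 1 or instances < 1:
        raise ValueError("channels and instances must be positive")
    return Fraction(instances, instances ** channels)


for n in (2, 3, 4, 8):
    p = prob_all_on_one_instance(n, 2)
    print(f"{n} channels on 2 instances: {p} ({float(p):.1%})")
```

Under that assumption, two channels on a 2-node cluster end up on the same instance half of the time, and the chance only becomes small once the number of channels is well above the number of nodes.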

Understanding LOAD_BALANCE parameter in TNSNAMES.ORA

This parameter can be specified under DESCRIPTION_LIST, DESCRIPTION or ADDRESS_LIST. When LOAD_BALANCE is set to on, yes or true, the list of addresses is processed in a random sequence.

Values: on, yes, true, off, no, false.

Default: ON for DESCRIPTION_LIST. Please note that for DESCRIPTION and ADDRESS_LIST it is OFF by default.

Don’t be misled by the word balance. The Oracle client does not know which database node is least loaded; this parameter only makes it choose addresses randomly, so there is no real balancing here.

Real balancing is a server-side task: when you connect to a SCAN listener, it finds the least loaded node and redirects the connection to that node. LOAD_BALANCE=ON helps distribute connections across the SCAN listeners, but not evenly.

In the following test scenario, we will see how a client connection behaves when the LOAD_BALANCE parameter is used.

Client side TNS:

  CLIENT_CON =
  (DESCRIPTION =
   (LOAD_BALANCE=ON)
    (TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
    (ADDRESS = (PROTOCOL = TCP)(HOST =10.10.10.10)(PORT = 1522))
    (ADDRESS = (PROTOCOL = TCP)(HOST =11.11.11.11)(PORT = 1522))
    (ADDRESS = (PROTOCOL = TCP)(HOST =12.12.12.12)(PORT = 1522))
    (ADDRESS = (PROTOCOL = TCP)(HOST =13.13.13.13)(PORT = 1522))
    (ADDRESS = (PROTOCOL = TCP)(HOST =14.14.14.14)(PORT = 1522))
    (ADDRESS = (PROTOCOL = TCP)(HOST =15.15.15.15)(PORT = 1522))
   (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orclgg)
    ))

In the above connection string, I have used two other parameters. If you are not familiar with them, please see a brief explanation below:

TRANSPORT_CONNECT_TIMEOUT:

“The TRANSPORT_CONNECT_TIMEOUT parameter specifies the time, in seconds, for a client to establish a TCP connection to the database server. The default value is 60 seconds.”

RETRY_COUNT:

“To specify the number of times an ADDRESS list is traversed before the connection attempt is terminated.”

Enable client tracing by specifying the following parameters in the client sqlnet.ora file:

TRACE_LEVEL_CLIENT = USER
TRACE_FILE_CLIENT = MY_SQLNET.TRC
TRACE_DIRECTORY_CLIENT = /SQLTRACE_FOLDER
TRACE_TIMESTAMP_CLIENT = on
TRACE_UNIQUE_CLIENT = ON
DIAG_ADR_ENABLED = OFF

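Try the connection using the above TNS alias and analyze the generated trace file. The attempts are about 3 seconds apart, which matches TRANSPORT_CONNECT_TIMEOUT=3: each attempt times out before the next address is tried.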

[10-MAR-2019 15:38:27:271] nttbnd2addr: using host IP address: 13.13.13.13
[10-MAR-2019 15:38:30:274] nttbnd2addr: using host IP address: 14.14.14.14
[10-MAR-2019 15:38:33:276] nttbnd2addr: using host IP address: 11.11.11.11
[10-MAR-2019 15:38:36:279] nttbnd2addr: using host IP address: 15.15.15.15 <- instead of 12 it chose 15
[10-MAR-2019 15:38:39:282] nttbnd2addr: using host IP address: 12.12.12.12
[10-MAR-2019 15:38:42:286] nttbnd2addr: using host IP address: 10.10.10.10
[10-MAR-2019 15:38:45:289] nttbnd2addr: using host IP address: 15.15.15.15
[10-MAR-2019 15:38:48:292] nttbnd2addr: using host IP address: 14.14.14.14
[10-MAR-2019 15:38:51:294] nttbnd2addr: using host IP address: 13.13.13.13
[10-MAR-2019 15:38:54:297] nttbnd2addr: using host IP address: 12.12.12.12
[10-MAR-2019 15:38:57:298] nttbnd2addr: using host IP address: 10.10.10.10
[10-MAR-2019 15:39:00:299] nttbnd2addr: using host IP address: 11.11.11.11
[10-MAR-2019 15:39:03:302] nttbnd2addr: using host IP address: 11.11.11.11 <- Here it used the same address again
[10-MAR-2019 15:39:06:303] nttbnd2addr: using host IP address: 14.14.14.14
[10-MAR-2019 15:39:09:306] nttbnd2addr: using host IP address: 15.15.15.15
[10-MAR-2019 15:39:12:309] nttbnd2addr: using host IP address: 12.12.12.12
[10-MAR-2019 15:39:15:312] nttbnd2addr: using host IP address: 10.10.10.10
[10-MAR-2019 15:39:18:314] nttbnd2addr: using host IP address: 13.13.13.13
[10-MAR-2019 15:39:21:315] nttbnd2addr: using host IP address: 15.15.15.15
[10-MAR-2019 15:39:24:318] nttbnd2addr: using host IP address: 13.13.13.13

From the above output we can conclude that addresses were chosen randomly.
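Reading the trace in groups of six suggests a little more structure: each complete pass seems to visit every address exactly once, in a shuffled order, which would also explain why 11.11.11.11 repeats right at a pass boundary. The Python sketch below checks this for the first 18 attempts copied from the trace above. It is only a check of this one trace, not a description of how Oracle Net is implemented, and the function name check_passes is hypothetical.

```python
def check_passes(attempts, addresses):
    """For each complete pass of len(addresses) attempts, report whether
    the pass visits every configured address exactly once."""
    size = len(addresses)
    results = []
    for start in range(0, len(attempts) - size + 1, size):
        chunk = attempts[start:start + size]
        results.append(sorted(chunk) == sorted(addresses))
    return results


# The six addresses from the CLIENT_CON alias.
addresses = [f"{n}.{n}.{n}.{n}" for n in range(10, 16)]

# First octet of the first 18 attempts in the LOAD_BALANCE=ON trace.
observed = [13, 14, 11, 15, 12, 10,
            15, 14, 13, 12, 10, 11,
            11, 14, 15, 12, 10, 13]
attempts = [f"{n}.{n}.{n}.{n}" for n in observed]

print(check_passes(attempts, addresses))
```

If every entry in the result is True, the order is random within a pass, but no address is skipped or repeated inside a pass.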

Let’s comment out the LOAD_BALANCE parameter in the connection string, or explicitly specify LOAD_BALANCE=OFF, and increase RETRY_COUNT to 5 to get a clearer picture.

[10-MAR-2019 15:53:08:108] nttbnd2addr: using host IP address: 10.10.10.10
[10-MAR-2019 15:53:11:109] nttbnd2addr: using host IP address: 11.11.11.11
[10-MAR-2019 15:53:14:110] nttbnd2addr: using host IP address: 12.12.12.12
[10-MAR-2019 15:53:17:111] nttbnd2addr: using host IP address: 13.13.13.13
[10-MAR-2019 15:53:20:112] nttbnd2addr: using host IP address: 14.14.14.14
[10-MAR-2019 15:53:23:114] nttbnd2addr: using host IP address: 15.15.15.15

[10-MAR-2019 15:53:26:117] nttbnd2addr: using host IP address: 10.10.10.10
[10-MAR-2019 15:53:29:120] nttbnd2addr: using host IP address: 11.11.11.11
[10-MAR-2019 15:53:32:123] nttbnd2addr: using host IP address: 12.12.12.12
[10-MAR-2019 15:53:35:124] nttbnd2addr: using host IP address: 13.13.13.13
[10-MAR-2019 15:53:38:127] nttbnd2addr: using host IP address: 14.14.14.14
[10-MAR-2019 15:53:41:131] nttbnd2addr: using host IP address: 15.15.15.15

[10-MAR-2019 15:53:44:132] nttbnd2addr: using host IP address: 10.10.10.10
[10-MAR-2019 15:53:47:135] nttbnd2addr: using host IP address: 11.11.11.11
[10-MAR-2019 15:53:50:139] nttbnd2addr: using host IP address: 12.12.12.12
[10-MAR-2019 15:53:53:142] nttbnd2addr: using host IP address: 13.13.13.13
[10-MAR-2019 15:53:56:144] nttbnd2addr: using host IP address: 14.14.14.14
[10-MAR-2019 15:53:59:147] nttbnd2addr: using host IP address: 15.15.15.15

[10-MAR-2019 15:54:02:150] nttbnd2addr: using host IP address: 10.10.10.10
[10-MAR-2019 15:54:05:153] nttbnd2addr: using host IP address: 11.11.11.11
[10-MAR-2019 15:54:08:156] nttbnd2addr: using host IP address: 12.12.12.12
[10-MAR-2019 15:54:11:159] nttbnd2addr: using host IP address: 13.13.13.13
[10-MAR-2019 15:54:14:160] nttbnd2addr: using host IP address: 14.14.14.14
[10-MAR-2019 15:54:17:161] nttbnd2addr: using host IP address: 15.15.15.15

[10-MAR-2019 15:54:20:164] nttbnd2addr: using host IP address: 10.10.10.10
[10-MAR-2019 15:54:23:165] nttbnd2addr: using host IP address: 11.11.11.11
[10-MAR-2019 15:54:26:167] nttbnd2addr: using host IP address: 12.12.12.12
[10-MAR-2019 15:54:29:170] nttbnd2addr: using host IP address: 13.13.13.13
[10-MAR-2019 15:54:32:171] nttbnd2addr: using host IP address: 14.14.14.14
[10-MAR-2019 15:54:35:174] nttbnd2addr: using host IP address: 15.15.15.15

[10-MAR-2019 15:54:38:175] nttbnd2addr: using host IP address: 10.10.10.10
[10-MAR-2019 15:54:41:178] nttbnd2addr: using host IP address: 11.11.11.11
[10-MAR-2019 15:54:44:182] nttbnd2addr: using host IP address: 12.12.12.12
[10-MAR-2019 15:54:47:184] nttbnd2addr: using host IP address: 13.13.13.13
[10-MAR-2019 15:54:50:187] nttbnd2addr: using host IP address: 14.14.14.14
[10-MAR-2019 15:54:53:190] nttbnd2addr: using host IP address: 15.15.15.15

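If you turn off LOAD_BALANCE, the addresses are tried in the order they are listed, and the list is traversed again from the top until one address succeeds or the retries are exhausted. In the trace above, RETRY_COUNT=5 produced 6 passes over the 6 addresses: the initial pass plus 5 retries, that is (RETRY_COUNT + 1) * #_of_addresses = 36 attempts.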
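The second trace shows 36 attempts for 6 addresses with RETRY_COUNT=5, so six passes in listed order. The Python sketch below models that observed behaviour for a connection that never succeeds. It is an assumption based on this trace only, not Oracle Net code, and the function name connection_attempts is hypothetical.

```python
def connection_attempts(addresses, retry_count):
    """Yield addresses in the order a failing connect would try them.

    Assumption (from the trace above): one initial pass over the list
    in listed order, followed by retry_count further passes.
    """
    for _ in range(retry_count + 1):
        for address in addresses:
            yield address


addresses = [f"{n}.{n}.{n}.{n}" for n in range(10, 16)]
attempts = list(connection_attempts(addresses, retry_count=5))

print(len(attempts))   # total attempts before giving up
print(attempts[:7])    # first pass, then back to the first address
```

With TRANSPORT_CONNECT_TIMEOUT=3 and every address unreachable, this model gives roughly 36 * 3 seconds before the client gives up, which is in line with the timestamps in the trace.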