RMAN backup on NFS v3 takes too much time

Problem:

I was testing RMAN backup to Azure Blob Storage with NFSv3 access and noticed a huge delay even when backing up a small controlfile.

The same backup to local disk took 7 seconds, but over NFS it took 15 minutes. While watching I/O on the NFS mount point using nfsiostat, I saw that for the first 13 minutes there was no I/O at all; the actual backup was written only during the last minute.

Another interesting observation, this time from the netstat output, was the following:

[root@rac1 dbbackup2]# netstat -na|grep 10.0.0.
tcp 0 0 10.0.0.5:875 10.0.0.16:2048 ESTABLISHED
tcp 0 1 10.0.0.5:54997 10.0.0.16:2049 SYN_SENT

You can also turn debug on in RMAN before running the backup and check the output:

RMAN> RUN
{
  ALLOCATE CHANNEL disk1 DEVICE TYPE DISK FORMAT '/dataload/%U';
  debug on;
  BACKUP current controlfile;
  debug off;
}

Solution:

Disable dNFS.

$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk dnfs_off
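
Note that relinking changes the oracle binary, so the database should be shut down while you run make and started again afterwards. Once it is back up, a quick sanity check is to query v$dnfs_servers, which lists active dNFS connections; if it returns no rows, dNFS is no longer in play (a minimal check, not part of the original test):

$ sqlplus / as sysdba
SQL> SELECT svrname, dirname FROM v$dnfs_servers;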

Retry the RMAN backup; in my case it took only 1 second:

...
piece handle=/dbbackup/0q0rruhd_26_1_1 tag=TAG20220425T202309 comment=NONE
channel disk1: backup set complete, elapsed time: 00:00:01
Finished backup at 25-APR-22

Monitor NFS mount point IO performance

Problem:

I am configuring an RMAN backup of my Oracle database and redirecting backup sets to Azure Blob Storage with NFSv3 access (quite a new feature at the time, and one that needed testing). But I don't know what the write performance will be for this type of storage.

Solution:

One of the useful tools is nfsiostat; we will test it in this blog post.

1. After mounting Azure Blob Storage on my database node as the /dbbackup mount point, instead of triggering an RMAN backup I'd rather use the dd command this time:

$  dd if=/dev/zero of=/dbbackup/myfile oflag=direct bs=1M count=512000

2. Run nfsiostat with a 1-second interval and monitor the values:

[oracle@rac1 data]$ nfsiostat 1

marirmanstorage.blob.core.windows.net:/marirmanstorage/dbbackup mounted on /dbbackup:

   op/s	   rpc bklog
   0.11	   0.00
read:      ops/s    kB/s      kB/op.    retrans	 avg RTT (ms)	avg exe (ms)
	   0.000    0.000     0.000     0 (0.0%) 0.000	        0.000
write:     ops/s    kB/s      kB/op     retrans	 avg RTT (ms)	avg exe (ms)
	   18.000   18437.977 1024.332  0 (0.0%) 55.500	        55.611

For information, the interval specifies the amount of time in seconds between each report. The first report contains statistics for the time since each file system was mounted; each subsequent report contains statistics collected during the interval since the previous report.
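
If the node has several NFS mounts, you can pass the mount point as an argument to limit the report to the backup share only (syntax may vary slightly between nfs-utils versions, so check man nfsiostat on your system):

$ nfsiostat 1 /dbbackup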

3. Cancel the dd operation; you will also get information about the transfer speed:

[oracle@rac1 dbbackup]$  dd if=/dev/zero of=/dbbackup/myfile oflag=direct bs=1M count=512000
^C
124+0 records in
124+0 records out
130023424 bytes (130 MB) copied, 6.85939 s, 19.0 MB/s

More information about nfsiostat can of course be found using man nfsiostat.

Database Express Setup: This site can not be reached

Note: These steps are for TEST non-production databases only.

Configuring/enabling EM Express on HTTPS should be simple; you only need to run:

SQL> exec DBMS_XDB_CONFIG.SETHTTPSPORT(5500);
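
You can confirm the port was registered by querying it back with the counterpart getter function:

SQL> SELECT DBMS_XDB_CONFIG.GETHTTPSPORT() FROM dual;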

But after configuring it, you may still not be able to access the web page; the browser shows the error "This site can't be reached".

The reason for this can be permissions on the wallet files:

[grid@rac1 ~]$ ll /u01/app/oracle/product/19.3.0/dbhome_1/admin/orcl/xdb_wallet
total 8
-rw------- 1 oracle asmadmin 3864 Apr 15 17:40 cwallet.sso
-rw------- 1 oracle asmadmin 3819 Apr 15 17:40 ewallet.p12

Normally, permission 600 on these files is correct, but when the database sits on top of ASM with role separation, the group should also have read permission on them (i.e. 640):

[root@rac1 ~]# chmod 640 /u01/app/oracle/product/19.3.0/dbhome_1/admin/orcl/xdb_wallet/*

After changing it, the web page is displayed.
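
For reference, the EM Express login page is served at a URL of the following form (the hostname here is illustrative; use your listener host and the HTTPS port you configured):

https://rac1.example.com:5500/em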

Install Google Chrome on Oracle Linux 7.9 using the terminal

There are several ways to do that; I found the simplest one (I hope) and want to share it with you:

0. Create the repo file:

# vi /etc/yum.repos.d/google-chrome.repo

[google-chrome]
name=google-chrome
baseurl=https://dl.google.com/linux/chrome/rpm/stable/x86_64
enabled=1
gpgcheck=1
gpgkey=https://dl.google.com/linux/linux_signing_key.pub

1. Enable the ol7_optional_latest repo for the vulkan dependency:

# yum-config-manager --enable ol7_optional_latest

2. Install google-chrome-stable package:

# yum install google-chrome-stable -y

3. Run:

$ google-chrome

Or in the background:

$ google-chrome &

The window will come up in VNC or X Window, whichever you've configured before.
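
If you just want to confirm the package installed cleanly without opening a window, the --version flag is enough:

$ google-chrome --version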

Monitoring ASM disk performance using IOSTAT

iostat in asmcmd displays I/O statistics for Oracle ASM disks in mounted disk groups.

Connect to the database node as the GI owner:

# su - grid

Run iostat with the following options (Reads & Writes are in bytes):

# asmcmd
ASMCMD> iostat -t -G FRA 5
Group_Name  Dsk_Name   Reads      Writes    Read_Time  Write_Time
FRA         RAC1$LUN3  585083392  98942464  94.659862  4.03044
FRA         RAC2$LUN3  1847296    98942464  .054822    4.134049
FRA         RACQ$LUN4  57344      24576     .035944    .018594

Group_Name  Dsk_Name   Reads      Writes  Read_Time  Write_Time
FRA         RAC1$LUN3  368640.00  0.00    0.01       0.00
FRA         RAC2$LUN3  0.00       0.00    0.00       0.00
FRA         RACQ$LUN4  0.00       0.00    0.00       0.00

Where:
-t displays time statistics (Read_Time, Write_Time).
-G FRA displays statistics for the FRA disk group; change the disk group name according to your needs.
5 is the refresh interval in seconds. When an interval is specified, the value displayed (bytes or I/Os) is the difference between the previous and current values, not the total. If no interval is specified, the number displayed represents the total number of bytes or I/Os.
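
The error counters can be added in the same way; for example, a variation (per the synopsis below) that shows error and time statistics for the same disk group every 10 seconds:

ASMCMD> iostat -et -G FRA 10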

For synopsis and description about all available iostat options, run help:

ASMCMD> help iostat
iostat
        Displays I/O statistics for Oracle ASM disks in mounted disk groups.

Synopsis
        iostat [-et][--io] [--suppressheader] [--region] [-G <diskgroup>] [<interval>]

Description
        iostat lists disk group statistics using the V$ASM_DISK_STAT view.
        The options for the iostat command are described below.
        -e		- Displays error statistics (Read_Err, Write_Err).
        -G diskgroup	- Displays statistics for the disk group name.
        --suppressheader	- Suppresses column headings.
        --io		- Displays information in number of I/Os, instead
                          of bytes.
        -t		- Displays time statistics (Read_Time, Write_Time).
        --region	- Displays information for cold and hot disk regions
                          (Cold_Reads, Cold_Writes, Hot_Reads, Hot_Writes).
        interval	- Refreshes the statistics display based on the
                          interval value (seconds).
        The attribute descriptions for iostat command output are described
	below. To view the complete set of statistics for a disk group,
	use the V$ASM_DISK_STAT view.
        Group_Name	        Name of the disk group.
        Dsk_Name	        Name of the disk.
        Reads	        	Total number of bytes read from the disk.
				If the --io option is entered, then the value
				is displayed as number of I/Os.
        Writes	        	Total number of bytes written to the disk.
				If the --io option is entered, then the value
				is displayed as number of I/Os.
        Cold_Reads	        Total number of bytes read from the cold disk
				region. If the --io option is entered, then
				the value is displayed as number of I/Os.
        Cold_Writes	        Total number of bytes written to the cold
				disk region. If the --io option is entered,
				then the value is displayed as number of I/Os.
        Hot_Reads	        Total number of bytes read from the hot
				disk region. If the --io option is entered,
				then the value is displayed as number of I/Os.
        Hot_Writes	        Total number of bytes written to the hot disk
				region. If the --io option is entered, then the
				value is displayed as number of I/Os.
        Read_Err	        Total number of failed I/O read requests for
				the disk.
        Write_Err	        Total number of failed I/O write requests for
				the disk.
        Read_Time	        Total I/O time (in seconds) for
				read requests for the disk if the
				TIMED_STATISTICS initialization parameter is
				set to TRUE (0 if set to FALSE).
        Write_Time	        Total I/O time (in seconds) for
				write requests for the disk if the
				TIMED_STATISTICS initialization parameter is
				set to TRUE (0 if set to FALSE).
        If a refresh interval is not specified, the number displayed represents
        the total number of bytes or I/Os. If a refresh interval is specified,
        then the value displayed (bytes or I/Os) is the difference between the
        previous and current values, not the total value.

Examples
        The following are examples of the iostat command. The first example
        displays disk I/O statistics for the data disk group in total number
        of bytes. The second example displays disk I/O statistics for the data
        disk group in total number of I/O operations.
        ASMCMD [+] > iostat -G data
        Group_Name  Dsk_Name   Reads       Writes
        DATA        DATA_0000  180488192   473707520
        DATA        DATA_0001  1089585152  469538816
        DATA        DATA_0002  191648256   489570304
        DATA        DATA_0003  175724032   424845824
        DATA        DATA_0004  183421952   781429248
        DATA        DATA_0005  1102540800  855269888
        DATA        DATA_0006  171290624   447662592
        DATA        DATA_0007  172281856   361337344
        DATA        DATA_0008  173225472   390840320
        DATA        DATA_0009  288497152   838680576
        DATA        DATA_0010  196657152   375764480
        DATA        DATA_0011  436420096   356003840
        ASMCMD [+] > iostat --io -G data
        Group_Name  Dsk_Name   Reads  Writes
        DATA        DATA_0000  2801   34918
        DATA        DATA_0001  58301  35700
        DATA        DATA_0002  3320   36345
        DATA        DATA_0003  2816   10629
        DATA        DATA_0004  2883   34850
        DATA        DATA_0005  59306  38097
        DATA        DATA_0006  2151   10129
        DATA        DATA_0007  2686   10376
        DATA        DATA_0008  2105   8955
        DATA        DATA_0009  9121   36713
        DATA        DATA_0010  3557   8596
        DATA        DATA_0011  17458  9269

DBT-06103 The port (1,521) is already used

Thanks to Tornike Kupatadze for this test case!

Problem:

During my OCA class, after a successful 19c database software installation, we were creating a database using dbca and got the error from the title: DBT-06103 The port (1,521) is already used.

Had the listener already been configured, we would instead have gotten the error DBT-06103 The port (5,500) is already used while configuring EM Express. Either way, the solution is the same.

We checked, and the port was not in use:

# netstat -a |grep 1521

Reason:

The hostname was not resolvable. The reason in our case was that /etc/hosts did not contain an entry for this server.

Solution:

Add the hostname entry to /etc/hosts:
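
An entry of the following form is what is needed (the IP address and hostname here are illustrative values, not from the original case; substitute your server's own):

# vi /etc/hosts

# example entry: replace with your server's IP, FQDN and short hostname
10.0.0.5   rac1.example.com   rac1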

After adding the above entry, we were able to continue.

ORA-15120: ASM file name ‘ORA-27090: Unable to reserve kernel resources f’ does not begin with the ASM prefix character

Problem:

The customer had created 36 databases on the same server and, while creating the 37th using dbca, got the error from the title: ORA-15120: ASM file name 'ORA-27090: Unable to reserve kernel resources f' does not begin with the ASM prefix character.

Reason:

The fs.aio-max-nr value was set too low in /etc/sysctl.conf. In general, the value 3145728 that was set in their case suits many environments, but as the number of databases on the server grows, this parameter should be adjusted accordingly.

Solution:

The formula used while calculating the value for this parameter is the following:

aio-max-nr = no of process per DB * no of databases * 4096

In their case, the number of processes per DB was 1000 and the number of databases planned to be created was 80. Based on the above formula, the value should be:

aio-max-nr = 1000 * 80 * 4096 = 327680000

  1. Add/update the value in /etc/sysctl.conf:
# vim /etc/sysctl.conf

fs.aio-max-nr = 327680000


2. Run /sbin/sysctl -p to immediately enforce the changes:

# sysctl -p 
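
To double-check that the new limit is in effect, and to see how many AIO requests are currently allocated, you can query the kernel directly:

# sysctl fs.aio-max-nr
# cat /proc/sys/fs/aio-nr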

Delete the already created files and recreate the database; it will succeed this time.

RMAN spread a backup job between many RAC instances in parallel to increase throughput

There are two options to allocate RMAN channels on different RAC instances to increase the throughput.

I will start with the option that ensures each RAC instance gets one channel. The other option does load balancing, but in a random way, so with a small number of channels you may see all of them allocated on one instance. Decide which option is better for your case.

The test is done on a 2-node cluster.

  1. Configure parallelism and two channels in RMAN, indicating a connect string for each instance:
$ rman target /
RMAN> CONFIGURE DEVICE TYPE disk PARALLELISM 2;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL1 as sysdba';
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL2 as sysdba';
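
You can double-check the resulting persistent settings with SHOW ALL, which lists every CONFIGURE parameter, including the two channels:

RMAN> SHOW ALL;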

2. Define ORCL1 and ORCL2 aliases on each database node under $ORACLE_HOME/network/admin/tnsnames.ora:

ORCL1=
(DESCRIPTION=
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac1.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )
 
ORCL2=
(DESCRIPTION=
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac2.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )

Please note that in my case 1522 is the local listener port.
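
It is also worth making sure both aliases resolve and reach the listeners from each node before starting the backup; tnsping is enough for a quick check:

$ tnsping ORCL1
$ tnsping ORCL2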

3. Run backup:

RMAN> backup database;
 
Starting backup at 22-MAR-22
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=507 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=753 instance=orcl2 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=+DATA/ORCL/DATAFILE/system.257.1098460673
input datafile file number=00004 name=+DATA/ORCL/DATAFILE/undotbs1.259.1098460743
channel ORA_DISK_1: starting piece 1 at 22-MAR-22
channel ORA_DISK_2: starting full datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00003 name=+DATA/ORCL/DATAFILE/sysaux.258.1098460717
input datafile file number=00005 name=+DATA/ORCL/DATAFILE/undotbs2.265.1098461311
input datafile file number=00007 name=+DATA/ORCL/DATAFILE/users.260.1098460743

As you see, two files were backed up by the 1st channel (1st instance) and the other three files by the 2nd channel (2nd instance).

Now let's look at the other possible variant:

  1. Configure one TNS string with the load balancing parameter:
ORCL_BALANCE=
     (DESCRIPTION=
           (TRANSPORT_CONNECT_TIMEOUT=3) (RETRY_COUNT=6)(LOAD_BALANCE=on)
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac1.example.com) (PORT=1522))
           (ADDRESS= (PROTOCOL=tcp) (HOST=rac2.example.com) (PORT=1522))
           (CONNECT_DATA =
              (SERVER = DEDICATED)
              (SERVICE_NAME = orcl)
           )
      )

Both node addresses are defined, and Oracle will pick one of them randomly.

2. Configure parallelism and one channel with ORCL_BALANCE string:

Please note that I did this test on the same server where I had already defined CHANNEL 1 and CHANNEL 2, so I had to clear them first:

RMAN> CONFIGURE CHANNEL DEVICE TYPE DISK clear;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK clear;
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE DISK clear;

Define channel and parallelism:

RMAN> CONFIGURE DEVICE TYPE disk PARALLELISM 2;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT 'sys/Oracle123@ORCL_BALANCE as sysdba';

RMAN> backup database;
 
Starting backup at 22-MAR-22
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=752 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=138 instance=orcl1 device type=DISK

Both channels were allocated on orcl1. Try one more time or, better, configure parallelism 3; the CHANNEL parameter is already defined and remains in effect until changed:

RMAN>  CONFIGURE DEVICE TYPE disk PARALLELISM 3;
 
RMAN> backup database;
 
Starting backup at 22-MAR-22
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=752 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=138 instance=orcl1 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=753 instance=orcl2 device type=DISK

As you can see, the choice really is random: this time two channels were allocated on the 1st node and the third on the 2nd node.

I think the random algorithm is not the best option here, but you know best which variant is appropriate in your case.

Change default kernel using grubby Tool

There are several ways to accomplish this task; I am providing one of them.

  1. Check the currently loaded kernel:
# uname -r
5.4.17-2036.101.2.el7uek.x86_64

2. Find all available kernels on your system and locate their index numbers:

# grubby --info=ALL
index=0
kernel=/boot/vmlinuz-5.4.17-2036.101.2.el7uek.x86_64
args="ro console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 numa=off transparent_hugepage=never net.ifnames=0"
root=/dev/mapper/rootvg-rootlv
initrd=/boot/initramfs-5.4.17-2036.101.2.el7uek.x86_64.img
title=Oracle Linux Server 7.9, with Unbreakable Enterprise Kernel 5.4.17-2036.101.2.el7uek.x86_64

index=1
kernel=/boot/vmlinuz-3.10.0-1160.42.2.el7.x86_64
args="ro console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 numa=off transparent_hugepage=never net.ifnames=0"
root=/dev/mapper/rootvg-rootlv
initrd=/boot/initramfs-3.10.0-1160.42.2.el7.x86_64.img
title=Oracle Linux Server 7.9, with Linux 3.10.0-1160.42.2.el7.x86_64

index=2
kernel=/boot/vmlinuz-0-rescue-d3dd3af16fd242cebb997c6041d68ad3
args="ro console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 numa=off transparent_hugepage=never net.ifnames=0"
root=/dev/mapper/rootvg-rootlv
initrd=/boot/initramfs-0-rescue-d3dd3af16fd242cebb997c6041d68ad3.img

3. Check the default kernel index using the grubby tool (we could actually infer the same from the 1st and 2nd steps, but let's check it one more time):

# grubby --default-index
0

4. Change the default kernel; in my case I want to set it to vmlinuz-3.10.0-1160.42.2.el7.x86_64, and its index number is 1:

# grubby --set-default 1
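
Before rebooting, you can confirm the change took effect; the default index should now report 1, and --default-kernel should print the path of the 3.10 kernel:

# grubby --default-index
# grubby --default-kernel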

5. Reboot the system and check the kernel again:

# reboot
# uname -r
3.10.0-1160.42.2.el7.x86_64

OPATCHAUTO-72050: System instance creation failed, cluvfy fails Verifying ‘/tmp/’ …FAILED (PRVF-7546)

Problem:

While running cluvfy we get the following error:

[grid@rac1 grid]$ ./runcluvfy.sh stage -pre crsinst -n rac1,rac2 -verbose
...
Failures were encountered during execution of CVU verification request "stage -pre crsinst".
...
Verifying '/tmp/' ...FAILED
Cannot run program "/usr/bin/scp": error=13, Permission denied

You should not continue patching when runcluvfy fails; it is strongly recommended to resolve all failed items and only then run opatchauto. But if you insist on patching anyway, you will get the following:

[root@rac1 33509923]# /u01/app/19.3.0/grid/OPatch/opatchauto apply /u01/app/patchinstall/33509923 -oh /u01/app/19.3.0/grid
OPatchauto session is initiated at Sun Mar 20 06:27:17 2022

System initialization log file is /u01/app/19.3.0/grid/cfgtoollogs/opatchautodb/systemconfig2022-03-20_06-27-42AM.log.
...
OPATCHAUTO-72050: System instance creation failed.
OPATCHAUTO-72050: Failed while retrieving system information.
OPATCHAUTO-72050: Please check log file for more details.

Reason:

The issue in our case was that /usr/bin/scp did not have the correct permissions: it had 600 while it should have had 755. Why this happened? We don't really know… it should not happen.

Solution:

Set the correct permissions on the scp binary on both nodes:

# chmod 755 /bin/scp
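
A quick check that the fix is in place on each node (the mode should now show as rwxr-xr-x); it is also worth re-running runcluvfy before retrying opatchauto:

# ls -l /usr/bin/scp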

Retry patching; in our case it was successful this time.