“kernel: serial8250: too much work for irq4” potential problem caused by Azure OMS Agent

Problem:

/var/log/messages contains many “kernel: serial8250: too much work for irq4” warnings, and the system is likely to experience stability problems. This can also lead to Oracle cluster node evictions.
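To get a feel for how often the warning fires, the matching lines in the log can be counted. This is only a minimal sketch: the helper name count_irq_warnings is made up for illustration, and the log path is passed in as an argument.

```shell
# Count "too much work for irq4" kernel warnings in a log file.
# count_irq_warnings is an illustrative helper, not part of any tool.
count_irq_warnings() {
    logfile="$1"
    # grep -c prints the number of matching lines (0 if none);
    # "|| true" keeps a zero-match result from looking like a failure.
    grep -c 'serial8250: too much work for irq4' "$logfile" || true
}
```

For example, run count_irq_warnings /var/log/messages as root on the affected VM.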

Cause:

The problem was caused by the Azure OMS Agent pushing very large messages to the serial console. It was introduced by the latest update of the Azure OMS agent.

Temporary Solution:

Temporarily remove OMS Linux Agent Extension until Microsoft resolves this bug:

1. In the Azure portal, open the affected VM.
2. Click the “Extensions” section.
3. Click the OMS Linux Agent in the list.
4. Click the “Uninstall” button at the top.

Once you have confirmed that the OMS agent bug is fixed (verify this with Microsoft support), you can reinstall the extension.

Detach diskgroup from 12c GI and attach to 19c GI

Task:

We have two separate Real Application Clusters, one 12c and the other 19c. We decided to migrate data from 12c to 19c by simply detaching all ASM disks from the source and attaching them to the destination.

Steps:

1. Connect to the 12c GI as the grid user and dismount the FRA diskgroup on all nodes:

[grid@rac1 ~]$ sqlplus  / as sysasm
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> alter diskgroup FRA dismount;
Diskgroup altered. 
[grid@rac2 ~]$ sqlplus  / as sysasm
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> alter diskgroup FRA dismount;
Diskgroup altered.

You can also use srvctl to stop the diskgroup with one command.
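The srvctl variant might look like the sketch below. It assumes the diskgroup is registered with CRS under the name FRA and that srvctl from the Grid home is on the PATH. The stop_diskgroup wrapper and the SRVCTL override are illustrative additions, not something from the original setup.

```shell
# Stop a diskgroup resource via srvctl (run as the grid user).
# SRVCTL can be overridden, e.g. SRVCTL=echo, to preview the command
# without touching the cluster.
stop_diskgroup() {
    dg="$1"
    "${SRVCTL:-srvctl}" stop diskgroup -diskgroup "$dg"
}
```

Calling stop_diskgroup FRA should dismount the diskgroup on the nodes where it is running; srvctl status diskgroup -diskgroup FRA can then be used to confirm.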

2. Detach the disks belonging to that diskgroup from the 12c cluster and attach them to the 19c cluster.

3. After the ASM disks are visible on the 19c cluster, connect as sysasm as the grid user and mount the diskgroup:

# Check that there is no FRA resource registered with CRS:

[root@rac1 ~]# crsctl status res -t |grep FRA

# Mount the diskgroup on all nodes

[grid@rac1 ~]$ sqlplus / as sysasm
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL> alter diskgroup FRA mount;
Diskgroup altered.
[grid@rac2 ~]$ sqlplus / as sysasm
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL> alter diskgroup FRA mount;
Diskgroup altered.

# FRA diskgroup resource will automatically be registered with CRS:

[root@rac1 ~]# crsctl status res -t |grep FRA
ora.FRA.dg(ora.asmgroup)

And data will be there…
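To double-check that the diskgroup really is mounted on every node, the state can be queried from gv$asm_diskgroup. The sketch below only prints the query so it can be piped into sqlplus; the helper name dg_state_sql is hypothetical.

```shell
# Print a SQL*Plus script that lists the state of one diskgroup
# on every ASM instance. dg_state_sql is an illustrative helper.
dg_state_sql() {
    dg="$1"
    cat <<EOF
set pagesize 100 linesize 120
select inst_id, name, state from gv\$asm_diskgroup where name = upper('$dg') order by inst_id;
exit
EOF
}
```

As the grid user this could be run as dg_state_sql FRA | sqlplus -s / as sysasm, expecting one MOUNTED row per ASM instance.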

Postfix: connect to gmail-smtp-in.l.google.com [2607:f8b0:400c:c0b::1a]:25: Network is unreachable

Problem:

I am not able to receive email alerts from the database server because the mail transfer agent is trying to connect to Google's SMTP server over IPv6, which fails.

# tail /var/log/maillog

Jun 12 15:35:10 rac1 postfix/smtp[19725]: connect to gmail-smtp-in.l.google.com [2607:f8b0:400c:c0b::1a]:25: Network is unreachable

Solution:

Configure Postfix not to use IPv6 by setting inet_protocols to ipv4 in /etc/postfix/main.cf:

[root@rac1 ~]# cat /etc/postfix/main.cf | grep inet_protocols
inet_protocols = ipv4
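The change itself can be made by hand in an editor or with postconf -e 'inet_protocols = ipv4'. As an alternative, here is a small sketch that edits a given main.cf directly and keeps a backup; the function name set_inet_protocols_ipv4 and the .bak suffix are my own choices for illustration.

```shell
# Force "inet_protocols = ipv4" in a Postfix main.cf (path given as $1).
# A copy of the original file is kept next to it with a .bak suffix.
set_inet_protocols_ipv4() {
    cf="$1"
    cp -p "$cf" "$cf.bak"
    if grep -q '^[[:space:]]*inet_protocols[[:space:]]*=' "$cf"; then
        # The parameter is already set: rewrite that line.
        sed 's/^[[:space:]]*inet_protocols[[:space:]]*=.*/inet_protocols = ipv4/' "$cf.bak" > "$cf"
    else
        # The parameter is not set yet: append it.
        printf '%s\n' 'inet_protocols = ipv4' >> "$cf"
    fi
}
```

Run as root, this would be set_inet_protocols_ipv4 /etc/postfix/main.cf.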

Restart Postfix and check the status:

[root@rac1 ~]# systemctl restart postfix

[root@rac1 ~]# systemctl status  postfix
 ● postfix.service - Postfix Mail Transport Agent
    Loaded: loaded (/usr/lib/systemd/system/postfix.service; enabled; vendor preset: disabled)
    Active: active (running) since Thu 2019-06-13 10:20:48 UTC; 52s ago
   Process: 17431 ExecStop=/usr/sbin/postfix stop (code=exited, status=0/SUCCESS)
   Process: 17449 ExecStart=/usr/sbin/postfix start (code=exited, status=0/SUCCESS)
   Process: 17445 ExecStartPre=/usr/libexec/postfix/chroot-update (code=exited, status=0/SUCCESS)
   Process: 17442 ExecStartPre=/usr/libexec/postfix/aliasesdb (code=exited, status=0/SUCCESS)
  Main PID: 17520 (master)
    Memory: 3.0M
    CGroup: /system.slice/postfix.service
            ├─17520 /usr/libexec/postfix/master -w
            ├─17521 pickup -l -t unix -u
            └─17522 qmgr -l -t unix -u
 Jun 13 10:20:48 rac1.example.com systemd[1]: Starting Postfix Mail Transport Agent…
 Jun 13 10:20:48 rac1.example.com postfix/postfix-script[17518]: starting the Postfix mail system
 Jun 13 10:20:48 rac1.example.com postfix/master[17520]: daemon started -- version 2.10.1, configuration /etc/postfix
 Jun 13 10:20:48 rac1.example.com systemd[1]: Started Postfix Mail Transport Agent

Upgrading/Installing TFA with OSWatcher

The whole process is simple and straightforward.
The post looks long, but most of it is command output.

1. Download TFA Collector – TFA with Database Support Tools Bundle from Doc ID 1513912.1

2. Place the downloaded zip file on rac1 and unzip it:

# cd /u01/app/sw
# ll
…
-rw-r--r-- 1 root root      264751391 Apr 25 19:18 TFA-LINUX_v19.2.1

# unzip TFA-LINUX_v19.2.1 

3. Install TFA:

[root@rac1 sw]# ./installTFA-LINUX 

TFA Installation Log will be written to File : /tmp/tfa_install_21556_2019_06_03-10_39_10.log
Starting TFA installation
 TFA Version: 192100 Build Date: 201904251105
 TFA HOME : /u01/app/12.2.0/grid/tfa/rac1/tfa_home
 Installed Build Version: 184100 Build Date: 201902262137
 TFA is already installed. Upgrading TFA
 TFA Upgrade Log : /u01/app/12.2.0/grid/tfa/rac1/tfapatch.log
 TFA will be upgraded on : 
 rac1
 rac2
 Do you want to continue with TFA Upgrade ? [Y|N] [Y]: Y
 Checking for ssh equivalency in rac2
 Node rac2 is not configured for ssh user equivalency
 SSH is not configured on these nodes : 
 rac2
 Do you want to configure SSH on these nodes ? [Y|N] [Y]: N
 Patching remote nodes using TFA Installer /u01/app/sw/installTFA-LINUX…
 Copying TFA Installer to rac2…
 lost connection
 Starting TFA Installer on rac2…
 Upgrading TFA on rac1 :
 Stopping TFA Support Tools…
 Shutting down TFA for Patching…
 Shutting down TFA
 Removed symlink /etc/systemd/system/multi-user.target.wants/oracle-tfa.service.
 Removed symlink /etc/systemd/system/graphical.target.wants/oracle-tfa.service.
 Successfully shutdown TFA..
 No Berkeley DB upgrade required
 Copying TFA Certificates…
 Starting TFA in rac1…
 Starting TFA..
 Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
 Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
 Waiting up to 100 seconds for TFA to be started..
 . . . . . 
 Successfully started TFA Process..
 . . . . . 
 TFA Started and listening for commands
 Enabling Access for Non-root Users on rac1…
 Connection refused!rac2
 RemoteUtil : Connection refused!rac2
 .------------------------------------------------------------.
 | Host | TFA Version | TFA Build ID         | Upgrade Status |
 +------+-------------+----------------------+----------------+
 | rac1 |  19.2.1.0.0 | 19210020190425110550 | UPGRADED       |
 | rac2 | -           | -                    | NOT UPGRADED   |
 '------+-------------+----------------------+----------------'

TFA was not upgraded on rac2 because I did not have SSH equivalency between the nodes for the root user.
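Whether a user has SSH equivalency to another node can be checked up front with a non-interactive ssh call. The sketch below is one way to do it; has_ssh_equivalency and the SSH override variable are hypothetical names used only for this example.

```shell
# Return 0 if passwordless SSH to the given host works for the current user.
# SSH can be overridden for a dry run; by default the real ssh client is used.
has_ssh_equivalency() {
    host="$1"
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    "${SSH:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1
}
```

For example, as root on rac1: has_ssh_equivalency rac2 || echo "no SSH equivalency to rac2".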

I could enable passwordless SSH authentication and TFA would be upgraded in one step, but for security reasons I will not enable it. Instead, I will install TFA manually on the second node:

4. Copy installation file to rac2 and install:

[root@rac2 ~]# mkdir -p /u01/app/sw/
[root@rac2 ~]# chmod -R 777 /u01/app/sw/
[root@rac2 ~]# su - oracle
[oracle@rac2 ~]$ cd /u01/app/sw/  
[oracle@rac2 sw]$ scp rac1:/u01/app/sw/installTFA-LINUX .
installTFA-LINUX                           100%  254MB 114.9MB/s   00:02 
[root@rac2 sw]# ./installTFA-LINUX 
 TFA Installation Log will be written to File : /tmp/tfa_install_15370_2019_06_03-10_50_05.log
 Starting TFA installation
 TFA Version: 192100 Build Date: 201904251105
 TFA HOME : /u01/app/12.2.0/grid/tfa/rac2/tfa_home
 Installed Build Version: 184100 Build Date: 201902262137
 TFA is already installed. Upgrading TFA
 TFA Upgrade Log : /u01/app/12.2.0/grid/tfa/rac2/tfapatch.log
 TFA-00002 Oracle Trace File Analyzer (TFA) is not running
 TFA-00002 Oracle Trace File Analyzer (TFA) is not running
 Unable to determine the status of TFA in other nodes.
 TFA will be upgraded on Node rac2:
 Do you want to continue with TFA Upgrade ? [Y|N] [Y]: 
 Upgrading TFA on rac2 :
 Stopping TFA Support Tools…
 Shutting down TFA for Patching…
 Shutting down TFA
 Removed symlink /etc/systemd/system/multi-user.target.wants/oracle-tfa.service.
 Removed symlink /etc/systemd/system/graphical.target.wants/oracle-tfa.service.
 . . . . . 
 . . . 
 Successfully shutdown TFA..
 No Berkeley DB upgrade required
 Copying TFA Certificates…
 Starting TFA in rac2…
 Starting TFA..
 Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
 Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
 Waiting up to 100 seconds for TFA to be started..
 . . . . . 
 Successfully started TFA Process..
 . . . . . 
 TFA Started and listening for commands
 Enabling Access for Non-root Users on rac2…
 .------------------------------------------------------------.
 | Host | TFA Version | TFA Build ID         | Upgrade Status |
 +------+-------------+----------------------+----------------+
 | rac2 |  19.2.1.0.0 | 19210020190425110550 | UPGRADED       |
 | rac1 |  19.2.1.0.0 | 19210020190425110550 | UPGRADED       |
 '------+-------------+----------------------+----------------'
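The summary table that the installer prints at the end can also be checked mechanically, which is handy when there are more than two nodes. A minimal sketch, assuming the table has the same layout as shown above; all_nodes_upgraded is an illustrative name.

```shell
# Read a TFA upgrade summary table on stdin.
# Succeeds only if at least one host row exists and every row says UPGRADED;
# prints "host: status" for each row that does not.
all_nodes_upgraded() {
    awk -F'|' '
        NF >= 5 {
            host = $2; status = $5
            gsub(/^ +| +$/, "", host)
            gsub(/^ +| +$/, "", status)
            if (host == "Host") next      # skip the header row
            rows++
            if (status != "UPGRADED") { print host ": " status; bad++ }
        }
        END { exit (rows == 0 || bad > 0) }
    '
}
```

Fed the first table from rac1, this prints rac2: NOT UPGRADED and returns a non-zero status; fed the table above, it returns success.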

5. Stop and start TFA on rac1 and rac2:

# tfactl stop
Sending stoptfa
Success
Stopping TFA from the Command Line
Nothing to do !
LCM is not running
TFA is running  - Will wait 5 seconds (up to 3 times)  
TFA-00104 Cannot establish connection with TFA Server. Please check TFA Certificates
Killing TFA running with pid 16627
. . . 
Successfully stopped TFA..

# tfactl start
TFA-00002 Oracle Trace File Analyzer (TFA) is not running
Starting TFA..
Waiting up to 100 seconds for TFA to be started..
Successfully started TFA Process..

# ps -ef|grep OSWatcher
root     14528     1  0 May30 ?        00:44:11 /bin/sh ./OSWatcher.sh
root     14850 14528  0 May30 ?        00:00:58 /bin/sh ./OSWatcherFM.sh 48 /home/fg/oswbb/archive

6. Check OSWatcher repository location:

# ll /home/fg/oswbb/archive

drwxr-xr-x 2 root root  202 May 30 22:00 oswcpuinfo
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswifconfig
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswiostat
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswmeminfo
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswmpstat
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswnetstat
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswnfsiostat
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswpidstat
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswpidstatd
drwxr-xr-x 2 root root    6 May 30 21:56 oswprvtnet
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswps
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswslabinfo
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswtop
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswvmstat
drwxr-xr-x 2 root root    6 May 30 21:56 oswxentop
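The archive path used in step 6 appears as the last argument of the OSWatcherFM.sh process in the ps output from step 5. If you do not want to read it off by eye, something like this sketch can extract it; oswatcher_archive_dir is a hypothetical helper name.

```shell
# Read `ps -ef` output on stdin and print the last argument of the first
# OSWatcherFM.sh process found (the archive directory in the output above).
oswatcher_archive_dir() {
    awk '/OSWatcherFM\.sh/ && !/awk/ { print $NF; exit }'
}
```

Typical use would be ll "$(ps -ef | oswatcher_archive_dir)".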

Rollback RU patches from 12c GI home using opatchauto

Junior DBAs will find these steps useful 🙂

Environment details:

Two-node Real Application Cluster.
Database version: 12.2.0.1
Applied RU: 16-04-2019

1. Check existing patches

[grid@rac1 ~]$  /u01/app/12.2.0/grid/OPatch/opatch lspatches
29314424;OCW APR 2019 RELEASE UPDATE 12.2.0.1.190416 (29314424)
29314339;Database Apr 2019 Release Update : 12.2.0.1.190416 (29314339)
29301676;ACFS APR 2019 RELEASE UPDATE 12.2.0.1.190416 (29301676)
28566910;TOMCAT RELEASE UPDATE 12.2.0.1.0(ID:180802.1448.S) (28566910)
26839277;DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277)
OPatch succeeded.

Note that all these patches are part of RU 16-04-2019.

2. Stop all database instances on the first node:

# srvctl stop instance -db orclA -i orclA1

3. Download Release Update 16-04-2019 (p29301687_122010_Linux-x86-64.zip), unzip it, and go to the unzipped patch location:

To roll back all these patches, it is easiest to have the unzipped Release Update 16-04-2019 patch on the server, since all the existing patches are part of it.

If you cannot download the zipped RU, you need to list all patch IDs explicitly during the rollback: opatchauto rollback -id 29314424,29314339,29301676,28566910,26839277
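If you do go the -id route, the comma-separated list does not have to be typed by hand. Here is a small sketch that builds it from the opatch lspatches output shown in step 1; the function name patch_ids_csv is made up for this example.

```shell
# Read `opatch lspatches` output on stdin and print the patch IDs
# (the number before the first ";") as one comma-separated list.
patch_ids_csv() {
    awk -F';' '/^[0-9]+;/ { ids = ids (ids ? "," : "") $1 } END { print ids }'
}
```

For example: /u01/app/12.2.0/grid/OPatch/opatch lspatches | patch_ids_csv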

Since I have the unzipped RU on rac1, I will do it the following way:

[root@rac1 ~]# cd /u01/app/sw/29301687

[root@rac1 29301687]# ll
 total 132
 drwxr-x--- 4 grid oinstall     48 Mar 25 01:09 26839277
 drwxr-x--- 4 grid oinstall     48 Mar 25 01:08 28566910
 drwxr-x--- 5 grid oinstall     62 Mar 25 01:03 29301676
 drwxr-x--- 4 grid oinstall     67 Mar 25 01:08 29314339
 drwxr-x--- 5 grid oinstall     62 Mar 25 01:06 29314424
 drwxr-x--- 2 grid oinstall   4096 Mar 25 01:03 automation
 -rw-rw-r-- 1 grid oinstall   5828 Mar 25 01:29 bundle.xml
 -rw-r--r-- 1 grid oinstall 120219 Apr 10 18:07 README.html
 -rw-r----- 1 grid oinstall      0 Mar 25 01:03 README.txt

4. Roll back the patches using opatchauto:

[root@rac1 29301687]# /u01/app/12.2.0/grid/OPatch/opatchauto rollback -oh /u01/app/12.2.0/grid
 ….
 ==Following patches were SUCCESSFULLY rolled back:
 Patch: /u01/app/sw/29301687/29314424
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log
 Patch: /u01/app/sw/29301687/29301676
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log
 Patch: /u01/app/sw/29301687/26839277
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log
 Patch: /u01/app/sw/29301687/28566910
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log
 Patch: /u01/app/sw/29301687/29314339
 Log: /u01/app/12.2.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-05-29_12-56-19PM_1.log

5. Start the database instance on the first node and shut down the instance on the second:

# srvctl start instance -db orclA -i orclA1
# srvctl stop instance -db orclA -i orclA2

6. Connect to the second node and repeat the same steps:

[root@rac2 ~]# cd /u01/app/sw/29301687

[root@rac2 29301687]# /u01/app/12.2.0/grid/OPatch/opatchauto rollback -oh /u01/app/12.2.0/grid

7. Start database instance on rac2

# srvctl start instance -db orclA -i orclA2

8. Check inventory

$  /u01/app/12.2.0/grid/OPatch/opatch lspatches

There are no Interim patches installed in this Oracle Home "/u01/app/12.2.0/grid".
 OPatch succeeded.