FlashGrid SkyCluster Now Supports Oracle Database 19c

FlashGrid SkyCluster Version 19.06 now has full support for GI/DB 19c. This means that using the FlashGrid launcher tool (https://www.flashgrid.io/skycluster-for-aws/#launch , https://www.flashgrid.io/skycluster-for-azure/#launch , https://www.flashgrid.io/skycluster-for-gcp/#launch ), you can set up a multi-node Real Application Clusters environment in the cloud automatically in about 2 hours.

“Oracle 19c is a long-term support release from Oracle with extended support available through 2026”

https://www.kb.flashgrid.io/release-notes/cloud-provisioning

sshd: /etc/ssh/sshd_config: Permission denied

Problem:

The sshd and chronyd services on the database server were in a failed state and not able to start because of a permission problem on their configuration files. Permissions on these files were correct and the services should have been able to start, so there was something else… let’s dig into the details.

# systemctl status sshd
 ● sshd.service - OpenSSH server daemon
    Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
    Active: activating (auto-restart) (Result: exit-code) since Tue 2019-07-09 12:21:49 UTC; 32s ago
      Docs: man:sshd(8)
            man:sshd_config(5)
   Process: 124026 ExecStart=/usr/sbin/sshd -D $OPTIONS (code=exited, status=1/FAILURE)
Main PID: 124026 (code=exited, status=1/FAILURE)
Jul 09 12:21:49 node03 systemd[1]: Failed to start OpenSSH server daemon.
Jul 09 12:21:49 node03 systemd[1]: Unit sshd.service entered failed state.
Jul 09 12:21:49 node03 systemd[1]: sshd.service failed

`journalctl -xe` shows:

-- Unit sshd.service has begun starting up.
Jul 09 12:26:03 node03 sshd[129121]: /etc/ssh/sshd_config: Permission denied
Jul 09 12:26:03 node03 systemd[1]: sshd.service: main process exited, code=exited, status=1/FAILURE
Jul 09 12:26:03 node03 systemd[1]: Failed to start OpenSSH server daemon.
-- Subject: Unit sshd.service has failed

The same problem was happening with the chronyd service: it was complaining about the /etc/chrony.conf file. Incorrect time on database servers can cause node evictions.
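You can confirm the analogous failure the same way (the error mirrors the sshd one, pointing at /etc/chrony.conf):

# journalctl -u chronyd --no-pager | tail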

Reason:

Since permissions on these files are correct, the next suspect is SELinux. Let’s check:

# getenforce 
Enforcing

Solution:

Disable SELinux and reboot the server:

# vim /etc/selinux/config
SELINUX=disabled

# reboot
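If you would rather keep SELinux enabled, it is worth checking first whether the file labels were simply clobbered (for example, by configs restored from a backup or a home directory). A minimal sketch, assuming the default policy is intact:

# ls -Z /etc/ssh/sshd_config /etc/chrony.conf
# restorecon -v /etc/ssh/sshd_config /etc/chrony.conf
# systemctl restart sshd chronyd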

Summary:

I consider SELinux an undesirable service on database servers. But I appreciate the opinion of my colleagues and friends, and I want to share it with you.

SELinux can be enabled with the correct config in RHEL 4, 5, and 6 – “Starting with Oracle Database 11g Release 2 (11.2), the Security Enhanced Linux (SELinux) feature is supported for Oracle Linux 4, Oracle Linux 5, Oracle Linux 6, Red Hat Enterprise Linux 4, Red Hat Enterprise Linux 5, and Red Hat Enterprise Linux 6.”
https://docs.oracle.com/cd/E11882_01/install.112/e47689/pre_install.htm#LADBI1092

SELinux is a good security tool, and usually I only disable it as a last resort or when the software doesn’t support it.

PRCS-1007 : Server pool RPPCOMS already exists

Problem:

While adding a database using the srvctl add database command, we got PRCS-1007 and PRCR-1086 errors:

$ srvctl add database -db RPPCOMS -oraclehome /u01/app/oracle/product/12.1.0/dbhome_1 -spfile +DATA/RPPCOMS/spfileRPPCOMS.ora -pwfile +DATA/RPPCOMS/PASSWORD/pwdrppcoms.256.1005727427 -dbname RPPCOMS  -diskgroup FRA,DATA

PRCS-1007 : Server pool RPPCOMS already exists
PRCR-1086 : server pool ora.RPPCOMS is already registered

Solution:

Remove the mentioned server pool as the root user:

# crsctl delete serverpool ora.RPPCOMS
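
You can verify the pool is gone by listing the registered server pools:

# crsctl status serverpool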

Then retry adding the database.

19c GI & 12c RDBMS opatchauto: Re-link fails on target “procob”

Problem:

Environment: 19c GI | 12c RDBMS | RHEL 7.6

While applying the Apr 2019 RU on top of the RDBMS home, opatchauto failed:

# /u01/app/oracle/product/12.2.0/dbhome_1/OPatch/opatchauto apply /u01/swtmp/29314339/ -oh /u01/app/oracle/product/12.2.0/dbhome_1 

==Following patches FAILED in apply:
Patch: /u01/swtmp/29314339
Log: /u01/app/oracle/product/12.2.0/dbhome_1/cfgtoollogs/opatchauto/core/opatch/opatch2019-06-24_18-51-22PM_1.log
Reason: Failed during Patching: oracle.opatch.opatchsdk.OPatchException: Re-link fails on target "procob".
Re-link fails on target "proc". 

We tried using opatch instead of opatchauto, and it succeeded:

$ /u01/app/oracle/product/12.2.0/dbhome_1/OPatch/opatch apply

Patch 29314339 successfully applied.
OPatch succeeded.

The problem is definitely related to opatchauto. We’ve opened an SR with Oracle, and I will update this post as soon as I get the solution from them. Until then, I want to share three workarounds for this problem with you.

Workarounds:

1. The first workaround, as you have already guessed, is to use opatch instead of opatchauto. Run it from the unzipped patch directory (here /u01/swtmp/29314339) with the RDBMS home’s environment set:

$ /u01/app/oracle/product/12.2.0/dbhome_1/OPatch/opatch apply

2. The second workaround is to edit the actions.xml file under the PSU (e.g. /u01/swtmp/29314339/etc/config/actions.xml) and remove the following entries (see Doc ID 2056670.1):

<oracle.precomp.lang opt_req="O" version="12.2.0.1.0">
<make change_dir="%ORACLE_HOME%/precomp/lib" make_file="ins_precomp.mk" make_target="procob"/>
</oracle.precomp.lang>
  
<oracle.precomp.common opt_req="O" version="12.2.0.1.0">
<make change_dir="%ORACLE_HOME%/precomp/lib" make_file="ins_precomp.mk" make_target="proc"/>
</oracle.precomp.common> 

Retry opatchauto.

3. The third workaround is to back up the libons.so file in the GI home and then copy the one from the RDBMS home in its place (you should restore the original after opatchauto succeeds).

# mv  /u01/app/19.3.0/grid/lib/libons.so  /u01/app/19.3.0/grid/lib/libons.so_backup 
# cp  /u01/app/oracle/product/12.2.0/dbhome_1/lib/libons.so  /u01/app/19.3.0/grid/lib/libons.so  

The reason I’ve used this workaround is that I found opatchauto was using the libons.so file from the GI home instead of the RDBMS home:

[WARNING]OUI-67200:Make failed to invoke "
/usr/bin/make -f ins_precomp.mk proc ORACLE_HOME=/u01/app/oracle/product/12.2.0/dbhome_1"….
'/u01/app/19.3.0/grid/lib/libons.so: undefined reference to `memcpy@GLIBC_2.14'                         

If we compare the libons.so files between the GI and RDBMS homes, we will find that they are not the same. The problem is that opatchauto uses that file from the wrong location. Since it is not easy to find a way to force opatchauto to use the libons.so file from the correct location (working on this with Oracle Support), this workaround can also be considered.
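
A quick way to confirm the two files differ is to compare checksums (here against the saved backup, since the GI copy has already been swapped):

# md5sum /u01/app/19.3.0/grid/lib/libons.so_backup /u01/app/oracle/product/12.2.0/dbhome_1/lib/libons.so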

Please keep in mind that you must put the GI libons.so back after opatchauto succeeds. We don’t know what happens if we leave the 12c libons.so file under the 19c GI home.
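
Restoring it is simply the reverse of the earlier move:

# mv /u01/app/19.3.0/grid/lib/libons.so_backup /u01/app/19.3.0/grid/lib/libons.so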

“kernel: serial8250: too much work for irq4” potential problem caused by Azure OMS Agent

Problem:

There are a lot of “kernel: serial8250: too much work for irq4” warnings in /var/log/messages, and your system is likely experiencing stability problems. This can lead to Oracle cluster node evictions.
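
To gauge how noisy the console has become, you can count the occurrences (assuming a RHEL-style syslog location):

# grep -c 'serial8250: too much work for irq4' /var/log/messages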

Cause:

The problem was related to the Azure OMS Agent pushing very large messages to the serial console. It was introduced by the latest update of the agent.

Temporary Solution:

Temporarily remove OMS Linux Agent Extension until Microsoft resolves this bug:

1. On Azure portal click the link of the affected VM.
2. Click the “Extensions” section.
3. Click the OMS Linux Agent in the list.
4. Click the “Uninstall” button at the top.

Once you have made sure that the OMS agent bug is fixed (verify with Microsoft support), you can reinstall the plugin.
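
If you prefer the command line, the same removal can be scripted with the Azure CLI. A sketch, assuming the extension is registered under its usual name OmsAgentForLinux (myRG and myVM are placeholders for your resource group and VM name):

$ az vm extension delete --resource-group myRG --vm-name myVM --name OmsAgentForLinux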

Detach diskgroup from 12c GI and attach to 19c GI

Task:

We have two separate Real Application Clusters, one 12c and another 19c. We decided to migrate data from 12c to 19c by simply detaching all ASM disks from the source and attaching them to the destination.

Steps:

1. Connect to the 12c GI as the grid user and dismount the FRA diskgroup on all nodes:

[grid@rac1 ~]$ sqlplus  / as sysasm
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> alter diskgroup FRA dismount;
Diskgroup altered. 
[grid@rac2 ~]$ sqlplus  / as sysasm
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> alter diskgroup FRA dismount;
Diskgroup altered.

You can also use srvctl to stop the diskgroup on all nodes with one command.
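
For example, as the grid user:

[grid@rac1 ~]$ srvctl stop diskgroup -diskgroup FRA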

2. Detach disks belonging to the specific diskgroup from 12c cluster and attach to 19c cluster.

3. After the ASM disks are visible on the 19c cluster, connect as sysasm via the grid user and mount the diskgroup:

# Check that there is no FRA resource registered with CRS:

[root@rac1 ~]# crsctl status res -t |grep FRA

# Mount the diskgroup on all nodes

[grid@rac1 ~]$ sqlplus / as sysasm
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL> alter diskgroup FRA mount;
Diskgroup altered.
[grid@rac2 ~]$ sqlplus / as sysasm
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL> alter diskgroup FRA mount;
Diskgroup altered.

# FRA diskgroup resource will automatically be registered with CRS:

[root@rac1 ~]# crsctl status res -t |grep FRA
ora.FRA.dg(ora.asmgroup)

And data will be there…

Postfix: connect to gmail-smtp-in.l.google.com [2607:f8b0:400c:c0b::1a]:25: Network is unreachable

Problem:

I am not able to receive email alerts from the database server, because the message transfer agent is trying to connect to the Google SMTP servers via IPv6, which fails.

# tail /var/log/maillog

Jun 12 15:35:10 rac1 postfix/smtp[19725]:connect to 
gmail-smtp-in.l.google.com [2607:f8b0:400c:c0b::1a]:25: 
Network is unreachable

Solution:

Configure Postfix not to use IPv6 by editing /etc/postfix/main.cf with the following:

[root@rac1 ~]# cat /etc/postfix/main.cf | grep inet_protocols
inet_protocols = ipv4
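
Before restarting, you can verify the value Postfix will pick up with postconf:

[root@rac1 ~]# postconf inet_protocols
inet_protocols = ipv4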

Restart Postfix and check the status:

[root@rac1 ~]# systemctl restart postfix

[root@rac1 ~]# systemctl status  postfix
 ● postfix.service - Postfix Mail Transport Agent
    Loaded: loaded (/usr/lib/systemd/system/postfix.service; enabled; vendor preset: disabled)
    Active: active (running) since Thu 2019-06-13 10:20:48 UTC; 52s ago
   Process: 17431 ExecStop=/usr/sbin/postfix stop (code=exited, status=0/SUCCESS)
   Process: 17449 ExecStart=/usr/sbin/postfix start (code=exited, status=0/SUCCESS)
   Process: 17445 ExecStartPre=/usr/libexec/postfix/chroot-update (code=exited, status=0/SUCCESS)
   Process: 17442 ExecStartPre=/usr/libexec/postfix/aliasesdb (code=exited, status=0/SUCCESS)
  Main PID: 17520 (master)
    Memory: 3.0M
    CGroup: /system.slice/postfix.service
            ├─17520 /usr/libexec/postfix/master -w
            ├─17521 pickup -l -t unix -u
            └─17522 qmgr -l -t unix -u
 Jun 13 10:20:48 rac1.example.com systemd[1]: Starting Postfix Mail Transport Agent…
 Jun 13 10:20:48 rac1.example.com postfix/postfix-script[17518]: starting the Postfix mail system
 Jun 13 10:20:48 rac1.example.com postfix/master[17520]: daemon started -- version 2.10.1, configuration /etc/postfix
 Jun 13 10:20:48 rac1.example.com systemd[1]: Started Postfix Mail Transport Agent
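
Finally, you can send a test message and watch /var/log/maillog to confirm delivery now goes over IPv4 (assumes the mailx package is installed; the recipient address is a placeholder):

[root@rac1 ~]# echo "Postfix IPv4 test" | mail -s "Postfix IPv4 test" dba-alerts@example.com
[root@rac1 ~]# tail -f /var/log/maillog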