LGWR: Primary database is in MAXIMUM AVAILABILITY mode | ORA-16072: a minimum of one standby database destination is required

Problem:

One of our customers cloned a database from a Data Guard (DG) environment to a different host and tried to open it as a standalone database. The controlfile and datafiles still indicated that the database was in maximum availability mode.

Errors after trying to open the database:

LGWR: Primary database is in MAXIMUM AVAILABILITY mode
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
LGWR: Minimum of 1 LGWR standby database required
Thu Jul 18 18:43:14 2019
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lgwr_39735_39805.trc:
ORA-16072: a minimum of one standby database destination is required

Solution:

SQL> startup mount;
SQL> alter database set standby database to maximize performance;
SQL> shutdown immediate;

$ srvctl start database -d orcl 
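
Once the database is open, you can verify that the protection mode has been reset; PROTECTION_MODE should now report MAXIMUM PERFORMANCE:

SQL> select protection_mode, open_mode from v$database;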

UDEV rules for configuring ASM disks

Problem:

During my previous installations I used the following udev rule on multipath devices:

KERNEL=="dm-[0-9]*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="360050768028200a9a40000000000001c", NAME="oracleasm/asm-disk1", OWNER="oracle", GROUP="asmadmin", MODE="0660"

So, to identify the exact disk, I used the PROGRAM option. The above rule looks through `/dev/dm-*` devices, and if any of them satisfies the condition, for example:

# scsi_id -gud /dev/dm-3
360050768028200a9a40000000000001c 

then the device name will be changed to /dev/oracleasm/asm-disk1, the owner and group to oracle:asmadmin, and the permissions to 0660.

But on my new servers the same udev rule was not working anymore. (Of course, it needs more investigation, but our time is valuable and never enough; if we know another solution that works and is acceptable, let's just use it.)

Solution:

I used the udevadm command to identify other properties of these devices and wrote a new udev rule (to see all properties, just remove the grep):

# udevadm info --query=property --name /dev/mapper/asm1 | grep DM_UUID
DM_UUID=mpath-360050768028200a9a40000000000001c

The new udev rule looks like this:

# cat /etc/udev/rules.d/99-oracle-asmdevices.rules
ENV{DM_UUID}=="mpath-360050768028200a9a40000000000001c",  SUBSYSTEM=="block", NAME="oracleasm/asm-disk1", OWNER="grid", GROUP="asmadmin", MODE="0660"

Trigger udev rules:

# udevadm trigger
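
Note: if the rules file was just created or edited, udev may need to re-read its rules before the trigger takes effect (on most systems):

# udevadm control --reload-rules
# udevadm trigger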

Verify that name, owner, group and permissions are changed:

# ll /dev/oracleasm/
total 0
brw-rw---- 1 grid asmadmin 253, 3 Jul 17 17:33 asm-disk1

TNS-12518: TNS:listener could not hand off client connection | TNS-12547: TNS:lost contact

Problem:

In a two-node cluster, clients were not able to connect to the second node, while connections to the first node were successful.

A connection from SQL Developer threw the error: Status: Failure - Test failed: IO Error: Got minus one from a read call, connect lapse 16ms, Authentication lapse 0ms

A connection from sqlplus using a TNS string showed:

[oracle@rac02 ~]$ sqlplus "sys@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac02.example.com)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=orcl)))" as sysdba

ORA-12537: TNS:connection closed

Listener.log showed:

 2019-07-18T11:19:23.568231+00:00
 TNS-12518: TNS:listener could not hand off client connection
  TNS-12547: TNS:lost contact
   TNS-12560: TNS:protocol adapter error
    TNS-00517: Lost contact
     Linux Error: 32: Broken pipe

Solution:

This problem can happen in other cases as well (for example, entries in sqlnet.ora – in our case they were fine), and we could have suspected a network problem, because initially we were trying to connect remotely from the application server and from SQL Developer. But after getting the ORA-12537: TNS:connection closed error while connecting via sqlplus from the local server, we could only think of a local, non-network-related problem.

The reason for this problem was that the setuid bit was not set on the /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle binary:

Problematic node:

[root@rac02 ~]# ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
 -rwxr-s--x 1 oracle asmadmin 408607040 Apr  4 19:51 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

Healthy node:

[oracle@rac01 ~]$ ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle asmadmin 408607040 Apr  4 19:48 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

We set the setuid bit on the oracle binary in the RDBMS home:

[root@rac02 ~]# chmod u+s /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

[root@rac02 ~]# ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle asmadmin 408607040 Apr  4 19:51 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

The problem was resolved without restarting the database instance, and clients were able to connect to the second node. But because it was a staging cluster, I restarted the database anyway, just to make sure that it was started with the correct binary.

FlashGrid SkyCluster Now Supports Oracle Database 19c

FlashGrid SkyCluster version 19.06 now has full support for GI/DB 19c, which means that using the FlashGrid launcher tool (https://www.flashgrid.io/skycluster-for-aws/#launch , https://www.flashgrid.io/skycluster-for-azure/#launch , https://www.flashgrid.io/skycluster-for-gcp/#launch ), you can set up multi-node Real Application Clusters in the cloud automatically in about two hours.

“Oracle 19c is a long-term support release from Oracle with extended support available through 2026”

https://www.kb.flashgrid.io/release-notes/cloud-provisioning

sshd: /etc/ssh/sshd_config: Permission denied

Problem:

The sshd and chronyd services on the database server were in a failed state and unable to start because of a permission problem on their configuration files. The permissions on these files were correct and the services should have been able to start, so there was something else… let’s dig into the details.

# systemctl status sshd
 ● sshd.service - OpenSSH server daemon
    Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
    Active: activating (auto-restart) (Result: exit-code) since Tue 2019-07-09 12:21:49 UTC; 32s ago
      Docs: man:sshd(8)
            man:sshd_config(5)
   Process: 124026 ExecStart=/usr/sbin/sshd -D $OPTIONS (code=exited, status=1/FAILURE)
Main PID: 124026 (code=exited, status=1/FAILURE)
Jul 09 12:21:49 node03 systemd[1]: Failed to start OpenSSH server daemon.
Jul 09 12:21:49 node03 systemd[1]: Unit sshd.service entered failed state.
Jul 09 12:21:49 node03 systemd[1]: sshd.service failed

`journalctl -xe` shows:

-- Unit sshd.service has begun starting up.
Jul 09 12:26:03 node03 sshd[129121]: /etc/ssh/sshd_config: Permission denied
Jul 09 12:26:03 node03 systemd[1]: sshd.service: main process exited, code=exited, status=1/FAILURE
Jul 09 12:26:03 node03 systemd[1]: Failed to start OpenSSH server daemon.
-- Subject: Unit sshd.service has failed

The same problem was happening with the chronyd service; it was complaining about the /etc/chrony.conf file. Incorrect time on database servers can cause node evictions.

Reason:

If the permissions on these files are correct, we can suspect SELinux. Let’s check:

# getenforce 
Enforcing
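
With SELinux enforcing, a mislabeled security context on the config file is a likely culprit. You can inspect the context with ls -Z (for /etc/ssh/sshd_config the default type is normally etc_t):

# ls -Z /etc/ssh/sshd_config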

Solution:

Disable SELinux and reboot the server:

# vim /etc/selinux/config
SELINUX=disabled

# reboot
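
If disabling SELinux is not acceptable in your environment, restoring the default contexts on the affected files may be enough instead (a gentler alternative; verify with ls -Z afterwards):

# restorecon -v /etc/ssh/sshd_config /etc/chrony.conf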

Summary:

I consider SELinux an undesirable service on database servers. But I appreciate the opinion of my colleagues/friends and I want to share it with you:

SELinux can be enabled with the correct config in RHEL 4, 5, and 6 – “Starting with Oracle Database 11g Release 2 (11.2), the Security Enhanced Linux (SELinux) feature is supported for Oracle Linux 4, Oracle Linux 5, Oracle Linux 6, Red Hat Enterprise Linux 4, Red Hat Enterprise Linux 5, and Red Hat Enterprise Linux 6.”
https://docs.oracle.com/cd/E11882_01/install.112/e47689/pre_install.htm#LADBI1092

SELinux is a good security tool, and usually I only disable it as a last resort or when the software doesn’t support it.

PRCS-1007 : Server pool RPPCOMS already exists

Problem:

While adding a database using the srvctl add database command, we got PRCS-1007 and PRCR-1086 errors:

$ srvctl add database -db RPPCOMS -oraclehome /u01/app/oracle/product/12.1.0/dbhome_1 -spfile +DATA/RPPCOMS/spfileRPPCOMS.ora -pwfile +DATA/RPPCOMS/PASSWORD/pwdrppcoms.256.1005727427 -dbname RPPCOMS  -diskgroup FRA,DATA

PRCS-1007 : Server pool RPPCOMS already exists
PRCR-1086 : server pool ora.RPPCOMS is already registered

Solution:

Remove the mentioned server pool as the root user:

# crsctl delete serverpool ora.RPPCOMS
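
You can confirm that the pool is gone before retrying:

# crsctl status serverpool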

Then retry adding the database.

19c GI & 12c RDBMS opatchauto: Re-link fails on target "procob"

Problem:

Environment: 19c GI | 12c RDBMS | RHEL 7.6

While applying the Apr 2019 RU on top of the RDBMS home, opatchauto failed:

# /u01/app/oracle/product/12.2.0/dbhome_1/OPatch/opatchauto apply /u01/swtmp/29314339/ -oh /u01/app/oracle/product/12.2.0/dbhome_1 

==Following patches FAILED in apply:
Patch: /u01/swtmp/29314339
Log: /u01/app/oracle/product/12.2.0/dbhome_1/cfgtoollogs/opatchauto/core/opatch/opatch2019-06-24_18-51-22PM_1.log
Reason: Failed during Patching: oracle.opatch.opatchsdk.OPatchException: Re-link fails on target "procob".
Re-link fails on target "proc". 

We’ve tried using opatch instead of opatchauto and it succeeded:

$ /u01/app/oracle/product/12.2.0/dbhome_1/OPatch/opatch apply

Patch 29314339 successfully applied.
OPatch succeeded.

The problem is definitely related to opatchauto. We’ve opened an SR with Oracle and I will update this post as soon as I get the solution from them. Until then, I want to share three workarounds for this problem with you.

Workarounds:

1. The first workaround, as you have already guessed, is to use opatch instead of opatchauto.

$ /u01/app/oracle/product/12.2.0/dbhome_1/OPatch/opatch apply

2. The second workaround is to edit the actions.xml file under the PSU (e.g. /u01/swtmp/29314339/etc/config/actions.xml) and remove the following entries (see Doc ID 2056670.1):

<oracle.precomp.lang opt_req="O" version="12.2.0.1.0">
<make change_dir="%ORACLE_HOME%/precomp/lib" make_file="ins_precomp.mk" make_target="procob"/>
</oracle.precomp.lang>
  
<oracle.precomp.common opt_req="O" version="12.2.0.1.0">
<make change_dir="%ORACLE_HOME%/precomp/lib" make_file="ins_precomp.mk" make_target="proc"/>
</oracle.precomp.common> 

Retry opatchauto.

3. The third workaround is to back up the libons.so file in the GI home and then copy the one from the RDBMS home in its place (you must put it back after opatchauto succeeds).

# mv  /u01/app/19.3.0/grid/lib/libons.so  /u01/app/19.3.0/grid/lib/libons.so_backup 
# cp  /u01/app/oracle/product/12.2.0/dbhome_1/lib/libons.so  /u01/app/19.3.0/grid/lib/libons.so  

The reason I used this workaround is that I found opatchauto was using the libons.so file from the GI home instead of the RDBMS home:

[WARNING]OUI-67200:Make failed to invoke "
/usr/bin/make -f ins_precomp.mk proc ORACLE_HOME=/u01/app/oracle/product/12.2.0/dbhome_1"….
'/u01/app/19.3.0/grid/lib/libons.so: undefined reference to `memcpy@GLIBC_2.14'                         

If we compare the libons.so files between the GI and RDBMS homes, we will find that they are not the same; the problem is that opatchauto uses the file from the wrong location. Since it is not easy to force opatchauto to use libons.so from the correct location (we are working on this with Oracle Support), this workaround can also be considered.

Please keep in mind that you must put the GI libons.so back after opatchauto succeeds. We don’t know what happens if a 12c libons.so file is left under the 19c GI home.
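
For completeness, restoring the original GI libons.so is simply the reverse of the move above:

# mv /u01/app/19.3.0/grid/lib/libons.so_backup /u01/app/19.3.0/grid/lib/libons.so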

“kernel: serial8250: too much work for irq4” potential problem caused by Azure OMS Agent

Problem:

There are a lot of “kernel: serial8250: too much work for irq4” warnings in /var/log/messages, and your system is likely experiencing stability problems. This can lead to Oracle cluster node evictions.

Cause:

The problem was related to the Azure OMS Agent pushing very large messages to the serial console. The problem was introduced by the latest update of the Azure OMS agent.

Temporary Solution:

Temporarily remove the OMS Linux Agent extension until Microsoft resolves this bug:

1. On the Azure portal, click the link of the affected VM.
2. Click the “Extensions” section.
3. Click the OMS Linux Agent in the list.
4. Click the “Uninstall” button at the top.

Once you have made sure that the OMS agent bug is fixed (this should be verified with Microsoft support), you can reinstall the plugin.
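
If you prefer a CLI over the portal, the same removal can likely be done with the Azure CLI (a sketch; the extension name OmsAgentForLinux and the resource group/VM names here are assumptions to verify against your deployment):

$ az vm extension delete --resource-group myRG --vm-name myVM --name OmsAgentForLinux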

Detach diskgroup from 12c GI and attach to 19c GI

Task:

We have two separate Real Application Clusters, one 12c and the other 19c. We decided to migrate data from 12c to 19c by simply detaching all ASM disks from the source and attaching them to the destination.

Steps:

1. Connect to the 12c GI as the grid user and dismount the FRA diskgroup on all nodes:

[grid@rac1 ~]$ sqlplus  / as sysasm
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> alter diskgroup FRA dismount;
Diskgroup altered. 
[grid@rac2 ~]$ sqlplus  / as sysasm
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> alter diskgroup FRA dismount;
Diskgroup altered.

You can also use srvctl to stop the diskgroup with one command.
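
For example:

[grid@rac1 ~]$ srvctl stop diskgroup -diskgroup FRA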

2. Detach the disks belonging to the specific diskgroup from the 12c cluster and attach them to the 19c cluster.

3. After the ASM disks are visible on the 19c cluster, connect as sysasm via the grid user and mount the diskgroup:

# Check that there is no FRA resource registered with CRS:

[root@rac1 ~]# crsctl status res -t |grep FRA

# Mount the diskgroup on all nodes

[grid@rac1 ~]$ sqlplus / as sysasm
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL> alter diskgroup FRA mount;
Diskgroup altered.
[grid@rac2 ~]$ sqlplus / as sysasm
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL> alter diskgroup FRA mount;
Diskgroup altered.

# FRA diskgroup resource will automatically be registered with CRS:

[root@rac1 ~]# crsctl status res -t |grep FRA
ora.FRA.dg(ora.asmgroup)
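
To double-check from the ASM side, you can query the diskgroup state on each node (STATE should be MOUNTED):

SQL> select name, state from v$asm_diskgroup where name = 'FRA';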

And data will be there…

Postfix: connect to gmail-smtp-in.l.google.com [2607:f8b0:400c:c0b::1a]:25: Network is unreachable

Problem:

I am not able to receive email alerts from the database server, because the message transfer agent is trying to connect to the Google SMTP server via IPv6, which fails.

# tail /var/log/maillog

Jun 12 15:35:10 rac1 postfix/smtp[19725]: connect to gmail-smtp-in.l.google.com [2607:f8b0:400c:c0b::1a]:25: Network is unreachable

Solution:

Configure Postfix not to use IPv6 by editing /etc/postfix/main.cf with the following:

[root@rac1 ~]# cat /etc/postfix/main.cf | grep inet_protocols
inet_protocols = ipv4
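
Instead of editing the file by hand, the same change can be made with postconf:

[root@rac1 ~]# postconf -e 'inet_protocols = ipv4'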

Restart Postfix and check the status:

[root@rac1 ~]# systemctl restart postfix

[root@rac1 ~]# systemctl status  postfix
 ● postfix.service - Postfix Mail Transport Agent
    Loaded: loaded (/usr/lib/systemd/system/postfix.service; enabled; vendor preset: disabled)
    Active: active (running) since Thu 2019-06-13 10:20:48 UTC; 52s ago
   Process: 17431 ExecStop=/usr/sbin/postfix stop (code=exited, status=0/SUCCESS)
   Process: 17449 ExecStart=/usr/sbin/postfix start (code=exited, status=0/SUCCESS)
   Process: 17445 ExecStartPre=/usr/libexec/postfix/chroot-update (code=exited, status=0/SUCCESS)
   Process: 17442 ExecStartPre=/usr/libexec/postfix/aliasesdb (code=exited, status=0/SUCCESS)
  Main PID: 17520 (master)
    Memory: 3.0M
    CGroup: /system.slice/postfix.service
            ├─17520 /usr/libexec/postfix/master -w
            ├─17521 pickup -l -t unix -u
            └─17522 qmgr -l -t unix -u
 Jun 13 10:20:48 rac1.example.com systemd[1]: Starting Postfix Mail Transport Agent…
 Jun 13 10:20:48 rac1.example.com postfix/postfix-script[17518]: starting the Postfix mail system
 Jun 13 10:20:48 rac1.example.com postfix/master[17520]: daemon started -- version 2.10.1, configuration /etc/postfix
 Jun 13 10:20:48 rac1.example.com systemd[1]: Started Postfix Mail Transport Agent