Patch Planner to check and request conflict patches

Problem:

Recently, I was applying p29963428_194000ACFSRU_Linux-x86-64.zip on top of 19.4 GI home and got the following error:

==Following patches FAILED in analysis for apply:
 Patch: /u01/swtmp/29963428/29963428
 Log: /u01/app/19.3.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-08-07_10-07-56AM_1.log
 Reason: Failed during Analysis: CheckConflictAgainstOracleHome Failed, [ Prerequisite Status: FAILED, Prerequisite output: 
 Summary of Conflict Analysis:
 There are no patches that can be applied now.
 Following patches have conflicts. Please contact Oracle Support and get the merged patch of the patches : 
 29851014, 29963428
 Conflicts/Supersets for each patch are:
 Patch : 29963428
 Bug Conflict with 29851014 Conflicting bugs are: 29039918, 27494830, 29338628, 29031452, 29264772, 29760083, 28855761 ... 
 After fixing the cause of failure Run opatchauto resume

Solution:

Oracle MOS note ID 1317012.1 describes steps how to check such conflicts and request conflict/merged patches in previous:

1. Run lsinventory from the target home:

[grid@rac1 ~]$ /u01/app/19.3.0/grid/OPatch/opatch lsinventory > GI_lsinventory.txt

2. Logon to support.oracle.com -> Click the “Patch and Updates” tab -> Enter the patch number you want to apply:

2. Click Analyze with OPatch…

3. Attach GI_lsinventory.txt file created in the first step and click “Analyze for Conflict”:

4. Wait for a while and you will see the result. According to it, patch 29963428 conflicts with my current patches:

From the same screen I can “Request Patch”.

5. After clicking “Request Patch” button I got the following error:

Click “See Details”:

The message actually means that fix for the same bug is already included in currently installed 19.4.0.0.190716ACFSRU.

So I don’t have to apply 29963428 patch. I wanted to share the steps with you , because the mentioned tool is really useful.

Resize ASM disks in AWS (FG enabled cluster)

  1. Connect to AWS console https://console.aws.amazon.com
  2. On the left side -> under the section ELASTIC BLOCK STORE -> choose Volumes
  3. Choose necessary disk -> click Actions button -> choose Modify Volume -> change Size
    Please note that all data disks (not quorum disk) must be increased under the same diskgroup, otherwise ASM will not let you to have different sized disks.

Choose another data disks and repeat the same steps.

4. Run the following on database nodes via root user:

# for i in /sys/block/*/device/rescan; do echo 1 > $i; done

5. Check that disks have correct sizes:

# flashgrid-node

6. Connect to the ASM instance from any database node and run:

[grid@rac1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 19.0.0.0.0 - Production on Fri Aug 23 10:17:50 2019
Version 19.4.0.0.0
Copyright (c) 1982, 2019, Oracle.  All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.4.0.0.0

SQL> alter diskgroup GRID resize all; 
Diskgroup altered.

Presentation: Oracle GoldenGate Microservices Overview (with DEMO)

Webinar: Oracle GoldenGate Microservices Overview (with DEMO)

PRVF-6402 : Core file name pattern is not same on all the nodes

Problem:

Oracle 18c GI configuration prerequisite checks failed with the following error:

PRVF-6402 : Core file name pattern is not same on all the nodes. Found core filename pattern "core" on nodes "primrac1". Found core filename pattern "core.%p" on nodes "primrac2".  
- Cause:  The core file name pattern is not same on all the nodes.  
- Action:  Ensure that the mechanism for core file naming works consistently on all the nodes. Typically for Linux, the elements to look into are the contents of two files /proc/sys/kernel/core_pattern or /proc/sys/kernel/core_uses_pid. Refer OS vendor documentation for platforms AIX, HP-UX, and Solaris.

Comparing parameter values on both nodes:

[root@primrac1 ~]# cat /proc/sys/kernel/core_uses_pid
0
[root@primrac2 ~]# cat /proc/sys/kernel/core_uses_pid
1 

[root@primrac1 ~]# sysctl -a|grep core_uses_pid
kernel.core_uses_pid = 0

[root@primrac2 ~]# sysctl -a|grep core_uses_pid
kernel.core_uses_pid = 1

Strange fact was that this parameter was not defined explicitly in sysctl.conf file, but still had different default values:

[root@primrac1 ~]# cat /etc/sysctl.conf |grep core_uses_pid
[root@primrac2 ~]# cat /etc/sysctl.conf |grep core_uses_pid 

Solution:

I’ve set parameter to 1 explicitly in sysctl.conf on both nodes:

[root@primrac1 ~]# cat /etc/sysctl.conf |grep core_uses_pid
kernel.core_uses_pid=1 

[root@primrac2 ~]# cat /etc/sysctl.conf |grep core_uses_pid
kernel.core_uses_pid=1

[root@primrac1 ~]# sysctl -p 
[root@primrac2 ~]# sysctl -p

[root@primrac1 ~]# sysctl -a|grep core_uses_pid 
kernel.core_uses_pid = 1

[root@primrac2 ~]# sysctl -a|grep core_uses_pid 
kernel.core_uses_pid = 1

Pressed Check Again button and GI configuration succeeded.

LGWR: Primary database is in MAXIMUM AVAILABILITY mode | ORA-16072: a minimum of one standby database destination is required

Problem:

One of our customer cloned database from a DG environment to a different host and tried to open it as a standalone database. Controlfile and datafiles still considered the database in maximum availability mode.

Errors after trying to open the database:

LGWR: Primary database is in MAXIMUM AVAILABILITY mode
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
LGWR: Minimum of 1 LGWR standby database required
Thu Jul 18 18:43:14 2019
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lgwr_39735_39805.trc:
ORA-16072: a minimum of one standby database destination is required

Solution:

SQL> startup mount;
SQL> alter database set standby database to maximize performance;
SQL> shutdown immediate;

$ srvctl start database -d orcl 

UDEV rules for configuring ASM disks

Problem:

During my previous installations I used the following udev rule on multipath devices:

KERNEL=="dm-[0-9]*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="360050768028200a9a40000000000001c", NAME="oracleasm/asm-disk1", OWNER="oracle", GROUP="asmadmin", MODE="0660"

So to identify the exact disk I used PROGRAM option. The above script looks through `/dev/dm-*` devices and if any of them satisfy the condition, for example:

# scsci_id -gud /dev/dm-3
360050768028200a9a40000000000001c 

then device name will be changed to /dev/oracleasm/asm-disk1, owner:group to grid:asmadmin and permission to 0660

But on my new servers same udev rule was not working anymore. (Of course, it needs more investigation, but our time is really valuable and never enough and if we know another solution that works and is acceptable- let’s just use it)

Solution:

I used udevadm command to identify other properties of these devices and wrote new udev rule (to see all properties, just remove grep):

# udevadm info --query=property --name /dev/mapper/asm1 | grep DM_UUID
DM_UUID=mpath-360050768028200a9a40000000000001c

New udev rule looks like this:

# cat /etc/udev/rules.d/99-oracle-asmdevices.rules
ENV{DM_UUID}=="mpath-360050768028200a9a40000000000001c",  SUBSYSTEM=="block", NAME="oracleasm/asm-disk1", OWNER="grid", GROUP="asmadmin", MODE="0660"

Trigger udev rules:

# udevadm trigger

Verify that name, owner, group and permissions are changed:

# ll /dev/oracleasm/
total 0
brw-rw---- 1 grid asmadmin 253, 3 Jul 17 17:33 asm-disk1

TNS-12518: TNS:listener could not hand off client connection | TNS-12547: TNS:lost contact

Problem:

In two-node cluster, client was not able to connect to the second node, but connection to the first node was successful.

Connection from SQL developer threw error: Status: Failure - Test failed: IO Error: Got minus one from a read call, connect lapse 16ms, Authentication lapse 0ms

Connection from sqlplus using TNS string showed:

[oracle@rac02 ~]$ sqlplus "sys@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac02.example.com)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=orcl)))" as sysdba

ORA-12537: TNS:connection closed

Listener.log showed:

 2019-07-18T11:19:23.568231+00:00
 TNS-12518: TNS:listener could not hand off client connection
  TNS-12547: TNS:lost contact
   TNS-12560: TNS:protocol adapter error
    TNS-00517: Lost contact
     Linux Error: 32: Broken pipe

Solution:

This problem can happen in other cases (entries in sqlnet.ora .. in our case it was ok) and we could think about network problem, because initially we were trying to connect from the application sever and from the SQL developer remotely. But after getting ORA-12537: TNS:connection closed error while trying to connect via sqlplus from the local server, we could only think about local non-network related problem.

The reason of this problem was that setuid bit was not set on /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle binary:

Problematic node:

[root@rac02 ~]# ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
 -rwxr-s--x 1 oracle asmadmin 408607040 Apr  4 19:51 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

Healthy node:

[oracle@rac01 ~]$ ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle asmadmin 408607040 Apr  4 19:48 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

We have set setuid bit on oracle binary in RDBMS home:

[root@rac02 ~]# chmod u+s /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

[root@rac02 ~]# ll /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle asmadmin 408607040 Apr  4 19:51 /u01/app/oracle/product/12.2.0/dbhome_1/bin/oracle

The problem was resolved without restarting the database instance, so clients were able to connect to the 2nd node. But because of it was staging cluster – I still restarted the database, I just made sure that database was started with the correct binary.

FlashGrid SkyCluster Now Supports Oracle Database 19c

FlashGrid SkyCluster Version 19.06 now has full support of GI/DB 19c, which means using FG launcher tool (https://www.flashgrid.io/skycluster-for-aws/#launch , https://www.flashgrid.io/skycluster-for-azure/#launch , https://www.flashgrid.io/skycluster-for-gcp/#launch ) , you can setup multi-node Real Application Clusters in the cloud automatically in about 2 hours.

“Oracle 19c is a long-term support release from Oracle with extended support available through 2026”

https://www.kb.flashgrid.io/release-notes/cloud-provisioningFlashGrid

sshd: /etc/ssh/sshd_config: Permission denied

Problem:

sshd and chronyd services on the database server were in a failed state and not able to start because of the permission problem on their configuration files. Permissions on these files were correct and services should have been able to start, so there was something else… let’s dig into the details.

# systemctl status sshd
 â sshd.service - OpenSSH server daemon
    Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
    Active: activating (auto-restart) (Result: exit-code) since Tue 2019-07-09 12:21:49 UTC; 32s ago
      Docs: man:sshd(8)
            man:sshd_config(5)
   Process: 124026 ExecStart=/usr/sbin/sshd -D $OPTIONS (code=exited, status=1/FAILURE)
Main PID: 124026 (code=exited, status=1/FAILURE)
Jul 09 12:21:49 node03 systemd[1]: Failed to start OpenSSH server daemon.
Jul 09 12:21:49 node03 systemd[1]: Unit sshd.service entered failed state.
Jul 09 12:21:49 node03 systemd[1]: sshd.service failed

`journalctl -xe` shows:

-- Unit sshd.service has begun starting up.
Jul 09 12:26:03 node03 sshd[129121]: /etc/ssh/sshd_config: Permission denied
Jul 09 12:26:03 node03 systemd[1]: sshd.service: main process exited, code=exited, status=1/FAILURE
Jul 09 12:26:03 node03 systemd[1]: Failed to start OpenSSH server daemon.
-- Subject: Unit sshd.service has failed

The same problem was happening with chronyd service. It was claiming about /etc/chrony.conf file. Incorrect time on database servers can cause node evictions.

Reason:

If permissions on these files are correct, we can think about SELinux, let’s check:

# getenforce 
Enforcing

Solution:

Disable SELinux and reboot the server:

# vim /etc/selinux/config
SELINUX=disabled

# reboot

Summary:

I consider SELinux as a non-desirable service on the database servers. But I appreciate opinion of my colleages/friends and I want to share it with you.

SELinux can be enabled with the correct config in RHEL 4,5,6 – “Starting with Oracle Database 11g Release 2 (11.2), the Security Enhanced Linux (SELinux) feature is supported for Oracle Linux 4, Oracle Linux 5, Oracle Linux 6, Red Hat Enterprise Linux 4, Red Hat Enterprise Linux 5, and Red Hat Enterprise Linux 6.
https://docs.oracle.com/cd/E11882_01/install.112/e47689/pre_install.htm#LADBI1092

SELinux is a good security tool and usually I only disable it as a last resort or if the software doesn’t support it.