Boot in single user mode and rescue your RHEL7

Problem:

One of our customers edited the fstab file incorrectly and rebooted the OS. As a result, the VM could not start. Fortunately, the cloud hosting this VM supported a serial console.

Solution:

We booted into single user mode through the serial console and reverted the changes. To boot into single user mode and fix the necessary file, do the following:

Connect to the serial console and, while the OS is booting, press e in the GRUB menu to edit the selected kernel entry:

Find the line that starts with linux16 (if you don’t see it, press the down arrow), go to the end of that line and type rd.break.

Press ctrl+x.

Wait a moment and the system will drop into an emergency shell (single user mode):

At this point /sysroot is mounted read-only, so you need to remount it read-write:

switch_root:/# mount -o remount,rw /sysroot
switch_root:/# chroot /sysroot

You can now revert the changes by editing the affected files; in our case we fixed fstab:

sh-4.2# vim /etc/fstab
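
Before rebooting, it can help to sanity-check the edited file. The following is a minimal sketch, not part of the original procedure: check_fstab_fields is a hypothetical helper name, and the only rule it applies is that each entry should have between 4 and 6 whitespace-separated fields (the last two fstab fields are optional). It does not validate devices, mount points or options.

```shell
#!/usr/bin/env bash
# check_fstab_fields [FILE]   (hypothetical helper, for illustration only)
# Prints every non-comment, non-blank line whose field count is outside 4..6
# and returns 1 if any such line was found. FILE defaults to /etc/fstab.
check_fstab_fields() {
    local file="${1:-/etc/fstab}"
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        NF < 4 || NF > 6 {
            printf "line %d: expected 4-6 fields, found %d: %s\n", NR, NF, $0
            bad = 1
        }
        END { exit bad }
    ' "$file"
}
```

Inside the chroot it could be called as check_fstab_fields /etc/fstab; no output and a zero exit status mean every entry has a plausible number of fields.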

You are a real hero, because you rescued your system!

Resize ASM disks in AWS (FG enabled cluster)

  1. Connect to AWS console https://console.aws.amazon.com
  2. On the left side -> under the section ELASTIC BLOCK STORE -> choose Volumes
  3. Choose the necessary disk -> click the Actions button -> choose Modify Volume -> change Size
    Note that all data disks in the same disk group (but not the quorum disk) must be increased to the same size; otherwise ASM will not let you have disks of different sizes.

Repeat the same steps for the other data disks.

4. Run the following on the database nodes as the root user:

# for i in /sys/block/*/device/rescan; do echo 1 > "$i"; done

5. Check that the disks have the correct sizes:

# flashgrid-node

6. Connect to the ASM instance from any database node and run:

[grid@rac1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 19.0.0.0.0 - Production on Fri Aug 23 10:17:50 2019
Version 19.4.0.0.0
Copyright (c) 1982, 2019, Oracle.  All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.4.0.0.0

SQL> alter diskgroup GRID resize all; 
Diskgroup altered.
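
In addition to flashgrid-node, the sizes the kernel sees after the rescan in step 4 can be read directly from sysfs. The sketch below is an illustration only: list_disk_sizes is a hypothetical helper name, and its optional argument (an alternative root directory) exists just to make the function easy to try against a test tree. It relies on the fact that /sys/block/<device>/size is expressed in 512-byte sectors.

```shell
#!/usr/bin/env bash
# list_disk_sizes [ROOT]   (hypothetical helper, for illustration only)
# Prints "<device> <size in whole GiB>" for each block device under ROOT,
# which defaults to /sys/block.
list_disk_sizes() {
    local root="${1:-/sys/block}" dev sectors
    for dev in "$root"/*; do
        [ -r "$dev/size" ] || continue
        sectors=$(cat "$dev/size")
        # sysfs reports the size in 512-byte sectors; integer division truncates
        echo "$(basename "$dev") $(( sectors * 512 / 1024 / 1024 / 1024 ))"
    done
}
```

Running list_disk_sizes on each database node should show the same new size for every data disk of the disk group before the resize is issued in ASM.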

Can’t call method “uid” on an undefined value at …DBUtilServices.pm line 28.

Problem:

opatchauto on GI fails with the following error:

# /u01/app/18.0.0/grid/OPatch/opatchauto apply /0/grid/29301682  -oh /u01/app/18.0.0/grid
 Can't call method "uid" on an undefined value at /u01/app/18.0.0/grid/OPatch/auto/database/bin/module/DBUtilServices.pm line 28.

Reason:

  1. GI is not set up yet. You may have unzipped the GI installation file but not yet run gridSetup.sh.
  2. $GI_HOME/oraInst.loc is missing.

Solution:

  1. Set up GI by running gridSetup.sh
  2. Copy oraInst.loc from the other node. If you don’t have another node, see the file content below:
# cat /u01/app/18.0.0/grid/oraInst.loc
inst_group=oinstall
inventory_loc=/u01/app/oraInventory
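
A quick pre-check before running opatchauto can catch the second cause. This is a minimal sketch under assumptions: check_orainst_loc is a hypothetical helper name, and the only thing it verifies is that the file exists and has non-empty inventory_loc and inst_group entries, as in the example above.

```shell
#!/usr/bin/env bash
# check_orainst_loc FILE   (hypothetical helper, for illustration only)
# Returns 0 if FILE exists and defines both inventory_loc and inst_group,
# otherwise prints what is missing and returns 1.
check_orainst_loc() {
    local file="$1" key rc=0
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    for key in inventory_loc inst_group; do
        if ! grep -q "^${key}=." "$file"; then
            echo "no ${key} entry in $file"
            rc=1
        fi
    done
    return "$rc"
}
```

For the Grid home used in this post the call would be check_orainst_loc /u01/app/18.0.0/grid/oraInst.loc.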

PRVG-11069 : IP address “169.254.0.2” of network interface “idrac” on the node “primrac1” would conflict with HAIP usage

Problem:

Oracle 18c GI configuration precheck was failing with the following error:

Summary of node specific errors 

primrac2  - PRVG-11069 : IP address "169.254.0.2" of network interface "idrac" on the node "primrac2" would conflict with HAIP usage.  
- Cause:  One or more network interfaces have IP addresses in the range (169.254.*.*), the range used by HAIP which can create routing conflicts.
- Action:  Make sure there are no IP addresses in the range (169.254.*.*) on any network interfaces.

primrac1  - PRVG-11069 : IP address "169.254.0.2" of network interface "idrac" on the node "primrac1" would conflict with HAIP usage. 
- Cause:  One or more network interfaces have IP addresses in the range (169.254.*.*), the range used by HAIP which can create routing conflicts.
- Action:  Make sure there are no IP addresses in the range (169.254.*.*) on any network interfaces.

On each node an additional network interface named idrac was up with the IP address 169.254.0.2. I tried setting a static IP address in /etc/sysconfig/network-scripts/ifcfg-idrac and also tried bringing the interface down, but after some time the interface came back up automatically with the same IP address.

The cluster nodes were Dell servers with the Dell Remote Access Controller (iDRAC) Service Module installed. More information about installing and uninstalling this module can be found here: https://topics-cdn.dell.com/pdf/idrac-service-module-v32_users-guide_en-us.pdf

The servers had been configured by a system administrator, and it was not clear why this module was there. We do not use the iDRAC Service Module, so the simplest option was to uninstall it. (It should also be possible to configure the module to avoid this situation, but we keep our servers as clean as possible, without unused services.)

Solution:

Uninstalled the iDRAC Service Module (also explained in the PDF above):

# rpm -e dcism 

After uninstalling it, the idrac interface did not start anymore, so we could continue with the GI configuration.
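
To confirm on each node that no interface still holds an address in the HAIP range, the one-line output of ip can be filtered. The sketch below is illustrative: find_link_local is a hypothetical helper name, and it assumes the usual `ip -o -4 addr show` layout, where the second field is the interface name and the fourth is the address with its prefix length.

```shell
#!/usr/bin/env bash
# find_link_local   (hypothetical helper, for illustration only)
# Reads `ip -o -4 addr show` output on stdin and prints "<interface> <address>"
# for every IPv4 address in 169.254.0.0/16.
find_link_local() {
    awk '$3 == "inet" && $4 ~ /^169\.254\./ { print $2, $4 }'
}
```

It would be used as ip -o -4 addr show | find_link_local; empty output means the precheck should no longer find a conflicting address on that node.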

Presentation: Oracle GoldenGate Microservices Overview (with DEMO)

Webinar: Oracle GoldenGate Microservices Overview (with DEMO)

PRVF-6402 : Core file name pattern is not same on all the nodes

Problem:

Oracle 18c GI configuration prerequisite checks failed with the following error:

PRVF-6402 : Core file name pattern is not same on all the nodes. Found core filename pattern "core" on nodes "primrac1". Found core filename pattern "core.%p" on nodes "primrac2".  
- Cause:  The core file name pattern is not same on all the nodes.  
- Action:  Ensure that the mechanism for core file naming works consistently on all the nodes. Typically for Linux, the elements to look into are the contents of two files /proc/sys/kernel/core_pattern or /proc/sys/kernel/core_uses_pid. Refer OS vendor documentation for platforms AIX, HP-UX, and Solaris.

Comparing parameter values on both nodes:

[root@primrac1 ~]# cat /proc/sys/kernel/core_uses_pid
0
[root@primrac2 ~]# cat /proc/sys/kernel/core_uses_pid
1 

[root@primrac1 ~]# sysctl -a|grep core_uses_pid
kernel.core_uses_pid = 0

[root@primrac2 ~]# sysctl -a|grep core_uses_pid
kernel.core_uses_pid = 1

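Strangely, this parameter was not defined explicitly in sysctl.conf on either node, yet the two nodes had different values. (Settings can also come from files under /etc/sysctl.d/ and /usr/lib/sysctl.d/, which may explain the difference.)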

[root@primrac1 ~]# cat /etc/sysctl.conf |grep core_uses_pid
[root@primrac2 ~]# cat /etc/sysctl.conf |grep core_uses_pid 

Solution:

I set the parameter to 1 explicitly in sysctl.conf on both nodes:

[root@primrac1 ~]# cat /etc/sysctl.conf |grep core_uses_pid
kernel.core_uses_pid=1 

[root@primrac2 ~]# cat /etc/sysctl.conf |grep core_uses_pid
kernel.core_uses_pid=1

[root@primrac1 ~]# sysctl -p 
[root@primrac2 ~]# sysctl -p

[root@primrac1 ~]# sysctl -a|grep core_uses_pid 
kernel.core_uses_pid = 1

[root@primrac2 ~]# sysctl -a|grep core_uses_pid 
kernel.core_uses_pid = 1

I pressed the Check Again button, and the GI configuration succeeded.
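
The two patterns in the error message ("core" and "core.%p") follow from how the kernel combines core_pattern and core_uses_pid: if the pattern contains no %p and is not a pipe to a program, a non-zero core_uses_pid appends the PID to the file name. The sketch below reproduces that rule so the nodes can be compared by hand. effective_core_pattern is a hypothetical helper name, and whether the Oracle precheck computes the value in exactly this way is an assumption.

```shell
#!/usr/bin/env bash
# effective_core_pattern PATTERN USES_PID   (hypothetical helper, illustration only)
# Prints the core file name pattern that results from combining
# kernel.core_pattern with kernel.core_uses_pid.
effective_core_pattern() {
    local pattern="$1" uses_pid="$2"
    case "$pattern" in
        '|'*|*%p*)
            # piped handlers and patterns that already contain %p are left as-is
            echo "$pattern"
            ;;
        *)
            if [ "$uses_pid" != "0" ]; then
                echo "${pattern}.%p"
            else
                echo "$pattern"
            fi
            ;;
    esac
}
```

On a node it could be called as effective_core_pattern "$(cat /proc/sys/kernel/core_pattern)" "$(cat /proc/sys/kernel/core_uses_pid)"; both nodes should print the same value.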