‘udev’ rules continuously being reloaded resulted in ASM nvme disks going offline

Environment:

Linux Server release 7.2
kernel-3.10.0-514.26.2.el7.x86_64

Problem:

When Oracle processes are opening the device for writing and then closing it, this synthesizes a change event. And udev rules having  ACTION=="add|change" gets reloaded. This behavior causes ASM nvme disks to go offline:

Thu Jul 09 16:33:16 2020
WARNING: Disk 18 (rac1$disk1) in group 2 mode 0x7f is now being offlined

Fri Jul 10 10:04:34 2020
WARNING: Disk 19 (rac1$disk5) in group 2 mode 0x7f is now being offlined

Fri Jul 10 13:45:45 2020
WARNING: Disk 15 (rac1$disk8) in group 2 mode 0x7f is now being offlined

Solution:

To suppress the false positive change events disable the inotify watch for devices used for Oracle ASM using following steps:

  1.  Create /etc/udev/rules.d/96-nvme-nowatch.rules file with just one line in it:
ACTION=="add|change", KERNEL=="nvme*", OPTIONS:="nowatch"

2. After creating the file please run the following to activate the rule:

# udevadm control --reload-rules
# udevadm trigger --type=devices --action=change

The above command will reload the complete udev configuration and will trigger all the udev rules. On a busy production system this could disrupt ongoing operations, applications running on the server. Please use the above command during a scheduled maintenance window only.

Source: https://access.redhat.com/solutions/1465913 + our experience with customers.

Advertisement

UDEV rules for configuring ASM disks

Problem:

During my previous installations I used the following udev rule on multipath devices:

KERNEL=="dm-[0-9]*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="360050768028200a9a40000000000001c", NAME="oracleasm/asm-disk1", OWNER="oracle", GROUP="asmadmin", MODE="0660"

So to identify the exact disk I used PROGRAM option. The above script looks through `/dev/dm-*` devices and if any of them satisfy the condition, for example:

# scsci_id -gud /dev/dm-3
360050768028200a9a40000000000001c 

then device name will be changed to /dev/oracleasm/asm-disk1, owner:group to grid:asmadmin and permission to 0660

But on my new servers same udev rule was not working anymore. (Of course, it needs more investigation, but our time is really valuable and never enough and if we know another solution that works and is acceptable- let’s just use it)

Solution:

I used udevadm command to identify other properties of these devices and wrote new udev rule (to see all properties, just remove grep):

# udevadm info --query=property --name /dev/mapper/asm1 | grep DM_UUID
DM_UUID=mpath-360050768028200a9a40000000000001c

New udev rule looks like this:

# cat /etc/udev/rules.d/99-oracle-asmdevices.rules
ENV{DM_UUID}=="mpath-360050768028200a9a40000000000001c",  SUBSYSTEM=="block", NAME="oracleasm/asm-disk1", OWNER="grid", GROUP="asmadmin", MODE="0660"

Trigger udev rules:

# udevadm trigger

Verify that name, owner, group and permissions are changed:

# ll /dev/oracleasm/
total 0
brw-rw---- 1 grid asmadmin 253, 3 Jul 17 17:33 asm-disk1