Reduce high CPU usage by TFA

Problem:

Cluster nodes experienced high CPU usage. An investigation showed that a TFA process was the second-highest CPU consumer on the server:

 # Fri Feb 19 17:44:01 2021
AllCPU  OneCPU  PID     User    PR      NI      STime   RSS     Name
--------------------------------------------------------------------------------
11.75%  94.02%  23895   root    20      0       17:43   87M     ora_m001_ORCL2
1.42%   11.39%  2468    root    20      0       Feb02   736M    /opt/oracle.ahf/jre/bin/java -server -Xms256m -Xmx512m -Djava.awt.headless=true -Ddisable.checkForUpdate=true -XX:HeapDumpPath=/u01/app/oracle.ahf/data/rac02/diag/tfa -XX:ParallelGCThreads=5 oracle.rat.tfa.TFAMain /opt/oracle.ahf/tfa
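
If you do not have a monitoring report like the one above, a rough check can be made with ps. This is a minimal sketch, assuming a procps-style ps that supports --sort; tfa_cpu_filter is a hypothetical helper name, and the pcpu column reported by ps is an average over the process lifetime, so it will not match the sampled figures above.

```shell
#!/bin/sh
# tfa_cpu_filter (hypothetical helper) reads "pid pcpu args..." lines on stdin
# and prints "pid pcpu" for the TFA JVM, identified by its main class
# oracle.rat.tfa.TFAMain as seen in the process list above.
tfa_cpu_filter() {
    awk '/oracle\.rat\.tfa\.TFAMain/ { print $1, $2 }'
}

# Usage on a live node (assumes procps ps):
#   ps -eo pid,pcpu,args --sort=-pcpu | tfa_cpu_filter
```

An empty result means no process with that main class was running when ps took its snapshot.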

Workaround:

Newer versions of TFA let you set a CPU resource limit:

tfactl setresourcelimit 
 [-tool tool_name] 
 [-resource resource_type] 
 [-value value]

To limit TFA to a maximum of 50% of a single CPU, run the following:

# tfactl setresourcelimit -value 0.5
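
Since 0.5 corresponds to 50% of a single CPU, the value is simply the percentage divided by 100. The sketch below does that arithmetic and prints the resulting command instead of running it; percent_to_cpu_value is a hypothetical helper name, and which values your TFA version accepts is not checked here.

```shell
#!/bin/sh
# percent_to_cpu_value (hypothetical helper) converts a percentage of one CPU
# into the fractional form passed to -value, e.g. 50 -> 0.5 and 150 -> 1.5.
percent_to_cpu_value() {
    awk -v pct="$1" 'BEGIN { printf "%g\n", pct / 100 }'
}

# Print the command for review rather than executing it.
limit=$(percent_to_cpu_value 50)
echo "tfactl setresourcelimit -value $limit"
```

Run the printed command as root once you are satisfied with the value.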

For more information, see the official TFA documentation.

If your TFA version does not support this command, upgrade TFA first.

Change AHF home from /opt/oracle.ahf to /u01/oracle.ahf

Problem:

One of our customers had a 2GB /opt mount point. Running the root.sh script during GI configuration created a 926M /opt/oracle.ahf folder, which later caused space problems in /opt.
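
A quick pre-check before running root.sh can catch this early. The following is a minimal sketch using POSIX df; the function names and the 1024 MB threshold are assumptions chosen for illustration (the folder in this case was 926M), not values taken from Oracle documentation.

```shell
#!/bin/sh
# mount_free_mb prints the free space, in MB, of the file system holding PATH.
# df -P forces one line per file system; -k reports 1024-byte blocks.
mount_free_mb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 }'
}

# has_room PATH NEEDED_MB succeeds when PATH's file system has at least
# NEEDED_MB megabytes free.
has_room() {
    [ "$(mount_free_mb "$1")" -ge "$2" ]
}

# Assumed threshold: 1024 MB. Adjust for your environment.
if ! has_room /opt 1024; then
    echo "Less than 1024 MB free in /opt; consider another AHF location"
fi
```

To see how much space an existing installation already uses, du -sh /opt/oracle.ahf gives the folder size directly.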

Note that root.sh installs TFA as follows:

2020-07-07 09:41:10: CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2020-07-07 09:41:10: Executed stage SetupTFA in 0 seconds
2020-07-07 09:41:10: Executing cmd: /u01/app/19.3.0/grid/crs/install/tfa_setup -silent -crshome /u01/app/19.3.0/grid

tfa_setup has an -ahf_loc option that sets the Autonomous Health Framework home; its default value is /opt/oracle.ahf.

So how can we keep AHF from exhausting the space in /opt?

Solution:

Choose only one of options 1, 2, or 3.

1. Increase the size of the /opt mount point
2. Or, uninstall TFA (which deletes the /opt/oracle.ahf folder and frees the space) and reinstall it with the -ahf_loc option:

# tfactl uninstall
# mkdir /u01/oracle.ahf
# chmod 755 /u01/oracle.ahf
# /u01/app/19.3.0/grid/crs/install/tfa_setup -ahf_loc /u01/oracle.ahf
...
AHF Location : /u01/oracle.ahf
Choose Data Directory from below options :
1. /u01/oracle.ahf [Free Space : 41347 MB]
2. /u01/app [Free Space : 41347 MB]
3. Enter a different Location

Choose Option [1 - 3] : 1
AHF Data Directory : /u01/oracle.ahf/data

Do you want to add AHF Notification Email IDs ? [Y]|N : N
...

3. Or, change the default location of the AHF home (AHF_HOME) before running the root.sh script:

# mkdir /u01/oracle.ahf
# chmod 755 /u01/oracle.ahf
# export AHF_HOME=/u01/oracle.ahf
# /u01/app/19.3.0/grid/root.sh

Check that the AHF home was created under /u01/oracle.ahf instead of /opt/oracle.ahf:

# ll /opt|grep oracle.ahf

# ll /u01|grep oracle.ahf
drwxr-xr-x 10 root root 134 Jul 7 12:46 oracle.ahf
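
The same check can be scripted, for example to run it on every cluster node. This is a small sketch only; find_ahf_home is a hypothetical helper name, and it tests nothing more than whether the directories exist.

```shell
#!/bin/sh
# find_ahf_home (hypothetical helper) prints the first candidate directory
# that exists and returns non-zero when none of them do.
find_ahf_home() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Check the new location first, then the default one.
find_ahf_home /u01/oracle.ahf /opt/oracle.ahf || echo "no AHF home found"
```

After a successful relocation the script should print /u01/oracle.ahf; if it prints /opt/oracle.ahf, the installation still used the default location.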