Reduce high CPU usage by TFA

Problem:

Cluster nodes experienced high CPU usage. After investigation, a TFA process was found to be the second-highest CPU consumer on the server:

 # Fri Feb 19 17:44:01 2021
AllCPU  OneCPU  PID     User    PR      NI      STime   RSS     Name
--------------------------------------------------------------------------------
11.75%  94.02%  23895   root    20      0       17:43   87M     ora_m001_ORCL2
1.42%   11.39%  2468    root    20      0       Feb02   736M    /opt/oracle.ahf/jre/bin/java -server -Xms256m -Xmx512m -Djava.awt.headless=true -Ddisable.checkForUpdate=true -XX:HeapDumpPath=/u01/app/oracle.ahf/data/rac02/diag/tfa -XX:ParallelGCThreads=5 oracle.rat.tfa.TFAMain /opt/oracle.ahf/tfa
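
To spot the TFA process quickly on another node, a small filter along these lines may help. It is only a sketch: the function name tfa_cpu is made up for illustration, and it assumes the input has the columns "%CPU PID command line", which is not the format of the report above.

```shell
# Print "PID %CPU" for every process whose command line contains the
# TFA main class (oracle.rat.tfa.TFAMain, as seen in the listing above).
# Reads "pcpu pid args..." lines on stdin so it can be tried on saved output.
tfa_cpu() {
    awk '/oracle\.rat\.tfa\.TFAMain/ { print $2, $1 }'
}

# Live usage on Linux (assumes procps ps):
#   ps -eo pcpu=,pid=,args= | tfa_cpu
```

Reading from stdin keeps the filter usable both on live ps output and on text copied from a monitoring report.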

Workaround:

In newer versions of TFA, you can set a CPU resource limit.

tfactl setresourcelimit 
 [-tool tool_name] 
 [-resource resource_type] 
 [-value value]

To limit TFA to a maximum of 50% of a single CPU, run the following:

# tfactl setresourcelimit -value 0.5
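
Since the value is a fraction of one CPU, converting from a percentage is a simple division. The sketch below only prints the command (a dry run) so it can be reviewed first; limit_value is a hypothetical helper name, not part of tfactl.

```shell
# Convert "percent of one CPU" into the fractional value used above.
# limit_value is a made-up helper for illustration only.
limit_value() {
    awk -v p="$1" 'BEGIN { printf "%g\n", p / 100 }'
}

# Dry run: print the command instead of executing it.
echo "tfactl setresourcelimit -value $(limit_value 50)"
```

Run the printed command as root once the value looks right.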

For more information, see the official TFA documentation.

If you do not have a newer version of TFA, you need to upgrade it first.

Change AHF home from /opt/oracle.ahf to /u01/oracle.ahf

Problem:

One of our customers had only 2 GB allocated to the /opt mount point. After the root.sh script ran during GI configuration, a 926 MB /opt/oracle.ahf folder was created, which later caused problems with the available space in /opt.
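
A quick pre-check can show whether a mount point has room for AHF before root.sh runs. This is a rough sketch under assumptions: the helper names has_room and free_kb_on are invented here, and the 926 MB figure is simply the size observed in this case, not a documented requirement.

```shell
# Succeed when free space (KB) covers the needed space (KB) plus an
# optional safety margin (KB).
has_room() {
    free_kb=$1; need_kb=$2; margin_kb=${3:-0}
    [ "$free_kb" -ge $((need_kb + margin_kb)) ]
}

# Free KB on the filesystem that holds a path (POSIX df, 4th column).
free_kb_on() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example (commented out so the sketch has no side effects):
#   if has_room "$(free_kb_on /opt)" $((926 * 1024)); then
#       echo "/opt has room for AHF"
#   else
#       echo "/opt is too small; consider another AHF location"
#   fi
```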

Note that root.sh runs the TFA installation as follows:

2020-07-07 09:41:10: CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2020-07-07 09:41:10: Executed stage SetupTFA in 0 seconds
2020-07-07 09:41:10: Executing cmd: /u01/app/19.3.0/grid/crs/install/tfa_setup -silent -crshome /u01/app/19.3.0/grid

tfa_setup has an -ahf_loc option that sets the Autonomous Health Framework (AHF) home; its default value is /opt/oracle.ahf.

The question is: how can we keep AHF from exhausting the space in /opt?

Solution:

Choose one of the following three options.

1. Increase the size of the /opt mount point.
2. Or uninstall TFA (which deletes the /opt/oracle.ahf folder and frees the space) and reinstall it with the -ahf_loc option:

# tfactl uninstall
# mkdir /u01/oracle.ahf
# chmod 755 /u01/oracle.ahf
# /u01/app/19.3.0/grid/crs/install/tfa_setup -ahf_loc /u01/oracle.ahf
...
AHF Location : /u01/oracle.ahf
Choose Data Directory from below options :
1. /u01/oracle.ahf [Free Space : 41347 MB]
2. /u01/app [Free Space : 41347 MB]
3. Enter a different Location

Choose Option [1 - 3] : 1
AHF Data Directory : /u01/oracle.ahf/data

Do you want to add AHF Notification Email IDs ? [Y]|N : N
...

3. Or change the default location of the AHF home (AHF_HOME) before running the root.sh script:

# mkdir /u01/oracle.ahf
# chmod 755 /u01/oracle.ahf
# export AHF_HOME=/u01/oracle.ahf
# /u01/app/19.3.0/grid/root.sh

Check that the AHF home was created under /u01/oracle.ahf instead of /opt/oracle.ahf:

# ll /opt|grep oracle.ahf

# ll /u01|grep oracle.ahf
drwxr-xr-x 10 root root 134 Jul 7 12:46 oracle.ahf
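
The directory preparation and the final check can be wrapped in two small helpers. This is a sketch only; prepare_ahf_home and ahf_home_location are hypothetical names, and the check assumes the AHF home is always a directory called oracle.ahf.

```shell
# Create the target directory with 755 permissions, as in the steps above.
prepare_ahf_home() {
    mkdir -p "$1" && chmod 755 "$1"
}

# Print the first candidate base directory that contains oracle.ahf.
# Returns non-zero when none of them does.
ahf_home_location() {
    for base in "$@"; do
        if [ -d "$base/oracle.ahf" ]; then
            echo "$base/oracle.ahf"
            return 0
        fi
    done
    return 1
}

# Example (run as root; commented out to avoid side effects):
#   prepare_ahf_home /u01/oracle.ahf
#   ahf_home_location /opt /u01
```

Listing /opt first in the example makes an unwanted AHF home under /opt show up before the intended one.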

Upgrading/Installing TFA with OSWatcher

The whole process is simple and straightforward.
The post looks long, but most of it is command output.

1. Download TFA Collector – TFA with Database Support Tools Bundle from Doc ID 1513912.1

2. Place the downloaded zip file on rac1 and unzip it:

# cd /u01/app/sw
# ll
…
-rw-r--r-- 1 root root      264751391 Apr 25 19:18 TFA-LINUX_v19.2.1

# unzip TFA-LINUX_v19.2.1 

3. Install TFA:

[root@rac1 sw]# ./installTFA-LINUX 

TFA Installation Log will be written to File : /tmp/tfa_install_21556_2019_06_03-10_39_10.log
Starting TFA installation
 TFA Version: 192100 Build Date: 201904251105
 TFA HOME : /u01/app/12.2.0/grid/tfa/rac1/tfa_home
 Installed Build Version: 184100 Build Date: 201902262137
 TFA is already installed. Upgrading TFA
 TFA Upgrade Log : /u01/app/12.2.0/grid/tfa/rac1/tfapatch.log
 TFA will be upgraded on : 
 rac1
 rac2
 Do you want to continue with TFA Upgrade ? [Y|N] [Y]: Y
 Checking for ssh equivalency in rac2
 Node rac2 is not configured for ssh user equivalency
 SSH is not configured on these nodes : 
 rac2
 Do you want to configure SSH on these nodes ? [Y|N] [Y]: N
 Patching remote nodes using TFA Installer /u01/app/sw/installTFA-LINUX…
 Copying TFA Installer to rac2…
 lost connection
 Starting TFA Installer on rac2…
 Upgrading TFA on rac1 :
 Stopping TFA Support Tools…
 Shutting down TFA for Patching…
 Shutting down TFA
 Removed symlink /etc/systemd/system/multi-user.target.wants/oracle-tfa.service.
 Removed symlink /etc/systemd/system/graphical.target.wants/oracle-tfa.service.
 Successfully shutdown TFA..
 No Berkeley DB upgrade required
 Copying TFA Certificates…
 Starting TFA in rac1…
 Starting TFA..
 Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
 Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
 Waiting up to 100 seconds for TFA to be started..
 . . . . . 
 Successfully started TFA Process..
 . . . . . 
 TFA Started and listening for commands
 Enabling Access for Non-root Users on rac1…
 Connection refused!rac2
 RemoteUtil : Connection refused!rac2
 .------------------------------------------------------------.
 | Host | TFA Version | TFA Build ID         | Upgrade Status |
 +------+-------------+----------------------+----------------+
 | rac1 |  19.2.1.0.0 | 19210020190425110550 | UPGRADED       |
 | rac2 | -           | -                    | NOT UPGRADED   |
 '------+-------------+----------------------+----------------'

TFA was not upgraded on rac2 because I did not have ssh equivalency between the nodes for the root user.
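
Whether passwordless ssh works can be tested up front without triggering a password prompt. The following is a minimal sketch: has_ssh_equivalency and the SSH_CMD override are names invented for this example, while BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
# ssh client to use; overridable so the function can be exercised
# without a real remote host.
SSH_CMD=${SSH_CMD:-ssh}

# Succeed only if a non-interactive login to the host works.
# BatchMode=yes makes ssh fail instead of asking for a password.
has_ssh_equivalency() {
    "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}

# Example (run as the user that will start the installer):
#   if has_ssh_equivalency rac2; then
#       echo "rac2 can be upgraded from this node"
#   else
#       echo "rac2 needs a manual install"
#   fi
```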

I could enable passwordless ssh authentication and TFA would be upgraded in one step, but for security reasons I will not enable it and will instead install TFA manually on the second node:

4. Copy the installation file to rac2 (as oracle), then run the installer (as root):

[root@rac2 ~]# mkdir -p /u01/app/sw/
[root@rac2 ~]# chmod -R 777 /u01/app/sw/
[root@rac2 ~]# su - oracle
[oracle@rac2 ~]$ cd /u01/app/sw/  
[oracle@rac2 sw]$ scp rac1:/u01/app/sw/installTFA-LINUX .
installTFA-LINUX                           100%  254MB 114.9MB/s   00:02 
[root@rac2 sw]# ./installTFA-LINUX 
 TFA Installation Log will be written to File : /tmp/tfa_install_15370_2019_06_03-10_50_05.log
 Starting TFA installation
 TFA Version: 192100 Build Date: 201904251105
 TFA HOME : /u01/app/12.2.0/grid/tfa/rac2/tfa_home
 Installed Build Version: 184100 Build Date: 201902262137
 TFA is already installed. Upgrading TFA
 TFA Upgrade Log : /u01/app/12.2.0/grid/tfa/rac2/tfapatch.log
 TFA-00002 Oracle Trace File Analyzer (TFA) is not running
 TFA-00002 Oracle Trace File Analyzer (TFA) is not running
 Unable to determine the status of TFA in other nodes.
 TFA will be upgraded on Node rac2:
 Do you want to continue with TFA Upgrade ? [Y|N] [Y]: 
 Upgrading TFA on rac2 :
 Stopping TFA Support Tools…
 Shutting down TFA for Patching…
 Shutting down TFA
 Removed symlink /etc/systemd/system/multi-user.target.wants/oracle-tfa.service.
 Removed symlink /etc/systemd/system/graphical.target.wants/oracle-tfa.service.
 . . . . . 
 . . . 
 Successfully shutdown TFA..
 No Berkeley DB upgrade required
 Copying TFA Certificates…
 Starting TFA in rac2…
 Starting TFA..
 Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
 Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
 Waiting up to 100 seconds for TFA to be started..
 . . . . . 
 Successfully started TFA Process..
 . . . . . 
 TFA Started and listening for commands
 Enabling Access for Non-root Users on rac2…
 .------------------------------------------------------------.
 | Host | TFA Version | TFA Build ID         | Upgrade Status |
 +------+-------------+----------------------+----------------+
 | rac2 |  19.2.1.0.0 | 19210020190425110550 | UPGRADED       |
 | rac1 |  19.2.1.0.0 | 19210020190425110550 | UPGRADED       |
 '------+-------------+----------------------+----------------'
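
On clusters with more nodes, the summary table can be scanned for hosts that were left behind. This is a sketch that assumes the table layout shown above (host in the first column, status in the fourth); not_upgraded is a hypothetical helper name.

```shell
# Read the installer summary table on stdin and print every host whose
# status column is anything other than UPGRADED.
not_upgraded() {
    awk -F'|' 'NF >= 5 {
        host = $2; status = $5
        gsub(/^[ \t]+|[ \t]+$/, "", host)      # trim padding
        gsub(/^[ \t]+|[ \t]+$/, "", status)
        if (host != "Host" && status != "UPGRADED") print host
    }'
}

# Example, with the table saved to a file of your choice:
#   not_upgraded < tfa_upgrade_summary.txt
```

An empty result means every listed host reports UPGRADED.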

5. Stop and start TFA on rac1 and rac2, then check that OSWatcher is running:

# tfactl stop
Sending stoptfa
Success
Stopping TFA from the Command Line
Nothing to do !
LCM is not running
TFA is running  - Will wait 5 seconds (up to 3 times)  
TFA-00104 Cannot establish connection with TFA Server. Please check TFA Certificates
Killing TFA running with pid 16627
. . . 
Successfully stopped TFA..

# tfactl start
TFA-00002 Oracle Trace File Analyzer (TFA) is not running
Starting TFA..
Waiting up to 100 seconds for TFA to be started..
Successfully started TFA Process..

# ps -ef|grep OSWatcher
root     14528     1  0 May30 ?        00:44:11 /bin/sh ./OSWatcher.sh
root     14850 14528  0 May30 ?        00:00:58 /bin/sh ./OSWatcherFM.sh 48 /home/fg/oswbb/archive

6. Check the OSWatcher repository location:

# ll /home/fg/oswbb/archive

drwxr-xr-x 2 root root  202 May 30 22:00 oswcpuinfo
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswifconfig
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswiostat
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswmeminfo
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswmpstat
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswnetstat
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswnfsiostat
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswpidstat
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswpidstatd
drwxr-xr-x 2 root root    6 May 30 21:56 oswprvtnet
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswps
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswslabinfo
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswtop
drwxr-xr-x 2 root root 4096 Jun  3 11:00 oswvmstat
drwxr-xr-x 2 root root    6 May 30 21:56 oswxentop
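
The repository path does not have to be typed by hand; in the ps output above it appears as the last argument of OSWatcherFM.sh. The sketch below relies on that observation, which may not hold for every OSWatcher version, and oswatcher_archive is a made-up helper name.

```shell
# Print the last argument of any OSWatcherFM.sh process line read on stdin.
# In the ps output above, that argument is the archive directory.
oswatcher_archive() {
    awk '/OSWatcherFM\.sh/ { print $NF }'
}

# Live usage:
#   ps -ef | oswatcher_archive
#   ls -l "$(ps -ef | oswatcher_archive | head -1)"
```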