Reduce high CPU usage by TFA

Problem:

Cluster nodes experienced high CPU usage, after investigation one of the top CPU consumers on the server has been found to be a TFA process (2nd place):

 # Fri Feb 19 17:44:01 2021
AllCPU  OneCPU  PID     User    PR      NI      STime   RSS     Name
--------------------------------------------------------------------------------
11.75%  94.02%  23895   root    20      0       17:43   87M     ora_m001_ORCL2
1.42%   11.39%  2468    root    20      0       Feb02   736M    /opt/oracle.ahf/jre/bin/java -server -Xms256m -Xmx512m -Djava.awt.headless=true -Ddisable.checkForUpdate=true -XX:HeapDumpPath=/u01/app/oracle.ahf/data/rac02/diag/tfa -XX:ParallelGCThreads=5 oracle.rat.tfa.TFAMain /opt/oracle.ahf/tfa

Workaround:

In newer version of TFA, you can set CPU resource limit.

tfactl setresourcelimit 
 [-tool tool_name] 
 [-resource resource_type] 
 [-value value]

To limit TFA to a maximum of 50% of a single CPU, run the following:

# tfactl setresourcelimit -value 0.5

For more information, please check TFA official documentation.

If you don’t have newer version of TFA, you need to upgrade it first.

Create shortcuts for frequently accessed servers

Life is too short, that’s why it’s mandatory to use shortcuts… Instead of typing frequently used ssh client options such as port, user, hostname, identity-file and so on, you can save that information in sshd config file and then access it with defined alias.

  • System wide config file location is /etc/ssh/ssh_config
  • User specific config file location is ~/.ssh/config same as $HOME/.ssh/config

Instead of connecting to the server everytime using the following command:

# ssh root@95.80.12.10 -i ~/.ssh/my_id_rsa

Save the following entries in ~/.ssh/config file:

# vim ~/.ssh/config
Host my_db
HostName 95.80.12.10
IdentityFile ~/.ssh/my_id_rsa
User root

And connect to the server using this simple way:

# ssh my_db

For other options check https://linuxize.com/post/using-the-ssh-config-file/

asmcmd does not return error text for root user

Problem:

While running asmcmd commands via root user, error messages are not displayed. But if we run the same command via grid – it returns messages. In the following example, ORACLE_HOME is set to RDBMS home (instead of GI) and we are trying to list the file which does not exist:

[root@rac1 ~]# echo $ORACLE_HOME
/u01/app/oracle/product/19.3.0/dbhome_1

[root@rac1 ~]# echo $PATH
/u01/app/19.3.0/grid/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin

[root@rac1 ~]# asmcmd ls +demodg/orcl/tempfile/TEMP.263.10617958
<<<<<<Nothing is displayed here

When a file does not exist, ls command should display ASMCMD-8002: entry 'TEMP.263.10617958' does not exist in directory '+demodg/orcl/tempfile/' error. But in our example, it does not return anything.

In the following example, if we try to copy a file, we see a message about copying a file but actually, the file is not copied because grid user does not have permission under /u01:

[root@rac1 ~]# asmcmd cp +demodg/orcl/tempfile/TEMP.263.1061795851 /u01
copying +demodg/orcl/tempfile/TEMP.263.1061795851 -> /u01/TEMP.263.1061795851

If we run the same command via grid, we get understandable error message:

[grid@rac1 ~]$  asmcmd cp +demodg/orcl/tempfile/TEMP.263.1061795851 /u01
 ASMCMD-9463: operation failed due to lack of write permissions

Reason:

Environment variables are not set correctly. ORACLE_HOME should be pointing to GI home.

Solution:

[root@rac1 ~]# export ORACLE_HOME=/u01/app/19.3.0/grid

[root@rac1 ~]# asmcmd ls +demodg/orcl/tempfile/TEMP.263.10617958
ASMCMD-8002: entry 'TEMP.263.10617958' does not exist in directory '+demodg/orcl/tempfile/'

Upgrade the database time zone file using the DBMS_DST package in 19c

After upgrading database from 12c to 19c, you may need to upgrade database time zone file version. This step is not always mandatory, but it is recommended by pre-upgrade checker.

Useful info about its necessity from Oracle site: https://oracle-base.com/articles/misc/update-database-time-zone-file

From Oracle 11gR2 onward, new time zone files are shipped with upgrades and patches, but they are not automatically applied to the database.

Applying a change to the database time zone file not only affects the way new data is handled, but potentially alters data stored in TIMESTAMP WITH TIME ZONE columns, so you need to consider the impact of this before upgrading the time zone file.

Remember, if you only deal with dates in your country, and your country has not altered its time zone or daylight saving time policy, this upgrade may not be necessary.”

Now let’s do time zone file upgrade:

1. Check current settings:

SQL> SELECT * FROM v$timezone_file;

FILENAME		VERSION     CON_ID
-------------------- ---------- ----------
timezlrg_26.dat 	     26 	 0

2. Startup database in upgrade mode:

SQL> shutdown immediate;
SQL> startup upgrade;

3. Start upgrade window:

SQL> SET SERVEROUTPUT ON
SQL> DECLARE
  l_tz_version PLS_INTEGER;
BEGIN
  l_tz_version := DBMS_DST.get_latest_timezone_version;

  DBMS_OUTPUT.put_line('l_tz_version=' || l_tz_version);
  DBMS_DST.begin_upgrade(l_tz_version);
END;
/


l_tz_version=32
An upgrade window has been successfully started.

PL/SQL procedure successfully completed.

4. Check primary and secondary time zone versions:

SQL> SELECT PROPERTY_NAME, SUBSTR(property_value, 1, 30) value
FROM DATABASE_PROPERTIES
WHERE PROPERTY_NAME LIKE 'DST_%'
ORDER BY PROPERTY_NAME; 

PROPERTY_NAME		 VALUE
------------------------ ----------
DST_PRIMARY_TT_VERSION	 32
DST_SECONDARY_TT_VERSION 26
DST_UPGRADE_STATE	 UPGRADE

5. Startup database in normal mode:

SQL> shut immediate;
SQL> startup;

6. Do the upgrade:

SQL> SET SERVEROUTPUT ON
SQL> DECLARE
  l_failures   PLS_INTEGER;
BEGIN
  DBMS_DST.upgrade_database(l_failures);
  DBMS_OUTPUT.put_line('DBMS_DST.upgrade_database : l_failures=' || l_failures);
  DBMS_DST.end_upgrade(l_failures);
  DBMS_OUTPUT.put_line('DBMS_DST.end_upgrade : l_failures=' || l_failures);
END;
/


Table list: "GSMADMIN_INTERNAL"."AQ$_CHANGE_LOG_QUEUE_TABLE_S"
Number of failures: 0
Table list: "GSMADMIN_INTERNAL"."AQ$_CHANGE_LOG_QUEUE_TABLE_L"
Number of failures: 0
Table list: "MDSYS"."SDO_DIAG_MESSAGES_TABLE"
Number of failures: 0
Table list: "DVSYS"."SIMULATION_LOG$"
Number of failures: 0
Table list: "DVSYS"."AUDIT_TRAIL$"
Number of failures: 0
DBMS_DST.upgrade_database : l_failures=0
An upgrade window has been successfully ended.
DBMS_DST.end_upgrade : l_failures=0

PL/SQL procedure successfully completed.

7. Check new settings:

SQL> SELECT * FROM v$timezone_file;

FILENAME		VERSION     CON_ID
-------------------- ---------- ----------
timezlrg_32.dat 	     32 	 0


SQL> COLUMN property_name FORMAT A30
SQL> COLUMN property_value FORMAT A20

SQL> SELECT property_name, property_value
FROM   database_properties
WHERE  property_name LIKE 'DST_%'
ORDER BY property_name;

PROPERTY_NAME		       PROPERTY_VALUE
------------------------------ --------------------
DST_PRIMARY_TT_VERSION	       32
DST_SECONDARY_TT_VERSION       0
DST_UPGRADE_STATE	       NONE

ORA-56920: a prepare or upgrade window or an on-demand or datapump-job loading of a secondary time zone data file is in an active state

Problem:

During DTS upgrade, while following the steps mentioned here, got the following error:

SQL> DECLARE
  l_tz_version PLS_INTEGER;
BEGIN
  l_tz_version := DBMS_DST.get_latest_timezone_version;
  DBMS_OUTPUT.put_line('l_tz_version=' || l_tz_version);
  DBMS_DST.begin_upgrade(l_tz_version);
END;
/ 

DECLARE
*
ERROR at line 1:
ORA-56920: a prepare or upgrade window or an on-demand or datapump-job loading
of a secondary time zone data file is in an active state
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 79
ORA-06512: at "SYS.DBMS_DST", line 1208
ORA-06512: at line 6

Solution:

The error simply means that this step is already done and you can continue with other steps, make sure that DST_UPGRADE_STATE column in the following query shows UPGRADE value.

SQL> SELECT PROPERTY_NAME, SUBSTR(property_value, 1, 30) value
FROM DATABASE_PROPERTIES
WHERE PROPERTY_NAME LIKE 'DST_%'
ORDER BY PROPERTY_NAME;

PROPERTY_NAME		 VALUE
------------------------ ----------
DST_PRIMARY_TT_VERSION	 32
DST_SECONDARY_TT_VERSION 26
DST_UPGRADE_STATE	 UPGRADE

there is no need to do anything, continue with the next DTS upgrade steps.

PRCD-1229 : An attempt to access configuration of database orcl was rejected because its version 12.2.0.1.0 differs from the program version 19.0.0.0.0

Problem:

After a manual database upgrade from 12c to 19c, I was not able to start database instance using srvctl.

[oracle@rac1 ~]$ srvctl start instance -db orcl -n rac1

PRCD-1027 : Failed to retrieve database orcl
PRCD-1229 : An attempt to access configuration of database orcl was rejected because its version 12.2.0.1.0 differs from the program version 19.0.0.0.0. Instead run the program from /u01/app/oracle/product/12.2.0/dbhome_1.

Solution:

Use srvctl upgrade from new home:

$ /u01/app/oracle/product/19.3.0/dbhome_1/bin/srvctl upgrade database -d orcl -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1

Try to start again:

$ srvctl start instance -db orcl -n rac1

ORA-39701: database must be mounted EXCLUSIVE for UPGRADE or DOWNGRADE

Problem:

In cluster environment, I was not able to start database in upgrade mode:

SQL> startup upgrade

ORACLE instance started.
Total System Global Area 1996486272 bytes
Fixed Size		    8898176 bytes
Variable Size		  704643072 bytes
Database Buffers	 1275068416 bytes
Redo Buffers		    7876608 bytes
Database mounted.
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-39701: database must be mounted EXCLUSIVE for UPGRADE or DOWNGRADE

The following also did not work:

SQL> startup mount exclusive

ORACLE instance started.
Total System Global Area 1996486272 bytes
Fixed Size		    8898176 bytes
Variable Size		  704643072 bytes
Database Buffers	 1275068416 bytes
Redo Buffers		    7876608 bytes
Database mounted.

SQL> alter database open upgrade;
alter database open upgrade
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-39701: database must be mounted EXCLUSIVE for UPGRADE or DOWNGRADE

Solution:

Change parameter cluster_database to FALSE, and startup in upgrade mode:

$ sqlplus / as sysdba

SQL> startup nomount;

SQL> alter system set cluster_database=FALSE scope=spfile sid='*';

SQL> shutdown immediate;

SQL> startup upgrade

ORACLE instance started.
Total System Global Area 1996486272 bytes
Fixed Size		    8898176 bytes
Variable Size		  704643072 bytes
Database Buffers	 1275068416 bytes
Redo Buffers		    7876608 bytes
Database mounted.
Database opened.

After finishing your work, don’t forget to return cluster_database parameter to TRUE and restart your database:

SQL> alter system set cluster_database=TRUE scope=spfile sid='*';

Manually installing the HR Schema in 19c

When creating a database using the dbca there is an option to install Sample Schemas automatically, but if you’ve missed that step then here are the steps to manually install HR schema (only) in your database:

1. Log on to sqlplus as SYS AS SYSDBA:

$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Aug 20 11:44:54 2020
Version 19.7.0.0.0

SQL> 

2. Run hr_main.sql script using the following command:

SQL> @?/demo/schema/human_resources/hr_main.sql

specify password for HR as parameter 1:
Enter value for 1: hr

specify default tablespeace for HR as parameter 2:
Enter value for 2: users

specify temporary tablespace for HR as parameter 3:
Enter value for 3: temp

specify log path as parameter 4:
Enter value for 4: $ORACLE_HOME/demo/schema/log/

PL/SQL procedure successfully completed.
...
******  Creating REGIONS table ....
...
******  Creating COUNTRIES table ....
...
******  Creating LOCATIONS table ....
...
******  Creating DEPARTMENTS table ....
...
******  Creating JOBS table ....
...
******  Creating EMPLOYEES table ....
...
******  Creating JOB_HISTORY table ....
...
******  Creating EMP_DETAILS_VIEW view ...
...
******  Populating REGIONS table ....
...
******  Populating COUNTIRES table ....
...
PL/SQL procedure successfully completed.

3. Verify that the schema was created:

SQL> SELECT table_name FROM dba_tables where owner='HR';

TABLE_NAME
------------------------------------------------------------
REGIONS
COUNTRIES
LOCATIONS
DEPARTMENTS
JOBS
EMPLOYEES
JOB_HISTORY

7 rows selected.

Increase root volume size in AWS

1. Increase disk size from AWS console:

Volumes -> choose disk -> Actions -> Modify Volume -> enter new value in Size field:

2. Use the lsblk command to display information about the block devices attached to your instance:

# lsblk
..
nvme0n1     259:11   0   40G  0 disk
└─nvme0n1p1 259:12   0   35G  0 part /

The root volume, /dev/nvme0n1, has a partition, /dev/nvme0n1p1. While the size of the root volume reflects the new size, 40 GB, the size of the partition reflects the original size, 35 GB, and must be extended before you can extend the file system.

3. To extend the partition on the root volume, use the following growpart command:

# growpart /dev/nvme0n1 1

CHANGED: partition=1 start=2048 old: size=73398239 end=73400287 new: size=83883999 end=83886047

4. Verify new size:

# lsblk
..
nvme0n1     259:11   0   40G  0 disk
└─nvme0n1p1 259:12   0   40G  0 part /

5. Use the df -Th command to verify the size of the file system for root volume:

# df -Th /

Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/nvme0n1p1 xfs    35G  4.2G   31G  12% /

6. From the above output, we see that filesystem is XFS, grow it using xfs_growfs

# xfs_growfs -d /

meta-data=/dev/nvme0n1p1         isize=256    agcount=36, agsize=262080 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=9174779, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 9174779 to 10485499

If filesystem is ext2, ext3, or ext4, you should use resize2fs.

7. Check new size:

# df -Th /

Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/nvme0n1p1 xfs    40G  4.2G   36G  11% /

TCPS configuration: ora.LISTENER.lsnr is in INTERMEDIATE state

Problem

After configuring TCPS endpoint in listener configuration using the following way:

In srvctl:

[grid@rac1 ~]$ srvctl modify listener -p "TCP:1522/TCPS:1524"

In listener.ora file:

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac1.example.com)(PORT = 1522))
      (ADDRESS = (PROTOCOL = IPC)(KEY = LISTENER))
      (ADDRESS = (PROTOCOL = TCPS)(HOST =rac1.example.com)(PORT = 1524))
    ))

So from the above, we see that I’ve configured as static endpoint registration as well as dynamic.

Restarted listener and saw that ora.LISTENER.lsnr resource left in the INTERMEDIATE state:

# srvctl stop listener;srvctl start listener

# crsctl status res -t|head
...
ora.LISTENER.lsnr 
ONLINE INTERMEDIATE rac1 Not All Endpoints Registered,STABLE
ONLINE INTERMEDIATE rac2 Not All Endpoints Registered,STABLE

Troubelshooting

After checking listener.log file, found the following messages:

Dynamic address is already listened on (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=rac1.example.com)(PORT=1524)))
Dynamic address is already listened on (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac1.example.com)(PORT=1522)))
Dynamic address is already listened on (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac1.example.com)(PORT=1522)))

So, due to the static registration, the dynamic failed and Agent thought it could not register endpoints, which left resource in the INTERMEDIATE state (which should not be normal I think) 

I tried to disable dynamic registration by setting ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER to OFF in listener.ora file, but after modifying and restarting listener using srvctl Oracle Agent changed the value back to ON. I could not find a way to disable it so far.

The second option was to remove static entries from listener.ora file and leave only dynamic registration, but after that, the important endpoint on TCPS/1524 was not registered:

(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=192.168.1.11)(PORT=1524)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.1)(PORT=1522)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.11)(PORT=1522)))

The endpoint that I want is 192.168.1.1:1524 and not 192.168.1.11:1524

Workaround

Configured dynamic registration in srvctl for TCP/1522 port only:

[grid@rac1 ~]$ srvctl modify listener -p "TCP:1522"

Configured static registration in listener.ora file for TCPS/1524:

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = LISTENER))
      (ADDRESS = (PROTOCOL = TCPS)(HOST =rac1.example.com)(PORT = 1524))
    )
  )

After restarting listener, the status became ONLINE:

ora.LISTENER.lsnr
              ONLINE  ONLINE   rac1   STABLE
              ONLINE  ONLINE   rac2   STABLE

Another workaround that my colleage found is the following:

Set ENDPOINTS to null in crsctl (in srvctl it’s not possible):

# crsctl modify resource ora.LISTENER.lsnr -attr "ENDPOINTS=" –unsupported

And register all endpoints statically in listener.ora. Using that way, no dynamic registration is triggered, only static endpoints will be visible.

Please comment bellow, if you have a better way. Thank you!