REMOTE HOST IDENTIFICATION HAS CHANGED!

Problem:

Connecting via ssh to the newly created host causes the following error:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:AxfpHOVc8NP2OYPGce92HMa5LADDQj2V98ZKgoQHFGU.
Please contact your system administrator.
Add correct host key in /Users/mari/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /Users/mari/.ssh/known_hosts:315
ECDSA host key for 52.1.130.91 has changed and you have requested strict checking.
Host key verification failed.

Reason:

I had another server with the same public IP, so when I connected to the old server its host identification was saved in known_hosts. After a while I removed the old server, created a new one, and assigned the same public IP to it. The host identification changed, but the old entry was still present in known_hosts.

Solution:

Open /Users/mari/.ssh/known_hosts and delete only the line containing the mentioned IP (52.1.130.91 in my case, on line 315 as reported by the warning), save the file, and retry the connection.
It should work now.
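Alternatively, ssh-keygen can remove the stale entry for you, which avoids hand-editing the file; it also keeps a backup copy of the original:

```shell
# Remove all known_hosts entries for the offending IP;
# the previous file is saved as known_hosts.old
ssh-keygen -R 52.1.130.91
```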


One of the solutions for ORA-27300: OS system dependent operation:fork failed with status: 11

Problem:

Databases crashed and the alert logs were showing the following errors:

Fri Nov 12 13:23:39 2021
Process startup failed, error stack:
Errors in file /app/oracle/diag/rdbms/orcl/orcl/trace/orcl_psp0_25852.trc:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn5

We had a procedure in place that periodically captured the process list (via ps -ef); the output during that time was the following:

oracle    592    1  0 13:57 ?        00:00:00 [oracle] <defunct>
oracle    593    1  0 13:55 ?        00:00:00 [oracle] <defunct>
oracle    615    1  0 13:57 ?        00:00:00 [oracle] <defunct>
oracle    618    1  0 13:57 ?        00:00:00 [oracle] <defunct>
...

Not only Oracle, but sshd and some other processes were also experiencing the same:

oracle  22335 22331  0 13:52 ?        00:00:00 [ps] <defunct>
oracle  22336 22331  0 13:52 ?        00:00:00 [grep] <defunct>
oracle  22338 22331  0 13:52 ?        00:00:00 [grep] <defunct>
oracle  14389    1  0 13:24 ?        00:00:00 [sshd] <defunct>
oracle  15852    1  0 13:23 ?        00:00:00 [sshd] <defunct>

Reason:

A large number of zombie (defunct) processes had accumulated, exhausting the process table, so fork() failed with "Resource temporarily unavailable" and applications crashed.
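A quick way to see how bad the accumulation is: count processes in the Z (zombie) state.

```shell
# Count zombie processes; ps reports state "Z" for defunct entries
ps -eo stat= | awk '$1 ~ /^Z/ { n++ } END { print n+0 }'
```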

Solution:

You may find a lot of recommendations to increase kernel.pid_max, e.g. ORA-27300: OS System Dependent Operation:fork Failed With Status: 11 (Doc ID 1546393.1). You can certainly raise this parameter, but that will not solve the problem; it will only postpone it.

The reason for the high number of defunct processes is described here: https://access.redhat.com/solutions/2438581.

The parent process of our defunct processes was systemd (pid=1), and its version was systemd-219-19.el7.x86_64.

The solution is to update systemd to the latest version.

Print the content of multiple differently named files in Linux

If the number of files you are working with is big, you need automation as soon as possible.
This post describes find's -o option, which helps you work on differently named files when there are many of them.

For example, if you want to output the content of files physical_block_size and logical_block_size located under /sys/block/*/queue, run the following:

# find /sys/block/*/queue -name physical_block_size -o -name logical_block_size | while read f ; do echo "$f $(cat "$f")" ; done

..
/sys/block/dm-0/queue/physical_block_size 4096
/sys/block/dm-0/queue/logical_block_size 512
/sys/block/dm-1/queue/physical_block_size 512
...

Here -o means OR: a file is listed if either -name test matches.

Useful when working on ASM disks.
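One caveat worth knowing: -o has lower precedence than the implicit AND between predicates, so when you combine it with other tests or actions, group the -name tests with escaped parentheses. A sketch of the same listing using -exec instead of the while loop:

```shell
# The parentheses group the two -name tests so -exec applies to both matches
find /sys/block/*/queue \( -name physical_block_size -o -name logical_block_size \) \
  -exec sh -c 'echo "$1 $(cat "$1")"' _ {} \;
```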

Draw graph for Linux sar output using ksar

I’ve recently heard about this tool; as they say, you live and learn (:

Our company saves sar output in a text file periodically, and after performance or other issues we need to analyze it to find out which resource was busy and when. Analyzing a plain text file is time-consuming and strains the eyes.

Output in sar:

00:00:01        CPU      %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest    %gnice     %idle
00:10:01        all      3.14      0.00      2.43      1.64      0.00      0.00      0.60      0.00      0.00     92.20
00:10:01          0      3.64      0.00      2.33      4.10      0.00      0.00      1.10      0.00      0.00     88.83

00:00:01      scall/s badcall/s  packet/s     udp/s     tcp/s     hit/s    miss/s   sread/s  swrite/s saccess/s sgetatt/s
00:10:01         0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

...
00:00:01       totsck    tcpsck    udpsck    rawsck   ip-frag    tcp-tw
00:10:01         5682       656      1783         0         6       502
00:20:01         5651       668      1748         0         0       804

CPU, Network, Disk I/O, etc. activities are logged.

The same text file analyzed by the ksar tool and displayed graphically looks like this:

The full list of items that can be viewed graphically is the following:

Here is everything you need to start using this tool:

1. Download a pre-built jar from GitHub releases page.

2. Run jar on your computer:

   java -jar ksar-5.2.4-b396_gf0680721-SNAPSHOT-all.jar

3. Click Data -> Load from a file… and choose the sar text output file
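If you generate the text file yourself from binary sar data, it may help to export it in the C locale so that ksar can parse the timestamps reliably. A sketch, where the /var/log/sa/sa15 path is an example (adjust the day suffix to your system):

```shell
# Dump all activities from a binary sar data file as plain text;
# LC_ALL=C keeps the timestamp format predictable for ksar
LC_ALL=C sar -A -f /var/log/sa/sa15 > sar-for-ksar.txt
```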

Full information about this tool: https://github.com/vlsi/ksar

Gmail blocks emails from Postfix client on Linux

Problem:

I want to send email notifications to my Gmail account from a Linux server using the Postfix client. Mails are not received, and /var/log/maillog is full of the following error messages:

Aug 18 17:24:29 rac1 postfix/smtp[17580]: connect to gmail-smtp-in.l.google.com[74.125.69.27]:25: Connection timed out
Aug 18 17:24:29 rac1 postfix/smtp[17580]: connect to gmail-smtp-in.l.google.com[2607:f8b0:4001:c0d::1a]:25: Network is unreachable
Aug 18 17:24:59 rac1 postfix/smtp[17580]: connect to alt1.gmail-smtp-in.l.google.com[173.194.77.27]:25: Connection timed out
Aug 18 17:25:29 rac1 postfix/smtp[17580]: connect to alt2.gmail-smtp-in.l.google.com[173.194.219.27]:25: Connection timed out

Solution:

Configure Postfix and Gmail account accordingly.

1. Confirm that the myhostname parameter is configured with your server’s FQDN:

# grep ^myhostname /etc/postfix/main.cf
myhostname = rac1.example.com

2. Generate an App Password for Postfix:

Click on App passwords -> Select app dropdown -> choose Other (custom name) -> Enter “Postfix” -> click GENERATE.

The Postfix app password is shown in a yellow box; copy and save it (it will replace generated_password_goes_here below).

3. Fill in the SMTP host, username, and password in /etc/postfix/sasl_passwd:

# cat /etc/postfix/sasl_passwd
smtp.gmail.com your_username@gmail.com:generated_password_goes_here

4. Create the hash db file

# postmap /etc/postfix/sasl_passwd

5. Configure the Postfix Relay Server:

# grep ^relayhost /etc/postfix/main.cf
relayhost = [smtp.gmail.com]:587

6. To enable authentication, add the following parameters in /etc/postfix/main.cf:

smtp_sasl_auth_enable = yes
smtpd_tls_auth_only = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = encrypt

7. Reload Postfix service:

# systemctl reload postfix

8. To send a test email, I use the Flashgrid tool:

[root@rac1 ~]# flashgrid-node test-alerts
FlashGrid 21.2.24.58935 #bb6005e9d66650d1996184c38d2fb8a2a78420a8
License: Active, Marketplace
Licensee: Flashgrid Inc.
Support plan: 24x7
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Test alerts were sent

The alert is received now:
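If the Flashgrid tool is not available, a plain test message works too. A sketch using mailx (from the mailx package); the recipient address is a placeholder:

```shell
# Send a one-line test message through the newly configured relay
echo "Postfix relay test" | mail -s "Test from $(hostname)" your_username@gmail.com

# Then check the log for the relay result
tail -n 20 /var/log/maillog
```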

Linux STRESS command usage example

Problem:

During high CPU usage in kernel space we noticed brownouts on our database nodes. To find the root cause we wanted to reproduce the issue and somehow trigger high %sy usage on our nodes.

I have found the stress tool very useful and want to share my experience with you.

Solution:

1. Install stress tool via yum:

# yum install stress

2. Stress has several options to use:

[root@rac1 ~]# stress

`stress' imposes certain types of compute stress on your system

Usage: stress [OPTION [ARG]] ...
 -?, --help         show this help statement
     --version      show version statement
 -v, --verbose      be verbose
 -q, --quiet        be quiet
 -n, --dry-run      show what would have been done
 -t, --timeout N    timeout after N seconds
     --backoff N    wait factor of N microseconds before work starts
 -c, --cpu N        spawn N workers spinning on sqrt()
 -i, --io N         spawn N workers spinning on sync()
 -m, --vm N         spawn N workers spinning on malloc()/free()
     --vm-bytes B   malloc B bytes per vm worker (default is 256MB)
     --vm-stride B  touch a byte every B bytes (default is 4096)
     --vm-hang N    sleep N secs before free (default none, 0 is inf)
     --vm-keep      redirty memory instead of freeing and reallocating
 -d, --hdd N        spawn N workers spinning on write()/unlink()
     --hdd-bytes B  write B bytes per hdd worker (default is 1GB)

Example: stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --timeout 10s

Note: Numbers may be suffixed with s,m,h,d,y (time) or B,K,M,G (size).

To cause high %sy you need to use the --vm option and find an appropriate number of workers; in my case 50 workers were enough to reproduce the issue.

In the following example, stress runs 50 VM workers with a 200-second timeout:

# stress --vm 50 --timeout 200s

From another terminal tab, run the top command to monitor %sy usage (81.2% in my case):
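If top is not handy, a rough %sy reading can be computed straight from /proc/stat. A sketch only: it ignores iowait/irq ticks in the total, so treat the number as approximate.

```shell
# Read the aggregate CPU tick counters, wait a second, read them again,
# and compute the system-time share of the delta
read -r cpu u n s i rest < /proc/stat
tot1=$((u + n + s + i)); sys1=$s
sleep 1
read -r cpu u n s i rest < /proc/stat
echo "approx %sy: $(( 100 * (s - sys1) / (u + n + s + i - tot1) ))"
```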

See short video demonstration below:

Add new virtual machine in VBox and install Oracle Linux

Intro:

This blog post was written by my student at Business and Technology University, Ivane Metreveli. Thank you, Ivane, for participating in this project.

1. First of all, download the Oracle Linux ISO file from edelivery.oracle.com or oracle.com. After that, run VirtualBox, click the New button and create a new virtual machine:

2. Set the name of the virtual machine and select the operating system as follows, then click Next

3. Select the appropriate RAM amount (3 GB is recommended for normal processing), click the Next button and jump to the next step

4. Now select the Create a virtual hard disk now option and click the Create button

5. Select VDI (VirtualBox Disk Image)

6. Select Dynamically allocated if you don't want to take up hard disk space immediately

7. Select the file size (disk size for VirtualBox) and the location, then click the Create button to finish the virtual machine creation process

8. The virtual machine is now created. Before we open/start the VM, we load the ISO file into the machine: click Settings and follow along

9. Navigate to Storage and click the CD icon; on the right side of the window, under Attributes, click the CD icon and add the virtual machine's .iso file.

10. After that, you can click the Start button

11. Select the .iso file, or click the folder icon and open the folder where the .iso file is located, select it and click Start

12. The next step is the OS installation process: select Install Oracle Linux 7.6 and press Enter to start the installation:

13. Select the system language and click Continue

14. Select the installation destination

15. Select the disk where you want to install the system. You can select the virtual disk that you created in the previous step or add a new one. Select the disk and click the Done button.

16. Now all parameters are ready. Click Begin Installation and wait for the process to finish

17. Set the password and click Done

18. Installation is in progress; wait a bit more

19. The installation process is finished; click the Reboot button and move to the next step:

20. The installation is finished; now you can start working with Oracle Linux:
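For repeat installs, the GUI steps above can also be scripted with the VBoxManage CLI that ships with VirtualBox. This is a sketch only; the VM name, disk size, and ISO filename are placeholders to adjust for your environment:

```shell
# Create and register the VM with the Oracle Linux 64-bit profile, 3 GB RAM
VBoxManage createvm --name "OL7" --ostype Oracle_64 --register
VBoxManage modifyvm "OL7" --memory 3072

# Create a dynamically allocated 20 GB disk and attach it to a SATA controller
VBoxManage createmedium disk --filename OL7.vdi --size 20480
VBoxManage storagectl "OL7" --name SATA --add sata
VBoxManage storageattach "OL7" --storagectl SATA --port 0 --device 0 --type hdd --medium OL7.vdi

# Attach the installation ISO and boot the VM
VBoxManage storageattach "OL7" --storagectl SATA --port 1 --device 0 --type dvddrive --medium OracleLinux.iso
VBoxManage startvm "OL7"
```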

Install Linux in Virtual Box

Intro:

This blog post was written by my student at Business and Technology University, Saba Lapanashvili. Thank you, Saba, for participating in this project.

Requirements:

– VirtualBox
– Linux ISO file (for example, Linux Mint 15)

Step 1: Choose System Type

– After installing VirtualBox, click New
– Fill in the Name field: e.g. Linux Mint 15
– Select Type: Linux
– Select Version: Ubuntu

Step 2: Select the Amount of RAM

– Select the amount of RAM, e.g. 2048 MB = 2 GB

Step 3: Configure Hard Disk Settings

– Choose Create a virtual hard drive now to create virtual disk space
– Select VDI
– Choose Dynamically allocated
– Select the hard drive size

Step 4: Choose Linux ISO File

The hardware settings are now done.

– Click Start to launch the system
– Choose your system ISO file from your computer; for example, mine is linuxmint-15-cinnamon-dvd-64bit.iso

Step 5: Install Linux and Make Account

– Click on Install Linux Mint
– Select Erase disk and install Linux Mint
– Then press Install Now
– Create your account
– Press Continue

Step 6: Congratulations

Congratulations, you now have Linux on your Windows machine.

Python: ImportError: No module named typing

Problem:

Installing the joblib module throws an error:

# pip install joblib
...
ImportError: No module named typing

Trying to install typing returns the same error:

# pip install typing
...
ImportError: No module named typing

Solution:

Install python-typing using yum:

# yum install python-typing
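The error typically means pip is running under Python 2, where typing is not in the standard library (it has been built in since Python 3.5). A quick check, assuming python3 is also installed:

```shell
# Which interpreter does pip use, and does python3 already ship typing?
pip --version
python3 -c 'import typing; print("typing is built in")'
```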

Python: Manipulating file content located on S3 and archived as tar.gz, without downloading

Problem:

I need to analyze several values from multiple files that are archived as tar.gz and located on S3. The operation must be performed without downloading or extracting the tar.gz.

Disclaimer: I am neither a Python expert nor a developer, so assume there may be mistakes, and that the script could be written in a shorter and easier way than I did.
But it satisfies my needs, so please use it as an example only and review its content first.

Hierarchy of the tar.gz file is the following (sample):

-myfile.tar.gz
—folder1.tar.gz
—–flashgrid_cluster
—–files/node_monitor_error.log
—folder2.tar.gz
—–flashgrid_cluster
—–files/node_monitor_error.log

1. Create the extracts3tar.py file with the following content and grant it executable permission.

Note: Update the following entries in the file according to your environment:

AWS_ACCESS_KEY_ID = "my key goes here"
AWS_SECRET_ACCESS_KEY = "my secret key goes here"
AWS_STORAGE_BUCKET_NAME = "my bucket name goes here"

Content of extracts3tar.py:

#!/usr/bin/python2.7
# See the update at the end of the post for running this under python3 instead.
from __future__ import print_function

import boto3
import tarfile
import joblib  # imported in the original environment; not used directly below
import io
import sys


class S3Loader(object):
    AWS_ACCESS_KEY_ID = "my key goes here"
    AWS_SECRET_ACCESS_KEY = "my secret key goes here"
    AWS_REGION_NAME = "us-east-1"
    AWS_STORAGE_BUCKET_NAME = "my bucket name goes here"

    def __init__(self):
        self.s3_client = boto3.client(
            "s3",
            aws_access_key_id=self.AWS_ACCESS_KEY_ID,
            aws_secret_access_key=self.AWS_SECRET_ACCESS_KEY)

    def load_tar_file_s3_into_object_without_download(self, s3_filepath):
        # Search patterns: keep lines containing `match`, skip lines containing `notmatch`
        match = "Disk latency above threshold"
        notmatch = ".lun"

        # Read the whole archive from S3 into an in-memory buffer
        s3_object = self.s3_client.get_object(
            Bucket=self.AWS_STORAGE_BUCKET_NAME, Key=s3_filepath)
        fileobj = io.BytesIO(s3_object['Body'].read())

        # Open the outer tar.gz
        tar = tarfile.open(fileobj=fileobj)

        # Find the nested tar.gz members
        childgz = [f.name for f in tar.getmembers() if f.name.endswith('.gz')]

        # Extract the file named flashgrid_cluster from the first nested tar.gz
        node1gz = tarfile.open(fileobj=tar.extractfile(childgz[0]))
        fgclustername = [f.name for f in node1gz.getmembers()
                         if f.name.endswith('flashgrid_cluster')]
        fgclusternamecontent = node1gz.extractfile(fgclustername[0])

        # Keep the line that contains the string "Cluster Name:"
        for fgclusternameline in fgclusternamecontent:
            if "Cluster Name:" in fgclusternameline:
                clustername = fgclusternameline

        # Extract node_monitor-error.log from every nested tar.gz
        for i in childgz:
            cur_gz_file_extracted = tarfile.open(fileobj=tar.extractfile(i))
            cur_node_mon_file = [f.name for f in cur_gz_file_extracted.getmembers()
                                 if f.name.endswith('node_monitor-error.log')]

            # The path to the log contains the hostname as its first component
            cur_node_name = cur_node_mon_file[0].split("/")[0]

            # Read the content of the log file
            cur_node_mon_file_content = cur_gz_file_extracted.extractfile(
                cur_node_mon_file[0])

            # Filter lines by the match/notmatch criteria above
            for line in cur_node_mon_file_content:
                if match in line and notmatch not in line:
                    splitted = line.split(" ")
                    # The timestamp occupies the first two space-separated fields
                    time = splitted[0] + " " + splitted[1]
                    # Print the values we need, knowing their exact positions
                    print(clustername.strip(), cur_node_name, splitted[8],
                          time, splitted[17] + " " + splitted[18].strip())


if __name__ == "__main__":
    s3_loader = S3Loader()
    try:
        # The script takes one argument: the S3 key of the tar.gz file
        s3_loader.load_tar_file_s3_into_object_without_download(
            s3_filepath=str(sys.argv[1]))
    except Exception as exc:
        # Report the failure instead of silently swallowing it
        sys.stderr.write("Failed: %s\n" % exc)

2. Run the .py file and pass the path of the tar.gz file:

# ./extracts3tar.py "myfoldername/myfile.tar.gz"

So the search is performed on the flashgrid_cluster and node_monitor_error.log file contents, for which two nested tar.gz archives must be analyzed.

Note: To run the above script, I had to install the following RPMs:

# wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm; yum install epel-release-latest-7.noarch.rpm
# yum install python-pip
# pip install boto3

UPDATE 20 June 2022:

On one of my environments I was getting a syntax error while running the script. I had to change the Python version in the header:
From: #!/usr/bin/python2.7
To: #!/bin/python3

Then installed:
# pip3 install boto3
# pip3 install joblib