Python: ImportError: No module named typing


Installing the joblib module throws an error:

# pip install joblib
ImportError: No module named typing

Trying to install typing returns the same error:

# pip install typing
ImportError: No module named typing


Install python-typing using yum:

# yum install python-typing
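
After the installation, a quick check like the one below can confirm that the modules are importable. This is only a minimal sketch; the helper name module_available is my own and is not part of pip or yum.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Report the state of the two modules discussed above
for name in ("typing", "joblib"):
    print(name, "OK" if module_available(name) else "missing")
```

If typing is still reported as missing, pip is most likely running under a different interpreter than the one yum installed the package for.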

Python: Working with the content of tar.gz files located in S3 without downloading


I need to analyze several values from multiple files that are archived as tar.gz and located in S3. This operation must be performed without downloading or extracting the tar.gz.

HARBOR: I am neither a Python expert nor a developer, so the script may contain mistakes or could be written in a shorter and simpler way.
It satisfies my needs, though. So please use it as an example only and review its content before running it.

The hierarchy of the tar.gz file is the following (sample):


  1. Create a file with the following content and grant executable permission to it.

    Note: Update the following entries in the file according to your environment:
AWS_ACCESS_KEY_ID = "my key goes here"    
AWS_SECRET_ACCESS_KEY = "my secret key goes here"
AWS_STORAGE_BUCKET_NAME = "my bucket name goes here"

Content of the file:

#!/usr/bin/python2.7
import boto3
import tarfile
import joblib
import io
import sys

class S3Loader(object):
    AWS_ACCESS_KEY_ID = "my key goes here"
    AWS_SECRET_ACCESS_KEY = "my secret key goes here"
    AWS_REGION_NAME = "us-east-1"
    AWS_STORAGE_BUCKET_NAME = "my bucket name goes here"
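    def __init__(self):
        # Create the S3 client from the class-level settings above
        self.s3_client = boto3.client("s3",
                                      region_name=self.AWS_REGION_NAME,
                                      aws_access_key_id=self.AWS_ACCESS_KEY_ID,
                                      aws_secret_access_key=self.AWS_SECRET_ACCESS_KEY)
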
    def load_tar_file_s3_into_object_without_download(self, s3_filepath):

        # Search patterns: a line is reported if it contains match and does not contain notmatch
        match = "Disk latency above threshold"
        notmatch = ".lun"

        s3_object = self.s3_client.get_object(Bucket=self.AWS_STORAGE_BUCKET_NAME, Key=s3_filepath)
        wholefile = s3_object['Body'].read()
        fileobj = io.BytesIO(wholefile)

        # Opening first tar.gz file
        tar = tarfile.open(fileobj=fileobj)

        # Searching nested tar.gz files
        childgz = [f.name for f in tar.getmembers() if f.name.endswith('.gz')]

        # Extracting file named flashgrid_cluster which is located in the first nested tar.gz
        node1gz = tarfile.open(fileobj=tar.extractfile(childgz[0]))
        fgclustername = [f.name for f in node1gz.getmembers() if f.name.endswith('flashgrid_cluster')]
        fgclusternamecontent = node1gz.extractfile(fgclustername[0])

        # Extracting the line that contains the string "Cluster Name:"
        clustername = ""
        for fgclusternameline in fgclusternamecontent:
            # extractfile() yields bytes on Python 3, so decode before comparing with a str
            fgclusternameline = fgclusternameline.decode("utf-8", "replace")
            if "Cluster Name:" in fgclusternameline:
                clustername = fgclusternameline

        # Extracting file node_monitor-error.log from all nested tar.gz files
        for i in childgz:
            cur_gz_file_extracted = tarfile.open(fileobj=tar.extractfile(i))
            cur_node_mon_file = [f.name for f in cur_gz_file_extracted.getmembers() if f.name.endswith('node_monitor-error.log')]

            # Skipping nested archives that do not contain the log file
            if not cur_node_mon_file:
                continue

            # Path to node_monitor-error.log starts with the hostname, so take the first path component
            cur_node_name = cur_node_mon_file[0].split("/")[0]

            # Extracting content of node_monitor-error.log file
            cur_node_mon_file_content = cur_gz_file_extracted.extractfile(cur_node_mon_file[0])

            # Selecting lines from the extracted file and filtering based on match criteria (match, notmatch variables)
            for cur_node_mon_file_content_line in cur_node_mon_file_content:
                # extractfile() yields bytes on Python 3, so decode before comparing with a str
                cur_node_mon_file_content_line = cur_node_mon_file_content_line.decode("utf-8", "replace")
                if match in cur_node_mon_file_content_line and notmatch not in cur_node_mon_file_content_line:
                    # Splitting the line by the delimiter " "; the values are taken by their known positions
                    cur_node_mon_file_line_splitted = cur_node_mon_file_content_line.split(" ")
                    # Extracting time from the first two fields
                    time = cur_node_mon_file_line_splitted[0] + " " + cur_node_mon_file_line_splitted[1]
                    print(clustername.strip(), cur_node_name, cur_node_mon_file_line_splitted[8], time, cur_node_mon_file_line_splitted[17] + " " + cur_node_mon_file_line_splitted[18].strip())

if __name__ == "__main__":
    s3_loader = S3Loader()

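    # Script takes 1 argument: the path of the tar.gz file inside the bucket
    s3_loader.load_tar_file_s3_into_object_without_download(s3_filepath=sys.argv[1])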


2. Run the .py file and pass the path of the tar.gz file:

# ./ "myfoldername/myfile.tar.gz"
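
The script expects exactly one argument. If the argument is forgotten, reading it directly from sys.argv ends with an IndexError. As an alternative, here is a small sketch with argparse, which prints a usage message instead. The function name parse_args and the help texts are my own assumptions and are not part of the script above.

```python
import argparse


def parse_args(argv=None):
    """Parse the single positional argument: the path of the tar.gz file in the bucket."""
    parser = argparse.ArgumentParser(
        description="Analyze a tar.gz file located in S3 without extracting it.")
    parser.add_argument(
        "s3_filepath",
        help='path of the tar.gz file inside the bucket, e.g. "myfoldername/myfile.tar.gz"')
    # argv=None makes argparse read the real command line (sys.argv[1:])
    return parser.parse_args(argv)


# Passing the list explicitly here only to demonstrate the result
args = parse_args(["myfoldername/myfile.tar.gz"])
print(args.s3_filepath)
```

In the script, the value of args.s3_filepath would then be passed to load_tar_file_s3_into_object_without_download.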

So the script searches the content of the flashgrid_cluster and node_monitor-error.log files, and to reach them two levels of nested tar.gz archives have to be analyzed.
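
The nested search can be tried without S3 at all. The sketch below builds a small nested tar.gz in memory and searches it the same way. It is a simplified illustration and not the script above: the helper names make_tar_gz and find_lines, the member names and the sample lines are all made up for the demonstration.

```python
import io
import tarfile


def make_tar_gz(files):
    """Build a tar.gz archive in memory from a {member_name: bytes} dict."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def find_lines(outer_bytes, member_suffix, match, notmatch=None):
    """Search every nested .gz archive for files ending with member_suffix.

    Returns (nested_archive_name, line) pairs for lines that contain match
    and, if notmatch is given, do not contain notmatch.
    """
    hits = []
    with tarfile.open(fileobj=io.BytesIO(outer_bytes), mode="r:gz") as outer:
        for child in outer.getmembers():
            if not (child.isfile() and child.name.endswith(".gz")):
                continue
            # Open the nested archive straight from the outer archive, nothing is written to disk
            with tarfile.open(fileobj=outer.extractfile(child), mode="r:gz") as inner:
                for member in inner.getmembers():
                    if not (member.isfile() and member.name.endswith(member_suffix)):
                        continue
                    for raw in inner.extractfile(member):
                        # Lines come back as bytes, so decode them before comparing
                        line = raw.decode("utf-8", "replace").rstrip("\n")
                        if match in line and (notmatch is None or notmatch not in line):
                            hits.append((child.name, line))
    return hits


# Made-up sample data, only to exercise the functions above
node1 = make_tar_gz({"node1/flashgrid_cluster": b"Cluster Name: demo\nOther: x\n"})
node2 = make_tar_gz({"node2/node_monitor-error.log":
                     b"Disk latency above threshold: sda\n"
                     b"Disk latency above threshold: a.lun\n"})
outer = make_tar_gz({"node1.tar.gz": node1, "node2.tar.gz": node2})

print(find_lines(outer, "flashgrid_cluster", "Cluster Name:"))
print(find_lines(outer, "node_monitor-error.log", "Disk latency above threshold", ".lun"))
```

For a real archive, outer_bytes would be the bytes read from the Body of the S3 object, as in the script.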

Note: To run the above script, I had to install the following packages:

# wget; yum install epel-release-latest-7.noarch.rpm
# yum install python-pip
# pip install boto3

UPDATE 20 June 2022:

In one of my environments I was getting a syntax error while running the script. I had to change the Python version in the header:
From: #!/usr/bin/python2.7
To: #!/bin/python3

Then I installed:
# pip3 install boto3
# pip3 install joblib
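
When the script behaves differently between environments, it helps to confirm which interpreter the header line actually selects. A minimal sketch, where describe_interpreter is a hypothetical helper name of my own:

```python
import sys


def describe_interpreter():
    """Return the major.minor version and the path of the running interpreter."""
    return "Python %d.%d (%s)" % (
        sys.version_info[0], sys.version_info[1], sys.executable)


print(describe_interpreter())
```

Putting the same header line at the top of this snippet and running it the same way as the script shows which Python version is picked up, and therefore whether pip or pip3 is the right tool for installing the modules.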