Dataset Extraction

File System

get_data_from_filesystem(file_path, file_type='delimited', sheet_name=0, delimiter=',')

Extract data from a file on the local file system.

file_path: str

File system path of the data file to be extracted

file_type: {'delimited', 'excel'}, default='delimited'

Type of the data file

sheet_name: str or int, default=0

Sheet name, or sheet position, to read when file_type is 'excel'

delimiter: str, default=','

Field delimiter to use when file_type is 'delimited'
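The 'delimited' case can be approximated with Python's standard csv module. This is a minimal sketch under stated assumptions, not the library's code: read_delimited_file is a hypothetical helper, it returns a list of dicts keyed by the header row (the real function may well return a pandas DataFrame instead), and the 'excel' case is left out because it needs a third-party reader.

```python
import csv
from typing import Dict, List


def read_delimited_file(file_path: str, delimiter: str = ",") -> List[Dict[str, str]]:
    """Hypothetical stand-in for the 'delimited' case of get_data_from_filesystem.

    Assumes the first row is a header; returns one dict per data row.
    All values stay strings (no type inference is done here).
    """
    # newline="" is the recommended way to open files for the csv module
    with open(file_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        return list(reader)


if __name__ == "__main__":
    import os
    import tempfile

    # Write a small pipe-delimited file, then read it back
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, newline="", encoding="utf-8"
    ) as tmp:
        tmp.write("id|name\n1|alpha\n2|beta\n")
        path = tmp.name
    try:
        rows = read_delimited_file(path, delimiter="|")
        print(rows)  # [{'id': '1', 'name': 'alpha'}, {'id': '2', 'name': 'beta'}]
    finally:
        os.remove(path)
```

The delimiter argument plays the same role as the documented delimiter parameter: pass '|' or '\t' for pipe- or tab-separated files.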

Examples

ds_file_system

Database

get_data_from_database(db_type=None, host=None, database=None, service=None, oracle_instant_client_path=None, username=None, password=None, query=None, chunk_size=1000)

Extract data from a database. Supported databases are MariaDB, Microsoft SQL Server, MySQL, Oracle, Amazon Aurora (MySQL- and PostgreSQL-compatible) and PostgreSQL.

db_type: {'mariadb', 'mssql', 'mysql', 'aurora-mysql', 'oracle', 'postgres', 'aurora-postgre'}

Database type

host: str

Database host

database: str

Database name

service: str

Oracle service name. Used only when db_type is 'oracle'

oracle_instant_client_path: str

File system path of the Oracle Instant Client. Used only when db_type is 'oracle'

username: str

Database user

password: str

Database password

query: str

SQL query used to extract the data

chunk_size: int or None, default=1000

Number of rows returned in each batch. Pass None to disable batch mode.
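Batch mode, as described for chunk_size, can be sketched with the Python DB-API cursor.fetchmany() method. This is an illustrative assumption rather than the library's implementation: iter_query_rows is a hypothetical helper, and the standard-library sqlite3 module stands in for the real database drivers, which generally expose the same DB-API cursor interface.

```python
import sqlite3
from typing import Iterator, List, Optional, Tuple


def iter_query_rows(
    connection, query: str, chunk_size: Optional[int] = 1000
) -> Iterator[List[Tuple]]:
    """Hypothetical sketch of chunked extraction over a DB-API connection.

    Yields lists of at most chunk_size rows. With chunk_size=None the
    whole result set is fetched and yielded as a single list.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        if chunk_size is None:
            # Batch mode disabled: materialise every row at once
            yield cursor.fetchall()
            return
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:
                break  # result set exhausted
            yield rows
    finally:
        cursor.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (n INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(2500)])
    sizes = [len(batch) for batch in iter_query_rows(conn, "SELECT n FROM t", 1000)]
    print(sizes)  # [1000, 1000, 500]
    conn.close()
```

Fetching in chunks keeps memory bounded for large result sets; disabling batching trades that for a single, simpler result.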

Examples

ds_database

Amazon cloud (AWS)

get_data_from_s3(aws_access_key_id, aws_secret_access_key, s3_bucket_name, s3_object_name, destination_path, file_type='delimited', sheet_name=0, delimiter=',')

Download a data file from Amazon S3 (AWS) and extract its data

aws_access_key_id: str

The access key to use when creating the S3 client

aws_secret_access_key: str

The secret key to use when creating the S3 client

s3_bucket_name: str

Name of the S3 bucket that contains the data file

s3_object_name: str

Name (object key) of the data file in the S3 bucket

destination_path: str

Local file system path to which the file is downloaded from S3

file_type: {'delimited', 'excel'}, default='delimited'

Type of the data file

sheet_name: str or int, default=0

Sheet name, or sheet position, to read when file_type is 'excel'

delimiter: str, default=','

Field delimiter to use when file_type is 'delimited'
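The S3 case amounts to downloading the object to destination_path and then reading it like a local file. The sketch below assumes the download uses boto3's S3 client method download_file(Bucket, Key, Filename) and that destination_path is the full path of the local file rather than a directory; make_s3_client and download_s3_object are hypothetical helpers. The client is passed in so the download logic can be exercised without AWS credentials. After the download, the file would presumably be parsed according to file_type, sheet_name and delimiter, as for a local file.

```python
import os
from typing import Any


def make_s3_client(aws_access_key_id: str, aws_secret_access_key: str) -> Any:
    """Create a boto3 S3 client from explicit credentials (requires boto3)."""
    import boto3  # imported lazily so the rest of the sketch runs without boto3

    return boto3.client(
        "s3",
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
    )


def download_s3_object(
    s3_client: Any, s3_bucket_name: str, s3_object_name: str, destination_path: str
) -> str:
    """Hypothetical helper: copy one S3 object to a local file and return its path."""
    parent = os.path.dirname(destination_path)
    if parent:
        # Make sure the target directory exists before downloading
        os.makedirs(parent, exist_ok=True)
    # Positional order matches boto3's download_file(Bucket, Key, Filename)
    s3_client.download_file(s3_bucket_name, s3_object_name, destination_path)
    return destination_path
```

A real call might look like download_s3_object(make_s3_client(key_id, secret), "my-bucket", "data.csv", "/tmp/data.csv"), where the bucket, key and path are purely illustrative.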

Examples

ds_aws