Dataset Extraction
File System
- get_data_from_filesystem(file_path, file_type='delimited', sheet_name=0, delimiter=',')
Extract data from the local file system
- file_path: str
File system path of the data file to be extracted
- file_type: {'delimited', 'excel'}, default='delimited'
Type of the data file
- sheet_name: str, default=0
Sheet name if file_type is excel
- delimiter: str, default=','
File delimiter to use if file_type is delimited
Examples
In [ ]:
pip install bluemist

In [ ]:
from bluemist.environment import initialize
from bluemist.datasource import get_data_from_filesystem

In [ ]:
initialize()
data = get_data_from_filesystem('/path/to/file/filename.csv')
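
When file paths arrive with mixed extensions, the file_type, sheet_name and delimiter arguments documented above can be derived from the extension before the call is made. The following is a minimal sketch of that idea. The helper name filesystem_kwargs and the extension mapping are assumptions made for illustration and are not part of bluemist.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the keyword arguments
# documented above; extend it to match the files you actually handle.
_EXTENSION_KWARGS = {
    ".csv": {"file_type": "delimited", "delimiter": ","},
    ".tsv": {"file_type": "delimited", "delimiter": "\t"},
    ".xlsx": {"file_type": "excel", "sheet_name": 0},
    ".xls": {"file_type": "excel", "sheet_name": 0},
}


def filesystem_kwargs(file_path):
    """Return keyword arguments for get_data_from_filesystem (hypothetical helper)."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in _EXTENSION_KWARGS:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    # Build a fresh dict so callers can modify it without touching the mapping.
    return {"file_path": str(file_path), **_EXTENSION_KWARGS[suffix]}


if __name__ == "__main__":
    kwargs = filesystem_kwargs("/path/to/file/filename.tsv")
    print(kwargs)
    # With bluemist installed and initialized, the call would then be:
    # data = get_data_from_filesystem(**kwargs)
```

Keeping the mapping in one place means an unsupported extension fails early with a clear message instead of being passed on with the default comma delimiter.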
Database
- get_data_from_database(db_type=None, host=None, database=None, service=None, oracle_instant_client_path=None, username=None, password=None, query=None, chunk_size=1000)
Extract data from a database. Supported databases are MariaDB, Microsoft SQL, MySQL, Oracle, AWS Aurora (MySQL and PostgreSQL) and PostgreSQL
- db_type: {'mariadb', 'mssql', 'mysql', 'aurora-mysql', 'oracle', 'postgres', 'aurora-postgres'}
Database type
- host: str
Database host
- database: str
Database name
- service: str
Oracle service name. Used if db_type is oracle
- oracle_instant_client_path: str
Filesystem path of the Oracle Instant Client. Used if db_type is oracle
- username: str
Database user
- password: str
Database password
- query: str
Database query to be used to extract data
- chunk_size: int or None, default=1000
Number of rows to return in each batch. Pass None to disable batch mode
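
To make the effect of chunk_size concrete, the sketch below shows batch retrieval in general terms using Python's built-in sqlite3 module. It is not bluemist's implementation, and how bluemist consumes batches internally is not shown here. The function name fetch_in_batches and the in-memory table are assumptions made for illustration.

```python
import sqlite3


def fetch_in_batches(connection, query, chunk_size=1000):
    """Yield lists of rows; chunk_size=None returns everything in one batch."""
    cursor = connection.execute(query)
    if chunk_size is None:
        # Batch mode disabled: a single fetch for the whole result set.
        yield cursor.fetchall()
        return
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE auto_mpg (id INTEGER, mpg REAL)")
    conn.executemany(
        "INSERT INTO auto_mpg VALUES (?, ?)",
        [(i, 20.0 + i) for i in range(250)],
    )
    query = "SELECT * FROM auto_mpg"
    sizes = [len(batch) for batch in fetch_in_batches(conn, query, 100)]
    print(sizes)  # prints [100, 100, 50]
```

Smaller batches keep fewer rows in memory at once at the cost of more round trips, which is the usual trade-off when choosing a value for chunk_size.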
Examples
In [ ]:
pip install bluemist

In [ ]:
from bluemist.environment import initialize
from bluemist.datasource.database import get_data_from_database

In [ ]:
initialize()

## Get data from MySQL
query = 'SELECT * FROM auto_mpg'
data = get_data_from_database(db_type='mysql', host='dbHost:3306', username='username',
                              password='password', database='dbName', query=query,
                              chunk_size=100)

## Get data from Postgres
query = 'SELECT * FROM public.auto_mpg'
data = get_data_from_database(db_type='postgres', host='dbHost:5432', username='username',
                              password='password', database='dbName', query=query,
                              chunk_size=100)

## Get data from Microsoft SQL
query = 'SELECT * FROM dbo.auto_mpg'
data = get_data_from_database(db_type='mssql', host='dbHost:1433', username='username',
                              password='password', database='dbName', query=query,
                              chunk_size=100)

## Get data from Oracle
query = 'SELECT * FROM AUTO_MPG'
data = get_data_from_database(db_type='oracle', host='dbHost', username='username',
                              password='password', service='serviceName',
                              oracle_instant_client_path='/path/to/instantclient/instantclient',
                              query=query, chunk_size=100)

## Get data from MariaDB
query = 'SELECT * FROM auto_mpg'
data = get_data_from_database(db_type='mariadb', host='dbHost:3306', username='username',
                              password='password', database='dbName', query=query,
                              chunk_size=100)

## Get data from AWS Aurora MySQL
query = 'SELECT * FROM AUTO_MPG'
data = get_data_from_database(db_type='aurora-mysql', host='dbHost:3306', username='username',
                              password='password', database='dbName', query=query,
                              chunk_size=100)

## Get data from AWS Aurora Postgres
query = 'SELECT * FROM public.auto_mpg'
data = get_data_from_database(db_type='aurora-postgres', host='dbHost:5432', username='username',
                              password='password', database='dbName', query=query,
                              chunk_size=100)
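
The examples above pass service and oracle_instant_client_path only for Oracle and database for every other db_type. A small wrapper can check that pairing before any connection is attempted. The sketch below assumes, based only on those examples, that Oracle needs both Oracle-specific arguments and that the other types need database. The helper name build_database_kwargs is hypothetical and not part of bluemist.

```python
# db_type values as they appear in the examples above.
_SUPPORTED_DB_TYPES = {
    "mariadb", "mssql", "mysql", "aurora-mysql",
    "oracle", "postgres", "aurora-postgres",
}


def build_database_kwargs(db_type, host, username, password, query,
                          database=None, service=None,
                          oracle_instant_client_path=None, chunk_size=1000):
    """Validate and assemble keyword arguments (hypothetical helper)."""
    if db_type not in _SUPPORTED_DB_TYPES:
        raise ValueError(f"Unsupported db_type: {db_type!r}")
    kwargs = {
        "db_type": db_type, "host": host, "username": username,
        "password": password, "query": query, "chunk_size": chunk_size,
    }
    if db_type == "oracle":
        # Assumption: both Oracle-specific arguments are needed together.
        if not service or not oracle_instant_client_path:
            raise ValueError("oracle requires service and oracle_instant_client_path")
        kwargs["service"] = service
        kwargs["oracle_instant_client_path"] = oracle_instant_client_path
    else:
        # Assumption: every non-Oracle type needs a database name.
        if not database:
            raise ValueError(f"{db_type} requires database")
        kwargs["database"] = database
    return kwargs


if __name__ == "__main__":
    kwargs = build_database_kwargs(
        "postgres", "dbHost:5432", "username", "password",
        "SELECT * FROM public.auto_mpg", database="dbName", chunk_size=100,
    )
    print(sorted(kwargs))
    # With bluemist installed and initialized, the call would then be:
    # data = get_data_from_database(**kwargs)
```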
Amazon cloud (AWS)
- get_data_from_s3(aws_access_key_id, aws_secret_access_key, s3_bucket_name, s3_object_name, destination_path, file_type='delimited', sheet_name=0, delimiter=',')
Extract data from an Amazon S3 bucket (AWS)
- aws_access_key_id: str
The access key to use when creating the S3 client
- aws_secret_access_key: str
The secret key to use when creating the S3 client
- s3_bucket_name: str
Name of the S3 bucket from which the data file is pulled
- s3_object_name: str
Name of the data file
- destination_path: str
File system path where the file downloaded from S3 will be saved
- file_type: {'delimited', 'excel'}, default='delimited'
Type of the data file
- sheet_name: str, default=0
Sheet name if the dataset is an Excel file
- delimiter: str, default=','
File delimiter to use for delimited files
Examples
In [ ]:
pip install bluemist

In [ ]:
from bluemist.environment import initialize
from bluemist.datasource.aws import get_data_from_s3

In [ ]:
initialize()
data = get_data_from_s3('<aws_access_key_id>', '<aws_secret_access_key>', '<s3_bucket_name>',
                        '<s3_object_name>', '<local_download_path>')
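
Hard-coding AWS keys in a notebook is easy to leak, so one option is to read them from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, the names used by the AWS CLI and SDKs. The sketch below assembles the five positional arguments in the order given by the signature above. The helper name s3_args_from_env and its environ parameter are assumptions made for illustration and are not part of bluemist.

```python
import os

_REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def s3_args_from_env(s3_bucket_name, s3_object_name, destination_path, environ=None):
    """Build positional arguments for get_data_from_s3 (hypothetical helper)."""
    # environ can be overridden with a plain dict, which helps in tests.
    env = os.environ if environ is None else environ
    missing = [name for name in _REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return (
        env["AWS_ACCESS_KEY_ID"],
        env["AWS_SECRET_ACCESS_KEY"],
        s3_bucket_name,
        s3_object_name,
        destination_path,
    )


if __name__ == "__main__":
    # Placeholder values only; real keys would come from the actual environment.
    fake_env = {"AWS_ACCESS_KEY_ID": "<key_id>", "AWS_SECRET_ACCESS_KEY": "<secret>"}
    args = s3_args_from_env("<s3_bucket_name>", "<s3_object_name>",
                            "<local_download_path>", environ=fake_env)
    print(len(args))
    # With bluemist installed and initialized, the call would then be:
    # data = get_data_from_s3(*args)
```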