Experiment Tracking

Bluemist integrates with MLflow to track, compare and evaluate machine learning experiments. To enable tracking, set the experiment_name and run_name parameters when training models with train_test_evaluate (see the train_test_evaluate documentation).

Note

It is important to execute !mlflow ui only after train_test_evaluate has completed successfully.

Examples

regression_simple

In [1]:

pip install bluemist-ai

In [2]:

from sklearn import datasets

from bluemist.environment import initialize
from bluemist.preprocessing import preprocess_data
from bluemist.regression import train_test_evaluate

In [3]:

initialize()
data = datasets.load_diabetes(as_frame=True)


██████╗ ██╗     ██╗   ██╗███████╗███╗   ███╗██╗███████╗████████╗     █████╗ ██╗
██╔══██╗██║     ██║   ██║██╔════╝████╗ ████║██║██╔════╝╚══██╔══╝    ██╔══██╗██║
██████╔╝██║     ██║   ██║█████╗  ██╔████╔██║██║███████╗   ██║       ███████║██║
██╔══██╗██║     ██║   ██║██╔══╝  ██║╚██╔╝██║██║╚════██║   ██║       ██╔══██║██║
██████╔╝███████╗╚██████╔╝███████╗██║ ╚═╝ ██║██║███████║   ██║       ██║  ██║██║                                                                        
                                (version 0.1.1)
    
Bluemist path :: /home/shashank-agrawal/PycharmProjects/bluemist-ai/bluemist
System platform :: posix, Linux, 5.19.0-31-generic, linux-x86_64, ('64bit', 'ELF')

In [4]:

# Categorical encoding using OneHotEncoder
X_train, X_test, y_train, y_test = preprocess_data(data.frame, 
                                                   target_variable='target', 
                                                   test_size=0.25, 
                                                   categorical_features=['sex'], 
                                                   categorical_encoder='OneHotEncoder')

In [5]:

# Train and compare models
train_test_evaluate(X_train, X_test, y_train, y_test, 
                    experiment_name='regression_demo', run_name='run1')

Training TweedieRegressor: 100%|██████████| 46/46 [05:20<00:00,  6.98s/it]

Estimator	mean_absolute_error	mean_squared_error	r2_score
ARDRegression	44.519864	3008.079745	0.504548
AdaBoostRegressor	48.546408	3497.081448	0.424005
BaggingRegressor	50.866667	3987.891171	0.343166
BayesianRidge	44.602808	3033.835815	0.500305
CCA	44.280999	3030.354646	0.500879
DecisionTreeRegressor	65.162162	6719.414414	-0.106736
DummyRegressor	67.830462	6076.558849	-0.000853
ElasticNet	46.933887	3278.863163	0.459948
ElasticNetCV	44.684214	3039.259258	0.499412
ExtraTreeRegressor	65.450450	6392.945946	-0.052964
ExtraTreesRegressor	47.227477	3538.285024	0.417219
GammaRegressor	48.502967	3524.887576	0.419426
GaussianProcessRegressor	77.382411	10123.496561	-0.667413
GradientBoostingRegressor	48.263828	3719.519272	0.387368
HistGradientBoostingRegressor	49.630362	3937.539944	0.351459
HuberRegressor	44.020699	3015.074551	0.503395
KNeighborsRegressor	50.468468	4245.383784	0.300755
KernelRidge	44.206911	3013.256654	0.503695
Lars	67.869366	8318.907271	-0.370184
LarsCV	46.646006	3186.593443	0.475145
Lasso	44.739299	3026.476677	0.501517
LassoCV	44.563633	3017.892921	0.502931
LassoLars	44.739338	3026.483536	0.501516
LassoLarsCV	44.133249	3005.256664	0.505013
LassoLarsIC	44.174273	3004.538917	0.505131
LinearRegression	44.133249	3005.256664	0.505013
LinearSVR	50.776148	3855.542495	0.364964
MLPRegressor	83.214515	10343.340753	-0.703623
NuSVR	61.968426	5148.979722	0.151926
OrthogonalMatchingPursuit	52.514640	3998.460337	0.341425
OrthogonalMatchingPursuitCV	45.502552	3140.428922	0.482749
PLSCanonical	96.890416	14032.727762	-1.311292
PLSRegression	44.498711	3022.041707	0.502248
PassiveAggressiveRegressor	45.565626	3203.906238	0.472293
PoissonRegressor	43.527473	2984.389272	0.508450
QuantileRegressor	67.333333	6262.864873	-0.031539
RANSACRegressor	48.635928	3541.139407	0.416749
RadiusNeighborsRegressor	59.681818	4624.522727	-0.883304
RandomForestRegressor	48.419730	3824.330700	0.370105
Ridge	44.151308	3005.602835	0.504956
RidgeCV	44.495625	3026.059565	0.501586
SGDRegressor	44.349688	3012.482469	0.503822
SVR	60.675204	5156.658837	0.150661
TheilSenRegressor	44.324085	3055.778362	0.496691
TransformedTargetRegressor	44.133249	3005.256664	0.505013
TweedieRegressor	48.712896	3476.216433	0.427442
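The three columns above are named after standard regression metrics. As a reading aid, here is a minimal pure-Python sketch of their usual definitions, assuming the columns follow the scikit-learn functions of the same names. The function name regression_metrics and the sample values are illustrative and are not part of Bluemist.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for two equal-length sequences of numbers."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    deviations = [t - mean_true for t in y_true]

    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean squared error
    ss_res = sum(r * r for r in residuals)     # residual sum of squares
    ss_tot = sum(d * d for d in deviations)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                 # coefficient of determination
    return mae, mse, r2


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
mae, mse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.3f} MSE={mse:.3f} R2={r2:.3f}")  # MAE=0.500 MSE=0.375 R2=0.949

# Always predicting the mean of y_true gives R^2 = 0.
mean_only = [sum(y_true) / len(y_true)] * len(y_true)
print(regression_metrics(y_true, mean_only)[2])  # 0.0
```

Under these definitions, lower is better for the two error columns and higher is better for r2_score. A negative r2_score, as in several rows above, means the model's squared error on the test set is larger than that of a constant prediction equal to the test-set mean.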

In [ ]:

!mlflow ui

[2023-02-22 01:35:13 -0800] [142987] [INFO] Starting gunicorn 20.1.0
[2023-02-22 01:35:13 -0800] [142987] [INFO] Listening at: http://127.0.0.1:5000 (142987)
[2023-02-22 01:35:13 -0800] [142987] [INFO] Using worker: sync
[2023-02-22 01:35:13 -0800] [142990] [INFO] Booting worker with pid: 142990
[2023-02-22 01:35:13 -0800] [142991] [INFO] Booting worker with pid: 142991
[2023-02-22 01:35:14 -0800] [142992] [INFO] Booting worker with pid: 142992
[2023-02-22 01:35:14 -0800] [142993] [INFO] Booting worker with pid: 142993

To view the tracked experiments, open http://127.0.0.1:5000 in a browser while the MLflow UI server is running.