Experiment Tracking
Bluemist integrates with MLflow to track experiments and evaluate machine learning models. To enable tracking, set the experiment_name and run_name parameters when training the model with train_test_evaluate, as shown in the example below.
Note
It is important to execute !mlflow ui only after train_test_evaluate has completed successfully.
Examples
In [1]:
pip install bluemist-ai
In [2]:
from sklearn import datasets
from bluemist.environment import initialize
from bluemist.preprocessing import preprocess_data
from bluemist.regression import train_test_evaluate
In [3]:
initialize()
data = datasets.load_diabetes(as_frame=True)
██████╗ ██╗ ██╗ ██╗███████╗███╗ ███╗██╗███████╗████████╗ █████╗ ██╗
██╔══██╗██║ ██║ ██║██╔════╝████╗ ████║██║██╔════╝╚══██╔══╝ ██╔══██╗██║
██████╔╝██║ ██║ ██║█████╗ ██╔████╔██║██║███████╗ ██║ ███████║██║
██╔══██╗██║ ██║ ██║██╔══╝ ██║╚██╔╝██║██║╚════██║ ██║ ██╔══██║██║
██████╔╝███████╗╚██████╔╝███████╗██║ ╚═╝ ██║██║███████║ ██║ ██║ ██║██║
(version 0.1.1)
Bluemist path :: /home/shashank-agrawal/PycharmProjects/bluemist-ai/bluemist
System platform :: posix, Linux, 5.19.0-31-generic, linux-x86_64, ('64bit', 'ELF')
In [4]:
# Categorical encoding using OneHotEncoder
X_train, X_test, y_train, y_test = preprocess_data(data.frame,
                                                   target_variable='target',
                                                   test_size=0.25,
                                                   categorical_features=['sex'],
                                                   categorical_encoder='OneHotEncoder')
In [5]:
# Train and compare models
train_test_evaluate(X_train, X_test, y_train, y_test,
                    experiment_name='regression_demo', run_name='run1')
Training TweedieRegressor: 100%|██████████| 46/46 [05:20<00:00, 6.98s/it]
| Estimator | mean_absolute_error | mean_squared_error | r2_score |
|---|---|---|---|
| ARDRegression | 44.519864 | 3008.079745 | 0.504548 |
| AdaBoostRegressor | 48.546408 | 3497.081448 | 0.424005 |
| BaggingRegressor | 50.866667 | 3987.891171 | 0.343166 |
| BayesianRidge | 44.602808 | 3033.835815 | 0.500305 |
| CCA | 44.280999 | 3030.354646 | 0.500879 |
| DecisionTreeRegressor | 65.162162 | 6719.414414 | -0.106736 |
| DummyRegressor | 67.830462 | 6076.558849 | -0.000853 |
| ElasticNet | 46.933887 | 3278.863163 | 0.459948 |
| ElasticNetCV | 44.684214 | 3039.259258 | 0.499412 |
| ExtraTreeRegressor | 65.450450 | 6392.945946 | -0.052964 |
| ExtraTreesRegressor | 47.227477 | 3538.285024 | 0.417219 |
| GammaRegressor | 48.502967 | 3524.887576 | 0.419426 |
| GaussianProcessRegressor | 77.382411 | 10123.496561 | -0.667413 |
| GradientBoostingRegressor | 48.263828 | 3719.519272 | 0.387368 |
| HistGradientBoostingRegressor | 49.630362 | 3937.539944 | 0.351459 |
| HuberRegressor | 44.020699 | 3015.074551 | 0.503395 |
| KNeighborsRegressor | 50.468468 | 4245.383784 | 0.300755 |
| KernelRidge | 44.206911 | 3013.256654 | 0.503695 |
| Lars | 67.869366 | 8318.907271 | -0.370184 |
| LarsCV | 46.646006 | 3186.593443 | 0.475145 |
| Lasso | 44.739299 | 3026.476677 | 0.501517 |
| LassoCV | 44.563633 | 3017.892921 | 0.502931 |
| LassoLars | 44.739338 | 3026.483536 | 0.501516 |
| LassoLarsCV | 44.133249 | 3005.256664 | 0.505013 |
| LassoLarsIC | 44.174273 | 3004.538917 | 0.505131 |
| LinearRegression | 44.133249 | 3005.256664 | 0.505013 |
| LinearSVR | 50.776148 | 3855.542495 | 0.364964 |
| MLPRegressor | 83.214515 | 10343.340753 | -0.703623 |
| NuSVR | 61.968426 | 5148.979722 | 0.151926 |
| OrthogonalMatchingPursuit | 52.514640 | 3998.460337 | 0.341425 |
| OrthogonalMatchingPursuitCV | 45.502552 | 3140.428922 | 0.482749 |
| PLSCanonical | 96.890416 | 14032.727762 | -1.311292 |
| PLSRegression | 44.498711 | 3022.041707 | 0.502248 |
| PassiveAggressiveRegressor | 45.565626 | 3203.906238 | 0.472293 |
| PoissonRegressor | 43.527473 | 2984.389272 | 0.508450 |
| QuantileRegressor | 67.333333 | 6262.864873 | -0.031539 |
| RANSACRegressor | 48.635928 | 3541.139407 | 0.416749 |
| RadiusNeighborsRegressor | 59.681818 | 4624.522727 | -0.883304 |
| RandomForestRegressor | 48.419730 | 3824.330700 | 0.370105 |
| Ridge | 44.151308 | 3005.602835 | 0.504956 |
| RidgeCV | 44.495625 | 3026.059565 | 0.501586 |
| SGDRegressor | 44.349688 | 3012.482469 | 0.503822 |
| SVR | 60.675204 | 5156.658837 | 0.150661 |
| TheilSenRegressor | 44.324085 | 3055.778362 | 0.496691 |
| TransformedTargetRegressor | 44.133249 | 3005.256664 | 0.505013 |
| TweedieRegressor | 48.712896 | 3476.216433 | 0.427442 |
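The comparison table is easier to read once it is ranked by a single metric. The following is a minimal sketch in plain Python that sorts a handful of rows copied from the table above by r2_score. The helper name rank_by_r2 is hypothetical and not part of Bluemist, and ranking by r2_score rather than by one of the error metrics is only one possible choice.

```python
# A few rows copied from the comparison table above:
# (estimator, mean_absolute_error, mean_squared_error, r2_score)
results = [
    ("ARDRegression", 44.519864, 3008.079745, 0.504548),
    ("DummyRegressor", 67.830462, 6076.558849, -0.000853),
    ("LinearRegression", 44.133249, 3005.256664, 0.505013),
    ("PoissonRegressor", 43.527473, 2984.389272, 0.508450),
    ("RandomForestRegressor", 48.419730, 3824.330700, 0.370105),
]


def rank_by_r2(rows):
    """Return the rows sorted from highest to lowest r2_score."""
    return sorted(rows, key=lambda row: row[3], reverse=True)


ranked = rank_by_r2(results)
for name, mae, mse, r2 in ranked:
    print(f"{name:<24} MAE={mae:8.3f}  MSE={mse:10.3f}  R2={r2:7.4f}")

# Name of the estimator with the highest r2_score in this subset
best_estimator = ranked[0][0]
```

In this subset the DummyRegressor row acts as a baseline: any estimator that does not clearly beat it on r2_score is unlikely to be worth tuning further.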
In [ ]:
!mlflow ui
[2023-02-22 01:35:13 -0800] [142987] [INFO] Starting gunicorn 20.1.0
[2023-02-22 01:35:13 -0800] [142987] [INFO] Listening at: http://127.0.0.1:5000 (142987)
[2023-02-22 01:35:13 -0800] [142987] [INFO] Using worker: sync
[2023-02-22 01:35:13 -0800] [142990] [INFO] Booting worker with pid: 142990
[2023-02-22 01:35:13 -0800] [142991] [INFO] Booting worker with pid: 142991
[2023-02-22 01:35:14 -0800] [142992] [INFO] Booting worker with pid: 142992
[2023-02-22 01:35:14 -0800] [142993] [INFO] Booting worker with pid: 142993
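Running !mlflow ui keeps the notebook cell busy for as long as the server is up. One possible alternative is to start the server as a background process from Python. This is a minimal sketch, assuming the mlflow command is on the PATH and that the notebook's working directory is the one holding the tracking data. The helper names build_ui_command and start_ui_in_background are hypothetical and not part of Bluemist or MLflow.

```python
import subprocess


def build_ui_command(host="127.0.0.1", port=5000):
    """Build the argument list for the MLflow UI command."""
    return ["mlflow", "ui", "--host", host, "--port", str(port)]


def start_ui_in_background(host="127.0.0.1", port=5000):
    """Start the MLflow UI without blocking and return the process handle."""
    return subprocess.Popen(build_ui_command(host, port))


# Usage (left commented out so that nothing is launched on import):
# ui = start_ui_in_background()
# ... browse the UI, then stop the server when done:
# ui.terminate()
```

Keeping the process handle makes it possible to stop the server from a later cell instead of interrupting the kernel.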
To view the tracked experiments, open a browser and navigate to http://127.0.0.1:5000
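Besides the browser UI, the recorded runs can also be read back programmatically through the MLflow Python API. The sketch below assumes that Bluemist writes its runs to the default local MLflow store and that the experiment is registered under the name passed as experiment_name. The helpers load_runs and metric_columns are hypothetical names introduced for illustration. Which metrics Bluemist logs is not stated here, so the code discovers them from the "metrics." column prefix that mlflow.search_runs uses.

```python
def metric_columns(columns):
    """Pick the metric columns out of an MLflow runs table.

    mlflow.search_runs() names metric columns "metrics.<name>".
    """
    return [c for c in columns if c.startswith("metrics.")]


def load_runs(experiment_name):
    """Return the runs of one experiment as a pandas DataFrame."""
    import mlflow  # imported lazily so metric_columns works without MLflow

    experiment = mlflow.get_experiment_by_name(experiment_name)
    if experiment is None:
        raise ValueError(f"No MLflow experiment named {experiment_name!r}")
    return mlflow.search_runs(experiment_ids=[experiment.experiment_id])


# Usage, after train_test_evaluate has finished:
# runs = load_runs("regression_demo")
# print(runs[["tags.mlflow.runName"] + metric_columns(runs.columns)])
```

Reading the runs into a DataFrame makes it straightforward to compare several run_name values from the same experiment side by side.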