Machine Learning

Ploomber has many features specifically tailored to accelerate Machine Learning workflows.

graph LR
    la[Load dataset A] --> ca[Clean] --> fa[Features] --> merge[Merge]
    lb[Load dataset B] --> cb[Clean] --> fb[Features] --> merge
    merge --> train1[NN] --> eval[Evaluate]
    merge --> train2[Random Forest] --> eval
    merge --> train3[SVM] --> eval

Data cleaning and feature engineering

Data cleaning and feature engineering are highly iterative processes. Ploomber accelerates them via incremental builds, which let you introduce changes to your pipeline and bring results up to date without re-computing everything from scratch.
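Conceptually, an incremental build skips a task whose source code has not changed since the last run. The sketch below illustrates the idea with the standard library only; it is not Ploomber's actual implementation, and the function and file names are illustrative:

```python
import hashlib
from pathlib import Path

def fingerprint(source: str) -> str:
    """Hash a task's source code so changes can be detected."""
    return hashlib.sha256(source.encode()).hexdigest()

def run_task(name: str, source: str, task, cache_dir: Path) -> str:
    """Run `task` only if its source changed since the last run."""
    cache_dir.mkdir(exist_ok=True)
    stamp = cache_dir / f"{name}.fingerprint"
    current = fingerprint(source)

    # skip the task if the stored fingerprint matches the current one
    if stamp.exists() and stamp.read_text() == current:
        return "skipped"

    task()  # recompute only when needed
    stamp.write_text(current)
    return "executed"
```

On the first call a task executes and its fingerprint is stored; an identical second call is skipped. Ploomber applies the same principle across a whole dependency graph, also considering upstream products.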

Experiment tracking

Ploomber also plays nicely with experiment trackers, allowing you to train hundreds of models and track the results.

Example: Integration with MLflow

Instructions

pip install ploomber
ploomber examples -n templates/mlflow -o ploomber-mlflow
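The gist of experiment tracking is recording each run's parameters and metrics so models can be compared later. A stdlib-only sketch of the idea follows; the `Tracker` class is illustrative and is not the MLflow API (the template above shows the real integration):

```python
import json
from pathlib import Path

class Tracker:
    """Toy experiment tracker: one JSON file per run (illustrative only)."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(exist_ok=True)

    def log_run(self, run_id: str, params: dict, metrics: dict) -> None:
        """Persist one experiment's parameters and metrics."""
        record = {"run_id": run_id, "params": params, "metrics": metrics}
        (self.root / f"{run_id}.json").write_text(json.dumps(record))

    def best_run(self, metric: str) -> dict:
        """Return the run with the highest value of `metric`."""
        runs = [json.loads(p.read_text()) for p in self.root.glob("*.json")]
        return max(runs, key=lambda r: r["metrics"][metric])
```

With hundreds of runs logged this way, model selection reduces to a query over the stored records, which is what a tracker like MLflow provides at scale.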

Parallel experiments

To help you find the best-performing model, Ploomber allows you to parallelize Machine Learning experiments.

Example: Running a grid of experiments in parallel
pip install ploomber
ploomber examples -n cookbook/grid -o grid
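Under the hood, a grid of experiments expands a set of parameter lists into the cross-product of all combinations, and since the combinations are independent, they can run in parallel. A minimal stdlib sketch of that pattern (the `train` function is a stand-in for a real training task):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def train(params: dict) -> dict:
    """Stand-in for a training task; returns the params and a dummy score."""
    score = params["n_estimators"] / (1 + params["max_depth"])
    return {**params, "score": score}

def run_grid(grid: dict, max_workers: int = 4) -> list:
    """Expand the grid into all combinations and train them in parallel."""
    keys = list(grid)
    combos = [dict(zip(keys, values)) for values in product(*grid.values())]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(train, combos))
```

For example, `run_grid({"n_estimators": [50, 100], "max_depth": [3, 5]})` trains four combinations concurrently. The cookbook example above expresses the same grid declaratively in `pipeline.yaml`.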
Example: Model selection with nested cross-validation
pip install ploomber
ploomber examples -n cookbook/nested-cv -o nested-cv
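Nested cross-validation separates model selection from performance estimation: an inner loop picks hyperparameters on each outer training split, and the outer loop scores the selected candidate on held-out data. A stdlib-only sketch on toy data (fold counts, candidates, and the `fit_score` callback are illustrative):

```python
def k_folds(n: int, k: int):
    """Yield (train_idx, test_idx) index lists for k folds over n samples."""
    fold = n // k
    for i in range(k):
        test = list(range(i * fold, (i + 1) * fold))
        train = [j for j in range(n) if j not in test]
        yield train, test

def nested_cv(X, y, candidates, fit_score, outer_k=3, inner_k=2):
    """Return one generalization score per outer fold.

    `fit_score(params, X, y, train_idx, test_idx)` must fit on the train
    indices and return a score on the test indices.
    """
    scores = []
    for outer_train, outer_test in k_folds(len(X), outer_k):
        # inner loop: mean inner-fold score for a candidate, using
        # only the outer training data
        def inner_score(params):
            inner = []
            for tr, te in k_folds(len(outer_train), inner_k):
                tr_idx = [outer_train[i] for i in tr]
                te_idx = [outer_train[i] for i in te]
                inner.append(fit_score(params, X, y, tr_idx, te_idx))
            return sum(inner) / len(inner)

        best = max(candidates, key=inner_score)
        # outer loop: score the selected candidate on held-out data
        scores.append(fit_score(best, X, y, outer_train, outer_test))
    return scores
```

Each outer fold may select a different candidate; the spread of the returned scores estimates how well the whole selection procedure generalizes. The cookbook example runs the same scheme with real estimators.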

Large-scale model training

If one machine isn’t enough, you can parallelize training jobs in a cluster by exporting your pipeline to any of our supported platforms (Kubernetes, Airflow, and AWS Batch).
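Exporting is handled by soopervisor, Ploomber's companion package. The commands below are a sketch; the environment name `training` is arbitrary, and each backend (`argo-workflows` for Kubernetes, `airflow`, `aws-batch`) has its own configuration steps described in the soopervisor documentation:

```shell
# install the exporter
pip install soopervisor

# register a target environment for the pipeline
# (here: Argo Workflows, which runs on Kubernetes)
soopervisor add training --backend argo-workflows

# generate the artifacts to run the pipeline on the cluster
soopervisor export training
```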

Deployment

Once you find the best-performing model, you can deploy it for batch processing or as an online API.
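For batch processing, deployment often amounts to loading a serialized model and scoring a file of new observations on a schedule. A stdlib-only sketch of that pattern (the `ThresholdModel` class stands in for a real trained estimator, which you would typically serialize with `pickle` or `joblib`):

```python
import pickle
from pathlib import Path

class ThresholdModel:
    """Toy model standing in for a trained estimator (illustrative)."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def predict(self, rows):
        """Classify each value as 1 if it exceeds the threshold."""
        return [int(x > self.threshold) for x in rows]

def batch_score(model_path: Path, rows) -> list:
    """Load a serialized model and score a batch of new observations."""
    model = pickle.loads(model_path.read_bytes())
    return model.predict(rows)
```

An online API wraps the same load-and-predict logic behind an HTTP endpoint instead of a scheduled job.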