Ploomber has many features specifically tailored to accelerate Machine Learning workflows.
Data cleaning and feature engineering
Data cleaning and feature engineering are highly iterative processes. Ploomber accelerates them via incremental builds, which let you change your pipeline and bring results up to date without recomputing everything from scratch.
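To illustrate the idea behind incremental builds, here is a minimal conceptual sketch in plain Python. It is not Ploomber's implementation or API: the `build` function, the `tasks` mapping, and the `cache` dictionary are hypothetical names introduced for illustration, and the sketch assumes tasks are listed in dependency order. A task is re-run only if its source changed since the last build or if one of its upstream tasks was re-run.

```python
import hashlib


def build(tasks, cache):
    """Return the names of tasks that must run in this build.

    tasks: dict mapping task name -> (source_code, list_of_upstream_names),
           assumed to be listed in dependency order (hypothetical structure).
    cache: dict mapping task name -> hash of the source used in the last build.
    """
    executed = []
    for name, (source, upstream) in tasks.items():
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        source_changed = cache.get(name) != digest
        upstream_ran = any(dep in executed for dep in upstream)
        if source_changed or upstream_ran:
            # A real pipeline would execute the task here; this sketch
            # only records that it is outdated and updates the cache.
            executed.append(name)
            cache[name] = digest
    return executed


tasks = {
    "load": ("df = read()", []),
    "clean": ("df = df.dropna()", ["load"]),
    "features": ("df['x2'] = df.x ** 2", ["clean"]),
}
cache = {}

first_build = build(tasks, cache)    # nothing cached yet, so every task runs
second_build = build(tasks, cache)   # no changes, so nothing runs

# Editing the cleaning step invalidates it and everything downstream
tasks["clean"] = ("df = df.fillna(0)", ["load"])
third_build = build(tasks, cache)
```

In this sketch, editing only the cleaning step leaves `load` untouched and re-runs `clean` and `features`, which is the behavior that makes iterating on cleaning and feature code cheap.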
Ploomber also plays nicely with experiment trackers, allowing you to train hundreds of models and track the results.
Example: Integration with MLflow
pip install ploomber
ploomber examples -n templates/mlflow -o ploomber-mlflow
To help you find the best performing model, Ploomber allows you to parallelize Machine Learning experiments.
Example: Running a grid of experiments in parallel
pip install ploomber
ploomber examples -n cookbook/grid -o grid
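As a rough picture of what running a grid of experiments in parallel involves, the following sketch expands a parameter grid into individual experiments and runs them concurrently using only the Python standard library. It does not use Ploomber's API and is not how the cookbook example is written: `run_experiment`, `run_grid`, and the toy scoring formula are hypothetical stand-ins for a real training routine.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_experiment(params):
    """Hypothetical stand-in for model training: returns (params, score).

    The score is a toy formula that peaks at max_depth=4, learning_rate=0.1.
    """
    score = -((params["max_depth"] - 4) ** 2) - (params["learning_rate"] - 0.1) ** 2
    return params, score


def run_grid(grid, max_workers=4):
    """Expand the grid into one dict per combination and run them concurrently."""
    keys = sorted(grid)
    combinations = [
        dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))
    ]
    # Threads keep the sketch simple; CPU-bound training would typically
    # use separate processes (or separate machines) instead.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_experiment, combinations))


grid = {"max_depth": [2, 4, 8], "learning_rate": [0.01, 0.1]}
results = run_grid(grid)
best_params, best_score = max(results, key=lambda item: item[1])
```

Each combination is independent of the others, which is why a grid of experiments parallelizes well: the only shared step is collecting the scores at the end to pick the best configuration.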
Example: Model selection with nested cross-validation
pip install ploomber
ploomber examples -n cookbook/nested-cv -o nested-cv
Large-scale model training
If one machine isn’t enough, you can parallelize training jobs in a cluster by exporting your pipeline to any of our supported platforms (Kubernetes, Airflow, and AWS Batch).