Batch processing¶
You can export Ploomber pipelines to production schedulers for batch processing. Check out our package Soopervisor, which allows you to export to Kubernetes (via Argo Workflows), AWS Batch, Airflow, and SLURM.
Composing batch pipelines¶
To compose a batch pipeline, use the import_tasks_from directive in your pipeline.yaml file.
For example, define your feature generation tasks in a features.yaml file:
# generate one feature...
- source: features.a_feature
  product: features/a_feature.csv

# another feature...
- source: features.another_feature
  product: features/another_feature.csv

# join the two previous features...
- source: features.join
  product: features/all.csv
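For reference, here is a minimal sketch of what the features.py module backing these tasks might look like. The pandas-based bodies are illustrative assumptions; only the signatures follow Ploomber's convention, where function tasks receive product (and upstream, when they depend on other tasks) as arguments:

# features.py - hypothetical implementation of the tasks in features.yaml
import pandas as pd


def a_feature(product):
    """Generate the first feature and save it to the declared product."""
    df = pd.DataFrame({'feature_a': [1, 2, 3]})  # placeholder logic
    df.to_csv(str(product), index=False)


def another_feature(product):
    """Generate the second feature."""
    df = pd.DataFrame({'feature_b': [4, 5, 6]})  # placeholder logic
    df.to_csv(str(product), index=False)


def join(upstream, product):
    """Join the two features; referencing upstream['a_feature'] and
    upstream['another_feature'] tells Ploomber they are dependencies."""
    a = pd.read_csv(str(upstream['a_feature']))
    b = pd.read_csv(str(upstream['another_feature']))
    pd.concat([a, b], axis='columns').to_csv(str(product), index=False)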
Then import those tasks in your training pipeline, pipeline.yaml:
meta:
  # import feature generation tasks
  import_tasks_from: features.yaml

tasks:
  # Get raw data for training
  - source: train.get_historical_data
    product: raw/get.csv

  # import_tasks_from injects your feature generation tasks here

  # Train a model
  - source: train.train_model
    product: model/model.pickle
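Similarly, a hypothetical train.py could implement the two tasks above. The data, model choice, and column names are illustrative assumptions; the key point is that referencing upstream['join'] connects the training task to the imported feature generation tasks:

# train.py - hypothetical implementation of the training tasks
import pickle

import pandas as pd
from sklearn.linear_model import LogisticRegression


def get_historical_data(product):
    """Fetch raw historical data (placeholder DataFrame for illustration)."""
    df = pd.DataFrame({'feature_a': [1, 2, 3], 'target': [0, 1, 0]})
    df.to_csv(str(product), index=False)


def train_model(upstream, product):
    """Train a model; referencing upstream['join'] and
    upstream['get_historical_data'] declares them as dependencies."""
    features = pd.read_csv(str(upstream['join']))
    raw = pd.read_csv(str(upstream['get_historical_data']))
    model = LogisticRegression().fit(features, raw['target'])  # placeholder training logic
    with open(str(product), 'wb') as f:
        pickle.dump(model, f)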
Your serving pipeline, pipeline-serve.yaml, would look like this:
meta:
  # import feature generation tasks
  import_tasks_from: features.yaml

tasks:
  # Get new data for predictions
  - source: serve.get_new_data
    product: serve/get.parquet

  # import_tasks_from injects your feature generation tasks here

  # Make predictions using a trained model
  - source: serve.predict
    product: serve/predictions.csv
    params:
      path_to_model: model.pickle
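Finally, a hypothetical serve.py for the two serving tasks. Note how path_to_model, declared under params above, arrives as a keyword argument; the model-loading and scoring logic is an illustrative assumption:

# serve.py - hypothetical implementation of the serving tasks
import pickle

import pandas as pd


def get_new_data(product):
    """Fetch fresh data to generate predictions for (placeholder)."""
    df = pd.DataFrame({'feature_a': [7, 8, 9]})
    df.to_parquet(str(product))  # requires pyarrow or fastparquet


def predict(upstream, product, path_to_model):
    """Load the trained model and score the freshly generated features;
    path_to_model arrives as a task param, as declared in pipeline-serve.yaml."""
    with open(path_to_model, 'rb') as f:
        model = pickle.load(f)
    features = pd.read_csv(str(upstream['join']))
    out = pd.DataFrame({'prediction': model.predict(features)})
    out.to_csv(str(product), index=False)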
Here’s an example project showing how to use import_tasks_from to create a training (pipeline.yaml) and serving (pipeline-serve.yaml) pipeline.
Scheduling¶
For an example showing how to schedule runs with cron and Ploomber, click here.