Refactoring legacy notebooks

This tutorial shows how to convert legacy notebooks into Ploomber pipelines.

Note

If you don’t have a sample notebook, download one from here or execute:

curl -O https://raw.githubusercontent.com/ploomber/soorgeon/main/examples/machine-learning/nb.ipynb

The only requirement is that your notebook separates its sections with H2 headings (in a Markdown cell, a line that starts with ##).

[Figure: an example notebook with three sections separated by H2 headings]
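Before refactoring, it can help to confirm which H2 headings a notebook actually contains. The following is a minimal sketch, not part of Soorgeon: it relies only on the fact that an .ipynb file is JSON with a list of cells, and the function name h2_headings and the section names in the sample are hypothetical.

```python
import json


def h2_headings(notebook):
    """Return the text of every H2 heading found in the Markdown cells."""
    headings = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as one string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            # "## " matches H2 only; "### " (H3) and "# " (H1) do not.
            if line.startswith("## "):
                headings.append(line[3:].strip())
    return headings


# A tiny notebook with three sections (hypothetical section names).
EXAMPLE_NOTEBOOK = r"""
{
  "cells": [
    {"cell_type": "markdown", "source": ["# My analysis\n", "\n", "## Load\n"]},
    {"cell_type": "code", "source": ["## a code comment, not a heading\n"]},
    {"cell_type": "markdown", "source": "## Clean"},
    {"cell_type": "markdown", "source": ["### Details\n", "## Train"]}
  ]
}
"""

print(h2_headings(json.loads(EXAMPLE_NOTEBOOK)))
```

To inspect a real file, load it with json.load on an open file handle for nb.ipynb and pass the result to the same function. The check is deliberately simple: it would also count a line starting with ## inside a fenced code block in a Markdown cell.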

Once your notebook is ready, you can refactor it with:

# install soorgeon
pip install soorgeon

# refactor the nb.ipynb notebook
soorgeon refactor nb.ipynb

Tip

Soorgeon may sometimes be unable to split your notebook into sections. If so, run soorgeon refactor nb.ipynb --single-task to generate a pipeline with a single task. If you have questions, send us a message on Slack.

The soorgeon refactor nb.ipynb command generates a pipeline.yaml file with your pipeline declaration, plus one .ipynb task per section.

You can also tell Soorgeon to generate tasks in .py format:

# generate tasks in .py format (requires soorgeon>=0.0.13)
soorgeon refactor nb.ipynb --file-format py

Note that, thanks to the Jupyter integration, you can open the .py files as notebooks in Jupyter.
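To give an idea of how a plain script can be shown as a notebook, here is a sketch of a .py file written in the jupytext "percent" format, where # %% markers delimit cells. This assumes the generated tasks use that format, so check your own output; the section name and the data are hypothetical and are not what Soorgeon produces for nb.ipynb.

```python
# %% [markdown]
# ## Clean
# A Markdown cell: in the percent format its text is stored as comments.

# %%
# A code cell: everything up to the next "# %%" marker belongs to it.
raw = [3, None, 5, None, 8]  # hypothetical stand-in for data from an earlier section

# %%
# A second code cell that drops the missing values.
clean = [value for value in raw if value is not None]
print(clean)
```

Because the markers are ordinary comments, the same file also runs as a regular Python script.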

To run the pipeline:

# install dependencies
pip install -r requirements.txt

# run Ploomber pipeline
ploomber build

That’s it! Now that you have a Ploomber pipeline, you can benefit from all our features! If you want to learn more about the framework, check out the basic concepts tutorial.

Resources

Soorgeon’s user guide
GitHub
Interactive example
Blog post series on notebook refactoring: Part I and Part II
