Notebooks API User Guide¶
Running notebooks in parallel lets you execute different tasks or processes simultaneously on multiple computing resources, so the work finishes sooner. (More details are available in our blog post.)
With Ploomber and Ploomber Cloud you can parametrize notebooks and run multiple copies in parallel (each one with a different set of parameters). This guide will show you how!
Pre-requisites¶
This section will help you set up your local environment to run notebooks in Ploomber Cloud. You only need to install Ploomber and set the API key from your Ploomber Cloud account.
Installing Ploomber¶
To install the latest version of Ploomber, open a terminal and run the following command:
pip install ploomber --upgrade
Setting API key¶
For this, you’ll need to sign in to Ploomber Cloud. Once you sign in, you just need to copy your API key and run the following command in your terminal:
ploomber cloud set-key {your-key}
A detailed tutorial to get and set your API Key can be found here.
Parameters¶
In this section you’ll learn how to configure your notebook to run with different parameters.
First, add a cell at the top of your notebook with the notebook parameters:
# PARAMETERS
n_estimators = 1
Important: You must add the comment # PARAMETERS in the cell. This lets Ploomber identify the parameters that will be used during execution.
Next, ensure that such parameters are used in the notebook’s body. Ploomber Cloud will change these values at runtime.
Now, add another raw cell at the top. In the raw cell, put the parameter values you want to use under the grid section:
grid:
  n_estimators: [1, 5, 10, 20]
Your notebook can have more than one parameter. In that case, Ploomber Cloud will run the notebook once for every possible combination of values.
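To make "all possible combinations" concrete, the following Python sketch expands a grid into one parameter set per run. It only illustrates the Cartesian product; it is not Ploomber Cloud's implementation, and the second parameter (max_depth) is a hypothetical addition for the example.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the values in `grid`."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# `max_depth` is hypothetical; the guide's notebook only defines n_estimators
grid = {"n_estimators": [1, 5, 10, 20], "max_depth": [2, 4]}
combinations = expand_grid(grid)

print(len(combinations))  # 4 values x 2 values = 8 runs
print(combinations[0])
```

With a single parameter, as in the sample notebook, the expansion simply yields one run per listed value.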
Note: the raw cell must be a valid YAML string. YAML is a data serialization language that is often used for writing configuration files. It usually follows a simple format to list attributes. You can read more about YAML here.
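Before submitting, you may want to confirm that the notebook contains both cells. The sketch below relies on the fact that an .ipynb file is a JSON document with a cells list, and reports whether it finds a raw cell and a code cell containing # PARAMETERS. The helper names (load_notebook, check_cells) are ours and not part of Ploomber.

```python
import json


def load_notebook(path):
    """Read an .ipynb file from disk into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def _source(cell):
    # "source" may be stored as a single string or as a list of lines
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def check_cells(notebook):
    """Report whether the notebook has a raw cell and a # PARAMETERS cell."""
    cells = notebook.get("cells", [])
    return {
        "has_raw_cell": any(c.get("cell_type") == "raw" for c in cells),
        "has_parameters_cell": any(
            c.get("cell_type") == "code" and "# PARAMETERS" in _source(c)
            for c in cells
        ),
    }


# minimal in-memory notebook for illustration; for a real file use
# check_cells(load_notebook("notebooks/grid.ipynb"))
notebook = {
    "cells": [
        {"cell_type": "raw", "source": "grid:\n  n_estimators: [1, 5, 10, 20]"},
        {"cell_type": "code", "source": ["# PARAMETERS\n", "n_estimators = 1"]},
    ]
}
print(check_cells(notebook))
```

This only checks that the cells exist; it does not validate the YAML inside the raw cell.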
Submission¶
In the previous section, you configured different parameter values so the notebook can run as parallel processes. In this section, we will submit these processes to Ploomber Cloud.
Let’s submit a notebook that fits a regressor and uses 4 parameter values. For this, we have prepared a notebook for you, which already contains the parameter configuration described above.
To download the notebook, simply run the following command in your terminal:
# Create a folder named notebooks
mkdir notebooks
# Download the sample notebook to the created folder
curl https://raw.githubusercontent.com/ploomber/projects/master/guides/cloud-notebooks-user-guide/notebooks/grid.ipynb -o notebooks/grid.ipynb
Now we can submit the notebook that fits the regressor with the 4 specified parameter values. In your terminal run:
ploomber cloud nb notebooks/grid.ipynb
Uploading grid-7e41ace5.ipynb...
Triggering execution of grid-7e41ace5.ipynb...
Check that the task was submitted:
ploomber cloud list
created_at runid status
-------------- ------------------------------------ --------
5 seconds ago b39238a2-3826-495d-90ca-b29139e324f0 created
53 minutes ago 2d4bcadf-5acb-49a5-8806-af2dbe1b32fe finished
7 hours ago ee78f4c1-ee42-4ba5-ba2f-9e73ae9228d6 finished
Wait 1-2 minutes for the Docker image to build. You’ll see the following message once it’s done:
ploomber cloud logs @latest --image | tail -n 10
[Container] 2022/10/26 03:59:05 Phase complete: BUILD State: SUCCEEDED
[Container] 2022/10/26 03:59:05 Phase context status code: Message:
[Container] 2022/10/26 03:59:05 Entering phase POST_BUILD
[Container] 2022/10/26 03:59:05 Phase complete: POST_BUILD State: SUCCEEDED
[Container] 2022/10/26 03:59:05 Phase context status code: Message:
Now you’ll see that the notebook has started:
ploomber cloud list
created_at runid status
-------------- ------------------------------------ --------
3 minutes ago b39238a2-3826-495d-90ca-b29139e324f0 started
57 minutes ago 2d4bcadf-5acb-49a5-8806-af2dbe1b32fe finished
7 hours ago ee78f4c1-ee42-4ba5-ba2f-9e73ae9228d6 finished
Let’s see the status of each task (one task per parameter value):
ploomber cloud status @latest
Geting latest ID...
Got ID: b39238a2-3826-495d-90ca-b29139e324f0
Unknown status: started
taskid name runid status
------------------------- --------------- ------------------------- --------
d9c5d4d0-b076-44ba-a807-8 grid-7e41ace5-1 b39238a2-3826-495d-90ca-b created
d6689c7b8ed 29139e324f0
0ac99557-5c30-4160-869d-e grid-7e41ace5-3 b39238a2-3826-495d-90ca-b created
65e007fbd17 29139e324f0
6442ea6b-cece-4530-8af6-6 grid-7e41ace5-2 b39238a2-3826-495d-90ca-b created
8d38ac230ed 29139e324f0
bd3363a5-c223-4673-9a52-3 grid-7e41ace5-0 b39238a2-3826-495d-90ca-b created
3fa9dbef681 29139e324f0
After a few minutes, they are done:
ploomber cloud status @latest
Geting latest ID...
Got ID: b39238a2-3826-495d-90ca-b29139e324f0
Pipeline finished...
taskid name runid status
------------------------- --------------- ------------------------- --------
d9c5d4d0-b076-44ba-a807-8 grid-7e41ace5-1 b39238a2-3826-495d-90ca-b finished
d6689c7b8ed 29139e324f0
0ac99557-5c30-4160-869d-e grid-7e41ace5-3 b39238a2-3826-495d-90ca-b finished
65e007fbd17 29139e324f0
6442ea6b-cece-4530-8af6-6 grid-7e41ace5-2 b39238a2-3826-495d-90ca-b finished
8d38ac230ed 29139e324f0
bd3363a5-c223-4673-9a52-3 grid-7e41ace5-0 b39238a2-3826-495d-90ca-b finished
3fa9dbef681 29139e324f0
Let’s see what’s in our outputs workspace:
ploomber cloud products
path
-----------------------------------------------------
grid-7e41ace5/output/notebook-n_estimators=1-0.ipynb
grid-7e41ace5/output/notebook-n_estimators=10-2.ipynb
grid-7e41ace5/output/notebook-n_estimators=20-3.ipynb
grid-7e41ace5/output/notebook-n_estimators=5-1.ipynb
plot-aebe61a1/output/notebook.ipynb
plot-f7ad8452/output/notebook.ipynb
Download all the executed notebooks:
ploomber cloud download 'grid-7e41ace5/*'
Writing file into path grid-7e41ace5/output/.notebook-n_estimators=1-0.ipynb.metadata
Writing file into path grid-7e41ace5/output/.notebook-n_estimators=20-3.ipynb.metadata
Writing file into path grid-7e41ace5/output/.notebook-n_estimators=5-1.ipynb.metadata
Writing file into path grid-7e41ace5/output/.notebook-n_estimators=10-2.ipynb.metadata
Writing file into path grid-7e41ace5/output/notebook-n_estimators=5-1.ipynb
Writing file into path grid-7e41ace5/output/notebook-n_estimators=10-2.ipynb
Writing file into path grid-7e41ace5/output/notebook-n_estimators=20-3.ipynb
Writing file into path grid-7e41ace5/output/notebook-n_estimators=1-0.ipynb
Note that we’re using the identifier printed when we submitted the notebook.
For a better understanding of the previous cells, you can read more details about execution monitoring and downloading results in the previous guide.
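Once the notebooks are downloaded, it can be handy to map each file back to the parameter value that produced it. The sketch below parses the file names shown in the listing above. The pattern is inferred from that output for a single-parameter grid; it is an assumption, not a documented naming scheme.

```python
import re
from pathlib import PurePosixPath

# inferred from names such as notebook-n_estimators=10-2.ipynb (assumption)
PATTERN = re.compile(
    r"^notebook-(?P<name>\w+)=(?P<value>[^-]+)-(?P<index>\d+)\.ipynb$"
)


def parse_product(path):
    """Return (parameter name, value, task index), or None if the name differs."""
    match = PATTERN.match(PurePosixPath(path).name)
    if match is None:
        return None
    return match["name"], match["value"], int(match["index"])


paths = [
    "grid-7e41ace5/output/notebook-n_estimators=1-0.ipynb",
    "grid-7e41ace5/output/notebook-n_estimators=10-2.ipynb",
    "plot-aebe61a1/output/notebook.ipynb",  # not part of a grid run
]
for path in paths:
    print(path, parse_product(path))
```

The value is returned as a string because the file name alone does not say whether it was an integer, a float, or text.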
Input data¶
If your notebook requires input data, you can upload it.
We have prepared two sample notebooks that will allow you to work with uploads of input data. To download the first one that will be used, run in your terminal:
curl https://raw.githubusercontent.com/ploomber/projects/master/guides/cloud-notebooks-user-guide/notebooks/input-data.ipynb -o notebooks/input-data.ipynb
Let’s see what happens if we try to run a notebook with missing input data:
ploomber cloud nb notebooks/input-data.ipynb
Uploading input-data-49dc8734.ipynb...
Triggering execution of input-data-49dc8734.ipynb...
Error: Error validating inputs/outputs: {'missing': {'../data/penguins.csv'}} (status: 400)
Ploomber Cloud will parse your notebook and look for referenced files. If they’re missing in your data workspace, it’ll show an error like the one above.
In our notebook, we have the following line:
df = pd.read_csv('../data/penguins.csv')
Ploomber realizes you’re using a local file at ../data/penguins.csv. Since files can be either inputs or outputs, you have to tell Ploomber which they are. To fix this, add a raw cell at the top:
# this determines where to look for input data and where
# to store outputs
prefix: penguins-classification
# for each path in our notebook, indicate if it's an input or output
# the values must be the same as in your notebook
inputs:
- ../data/penguins.csv
# no outputs, so no need to add an "outputs" section
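To see which paths a notebook references before writing the inputs section, you can scan its code cells for string literals that look like file paths. The following is a rough sketch of that idea using Python's ast module. It is not how Ploomber Cloud parses notebooks, and the list of file extensions is an arbitrary assumption.

```python
import ast

# file extensions treated as data files (arbitrary choice for this sketch)
DATA_SUFFIXES = (".csv", ".parquet", ".json", ".pickle")


def referenced_files(notebook):
    """Collect string literals in code cells that end with a data suffix."""
    found = set()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        src = "".join(src) if isinstance(src, list) else src
        try:
            tree = ast.parse(src)
        except SyntaxError:
            # cells with IPython magics are not valid Python; skip them
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Constant) and isinstance(node.value, str):
                if node.value.endswith(DATA_SUFFIXES):
                    found.add(node.value)
    return sorted(found)


# minimal in-memory notebook containing the line from this guide
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "source": "import pandas as pd\ndf = pd.read_csv('../data/penguins.csv')",
        }
    ]
}
print(referenced_files(notebook))
```

Paths built dynamically (for example with string concatenation) will not be found by a literal scan like this one.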
The second sample notebook already contains this raw cell. To download it, simply run:
curl https://raw.githubusercontent.com/ploomber/projects/master/guides/cloud-notebooks-user-guide/notebooks/input-data-with-raw-cell.ipynb -o notebooks/input-data-with-raw-cell.ipynb
Let’s run the notebook that contains the raw cell:
ploomber cloud nb notebooks/input-data-with-raw-cell.ipynb
Uploading input-data-with-raw-cell-d896c53b.ipynb...
Triggering execution of input-data-with-raw-cell-d896c53b.ipynb...
Error: Cannot start execution. The following inputs are missing:
- ../data/penguins.csv
Upload them to your data workspace or using the CLI:
ploomber cloud data --upload ../data/penguins.csv --prefix penguins-classification/input --name data-penguins.csv
(status: 400)
This time, Ploomber Cloud is telling us the files are not in our data workspace. So let’s upload them.
First, let’s get the data:
curl https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv -o penguins.csv
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 13478 100 13478 0 0 54957 0 --:--:-- --:--:-- --:--:-- 57110
Use the command printed in the error message:
# NOTE: you may need to change the path in the --upload argument if the file is somewhere else
ploomber cloud data --upload penguins.csv --prefix penguins-classification/input --name data-penguins.csv
Uploading data-penguins.csv...
Let’s submit the notebook:
ploomber cloud nb notebooks/input-data-with-raw-cell.ipynb
Uploading input-data-with-raw-cell-d539ba23.ipynb...
Triggering execution of input-data-with-raw-cell-d539ba23.ipynb...
Wait a couple of minutes for it to finish (status will appear as finished):
ploomber cloud list
created_at runid status
-------------- ------------------------------------ --------
9 minutes ago 19f8242e-373b-4b2b-bee4-0181a3edfc51 finished
31 minutes ago b39238a2-3826-495d-90ca-b29139e324f0 finished
an hour ago 2d4bcadf-5acb-49a5-8806-af2dbe1b32fe finished
7 hours ago ee78f4c1-ee42-4ba5-ba2f-9e73ae9228d6 finished
The prefix in the raw cell determines where the outputs are stored. Hence, to download all outputs:
ploomber cloud download 'penguins-classification/*'
Writing file into path penguins-classification/output/.notebook.ipynb.metadata
Writing file into path penguins-classification/output/notebook.ipynb
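Outputs¶
If your notebook writes output files, list their paths under the outputs section of the raw cell, using the same values as in your notebook: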
prefix: some-experiment
outputs:
- path/to/model.pickle
Resources (memory, CPU and GPU)¶

You can request more resources for your notebook execution by adding the following in the raw cell:
task_resources:
  vcpus: 8 # number of CPUs
  memory: 16384 # memory in MiB
See this notebook for an example (Note: the configuration cell is not visible on GitHub; you have to view it with Jupyter). If you want to download this sample notebook and test it locally, run the following command:
curl https://raw.githubusercontent.com/ploomber/projects/master/guides/cloud-notebooks-user-guide/notebooks/resources.ipynb -o notebooks/resources.ipynb
Note: The free community plan is capped at 2 CPUs and 4 GiB of memory, with no GPUs. If you need more resources, you can subscribe to the Teams plan. If you’re a student or researcher, join our Slack and we’ll lift the restrictions.
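Because memory is expressed in MiB, a quick conversion helps when you think in GiB (1 GiB = 1024 MiB, so the 16384 in the example above is 16 GiB). The sketch below also compares a request against the community-plan cap quoted in the note. The helper names are hypothetical and the check is purely local; it does not query Ploomber Cloud.

```python
MIB_PER_GIB = 1024

# community plan limits as stated in this guide: 2 CPUs and 4 GiB of memory
FREE_PLAN = {"vcpus": 2, "memory": 4 * MIB_PER_GIB}


def gib_to_mib(gib):
    """Convert GiB to the MiB value expected by the memory key."""
    return int(gib * MIB_PER_GIB)


def exceeds_free_plan(task_resources):
    """Return the keys whose requested value is above the community-plan cap."""
    return [
        key for key, cap in FREE_PLAN.items()
        if task_resources.get(key, 0) > cap
    ]


request = {"vcpus": 8, "memory": gib_to_mib(16)}
print(request["memory"])           # 16384
print(exceeds_free_plan(request))  # ['vcpus', 'memory']
```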
Packages¶
By default, Ploomber Cloud will parse your import statements and install the latest version of each package. If you want a specific version, add this in your raw cell:
dependencies:
- matplotlib==3.5.3
- scikit-learn==1.1.0
See this notebook for an example (Note: the configuration cell is not visible on GitHub; you have to view it with Jupyter). If you want to download this sample notebook and test it locally, run the following command:
curl https://raw.githubusercontent.com/ploomber/projects/master/guides/cloud-notebooks-user-guide/notebooks/dependencies.ipynb -o notebooks/dependencies.ipynb
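If you want the cloud run to match your local environment, you can generate the pinned entries from the versions installed locally. This sketch uses importlib.metadata from the standard library. Note that it expects distribution names (for example scikit-learn, not the import name sklearn), and the helper pin is our own, not a Ploomber function.

```python
from importlib.metadata import PackageNotFoundError, version


def pin(packages, lookup=version):
    """Return 'name==version' strings for the packages installed locally."""
    pinned = []
    for name in packages:
        try:
            pinned.append(f"{name}=={lookup(name)}")
        except PackageNotFoundError:
            # not installed locally; leave the entry unpinned
            pinned.append(name)
    return pinned


# print entries in the same shape as the dependencies section above
print("dependencies:")
for entry in pin(["matplotlib", "scikit-learn"]):
    print(f"- {entry}")
```

The lookup argument exists only so the function can be exercised without the packages installed; in normal use the default is enough.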
Extra files¶
If your notebook depends on extra files (e.g., utility functions), you can include them when executing the notebook. In the raw cell at the top, add the include section:
include:
# you can put individual files
- functions.py
# or directories
- more_functions/
Here is a complete example.
Important: Do not include large data files here, because everything listed is uploaded every time you run your notebook. If you have input data files, see the Input data section.
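Since the include entries are uploaded on each run, it can help to check how much data they add up to before submitting. Here is a minimal sketch with pathlib; the 10 MiB threshold is an arbitrary value chosen for illustration, not a Ploomber Cloud limit.

```python
from pathlib import Path


def total_size(entries):
    """Sum the size in bytes of the given files and directories (recursive)."""
    total = 0
    for entry in map(Path, entries):
        if entry.is_dir():
            total += sum(p.stat().st_size for p in entry.rglob("*") if p.is_file())
        elif entry.is_file():
            total += entry.stat().st_size
    return total


# same entries as the include example; missing paths count as 0 bytes
include = ["functions.py", "more_functions/"]
WARN_BYTES = 10 * 1024 * 1024  # arbitrary threshold for this sketch

size = total_size(include)
if size > WARN_BYTES:
    print(f"include adds up to {size} bytes; consider the Input data section")
else:
    print(f"include adds up to {size} bytes")
```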
Concurrent runs¶
The free community plan allows you to run parallel jobs via the grid feature. However, you cannot start a new execution until the current one is done. If you need concurrent runs, you can subscribe to the Teams plan. If you’re a student or researcher, join our Slack and we’ll lift the restrictions.
To abort your latest run:
ploomber cloud abort @latest
Debugging¶
To see the status of your runs:
ploomber cloud list
To see tasks within a given run:
ploomber cloud status {runid}
# or for the latest run
ploomber cloud status @latest
Even if your notebook fails, the failed notebook is uploaded, so you can download it for debugging:
ploomber cloud download 'path/to/notebook.ipynb'
To list existing files in your products workspace:
ploomber cloud products
To get the logs for all tasks in the run:
ploomber cloud logs {runid}
# or for the latest run
ploomber cloud logs @latest
To get the logs for the Docker building process:
ploomber cloud logs {runid} --image
# or for the latest run
ploomber cloud logs @latest --image
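If you want to script around these commands, for example to check whether the latest run has finished, you can parse the text printed by ploomber cloud list. The sketch below assumes the column layout shown in this guide (created_at, runid, status) and is not an official client; the output format may differ between versions.

```python
import re
import subprocess

# one table row: free-text timestamp, a UUID run ID, and a status word
ROW = re.compile(
    r"^(?P<created_at>.+?)\s+"
    r"(?P<runid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<status>\S+)\s*$"
)


def parse_runs(text):
    """Turn the table printed by `ploomber cloud list` into a list of dicts."""
    matches = (ROW.match(line) for line in text.splitlines())
    return [m.groupdict() for m in matches if m]


def list_runs():
    # calls the CLI command shown above; requires a configured API key
    result = subprocess.run(
        ["ploomber", "cloud", "list"], capture_output=True, text=True, check=True
    )
    return parse_runs(result.stdout)


# sample copied from the output earlier in this guide
sample = """\
created_at      runid                                 status
--------------  ------------------------------------  --------
5 seconds ago   b39238a2-3826-495d-90ca-b29139e324f0  created
53 minutes ago  2d4bcadf-5acb-49a5-8806-af2dbe1b32fe  finished
"""
for run in parse_runs(sample):
    print(run["runid"], run["status"])
```

The header and separator lines are skipped automatically because they do not contain a run ID.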