Getting Started with Pipelines API

Note: To use Ploomber Cloud, you need an API key; click here to get one.

Ploomber Cloud allows you to go from your local environment to a distributed environment in the cloud (to run hundreds of experiments in parallel!) and back in a single command.

The pipeline

We’ll run a sample pipeline that prepares some data and trains 10 models in parallel. Here’s what the pipeline looks like:

[1]:
from ploomber.spec import DAGSpec
[2]:
dag = DAGSpec('pipeline.yaml').to_dag()
dag.plot()
[2]:
../_images/cloud_pipelines-api-getting-started_3_1.png

All we need is to provide a requirements.lock.txt with the dependencies that our project needs. No need to learn Kubernetes! Ploomber Cloud will do all the heavy lifting of sending our code to the cloud, creating a Docker image, scheduling jobs, handling communication between them, and storing the results.

Setup

We are constantly updating the Ploomber Cloud CLI so ensure you’re running the latest version:

pip install ploomber --upgrade

Now, let’s set our API key:

ploomber cloud set-key {your-key}

Download this example:

# download
ploomber examples -n guides/cloud-execution -o example

# move to the example
cd example

To the cloud!

In Ploomber, you run pipelines locally with:

ploomber build

To run the pipeline in the cloud, you do:

ploomber cloud build
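
If you prefer to trigger these commands from a Python script rather than a terminal, a thin wrapper around the CLI is one option. The sketch below is only an illustration: `cloud_command` and `run_cloud` are hypothetical helper names, not part of Ploomber, and the wrapper simply shells out to the same `ploomber cloud` commands used in this tutorial.

```python
import subprocess


def cloud_command(*args):
    """Build the argument list for a `ploomber cloud ...` invocation."""
    return ["ploomber", "cloud", *args]


def run_cloud(*args, runner=subprocess.run):
    """Run a `ploomber cloud` subcommand and return its standard output.

    `runner` is injectable so the helper can be exercised without the CLI.
    """
    completed = runner(
        cloud_command(*args),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode output as str
    )
    return completed.stdout


# Example (requires ploomber to be installed and an API key to be set):
# print(run_cloud("build"))
```

Because `check=True` is set, a failing command raises `subprocess.CalledProcessError` instead of passing silently.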

Let’s see it in action!

[6]:
%%sh
ploomber cloud build
Zipping project -> project.zip
Uploading project...
Uploaded project, starting execution...
Starting build...

That’s it! Let’s now check that our pipeline was scheduled:

[7]:
%%sh
ploomber cloud list
created_at      runid                                 status
--------------  ------------------------------------  --------
14 seconds ago  136f9b47-27fa-4f65-b06f-b6fe665f3ea9  created
8 hours ago     62ac3682-b1bf-4521-aa26-260135a2f982  finished
10 hours ago    921748f6-5ef7-4f4e-a500-07162d1e3245  finished
12 hours ago    2b21a085-1d03-4426-be99-0f7a1b8c4f2e  finished
13 hours ago    abd50a1c-f5ee-479b-9a3c-92c7be172b9e  finished
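
If you want to pick up the run ID programmatically instead of copying it by hand, the table can be parsed with a regular expression. This is a minimal sketch that assumes the column layout shown in the output above (the CLI's format may change between versions); `parse_runs` is a hypothetical helper, and the sample text is copied from the first rows of that output.

```python
import re

# A data row: free-text timestamp, a UUID-shaped run ID, then the status.
ROW = re.compile(
    r"^(?P<created_at>.+?)\s{2,}"
    r"(?P<runid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<status>\S+)\s*$"
)


def parse_runs(list_output):
    """Return one dict per run; header and separator lines are ignored."""
    matches = (ROW.match(line) for line in list_output.splitlines())
    return [m.groupdict() for m in matches if m]


sample = """\
created_at      runid                                 status
--------------  ------------------------------------  --------
14 seconds ago  136f9b47-27fa-4f65-b06f-b6fe665f3ea9  created
8 hours ago     62ac3682-b1bf-4521-aa26-260135a2f982  finished
"""

runs = parse_runs(sample)
# Assumption: the most recent run is listed first, as in the sample.
print(runs[0]["runid"], runs[0]["status"])
```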

We can see our runs in the list. Let’s copy the ID of the newest run to retrieve its status:

[8]:
%%sh
ploomber cloud status 136f9b47-27fa-4f65-b06f-b6fe665f3ea9
Run created...

“Run created…” means Ploomber Cloud is building the Docker image. Let’s wait a minute to give it time to finish and send the jobs to the cluster. Ploomber Cloud runs one container per task, allowing you to parallelize your pipeline easily!

You can check the Docker image build logs with the following command:

[9]:
%%sh
ploomber cloud logs 136f9b47-27fa-4f65-b06f-b6fe665f3ea9 --image
Image build hasn't started yet...

It’ll take about a minute for the Docker build process to start. You may execute the following command to continuously watch the logs:

[ ]:
%%sh
ploomber cloud logs {runid} --image --watch

Let’s check the status of the individual tasks:

[11]:
%%sh
ploomber cloud status 136f9b47-27fa-4f65-b06f-b6fe665f3ea9
taskid                          name    runid                           status
------------------------------  ------  ------------------------------  --------
da8f149e-27cb-4297-8061-8dae36  fit5    136f9b47-27fa-4f65-b06f-b6fe66  created
1ab37b                                  5f3ea9
c6920fd4-ba65-495c-b022-d68274  fit9    136f9b47-27fa-4f65-b06f-b6fe66  created
ff7307                                  5f3ea9
9bfdaf0e-28d3-43b1-9c91-c1e185  fit6    136f9b47-27fa-4f65-b06f-b6fe66  created
255797                                  5f3ea9
69cbd6db-0b29-49db-a87a-70c70c  fit1    136f9b47-27fa-4f65-b06f-b6fe66  created
fb066f                                  5f3ea9
b26948e3-4ff8-49e7-afbf-9c3cf1  fit0    136f9b47-27fa-4f65-b06f-b6fe66  created
ac0f75                                  5f3ea9
13e30907-9adb-4fd2-ac1d-c70c5b  join    136f9b47-27fa-4f65-b06f-b6fe66  created
a6368a                                  5f3ea9
bef13025-b0f3-4563-95f0-95b332  fit7    136f9b47-27fa-4f65-b06f-b6fe66  created
9afbb9                                  5f3ea9
85e0fd4b-911b-42be-9593-63209b  fit4    136f9b47-27fa-4f65-b06f-b6fe66  created
47c933                                  5f3ea9
94c0cbbf-853e-47c6-972b-76a01d  fit8    136f9b47-27fa-4f65-b06f-b6fe66  created
e680ac                                  5f3ea9
39219c5b-ec35-42d6-bc34-bbf21c  fit3    136f9b47-27fa-4f65-b06f-b6fe66  created
6d568c                                  5f3ea9
27586e14-f07b-4734-86b8-7219fe  fit2    136f9b47-27fa-4f65-b06f-b6fe66  created
654d90                                  5f3ea9

Great! We see that our jobs have been scheduled.

You can execute the following to watch the task status continuously:

[ ]:
%%sh
ploomber cloud status {runid} --watch

Let’s give it a few minutes to finish training the 10 models, then run the command again:

[12]:
%%sh
ploomber cloud status 136f9b47-27fa-4f65-b06f-b6fe665f3ea9
taskid                          name    runid                           status
------------------------------  ------  ------------------------------  --------
da8f149e-27cb-4297-8061-8dae36  fit5    136f9b47-27fa-4f65-b06f-b6fe66  finished
1ab37b                                  5f3ea9
c6920fd4-ba65-495c-b022-d68274  fit9    136f9b47-27fa-4f65-b06f-b6fe66  finished
ff7307                                  5f3ea9
9bfdaf0e-28d3-43b1-9c91-c1e185  fit6    136f9b47-27fa-4f65-b06f-b6fe66  finished
255797                                  5f3ea9
69cbd6db-0b29-49db-a87a-70c70c  fit1    136f9b47-27fa-4f65-b06f-b6fe66  finished
fb066f                                  5f3ea9
b26948e3-4ff8-49e7-afbf-9c3cf1  fit0    136f9b47-27fa-4f65-b06f-b6fe66  finished
ac0f75                                  5f3ea9
13e30907-9adb-4fd2-ac1d-c70c5b  join    136f9b47-27fa-4f65-b06f-b6fe66  finished
a6368a                                  5f3ea9
bef13025-b0f3-4563-95f0-95b332  fit7    136f9b47-27fa-4f65-b06f-b6fe66  finished
9afbb9                                  5f3ea9
85e0fd4b-911b-42be-9593-63209b  fit4    136f9b47-27fa-4f65-b06f-b6fe66  finished
47c933                                  5f3ea9
94c0cbbf-853e-47c6-972b-76a01d  fit8    136f9b47-27fa-4f65-b06f-b6fe66  finished
e680ac                                  5f3ea9
39219c5b-ec35-42d6-bc34-bbf21c  fit3    136f9b47-27fa-4f65-b06f-b6fe66  finished
6d568c                                  5f3ea9
27586e14-f07b-4734-86b8-7219fe  fit2    136f9b47-27fa-4f65-b06f-b6fe66  finished
654d90                                  5f3ea9
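
To decide from a script whether a run is done, you could parse this table and check that every task reports `finished`. The sketch below assumes the layout shown above, where long IDs wrap onto a second line; `task_statuses` and `all_finished` are hypothetical helpers, and only the `created` and `finished` values seen in this tutorial are assumed to exist.

```python
def task_statuses(status_output):
    """Map task name -> status from `ploomber cloud status` output."""
    statuses = {}
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            # Blank line, wrapped tail of an ID, or a message such as
            # "Run created..." printed before tasks are scheduled.
            continue
        taskid, name, _runid, status = fields
        if taskid == "taskid" or set(taskid) == {"-"}:
            continue  # header or separator row
        statuses[name] = status
    return statuses


def all_finished(statuses):
    """True only if at least one task exists and all are finished."""
    return bool(statuses) and all(s == "finished" for s in statuses.values())


sample = """\
taskid                          name    runid                           status
------------------------------  ------  ------------------------------  --------
da8f149e-27cb-4297-8061-8dae36  fit5    136f9b47-27fa-4f65-b06f-b6fe66  finished
1ab37b                                  5f3ea9
13e30907-9adb-4fd2-ac1d-c70c5b  join    136f9b47-27fa-4f65-b06f-b6fe66  created
a6368a                                  5f3ea9
"""

print(task_statuses(sample), all_finished(task_statuses(sample)))
```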

Task logs

To check the logs for each task:

Note: it may take a few minutes for the task to start and, hence, for the logs to become visible. Ploomber Cloud spins up the necessary infrastructure on demand, so you only pay for what you use; however, this requires us to shut down any unused infrastructure.

Note: you can use the latest tag to get the logs of the latest run: ploomber cloud logs latest

[1]:
%%sh
ploomber cloud logs 136f9b47-27fa-4f65-b06f-b6fe665f3ea9

Or use the latest tag:

[ ]:
%%sh
ploomber cloud logs latest

…and back!

Great, so our jobs have finished. We can explore our cloud storage to see what files are available:

[14]:
%%sh
ploomber cloud products
path
-----------------------
output/features.parquet
output/get.html
output/get.parquet
output/join.parquet
output/model-0.pickle
output/model-1.pickle
output/model-2.pickle
output/model-3.pickle
output/model-4.pickle
output/model-5.pickle
output/model-6.pickle
output/model-7.pickle
output/model-8.pickle
output/model-9.pickle
output/model.pickle
output/nb-0.html
output/nb-1.html
output/nb-2.html
output/nb-3.html
output/nb-4.html
output/nb-5.html
output/nb-6.html
output/nb-7.html
output/nb-8.html
output/nb-9.html
output/nb.html
output/raw.parquet

Each model training task automatically generates an HTML report. We can download the files with this command:

ploomber cloud download {pattern}

Here, pattern is a glob-like pattern. For example, to download all files:

ploomber cloud download '*'

Or files with a specific extension:

ploomber cloud download '*.extension'
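
To get a feel for which files a glob-like pattern selects before downloading, you can try the pattern locally. The sketch below uses Python's `fnmatch` module as a stand-in; that is an assumption, since the CLI's actual matching rules may differ. `select` is a hypothetical helper and the paths are a subset of the product list shown earlier.

```python
from fnmatch import fnmatchcase

# A few of the product paths listed by `ploomber cloud products`.
products = [
    "output/features.parquet",
    "output/get.html",
    "output/model-0.pickle",
    "output/nb-0.html",
    "output/nb.html",
]


def select(paths, pattern):
    """Return the paths matching a glob-style pattern (case-sensitive).

    In fnmatch, `*` also matches `/`, so `*.html` matches files inside
    the output/ directory.
    """
    return [path for path in paths if fnmatchcase(path, pattern)]


print(select(products, "*.html"))
print(select(products, "*model*"))
```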

Let’s download the HTML reports:

[15]:
%%sh
ploomber cloud download '*.html'
Downloading output/nb-9.html
Downloading output/nb.html
Downloading output/.get.html.metadata
Downloading output/nb-5.html
Downloading output/.nb.html.metadata
Downloading output/nb-0.html
Downloading output/.nb-1.html.metadata
Downloading output/.nb-9.html.metadata
Downloading output/.nb-2.html.metadata
Downloading output/nb-6.html
Downloading output/nb-3.html
Downloading output/.nb-0.html.metadata
Downloading output/nb-4.html
Downloading output/.nb-6.html.metadata
Downloading output/get.html
Downloading output/.nb-5.html.metadata
Downloading output/.nb-3.html.metadata
Downloading output/nb-2.html
Downloading output/.nb-7.html.metadata
Downloading output/.nb-8.html.metadata
Downloading output/.nb-4.html.metadata
Downloading output/nb-7.html
Downloading output/nb-8.html
Downloading output/nb-1.html

Each fit task generated a model evaluation report. Go check them out!

Incremental builds

Ploomber allows you to dramatically speed up iterations with incremental builds. Let’s revisit our pipeline structure:

[16]:
dag.plot()
[16]:
../_images/cloud_pipelines-api-getting-started_31_1.png

Let’s say you modify the join task. If you run ploomber cloud build, Ploomber Cloud will only execute the tasks affected by the change: it will run join and all the fit tasks, but skip get and features!

To force execution of all tasks, you may execute: ploomber cloud build --force
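
The idea behind an incremental build can be sketched as "the changed tasks plus everything downstream of them". The code below illustrates that idea only; it is not Ploomber's implementation. The edge list is an assumption about the sample pipeline (the text only implies that the fit tasks depend on join), and `tasks_to_run` is a hypothetical helper.

```python
from collections import deque

# Assumed edges (upstream -> downstream) for the sample pipeline.
DOWNSTREAM = {
    "get": ["features"],
    "features": ["join"],
    "join": [f"fit{i}" for i in range(10)],
}


def tasks_to_run(changed, downstream):
    """Return the changed tasks and all of their descendants."""
    to_run = set(changed)
    queue = deque(changed)
    while queue:
        task = queue.popleft()
        for child in downstream.get(task, []):
            if child not in to_run:
                to_run.add(child)
                queue.append(child)
    return to_run


# Modifying join re-runs join and every fit task; get and features are skipped.
print(sorted(tasks_to_run(["join"], DOWNSTREAM)))
```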

Debugging

If any of your notebooks (or scripts) fails, a copy of the partially executed notebook will be uploaded, so you can debug it.

For example, let’s say I add the following in my notebook/script:

raise ValueError('some new error!')

Upon execution, I can retrieve the logs with:

ploomber cloud logs latest

Or use the runid:

ploomber cloud logs {runid}

And I’ll see the following:

---------------------------------------------------------------------------
Exception encountered at "In [9]":
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_41/2195319445.py in <cell line: 1>()
----> 1 raise ValueError('some new error!')
ValueError: some new error!

And a few lines below:

ploomber.exceptions.TaskBuildError: Error when executing task 'fit'. Partially executed notebook uploaded to remote storage at: products//project/output/nb.ipynb

I can download that partially executed notebook with:

ploomber cloud download '*nb.ipynb'

Then, I can open the notebook, and I’ll see the code and cells with their corresponding output so I can debug!
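
If the notebook is long, you can also locate the failing cell programmatically. A notebook file is JSON, and in the version 4 format a failed code cell stores an output with `output_type` set to `error`. The sketch below relies on that format; `failed_cells` is a hypothetical helper, the inline notebook is a made-up minimal example, and the local path in the usage comment is an assumption about where the download lands.

```python
import json


def failed_cells(notebook):
    """Return (cell_index, ename, evalue) for each error output found."""
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                failures.append(
                    (index, output.get("ename"), output.get("evalue"))
                )
    return failures


# Minimal stand-in for a partially executed notebook.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Train model"]},
        {"cell_type": "code", "source": ["x = 1"], "outputs": []},
        {
            "cell_type": "code",
            "source": ["raise ValueError('some new error!')"],
            "outputs": [
                {
                    "output_type": "error",
                    "ename": "ValueError",
                    "evalue": "some new error!",
                    "traceback": [],
                }
            ],
        },
    ]
}

print(failed_cells(example))

# Usage with a downloaded file (path assumed):
# with open("output/nb.ipynb", encoding="utf-8") as f:
#     print(failed_cells(json.load(f)))
```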

Abort an executing job

At the moment, we support concurrent executions for paid users only. If you wish to submit a different job while one is executing in the cloud, you will need to abort the running one using this command:

ploomber cloud abort {runid}

Or use the latest tag to abort the latest run:

ploomber cloud abort latest

Note: to get the list of runids, execute ploomber cloud list

That’s it!

We hope you enjoyed this tutorial and are excited to use Ploomber in your next project. Questions? Ping us on Slack!