To run this example locally, install Ploomber and execute: ploomber examples -n guides/debugging

To start a free, hosted JupyterLab: binder-logo

Found an issue? Let us know.

Have questions? Ask us anything on Slack.

Debugging

Tutorial showing techniques for debugging pipelines.

For a quick reference, click here.

Debugger basics

Skip this if you’re already familiar with the Python debugger.

A debugger is a program that helps inspect another program for debugging. Python comes with a debugger called pdb.

There are a few approaches for debugging programs. One approach is line-by-line debugging, which starts our program in debug mode so we can easily inspect variables, move to the next line, etc.

One important concept to know when debugging is stack frame. Simply speaking, stack frames represent the state of our code at a given level. When you write a non-trivial function, it will depend on other functions to work (yours or from third party packages). Each function has its own stack frame which defines the variables that are available to it.

When a program fails, it can do so at different levels (i.e. a different stack frame). Let’s see a simple example:

def reciprocal(x):
    return 1/x

def reciprocal_and_multiply(x, y):
    return reciprocal(x) * y

There are two places where things can go wrong in the program above: if we pass x=0, the reciprocal operation will fail. If we pass y=None, the program fails, but it will do so in the reciprocal_and_multiply function. For this trivial example, it’s easy to see at which level the code breaks but in a real program the source code alone is usually not enough to know. Moving between stack frames can help you find out where the error is coming from.

Understanding error messages

Let’s take a look at our example pipeline declaration:

# Content of pipeline.yaml
tasks:
  - source: load.py
    product:
      nb: output/raw.html
      train: output/train.csv
      test: output/test.csv

  - source: preprocess.py
    product: output/clean.html

Very simple, two tasks. One loads the data and the next one preprocess it.

Let’s run the pipeline and then analyze the output:

[1]:
%%sh --no-raise-error
ploomber build --force
Loading pipeline...
Building task 'load':   0%|          | 0/2 [00:00<?, ?it/s]
Executing:   0%|          | 0/5 [00:00<?, ?cell/s]
Executing:  20%|██        | 1/5 [00:01<00:07,  1.97s/cell]
Executing: 100%|██████████| 5/5 [00:02<00:00,  2.15cell/s]
Building task 'preprocess':  50%|█████     | 1/2 [00:02<00:02,  2.86s/it]
Executing:   0%|          | 0/6 [00:00<?, ?cell/s]
Executing:  17%|█▋        | 1/6 [00:03<00:15,  3.15s/cell]
Executing:  67%|██████▋   | 4/6 [00:03<00:01,  1.60cell/s]
Executing: 100%|██████████| 6/6 [00:03<00:00,  1.57cell/s]
Building task 'preprocess': 100%|██████████| 2/2 [00:06<00:00,  3.41s/it]

=============================== DAG build failed ===============================
----------- NotebookRunner: preprocess -> File('output/clean.html') ------------
------- /Users/Edu/dev/projects-ploomber/guides/debugging/preprocess.py --------

---------------------------------------------------------------------------
Exception encountered at "In [6]":
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/3h/_lvh_w_x5g30rrjzb_xnn2j80000gq/T/ipykernel_66697/1567702903.py in <module>
----> 1 my_preprocessing_function(X_train, X_test)

/var/folders/3h/_lvh_w_x5g30rrjzb_xnn2j80000gq/T/ipykernel_66697/2058553394.py in my_preprocessing_function(X_train, X_test)
      2     encoder = OneHotEncoder()
      3     X_train_t = encoder.fit_transform(X_train)
----> 4     X_test_t = encoder.transform(X_test)
      5     return X_train_t, X_test_t

~/miniconda3/envs/projects/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py in transform(self, X)
    880             "infrequent_if_exist",
    881         }
--> 882         X_int, X_mask = self._transform(
    883             X,
    884             handle_unknown=self.handle_unknown,

~/miniconda3/envs/projects/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py in _transform(self, X, handle_unknown, force_all_finite, warn_on_unknown)
    158                         " during transform".format(diff, i)
    159                     )
--> 160                     raise ValueError(msg)
    161                 else:
    162                     if warn_on_unknown:

ValueError: Found unknown categories ['d'] in column 0 during transform

ploomber.exceptions.TaskBuildError: ("Error when executing task 'preprocess'. Partially executed notebook available at /Users/Edu/dev/projects-ploomber/guides/debugging/output/clean.ipynb", NotebookRunner: preprocess -> File('output/clean.html'))
ploomber.exceptions.TaskBuildError: Error building task "preprocess"
=============================== Summary (1 task) ===============================
NotebookRunner: preprocess -> File('output/clean.html')
=============================== DAG build failed ===============================

Need help? https://ploomber.io/community

The summary at the bottom gives a high-level explanation:

=============================== Summary (1 task) ===============================
NotebookRunner: preprocess -> File('output/clean.html')
=============================== DAG build failed ===============================

The preprocess task failed. Go up a few lines:

ploomber.exceptions.TaskBuildError: An error occurred when calling papermil.execute_notebook, partially executed notebook with traceback available at ...

That’s useful, it tells us where we can find the partially executed notebook in case we want to take a look at it. A few lines up:

ValueError: Found unknown categories ['d'] in column 1 during transform

That’s the exact line that failed, if you take a look at the original error traceback, you’ll see that the actual line that raised the exception comes from the scikit-learn library (_encoders.py file):

~/miniconda3/envs/ploomber/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py in _transform(self, X, handle_unknown)
    122                     msg = ("Found unknown categories {0} in column {1}"
    123                            " during transform".format(diff, i))
--> 124                     raise ValueError(msg)
    125                 else:
    126                     # Set the problematic rows to an acceptable value and

ValueError: Found unknown categories ['d'] in column 0 during transform

The error message provides us a lot of information: Our pipeline failed while executing task preprocess. Somewhere in our task’s code we ran something that made scikit-learn crash.

Let’s take a look at the failing task’s source code:

# Content of preprocess.py
# %%
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# %% tags=["parameters"]
upstream = ['load']
product = None



# %%
def my_preprocessing_function(X_train, X_test):
    encoder = OneHotEncoder()
    X_train_t = encoder.fit_transform(X_train)
    X_test_t = encoder.transform(X_test)
    return X_train_t, X_test_t


# %%
X_train = pd.read_csv(upstream['load']['train'])
X_test = pd.read_csv(upstream['load']['test'])

# %%
my_preprocessing_function(X_train, X_test)

Our preprocess.py script is using scikit-learn’s OneHotEncoder to transform variables. The error message offers some information but not enough to fix the issue (we don’t have a column named “0”!). There must be something going on internally.

This is a good use case for Ploomber’s debugging capabilities.

Starting a debugging session

There are three ways to start a debugger:

  1. Jump to the first line and start the debugger

  2. (post-mortem) Let the task run and start the debugger as soon as it fails

  3. (breakpoint) Jump to a specific line and start the debugger

We’ll analyze the three of them but feel free to jump to the one that’s more applicable to your use case.

Jump to the first line and start the debugger

To start a debugging session you first have to start an interactive session. To do so, run the following command in the terminal:

ploomber interact

When it finishes setting things up, your pipeline will be available in the dag variable. This is a standard Python session, you can execute any Python code you want.

We already know that the error is happening in the preprocess task, you can start a line-by-line debugging session with the following command:

>>> dag['preprocess'].debug()

Here’s a replay of my debugging session (with comments):

# COMMENT: I entered the command "next" a few times until I reached the failing line
ipdb>
> /var/folders/3h/_lvh_w_x5g30rrjzb_xnn2j80000gq/T/tmpbatitar6.py(45)<module>()
     41 X_train = pd.read_csv(upstream['load']['train'])
     42 X_test = pd.read_csv(upstream['load']['test'])
     43
     44 # + tags=[]
# COMMENT: "--->" means that line will be executed when I send the "next" command
---> 45 my_preprocessing_function(X_train, X_test)

ipdb>
# COMMENT: Same error message that we got before!
ValueError: Found unknown categories ['d'] in column 0 during transform
> /var/folders/3h/_lvh_w_x5g30rrjzb_xnn2j80000gq/T/tmpbatitar6.py(45)<module>()
     41 X_train = pd.read_csv(upstream['load']['train'])
     42 X_test = pd.read_csv(upstream['load']['test'])
     43
     44 # + tags=[]
---> 45 my_preprocessing_function(X_train, X_test)
# COMMENT: I entered the "down" command to move one stack frame down (inside my_preprocessing_function)
ipdb> down
> /var/folders/3h/_lvh_w_x5g30rrjzb_xnn2j80000gq/T/tmpbatitar6.py(36)my_preprocessing_function()
     34     encoder = OneHotEncoder()
     35     X_train_t = encoder.fit_transform(X_train)
# COMMENT: The line that raised the exception
---> 36     X_test_t = encoder.transform(X_test)
     37     return X_train_t, X_test_t
     38

# COMMENT: Print X_train and X_test
ipdb> X_train
  cat
0   a
1   b
2   c
ipdb> X_test
  cat
0   a
1   b
2   d
# COMMENT: Exit debugger with the "quit" command
ipdb> quit

Ah-ha! The encoder is fitted with a column with values a, b and c but then applied to a testing set with value d. That’s why it’s breaking.

This is an example of how your code could be doing everything right, but your data is incompatible. How you fix this is up to us. The important thing is that we know why things are failing.

(post-mortem) Let the task run and start the debugger as soon as it fails

Note: post-mortem debugging was improved in Ploomber 0.20, ensure have at least that version

An alternative approach is to let the program run and start the debugging session as soon as it finds an exception, this is called post-mortem debugging.

To start a debugging session”

ploomber task {task-name} --debug

or:

ploomber build --debug

This will start the debugger as soon as the code breaks, alternatively, you can serialize the error to start the debugger session whenever you want:

Added in Ploomber 0.20

ploomber task {task-name} --debuglater

or:

ploomber build --debuglater

Then, start the debugging session with:

dltr {task-name}.dump

Note: you may delete the ``{task-name}.dump`` file once you are done debugging

Important: Beware that using ``–debuglater`` will serialize all the variables, so ensure you have enough disk space when using it, especially if running with the Parallel executor

Here’s the (commented) replay of my post-mortem debugging session:

# COMMENT: I deleted a few lines for brevity
ValueError: Found unknown categories ['d'] in column 0 during transform
> /Users/Edu/miniconda3/envs/ploomber/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py(124)_transform()
    122                     msg = ("Found unknown categories {0} in column {1}"
    123                            " during transform".format(diff, i))
# COMMENT: The session starts here. Not very useful because we are inside the scikit-learn package
# (note the file path: site-packages/sklearn/preprocessing/_encoders.py)
--> 124                     raise ValueError(msg)
    125                 else:
    126                     # Set the problematic rows to an acceptable value and
# COMMENT: Let's move up
ipdb> up
> /Users/Edu/miniconda3/envs/ploomber/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py(428)transform()
    426         check_is_fitted(self)
    427         # validation of X happens in _check_X called by _transform
--> 428         X_int, X_mask = self._transform(X, handle_unknown=self.handle_unknown)
    429
    430         n_samples, n_features = X_int.shape
# COMMENT: Still inside scikit-learn, let's move up again
ipdb> up
> /var/folders/3h/_lvh_w_x5g30rrjzb_xnn2j80000gq/T/tmp653y199s.py(36)my_preprocessing_function()
     34     encoder = OneHotEncoder()
     35     X_train_t = encoder.fit_transform(X_train)
# COMMENT: Now are are in our task's code, same place as in the previous example
---> 36     X_test_t = encoder.transform(X_test)
     37     return X_train_t, X_test_t
     38
ipdb> X_train
  cat
0   a
1   b
2   c
ipdb> X_test
  cat
0   a
1   b
2   d
ipdb> quit

As you can see, we can use either of these two approaches.

(breakpoint) Jump to a specific line and start the debugger

The previous example showed how we could debug a program that raises an exception. A more difficult scenario is when our program runs without errors but we find issues in the output (e.g. charts are not displaying correctly, data file has NAs, etc).

This is a much harder problem because we don’t know where to look at! If a bug is originated in task A it might propagate to any downstream tasks that use the product from A as input, this is why testing is essential. By explicitly checking our data expectations, we increase the chance of catching errors at the source, rather than in a downstream task.

When it happens (and trust me, it will), we recommend you to follow a recursive approach: Once you detect the error, the first question to answer is: Which task produced this output? Once you know that, start a line-by-line debugging session (post-mortem won’t work because there is no exception!), and carefully check variables to see if you can spot the error.

If everything looks correct, go to all upstream tasks and repeat this process. You can do this from the command line.

First, start an interactive session from the terminal:

ploomber interactive

Then debug the task that produced the buggy output:

>>> dag['buggy_task'].debug()

If that’s not enough, check upstream tasks. To find upstream tasks, use task.upstream:

>>> dag['buggy_task'].upstream

If you have a hypothesis of where the error might be. You can insert a breakpoint in your task’s source code to start a debugging session at any given point:

# buggy_task.py

# some code
# ...
# ...

# breakpoint: this is where I think the bug is...
import pdb; pdb.set_trace()


# more code
# ...
# ...

Then start a post-mortem session. The debugger will start at the line where you inserted the breakpoint.

Using the CLI to check if we fixed the bug

In a real scenario, we might try a few things before we find bug fix. To quickly iterate over candidate solutions, we’d like to check if the applied change makes our pipeline not to throw an error. This is where Ploomber’s incremental builds come in handy.

If we narrowed down the error to a specific task, we can apply changes and quickly check if the new code runs correctly by just running that task:

ploomber task {task-name}

If the exception happens in task B, but the solution has to be implemented in task A (where A is an upstream dependency of B), then we have to make sure that we run A and B to verify the fix. A full end-to-end run is wasteful but so is an incremental run if B has many downstream tasks. For testing purposes, we just care about things going well until ``B``. This is a good use of a partial build: it will run all tasks until it reaches a selected task (by default, it will still skip up-to-date tasks). In our case:

ploomber build --partially B

Letting our pipeline fail under unforeseen circumstances

The error in our program is of particular interest because it posits a common scenario: our program is correct but still failed due to unforeseen circumstances (unexpected data properties). Although these bugs challenge our assumptions about input data, fixing the error is just as important as explaining why we fixed the way we did it.

Picture this: we decide to drop all observations that contain the unexpected value (d), now our pipeline runs correctly. A few months later, we receive new data so we run the pipeline again, but we run into the same issue because of a new unexpected value (say, e).

We could argue that one solution would be to drop all unexpected values. Is this the best approach? Dropping observations silently is dangerous, as they might contain helpful information for our analysis. If we bury a drop=True piece of code in a pipeline with dozens of files, we will cause a lot of trouble to someone (which could be us) in the future. As we mentioned in the previous guide: explicitly stating our data expectations is the way to move forward.

If we decide dropping d is a reasonable choice, we can encode our new data expectations in the upstream task testing function (because that’s the task that supplies input data). Let’s recall how our pipeline looks like:

# Content of pipeline.yaml
tasks:
  - source: load.py
    product:
      nb: output/raw.html
      train: output/train.csv
      test: output/test.csv

  - source: preprocess.py
    product: output/clean.html

load supplies input for preprocess. Our testing function for the load task would look like this:

def test_load(product):
    train = pd.read_csv(str(product['train']))
    test = pd.read_csv(str(product['test']))

    # NOTE: these are the expected values in the cat column
    # we expect value 'd' in the testing set and we'll
    # drop it during preprocessing. Any other unexpected
    # values will raise an exception here so we have the
    # chance to decide what to do with it
    assert set(train['cat'].unique()) == {'a', 'b', 'c'}
    assert set(test['cat'].unique()) == {'a', 'b', 'c', 'd'}

The comment should actually be part of the testing function, without it, there is no context to understand why are we testing such a specific condition.

Debugging (templated) SQL scripts

So far, we’ve discussed how to debug Python scripts, but SQL scripts can also fail. In a previous guide, we showed how templated SQL scripts help us write more concise SQL code, but this comes with a cost. Relying too much on templating makes our templated source code short but hard to read. If your database complains about syntax errors when executing SQL tasks, chances are, the errors is coming from incorrect templating logic. One good first debugging step is to take a look at the rendered code. You can do so from the command line:

ploomber task {task-name} --source

Apart from looking at rendered code, there isn’t much to say about debugging SQL scripts because there are no interactive debuggers. The best we can do is to organize our scripts in a clear way to make it easy to spot errors.

Where to go next