To run this example locally, install Ploomber and execute: ploomber examples -n guides/logging

To start a free, hosted JupyterLab: binder-logo

Found an issue? Let us know.

Have questions? Ask us anything on Slack.

Logging

Tutorial showing how to add logging to a pipeline.

Sample pipeline

The pipeline we’ll be using for this guide contains two tasks (a script and a function):

# Content of basic/pipeline.yaml
tasks:
  - source: script.py
    product: output/nb-log.html
    papermill_params:
      log_output: True

  - source: tasks.function
    product: output/fn-log.txt

Note that the script task contains:

papermill_params:
    log_output: True

This extra configuration is required on each script/notebook task in your pipeline to enable logging. The code on each task isn’t important; they contain a for loop and log a message on each iteration. Let’s see it in action:

[1]:
%%sh
cd basic
ploomber build --log info --force
Loading pipeline...
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
script    True          4.66689       60.8412
function  True          3.00372       39.1588
WARNING:traitlets:Kernel Provisioning: The 'local-provisioner' is not found.  This is likely due to the presence of multiple jupyter_client distributions and a previous distribution is being used as the source for entrypoints - which does not include 'local-provisioner'.  That distribution should be removed such that only the version-appropriate distribution remains (version >= 7).  Until then, a 'local-provisioner' entrypoint will be automatically constructed and used.
The candidate distribution locations are: ['/workspaces/projects/venv/lib/python3.8/site-packages/jupyter_client-7.3.2.dist-info']
INFO:blib2to3.pgen2.driver:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/Grammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /home/codespace/.cache/black/22.3.0/Grammar3.8.13.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmp1j9p6bb9'
INFO:blib2to3.pgen2.driver:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/PatternGrammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /home/codespace/.cache/black/22.3.0/PatternGrammar3.8.13.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmp2rs4570f'
INFO:ploomber.dag.dag:Building DAG DAG("basic")
Building task 'script':   0%|          | 0/2 [00:00<?, ?it/s]INFO:ploomber.tasks.abc.NotebookRunner:Starting execution: NotebookRunner: script -> File('output/nb-log.html')
INFO:papermill:Input Notebook:  /tmp/tmp_odbqsk5.ipynb
INFO:papermill:Output Notebook: /workspaces/projects/guides/logging/basic/output/nb-log.ipynb

Executing:   0%|          | 0/6 [00:00<?, ?cell/s]INFO:papermill:Executing notebook with kernel: python3
INFO:papermill:Executing Cell 1---------------------------------------
INFO:papermill:Ending Cell 1------------------------------------------

Executing:  17%|█▋        | 1/6 [00:01<00:05,  1.02s/cell]INFO:papermill:Executing Cell 2---------------------------------------
INFO:papermill:Ending Cell 2------------------------------------------
INFO:papermill:Executing Cell 3---------------------------------------
INFO:papermill:Ending Cell 3------------------------------------------
INFO:papermill:Executing Cell 4---------------------------------------
INFO:papermill:Ending Cell 4------------------------------------------
INFO:papermill:Executing Cell 5---------------------------------------
INFO:papermill:Ending Cell 5------------------------------------------
INFO:papermill:Executing Cell 6---------------------------------------
INFO:papermill:INFO:__main__:[script log] Finished step 1...

INFO:papermill:INFO:__main__:[script log] Finished step 2...

INFO:papermill:INFO:__main__:[script log] Finished step 3...

INFO:papermill:INFO:__main__:[script log] Done.

INFO:papermill:Ending Cell 6------------------------------------------

Executing: 100%|██████████| 6/6 [00:04<00:00,  1.40cell/s]
INFO:ploomber.tasks.abc.NotebookRunner:Done. Operation took 4.7 seconds
Building task 'function':  50%|█████     | 1/2 [00:04<00:04,  4.67s/it]INFO:ploomber.tasks.abc.PythonCallable:Starting execution: PythonCallable: function -> File('output/fn-log.txt')
INFO:tasks:[function log] Finished step 1...
INFO:tasks:[function log] Finished step 2...
INFO:tasks:[function log] Finished step 3...
INFO:tasks:[function log] Done.
INFO:ploomber.tasks.abc.PythonCallable:Done. Operation took 3.0 seconds
Building task 'function': 100%|██████████| 2/2 [00:07<00:00,  3.85s/it]
INFO:ploomber.dag.dag: DAG report:
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
script    True          4.66689       60.8412
function  True          3.00372       39.1588

We can see that the logging statements appear in the console. If you want to take a look at the code, click here.

Why not print?

Note that the snippets above use the logging module instead of print. Although print is a quick and easy way to display messages in the console, the logging module is more flexible. Hence, it is the recommended option.

Logging to a file

It’s common to send all your log records to a file. You can do so with the --log-file/-F option:

ploomber build --log info --log-file my.log

Logging to a file from Python

Alternatively, you can configure logging from Python, which gives you more flexibility:

[2]:
# you may store the contents of this cell in a .py file and then call it from the command line
# e.g., python run_pipeline.py
import logging
from pathlib import Path
import sys

from ploomber.executors import Serial
from ploomber.spec import DAGSpec

if Path('my.log').exists():
    Path('my.log').unlink()

logging.basicConfig(filename='my.log', format='%(levelname)s:%(message)s', level=logging.INFO)

# make sure we can import basic/tasks.py since basic/pipeline.yaml uses it
sys.path.append('basic')

dag = DAGSpec('basic/pipeline.yaml').to_dag()
dag.build(force=True)
[2]:
name Ran? Elapsed (s) Percentage
script True 4.76759 61.3428
functionTrue 3.00445 38.6572

Let’s look at the file contents:

[3]:
print(Path('my.log').read_text())
INFO:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/Grammar.txt
INFO:Writing grammar tables to /home/codespace/.cache/black/22.3.0/Grammar3.8.13.final.0.pickle
INFO:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmpeca0cxtc'
INFO:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/PatternGrammar.txt
INFO:Writing grammar tables to /home/codespace/.cache/black/22.3.0/PatternGrammar3.8.13.final.0.pickle
INFO:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmp3djgccxy'
INFO:Building DAG DAG("basic")
INFO:Starting execution: NotebookRunner: script -> File('basic/output/nb-log.html')
INFO:Input Notebook:  /tmp/tmp1y05qp_7.ipynb
INFO:Output Notebook: /workspaces/projects/guides/logging/basic/output/nb-log.ipynb
INFO:Executing notebook with kernel: python3
INFO:Executing Cell 1---------------------------------------
INFO:Ending Cell 1------------------------------------------
INFO:Executing Cell 2---------------------------------------
INFO:Ending Cell 2------------------------------------------
INFO:Executing Cell 3---------------------------------------
INFO:Ending Cell 3------------------------------------------
INFO:Executing Cell 4---------------------------------------
INFO:Ending Cell 4------------------------------------------
INFO:Executing Cell 5---------------------------------------
INFO:Ending Cell 5------------------------------------------
INFO:Executing Cell 6---------------------------------------
INFO:INFO:__main__:[script log] Finished step 1...

INFO:INFO:__main__:[script log] Finished step 2...

INFO:INFO:__main__:[script log] Finished step 3...

INFO:INFO:__main__:[script log] Done.

INFO:Ending Cell 6------------------------------------------
INFO:Done. Operation took 4.8 seconds
INFO:Starting execution: PythonCallable: function -> File('basic/output/fn-log.txt')
INFO:[function log] Finished step 1...
INFO:[function log] Finished step 2...
INFO:[function log] Finished step 3...
INFO:[function log] Done.
INFO:Done. Operation took 3.0 seconds
INFO: DAG report:
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
script    True          4.76759       61.3428
function  True          3.00445       38.6572

Controlling logging level

The Python’s logging module allows you to filter messages depending on their priority. For example, when running your pipeline, you may only want to display regular messages, but you may allow regular and debugging messages for more granularity when debugging. Since Ploomber runs tasks differently depending on their type (i.e., functions vs. scripts/notebooks), controlling the logging level requires a bit of extra work. Let’s use the same pipeline in the parametrized directory:

[4]:
%%sh
cd parametrized
ploomber build --log info --env--logging_level info --force
Loading pipeline...
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
script    True          4.49249       59.9239
function  True          3.0045        40.0761
WARNING:traitlets:Kernel Provisioning: The 'local-provisioner' is not found.  This is likely due to the presence of multiple jupyter_client distributions and a previous distribution is being used as the source for entrypoints - which does not include 'local-provisioner'.  That distribution should be removed such that only the version-appropriate distribution remains (version >= 7).  Until then, a 'local-provisioner' entrypoint will be automatically constructed and used.
The candidate distribution locations are: ['/workspaces/projects/venv/lib/python3.8/site-packages/jupyter_client-7.3.2.dist-info']
INFO:blib2to3.pgen2.driver:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/Grammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /home/codespace/.cache/black/22.3.0/Grammar3.8.13.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmp8s6jihtc'
INFO:blib2to3.pgen2.driver:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/PatternGrammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /home/codespace/.cache/black/22.3.0/PatternGrammar3.8.13.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmp2yj6e00b'
INFO:ploomber.dag.dag:Building DAG DAG("parametrized")
Building task 'script':   0%|          | 0/2 [00:00<?, ?it/s]INFO:ploomber.tasks.abc.NotebookRunner:Starting execution: NotebookRunner: script -> File('output/nb-log.html')
INFO:papermill:Input Notebook:  /tmp/tmprwq9ox1n.ipynb
INFO:papermill:Output Notebook: /workspaces/projects/guides/logging/parametrized/output/nb-log.ipynb

Executing:   0%|          | 0/6 [00:00<?, ?cell/s]INFO:papermill:Executing notebook with kernel: python3
INFO:papermill:Executing Cell 1---------------------------------------
INFO:papermill:Ending Cell 1------------------------------------------

Executing:  17%|█▋        | 1/6 [00:00<00:04,  1.12cell/s]INFO:papermill:Executing Cell 2---------------------------------------
INFO:papermill:Ending Cell 2------------------------------------------
INFO:papermill:Executing Cell 3---------------------------------------
INFO:papermill:Ending Cell 3------------------------------------------
INFO:papermill:Executing Cell 4---------------------------------------
INFO:papermill:Ending Cell 4------------------------------------------
INFO:papermill:Executing Cell 5---------------------------------------
INFO:papermill:Ending Cell 5------------------------------------------
INFO:papermill:Executing Cell 6---------------------------------------
INFO:papermill:INFO:__main__:[script log] Finished step 1...

INFO:papermill:INFO:__main__:[script log] Finished step 2...

INFO:papermill:INFO:__main__:[script log] Finished step 3...

INFO:papermill:INFO:__main__:[script log] Done.

INFO:papermill:Ending Cell 6------------------------------------------

Executing: 100%|██████████| 6/6 [00:04<00:00,  1.44cell/s]
INFO:ploomber.tasks.abc.NotebookRunner:Done. Operation took 4.5 seconds
Building task 'function':  50%|█████     | 1/2 [00:04<00:04,  4.49s/it]INFO:ploomber.tasks.abc.PythonCallable:Starting execution: PythonCallable: function -> File('output/fn-log.txt')
INFO:tasks:[function log] Finished step 1...
INFO:tasks:[function log] Finished step 2...
INFO:tasks:[function log] Finished step 3...
INFO:tasks:[function log] Done.
INFO:ploomber.tasks.abc.PythonCallable:Done. Operation took 3.0 seconds
Building task 'function': 100%|██████████| 2/2 [00:07<00:00,  3.76s/it]
INFO:ploomber.dag.dag: DAG report:
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
script    True          4.49249       59.9239
function  True          3.0045        40.0761

Let’s now run the pipeline but switch the logging level to debug, this will print the records we saw above, plus the ones with debug level:

[5]:
%%sh
cd parametrized
ploomber build --log debug --env--logging_level debug --force
Loading pipeline...
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
script    True          4.47259       59.8168
function  True          3.00456       40.1832
DEBUG:ploomber.spec.dagspec:DAGSpec enviroment:
EnvDict({'cwd': '/workspaces/.../parametrized', 'git': 'master', 'git_hash': '1738307-dirty', 'here': '/workspaces/.../parametrized', ...})
DEBUG:ploomber.spec.dagspec:Expanded DAGSpec:
{   'tasks': [   {   'papermill_params': {'log_output': True},
                     'params': {'logging_level': '{{logging_level}}'},
                     'product': 'output/nb-log.html',
                     'source': 'script.py'},
                 {   'params': {'logging_level': '{{logging_level}}'},
                     'product': 'output/fn-log.txt',
                     'source': 'tasks.function'}]}
WARNING:traitlets:Kernel Provisioning: The 'local-provisioner' is not found.  This is likely due to the presence of multiple jupyter_client distributions and a previous distribution is being used as the source for entrypoints - which does not include 'local-provisioner'.  That distribution should be removed such that only the version-appropriate distribution remains (version >= 7).  Until then, a 'local-provisioner' entrypoint will be automatically constructed and used.
The candidate distribution locations are: ['/workspaces/projects/venv/lib/python3.8/site-packages/jupyter_client-7.3.2.dist-info']
DEBUG:ploomber.tasks.abc.NotebookRunner:Setting "script" status to TaskStatus.WaitingRender
DEBUG:ploomber.tasks.abc.PythonCallable:Setting "function" status to TaskStatus.WaitingRender
DEBUG:ploomber.spec.dagspec:Extracted upstream dependencies for task script: None
DEBUG:ploomber.spec.dagspec:Extracted upstream dependencies for task function: None
INFO:blib2to3.pgen2.driver:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/Grammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /home/codespace/.cache/black/22.3.0/Grammar3.8.13.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmp55q_ywf2'
INFO:blib2to3.pgen2.driver:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/PatternGrammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /home/codespace/.cache/black/22.3.0/PatternGrammar3.8.13.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmpvxyhyfii'
INFO:ploomber.dag.dag:Building DAG DAG("parametrized")
Building task 'script':   0%|          | 0/2 [00:00<?, ?it/s]INFO:ploomber.tasks.abc.NotebookRunner:Starting execution: NotebookRunner: script -> File('output/nb-log.html')
INFO:papermill:Input Notebook:  /tmp/tmpue38o2oe.ipynb
INFO:papermill:Output Notebook: /workspaces/projects/guides/logging/parametrized/output/nb-log.ipynb

Executing:   0%|          | 0/6 [00:00<?, ?cell/s]INFO:papermill:Executing notebook with kernel: python3
INFO:papermill:Executing Cell 1---------------------------------------
INFO:papermill:Ending Cell 1------------------------------------------

Executing:  17%|█▋        | 1/6 [00:00<00:04,  1.13cell/s]INFO:papermill:Executing Cell 2---------------------------------------
INFO:papermill:Ending Cell 2------------------------------------------
INFO:papermill:Executing Cell 3---------------------------------------
INFO:papermill:Ending Cell 3------------------------------------------
INFO:papermill:Executing Cell 4---------------------------------------
INFO:papermill:Ending Cell 4------------------------------------------
INFO:papermill:Executing Cell 5---------------------------------------
INFO:papermill:Ending Cell 5------------------------------------------
INFO:papermill:Executing Cell 6---------------------------------------
INFO:papermill:DEBUG:__main__:[script log] This is a message for debugging

INFO:papermill:INFO:__main__:[script log] Finished step 1...

INFO:papermill:INFO:__main__:[script log] Finished step 2...

INFO:papermill:INFO:__main__:[script log] Finished step 3...

INFO:papermill:INFO:__main__:[script log] Done.

INFO:papermill:Ending Cell 6------------------------------------------

Executing: 100%|██████████| 6/6 [00:04<00:00,  1.44cell/s]
INFO:ploomber.tasks.abc.NotebookRunner:Done. Operation took 4.5 seconds
Building task 'function':  50%|█████     | 1/2 [00:04<00:04,  4.47s/it]INFO:ploomber.tasks.abc.PythonCallable:Starting execution: PythonCallable: function -> File('output/fn-log.txt')
INFO:tasks:[function log] Finished step 1...
INFO:tasks:[function log] Finished step 2...
INFO:tasks:[function log] Finished step 3...
INFO:tasks:[function log] Done.
INFO:ploomber.tasks.abc.PythonCallable:Done. Operation took 3.0 seconds
Building task 'function': 100%|██████████| 2/2 [00:07<00:00,  3.75s/it]
INFO:ploomber.dag.dag: DAG report:
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
script    True          4.47259       59.8168
function  True          3.00456       40.1832

To get the code for the previous example, click here.

Implementation details

To keep the tutorial short, we overlooked some technical details. However, if you want to customize logging, they are essential to know.

Function tasks and sub-processes

By default, Ploomber runs function tasks in a child process. However, beginning on version 3.8, Python 3.8 switched to use spawn instead of fork on macOS, this implies that child processes do not inherit the logging configuration of their parents. That’s why you must configure a logger inside the function’s body:

import logging


def some_task(product):
    # the following line is required on Python>=3.8 if using macOS
    logging.basicConfig(level=logging.INFO)

    logger = logging.getLogger(__name__)

    # to log a message, call logger.info
    logger.info('Some message')

Scripts and notebooks

Unlike function tasks, which can run in the same process as Ploomber or in a child process, scripts and notebooks execute independently. Hence, any logging configuration made in the main process is lost, and We have to configure a separate logger at the top of the script/notebook.

Parallel execution

Logging is currently unavailable when using the Parallel executor.