To run this locally, install Ploomber and execute: ploomber examples -n guides/logging
Found an issue? Let us know.
Questions? Ask us on Slack.
Logging¶
Tutorial showing how to add logging to a pipeline.
Sample pipeline¶
The pipeline we’ll be using for this guide contains two tasks (a script and a function):
# Content of basic/pipeline.yaml
tasks:
- source: script.py
product: output/nb-log.html
papermill_params:
log_output: True
- source: tasks.function
product: output/fn-log.txt
Note that the script task contains:
papermill_params:
log_output: True
This extra configuration is required on each script/notebook task in your pipeline to enable logging. The code on each task isn’t important; they contain a for loop and log a message on each iteration. Let’s see it in action:
[1]:
%%sh
cd basic
ploomber build --log info --force
Loading pipeline...
name Ran? Elapsed (s) Percentage
-------- ------ ------------- ------------
script True 4.66689 60.8412
function True 3.00372 39.1588
WARNING:traitlets:Kernel Provisioning: The 'local-provisioner' is not found. This is likely due to the presence of multiple jupyter_client distributions and a previous distribution is being used as the source for entrypoints - which does not include 'local-provisioner'. That distribution should be removed such that only the version-appropriate distribution remains (version >= 7). Until then, a 'local-provisioner' entrypoint will be automatically constructed and used.
The candidate distribution locations are: ['/workspaces/projects/venv/lib/python3.8/site-packages/jupyter_client-7.3.2.dist-info']
INFO:blib2to3.pgen2.driver:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/Grammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /home/codespace/.cache/black/22.3.0/Grammar3.8.13.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmp1j9p6bb9'
INFO:blib2to3.pgen2.driver:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/PatternGrammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /home/codespace/.cache/black/22.3.0/PatternGrammar3.8.13.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmp2rs4570f'
INFO:ploomber.dag.dag:Building DAG DAG("basic")
Building task 'script': 0%| | 0/2 [00:00<?, ?it/s]INFO:ploomber.tasks.abc.NotebookRunner:Starting execution: NotebookRunner: script -> File('output/nb-log.html')
INFO:papermill:Input Notebook: /tmp/tmp_odbqsk5.ipynb
INFO:papermill:Output Notebook: /workspaces/projects/guides/logging/basic/output/nb-log.ipynb
Executing: 0%| | 0/6 [00:00<?, ?cell/s]INFO:papermill:Executing notebook with kernel: python3
INFO:papermill:Executing Cell 1---------------------------------------
INFO:papermill:Ending Cell 1------------------------------------------
Executing: 17%|█▋ | 1/6 [00:01<00:05, 1.02s/cell]INFO:papermill:Executing Cell 2---------------------------------------
INFO:papermill:Ending Cell 2------------------------------------------
INFO:papermill:Executing Cell 3---------------------------------------
INFO:papermill:Ending Cell 3------------------------------------------
INFO:papermill:Executing Cell 4---------------------------------------
INFO:papermill:Ending Cell 4------------------------------------------
INFO:papermill:Executing Cell 5---------------------------------------
INFO:papermill:Ending Cell 5------------------------------------------
INFO:papermill:Executing Cell 6---------------------------------------
INFO:papermill:INFO:__main__:[script log] Finished step 1...
INFO:papermill:INFO:__main__:[script log] Finished step 2...
INFO:papermill:INFO:__main__:[script log] Finished step 3...
INFO:papermill:INFO:__main__:[script log] Done.
INFO:papermill:Ending Cell 6------------------------------------------
Executing: 100%|██████████| 6/6 [00:04<00:00, 1.40cell/s]
INFO:ploomber.tasks.abc.NotebookRunner:Done. Operation took 4.7 seconds
Building task 'function': 50%|█████ | 1/2 [00:04<00:04, 4.67s/it]INFO:ploomber.tasks.abc.PythonCallable:Starting execution: PythonCallable: function -> File('output/fn-log.txt')
INFO:tasks:[function log] Finished step 1...
INFO:tasks:[function log] Finished step 2...
INFO:tasks:[function log] Finished step 3...
INFO:tasks:[function log] Done.
INFO:ploomber.tasks.abc.PythonCallable:Done. Operation took 3.0 seconds
Building task 'function': 100%|██████████| 2/2 [00:07<00:00, 3.85s/it]
INFO:ploomber.dag.dag: DAG report:
name Ran? Elapsed (s) Percentage
-------- ------ ------------- ------------
script True 4.66689 60.8412
function True 3.00372 39.1588
We can see that the logging statements appear in the console. If you want to take a look at the code, click here.
Why not print?¶
Note that the snippets above use the logging
module instead of print
. Although print
is a quick and easy way to display messages in the console, the logging
module is more flexible. Hence, it is the recommended option.
Logging to a file¶
It’s common to send all your log records to a file. You can do so with the --log-file/-F
option:
ploomber build --log info --log-file my.log
Logging to a file from Python¶
Alternatively, you can configure logging from Python, which gives you more flexibility:
[2]:
# you may store the contents of this cell in a .py file and then call it from the command line
# e.g., python run_pipeline.py
import logging
from pathlib import Path
import sys
from ploomber.executors import Serial
from ploomber.spec import DAGSpec
if Path('my.log').exists():
Path('my.log').unlink()
logging.basicConfig(filename='my.log', format='%(levelname)s:%(message)s', level=logging.INFO)
# make sure we can import basic/tasks.py since basic/pipeline.yaml uses it
sys.path.append('basic')
dag = DAGSpec('basic/pipeline.yaml').to_dag()
dag.build(force=True)
[2]:
name | Ran? | Elapsed (s) | Percentage |
---|---|---|---|
script | True | 4.76759 | 61.3428 |
function | True | 3.00445 | 38.6572 |
Let’s look at the file contents:
[3]:
print(Path('my.log').read_text())
INFO:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/Grammar.txt
INFO:Writing grammar tables to /home/codespace/.cache/black/22.3.0/Grammar3.8.13.final.0.pickle
INFO:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmpeca0cxtc'
INFO:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/PatternGrammar.txt
INFO:Writing grammar tables to /home/codespace/.cache/black/22.3.0/PatternGrammar3.8.13.final.0.pickle
INFO:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmp3djgccxy'
INFO:Building DAG DAG("basic")
INFO:Starting execution: NotebookRunner: script -> File('basic/output/nb-log.html')
INFO:Input Notebook: /tmp/tmp1y05qp_7.ipynb
INFO:Output Notebook: /workspaces/projects/guides/logging/basic/output/nb-log.ipynb
INFO:Executing notebook with kernel: python3
INFO:Executing Cell 1---------------------------------------
INFO:Ending Cell 1------------------------------------------
INFO:Executing Cell 2---------------------------------------
INFO:Ending Cell 2------------------------------------------
INFO:Executing Cell 3---------------------------------------
INFO:Ending Cell 3------------------------------------------
INFO:Executing Cell 4---------------------------------------
INFO:Ending Cell 4------------------------------------------
INFO:Executing Cell 5---------------------------------------
INFO:Ending Cell 5------------------------------------------
INFO:Executing Cell 6---------------------------------------
INFO:INFO:__main__:[script log] Finished step 1...
INFO:INFO:__main__:[script log] Finished step 2...
INFO:INFO:__main__:[script log] Finished step 3...
INFO:INFO:__main__:[script log] Done.
INFO:Ending Cell 6------------------------------------------
INFO:Done. Operation took 4.8 seconds
INFO:Starting execution: PythonCallable: function -> File('basic/output/fn-log.txt')
INFO:[function log] Finished step 1...
INFO:[function log] Finished step 2...
INFO:[function log] Finished step 3...
INFO:[function log] Done.
INFO:Done. Operation took 3.0 seconds
INFO: DAG report:
name Ran? Elapsed (s) Percentage
-------- ------ ------------- ------------
script True 4.76759 61.3428
function True 3.00445 38.6572
Controlling logging level¶
The Python’s logging module allows you to filter messages depending on their priority. For example, when running your pipeline, you may only want to display regular messages, but you may allow regular and debugging messages for more granularity when debugging. Since Ploomber runs tasks differently depending on their type (i.e., functions vs. scripts/notebooks), controlling the logging level requires a bit of extra work. Let’s use the
same pipeline in the parametrized
directory:
[4]:
%%sh
cd parametrized
ploomber build --log info --env--logging_level info --force
Loading pipeline...
name Ran? Elapsed (s) Percentage
-------- ------ ------------- ------------
script True 4.49249 59.9239
function True 3.0045 40.0761
WARNING:traitlets:Kernel Provisioning: The 'local-provisioner' is not found. This is likely due to the presence of multiple jupyter_client distributions and a previous distribution is being used as the source for entrypoints - which does not include 'local-provisioner'. That distribution should be removed such that only the version-appropriate distribution remains (version >= 7). Until then, a 'local-provisioner' entrypoint will be automatically constructed and used.
The candidate distribution locations are: ['/workspaces/projects/venv/lib/python3.8/site-packages/jupyter_client-7.3.2.dist-info']
INFO:blib2to3.pgen2.driver:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/Grammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /home/codespace/.cache/black/22.3.0/Grammar3.8.13.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmp8s6jihtc'
INFO:blib2to3.pgen2.driver:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/PatternGrammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /home/codespace/.cache/black/22.3.0/PatternGrammar3.8.13.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmp2yj6e00b'
INFO:ploomber.dag.dag:Building DAG DAG("parametrized")
Building task 'script': 0%| | 0/2 [00:00<?, ?it/s]INFO:ploomber.tasks.abc.NotebookRunner:Starting execution: NotebookRunner: script -> File('output/nb-log.html')
INFO:papermill:Input Notebook: /tmp/tmprwq9ox1n.ipynb
INFO:papermill:Output Notebook: /workspaces/projects/guides/logging/parametrized/output/nb-log.ipynb
Executing: 0%| | 0/6 [00:00<?, ?cell/s]INFO:papermill:Executing notebook with kernel: python3
INFO:papermill:Executing Cell 1---------------------------------------
INFO:papermill:Ending Cell 1------------------------------------------
Executing: 17%|█▋ | 1/6 [00:00<00:04, 1.12cell/s]INFO:papermill:Executing Cell 2---------------------------------------
INFO:papermill:Ending Cell 2------------------------------------------
INFO:papermill:Executing Cell 3---------------------------------------
INFO:papermill:Ending Cell 3------------------------------------------
INFO:papermill:Executing Cell 4---------------------------------------
INFO:papermill:Ending Cell 4------------------------------------------
INFO:papermill:Executing Cell 5---------------------------------------
INFO:papermill:Ending Cell 5------------------------------------------
INFO:papermill:Executing Cell 6---------------------------------------
INFO:papermill:INFO:__main__:[script log] Finished step 1...
INFO:papermill:INFO:__main__:[script log] Finished step 2...
INFO:papermill:INFO:__main__:[script log] Finished step 3...
INFO:papermill:INFO:__main__:[script log] Done.
INFO:papermill:Ending Cell 6------------------------------------------
Executing: 100%|██████████| 6/6 [00:04<00:00, 1.44cell/s]
INFO:ploomber.tasks.abc.NotebookRunner:Done. Operation took 4.5 seconds
Building task 'function': 50%|█████ | 1/2 [00:04<00:04, 4.49s/it]INFO:ploomber.tasks.abc.PythonCallable:Starting execution: PythonCallable: function -> File('output/fn-log.txt')
INFO:tasks:[function log] Finished step 1...
INFO:tasks:[function log] Finished step 2...
INFO:tasks:[function log] Finished step 3...
INFO:tasks:[function log] Done.
INFO:ploomber.tasks.abc.PythonCallable:Done. Operation took 3.0 seconds
Building task 'function': 100%|██████████| 2/2 [00:07<00:00, 3.76s/it]
INFO:ploomber.dag.dag: DAG report:
name Ran? Elapsed (s) Percentage
-------- ------ ------------- ------------
script True 4.49249 59.9239
function True 3.0045 40.0761
Let’s now run the pipeline but switch the logging level to debug, this will print the records we saw above, plus the ones with debug
level:
[5]:
%%sh
cd parametrized
ploomber build --log debug --env--logging_level debug --force
Loading pipeline...
name Ran? Elapsed (s) Percentage
-------- ------ ------------- ------------
script True 4.47259 59.8168
function True 3.00456 40.1832
DEBUG:ploomber.spec.dagspec:DAGSpec enviroment:
EnvDict({'cwd': '/workspaces/.../parametrized', 'git': 'master', 'git_hash': '1738307-dirty', 'here': '/workspaces/.../parametrized', ...})
DEBUG:ploomber.spec.dagspec:Expanded DAGSpec:
{ 'tasks': [ { 'papermill_params': {'log_output': True},
'params': {'logging_level': '{{logging_level}}'},
'product': 'output/nb-log.html',
'source': 'script.py'},
{ 'params': {'logging_level': '{{logging_level}}'},
'product': 'output/fn-log.txt',
'source': 'tasks.function'}]}
WARNING:traitlets:Kernel Provisioning: The 'local-provisioner' is not found. This is likely due to the presence of multiple jupyter_client distributions and a previous distribution is being used as the source for entrypoints - which does not include 'local-provisioner'. That distribution should be removed such that only the version-appropriate distribution remains (version >= 7). Until then, a 'local-provisioner' entrypoint will be automatically constructed and used.
The candidate distribution locations are: ['/workspaces/projects/venv/lib/python3.8/site-packages/jupyter_client-7.3.2.dist-info']
DEBUG:ploomber.tasks.abc.NotebookRunner:Setting "script" status to TaskStatus.WaitingRender
DEBUG:ploomber.tasks.abc.PythonCallable:Setting "function" status to TaskStatus.WaitingRender
DEBUG:ploomber.spec.dagspec:Extracted upstream dependencies for task script: None
DEBUG:ploomber.spec.dagspec:Extracted upstream dependencies for task function: None
INFO:blib2to3.pgen2.driver:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/Grammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /home/codespace/.cache/black/22.3.0/Grammar3.8.13.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmp55q_ywf2'
INFO:blib2to3.pgen2.driver:Generating grammar tables from /workspaces/projects/venv/lib/python3.8/site-packages/blib2to3/PatternGrammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /home/codespace/.cache/black/22.3.0/PatternGrammar3.8.13.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/home/codespace/.cache/black/22.3.0/tmpvxyhyfii'
INFO:ploomber.dag.dag:Building DAG DAG("parametrized")
Building task 'script': 0%| | 0/2 [00:00<?, ?it/s]INFO:ploomber.tasks.abc.NotebookRunner:Starting execution: NotebookRunner: script -> File('output/nb-log.html')
INFO:papermill:Input Notebook: /tmp/tmpue38o2oe.ipynb
INFO:papermill:Output Notebook: /workspaces/projects/guides/logging/parametrized/output/nb-log.ipynb
Executing: 0%| | 0/6 [00:00<?, ?cell/s]INFO:papermill:Executing notebook with kernel: python3
INFO:papermill:Executing Cell 1---------------------------------------
INFO:papermill:Ending Cell 1------------------------------------------
Executing: 17%|█▋ | 1/6 [00:00<00:04, 1.13cell/s]INFO:papermill:Executing Cell 2---------------------------------------
INFO:papermill:Ending Cell 2------------------------------------------
INFO:papermill:Executing Cell 3---------------------------------------
INFO:papermill:Ending Cell 3------------------------------------------
INFO:papermill:Executing Cell 4---------------------------------------
INFO:papermill:Ending Cell 4------------------------------------------
INFO:papermill:Executing Cell 5---------------------------------------
INFO:papermill:Ending Cell 5------------------------------------------
INFO:papermill:Executing Cell 6---------------------------------------
INFO:papermill:DEBUG:__main__:[script log] This is a message for debugging
INFO:papermill:INFO:__main__:[script log] Finished step 1...
INFO:papermill:INFO:__main__:[script log] Finished step 2...
INFO:papermill:INFO:__main__:[script log] Finished step 3...
INFO:papermill:INFO:__main__:[script log] Done.
INFO:papermill:Ending Cell 6------------------------------------------
Executing: 100%|██████████| 6/6 [00:04<00:00, 1.44cell/s]
INFO:ploomber.tasks.abc.NotebookRunner:Done. Operation took 4.5 seconds
Building task 'function': 50%|█████ | 1/2 [00:04<00:04, 4.47s/it]INFO:ploomber.tasks.abc.PythonCallable:Starting execution: PythonCallable: function -> File('output/fn-log.txt')
INFO:tasks:[function log] Finished step 1...
INFO:tasks:[function log] Finished step 2...
INFO:tasks:[function log] Finished step 3...
INFO:tasks:[function log] Done.
INFO:ploomber.tasks.abc.PythonCallable:Done. Operation took 3.0 seconds
Building task 'function': 100%|██████████| 2/2 [00:07<00:00, 3.75s/it]
INFO:ploomber.dag.dag: DAG report:
name Ran? Elapsed (s) Percentage
-------- ------ ------------- ------------
script True 4.47259 59.8168
function True 3.00456 40.1832
To get the code for the previous example, click here.
Implementation details¶
To keep the tutorial short, we overlooked some technical details. However, if you want to customize logging, they are essential to know.
Function tasks and sub-processes¶
By default, Ploomber runs function tasks in a child process. However, beginning on version 3.8, Python 3.8 switched to use spawn instead of fork on macOS, this implies that child processes do not inherit the logging configuration of their parents. That’s why you must configure a logger inside the function’s body:
import logging
def some_task(product):
# the following line is required on Python>=3.8 if using macOS
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# to log a message, call logger.info
logger.info('Some message')
Scripts and notebooks¶
Unlike function tasks, which can run in the same process as Ploomber or in a child process, scripts and notebooks execute independently. Hence, any logging configuration made in the main process is lost, and We have to configure a separate logger at the top of the script/notebook.
Parallel execution¶
Logging is currently unavailable when using the Parallel
executor.