ploomber.tasks.Task¶
- class ploomber.tasks.Task(product, dag, name=None, params=None)¶
Abstract class for all Tasks
- Parameters:
source (str or pathlib.Path) – Source code for the task, for tasks that do not take source code as input (such as PostgresCopyFrom), this can be another thing. The source can be a template and can make references to any parameter in “params”, “upstream” parameters or its own “product”, not all Tasks have templated source (templating code is mostly used by Tasks that take SQL source code as input)
product (Product) – The product that this task will create upon completion
dag (DAG) – The DAG holding this task
name (str) – A name for this task, if None a default will be assigned
params (dict) – Extra parameters passed to the task on rendering (if templated source) or during execution (if not templated source)
- params¶
A read-only dictionary-like object with params passed, after running ‘product’ and ‘upstream’ are added, if any
- Type:
Params
- on_render¶
Function to execute after rendering. The function can request any of the following arguments: task, client, product, and params.
- Type:
callable
- on_finish¶
Function to execute upon execution. Can request the same arguments as the on_render hook.
- Type:
callable
- on_failure¶
Function to execute upon failure. Can request the same arguments as the on_render hook.
- Type:
callable
Notes
All subclasses must implement the same constuctor to keep the API consistent, optional parameters after “params” are ok
Methods
build
([force, catch_exceptions])Build a single task
debug
()Debug task, only implemented in certain tasks
load
()Load task as pandas.DataFrame.
render
([force, outdated_by_code, remote])Renders code and product, all upstream tasks must have been rendered first, for that reason, this method will usually not be called directly but via DAG.render(), which renders in the right order.
run
()This is the only required method Task subclasses must implement
set_upstream
(other[, group_name])status
([return_code_diff, sections])Prints the current task status
- build(force=False, catch_exceptions=True)¶
Build a single task
Although Tasks are primarily designed to execute via DAG.build(), it is possible to do so in isolation. However, this only works if the task does not have any unrendered upstream dependencies, if that’s the case, you should call DAG.render() before calling Task.build()
- Returns:
A dictionary with keys ‘run’ and ‘elapsed’
- Return type:
dict
- Raises:
TaskBuildError – If the error failed to build because it has upstream dependencies, the build itself failed or build succeded but on_finish hook failed
DAGBuildEarlyStop – If any task or on_finish hook raises a DAGBuildEarlyStop error
- debug()¶
Debug task, only implemented in certain tasks
- load()¶
Load task as pandas.DataFrame. Only implemented in certain tasks
- render(force=False, outdated_by_code=True, remote=False)¶
Renders code and product, all upstream tasks must have been rendered first, for that reason, this method will usually not be called directly but via DAG.render(), which renders in the right order.
Render fully determines whether a task should run or not.
- Parameters:
force (bool, default=False) – If True, mark status as WaitingExecution/WaitingUpstream even if the task is up-to-date (if there are any File(s) with clients, this also ignores the status of the remote copy), otherwise, the normal process follows and only up-to-date tasks are marked as Skipped.
outdated_by_code (bool, default=True) – Factors to determine if Task.product is marked outdated when source code changes. Otherwise just the upstream timestamps are used.
remote (bool, default=False) – Use remote metadata to determine status
Notes
This method tries to avoid calls to check for product status whenever possible, since checking product’s metadata can be a slow operation (e.g. if metadata is stored in a remote database)
When passing force=True, product’s status checking is skipped altogether, this can be useful when we only want to quickly get a rendered DAG object to interact with it
- abstract run()¶
This is the only required method Task subclasses must implement
- set_upstream(other, group_name=None)¶
- status(return_code_diff=False, sections=None)¶
Prints the current task status
- Parameters:
sections (list, optional) – Sections to include. Defaults to “name”, “last_run”, “oudated”, “product”, “doc”, “location”
Attributes
PRODUCT_CLASSES_ALLOWED
client
exec_status
name
A str that represents the name of the task, you can access tasks in a dag using dag['some_name']
Callable to be executed if task fails (passes Task as first parameter and the exception as second parameter)
Callable to be executed after this task is built successfully (passes Task as first parameter)
dict that holds the parameter that will be passed to the task upon execution.
product
The product this task will create upon execution
source
Source is used by the task to compute its output, for most cases this is source code, for example PythonCallable takes a function as source and SQLScript takes a string with SQL code as source.
upstream
A mapping for upstream dependencies {task name} -> [task object]