ploomber.io.unserializer

ploomber.io.unserializer(extension_mapping=None, *, fallback=False, defaults=None, unpack=False)

Decorator for unserializing functions

Parameters
  • extension_mapping (dict, default=None) – A mapping from file extension to unserializing function. When the decorated function is called with a File whose extension appears in the mapping, the matching function is used, e.g., {'.csv': from_csv, '.json': from_json}.

  • fallback (bool or str, default=False) – Determines which method to use if extension_mapping does not match the product to unserialize. Valid values are True (uses the pickle module), 'joblib', and 'cloudpickle'. If you use either of the last two, the corresponding module must be installed. If a fallback is enabled, the body of the decorated function is never executed. To turn it off, pass False.

  • defaults (list, default=None) – Built-in unserializing functions to use. Must be a list with any combination of these values: '.txt', '.json', '.csv', '.parquet'. Unserializing .txt returns a string, .json returns any JSON-unserializable object (e.g., a list or a dict), and .csv and .parquet return a pandas.DataFrame. If using .parquet, a parquet library must be installed (e.g., pyarrow). If extension_mapping and defaults contain overlapping keys, an error is raised.

  • unpack (bool, default=False) – If True and the task product points to a directory, the unserializer is called once per file in the directory. The unserialized object is then a dictionary where keys are the filenames and values are the unserialized objects. Note that this isn't recursive: only files that are immediate children of the product directory are considered.
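
The dispatch order described by these parameters can be illustrated with a small standard-library sketch. This is a simplified stand-in written for this page, not ploomber's implementation: toy_unserializer, from_json, and load are hypothetical names, only the pickle fallback is modeled, and the defaults parameter is left out. It assumes the order implied above: a matching extension is tried first, then the fallback if one is enabled, and only otherwise the body of the decorated function.

```python
import json
import pickle
import tempfile
from functools import wraps
from pathlib import Path


def toy_unserializer(extension_mapping=None, *, fallback=False, unpack=False):
    """Hypothetical stand-in that mimics the documented dispatch rules."""
    mapping = dict(extension_mapping or {})

    def decorator(fn):
        def load_one(path):
            path = Path(path)
            if path.suffix in mapping:
                # 1. A matching extension in the mapping is used first
                return mapping[path.suffix](path)
            if fallback:
                # 2. Otherwise the fallback applies (only pickle is modeled)
                return pickle.loads(path.read_bytes())
            # 3. No match and no fallback: run the decorated function body
            return fn(path)

        @wraps(fn)
        def wrapper(product):
            path = Path(product)
            if unpack and path.is_dir():
                # Not recursive: immediate child files only, keyed by filename
                return {
                    p.name: load_one(p)
                    for p in sorted(path.iterdir())
                    if p.is_file()
                }
            return load_one(path)

        return wrapper

    return decorator


def from_json(path):
    return json.loads(Path(path).read_text())


@toy_unserializer({".json": from_json}, unpack=True)
def load(product):
    # Reached only for extensions that are missing from the mapping
    return Path(product).read_text()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.json").write_text('{"x": 1}')
    (root / "b.txt").write_text("hello")
    (root / "nested").mkdir()
    (root / "nested" / "c.json").write_text("[]")

    single = load(root / "a.json")  # a single file product
    unpacked = load(root)           # a directory product with unpack=True

print(single)    # {'x': 1}
print(unpacked)  # {'a.json': {'x': 1}, 'b.txt': 'hello'}
```

In the sketch, nested/c.json is absent from the unpacked result because only immediate children of the directory are visited, and b.txt reaches the decorated function body because '.txt' is not in the mapping and no fallback is set.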