File clients

Note

This is a guide on file clients. For API docs see Clients.

File clients are used to upload File products to the cloud. Currently, two clients are supported: one for Amazon S3 and one for Google Cloud Storage.

During the upload process, an absolute local file path such as /path/to/project/out/data.csv is translated to the remote path path/to/parent/out/data.csv, where parent is the parent folder in the bucket where files are stored.
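The translation above can be sketched as follows; the helper name and the assumption that paths under the project root map one-to-one under the parent folder are illustrative, not part of the Ploomber API:

```python
from pathlib import PurePosixPath

def to_remote(local_path, project_root, parent):
    # Keep the path relative to the project root, then prepend the
    # bucket's parent folder to build the remote key
    relative = PurePosixPath(local_path).relative_to(project_root)
    return str(PurePosixPath(parent) / relative)

print(to_remote('/path/to/project/out/data.csv', '/path/to/project',
                'path/to/parent'))
# path/to/parent/out/data.csv
```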

Prerequisites

  • Create a bucket in the required cloud platform, or use an existing one.

  • Configure the environment with credentials, or create a credentials.json file if the environment is not configured.
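As a rough check for the second point: the AWS SDKs read AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, while Google Cloud libraries read GOOGLE_APPLICATION_CREDENTIALS. The helper below is a sketch of ours (not part of Ploomber) to decide whether you need the credentials.json fallback:

```python
import os

def env_configured():
    """Rough check for cloud credentials in the environment.

    If this returns False, create a credentials.json file and pass
    json_credentials_path to the client instead.
    """
    aws = ('AWS_ACCESS_KEY_ID' in os.environ
           and 'AWS_SECRET_ACCESS_KEY' in os.environ)
    gcloud = 'GOOGLE_APPLICATION_CREDENTIALS' in os.environ
    return aws or gcloud
```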

Create a clients file

Next, create a clients.py file that contains the function below to initialize the S3 client:

from ploomber.clients import S3Client

def get_s3():
    return S3Client(bucket_name='bucket-name',
                    parent='parent-folder-name',
                    # pass the json_credentials_path if env not configured with credentials
                    json_credentials_path='credentials.json')

A sample function for the Google Cloud Storage client:

from ploomber.clients import GCloudStorageClient

def get_gcloud():
    return GCloudStorageClient(bucket_name='bucket-name',
                               parent='parent-folder-name',
                               # pass the json_credentials_path if env not configured with credentials
                               json_credentials_path='credentials.json')

Configure the pipeline

Now, add a clients key to the pipeline.yaml file pointing to the S3 or Google Cloud function:

# some content
......

# add this
clients:
  File: project-name.clients.get_s3  # or project-name.clients.get_gcloud

# content continues...
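Putting it together, a minimal pipeline.yaml might look like the sketch below; the task source and product paths are placeholders, and the dotted path must match the module and function you defined in clients.py:

```yaml
clients:
  # dotted path to the function that returns the File client
  File: project-name.clients.get_s3

tasks:
  - source: tasks/get.py
    product:
      nb: out/get.ipynb
      data: out/data.csv
```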

Working with external datasets

File clients only upload products generated by the pipeline. To work with an external dataset, download it in the pipeline task that uses it as input. If you need help, contact us on Slack.
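One way to do that, sketched here with the standard library and a placeholder URL, is a first task whose only job is to fetch the dataset; downstream tasks then declare it as an upstream dependency and read its product:

```python
import urllib.request

def download(product, url='https://example.com/data.csv'):
    # First pipeline task: fetch the external dataset and save it to the
    # local product path that Ploomber passes in. Since the file becomes
    # a regular product, the configured File client uploads it as well.
    urllib.request.urlretrieve(url, str(product))
```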

For a complete example, refer to the Google Cloud template.

Note

  • File clients can be used when running pipelines locally as well as when exporting pipelines to external servers (e.g., AWS Batch).

  • The ploomber build command downloads existing cloud artifacts from previous pipeline runs.

  • The LocalStorageClient is mostly used for internal testing, and can also be used to back up products locally.