File clients¶
Note
This is a guide on file clients. For API docs, see Clients.
File clients upload File products to the cloud. Two clients are currently supported: one for Amazon S3 and one for Google Cloud Storage.
During the upload process, an absolute local file path such as /path/to/project/out/data.csv gets translated to the remote path path/to/parent/out/data.csv. Here, parent is the parent folder in the bucket where the files are stored.
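The translation can be pictured with a short sketch. It assumes the remote path is the parent folder joined with the file's path relative to the project root. The function to_remote_path and its arguments are hypothetical names used for illustration, not part of the Ploomber API.

```python
from pathlib import PurePosixPath


def to_remote_path(local_path, project_root, parent):
    """Map an absolute local path to a bucket key (illustrative only)."""
    # Assumption: keep only the part of the path below the project root
    relative = PurePosixPath(local_path).relative_to(project_root)
    # Assumption: prefix that relative path with the parent folder
    return str(PurePosixPath(parent) / relative)


remote = to_remote_path('/path/to/project/out/data.csv',
                        project_root='/path/to/project',
                        parent='path/to/parent')
print(remote)
```

With the paths from the example above, the sketch yields path/to/parent/out/data.csv.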
Pre-requisites¶
Create a bucket in the required cloud platform, or use an existing one.
Configure the environment with the credentials, or create a credentials.json file if the environment is not configured.
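As a rough illustration of the second step, the helper below decides whether a credentials file is needed for S3. It assumes credentials are supplied through the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables and ignores other sources (such as shared config files or instance roles). The name s3_credentials_path is hypothetical.

```python
import os


def s3_credentials_path(path='credentials.json', environ=None):
    """Return None if the environment has AWS keys, else the JSON file path."""
    environ = os.environ if environ is None else environ
    # Assumption: only these two standard AWS variables are checked
    has_env_credentials = ('AWS_ACCESS_KEY_ID' in environ
                           and 'AWS_SECRET_ACCESS_KEY' in environ)
    return None if has_env_credentials else path


print(s3_credentials_path(environ={}))
```

The returned value could then be passed as json_credentials_path when creating a client, assuming that None means the client should rely on the environment.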
Create a clients file¶
Next, create a clients.py file that contains the following function for the S3 client:
from ploomber.clients import S3Client
def get_s3():
    return S3Client(bucket_name='bucket-name',
                    parent='parent-folder-name',
                    # pass the json_credentials_path if env not configured with credentials
                    json_credentials_path='credentials.json')
Sample file for Google Cloud Storage client:
from ploomber.clients import GCloudStorageClient
def get_gcloud():
    return GCloudStorageClient(bucket_name='bucket-name',
                               parent='parent-folder-name',
                               # pass the json_credentials_path if env not configured with credentials
                               json_credentials_path='credentials.json')
Configure the pipeline¶
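Now, configure the pipeline.yaml file: add the clients key and point File to the function that returns the S3 or Google Cloud Storage client (get_s3 or get_gcloud in the files above, shown here as get_client):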
# some content
......
# add this
clients:
    File: project-name.clients.get_client
# content continues...
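To give an intuition for what the dotted path under File refers to, the sketch below resolves a string of the form module.function into a Python callable using importlib. This illustrates the general mechanism under stated assumptions and is not Ploomber's actual loading code. The helper load_dotted_path is hypothetical, and os.getcwd stands in for a client factory such as get_s3.

```python
import importlib


def load_dotted_path(dotted_path):
    """Import 'package.module.function' and return the function object."""
    # Split into the module part and the final attribute name
    module_name, _, attribute = dotted_path.rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Stand-in for a path like 'clients.get_s3'
factory = load_dotted_path('os.getcwd')
print(callable(factory))
```

Under this reading, the module part of the path must be importable from wherever the pipeline runs.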
Working with external datasets¶
The file clients only upload products generated by the pipeline. To work with an external dataset, download it in the pipeline task that uses it as input. If you need help, contact us on Slack.
For an example, see the Google Cloud template.
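A minimal sketch of such a download task follows, assuming a Python function task that receives its output location as a product argument. DATASET_URL is a placeholder, and the function name download is hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Placeholder location of the external dataset; replace with the real one
DATASET_URL = 'https://example.com/raw.csv'


def download(product, url=DATASET_URL):
    """Fetch the external dataset and save it as this task's product."""
    target = Path(str(product))
    # Make sure the output folder exists before writing
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(target))
```

Because the downloaded file becomes a product of the pipeline, it would be uploaded by the file client like any other product.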
Note
File clients can be used when running pipelines locally as well as when exporting pipelines to external servers (e.g., AWS Batch).
The ploomber build command downloads existing cloud artifacts from a previous pipeline run. The LocalStorageClient is mostly used for internal testing and can also be used to back up products locally.