This is a guide on
file clients. For API docs
File clients upload File products to the cloud. Currently, two clients are supported: one for Amazon S3 and one for Google Cloud Storage.
During the upload process, an absolute local file path of
/path/to/project/out/data.csv gets translated to the remote path
parent is the parent folder in the bucket to store the files.
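The path translation can be illustrated with a small standalone sketch. It assumes the remote path is built by taking the product path relative to the project root and placing it under `parent`; the helper name `to_remote_path` and its arguments are hypothetical and are not part of the Ploomber API.

```python
from pathlib import Path, PurePosixPath


def to_remote_path(local_path, project_root, parent):
    """Map a local product path to a path under ``parent`` in the bucket.

    Hypothetical helper for illustration only (not a Ploomber function).
    """
    # Express the product path relative to the project root
    relative = Path(local_path).relative_to(project_root)
    # Join with the parent folder using forward slashes, as bucket keys do
    return str(PurePosixPath(parent, *relative.parts))


remote = to_remote_path(
    "/path/to/project/out/data.csv",
    project_root="/path/to/project",
    parent="parent-folder-name",
)
print(remote)
```

Under this assumption, the folder layout below the project root is preserved inside the bucket, so products from different tasks do not collide.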
Create a bucket in the required cloud platform, or use an existing one.
Configure the environment with the credentials, or create a credentials.json file if the environment is not configured.
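One possible way to decide between the two options in the last step is to check the environment first and fall back to the file. The sketch below is an assumption, not Ploomber behavior: the helper `credentials_kwargs` is hypothetical, and it treats the environment as configured only when the listed variables are set (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` for S3, `GOOGLE_APPLICATION_CREDENTIALS` for Google Cloud). The underlying SDKs can also find credentials in other places, which this check ignores.

```python
import os
from pathlib import Path

# Environment variables commonly read by the AWS and Google Cloud SDKs
S3_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
GCLOUD_ENV_VARS = ("GOOGLE_APPLICATION_CREDENTIALS",)


def credentials_kwargs(env_vars, json_path="credentials.json", environ=None):
    """Return extra keyword arguments for a client constructor.

    Hypothetical helper: returns an empty dict when the environment already
    holds credentials, otherwise points to the JSON credentials file.
    """
    environ = os.environ if environ is None else environ
    # Every expected variable is set: rely on the environment
    if all(environ.get(name) for name in env_vars):
        return {}
    # Otherwise fall back to the credentials file, if it exists
    if Path(json_path).is_file():
        return {"json_credentials_path": str(json_path)}
    raise FileNotFoundError(
        f"No credentials in the environment and {json_path!r} does not exist"
    )
```

The returned dictionary could then be unpacked into the client constructor, for example `S3Client(bucket_name='bucket-name', parent='parent-folder-name', **credentials_kwargs(S3_ENV_VARS))`.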
Create a clients file
Next, create a clients.py file that contains the following function for the S3 client:

    from ploomber.clients import S3Client


    def get_s3():
        return S3Client(bucket_name='bucket-name',
                        parent='parent-folder-name',
                        # pass the json_credentials_path if env not configured with credentials
                        json_credentials_path='credentials.json')

Sample file for the Google Cloud Storage client:

    from ploomber.clients import GCloudStorageClient


    def get_gcloud():
        return GCloudStorageClient(bucket_name='bucket-name',
                                   parent='parent-folder-name',
                                   # pass the json_credentials_path if env not configured with credentials
                                   json_credentials_path='credentials.json')

Configure the pipeline
Now, configure the pipeline.yaml file: add a clients key that points to the function defined in clients.py (for example, get_s3 or get_gcloud):

    # some content ......

    # add this
    clients:
      File: project-name.clients.get_client

    # content continues...

Working with external datasets
The file clients only upload products generated by the pipeline. If you want to work with an external dataset, download it in the pipeline task that uses it as input. If you need help, contact us on Slack.
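As a rough sketch of this pattern, the function task below fetches the dataset itself and saves it as its product, so the file client only ever sees files the pipeline produced. The URL and the names `fetch` and `get_raw` are hypothetical; the sketch assumes a function task that receives a `product` argument convertible to a path with `str()`.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical location of the external dataset; replace with the real one
DATASET_URL = "https://example.com/raw.csv"


def fetch(url, destination):
    """Download ``url`` to ``destination``, creating parent folders first."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(destination))
    return destination


def get_raw(product):
    """Function task: download the external dataset and store it as the product."""
    fetch(DATASET_URL, str(product))
```

Downstream tasks can then declare `get_raw` as an upstream dependency and read its product like any other pipeline output.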
Refer: Google cloud template
File clients can be used when running pipelines locally as well as when exporting pipelines to external servers (e.g., AWS Batch).
The ploomber build command downloads the existing cloud artifacts from a previous pipeline run.
LocalStorageClient is mostly used for internal testing and can also be used to back up products locally.