Writing a Pandas DataFrame to Google Cloud Storage in Python
Prerequisites
To follow along with this guide, please make sure to have:
created a service account and downloaded the private key (JSON file) for authentication (please check out my detailed guide)
installed the Python client library:
pip install --upgrade google-cloud-storage
Writing Pandas DataFrame to Google Cloud Storage as a CSV file
Consider the following Pandas DataFrame:
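The DataFrame itself is not shown here, but its contents can be recovered from the CSV string quoted later in this guide. A construction consistent with that output:

```python
import pandas as pd

# A small DataFrame whose CSV form is ',A,B\n0,3,5\n1,4,6\n'
df = pd.DataFrame({'A': [3, 4], 'B': [5, 6]})
print(df)
#    A  B
# 0  3  5
# 1  4  6
```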
Case when you already have a bucket
To write this Pandas DataFrame to Google Cloud Storage (GCS) as a CSV file, use the blob's upload_from_string(~) method:
from google.cloud import storage

path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)

# The bucket on GCS in which to write the CSV file
bucket = client.bucket('test-bucket-skytowner')
# The name assigned to the CSV file on GCS
blob = bucket.blob('my_data.csv')
blob.upload_from_string(df.to_csv(), 'text/csv')
Note the following:
if the bucket with the specified name does not exist, then an error will be thrown
the DataFrame's to_csv() method converts the DataFrame into a CSV string: ',A,B\n0,3,5\n1,4,6\n'
the second argument of upload_from_string(~) is the content type of the file
After running this code, we can see that my_data.csv has been written to our test-bucket-skytowner bucket on the GCS web console.
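To verify the upload, or to load the CSV back into a DataFrame later, the blob's download_as_text() method can be used. The sketch below assumes the bucket and blob names used above; the read_csv_blob helper name is my own:

```python
import io

import pandas as pd

def read_csv_blob(bucket, blob_name):
    # Download the blob's contents as a string and parse it back into a
    # DataFrame, treating the first CSV column as the index
    text = bucket.blob(blob_name).download_as_text()
    return pd.read_csv(io.StringIO(text), index_col=0)

# e.g. df = read_csv_blob(client.bucket('test-bucket-skytowner'), 'my_data.csv')
```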
Case when you do not have a bucket
The above solution only works when you have already created a bucket in which to place the file on GCS - specifying a bucket that does not exist will throw an error. Therefore, we must first create a bucket on GCS using the method create_bucket(~), which returns the created bucket:
from google.cloud import storage

path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)

# The NEW bucket on GCS in which to write the CSV file
bucket = client.create_bucket('test-v2-bucket-skytowner')
# The name assigned to the CSV file on GCS
blob = bucket.blob('my_data.csv')
blob.upload_from_string(df.to_csv(), 'text/csv')
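Note that calling create_bucket(~) with a name that already exists raises an error, so it can be convenient to look the bucket up first. The client's lookup_bucket(~) method returns None when the bucket does not exist; the helper below is a sketch of that pattern (the function name is my own):

```python
def get_or_create_bucket(client, bucket_name):
    # lookup_bucket(~) returns None if the bucket does not exist,
    # in which case we create it
    bucket = client.lookup_bucket(bucket_name)
    if bucket is None:
        bucket = client.create_bucket(bucket_name)
    return bucket
```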
Writing Pandas DataFrame to Google Cloud Storage as a feather file
The logic for writing a Pandas DataFrame to GCS as a feather file is very similar to the CSV case, except that we must first write the feather file locally, and then upload this file using the method upload_from_filename(~):
import pyarrow.feather as feather

feather.write_feather(df, './feather_df')

# The bucket in which to place the feather file on GCS
bucket = storage.Bucket(client, 'example-bucket-skytowner')
# The name to assign to the feather file on GCS
blob = bucket.blob('my_data.feather')
blob.upload_from_filename('./feather_df')
After running this code, we should see the my_data.feather file appear on the GCS web console.