Downloading files from Google Cloud Storage using Python
Prerequisites
To follow along with this guide, please make sure to have:
- created a GCP (Google Cloud Platform) project
- created a service account and downloaded the private key (JSON file) for authentication
- installed the Python client library for Google Cloud Storage (GCS):
pip install --upgrade google-cloud-storage
If you haven't, then please check out my detailed guide first!
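As a quick sanity check that the credentials and library are set up correctly, you can try listing the buckets in your project. Here is a minimal sketch - the key filename is simply the one used throughout this guide, so substitute your own:

from google.cloud import storage

# Path to the service account's private key (substitute your own file)
path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)
# If authentication works, this prints the name of every bucket in the project
for bucket in client.list_buckets():
    print(bucket.name)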
Downloading a single file from Google Cloud Storage using Python
Suppose we have a text file called uploaded_sample.txt that lives in the bucket example-bucket-skytowner on Google Cloud Storage (GCS). To download this file from GCS, use the download_to_filename(~) method:
from google.cloud import storage
path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)
bucket = storage.Bucket(client, 'example-bucket-skytowner')
blob = bucket.blob('uploaded_sample.txt')
blob.download_to_filename('./downloaded_file')
Note the following:
- the credential JSON file for the service account resides in the same directory as this Python script
- example-bucket-skytowner is the name of the bucket in which the file resides
- uploaded_sample.txt is the name of the file on GCS that you wish to download
- the download_to_filename(~) method takes as argument the local path where the file should be downloaded to
After running this code, we should see a file called downloaded_file in the same directory as this Python script.
Referencing blob and bucket name
We can reference the names of our file and bucket using the name property:
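For instance, a minimal sketch reusing the bucket and blob objects from the example above:

print(bucket.name)  # example-bucket-skytowner
print(blob.name)    # uploaded_sample.txt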
The name property is often quite handy when deciding where files should be downloaded locally. We will see examples of this later in this guide.
Downloading to a directory using relative path
The download_to_filename(~) method will throw an error if we supply a local path that does not exist. For instance, suppose we wanted to download a file into a local downloads directory, which currently does not exist:
blob.download_to_filename(f'./downloads/{blob.name}')
FileNotFoundError: [Errno 2] No such file or directory: './downloads/uploaded_sample.txt'
The way to get around this is to create the folders beforehand with the mkdir(~) method of pathlib's Path class, before we call the download_to_filename(~) method:
from pathlib import Path

path_folder = f'./downloads/{bucket.name}'
# Create this folder locally if it does not exist
# parents=True will create intermediate directories if they do not exist
Path(path_folder).mkdir(parents=True, exist_ok=True)
blob = bucket.blob('uploaded_sample.txt')
blob.download_to_filename(f'{path_folder}/{blob.name}')
When running this code, the directory downloads/example-bucket-skytowner will be created if it does not exist yet, and the file will be downloaded into this directory. The final local path of the downloaded file would therefore be:
./downloads/example-bucket-skytowner/uploaded_sample.txt
Handling errors in case of file not found
Trying to download files that do not exist in GCS will throw a 404 NotFound error:
blob = bucket.blob('.some_non_existing_file')
blob.download_to_filename('./downloaded_file')
NotFound: 404 GET https://storage.googleapis.com/download/storage/v1/b/example-bucket-skytowner/o/.some_non_existing_file?alt=media: No such object: example-bucket-skytowner/.some_non_existing_file: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
To account for this case, we can wrap our methods in a try-except block:
from google.cloud.exceptions import NotFound
try:
    blob = bucket.blob('.some_non_existing_file')
    blob.download_to_filename('./downloaded_file')
except NotFound:
    # Handle this case
    print(f'🚨 {blob.name} does not exist - do something')
🚨 .some_non_existing_file does not exist - do something
Note that we had to import the NotFound error from google.cloud.exceptions.
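Alternatively, blobs expose an exists(~) method, so we can check before downloading. A minimal sketch, reusing the bucket object from above:

blob = bucket.blob('.some_non_existing_file')
if blob.exists():
    blob.download_to_filename('./downloaded_file')
else:
    print(f'🚨 {blob.name} does not exist on GCS')

Keep in mind that exists(~) issues an extra API request, so the try-except approach avoids one network round trip.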
Downloading multiple files from Google Cloud Storage
Currently, GCS only allows downloading files one at a time. Therefore, we must iteratively call the download_to_filename(~) method to download multiple files from GCS.
The following code block extends the case of downloading a single file:
from google.cloud import storage
path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)
bucket = storage.Bucket(client, 'example-bucket-skytowner')
list_files_to_download = ['uploaded_sample.txt', 'cat.png']
for file_to_download in list_files_to_download:
    blob = bucket.blob(file_to_download)
    blob.download_to_filename(f'./{blob.name}')
After running this code, we should see the files uploaded_sample.txt and cat.png downloaded in the same directory as this Python file.
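As a side note, recent 2.x releases of the client library also ship a transfer_manager module that parallelizes the per-file requests across multiple workers - treat the exact version requirement as an assumption and check your installed version. A minimal sketch, reusing the bucket object from above:

from google.cloud.storage import transfer_manager

blob_names = ['uploaded_sample.txt', 'cat.png']
results = transfer_manager.download_many_to_path(bucket, blob_names, destination_directory='./')
# Each result is None on success, or the exception raised for that blob
for name, result in zip(blob_names, results):
    if isinstance(result, Exception):
        print(f'Failed to download {name}: {result}')
    else:
        print(f'Downloaded {name}')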
Downloading a folder from Google Cloud Storage
Suppose we have the following two files under a folder called my_folder on GCS:
📁 my_folder
├─ cat.png
├─ uploaded_sample.txt
To download all files inside the folder my_folder:
from google.cloud import storage
from pathlib import Path

path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)
bucket = storage.Bucket(client, 'example-bucket-skytowner')
str_folder_name_on_gcs = 'my_folder/'
# Create the directory locally
Path(str_folder_name_on_gcs).mkdir(parents=True, exist_ok=True)
blobs = bucket.list_blobs(prefix=str_folder_name_on_gcs)
for blob in blobs:
    if not blob.name.endswith('/'):
        # This blob is not a directory!
        print(f'Downloading file [{blob.name}]')
        blob.download_to_filename(f'./{blob.name}')
Downloading file [my_folder/cat.png]
Downloading file [my_folder/uploaded_sample.txt]
After running this code, we should see a new my_folder folder containing the two files in our current directory:
├─ script.py
📁 my_folder
├─ cat.png
├─ uploaded_sample.txt
Now, let's explain how our code works:
- the list_blobs(~) method takes in as argument prefix, which allows us to fetch all blobs whose name starts with prefix.
- in our case, we are fetching blobs whose name begins with 'my_folder/'. Unfortunately, my_folder/, which represents a directory in GCS, is also fetched as a blob. Since we do not want to download directory blobs, we filter these blobs out by ignoring those that end with the '/' character.
- even though the file name is my_folder/cat.png, the download_to_filename(~) method will place cat.png inside the folder my_folder. We must make sure that this folder exists locally beforehand using pathlib's Path class - otherwise a FileNotFoundError will occur. A sketch that also handles nested subfolders follows below.
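One caveat: the code above only pre-creates the top-level my_folder directory. If your GCS folder contains nested subfolders, a simple workaround is to create each blob's parent directory on the fly. A minimal sketch, reusing the bucket object from above:

from pathlib import Path

blobs = bucket.list_blobs(prefix='my_folder/')
for blob in blobs:
    if not blob.name.endswith('/'):
        # Create the blob's parent directories locally (e.g. my_folder/some_sub_folder)
        Path(blob.name).parent.mkdir(parents=True, exist_ok=True)
        blob.download_to_filename(f'./{blob.name}')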
Downloading the content of files in memory
Instead of downloading an actual file to a local path, suppose we wanted to store the content of the file in a variable. For instance, let's read the content of a text file on GCS called uploaded_sample.txt in memory using the download_as_string(~) method:
from google.cloud import storage
path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)
# The name of our bucket
bucket = storage.Bucket(client, 'example-bucket-skytowner')
# The name of the file on GCS
blob = bucket.blob('uploaded_sample.txt')
byte_str_file_content = blob.download_as_string()
str_file_content = byte_str_file_content.decode('utf-8')
print(str_file_content)
This is some sample text.
Hello World.
Note the following:
- the download_as_string(~) method returns a byte string
- we use the decode('utf-8') method to convert the byte string into a standard string
- the content of our text file (uploaded_sample.txt) is printed in the output
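As a side note, newer versions of the client library deprecate download_as_string(~) in favor of download_as_bytes(~), and also offer download_as_text(~), which returns an already-decoded string. A minimal sketch, reusing the blob object from above:

byte_content = blob.download_as_bytes()  # returns the raw bytes
text_content = blob.download_as_text()   # returns a str (decoded as UTF-8 by default)
print(text_content)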