Cloning a bucket in Google Cloud Storage using Python
Although cloning a bucket in Google Cloud Storage is not supported directly, we can still emulate this behaviour by creating a new bucket with the same properties as the source bucket, and then populating the new bucket with the files in the source bucket.
Prerequisites
To follow along with this guide, please make sure to have:
created a service account and downloaded the private key (JSON file) for authentication (please check out my detailed guide)
installed the Python client library for Google Cloud Storage:
pip install --upgrade google-cloud-storage
Creating a bucket with the same properties as the source bucket
The first step is to create a bucket with the same properties as the source bucket in Google Cloud Storage (GCS):
from google.cloud import storage

# Authenticate using the service account's private key
path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)

# Fetch the source bucket's metadata from GCS
bucket_src = client.get_bucket('example-bucket-skytowner')
print(f'Source bucket location: {bucket_src.location}')
print(f'Source bucket storage_class: {bucket_src.storage_class}')

# Build the new bucket locally, copy the storage class, then create it in the same location
new_bucket = storage.Bucket(client, 'example-bucket-skytowner-clone')
new_bucket.storage_class = bucket_src.storage_class
new_bucket = client.create_bucket(new_bucket, location=bucket_src.location)
Source bucket location: US
Source bucket storage_class: STANDARD
Note the following:
we first authenticated ourselves using the private key of our service account.
we fetched the bucket meta-information from GCS using the get_bucket(~) method.
we can access the location and storage class using the location and storage_class properties of the fetched bucket.
we then created a Bucket object using the constructor storage.Bucket(~). At this point, no actual bucket exists on GCS yet. We then set its storage class to that of the source bucket.
we finally sent a request to GCS to create the bucket using the create_bucket(~) method, passing the source bucket's location as the location argument.
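The steps above can be wrapped into a reusable function. This is a minimal sketch: the name clone_bucket_properties is hypothetical, and copying labels goes beyond the original script (which copies only the location and storage class), so drop that line if you do not want it. It uses client.bucket(~), which, like storage.Bucket(~), only builds a local Bucket object without calling GCS.

```python
def clone_bucket_properties(client, src_name, dst_name):
    """Create bucket `dst_name` with the same location and storage class as `src_name`.

    Hypothetical helper: `client` is assumed to be a google.cloud.storage.Client.
    Returns the newly created bucket.
    """
    # Fetch the source bucket's metadata from GCS
    bucket_src = client.get_bucket(src_name)

    # Build a local Bucket object; nothing is created on GCS yet
    new_bucket = client.bucket(dst_name)
    new_bucket.storage_class = bucket_src.storage_class
    # Assumption: also copy the source bucket's labels (not done in the original script)
    new_bucket.labels = bucket_src.labels

    # Send the create request, reusing the source bucket's location
    return client.create_bucket(new_bucket, location=bucket_src.location)
```

For example, calling clone_bucket_properties(client, 'example-bucket-skytowner', 'example-bucket-skytowner-clone') with the authenticated client should create the same bucket as the script above, with the source bucket's labels attached as well.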
Now, if we head over to the GCS web console, we should see our cloned bucket listed.
Notice how its location and storage class match those of the source bucket.
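Instead of checking the web console, we can also compare the two buckets in code. This is a minimal sketch: properties_match is a hypothetical helper, and by default it only checks the two properties this guide copies (location and storage_class). The buckets are assumed to have been fetched with client.get_bucket(~) so that their metadata is loaded.

```python
def properties_match(bucket_a, bucket_b, props=("location", "storage_class")):
    """Return True if both buckets have equal values for every property in `props`.

    Hypothetical helper: the buckets are assumed to be google.cloud.storage.Bucket
    objects whose metadata was loaded with client.get_bucket(~).
    """
    return all(getattr(bucket_a, p) == getattr(bucket_b, p) for p in props)
```

With the authenticated client, properties_match(client.get_bucket('example-bucket-skytowner'), client.get_bucket('example-bucket-skytowner-clone')) should return True once the clone has been created.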
Populating the cloned bucket with files of the source bucket
Now that we have cloned the bucket, we need to copy all the files in the source bucket into the cloned bucket. We can do so using the following script:
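# Fetch the meta-information of every blob (file) in the source bucket
blobs_src = client.list_blobs('example-bucket-skytowner')
for blob_src in blobs_src:
    # Copy the blob into the cloned bucket, keeping its original name
    blob_new = bucket_src.copy_blob(blob_src, new_bucket, new_name=blob_src.name)
    print(f'Copied [{blob_src.name}] into new bucket')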
Copied [cat.png] into new bucket
Here:
we first fetch the meta-information of all blobs (files) in the source bucket using the list_blobs(~) method.
we then iterate over these blobs and use the copy_blob(~) method to copy each one into the cloned bucket.
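copy_blob(~) works well here because both buckets share the same location and storage class. For large objects, especially when the two buckets differ in location or storage class, the GCS rewrite operation may need several calls to finish. The Python client exposes it as Blob.rewrite(~), which returns a (token, bytes_rewritten, total_bytes) tuple and should be called again with the returned token until the token is None. The following is a minimal sketch of this alternative, not the method used above; copy_all_with_rewrite is a hypothetical helper name.

```python
def copy_all_with_rewrite(client, src_name, dst_name):
    """Copy every blob from bucket `src_name` into bucket `dst_name` using Blob.rewrite.

    Hypothetical helper: `client` is assumed to be a google.cloud.storage.Client.
    Returns the names of the copied blobs.
    """
    # Local handle to the destination bucket (no API call is made here)
    dst_bucket = client.bucket(dst_name)
    copied = []
    for blob_src in client.list_blobs(src_name):
        # Destination blob with the same name as the source blob
        blob_new = dst_bucket.blob(blob_src.name)
        # rewrite() may need several calls for large objects: keep passing
        # the returned token back until it comes back as None
        token, _, _ = blob_new.rewrite(blob_src)
        while token is not None:
            token, _, _ = blob_new.rewrite(blob_src, token=token)
        copied.append(blob_src.name)
        print(f'Copied [{blob_src.name}] into new bucket')
    return copied
```

Calling copy_all_with_rewrite(client, 'example-bucket-skytowner', 'example-bucket-skytowner-clone') would copy the same files as the script above, trading a few extra lines for robustness on large objects.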