GCS101: Access Google Storage Buckets Programmatically
To authenticate and access GCP resources in a project from external code, including Google Colab notebooks, we need a key file for a GCP service account. In this post, I summarize the process so that you can download files from Google Cloud Storage (GCS) programmatically. Let’s start with how to access GCS anonymously.
!['A Ladies Class Et The German Gymnasium' by Horace Harral (1872)](https://upload.wikimedia.org/wikipedia/commons/thumb/c/c5/Horace_Harral._A_Ladies_Class_Et_The_German_Gymnasium.1872.jpg/473px-Horace_Harral._A_Ladies_Class_Et_The_German_Gymnasium.1872.jpg)
Access a public Google Storage bucket anonymously
Use this template to perform the operation:
from google.cloud import storage

def download_public_file(bucket_name, source_blob_name, destination_file_name):
    """Download a blob from a public GCS bucket using anonymous credentials."""
    # bucket_name = "your-bucket-name"
    # source_blob_name = "storage-object-name"
    # destination_file_name = "local/path/to/file"
    storage_client = storage.Client.create_anonymous_client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)
    print("Downloaded public blob {} from bucket {} to {}.".format(
        source_blob_name, bucket.name, destination_file_name))
For example, suppose we would like to download the training and test datasets of the Kaggle competition Tabular Playground Series - Mar 2021 from their public URLs:
bucket_name = 'gcs-station-168'
# training dataset (public URL: https://storage.googleapis.com/gcs-station-168/train.tfrecords)
source_blob_name = 'train.tfrecords'
destination_file_name = './train.tfrecords'
download_public_file(bucket_name, source_blob_name, destination_file_name)
# test dataset (public URL: https://storage.googleapis.com/gcs-station-168/test.tfrecords)
source_blob_name = 'test.tfrecords'
destination_file_name = './test.tfrecords'
download_public_file(bucket_name, source_blob_name, destination_file_name)
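Since these blobs are world-readable, the same files can also be fetched over plain HTTPS without the client library. A minimal sketch using the requests package (the URLs are the public ones quoted above):

import requests

url = 'https://storage.googleapis.com/gcs-station-168/train.tfrecords'
response = requests.get(url)
response.raise_for_status()  # fail loudly on 4xx/5xx errors
with open('./train.tfrecords', 'wb') as f:
    f.write(response.content)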
Reference: https://cloud.google.com/storage/docs/access-public-data#code-samples
Access a private Google Storage bucket with permission
Create a service account from Google Cloud Shell
To allow Google account B (anyone you choose, including the project owner) to access a private GCS bucket under the project strategic-howl-305522 owned by Google account A, for example, we need the following setup:
In the GCP console of Google account A, launch the Cloud Shell terminal.
Create a service account named ‘test-service-account’ (note: only lowercase alphanumeric characters and hyphens are allowed) for the project strategic-howl-305522:
$ P='strategic-howl-305522'
$ S='test-service-account'
$ gcloud iam service-accounts create "$S"
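You can confirm that the account was created before moving on (an optional check, not part of the original flow):

$ gcloud iam service-accounts list --project "$P"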
Grant permissions to the service account:
$ gcloud projects add-iam-policy-binding "$P" \
    --member "serviceAccount:$S@$P.iam.gserviceaccount.com" \
    --role "roles/owner"
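Note that roles/owner grants full control of the project. If account B only needs to read objects, a narrower role is safer; a hedged alternative using the predefined Storage object viewer role:

$ gcloud projects add-iam-policy-binding "$P" \
    --member "serviceAccount:$S@$P.iam.gserviceaccount.com" \
    --role "roles/storage.objectViewer"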
Generate the key file, in this case named test-colab.json:
$ K='test-colab.json'
$ gcloud iam service-accounts keys create "$K" --iam-account "$S@$P.iam.gserviceaccount.com"
Download the key file:
$ cloudshell download $K
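To sanity-check the key before leaving Cloud Shell, you can activate it as the current gcloud credential (an optional step; note that this switches the account Cloud Shell acts as):

$ gcloud auth activate-service-account --key-file "$K"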
Upload the key file (the authentication token) to the Colab working directory, in this case /content/.
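Before creating a client, point the google-cloud-storage library at the key via the GOOGLE_APPLICATION_CREDENTIALS environment variable. A minimal sketch, assuming the key was uploaded as /content/test-colab.json:

import os

# Tell the client library where the service account key lives
# (the path is an assumption based on the upload step above)
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/content/test-colab.json'

Then run the following code to list all the buckets under the project: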
from google.cloud import storage

storage_client = storage.Client()
buckets = storage_client.list_buckets()
print('-- List of buckets in project "' + storage_client.project + '"')
for b in buckets:
    print(b.name)
We can further list all the files inside a specific bucket, in this case the bucket gcs-station-168:
bucket_name = 'gcs-station-168'
bucket = storage_client.get_bucket(bucket_name)

def list_files(bucket):
    """List all files in a GCS bucket."""
    files = bucket.list_blobs(prefix=None)
    # Keep only names with an extension, skipping folder placeholder objects
    return [f.name for f in files if '.' in f.name]

list_files(bucket)
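With the authenticated client, downloading a private blob follows the same pattern as the anonymous template above. A short sketch, assuming train.tfrecords exists in the bucket:

# Download a private blob with the authenticated client
blob = bucket.blob('train.tfrecords')
blob.download_to_filename('./train.tfrecords')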
Reference: https://colab.research.google.com/drive/1LWhrqE2zLXqz30T0a0JqXnDPKweqd8ET#scrollTo=GBpnpPF4I8Wn