Automated Gsuite Export

How to download documents like Google Docs and Sheets programmatically

Published: Friday, Jan 22, 2021 Last modified: Thursday, Nov 14, 2024

Update: Rclone is the best DX CLI to get retrieve content from G Suite/Workspaces

Automated authentication in order to use Google Drive API is non-trivial.

Now there are two categories of credentials for accessing APIs:

  1. Service account aka credentials.json (non-interactive) but can’t really impersonate you
  2. Oauth with ClientID et al. (interactive) but can impersonate you

The “API key” has very limited scope, only for checking quota and access, not actually useful for APIs, IIUC.

Service account

Service accounts have an issue whereby it needs domain-wide-delegation by your G Suite Administrators (no one knows who these people are in large companies) to impersonate yourself with your own @example.com company email.

Note: Although you can use service accounts in applications that run from a G Suite domain, service accounts are not members of your G Suite account and aren’t subject to domain policies set by G Suite administrators. For example, a policy set in the G Suite admin console to restrict the ability of G Suite end users to share documents outside of the domain would not apply to service accounts.

from https://developers.google.com/identity/protocols/oauth2#serviceaccount

The work around is to use your credential’s email:

$ grep email credentials.json
  "client_email": "$name@$project.iam.gserviceaccount.com",

And explicitly share that email address with documents you want to export with this account.

from googleapiclient.discovery import build
from google.oauth2 import service_account
SCOPES = ['https://www.googleapis.com/auth/drive']
SERVICE_ACCOUNT_FILE = 'credentials.json'

# We use the SERVICE_ACCOUNT_FILE we just downloaded and the SCOPES we defined to create a Credentials object.
credentials = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_FILE, scopes=SCOPES)

# Remember, you must have created credentials.json with domain-wide delegation!
credentials = credentials.with_subject('gdrive@dlsuite.iam.gserviceaccount.com')

# We build a drive_v3 service using the credentials we just created
service = build('drive', 'v3', credentials=credentials)

files = service.files()
service.files().get(fileId='141g8UkQfdMQSTfIn475gHj1ezZVV16f5ONDxpWrrvts').execute()
fconr = files.export(fileId='141g8UkQfdMQSTfIn475gHj1ezZVV16f5ONDxpWrrvts',
					 mimeType='application/vnd.openxmlformats-officedocument.wordprocessingml.document')

fcont = fconr.execute()

print('{}...'.format(fcont[:10]))

file = open("/tmp/sample.doc", "wb")
file.write(fcont)
file.close()

Via this notebook and this answer by Jayson Salazar

Oauth token interchange from ClientID

After setting the API https://www.googleapis.com/auth/drive scope, generating/maintaining the token from Authorization code seems non-trivial to me. The Oauth Playground (with its own ClientID) does this automatically for you. But to do this yourself from Python appears non-trivial to me.

When you use the “Try this API” aka “google-apis-explorer” Google application https://developers.google.com/drive/api/v3/reference/files/export, again the complex Oauth interchange refresh dance is already done for you.

[ "access_token", "expiry", "refresh_token", "token_type" ]

However if you want your own App from the CLI to do this, it appears non-trivial as it needs some explicit Oauth consent interchange https://accounts.google.com/o/oauth2/auth?access_type=offline&client_id=.... However once you do have the Bearer token aka access_token, it’s straightforward to work with unlike credentials.json’s [ "auth_provider_x509_cert_url", "auth_uri", "client_email", "client_id", "client_x509_cert_url", "private_key", "private_key_id", "project_id", "token_uri", "type" ].

mimeType=application/pdf
curl -H "Authorization: Bearer $token" -o doc.pdf \
https://www.googleapis.com/drive/v3/files/${id}/export?mimeType=$mimeType

Here I believe you impersonate yourself via the ClientID, without the scary Share outside of organization process needed for the aforementioned $name@$project.iam.gserviceaccount.com service accounts.