This project is a simple Python script that backs up all Google Drives in a Google Workspace domain. It uses domain-wide delegation to impersonate a super admin account and access all drives.
- Backs up all Google Drives in a Google Workspace domain (personal and shared drives) to S3
- Converts Google Apps files (Docs, Sheets, Slides) to Microsoft Office formats
- Saves metadata about the files to a JSON file (id, name, md5Checksum, path, permissions)
- Handles duplicate files (same name and path) by appending the file ID to the name
- Converts links to `.txt` files containing the path to the original file
- Whitelist & blacklist of drives
- Multi-process (per drive) and multi-threaded (per file) downloading
- `pigz` or `lz4` compression of the exported drives
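The duplicate-handling rule above can be sketched as follows (the helper and its signature are illustrative, not the script's actual code):

```python
import os

def deduplicate_name(name: str, file_id: str, path: str,
                     seen: set[tuple[str, str]]) -> str:
    """Append the file ID to `name` when (path, name) was already used."""
    if (path, name) in seen:
        root, ext = os.path.splitext(name)
        name = f"{root}_{file_id}{ext}"
    seen.add((path, name))
    return name
```

For example, two files named `report.docx` in the same folder come out as `report.docx` and `report_<fileId>.docx`.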
1. Obtain a list of all users and shared drives in the domain
2. For each user and shared drive (as a subprocess):
   1. Fetch all file metadata
   2. Download or export all files (as threads)
      - If `JIT_S3_UPLOAD` is enabled, upload files to S3 as soon as they are downloaded and delete them afterwards
   3. Compress the downloaded files (if enabled)
   4. Upload the folder/archive to S3
   5. Delete the local files (if `AUTO_CLEANUP` is enabled)
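The per-drive / per-file concurrency model can be sketched with Python's `concurrent.futures` (the drive and file helpers below are placeholders, not the script's real API):

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

MAX_DRIVE_PROCESSES = 4    # one process per drive
MAX_DOWNLOAD_THREADS = 20  # download threads within each drive

def download_file(file_id: str) -> str:
    # Placeholder for the real Drive download/export call.
    return file_id

def backup_drive(drive_id: str) -> int:
    # Placeholder for fetching the drive's file metadata.
    file_ids = [f"{drive_id}/file-{i}" for i in range(3)]
    with ThreadPoolExecutor(max_workers=MAX_DOWNLOAD_THREADS) as pool:
        return sum(1 for _ in pool.map(download_file, file_ids))

if __name__ == "__main__":
    # Each drive is handled in its own process; files inside it in threads.
    with ProcessPoolExecutor(max_workers=MAX_DRIVE_PROCESSES) as pool:
        results = list(pool.map(backup_drive, ["drive-a", "drive-b"]))
```

Processes sidestep the GIL for per-drive work, while threads are enough for the I/O-bound downloads within a drive.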
The `files.json` file contains metadata about all files in a given drive. It is uploaded in the *Upload the folder/archive to S3* stage.
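An illustrative `files.json` entry (all values below are made up):

```json
[
  {
    "id": "1a2b3c4d5e",
    "name": "report.docx",
    "md5Checksum": "9e107d9d372bb6826bd81d3542a419d6",
    "path": "Team Drive/Reports/report.docx",
    "permissions": [
      {"emailAddress": "user@example.com", "role": "writer"}
    ]
  }
]
```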
1. Create a new project in the Google Cloud Console and enable the Google Drive API as well as the Admin SDK API
2. (Optional) If your domain has a lot of users and files, you may need to request a quota increase for the Google Drive API. You can do this on the quotas page.
3. Create a new service account in the Google Cloud Console and download the JSON key file. This key file will be used to authenticate the script. Note the Client ID of the service account (Unique ID in the Console).
4. Go to the Google Admin Console and navigate to `Security` -> `Access and data control` -> `API Controls`
5. Click `Manage Domain Wide Delegation`, then click `Add new`
6. Enter the Client ID of the service account you created earlier and the following scopes:
   - `https://www.googleapis.com/auth/admin.directory.user.readonly`
   - `https://www.googleapis.com/auth/drive.readonly`
7. Obtain the `Customer ID` of your Google Workspace domain. This can be found in the Google Admin Console under `Account` -> `Account Settings`.
| Name | Required | Purpose | Type | Default |
|---|---|---|---|---|
| `DELEGATED_ADMIN_EMAIL` | Yes | E-mail address of the super admin account | string | |
| `WORKSPACE_CUSTOMER_ID` | Yes | Customer ID from the Google Admin Console | string | |
| `SERVICE_ACCOUNT_FILE` | Perhaps | Path to the service account `.json` key. If the file does not exist, `SERVICE_ACCOUNT_JSON` will be used to create it | string | `service-account-key.json` |
| `SERVICE_ACCOUNT_JSON` | Perhaps | Service account JSON key encoded in Base64 | string | |
| `S3_BUCKET_NAME` | Yes | Name of the S3 bucket to upload the backup to | string | |
| `S3_ROLE_BASED_ACCESS` | Perhaps | Use role-based access to the S3 bucket. If enabled, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are not required | bool | `false` |
| `AWS_ACCESS_KEY_ID` | Perhaps | AWS Access Key ID | string | |
| `AWS_SECRET_ACCESS_KEY` | Perhaps | AWS Secret Access Key | string | |
| `MAX_DOWNLOAD_THREADS` | No | How many threads (per single drive) are used to download files | int | 20 |
| `MAX_DRIVE_PROCESSES` | No | Each drive gets its own process. This specifies how many drives can be handled concurrently | int | 4 |
| `COMPRESS_DRIVES` | No | Compress the exported drives to a `.zip` file | bool | `false` |
| `COMPRESSION_PROCESSES` | No | How many processes are used to compress the drives (if supported by the algorithm) | int | `cpu_count()` |
| `DRIVE_WHITELIST` | No | Comma-separated list of drive IDs to back up (e.g. `[email protected],0AE1OlXvu8lCKUk9PVA`) | string | |
| `DRIVE_BLACKLIST` | No | Comma-separated list of drive IDs to exclude from the backup (same format as `DRIVE_WHITELIST`) | string | |
| `AUTO_CLEANUP` | No | Automatically delete the local files after the backup is complete | bool | `true` |
| `INCLUDE_SHARED_WITH_ME` | No | Include 'shared with me' files. Applies to user drives only | bool | `true` |
| `JIT_S3_UPLOAD` | No | Upload files to S3 as soon as they are downloaded. Useful when local disk space is limited. `COMPRESS_DRIVES` must be set to `false` | bool | `false` |
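`SERVICE_ACCOUNT_JSON` expects the key file contents Base64-encoded. One way to produce that value (the file path is illustrative):

```python
import base64

def encode_service_account_key(raw: bytes) -> str:
    """Base64-encode the raw service account key file contents."""
    return base64.b64encode(raw).decode("ascii")

# Typical usage (path illustrative):
# with open("service-account-key.json", "rb") as fh:
#     print(encode_service_account_key(fh.read()))
```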
- Drive compression
- Configurable algorithm for file compression
- Configurable metadata fields
- Configurable links behaviour
- AWS S3 role-based access
- Drive whitelist
- Drive blacklist
- Files without an `md5Checksum` are non-binary files (e.g. folders, Google Docs, Sheets, Slides, Forms, etc.)
- If a file (or a folder) is shared with multiple users and `INCLUDE_SHARED_WITH_ME` is enabled, it will be downloaded multiple times (once per user)
- Requires `MAX_DRIVE_PROCESSES` * (size of the largest Google Drive in GB) of free disk space. `COMPRESS_DRIVES` doubles the disk space requirements.
- If short on disk space, enable `JIT_S3_UPLOAD` to upload files to S3 as soon as they are downloaded. At most `MAX_DOWNLOAD_THREADS` * `MAX_DRIVE_PROCESSES` files will be stored locally at any given time.
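A worked example of the disk-space rule above (the numbers are illustrative):

```python
MAX_DRIVE_PROCESSES = 4
largest_drive_gb = 100  # size of the biggest drive in the domain

# Up to MAX_DRIVE_PROCESSES drives are on disk at once.
required_gb = MAX_DRIVE_PROCESSES * largest_drive_gb
# COMPRESS_DRIVES keeps both the folder and the archive around.
required_with_compression_gb = 2 * required_gb

print(required_gb, required_with_compression_gb)  # 400 800
```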