A Python-based email migration tool designed to migrate emails from a Gmail inbox (or other IMAP sources) to another IMAP mailbox. The script is highly configurable and extensible, featuring folder mapping, attachment extraction (with external storage and link replacement), checkpointing for restartability, and detailed logging and statistics.
- Overview
- Features
- Prerequisites
- Installation
- Configuration
- .env File
- config.json
- Usage
- Simulation Mode
- Live Mode
- Folder Mapping & Label Translation
- Attachment Extraction
- Checkpointing and Resuming
- Logging and Statistics
- Error Handling and Reconnection
- Extensibility
- License
This migration script uses IMAP to fetch emails from a source (typically Gmail) and appends them into a destination mailbox. It preserves metadata such as flags, internal dates, and folder structure. Additionally, the script includes features for:
-
Folder mapping: Converting Gmail’s label-based folder structure to a traditional IMAP folder structure.
-
Attachment extraction: Optionally extracting attachments that match certain criteria (file type and size), storing them externally, and replacing the attachments in the migrated email with file links.
-
Checkpointing: Saving progress to resume migration if the process is interrupted.
-
Statistics collection: Recording various metrics (e.g., number of emails per sender or folder).
-
Robust error handling and reconnection: Automatically reconnecting if the IMAP connection is lost.
-
Detailed logging: Logging output to both console (INFO level) and a file (DEBUG level).
Folder Mapping:
-
Uses a mapping function to translate Gmail labels to destination folder names.
-
Prepares missing destination folders automatically based on the source structure.
Attachment Extraction:
-
Extracts attachments of specific file types and sizes.
-
Saves extracted attachments in a structured folder hierarchy (e.g., organized by month based on email sent date).
-
Replaces attachments in the migrated email with a file:// link.
Checkpointing & Resume:
-
Automatically saves progress (including counters and last processed email index) to a JSON file.
-
Resumes migration from the last checkpoint if the script is interrupted.
Statistics Collection (Expandable):
-
Uses helper functions (e.g.
add_statistic()
) to record information such as file extensions, source folders, and senders. -
Stores statistics in a JSON file for later analysis.
Robust Logging:
-
Logs are written to a file and INFO-level messages are also printed to the console.
-
Logs include details about folder mapping, email processing, reconnections, and errors.
Error Handling & Reconnection:
-
Automatically reconnects on connection failures using exponential backoff.
-
Handles CTRL+C gracefully, performing cleanup before exit.
Python 3.7+
-
The following Python modules (most are part of the standard library):
-
imaplib
-
email
-
json
-
os
-
time
-
base64
-
hashlib
-
argparse
-
shlex
-
logging
Third-party modules:
-
python-dotenv
(for loading the.env
file) -
tqdm
(for progress bars) -
python-dateutil
(for date parsing)
Install third-party modules via pip:
pip install python-dotenv tqdm python-dateutil
Alternative
pip install -r requirements.txt
Clone the Repository:
git clone https://github.com/bwaide/MailMigration.git
cd mailmigration
Set Up a Virtual Environment (Recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install Dependencies:
pip install -r requirements.txt
The migration script is configured via environment variables (loaded from a .env
file) and a JSON configuration file (config.json
).
Google doesn't allow login via IMAP with your user password for security reasons. You therefore have to create an Application password to be used just for the migration. It is strongly advised to delete the app password after the migration is done. You can create the Application password here: https://myaccount.google.com/apppasswords
Note: You have to have Two-Factor-Authentication (2FA) turned on to be able to create application passwords.
Create a .env
file (or rename the provided .env_template
to .env
) with the following variables:
SOURCE_IMAP_SERVER=imap.gmail.com
SOURCE_EMAIL[email protected]
SOURCE_PASSWORD=yourpassword
DEST_IMAP_SERVER=imap.destination.com
DEST_EMAIL[email protected]
DEST_PASSWORD=yourdestpassword
The configuration file contains global settings and module-specific settings. Pay attention to the root_folder
, as this name differs between different locales.
Example configuration:
{
"global": {
"folder_mapping": {
"Inbox": "INBOX",
"Gesendet": "Gesendete Objekte",
"Sent Mail": "Gesendete Objekte",
"Sent": "Gesendete Objekte",
"Alle Nachrichten": "Archive",
"All Mail": "Archive",
"Important": "INBOX",
"Markiert": "INBOX",
"Starred": "INBOX",
"Wichtig": "INBOX",
"Papierkorb": null,
"Trash": null,
"Spam": null,
"Entwürfe": null,
"Drafts": null,
"Draft": null
},
"archive_folder": "Archive",
"folder_prefix": "INBOX.",
"log_file": "migration.log",
"root_folder": "[Gmail]/All Mail",
"debug_delay": 0.0,
"labels_as_flagged": [
"Important",
"[Gmail]/Important",
"Starred",
"[Gmail]/Starred"
]
},
"extract_attachments": {
"enabled": true,
"attachment_whitelist": [".pdf", ".zip", ".docx", ".xlsx"],
"max_attachment_size": 10240,
"external_storage_path": "/Users/bwaide/Documents/Development/GoogleMigration/downloads",
"storage_url": "file:///Users/bwaide/Documents/Development/GoogleMigration/downloads"
}
}
Note: You can expand this configuration with additional modules or plugins as needed.
Run the script from the command line. It supports a simulation mode:
- Simulation Mode:
Runs the migration without actually transferring emails. Check for warnings and errors inmigration.log
afterwards. Use the output instatistics.json
to adjust the configuration, especially the folder mapping and settings for attachment extraction.
python migrate.py --simulate`
- Live Mode:
Runs the migration for real. You will be prompted to confirm before actual migration begins.
python migrate.py
Note: For larger mailboxes the migration can take hours. From my experience running this script on a laptop performs at roughly 2 - 4 mails/second. For mailboxes with 10,000 mails this means 1 hour.
The script converts Gmail’s label-based folder structure into a destination folder structure using a centralized mapping function.
-
Mapping Rules:
- If the input contains
[Gmail]/Inbox
(or cleans to "INBOX"), the destination is"INBOX"
. - If a label exists in the global
folder_mapping
(fromconfig.json
), that value is used. - Otherwise, the script chooses the most specific (deepest) label and prepends the global
folder_prefix
. - If no label is available, it falls back to
archive_folder
.
- If the input contains
-
Implementation:
The functionmap_labels_to_destination(gmail_labels)
is used throughout the script for both folder creation and email migration. This ensures consistency.
If enabled in the configuration, the script will:
- Examine each email for attachments.
- For attachments that meet criteria (file extension, size):
- Save them to an external location.
- Organize them into month-based subfolders (e.g.,
"2023_10"
). - Replace the attachment in the email with a link starting with
file://
pointing to the saved file.
- For attachments that don’t meet the criteria:
They are kept as part of the email.
-
Checkpoint File:
The script saves its progress (last processed email index, total size, and skipped count) to a file (default:migration_checkpoint.json
). -
Resumption:
On start, the script loads the checkpoint and resumes from the last processed email. -
Automatic Cleanup:
The checkpoint file is automatically deleted if the migration completes successfully.
-
Logging:
Logging is configured to write DEBUG-level messages to a file and INFO-level messages to the console.
Log messages include details about folder mapping, email processing, errors, and reconnections. -
Statistics (Expandable):
The script can be extended to collect statistics (for example, counts of emails per sender, file extensions, etc.) via helper functions likeadd_statistic(category, key, amount)
.
- The script handles
KeyboardInterrupt
(CTRL+C) gracefully and performs cleanup. - If an IMAP connection is lost (e.g., due to a system error), the script automatically attempts to reconnect using exponential backoff.
- In case of an unexpected crash, the checkpoint file allows the migration to resume without duplicating emails.
The configuration system is modular, with a nested JSON structure supporting multiple modules (global settings and module-specific settings such as for attachment extraction).
You can extend the system by adding new sections to the JSON config and writing corresponding helper functions.
MIT License See https://github.com/bwaide/MailMigration/blob/main/LICENSE for details.
Björn Waide Contact me at mailto:[email protected]