My family uses SmugMug to store and share photos. Because we are repeatedly backing up our smartphones, we wind up with a lot of duplicates.
Although SmugMug has a feature to skip uploading the same filename within the same gallery, it does not check for duplicates across galleries, duplicates already uploaded, or duplicates with different filenames.
Fortunately, SmugMug provides an API through which you can obtain an MD5 checksum for each photo in each album in your account. You can use this checksum to identify duplicate photos, since it's extremely unlikely that different photos will produce identical checksums.
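As a rough illustration of the checksum idea (not the code used in this repository), duplicates can be found by grouping photo records on their MD5 value. The record layout here is an assumption: the `album`, `filename`, and `md5` keys are hypothetical placeholders for whatever fields the API response actually contains, and the sample values are made up.

```python
from collections import defaultdict


def find_duplicates(photos):
    """Group photo records by checksum; keep only groups with 2+ photos."""
    groups = defaultdict(list)
    for photo in photos:
        groups[photo["md5"]].append(photo)
    return {md5: group for md5, group in groups.items() if len(group) > 1}


# Made-up sample records (hypothetical keys and values).
photos = [
    {"album": "Phone backup", "filename": "IMG_0001.jpg",
     "md5": "9e107d9d372bb6826bd81d3542a419d6"},
    {"album": "Vacation", "filename": "beach.jpg",
     "md5": "9e107d9d372bb6826bd81d3542a419d6"},
    {"album": "Vacation", "filename": "sunset.jpg",
     "md5": "e4d909c290d0fb1ca068ffaddf22cbd0"},
]

duplicates = find_duplicates(photos)
for md5, group in duplicates.items():
    print(md5, [f'{p["album"]}/{p["filename"]}' for p in group])
```

Because the grouping key is the checksum alone, this catches copies in different galleries and copies with different filenames.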
I created a Python library with a GUI that allows me to review each set of duplicates to decide which ones I want to delete.
In the course of writing this program, I discovered that about one third of my photos were duplicates, with an average of three copies per duplicate.
The current default selection is to keep the photo in the smallest album, with the smallest filename if there's a tie.
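The default rule above could be sketched roughly as follows. This is an illustration rather than the repository's actual code: `default_keeper`, `album_sizes`, and the record keys are hypothetical names, and "smallest filename" is assumed here to mean the filename that sorts first alphabetically.

```python
def default_keeper(group, album_sizes):
    """Pick the photo to keep from one set of duplicates.

    Prefers the photo whose album holds the fewest photos; ties are broken
    by the filename that sorts first (an assumed reading of "smallest").
    """
    return min(group, key=lambda p: (album_sizes[p["album"]], p["filename"]))


# Made-up example: two albums of equal size force the filename tie-break.
album_sizes = {"Phone backup": 500, "Vacation": 40, "Best of": 40}
group = [
    {"album": "Phone backup", "filename": "IMG_0001.jpg"},
    {"album": "Vacation", "filename": "beach.jpg"},
    {"album": "Best of", "filename": "a_beach.jpg"},
]

keeper = default_keeper(group, album_sizes)
to_delete = [p for p in group if p is not keeper]
print("keep:", keeper["album"], keeper["filename"])
```

Sorting on a tuple keeps both criteria in one place, so changing the default later only means changing the key function.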
Assuming you already have some photos on a SmugMug account, you can:
- Apply for a SmugMug API key (accept the API 2.0 beta T&C). This will give you your `API_KEY` and `API_SECRET`. If you do this from within your account, the application will be linked to your account.
- From your SmugMug home page, go to Account Settings > Privacy. Click on TOKEN in your App Name to display the values of `ACCESS_TOKEN` and `ACCESS_SECRET`.
- Clone this repository and make a local copy of `credentialsTemplate.py` called `credentials.py`. Paste your own values of the `USER_NAME`, `API_KEY`, `API_SECRET`, `ACCESS_TOKEN`, and `ACCESS_SECRET` strings at the indicated positions.
- Ensure that you have Python 3 and the following libraries: `requests`, `PIL`, `tkinter`, and `pandas`. If not, use `pip install pandas` etc.
- Run from terminal with `python mugMatch.py` (running from IDE is not recommended).
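Before running, the library requirement in the steps above can be checked with a small script along these lines. It is only a sketch and not part of this repository: `REQUIRED` and `missing_modules` are hypothetical names. Note that the `PIL` module is installed with `pip install Pillow`, and `tkinter` ships with most Python installers but comes as a separate system package on some Linux distributions.

```python
import importlib.util

# Import name -> pip package name (None: normally bundled with Python).
REQUIRED = {
    "requests": "requests",
    "PIL": "Pillow",
    "tkinter": None,
    "pandas": "pandas",
}


def missing_modules(required=REQUIRED):
    """Return the import names that cannot be found in this environment."""
    return [name for name in required
            if importlib.util.find_spec(name) is None]


for name in missing_modules():
    package = REQUIRED[name]
    if package:
        print(f"{name} is missing: try `pip install {package}`")
    else:
        print(f"{name} is missing: install it with your system package manager")
```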
Note: if an app will have other users, you can implement authorization with OAuth1, but this is left as an exercise to the reader.
Future improvements could include better default selections (based on which image has been tagged or titled, but we don't do a lot of that).
I would also like to learn to use the `Grid` layout instead of `Pack` in `tkinter`, so I can create a table of image attributes instead of long image labels.
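A table of this kind might look roughly like the sketch below, which places one `Label` per cell using the grid geometry manager. The function names, attribute names, and sample values are all hypothetical; this is one possible approach, not something the program does today.

```python
def attribute_rows(photos, columns):
    """Turn photo records into table rows: one header row, then one row per photo."""
    header = list(columns)
    body = [[str(photo.get(col, "")) for col in columns] for photo in photos]
    return [header] + body


def build_table(parent, rows):
    """Place one Label per cell inside `parent` using grid()."""
    # Imported here so attribute_rows can be used without a display.
    import tkinter as tk

    for r, row in enumerate(rows):
        for c, text in enumerate(row):
            label = tk.Label(parent, text=text, borderwidth=1,
                             relief="solid", padx=4, pady=2)
            label.grid(row=r, column=c, sticky="nsew")


# Made-up sample data.
photos = [
    {"album": "Vacation", "filename": "beach.jpg", "size": "2.1 MB"},
    {"album": "Phone backup", "filename": "IMG_0001.jpg"},
]
rows = attribute_rows(photos, ["album", "filename", "size"])
```

To display it, create a `tk.Frame`, call `build_table(frame, rows)`, and then pack that frame into the existing window. Keeping the table in its own frame matters because `pack` and `grid` should not be mixed among widgets that share the same parent.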