Automated QC of raw images
This script helps automate the QC process by opening each of your images in a loop, prompting you for a QC rating at the command line, and saving the results as a .csv file. If you run the script multiple times, it picks up where you left off and skips images you've already rated.
See our QC guidelines for more details.
import csv
from datetime import datetime
import glob
import os
import pandas as pd
import re
import shutil
import sys
# ---------- SETUP ---------- #
# Usage:
# module load anaconda minc-toolkit-v2
# python run_qc.py
# Ctrl + C to exit any time
### DEFINE THESE ###
image_paths = '/path/to/images/*mprage.mnc'
subj_pattern = r'[\d]{4}'
working_dir = os.getcwd()
columns = ['subject', 'rating', 'notes'] # for QC df
# Master file for tracking QC ratings
# We want to append to this if it already exists
# Just to be safe - make a timestamped copy in an archive directory
out_file = f'{working_dir}/qc_ratings.csv'
if os.path.isfile(out_file):
    os.makedirs(f'{working_dir}/archive', exist_ok=True)
    timestamp = datetime.now().isoformat()
    shutil.copyfile(out_file, f'{working_dir}/archive/qc_ratings_{timestamp}.csv')
    # Read subject IDs as strings so leading zeros (e.g., 0042) are preserved
    qc_data = pd.read_csv(out_file, dtype={'subject': str})
else:
    with open(out_file, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(columns)
    qc_data = pd.DataFrame(columns=columns)
# ---------- RUN QC ---------- #
def save_data(ratings):
    # Merge new ratings with existing ones
    ratings = pd.DataFrame(ratings, columns=columns)
    new_qc_data = pd.concat([qc_data, ratings])
    new_qc_data.to_csv(out_file, index=False)
image_paths = sorted(glob.glob(image_paths))
new_ratings = [] # Running list of new QC ratings to append to DF
try:
    for image_path in image_paths:
        subj_id = re.findall(subj_pattern, image_path)[0]
        # Find existing ratings for this subject
        # If they exist, skip rating this subject
        subj_df = qc_data.loc[qc_data['subject'].astype(str)==subj_id]
        if (len(subj_df) > 0):
            continue
        # Otherwise, show image, get QC rating and any notes
        else:
            os.system(f'Display -gray {image_path}')
            rating = input('Rating: ')
            notes = input('Notes: ')
            new_ratings.append([subj_id, rating, notes])
except KeyboardInterrupt:
    save_data(new_ratings)
save_data(new_ratings)
To use the script, copy the code above into a file with a .py extension (e.g., run_qc.py), and run the following:
module load anaconda minc-toolkit-v2
python run_qc.py
For the script to work properly, you should define the two variables at the top of the script.
- image_paths: this variable defines the directory and file pattern that match your images. The script above uses /path/to/images/*mprage.mnc, which matches every file ending with mprage.mnc directly inside /path/to/images/. A pattern such as /path/to/data/*/*mprage.mnc would instead look in every subdirectory of /path/to/data/ (e.g., one per subject) for files ending with mprage.mnc.
- subj_pattern: this defines the regular expression that identifies the subject ID within each file path. In the above example, r'[\d]{4}' matches any run of 4 digits (i.e., in this dataset, each subject has a 4-digit ID). The script uses the first match found in the path. See defining regular expressions in Python for more details.
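Before running the full script, it can help to try subj_pattern on a few paths. The following is a minimal sketch, not part of the original script: the example paths are made up for illustration, and extract_subject_id is a hypothetical helper name.

```python
import re

subj_pattern = r'[\d]{4}'

# Hypothetical paths, for illustration only
example_paths = [
    '/path/to/data/0123/0123_mprage.mnc',
    '/path/to/data/4567/4567_mprage.mnc',
]

def extract_subject_id(path, pattern=subj_pattern):
    """Return the first match of `pattern` in `path`, or None if nothing matches."""
    matches = re.findall(pattern, path)
    return matches[0] if matches else None

for path in example_paths:
    print(path, '->', extract_subject_id(path))
```

Because the first match is used, a 4-digit run that appears earlier in the path (e.g., a year in a directory name) would be picked up instead of the subject ID, so a more specific pattern may be needed for such layouts.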
The script will show each image in grayscale using Display. After you close the window, it will prompt you for a rating at the command line, as well as for any notes about your rating (just press Enter if you'd like to skip adding notes).
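If your image paths may contain spaces or other characters the shell treats specially, one option is to pass the command as a list rather than a single string. This is a sketch of an alternative to the os.system call, not the script's own method; display_command and show_image are hypothetical helper names, and the Display -gray invocation is taken from the script above.

```python
import subprocess

def display_command(image_path):
    """Build the argument list for showing one image in grayscale."""
    return ['Display', '-gray', image_path]

def show_image(image_path):
    """Open the image and wait until the viewer window is closed."""
    # With a list and no shell, the path is passed as a single argument,
    # so spaces in the filename need no quoting.
    return subprocess.run(display_command(image_path)).returncode
```

As with os.system, the call returns only after the viewer exits, so the rating prompt would still appear once the window is closed.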
If you want to quit the program at any time, press Ctrl + C. The script will save your progress in a .csv file. If you run the script again, it will copy your existing ratings to a new file in an archive folder (in case anything breaks), then pick up where you left off and only prompt you to rate the unseen images.
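The "pick up where you left off" behavior comes down to skipping any image whose subject ID already has a rating. A minimal standard-library sketch of that idea, with made-up paths and a hypothetical helper name unrated_images (the script itself does this check with pandas):

```python
import re

def unrated_images(image_paths, rated_ids, subj_pattern=r'[\d]{4}'):
    """Return the sorted paths whose subject ID is not yet in `rated_ids`."""
    rated = {str(s) for s in rated_ids}
    remaining = []
    for path in sorted(image_paths):
        matches = re.findall(subj_pattern, path)
        # Paths with no ID match are skipped in this sketch
        if matches and matches[0] not in rated:
            remaining.append(path)
    return remaining

# Hypothetical example: subject 0001 was rated in an earlier session
paths = ['/data/0002_mprage.mnc', '/data/0001_mprage.mnc', '/data/0003_mprage.mnc']
print(unrated_images(paths, rated_ids=['0001']))
```

IDs are compared as strings here, so '0001' and '1' count as different subjects.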