This repo contains the code to reproduce the experiments in "Does Knowledge Distillation Really Work?".
We use Hydra for configuration.
To use Hydra's multirun feature, use the -m flag (e.g. to run multiple trials, use -m trial_id=0,1,...).
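For example, a complete multirun command over three trials (mirroring the training commands shown later in this README) looks like this:
python scripts/image_classification.py -m trial_id=0,1,2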
git clone https://github.com/samuelstanton/gnosis.git
cd gnosis
conda create --name gnosis-env python=3.8
conda activate gnosis-env
python -m pip install -r requirements.txt
python -m pip install -e .
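As a quick sanity check (assuming the editable install registers the package under the name gnosis, which this README does not state explicitly), you can verify the installation with
python -c "import gnosis"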
If you already have the datasets downloaded, just create a symlink. If you skip this step, the datasets will be downloaded automatically.
mkdir ./data
ln -s <DATASET_PARENT_DIR> ./data/datasets
For the sake of efficiency, we recommend you train your teacher and student models separately.
python scripts/image_classification.py -m teacher.use_ckpts=False classifier.depth=20 trainer.num_epochs=200 trainer.optimizer.lr=0.1 trainer.lr_scheduler.eta_min=0. trainer.distill_teacher=False dataloader.batch_size=256 trial_id=0,1,2
python scripts/text_classification.py -m teacher.use_ckpts=False trainer.distill_teacher=False trial_id=0,1,2
To perform synthetic data augmentation, you'll first need to train a GAN checkpoint.
python scripts/image_generation.py
python scripts/image_classification.py -m trial_id=0,1,2 exp_name=student_resnet_baseline_results
python scripts/image_classification.py -m trial_id=0,1,2 exp_name=student_resnet_mixup_results distill_loader.mixup_alpha=1.
python scripts/image_classification.py -m trial_id=0,1,2 exp_name=student_resnet_synth-aug_results distill_loader.synth_ratio=0.2
python scripts/text_classification.py -m trial_id=0,1,2 exp_name=student_lstm_baseline_results
By default, program output and checkpoints are stored locally in automatically generated subdirectories.
To log results to an S3 bucket (AWS credentials must be configured), use
logger=s3 logger.bucket_name=<BUCKET_NAME>
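These overrides are appended to a training command; an illustrative combination with the baseline student run above would be
python scripts/image_classification.py -m trial_id=0,1,2 logger=s3 logger.bucket_name=<BUCKET_NAME>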
To load checkpoints from S3, use
ckpt_store=s3 s3_bucket=<BUCKET_NAME> teacher.ckpt_path=<TEACHER_REMOTE_PATH> density_model.ckpt_path=<DM_REMOTE_PATH>
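As with the logger overrides, these are meant to be appended to a training command; an illustrative (unverified) combination is
python scripts/image_classification.py ckpt_store=s3 s3_bucket=<BUCKET_NAME> teacher.ckpt_path=<TEACHER_REMOTE_PATH> density_model.ckpt_path=<DM_REMOTE_PATH>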
Users are encouraged to consult the configuration files in the config directory.
Almost every aspect of the program is configurable from the command line.
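For instance, the overrides already used above compose freely, so a hypothetical single-trial run that changes the classifier depth, learning rate, and batch size at once would look like
python scripts/image_classification.py classifier.depth=20 trainer.optimizer.lr=0.1 dataloader.batch_size=256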
This project is made freely available under an MIT license. If you make use of any part of the code, please cite
@article{stanton2021does,
    title={Does Knowledge Distillation Really Work?},
    author={Stanton, Samuel and Izmailov, Pavel and Kirichenko, Polina and Alemi, Alexander A and Wilson, Andrew Gordon},
    journal={arXiv preprint arXiv:2106.05945},
    year={2021}
}
The SN-GAN implementation and evaluation code are copied from here.
The CKA implementation is copied from here.