Distributed Knowledge Transfer (DKT) method for Distributed Continual Learning
Master's thesis project for the University of Pisa. This work introduces the Distributed Continual Learning (DCL) research area and the Distributed Knowledge Transfer (DKT) architecture.
Please notice that the code uploaded here needs a pre-trained model to work; do not hesitate to contact me for further information.
In Distributed Continual Learning (DCL), the term "distributed" refers to a complex and highly interconnected environment in which multiple agents work together to improve their performance by exchanging information during the training process. What distinguishes the DCL approach is its fusion with the continual learning setting, whereby models continuously exchange their state with each other at regular intervals, creating a highly dynamic and adaptive training process.
The proposed method applies knowledge distillation to the distributed continual scenario. The proposed architecture attaches two distinct classification heads (fig. 3.2 in the thesis) to a shared feature extractor. The first head, called the continual learning (CL) head, uses cross-entropy loss to optimize the model's performance on the hard targets of the current experience, while the second head, called the student (ST) head, adopts another loss function (typically the KD loss or MSE) using as target the predictions of another model on the very same experience.
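To make the two-head design concrete, here is a minimal PyTorch sketch of such an architecture. The class and attribute names (DKTModel, cl_head, st_head) and the ResNet-18 backbone are illustrative assumptions, not the exact implementation used in the thesis.

```python
import torch.nn as nn
from torchvision.models import resnet18


class DKTModel(nn.Module):
    """Sketch of a DKT-style network: a shared feature extractor
    followed by two classification heads (CL and ST)."""

    def __init__(self, num_classes: int = 100, feat_dim: int = 512):
        super().__init__()
        backbone = resnet18()
        backbone.fc = nn.Identity()  # keep only the 512-d feature extractor
        self.feature_extractor = backbone
        self.cl_head = nn.Linear(feat_dim, num_classes)  # continual learning head
        self.st_head = nn.Linear(feat_dim, num_classes)  # student head

    def forward(self, x):
        feats = self.feature_extractor(x)
        return self.cl_head(feats), self.st_head(feats)
```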
The loss function is the sum of two head-specific loss functions:

$$\mathcal{L} = \mathcal{L}_{CL} + \mathcal{L}_{ST}$$

The first term of the sum is relative to the Continual Learning (CL) head: it is the classic cross-entropy between the CL head's predictions and the hard targets of the current experience. The second term is relative to the Student (ST) head: it is the KD loss (in this case MSE has been used) between $\hat{q}^{t}_{c}$, the soft targets of the teacher model, and $\hat{q}^{cl}_{c}$, the soft targets of the student head, both distilled at the same temperature.
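As an illustration, the snippet below sketches how such a combined loss could be computed in PyTorch. The function name, the equal weighting of the two terms, and the temperature value are assumptions; the thesis may weight or schedule them differently.

```python
import torch.nn.functional as F


def dkt_loss(cl_logits, st_logits, teacher_logits, targets, temperature=2.0):
    """Sketch of the combined DKT loss: cross-entropy on the CL head
    plus an MSE distillation term on the ST head."""
    # CL term: standard cross-entropy against the hard targets
    # of the current experience.
    ce_loss = F.cross_entropy(cl_logits, targets)

    # ST term: MSE between the soft targets of the teacher and of the
    # student head, both softened with the same temperature.
    teacher_soft = F.softmax(teacher_logits.detach() / temperature, dim=1)
    student_soft = F.softmax(st_logits / temperature, dim=1)
    kd_loss = F.mse_loss(student_soft, teacher_soft)

    return ce_loss + kd_loss
```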
The requirements are contained in the requirements.txt file and can be installed via pip:
pip install -r requirements.txt
The project has been developed using the Avalanche Continual Learning Library, which is based on PyTorch. Most of the dependencies are already included in the Avalanche installation.
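As a quick orientation to the library, below is a minimal sketch of a plain (non-DKT) continual training loop on Split CIFAR-100 with Avalanche. The strategy, hyperparameters, and import paths are assumptions and may differ from the scripts in this repository as well as across Avalanche versions.

```python
import torch
from torch.nn import CrossEntropyLoss
from torch.optim import SGD
from torchvision.models import resnet18

from avalanche.benchmarks.classic import SplitCIFAR100
# In older Avalanche releases: from avalanche.training.strategies import Naive
from avalanche.training.supervised import Naive

# Split CIFAR-100 into a stream of experiences.
benchmark = SplitCIFAR100(n_experiences=10)

model = resnet18(num_classes=100)
strategy = Naive(
    model,
    SGD(model.parameters(), lr=0.01, momentum=0.9),
    CrossEntropyLoss(),
    train_mb_size=32,
    train_epochs=1,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# Train sequentially on each experience, then evaluate on the test stream.
for experience in benchmark.train_stream:
    strategy.train(experience)
    strategy.eval(benchmark.test_stream)
```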
Please notice that the requirements were extracted directly from the conda environment, so they may need to be pruned.
You can use a newer version of CUDA/PyTorch, as I was limited by the NVIDIA driver of the machine I was working on.
The thesis consists of three experiments, and each one can be run with its specific script:
Experiment 1 -----> cifar100_training.py
Experiment 2 -----> splitcifar100_pretrained.py
Experiment 3 -----> step_training.py
They can be executed via Python, for example:
python cifar100_training.py
If you want to know more about this project, you can consult my Master's thesis.