Implementation of calibration methods for neural networks. Each calibrator is provided as a Python function that operates directly on the output logits. calibrate.py contains the calibrator implementations; metrics and visualization methods for confidence outputs are in train.py.
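As a minimal sketch of this logits-in, probabilities-out interface (the helper name and shapes below are illustrative, not the repo's actual API), a calibrator such as vector scaling can be expressed as a plain function on logits:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def make_vector_scaler(w, b):
    """Hypothetical helper: build a vector-scaling calibrator from a learned
    per-class scale w and bias b. The returned object is just a function
    that maps raw logits to calibrated probabilities."""
    def calibrate(logits):
        return softmax(logits * w + b)
    return calibrate

# With w = 1 and b = 0 the calibrator reduces to a plain softmax.
identity = make_vector_scaler(np.ones(3), np.zeros(3))
```
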
Implementations of newly introduced calibrators will be added continuously.
Train a neural network from scratch and calibrate its confidence:
```shell
python main.py --dataset <dataset> --model_type <model> --optimizer <optim>
```
Load an already trained network and only calibrate its confidence:
```shell
python main.py --dataset <dataset> \
--model_type <model> \
--optimizer <optim> \
--load_model <path_to_models>
```
Code for models such as resnet110 was copied and modified from akamaster's repository.
- Histogram Binning
- Matrix Scaling
- Vector Scaling
- Temperature Scaling
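For instance, temperature scaling, the simplest of the methods above, divides the logits by a single learned scalar T before the softmax; a minimal sketch (the value T = 2 and the toy logits are illustrative, not results from this repo):

```python
import numpy as np

def temperature_scale(logits, T):
    """Divide logits by a scalar temperature T; T > 1 softens the softmax."""
    return logits / T

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 0.5, -1.0]])
p_raw = softmax(logits).max()                          # ~0.786
p_cal = softmax(temperature_scale(logits, 2.0)).max()  # ~0.590, lower confidence
```

In practice T is fit on a held-out validation set by minimizing negative log-likelihood; since it only rescales logits, it never changes the predicted class.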
- Some implementations available online use the same dataset for both training and validation. However, since the probability outputs of a neural network are heavily overfitted to its training set, calibrators fail to achieve reasonable performance in this setting.
- The histogram binning method requires the user to choose an adequate number of bins, and this value should be tuned carefully in practice. Furthermore, when designing a histogram binning algorithm, one must decide how to split the bins (equally spaced or equally sized). Our implementation uses equally spaced bins.
- The temperature scaling performance reported in the paper appears to have been achieved with the LBFGS optimizer for the temperature value. In our experiments, results fluctuated considerably even under small changes to the optimizer's hyperparameters (learning rate, number of iterations).
- The original paper reports an Expected Calibration Error of roughly 0.25 for matrix scaling on CIFAR100. Such results also seem to arise when the LBFGS optimizer is used to train the calibrator; when SGD or Adam is used instead, the error drops to about 0.15.
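The ECE values quoted above can be computed with equal-width confidence bins, the same equally spaced split used in our histogram binning; a sketch with a hypothetical function name (the default of 15 bins is a common choice, not necessarily the repo's):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Sketch of ECE: weighted average, over equal-width confidence bins,
    of |accuracy - mean confidence| within each non-empty bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, n = 0.0, len(confidences)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.sum() / n * gap
    return ece
```

A perfectly calibrated model (e.g. 90% accuracy among predictions made at 0.9 confidence) yields an ECE of 0; overconfident predictions inflate the per-bin gap and hence the score.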
- On Calibration of Modern Neural Networks, ICML 2017