PyTorch implementation of Data Free Quantization Through Weight Equalization and Bias Correction with some ideas from ZeroQ: A Novel Zero Shot Quantization Framework.
- Tested with MobileNetV2
  - Metric: accuracy (Acc.) on the ImageNet validation set
- Tested with Deeplab-v3-plus_mobilenetv2
  - Metrics: mIOU on the Pascal VOC 2012 val set and on the Pascal VOC 2007 test set
- Tested with MobileNetV2 SSD-Lite model
  - Metrics: mAP (12 metric) on the Pascal VOC 2012 val set and mAP (07 metric) on the Pascal VOC 2007 test set
There are 6 arguments, all of which default to False:
- quantize: whether to quantize parameters and activations.
- relu: whether to replace ReLU6 with ReLU.
- equalize: whether to perform cross layer equalization.
- correction: whether to apply bias correction.
- clip_weight: whether to clip weights to the range [-15, 15] (for convolution and linear layers).
- distill: whether to use distilled data to set the min/max range for activation quantization.
Run the equalized model with:
python main_cls.py --quantize --relu --equalize
Run the equalized and bias-corrected model with:
python main_cls.py --quantize --relu --equalize --correction
Run the equalized and bias-corrected model with distilled data:
python main_cls.py --quantize --relu --equalize --correction --distill
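As a rough illustration of how six boolean flags that all default to False behave on the command line, here is a minimal `argparse` sketch. The flag names come from the list above; the `build_parser` helper and the help strings are assumptions for illustration, not the actual contents of `main_cls.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: every flag is a switch that is False unless passed."""
    parser = argparse.ArgumentParser(description="DFQ options (illustrative sketch)")
    flags = [
        ("--quantize", "quantize parameters and activations"),
        ("--relu", "replace ReLU6 with ReLU"),
        ("--equalize", "perform cross layer equalization"),
        ("--correction", "apply bias correction"),
        ("--clip_weight", "clip weights to [-15, 15]"),
        ("--distill", "use distilled data for activation ranges"),
    ]
    for name, help_text in flags:
        # store_true gives a default of False and sets True when the flag is present
        parser.add_argument(name, action="store_true", help=help_text)
    return parser


# Equivalent to: python main_cls.py --quantize --relu --equalize
args = build_parser().parse_args(["--quantize", "--relu", "--equalize"])
print(args.quantize, args.relu, args.equalize, args.correction)
```

Flags that are not passed (here `--correction`, `--clip_weight` and `--distill`) stay False, so each step is opt-in.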
According to the recent ZeroQ paper, we can distill fake data that matches the statistics stored in the batch-normalization layers, then use it to set the min/max value range for activation quantization.
This does not require every conv to be followed by a batch-norm layer, and using distilled data should produce better and more stable results (the method from DFQ failed on some models because the estimated value range was too large).
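The idea can be sketched roughly as follows: optimize a random input batch so that the mean and variance seen at each batch-norm layer match that layer's stored running statistics, then read activation ranges off the result. This is only a minimal sketch under several assumptions: the tiny model is an untrained stand-in, and the Gaussian initialization, learning rate and fixed step count are illustrative choices, not the initialization or early-stop criterion used in this repo or in ZeroQ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a pretrained network (assumption); a real run would load trained weights.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
).eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the input data is optimized

bn_inputs = []  # (layer, input tensor) pairs recorded on each forward pass


def save_input(module, inputs, output):
    bn_inputs.append((module, inputs[0]))


for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.register_forward_hook(save_input)


def bn_stat_loss():
    """Squared distance between batch statistics and the stored BN statistics."""
    loss = 0.0
    for bn, x in bn_inputs:
        mean = x.mean(dim=(0, 2, 3))
        var = x.var(dim=(0, 2, 3), unbiased=False)
        loss = loss + ((mean - bn.running_mean) ** 2).mean()
        loss = loss + ((var - bn.running_var) ** 2).mean()
    return loss


data = torch.randn(16, 3, 32, 32, requires_grad=True)  # assumed initialization
optimizer = torch.optim.Adam([data], lr=0.05)
losses = []
for step in range(50):
    bn_inputs.clear()
    optimizer.zero_grad()
    model(data)
    loss = bn_stat_loss()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
print(f"BN statistic loss: {losses[0]:.4f} -> {losses[-1]:.4f}")

# The distilled batch can then be used to record an activation range.
with torch.no_grad():
    out = model(data)
act_min, act_max = out.min().item(), out.max().item()
print(f"activation range: [{act_min:.4f}, {act_max:.4f}]")
```

Because the loss only touches batch-norm statistics, no real training or validation images are needed at any point.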
Here are some modifications that differ from the original ZeroQ implementation:
- Initialization of distilled data
- Early stop criterion
I also think distilled data can be used to optimize cross layer equalization and bias correction. The results will be updated as soon as I get it to work.
For cross layer equalization, it actually performs worse than the standard method from DFQ on the MobileNetV2 classification task. However, it offers a possible way to optimize structures such as branches.
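For reference, the standard DFQ cross layer equalization mentioned above can be sketched for a single pair of layers. The sketch assumes two linear layers with a ReLU in between and uses the DFQ scale s_i = sqrt(r1_i * r2_i) / r2_i, where r1_i and r2_i are the per-channel weight ranges of the two layers. The `equalize_pair` helper is hypothetical and is not the code in this repo, which also has to handle convolutions and batch-norm folding.

```python
import torch


def equalize_pair(w1, b1, w2):
    """Rescale channel i of layer 1 by 1/s_i and the matching input of layer 2 by s_i."""
    r1 = w1.abs().amax(dim=1)      # range of each output channel of layer 1
    r2 = w2.abs().amax(dim=0)      # range of each input channel of layer 2
    s = torch.sqrt(r1 * r2) / r2   # assumes r2 has no zero entries
    return w1 / s[:, None], b1 / s, w2 * s[None, :]


torch.manual_seed(0)
w1, b1, w2 = torch.randn(8, 4), torch.randn(8), torch.randn(3, 8)
x = torch.randn(5, 4)

w1_eq, b1_eq, w2_eq = equalize_pair(w1, b1, w2)

# ReLU is positively homogeneous, so the network output is unchanged by the rescaling.
y_ref = torch.relu(x @ w1.T + b1) @ w2.T
y_eq = torch.relu(x @ w1_eq.T + b1_eq) @ w2_eq.T
print(torch.allclose(y_ref, y_eq, atol=1e-4))

# After equalization both layers have the same per-channel range sqrt(r1_i * r2_i).
print(w1_eq.abs().amax(dim=1))
print(w2_eq.abs().amax(dim=0))
```

The rescaling relies on ReLU satisfying relu(s * x) = s * relu(x) for s > 0. ReLU6 does not have this property, which is why the `relu` option replaces it.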
The 'Int8' model in this repo is actually a simulation of 8-bit quantization; the actual computation is done in floating point.
This is done by quantizing and then dequantizing the parameters of each layer and the activations between two consecutive layers,
which means each tensor still has dtype 'float32' but contains at most 256 (2^8) unique values.
Weight_quant(Int8) = Quant(Weight)
Weight_quant(FP32) = Weight_quant(Int8*) = Dequant(Quant(Weight))
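The two lines above can be illustrated with a small quantize-dequantize sketch. It assumes per-tensor asymmetric uniform quantization with a min/max range; the `fake_quantize` helper is hypothetical and the exact scheme used in this repo may differ.

```python
import torch


def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Return Dequant(Quant(x)): float32 values restricted to a 2^num_bits grid."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(qmin - x_min / scale)
    # Quant: map to integer-valued levels in [qmin, qmax]
    q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)
    # Dequant: map the levels back to floating point
    return (q - zero_point) * scale


torch.manual_seed(0)
weight = torch.randn(64, 32)
weight_q = fake_quantize(weight)

print(weight_q.dtype)                   # still a floating point tensor
print(torch.unique(weight_q).numel())   # number of distinct values, at most 256
print((weight - weight_q).abs().max())  # quantization error
```

The tensor keeps its floating point dtype, so ordinary PyTorch layers can consume it unchanged while the values behave as if they had been stored in 8 bits.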
Somehow I cannot make bias correction work with 8-bit bias quantization (even with data-dependent correction).
I am not sure how the original paper managed to do it with 8-bit quantization, but I guess they either use some non-uniform quantization technique or use more bits for the bias parameters, as I do.
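To give a feel for why the bit width of the bias matters, the sketch below compares the quantize-dequantize error of a bias vector at 8 and 16 bits. The bias values, the 16-bit setting and the `fake_quantize` helper are all assumptions for illustration. This is not a measurement from this repo and does not explain the behaviour reported above.

```python
import torch


def fake_quantize(x: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Per-tensor asymmetric uniform quantize-dequantize (hypothetical helper)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(qmin - x_min / scale)
    q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)
    return (q - zero_point) * scale


torch.manual_seed(0)
# Made-up bias: mostly small values plus two large ones that stretch the range.
bias = torch.cat([torch.randn(250) * 0.05, torch.tensor([8.0, -6.0])])

err8 = (bias - fake_quantize(bias, 8)).abs().max().item()
err16 = (bias - fake_quantize(bias, 16)).abs().max().item()
print(f"max error at 8 bits:  {err8:.6f}")
print(f"max error at 16 bits: {err16:.6f}")
```

With a range stretched by a few large entries, the 8-bit step size is comparable to the small bias values themselves, while the 16-bit grid is fine enough to keep them nearly intact.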
- cross layer equalization
- high bias absorption
- data-free bias correction
- test with detection model
- test with classification model
- use distilled data to set min/max activation range
- use distilled data to find optimal scale matrix
- use distilled data to do bias correction
- True Int8 inference