This module contains Eager mode quantization APIs.
.. currentmodule:: torch.quantization
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
quantize
quantize_dynamic
quantize_qat
prepare
prepare_qat
convert
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
fuse_modules
QuantStub
DeQuantStub
QuantWrapper
add_quant_dequant
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
add_observer_
swap_module
propagate_qconfig_
default_eval_fn
get_observer_dict
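The eager mode entry points listed above are typically combined into a prepare, calibrate, convert sequence. The following is a minimal sketch of post-training static quantization, not a definitive recipe. ``TinyNet``, its layer sizes, and the random calibration data are hypothetical, and the sketch assumes a PyTorch build with a quantized CPU backend (for example fbgemm or qnnpack).

```python
import torch
import torch.nn as nn
import torch.quantization as tq


class TinyNet(nn.Module):
    """Hypothetical model used only for this illustration."""

    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # float -> quantized at the input
        self.fc = nn.Linear(4, 3)
        self.dequant = tq.DeQuantStub()  # quantized -> float at the output

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))


torch.manual_seed(0)
model = TinyNet().eval()
model.qconfig = tq.default_qconfig

# prepare() returns a copy of the model with observers attached.
prepared = tq.prepare(model)

# Calibration: run representative data (random here) through the observers.
with torch.no_grad():
    for _ in range(4):
        prepared(torch.randn(8, 4))

# convert() swaps the observed float modules for quantized ones.
quantized = tq.convert(prepared)
out = quantized(torch.randn(2, 4))
```

After conversion, ``quantized.fc`` should be a quantized ``Linear`` module, while the output is a regular float tensor because of the ``DeQuantStub``.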
This module contains FX graph mode quantization APIs (prototype).
.. currentmodule:: torch.quantization.quantize_fx
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
prepare_fx
prepare_qat_fx
convert_fx
fuse_fx
This section describes the quantization-related functions of the torch namespace.
.. currentmodule:: torch
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
quantize_per_tensor
quantize_per_channel
dequantize
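As a small illustration of these functions, the sketch below quantizes a float tensor and converts it back. The scale and zero point are arbitrary values chosen so that every input is exactly representable; they are not recommended settings.

```python
import torch

x = torch.tensor([-1.0, 0.0, 0.5, 1.0])

# Arguments: input, scale, zero_point, dtype (illustrative values).
qx = torch.quantize_per_tensor(x, 0.5, 2, torch.quint8)

stored = qx.int_repr()           # underlying uint8 values
restored = torch.dequantize(qx)  # back to a float32 tensor
```

With these parameters the stored integers are ``x / 0.5 + 2``, and dequantizing recovers ``x`` without error.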
Quantized Tensors support a limited subset of data manipulation methods of the regular full-precision tensor.
.. currentmodule:: torch.Tensor
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
view
as_strided
expand
flatten
select
ne
eq
ge
le
gt
lt
copy_
clone
dequantize
equal
int_repr
max
mean
min
q_scale
q_zero_point
q_per_channel_scales
q_per_channel_zero_points
q_per_channel_axis
resize_
sort
topk
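A short sketch of a few of these methods on a per-channel quantized tensor follows. The weights, scales, and zero points are made-up values picked so that the round trip is exact.

```python
import torch

w = torch.tensor([[-1.0, 0.0, 1.0],
                  [-4.0, 0.0, 4.0]])

# One scale and one zero point per slice along axis 0 (illustrative values).
scales = torch.tensor([0.5, 2.0])
zero_points = torch.tensor([0, 0])

qw = torch.quantize_per_channel(w, scales, zero_points, 0, torch.qint8)

ints = qw.int_repr()               # stored int8 values
axis = qw.q_per_channel_axis()     # axis the parameters apply to
per_channel_scales = qw.q_per_channel_scales()
w_restored = qw.dequantize()
```

Both rows map to the same integers because each row is divided by its own scale.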
This module contains observers which are used to collect statistics about the values observed during calibration (PTQ) or training (QAT).
.. currentmodule:: torch.quantization.observer
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
ObserverBase
MinMaxObserver
MovingAverageMinMaxObserver
PerChannelMinMaxObserver
MovingAveragePerChannelMinMaxObserver
HistogramObserver
PlaceholderObserver
RecordingObserver
NoopObserver
get_observer_state_dict
load_observer_state_dict
default_observer
default_placeholder_observer
default_debug_observer
default_weight_observer
default_histogram_observer
default_per_channel_weight_observer
default_dynamic_quant_observer
default_float_qparams_observer
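To make the role of an observer concrete, the sketch below feeds two batches to a ``MinMaxObserver`` and then asks it for quantization parameters. The input values are arbitrary and serve only as an illustration.

```python
import torch
from torch.quantization.observer import MinMaxObserver

observer = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)

# Calling the observer records the running min/max of everything it sees.
observer(torch.tensor([-1.0, 0.5]))
observer(torch.tensor([0.0, 3.0]))

# Scale and zero point derived from the observed range [-1, 3].
scale, zero_point = observer.calculate_qparams()
```

For the observed range and the 0 to 255 range of ``quint8``, the scale is approximately ``4 / 255`` and the zero point is 64.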
This module implements modules which are used to perform fake quantization during QAT.
.. currentmodule:: torch.quantization.fake_quantize
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
FakeQuantizeBase
FakeQuantize
FixedQParamsFakeQuantize
FusedMovingAvgObsFakeQuantize
default_fake_quant
default_weight_fake_quant
default_per_channel_weight_fake_quant
default_histogram_fake_quant
default_fused_act_fake_quant
default_fused_wt_fake_quant
default_fused_per_channel_wt_fake_quant
disable_fake_quant
enable_fake_quant
disable_observer
enable_observer
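The following sketch shows what a fake-quantize module does to a tensor: it observes the input, then quantizes and immediately dequantizes it, so the output stays in floating point. The constructor arguments are illustrative choices, not the library defaults.

```python
import torch
from torch.quantization import FakeQuantize, MovingAverageMinMaxObserver

fake_quant = FakeQuantize(
    observer=MovingAverageMinMaxObserver,
    quant_min=0,
    quant_max=255,
    dtype=torch.quint8,
    qscheme=torch.per_tensor_affine,
)

x = torch.linspace(-1.0, 1.0, steps=1000)

# The output is float32 but restricted to the quantization grid.
y = fake_quant(x)

scale, zero_point = fake_quant.calculate_qparams()
```

The output takes at most 256 distinct values, and each element differs from the input by roughly half a quantization step at most.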
This module defines QConfig objects which are used to configure quantization settings for individual ops.
.. currentmodule:: torch.quantization.qconfig
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
QConfig
default_qconfig
default_debug_qconfig
default_per_channel_qconfig
default_dynamic_qconfig
float16_dynamic_qconfig
float16_static_qconfig
per_channel_dynamic_qconfig
float_qparams_weight_only_qconfig
default_qat_qconfig
default_weight_only_qconfig
default_activation_only_qconfig
default_qat_qconfig_v2
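A ``QConfig`` pairs an observer factory for activations with one for weights. The sketch below builds a custom one and attaches it to a model; the model and the particular observer choices are hypothetical.

```python
import torch
import torch.nn as nn
from torch.quantization import QConfig, MinMaxObserver, default_weight_observer

# Both fields hold factories (not observer instances), so each quantized
# module can create its own observers later.
custom_qconfig = QConfig(
    activation=MinMaxObserver.with_args(dtype=torch.quint8),
    weight=default_weight_observer,
)

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
model.qconfig = custom_qconfig  # picked up by prepare() for submodules

# Calling a factory yields a fresh observer instance.
act_observer = custom_qconfig.activation()
weight_observer = custom_qconfig.weight()
```

Assigning ``qconfig`` on an individual submodule instead of the whole model restricts the setting to that submodule.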
.. automodule:: torch.nn.intrinsic
.. automodule:: torch.nn.intrinsic.modules
This module implements the combined (fused) modules, such as conv + relu, which can then be quantized.
.. currentmodule:: torch.nn.intrinsic
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
ConvReLU1d
ConvReLU2d
ConvReLU3d
LinearReLU
ConvBn1d
ConvBn2d
ConvBn3d
ConvBnReLU1d
ConvBnReLU2d
ConvBnReLU3d
BNReLU2d
BNReLU3d
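These fused modules are usually produced by ``fuse_modules`` rather than constructed by hand. A minimal sketch, assuming a hypothetical ``ConvBlock`` model with submodules named ``conv`` and ``relu``:

```python
import torch
import torch.nn as nn
import torch.nn.intrinsic as nni
from torch.quantization import fuse_modules


class ConvBlock(nn.Module):
    """Hypothetical model used only for this illustration."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))


torch.manual_seed(0)
model = ConvBlock().eval()

# Fuse the named submodules; a fused copy of the model is returned.
fused = fuse_modules(model, [["conv", "relu"]])

x = torch.randn(1, 3, 8, 8)
same_output = torch.allclose(model(x), fused(x))
```

After fusion, ``fused.conv`` holds the combined ``ConvReLU2d`` module and ``fused.relu`` is replaced by ``nn.Identity``, so the forward pass is numerically unchanged.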
.. automodule:: torch.nn.intrinsic.qat
.. automodule:: torch.nn.intrinsic.qat.modules
This module implements the versions of those fused operations needed for quantization aware training.
.. currentmodule:: torch.nn.intrinsic.qat
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
LinearReLU
ConvBn1d
ConvBnReLU1d
ConvBn2d
ConvBnReLU2d
ConvReLU2d
ConvBn3d
ConvBnReLU3d
ConvReLU3d
update_bn_stats
freeze_bn_stats
.. automodule:: torch.nn.intrinsic.quantized
.. automodule:: torch.nn.intrinsic.quantized.modules
This module implements the quantized versions of fused operations such as conv + relu. No conv + BatchNorm variants are provided, because BatchNorm is usually folded into the convolution for inference.
.. currentmodule:: torch.nn.intrinsic.quantized
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
BNReLU2d
BNReLU3d
ConvReLU1d
ConvReLU2d
ConvReLU3d
LinearReLU
.. automodule:: torch.nn.intrinsic.quantized.dynamic
.. automodule:: torch.nn.intrinsic.quantized.dynamic.modules
This module implements the quantized dynamic implementations of fused operations like linear + relu.
.. currentmodule:: torch.nn.intrinsic.quantized.dynamic
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
LinearReLU
.. automodule:: torch.nn.qat
.. automodule:: torch.nn.qat.modules
This module implements versions of key nn modules, such as Conv2d() and Linear(), which run in FP32 with rounding applied to simulate the effect of INT8 quantization.
.. currentmodule:: torch.nn.qat
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
Conv2d
Conv3d
Linear
.. automodule:: torch.nn.qat.dynamic
.. automodule:: torch.nn.qat.dynamic.modules
This module implements versions of key nn modules, such as Linear(), which run in FP32 with rounding applied to simulate the effect of INT8 quantization, and which are dynamically quantized during inference.
.. currentmodule:: torch.nn.qat.dynamic
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
Linear
.. automodule:: torch.nn.quantized
.. automodule:: torch.nn.quantized.modules
This module implements the quantized versions of the nn layers such as :class:`~torch.nn.Conv2d` and :class:`torch.nn.ReLU`.
.. currentmodule:: torch.nn.quantized
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
ReLU6
Hardswish
ELU
LeakyReLU
Sigmoid
BatchNorm2d
BatchNorm3d
Conv1d
Conv2d
Conv3d
ConvTranspose1d
ConvTranspose2d
ConvTranspose3d
Embedding
EmbeddingBag
FloatFunctional
FXFloatFunctional
QFunctional
Linear
LayerNorm
GroupNorm
InstanceNorm1d
InstanceNorm2d
InstanceNorm3d
.. automodule:: torch.nn.quantized.functional
This module implements the quantized versions of the functional layers such as :func:`~torch.nn.functional.conv2d` and :func:`torch.nn.functional.relu`. Note: :func:`~torch.nn.functional.relu` supports quantized inputs.
.. currentmodule:: torch.nn.quantized.functional
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
avg_pool2d
avg_pool3d
adaptive_avg_pool2d
adaptive_avg_pool3d
conv1d
conv2d
conv3d
interpolate
linear
max_pool1d
max_pool2d
celu
leaky_relu
hardtanh
hardswish
threshold
elu
hardsigmoid
clamp
upsample
upsample_bilinear
upsample_nearest
.. automodule:: torch.nn.quantized.dynamic
.. automodule:: torch.nn.quantized.dynamic.modules
Dynamically quantized :class:`~torch.nn.Linear`, :class:`~torch.nn.LSTM`, :class:`~torch.nn.GRU`, :class:`~torch.nn.LSTMCell`, :class:`~torch.nn.GRUCell`, and :class:`~torch.nn.RNNCell`.
.. currentmodule:: torch.nn.quantized.dynamic
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
Linear
LSTM
GRU
RNNCell
LSTMCell
GRUCell
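Modules from this namespace are normally created by ``quantize_dynamic`` rather than instantiated directly. A minimal sketch, assuming a hypothetical two-layer model and a PyTorch build with a quantized CPU backend:

```python
import torch
import torch.nn as nn
import torch.nn.quantized.dynamic as nnqd

torch.manual_seed(0)
float_model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2)).eval()

# Replace every nn.Linear with its dynamically quantized counterpart.
# Weights are stored as int8; activations are quantized on the fly.
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

out = quantized_model(torch.randn(4, 16))
```

Only the module types named in the set are swapped, so the ``ReLU`` stays a regular float module and the model still consumes and produces float tensors.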
Note that operator implementations currently support per-channel quantization only for the weights of the conv and linear operators. Furthermore, the input data is mapped linearly to the quantized data and vice versa as follows:

.. math::

    \begin{aligned}
        \text{Quantization:}&\\
        &Q_\text{out} = \text{clamp}(x_\text{input}/s+z, Q_\text{min}, Q_\text{max})\\
        \text{Dequantization:}&\\
        &x_\text{out} = (Q_\text{input}-z)*s
    \end{aligned}

where :math:`\text{clamp}(.)` is the same as :func:`~torch.clamp`, while the scale :math:`s` and zero point :math:`z` are computed as described in :class:`~torch.ao.quantization.observer.MinMaxObserver`, specifically:

.. math::

    \begin{aligned}
        \text{if Symmetric:}&\\
        &s = 2 \max(|x_\text{min}|, x_\text{max}) /
            \left( Q_\text{max} - Q_\text{min} \right) \\
        &z = \begin{cases}
            0 & \text{if dtype is qint8} \\
            128 & \text{otherwise}
        \end{cases}\\
        \text{Otherwise:}&\\
        &s = \left( x_\text{max} - x_\text{min} \right ) /
            \left( Q_\text{max} - Q_\text{min} \right ) \\
        &z = Q_\text{min} - \text{round}(x_\text{min} / s)
    \end{aligned}

where :math:`[x_\text{min}, x_\text{max}]` denotes the range of the input data while :math:`Q_\text{min}` and :math:`Q_\text{max}` are respectively the minimum and maximum values of the quantized dtype.
Note that the choice of :math:`s` and :math:`z` implies that zero is represented with no quantization error whenever zero is within the range of the input data or symmetric quantization is being used.
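The asymmetric ("Otherwise") case of the formulas above can be checked with a few lines of plain Python. This is a sketch of the arithmetic only, not the library implementation. The function names are hypothetical, the input range is an arbitrary example, and the explicit ``round`` in ``quantize`` is an added assumption, since the quantization formula leaves the rounding to integers implicit.

```python
def affine_qparams(x_min, x_max, q_min=0, q_max=255):
    """Scale s and zero point z for the asymmetric case."""
    s = (x_max - x_min) / (q_max - q_min)
    z = q_min - round(x_min / s)
    return s, z


def quantize(x, s, z, q_min=0, q_max=255):
    # round() is an assumption added here to obtain an integer value.
    q = round(x / s + z)
    return max(q_min, min(q_max, q))  # clamp to [q_min, q_max]


def dequantize(q, s, z):
    return (q - z) * s


# Example input range [-1, 3] mapped onto an unsigned 8-bit range.
s, z = affine_qparams(-1.0, 3.0)

q_zero = quantize(0.0, s, z)                      # zero maps onto z itself
x_hat = dequantize(quantize(1.234, s, z), s, z)   # round trip of one value
```

Here ``s`` is ``4 / 255`` and ``z`` is 64, so the real value 0.0 is stored as the integer 64 and dequantizes back to exactly 0.0, while other values come back with an error of at most half a step.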
Additional data types and quantization schemes can be implemented through the custom operator mechanism.
- :attr:`torch.qscheme`: Type to describe the quantization scheme of a tensor.
  Supported types:

  - :attr:`torch.per_tensor_affine`: per tensor, asymmetric
  - :attr:`torch.per_channel_affine`: per channel, asymmetric
  - :attr:`torch.per_tensor_symmetric`: per tensor, symmetric
  - :attr:`torch.per_channel_symmetric`: per channel, symmetric

- ``torch.dtype``: Type to describe the data. Supported types:

  - :attr:`torch.quint8`: 8-bit unsigned integer
  - :attr:`torch.qint8`: 8-bit signed integer
  - :attr:`torch.qint32`: 32-bit signed integer

.. automodule:: torch.nn.quantizable
.. automodule:: torch.nn.quantizable.modules