.. glossary::

   Accelerator
      A :term:`device` with specialized processors, such as GPUs, dedicated to AI computation.

   Accuracy
      A measure of the percentage of correct predictions made by a model.

   Activation
      The output of a node's activation function, passed as an input to the subsequent layer of the network.

   Activation Quantization
      The process of converting the output values (:term:`activations <Activation>`) of nodes from high precision (for example, 32-bit floating point) to lower precision (for example, 8-bit integer), reducing computation and memory requirements during :term:`inference`.

   AdaRound
      A technique that minimizes :term:`quantization` error by carefully selecting how to round weights. AdaRound is especially effective at retaining the accuracy of models that undergo aggressive quantization.

   AI Model Efficiency Toolkit
      An open-source software library developed by the :term:`Qualcomm Innovation Center`, providing a suite of :term:`quantization` and :term:`compression` technologies that reduce the computational load and memory usage of deep learning models.

   AIMET
      :term:`AI Model Efficiency Toolkit`.

   AutoQuant
      A feature that automatically selects optimal :term:`quantization` parameters, automating the process of model quantization.

   Batch Normalization
      A technique for normalizing a layer's input to accelerate the convergence of deep network models.

   BN
      :term:`Batch Normalization`.

   Batch Normalization Folding
   BN Folding
      A model optimization technique that merges :term:`Batch Normalization` layers into adjacent layers, eliminating the need to compute Batch Normalization separately during :term:`inference`.

   CNN
      :term:`Convolutional Neural Network`.

   Compression
      The process of reducing the memory footprint and computational requirements of a neural network.

   Convolutional Layer
      A model layer that contains a set of filters that interact with an input to create an :term:`activation` map.

   Convolutional Neural Network
      A deep learning model that uses convolutional layers to extract features from input data, such as images.

   Device
      A portable computation platform such as a mobile phone or a laptop.

   DLF
      :term:`Dynamic Layer Fusion`.

   Dynamic Layer Fusion
      A method for merging adjacent layers to decrease computational load during :term:`inference`.

   Edge device
      A device at the "edge" of the network, typically a personal computation device such as a mobile phone or a laptop.

   Encoding
      The representation of model parameters (weights) and :term:`activations <Activation>` in a compressed, quantized format. Different encoding schemes trade off model accuracy against efficiency.

   FP32
      32-bit floating-point precision, the default data type for representing weights and :term:`activations <Activation>` in most deep learning frameworks.

   Inference
      The process of employing a trained AI model for its intended purpose: prediction, classification, content generation, etc.

   INT8
      8-bit integer precision, commonly used by AIMET to reduce the memory size and computational demands during :term:`inference`.

   KL Divergence
      Kullback-Leibler divergence. A measure of the difference between two probability distributions, used during :term:`quantization` calibration to keep the distribution of :term:`activations <Activation>` similar to that of the original floating-point model.

   Layer
      How nodes are organized in a model. The nodes in a layer are connected to the previous and subsequent layers via :term:`weights`.

   Layer-wise quantization
      A :term:`quantization` method in which each layer is quantized independently. Used to balance model accuracy against computational efficiency by more aggressively compressing layers that have minimal impact on model performance.

   LoRA
      Low-Rank Adaptation. A parameter-efficient fine-tuning technique that trains small low-rank update matrices rather than all of a model's :term:`weights`.

   MobileNet
      A family of :term:`convolutional neural network` architectures, developed at Google, optimized to operate efficiently with constrained computational resources.

   Model
      A computational structure made up of :term:`layers <Layer>` of :term:`nodes <Node>` connected by :term:`weights`.

   Neural Network Compression Framework
      A :term:`compression` and optimization toolkit similar in purpose to AIMET.

   Node
      A computation unit in a :term:`model`. Each node performs a mathematical function on an input to produce an output.

   Normalization
      Scaling a feature, such as the inputs to a :term:`layer`, to standardize its range.

   NNCF
      :term:`Neural Network Compression Framework`.

   ONNX
      :term:`Open Neural Network Exchange`.

   Open Neural Network Exchange
      An open-source format for representing neural network models across different AI frameworks.

   Per-channel Quantization
      A :term:`quantization` method in which each channel of a :term:`convolutional layer` is quantized independently, reducing quantization error compared to a global quantization scheme.

   Post-Training Quantization
      A technique for applying :term:`quantization` to a neural network after it has been trained at full precision, avoiding the need for retraining.

   Pruning
      Systematically removing less important neurons, weights, or connections from a model.

   PTQ
      :term:`Post-Training Quantization`.

   PyTorch
      An open-source deep learning framework developed by Facebook's AI Research lab (FAIR), widely used in research environments.

   QAT
      :term:`Quantization-Aware Training`.

   QDO
      Quantize and dequantize operations.

   Qualcomm Innovation Center
      A division of Qualcomm, Inc. responsible for developing advanced technologies and open-source projects, including AIMET.

   Quantization
      A model :term:`compression` technique that reduces the bits used to represent each weight and :term:`activation` in a neural network, typically from 32-bit floating-point numbers to 8-bit integers.

   Quantization-Aware Training
      A technique in which :term:`quantization` is simulated throughout the training process so that the network adapts to the lower precision during training.

   Quantization Simulation
      A tool within AIMET that simulates the effects of :term:`quantization` on a model to predict how quantization will affect the model's performance.

   QuantSim
      :term:`Quantization Simulation`.

   QUIC
      :term:`Qualcomm Innovation Center`.

   Target Hardware Accelerator
      Specialized hardware designed to accelerate AI :term:`inference` tasks. Examples include GPUs, TPUs, and custom ASICs such as Qualcomm's Cloud AI 100 inference accelerator.

   Target Runtime
      The runtime environment, typically on a low-bit-width platform such as an :term:`edge device`, for which a model is quantized.

   TensorFlow
      A widely used open-source deep learning framework developed by Google.

   TorchScript
      An intermediate representation of :term:`PyTorch` models that enables running them independently of the Python environment, making them more suitable for production deployment.

   Variant
      The combination of machine learning framework (:term:`PyTorch`, :term:`TensorFlow`, or :term:`ONNX`) and processor (Nvidia GPU or CPU) that determines which version of the AIMET API to install.

   Weights
      Parameters, learned during training, that collectively represent features in a model.
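To make the relationship between :term:`FP32`, :term:`INT8`, :term:`encodings <Encoding>`, and quantize-dequantize operations (:term:`QDO`) concrete, the following is a minimal sketch of uniform affine quantization in plain NumPy. It is illustrative only and is not the AIMET API; the function and variable names are invented for this example.

.. code-block:: python

   import numpy as np

   def quantize_dequantize(x, bitwidth=8):
       """Simulate uniform affine quantization (illustrative; not the AIMET API).

       A scale and offset -- the encoding -- are derived from the tensor's
       min/max range. Values are rounded onto the integer grid (quantize)
       and mapped back to float (dequantize), exposing the rounding error
       that INT8 storage would introduce at inference time.
       """
       qmin, qmax = 0, 2 ** bitwidth - 1                # 0..255 for 8 bits
       x_min, x_max = float(x.min()), float(x.max())
       scale = (x_max - x_min) / (qmax - qmin)          # float step per integer
       offset = round(-x_min / scale)                   # integer zero point
       q = np.clip(np.round(x / scale) + offset, qmin, qmax)  # quantize
       return (q - offset) * scale                      # dequantize

   weights = np.random.randn(4, 4).astype(np.float32)  # FP32 weights
   simulated = quantize_dequantize(weights)             # snapped to the INT8 grid
   print("max quantization error:", np.abs(weights - simulated).max())

This round trip is the kind of operation that :term:`Quantization Simulation` inserts into a model, letting quantization error be measured before deploying to INT8 hardware. (A real implementation would also guard against a zero-width range, where ``x_max == x_min``.)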
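As a second illustration, :term:`Batch Normalization Folding` rests on straightforward algebra: a linear (or convolutional) layer and Batch Normalization are both affine transformations, so the two compose into a single layer. A sketch of the standard folding equations, with :math:`\mu`, :math:`\sigma^2`, :math:`\gamma`, and :math:`\beta` denoting the BN running mean, running variance, scale, and shift, and :math:`\epsilon` a small constant for numerical stability:

.. math::

   W' = \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}}\, W, \qquad
   b' = \frac{\gamma\,(b - \mu)}{\sqrt{\sigma^2 + \epsilon}} + \beta

A layer with folded weights :math:`W'` and bias :math:`b'` produces the same outputs as the original layer followed by Batch Normalization, so the BN computation disappears from :term:`inference` entirely.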