Stars
VPTQ, a flexible and extreme low-bit quantization algorithm
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
A profiler to disclose and quantify hardware features on GPUs.
A list of awesome compiler projects and papers for tensor computation and deep learning.
Classical equations and diagrams in machine learning