Stars
Fast Block Sparse Matrices for PyTorch
Sparse-dense matrix-matrix multiplication on GPUs
Efficient GPU kernels for block-sparse matrix multiplication and convolution
CUDA templates for tile-sparse matrix multiplication based on CUTLASS
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream d…