This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 942 150 Updated Jul 29, 2023

flame / how-to-optimize-gemm

C 1,837 356 Updated Jul 29, 2023

zjhellofss / KuiperInfer

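A great project for campus recruiting (autumn and spring hiring) and internships! Implement a high-performance deep learning inference library from scratch, step by step, with support for inference on models such as llama2, Unet, Yolov5, and Resnet.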

C++ 2,767 313 Updated Oct 26, 2024

JonDoe-297 / cross-view

[CVPR'21] Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation

Python 148 21 Updated Sep 8, 2021

facebookincubator / AITemplate

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,618 377 Updated Dec 4, 2024

hikvision-research / skelact

Skeleton-based action recognition models in PyTorch, including Two-Stream CNN, HCN, HCN-Baseline, Ta-CNN and Dynamic GCN

Python 146 22 Updated Jul 2, 2022

BVLC / caffe

Caffe: a fast open framework for deep learning.

C++ 34,254 18,650 Updated Jul 31, 2024

Qualcomm-AI-research / oscillations-qat

Python 75 10 Updated Jul 21, 2022

dave-msk / merak

Python binary package builder (via Cython)

Python 32 7 Updated Jan 24, 2025

tinganchen / AlignQ

[CVPR 2022] AlignQ: Alignment Quantization with ADMM-based Correlation Preservation

Python 10 Updated Jan 6, 2023

zhutmost / neuralzip

An out-of-the-box PyTorch scaffold for neural network Quantization-Aware-Training (QAT) research. Website: https://github.com/zhutmost/neuralzip

Python 26 1 Updated Dec 20, 2022

Efficient-ML / Awesome-Model-Quantization

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving the project. Welcome to PR the works (p…

2,020 218 Updated Mar 4, 2025

hanjiale / HCRP

Code for the paper "Exploring Task Difficulty for Few-Shot Relation Extraction". https://arxiv.org/abs/2109.05473

Python 35 6 Updated Sep 12, 2021

LUSSeg / ImageNet-S

(TPAMI2022) The ImageNet-S benchmark/method for large-scale unsupervised/semi-supervised semantic segmentation.

Python 175 11 Updated Sep 24, 2023

facebookresearch / mae

PyTorch implementation of MAE https://arxiv.org/abs/2111.06377

Python 7,626 1,250 Updated Jul 23, 2024

ccfddl / ccf-deadlines

⏰ Collaboratively track deadlines of conferences recommended by CCF (Website, Python Cli, Wechat Applet) / If you find it useful, please star this project, thanks~

Vue 7,055 480 Updated Mar 12, 2025

openvinotoolkit / nncf

Neural Network Compression Framework for enhanced OpenVINO™ inference

Python 983 247 Updated Mar 12, 2025

programthink / books

E-book list collected by 编程随想 (Program Think), covering multiple disciplines, with download links

18,768 3,387 Updated Aug 16, 2022

jacobgil / pytorch-tensor-decompositions

PyTorch implementation of [1412.6553] and [1511.06530] tensor decomposition methods for convolutional layers.

Python 281 63 Updated Dec 1, 2021

ChenCVer / darknet

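Source code walkthrough of the Darknet framework (AB version): detailed line-by-line Chinese comments and analysis of the underlying principles!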

C 72 17 Updated Apr 27, 2021

cvlab-yonsei / EWGS

An official implementation of "Network Quantization with Element-wise Gradient Scaling" (CVPR 2021) in PyTorch.

Python 91 17 Updated Jul 14, 2023

cvlab-yonsei / DAQ

An official PyTorch implementation of the paper "Distance-aware Quantization", ICCV 2021.

Python 47 9 Updated Nov 1, 2024

hustzxd / LSQuantization

The PyTorch implementation of Learned Step size Quantization (LSQ) in ICLR2020 (unofficial)

Jupyter Notebook 130 21 Updated Nov 19, 2020

ModelTC / mqbench-paper

Python 44 9 Updated Jul 14, 2021


Qiulin Zhang qiulinzhang
