Wanli (WanliZhong)

Focusing

A new 🐤 who wants to be a great person. (Member of OpenCV China; Master's student at SUSTech)

98 followers · 102 following

OpenCV China
SUSTech
UTC +08:00

Lists (1)

🗜️Quantization

1 repository

Stars

thu-ml / SpargeAttn

SpargeAttention: A training-free sparse attention that can accelerate any model inference.

Cuda 252 8 Updated Mar 7, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 4,877 478 Updated Mar 10, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient MLA decoding kernels

C++ 11,234 785 Updated Mar 1, 2025

DefTruth / Awesome-LLM-Inference

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉

3,607 253 Updated Mar 4, 2025

DefTruth / CUDA-Learn-Notes

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,773 285 Updated Mar 4, 2025

hahnyuan / LLM-Viewer

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 408 48 Updated Sep 11, 2024

apple / security-pcc

Private Cloud Compute (PCC)

Swift 787 71 Updated Oct 24, 2024

thu-ml / SageAttention

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 1,098 65 Updated Feb 28, 2025

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 14,799 1,851 Updated Mar 10, 2025

Guangxuan-Xiao / torch-int

This repository contains integer operators on GPUs for PyTorch.

Python 192 50 Updated Sep 29, 2023

mit-han-lab / smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 1,357 163 Updated Jul 12, 2024

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 16,192 1,535 Updated Mar 9, 2025

ChatGPTNextWeb / NextChat

TypeScript 81,795 61,217 Updated Mar 10, 2025

microsoft / MMdnn

MMdnn is a set of tools to help users inter-operate among different deep learning frameworks. E.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, Tensorflow, CNTK, …

Python 5,807 966 Updated May 29, 2024

axinc-ai / ailia-models

The collection of pre-trained, state-of-the-art AI models for ailia SDK

Python 2,144 341 Updated Mar 9, 2025

microsoft / AI-System

System for AI Education Resource.

Python 3,884 485 Updated Oct 25, 2024

zihaomu / mediapipe_cmake

Try to reproduce mediapipe with OpenCV_lite and MNN.

C++ 3 Updated Jun 16, 2024

asiryan / caffe2onnx

Convert Caffe models to ONNX.

Python 60 19 Updated Nov 8, 2021

WongKinYiu / yolov9

Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Python 9,187 1,506 Updated Aug 9, 2024

fengyuentau / how-to-optimize-gemm-opencl

Step-by-step GEMM optimization tutorial on OpenCL GPU platforms

C 5 Updated Apr 25, 2024

ml-explore / mlx

MLX: An array framework for Apple silicon

C++ 19,498 1,111 Updated Mar 10, 2025

OpenCVChina / OpenCVBookSourceCode

Python 12 1 Updated Oct 16, 2023

ggml-org / llama.cpp

LLM inference in C/C++

C++ 76,198 11,025 Updated Mar 10, 2025

vpisarev / ficus

The programming language Ficus

C 72 9 Updated Nov 8, 2024

fatedier / frp

A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.

Go 91,305 13,775 Updated Mar 7, 2025

alibaba / MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…

C++ 9,985 1,777 Updated Mar 10, 2025

PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…

Python 47,161 8,054 Updated Mar 10, 2025

Wwupup / libfacedetection.pip

Forked from ShiqiYu/libfacedetection.pip

official pypi project for libfacedetection

Python 4 Updated Jun 4, 2023

plotly / plotly.py

The interactive graphing library for Python ✨

Python 16,832 2,612 Updated Mar 10, 2025

huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 9,455 852 Updated Feb 16, 2025