WanliZhong
  • OpenCV China
  • SUSTech

Highlights

  • Pro

Organizations

@opencv @SUSTown

SpargeAttention: A training-free sparse attention that can accelerate any model inference.

Cuda 252 8 Updated Mar 7, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 4,877 478 Updated Mar 10, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,234 785 Updated Mar 1, 2025

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉

3,607 253 Updated Mar 4, 2025

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,773 285 Updated Mar 4, 2025

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 408 48 Updated Sep 11, 2024

Private Cloud Compute (PCC)

Swift 787 71 Updated Oct 24, 2024

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 1,098 65 Updated Feb 28, 2025

Development repository for the Triton language and compiler

MLIR 14,799 1,851 Updated Mar 10, 2025

This repository contains integer operators on GPUs for PyTorch.

Python 192 50 Updated Sep 29, 2023

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 1,357 163 Updated Jul 12, 2024

Fast and memory-efficient exact attention

Python 16,192 1,535 Updated Mar 9, 2025

✨ Light and Fast AI Assistant. Support: Web | iOS | MacOS | Android | Linux | Windows

TypeScript 81,795 61,217 Updated Mar 10, 2025

MMdnn is a set of tools to help users inter-operate among different deep learning frameworks. E.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, Tensorflow, CNTK, …

Python 5,807 966 Updated May 29, 2024

The collection of pre-trained, state-of-the-art AI models for ailia SDK

Python 2,144 341 Updated Mar 9, 2025

System for AI Education Resource.

Python 3,884 485 Updated Oct 25, 2024

Try to reproduce mediapipe with OpenCV_lite and MNN.

C++ 3 Updated Jun 16, 2024

Convert Caffe models to ONNX.

Python 60 19 Updated Nov 8, 2021

Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Python 9,187 1,506 Updated Aug 9, 2024

Step-by-step GEMM optimization tutorial on OpenCL GPU platforms

C 5 Updated Apr 25, 2024

MLX: An array framework for Apple silicon

C++ 19,498 1,111 Updated Mar 10, 2025

LLM inference in C/C++

C++ 76,198 11,025 Updated Mar 10, 2025

The programming language Ficus

C 72 9 Updated Nov 8, 2024

A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.

Go 91,305 13,775 Updated Mar 7, 2025

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…

C++ 9,985 1,777 Updated Mar 10, 2025

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…

Python 47,161 8,054 Updated Mar 10, 2025

official pypi project for libfacedetection

Python 4 Updated Jun 4, 2023

The interactive graphing library for Python ✨

Python 16,832 2,612 Updated Mar 10, 2025

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 9,455 852 Updated Feb 16, 2025