Stars
📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉
[NeurIPS'24 Spotlight, ICLR'25] To speed up long-context LLM inference, approximates attention with dynamic sparse computation, reducing pre-filling latency by up to 10x on an …
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
A unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deploym…
A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
a state-of-the-art open visual language model (multimodal pre-trained model)
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
A high-throughput and memory-efficient inference and serving engine for LLMs
A curated list for Efficient Large Language Models
Development repository for the Triton language and compiler
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
Code for the paper "Evaluating Large Language Models Trained on Code"
A Pythonic framework to simplify AI service building
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
A framework for few-shot evaluation of language models.
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.