zhukevkesky

Hanqi Zhu zhukevkesky

11 followers · 36 following

University of Science and Technology of China (USTC)

Achievements

Stars

82 results for source starred repositories

Clear filter

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,185 120 Updated Dec 13, 2024

PaddleJitLab / CUDATutorial

A self-learning tutorail for CUDA High Performance Programing.

JavaScript 283 32 Updated Dec 12, 2024

LMCache / LMCache

Making Long-Context LLM Inference 10x Faster and 10x Cheaper

Python 286 32 Updated Dec 14, 2024

NVIDIA / kvpress

LLM KV cache compression made easy

Python 267 14 Updated Dec 12, 2024

microsoft / T-MAC

Low-bit LLM inference on CPU with lookup table

C++ 618 48 Updated Dec 6, 2024

eyalroz / cuda-api-wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs

C++ 804 80 Updated Dec 9, 2024

sgl-project / sgl-learning-materials

Materials for learning SGLang

136 9 Updated Dec 9, 2024

efeslab / Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Cuda 654 26 Updated Sep 21, 2024

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 31,896 4,847 Updated Dec 15, 2024

mirage-project / mirage

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C++ 679 39 Updated Dec 11, 2024

horseee / Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

Python 1,326 94 Updated Dec 9, 2024

cvg / NoPoSplat

No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images

Python 550 20 Updated Dec 7, 2024

hustvl / Senna

Bridging Large Vision-Language Models and End-to-End Autonomous Driving

Python 219 6 Updated Dec 8, 2024

Thinklab-SJTU / Awesome-LLM4AD

A curated list of awesome LLM for Autonomous Driving resources (continually updated)

1,065 53 Updated Sep 25, 2024

Infini-AI-Lab / MagicDec

Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding

Python 100 4 Updated Dec 4, 2024

processorcentricmodel / PCCS

Cuda 6 1 Updated Sep 26, 2021

SJTU-ReArch-Group / Paper-Reading-List

85 8 Updated Dec 13, 2024

Raphael-Hao / brainstorm

Compiler for Dynamic Neural Networks

Python 43 2 Updated Nov 13, 2023

sizheliu-unc / ECRTS24

Supplemental material for ECRTS24 paper: Autonomy Today: Many Delay-Prone Black Boxes

C++ 2 Updated May 27, 2024

fredrickang / Public-RT-Swap

RT-Swap: Addressing GPU Memory Bottlenecks for Real-Time Multi-DNN Inference

Python 2 Updated Dec 2, 2024

NVlabs / InstantSplat

InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds

Python 897 57 Updated Dec 14, 2024

rtenlab / gcaps-super-repo

C 6 Updated May 17, 2024

PantheonInfer / Pantheon

Source code for the paper: "Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs"

Python 7 1 Updated Apr 15, 2024

iamananonymousauthor / missile

A Fine-Grained, Hardware-Level GPU Resource Isolation Solution for Multi-Tenant DNN Inference

C++ 2 Updated May 26, 2024

hao-ai-lab / MuxServe

Jupyter Notebook 47 3 Updated Jun 13, 2024

conveyor-sys / conveyor

Efficient tool-assisted LLM serving runtime.

Python 5 1 Updated Sep 11, 2024

LLMServe / dLoRA-artifact

Jupyter Notebook 15 3 Updated May 28, 2024

chenhongyu2048 / LLM-inference-optimization-paper

Summary of some awesome work for optimizing LLM inference

39 1 Updated Dec 13, 2024

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 1,549 153 Updated Dec 14, 2024

facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 8,771 632 Updated Dec 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hanqi Zhu zhukevkesky

Achievements

Achievements

Block or report zhukevkesky

Stars

kvcache-ai / Mooncake

PaddleJitLab / CUDATutorial

LMCache / LMCache

NVIDIA / kvpress

microsoft / T-MAC

eyalroz / cuda-api-wrappers

sgl-project / sgl-learning-materials

efeslab / Nanoflow

vllm-project / vllm

mirage-project / mirage

horseee / Awesome-Efficient-LLM

cvg / NoPoSplat

hustvl / Senna

Thinklab-SJTU / Awesome-LLM4AD

Infini-AI-Lab / MagicDec

processorcentricmodel / PCCS

SJTU-ReArch-Group / Paper-Reading-List

Raphael-Hao / brainstorm

sizheliu-unc / ECRTS24

fredrickang / Public-RT-Swap

NVlabs / InstantSplat

rtenlab / gcaps-super-repo

PantheonInfer / Pantheon

iamananonymousauthor / missile

hao-ai-lab / MuxServe

conveyor-sys / conveyor

LLMServe / dLoRA-artifact

chenhongyu2048 / LLM-inference-optimization-paper

flashinfer-ai / flashinfer

facebookresearch / xformers