leliyliu

Starred repositories

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,561 181 Updated Mar 4, 2025

Repository for MLCommons Chakra schema and tools

Python 88 49 Updated Feb 26, 2025

Summary of some awesome work for optimizing LLM inference

58 1 Updated Feb 4, 2025

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Python 142 11 Updated Jul 5, 2024

My learning notes and code for ML systems.

Python 1,245 64 Updated Mar 4, 2025
Python 49 8 Updated Dec 31, 2024

LaTeX Proposal Template for the University of Chinese Academy of Sciences

TeX 638 144 Updated Oct 29, 2021

vLLM-based code for the paper “Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention”.

Python 7 1 Updated Sep 19, 2024

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,472 242 Updated Feb 20, 2025
C++ 412 57 Updated Feb 28, 2025

Super-Efficient RLHF Training of LLMs with Parameter Reallocation

Python 236 14 Updated Jan 13, 2025

Curated collection of papers in machine learning systems

248 14 Updated Feb 28, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,737 166 Updated Feb 23, 2025

UPMEM LLM Framework allows profiling PyTorch layers and functions and simulating those layers/functions with a given hardware profile.

Python 21 3 Updated Feb 11, 2025

Fast Multimodal LLM on Mobile Devices

C++ 725 83 Updated Mar 3, 2025

LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale

Python 89 12 Updated Feb 24, 2025

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,676 277 Updated Mar 4, 2025
Python 901 105 Updated Jan 23, 2025

AIOS: AI Agent Operating System

Python 3,882 474 Updated Mar 4, 2025

An Overview of Efficiently Serving Foundation Models across Edge Devices

13 Updated Jan 17, 2025

GLake: optimizing GPU memory management and IO transmission.

Python 433 38 Updated Nov 27, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 11,327 1,136 Updated Mar 4, 2025

Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs and models, mainly for the evaluation of LLMs, aiming to explore the technical boundaries of generative AI.

487 45 Updated Oct 25, 2024

[TMLR 2024] Efficient Large Language Models: A Survey

1,107 95 Updated Feb 27, 2025

A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures

436 35 Updated Jan 15, 2025

Awesome LLM Plaza: daily tracking of all sorts of LLM topics, e.g. LLMs for coding, robotics, reasoning, multimodality, etc.

189 14 Updated Feb 27, 2025

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉

3,573 247 Updated Mar 4, 2025

An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"

Python 35 3 Updated Jun 7, 2024

A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, including language and vision; we are continuously improving the project. Wel…

170 11 Updated Feb 10, 2025

NeuPIMs Simulator

Jupyter Notebook 71 21 Updated Jun 19, 2024