leliyliu

Starred repositories

Curated collection of papers in machine learning systems

192 12 Updated Dec 10, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,178 120 Updated Dec 13, 2024

UPMEM LLM Framework allows profiling PyTorch layers and functions and simulating those layers/functions with a given hardware profile.

Python 8 1 Updated Nov 29, 2024

Fast Multimodal LLM on Mobile Devices

C++ 593 67 Updated Dec 5, 2024

LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale

Python 66 10 Updated Oct 24, 2024

📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attention-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).

Cuda 1,630 174 Updated Dec 13, 2024
Python 581 62 Updated Nov 27, 2024

AIOS: AI Agent Operating System

Python 3,492 419 Updated Dec 13, 2024

An Overview of Efficiently Serving Large Language Models across Edge Devices

8 Updated Jun 15, 2024

GLake: optimizing GPU memory management and IO transmission.

Python 393 34 Updated Nov 27, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 6,510 578 Updated Dec 12, 2024

Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of foundation LLMs, aiming to explore the technical boundaries of generative AI.

440 43 Updated Oct 25, 2024

[TMLR 2024] Efficient Large Language Models: A Survey

1,045 85 Updated Nov 23, 2024

Tensor library for machine learning

C++ 11,362 1,057 Updated Dec 13, 2024

A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures

402 35 Updated Nov 28, 2024

awesome llm plaza: daily tracking of all sorts of awesome LLM topics, e.g. LLM for coding, robotics, reasoning, multimodal, etc.

166 12 Updated Dec 11, 2024

📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉

2,991 204 Updated Dec 9, 2024

An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"

Python 34 3 Updated Jun 7, 2024

A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, including language and vision. We are continuously improving the project. Wel…

159 11 Updated Nov 1, 2024

NeuPIMs Simulator

Jupyter Notebook 60 15 Updated Jun 19, 2024
Python 46 5 Updated Jun 24, 2024

ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference

C++ 73 12 Updated Dec 11, 2024

Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and evaluation of new memory system designs (e.g., new DRAM stan…

C++ 255 62 Updated Dec 11, 2024

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 336 39 Updated Sep 11, 2024

Large Language Model (LLM) Systems Paper List

675 24 Updated Nov 28, 2024

How to optimize some algorithms in CUDA.

Cuda 1,726 142 Updated Dec 12, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 1,546 153 Updated Dec 13, 2024
79 12 Updated Jun 26, 2023

Block-sparse primitives for PyTorch

Python 152 22 Updated Apr 5, 2021