DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 399 21 Updated Oct 31, 2024

Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.

Python 16,866 1,709 Updated Oct 15, 2024

Now we have become very big, different from the original idea. Collect premium software in various categories.

JavaScript 77,870 6,286 Updated Dec 12, 2024

Low latency JSON generation using LLMs ⚡️

Jupyter Notebook 387 14 Updated Mar 10, 2024

CUDA Kernel Benchmarking Library

Cuda 530 66 Updated Nov 20, 2024

A throughput-oriented high-performance serving framework for LLMs

Cuda 654 26 Updated Sep 21, 2024

Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.

Python 2,233 153 Updated Aug 21, 2024

Experimental projects related to TensorRT

MLIR 82 12 Updated Dec 13, 2024

User-friendly Desktop Client App for AI Models/LLMs (GPT, Claude, Gemini, Ollama...)

TypeScript 23,720 2,348 Updated Dec 12, 2024

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 651 58 Updated Apr 7, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 8,890 1,024 Updated Dec 11, 2024

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 12,849 901 Updated Oct 22, 2024

A lightweight library for portable low-level GPU computation using WebGPU.

C++ 3,770 177 Updated Nov 18, 2024

Assembler for NVIDIA Volta and Turing GPUs

Python 203 40 Updated Jan 13, 2022

[NeurIPS'24 Spotlight] To speed up long-context LLM inference, computes attention approximately with dynamic sparsity, which reduces inference latency by up to 10x for pre-filling on an A100 whil…

Python 832 38 Updated Dec 13, 2024

A low-latency & high-throughput serving engine for LLMs

Python 270 33 Updated Sep 12, 2024

Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).

Cuda 217 16 Updated Oct 28, 2024

This is an online course where you can learn and master the skill of low-level performance analysis and tuning.

C++ 2,660 234 Updated Dec 13, 2024

A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems

Python 139 8 Updated Oct 15, 2024

An awesome repository of local AI tools

1,293 104 Updated Nov 13, 2024

Local AI API Platform

C++ 2,192 131 Updated Dec 13, 2024

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Python 36,038 4,443 Updated Dec 12, 2024

C++ Insights - See your source code with the eyes of a compiler

C++ 4,132 246 Updated Oct 21, 2024

Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs

Python 76 6 Updated Nov 25, 2024

The official Meta Llama 3 GitHub site

Python 27,493 3,129 Updated Aug 12, 2024

A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures

402 35 Updated Nov 28, 2024

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 13,875 1,128 Updated May 23, 2024

A large-scale simulation framework for LLM inference

Python 294 49 Updated Nov 19, 2024

Bayesian optimisation & Reinforcement Learning library developed by Huawei Noah's Ark Lab

Jupyter Notebook 3,304 590 Updated Nov 30, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 31,845 4,841 Updated Dec 13, 2024