A highly optimized inference acceleration engine for Llama and its variants.

C++ 332 28 Updated Dec 12, 2024

A Fast TTS Engine

Python 372 21 Updated Dec 9, 2024

My learning notes/codes for ML SYS.

Python 182 6 Updated Dec 13, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 8,890 1,024 Updated Dec 11, 2024

PearAI: Open Source AI Code Editor (Fork of VSCode). The PearAI Submodule (https://github.com/trypear/pearai-submodule) is a fork of Continue.

TypeScript 326 98 Updated Dec 12, 2024

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Jupyter Notebook 7,913 596 Updated Nov 30, 2024

Composable building blocks to build Llama Apps

Python 4,918 634 Updated Dec 14, 2024
Python 202 9 Updated Dec 2, 2024

An open-source RAG-based tool for chatting with your documents.

Python 17,970 1,397 Updated Dec 11, 2024
TypeScript 8,501 435 Updated Dec 13, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 31,859 4,843 Updated Dec 14, 2024

A package for parsing PDFs and analyzing their content using LLMs.

Python 251 6 Updated Aug 6, 2024

Microsoft's GraphRAG + AutoGen + Ollama + Chainlit = Fully Local & Free Multi-Agent RAG Superbot

Python 538 107 Updated Jul 20, 2024

SearchGPT / Perplexity clone, but personalised for you.

TypeScript 971 136 Updated Aug 5, 2024

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

JavaScript 5,479 552 Updated Dec 5, 2024

An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents

Python 5,345 424 Updated Sep 26, 2024
Python 79 8 Updated Sep 9, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 2 Updated Aug 19, 2024

Agentic components of the Llama Stack APIs

Python 3,969 572 Updated Dec 14, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 6,510 578 Updated Dec 12, 2024

SearchGPT / Perplexity Pages clone, but personalised for you.

Python 221 24 Updated Aug 31, 2024

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Cuda 199 7 Updated Dec 13, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 1,547 153 Updated Dec 13, 2024

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Cuda 217 21 Updated Nov 22, 2024

TLLM_QMM strips the implementation of quantized kernels from Nvidia's TensorRT-LLM, removing the NVInfer dependency and exposing an easy-to-use PyTorch module. We modified the dequantization and weight preproc…

C++ 15 2 Updated Jul 5, 2024

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024).

Python 1,186 67 Updated Nov 27, 2024

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 790 66 Updated Dec 13, 2024

GPTModels - a multi-model, window-based LLM AI plugin for Neovim, with an emphasis on stability and clean code

Lua 57 2 Updated Dec 13, 2024

Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

Python 142 30 Updated Dec 14, 2024

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Python 92 8 Updated Dec 5, 2024