Stars
A highly optimized inference acceleration engine for Llama and its variants.
My learning notes/codes for ML SYS.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
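As a quick illustration of that Python API, here is a minimal sketch using TensorRT-LLM's high-level `LLM` entry point (available in recent releases); the model name and sampling settings are illustrative assumptions, not a definitive recipe.

```python
# Minimal sketch of TensorRT-LLM's high-level LLM API (recent releases).
# The model name and sampling settings are illustrative assumptions.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # builds or loads an engine
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["What is TensorRT-LLM?"], params):
    print(output.outputs[0].text)
```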
trypear / pearai-app
Forked from microsoft/vscode. PearAI: Open Source AI Code Editor (Fork of VSCode). The PearAI Submodule (https://github.com/trypear/pearai-submodule) is a fork of Continue.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation.
Composable building blocks to build Llama Apps
An open-source RAG-based tool for chatting with your documents.
A high-throughput and memory-efficient inference and serving engine for LLMs
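For reference, a minimal offline-inference sketch with vLLM's Python API; the model name here is an illustrative assumption.

```python
# Minimal vLLM offline inference; the model name is an illustrative assumption.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() batches prompts and returns one RequestOutput per prompt
for out in llm.generate(["The capital of France is"], sampling):
    print(out.outputs[0].text)
```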
A package for parsing PDFs and analyzing their content using LLMs.
Microsoft's GraphRAG + AutoGen + Ollama + Chainlit = Fully Local & Free Multi-Agent RAG Superbot
SearchGPT / Perplexity clone, but personalised for you.
🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)
An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents
izhuhaoran / vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs.
Agentic components of the Llama Stack APIs
SGLang is a fast serving framework for large language models and vision language models.
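A hedged sketch of SGLang's frontend DSL talking to a locally launched server; the endpoint URL, port, and generation settings are assumptions.

```python
# Sketch of SGLang's frontend DSL. Assumes a server was started separately,
# e.g. `python -m sglang.launch_server --model-path <model> --port 30000`;
# the endpoint URL and generation settings are illustrative assumptions.
import sglang as sgl

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@sgl.function
def qa(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

state = qa.run(question="What is SGLang?")
print(state["answer"])
```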
SearchGPT / Perplexity Pages clone, but personalised for you.
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
FlashInfer: Kernel Library for LLM Serving
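A small sketch of calling one of FlashInfer's fused attention kernels for single-request decoding; the tensor sizes are illustrative assumptions, and the layout follows the library's documented NHD convention.

```python
# Sketch of FlashInfer's single-request decode attention; sizes are assumptions.
import torch
import flashinfer

num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 8, 128, 4096
q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

# Fused decode attention with grouped-query heads; returns [num_qo_heads, head_dim]
out = flashinfer.single_decode_with_kv_cache(q, k, v)
```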
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
TLLM_QMM strips the quantized-kernel implementations out of Nvidia's TensorRT-LLM, removing the NVInfer dependency, and exposes them as an easy-to-use PyTorch module. We modified the dequantization and weight preprocessing…
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
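A hedged sketch of the library's one-shot quantization flow based on its documented usage; the import paths have shifted across versions, and the recipe, model id, calibration dataset, and output path below are illustrative assumptions.

```python
# Sketch of a one-shot W4A16 GPTQ quantization with llm-compressor; the model,
# dataset, and output directory are illustrative assumptions.
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(scheme="W4A16", targets="Linear", ignore=["lm_head"])

oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",
    dataset="open_platypus",              # calibration data
    recipe=recipe,
    output_dir="Llama-3.1-8B-W4A16",      # resulting checkpoint loads in vLLM
    max_seq_length=2048,
    num_calibration_samples=512,
)
```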
GPTModels - a multi-model, window-based LLM AI plugin for Neovim, with an emphasis on stability and clean code
Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.