An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…

TypeScript 11,659 1,130 Updated Feb 11, 2025

vllm-project / llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 971 80 Updated Feb 15, 2025

chenzomi12 / AISystem

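AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks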

Jupyter Notebook 12,341 1,787 Updated Jan 2, 2025

HarleyCoops / Math-To-Manim

Python 677 74 Updated Feb 10, 2025

NVIDIA / cccl

CUDA Core Compute Libraries

C++ 1,459 188 Updated Feb 14, 2025

NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda 1,715 448 Updated Oct 9, 2023

HFAiLab / hai-platform

A high-performance deep learning training platform with task-level time-sharing scheduling of GPU compute

Python 481 66 Updated Oct 24, 2023

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 2,015 207 Updated Feb 14, 2025

mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation

Python 19,964 1,663 Updated Feb 12, 2025

snowflakedb / ArcticTraining

ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)

Python 37 4 Updated Feb 15, 2025

FlagOpen / FlagGems

FlagGems is an operator library for large language models implemented in Triton Language.

Python 417 64 Updated Feb 14, 2025

deepseek-ai / DeepSeek-V3

Python 84,489 13,547 Updated Feb 14, 2025

mit-han-lab / nunchaku

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Cuda 660 41 Updated Feb 14, 2025

AnswerDotAI / ModernBERT

Bringing BERT into modernity via both architecture changes and scaling

Python 1,179 78 Updated Feb 13, 2025

triton-inference-server / pytriton

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.

Python 771 53 Updated Feb 12, 2025

triton-inference-server / python_backend

Triton backend that enables pre-processing, post-processing and other logic to be implemented in Python.

C++ 587 158 Updated Feb 11, 2025

Snowflake-Labs / vllm

Python 13 3 Updated Dec 7, 2024

triton-inference-server / core

The core library and APIs implementing the Triton Inference Server.

C++ 115 104 Updated Feb 14, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 9,610 913 Updated Feb 14, 2025

microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 519 39 Updated Feb 14, 2025

triton-inference-server / tutorials

This repository contains tutorials and examples for Triton Inference Server

Python 641 105 Updated Feb 15, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,540 150 Updated Feb 14, 2025

xtuc / triton-rs

Rust bindings to the Triton Inference Server

Rust 11 2 Updated Mar 14, 2024

thu-ml / SageAttention

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 948 58 Updated Jan 30, 2025

eranif / codelite

A multi-purpose IDE specialized in C/C++/Rust/Python/PHP and Node.js. Written in C++

C++ 2,191 469 Updated Feb 13, 2025

LiuXinyu lauthu

Stars

NVIDIA / nvbench

fxmeng / TransMLA

huggingface / agents-course

kvcache-ai / ktransformers

sail-sg / oat

dzhng / deep-research