Stars
DeepEP: an efficient expert-parallel communication library
The official implementation of the Tensor ProducT ATTenTion Transformer (T6)
Code release for AdapMoE, accepted at ICCAD 2024
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
An Open Large Reasoning Model for Real-World Solutions
Development repository for the Triton language and compiler
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
✨✨Latest Advances on Multimodal Large Language Models
A throughput-oriented high-performance serving framework for LLMs
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
A low-latency & high-throughput serving engine for LLMs
FlashInfer: Kernel Library for LLM Serving
High-performance Transformer implementation in C++.
Disaggregated serving system for Large Language Models (LLMs).
🚀 Reverse-engineered API for the KIMI AI long-context LLM (specialty: long-text analysis and summarization). Supports high-speed streaming output, agent conversations, web search, explorer mode, the K1 reasoning model, long-document interpretation, image parsing, and multi-turn dialogue; zero-configuration deployment, multi-token support, and automatic cleanup of conversation traces. For testing only; for commercial use, please visit the official open platform.
Efficient and easy multi-instance LLM serving
Standardized Serverless ML Inference Platform on Kubernetes
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools so that you can focus on what matters.
A list of ICs and IPs for AI, Machine Learning and Deep Learning.
Letta (formerly MemGPT) is a framework for creating LLM services with memory.
The official repository of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"