💭
Focusing on DeepLearning

DeepEP: an efficient expert-parallel communication library

Cuda 7,035 601 Updated Mar 6, 2025

The official implementation of Tensor ProducT ATTenTion Transformer (T6)

Python 320 31 Updated Feb 20, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,170 775 Updated Mar 1, 2025

Code release for AdapMoE accepted by ICCAD 2024

Jupyter Notebook 14 1 Updated Mar 6, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,751 168 Updated Mar 6, 2025

An Open Large Reasoning Model for Real-World Solutions

Python 1,471 78 Updated Mar 4, 2025

Development repository for the Triton language and compiler

MLIR 14,748 1,842 Updated Mar 7, 2025

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 18,852 1,351 Updated Mar 3, 2025

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 431 26 Updated Feb 10, 2025

Low-bit LLM inference on CPU with lookup table

C++ 691 54 Updated Jan 9, 2025

✨✨Latest Advances on Multimodal Large Language Models

14,136 908 Updated Mar 5, 2025

A throughput-oriented high-performance serving framework for LLMs

Cuda 748 29 Updated Sep 21, 2024

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,135 164 Updated Feb 13, 2025

Fast Multimodal LLM on Mobile Devices

C++ 728 87 Updated Mar 3, 2025

A low-latency & high-throughput serving engine for LLMs

Python 318 40 Updated Jan 31, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 2,298 240 Updated Mar 6, 2025

LLM serving cluster simulator

Jupyter Notebook 93 8 Updated Apr 25, 2024

High performance Transformer implementation in C++.

C++ 103 14 Updated Jan 18, 2025

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 479 52 Updated Aug 19, 2024

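🚀 KIMI AI long-context LLM reverse-engineered API (specialty: long-text interpretation and summarization). Supports high-speed streaming output, agent conversations, web search, the exploration edition, the K1 thinking model, long-document interpretation, image parsing, and multi-turn dialogue, with zero-configuration deployment, multi-token support, and automatic cleanup of session traces. For testing only; for commercial use, please go to the official open platform.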

TypeScript 4,344 733 Updated Dec 30, 2024

Efficient and easy multi-instance LLM serving

Python 319 25 Updated Mar 6, 2025

Standardized Serverless ML Inference Platform on Kubernetes

Python 3,956 1,121 Updated Mar 7, 2025

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python 2,381 150 Updated Mar 6, 2025

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Python 172,361 45,215 Updated Mar 7, 2025

A list of ICs and IPs for AI, Machine Learning and Deep Learning.

PHP 1,655 275 Updated Jun 5, 2024

Letta (formerly MemGPT) is a framework for creating LLM services with memory.

Python 14,875 1,585 Updated Mar 6, 2025

This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"

Python 45 1 Updated Jul 16, 2024

Code release for VTW (AAAI 2025) Oral

Python 32 Updated Jan 18, 2025