Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. This repository hosts its technical report and will also be used for the forthcoming open-source release of traces. Stay tuned!

🔥 Updates

June 27, 2024: We published a Chinese blog post with further discussion on Zhihu.
June 26, 2024: Initial technical report release.

🎉 Overview

Mooncake features a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters. It also leverages the underutilized CPU, DRAM, and SSD resources of the GPU cluster to implement a disaggregated KVCache.
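The README does not describe how the disaggregated KVCache is organized internally. As a rough illustration of the general idea (reusing previously computed KVCache for a shared prompt prefix, and spilling blocks from a fast tier to a slower one), here is a minimal sketch that assumes fixed-size token blocks keyed by chained hashes and a two-tier DRAM/SSD pool with LRU eviction. `BLOCK_SIZE`, `block_hashes`, and `TieredKVCachePool` are hypothetical names for illustration, not Mooncake's API.

```python
import hashlib
from collections import OrderedDict
from typing import Dict, List, Optional

BLOCK_SIZE = 4  # hypothetical: tokens per KVCache block, kept tiny for illustration


def block_hashes(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Chain-hash each full block so that a block's key also identifies its whole prefix."""
    hashes: List[str] = []
    prev = ""
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        block = ",".join(map(str, tokens[start:start + block_size]))
        prev = hashlib.sha256((prev + "|" + block).encode()).hexdigest()
        hashes.append(prev)
    return hashes


class TieredKVCachePool:
    """Hypothetical two-tier pool: an LRU DRAM tier that spills evicted blocks to an SSD tier."""

    def __init__(self, dram_capacity: int) -> None:
        self.dram_capacity = dram_capacity
        self.dram: "OrderedDict[str, bytes]" = OrderedDict()
        self.ssd: Dict[str, bytes] = {}

    def put(self, key: str, kv: bytes) -> None:
        """Store a block in DRAM, spilling the least recently used blocks to SSD if full."""
        self.ssd.pop(key, None)
        self.dram[key] = kv
        self.dram.move_to_end(key)
        while len(self.dram) > self.dram_capacity:
            old_key, old_kv = self.dram.popitem(last=False)  # least recently used
            self.ssd[old_key] = old_kv

    def get(self, key: str) -> Optional[str]:
        """Return the tier holding the block ('dram' or 'ssd'), or None on a miss."""
        if key in self.dram:
            self.dram.move_to_end(key)  # refresh LRU position
            return "dram"
        if key in self.ssd:
            return "ssd"
        return None

    def reusable_prefix_tokens(self, tokens: List[int]) -> int:
        """Count prompt tokens whose KVCache can be reused; stop at the first missing block."""
        hit = 0
        for h in block_hashes(tokens):
            if self.get(h) is None:
                break
            hit += BLOCK_SIZE
        return hit
```

Chaining each block's hash with its predecessor's means a hit on block *i* implies the whole prefix up to block *i* matches, so only the leading run of hits is reusable; the prefill stage would then compute KVCache only for the remaining tokens.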

The core of Mooncake is its KVCache-centric scheduler, which balances maximizing overall effective throughput against meeting latency-related Service Level Objectives (SLOs). Unlike prior studies, which assume that all requests will be processed, Mooncake must cope with highly overloaded scenarios. To address this, we developed a prediction-based early rejection policy. Experiments show that Mooncake excels in long-context scenarios. Compared to the baseline method, Mooncake can achieve up to a 525% increase in throughput in certain simulated scenarios while adhering to SLOs. Under real workloads, Mooncake's innovative architecture enables Kimi to handle 75% more requests.
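The README names a prediction-based early rejection policy but does not specify the predictor. The sketch below only illustrates the general shape of such a policy: before spending prefill compute on a request, predict its time-to-first-token (TTFT) and the time-between-tokens (TBT) the decode batch would see after admitting it, and reject the request if either prediction exceeds its SLO. The linear cost model, the predicted output length, and every name here (`ClusterState`, `SLO`, `admit`, and their fields) are hypothetical assumptions, not Mooncake's actual model.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    ttft_ms: float  # time-to-first-token budget
    tbt_ms: float   # time-between-tokens budget


@dataclass
class ClusterState:
    queued_prefill_tokens: int    # prompt tokens already waiting ahead of the new request
    prefill_tokens_per_ms: float  # hypothetical measured prefill throughput
    decode_batch_tokens: int      # KVCache tokens resident in the decode batch
    decode_base_ms: float         # hypothetical fixed cost of one decode step
    decode_ms_per_ktok: float     # hypothetical extra step cost per 1k resident tokens


def predict_ttft_ms(state: ClusterState, prompt_tokens: int) -> float:
    """Assume FIFO prefill: the request waits for the queue, then its own prompt."""
    return (state.queued_prefill_tokens + prompt_tokens) / state.prefill_tokens_per_ms


def predict_tbt_ms(state: ClusterState, prompt_tokens: int, output_tokens: int) -> float:
    """Pessimistic estimate: charge the request's peak KVCache size to the decode batch."""
    tokens = state.decode_batch_tokens + prompt_tokens + output_tokens
    return state.decode_base_ms + state.decode_ms_per_ktok * tokens / 1000


def admit(state: ClusterState, slo: SLO, prompt_tokens: int, predicted_output_tokens: int) -> bool:
    """Reject early (before any prefill work) if either predicted latency would break its SLO."""
    ttft = predict_ttft_ms(state, prompt_tokens)
    tbt = predict_tbt_ms(state, prompt_tokens, predicted_output_tokens)
    return ttft <= slo.ttft_ms and tbt <= slo.tbt_ms
```

Rejecting at admission time, rather than after prefill, avoids wasting prefill compute on requests whose decode phase would violate the SLO anyway; how accurately the decode load is predicted determines how well such a policy works in practice.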

📑 Citation

Please cite our paper if you find it useful:

@article{qin2024mooncake,
  title        = {Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving},
  author       = {Ruoyu Qin and Zheming Li and Weiran He and Mingxing Zhang and Yongwei Wu and Weimin Zheng and Xinran Xu},
  year         = {2024}
}

Remark: The arXiv version is still on hold.
