Shenzhen, China (UTC +08:00)
Website: https://dengyangshen.netlify.app
ORCID: https://orcid.org/0009-0003-9487-1455
Starred repositories
🥧 Savoury implementation of the QUIC transport protocol and HTTP/3
HAProxy Load Balancer's development branch (mirror of git.haproxy.org)
An implementation of the QUIC transport protocol.
Internet-Drafts that make up the base QUIC specification
Cross-platform, C implementation of the IETF QUIC protocol, exposed to C, C++, C# and Rust.
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS'24)
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
AIFM: High-Performance, Application-Integrated Far Memory
TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.
The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction".
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
[ICLR2025] MagicPIG: LSH Sampling for Efficient LLM Generation
[ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
Long context evaluation for large language models