- Shenzhen, China
- UTC +08:00
- https://dengyangshen.netlify.app
- https://orcid.org/0009-0003-9487-1455
Starred repositories
⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet). If you find it useful, please star this project, thanks~
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
A sparse attention kernel supporting mixed sparse patterns
Cost-efficient and pluggable Infrastructure components for GenAI inference
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
《Machine Learning Systems: Design and Implementation》- Chinese Version
DeepAuto-AI / sglang
Forked from sgl-project/sglang. This is a fork of SGLang for hip-attention integration. Please refer to hip-attention for details.
Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.
A high-performance, lightweight, and cross-platform QUIC library
What would you do with 1000 H100s...
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …
🥧 Savoury implementation of the QUIC transport protocol and HTTP/3
HAProxy Load Balancer's development branch (mirror of git.haproxy.org)
An implementation of the QUIC transport protocol.
Internet-Drafts that make up the base QUIC specification
Cross-platform C implementation of the IETF QUIC protocol, exposed to C, C++, C#, and Rust.
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓