🤯
Working on LLM sys and GPU DB

Organizations

@DBGroup-SUSTech

Starred repositories

⏰ Collaboratively track deadlines of conferences recommended by CCF (Website, Python CLI, WeChat Applet) / If you find it useful, please star this project, thanks~

Vue 7,023 477 Updated Mar 6, 2025

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ 574 34 Updated Mar 6, 2025

A sparse attention kernel supporting mixed sparse patterns

C++ 156 5 Updated Feb 13, 2025

Cost-efficient and pluggable Infrastructure components for GenAI inference

Jupyter Notebook 3,008 266 Updated Mar 7, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,635 186 Updated Mar 4, 2025

《Machine Learning Systems: Design and Implementation》- Chinese Version

TeX 4,291 450 Updated Apr 13, 2024

This is a fork of SGLang for hip-attention integration. Please refer to hip-attention for detail.

Python 11 2 Updated Mar 6, 2025

Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.

Python 121 13 Updated Feb 25, 2025

Argument Parser for Modern C++

C++ 2,929 265 Updated Jan 26, 2025

Quick Merkle Database

Rust 215 20 Updated Mar 3, 2025

Network Benchmarking Utility

C++ 626 120 Updated Dec 19, 2024

A high-performance, lightweight, and cross-platform QUIC library

Rust 1,195 99 Updated Feb 25, 2025

What would you do with 1000 H100s...

Jupyter Notebook 1,011 62 Updated Jan 10, 2024

Solve puzzles. Learn CUDA.

Jupyter Notebook 10,631 821 Updated Sep 1, 2024

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …

Python 6,843 562 Updated Mar 6, 2025
Python 14 8 Updated Dec 4, 2024

🥧 Savoury implementation of the QUIC transport protocol and HTTP/3

Rust 9,923 770 Updated Mar 6, 2025

LiteSpeed QUIC and HTTP/3 Library

C 1,614 347 Updated Feb 18, 2025

HAProxy Load Balancer's development branch (mirror of git.haproxy.org)

C 5,343 829 Updated Mar 6, 2025

QUIC and HTTP/3 implementation in Python

Python 1,745 251 Updated Feb 2, 2025

An implementation of the QUIC transport protocol.

C++ 1,531 247 Updated Mar 6, 2025

Internet-Drafts that make up the base QUIC specification

Shell 1,643 204 Updated Mar 6, 2025

An implementation of the IETF QUIC protocol

Rust 1,204 124 Updated Mar 6, 2025

Cross-platform, C implementation of the IETF QUIC protocol, exposed to C, C++, C# and Rust.

C 4,215 553 Updated Mar 7, 2025

A Survey and Benchmark of QUIC

Python 60 9 Updated Feb 3, 2025

Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓

2,719 156 Updated Feb 21, 2025