🤯 Working on LLM sys and GPU DB

Organizations

@DBGroup-SUSTech

Starred repositories

Python · 14 stars · 8 forks · Updated Dec 4, 2024

🥧 Savoury implementation of the QUIC transport protocol and HTTP/3

Rust · 9,805 stars · 756 forks · Updated Jan 27, 2025

LiteSpeed QUIC and HTTP/3 Library

C · 1,595 stars · 343 forks · Updated Jan 9, 2025

HAProxy Load Balancer's development branch (mirror of git.haproxy.org)

C · 5,190 stars · 814 forks · Updated Jan 27, 2025

QUIC and HTTP/3 implementation in Python

Python · 1,727 stars · 244 forks · Updated Aug 8, 2024

An implementation of the QUIC transport protocol.

C++ · 1,523 stars · 246 forks · Updated Jan 26, 2025

Internet-Drafts that make up the base QUIC specification

Shell · 1,637 stars · 203 forks · Updated Jan 26, 2025

An implementation of the IETF QUIC protocol

Rust · 1,192 stars · 122 forks · Updated Jan 24, 2025

Cross-platform, C implementation of the IETF QUIC protocol, exposed to C, C++, C# and Rust.

C · 4,141 stars · 546 forks · Updated Jan 27, 2025

A Survey and Benchmark of QUIC

Python · 58 stars · 8 forks · Updated Jan 23, 2018

Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓

2,328 stars · 131 forks · Updated Jan 26, 2025

Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without degrading end-to-end metrics across various models.

CUDA · 894 stars · 53 forks · Updated Jan 23, 2025

ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS'24)

Python · 24 stars · 3 forks · Updated Dec 17, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 2,438 stars · 144 forks · Updated Jan 24, 2025

AIFM: High-Performance, Application-Integrated Far Memory

C · 116 stars · 37 forks · Updated Feb 28, 2023

TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.

Rust · 2,163 stars · 115 forks · Updated Jan 27, 2025

The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction."

Python · 42 stars · Updated Oct 18, 2024

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python · 145 stars · 6 forks · Updated Oct 30, 2024

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python · 418 stars · 26 forks · Updated Jan 22, 2025

[ICLR 2025] MagicPIG: LSH Sampling for Efficient LLM Generation

Python · 181 stars · 12 forks · Updated Dec 16, 2024

[ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

Python · 1,584 stars · 155 forks · Updated Oct 29, 2024

Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718

Python · 305 stars · 26 forks · Updated Sep 25, 2024

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)

Python · 102 stars · 21 forks · Updated Jul 10, 2024

Long context evaluation for large language models

Python · 198 stars · 15 forks · Updated Jan 20, 2025

Python · 70 stars · 7 forks · Updated Dec 31, 2024