Stars
Cold Compress is a hackable, lightweight, open-source toolkit for creating and benchmarking cache compression methods, built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
[NeurIPS'24 Spotlight] To speed up long-context LLM inference, attention is computed with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 whil…
[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
Code and data for "Lost in the Middle: How Language Models Use Long Contexts"
General technology for enabling AI capabilities with LLMs and MLLMs