Stars
[TMLR 2024] Efficient Large Language Models: A Survey
Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
GLake: optimizing GPU memory management and IO transmission.
Dynamic Memory Management for Serving LLMs without PagedAttention
AcadHomepage: A Modern and Responsive Academic Personal Homepage
A curated list for Efficient Large Language Models
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
KV cache compression for high-throughput LLM inference
Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs".
A high-throughput and memory-efficient inference and serving engine for LLMs
📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
Unified KV Cache Compression Methods for Auto-Regressive Models
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
An awesome repository & a comprehensive survey on the interpretability of LLM attention heads.
Incorporating a memory mechanism into the Transformer and employing a parallel weighting structure to obtain better utterance-level representations for the speaker verification task
A Clash for Linux backup repository based on Clash Core
Probabilistic Data Structures and Algorithms in Python
Learning materials for the Transformer, including my code, XMind notes, PDFs, and more
A simple, time-tested family of random hash functions in Python, based on CRC32 and xxHash, affine transformations, and the Mersenne Twister. 🎲
Python package built to ease deep learning on graphs, on top of existing DL frameworks.
Provides a practical interactive interface for LLMs such as GPT/GLM, with special optimizations for paper reading/polishing/writing. Modular design, supports custom shortcut buttons & function plugins, supports analysis & self-translation of Python and C++ projects, PDF/LaTeX paper translation & summarization, supports querying multiple LLMs in parallel, supports local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, Wenxin Yiyan, llama2, rwkv, claude2, m…
PyTorch implementation of Hash Embeddings (NIPS 2017). Submission to the NIPS Implementation Challenge.
Must-read papers on graph neural networks (GNN)