Stars
Heterogeneous Accelerated Compute Cluster (HACC) Resources Page
A framework for few-shot evaluation of language models.
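As a rough illustration of the few-shot evaluation workflow, here is a minimal sketch using the harness's Python entry point; the backend name, checkpoint, task, and shot count are placeholder choices, not recommendations.

```python
# Sketch: run a 5-shot evaluation with lm-evaluation-harness (values are placeholders).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # Hugging Face causal-LM backend
    model_args="pretrained=gpt2",  # any causal LM checkpoint
    tasks=["hellaswag"],
    num_fewshot=5,                 # 5-shot prompting
    batch_size=8,
)
print(results["results"])          # per-task metrics
```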
Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…
VPTQ: a flexible and extreme low-bit quantization algorithm
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
RapidStream TAPA compiles task-parallel HLS programs into high-frequency FPGA accelerators.
SGLang is a fast serving framework for large language models and vision language models.
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
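A condensed sketch of the typical AutoAWQ quantize-and-save flow; the model path, output directory, and quant_config values below are assumptions for illustration.

```python
# Sketch: quantize a causal LM to 4-bit with AutoAWQ and save the result.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "facebook/opt-125m"   # placeholder checkpoint
quant_path = "opt-125m-awq"        # placeholder output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)  # calibrate and quantize weights
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```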
Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores via the WMMA API and MMA PTX instructions.
A collection of benchmarks to measure basic GPU capabilities
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
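A minimal sketch of wrapping a PyTorch model with DeepSpeed's engine; the toy model and the config values (batch size, ZeRO stage, fp16, optimizer) are illustrative assumptions, and a real run would normally go through the deepspeed launcher.

```python
# Sketch: wrap a toy model with deepspeed.initialize and run one training step.
import torch
import deepspeed

model = torch.nn.Linear(512, 512)   # placeholder model
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # ZeRO-2: shard optimizer state and gradients
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(8, 512).half().to(engine.device)
loss = engine(x).float().pow(2).mean()  # dummy loss for the sketch
engine.backward(loss)                   # loss scaling and communication handled by the engine
engine.step()
```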
High-speed Large Language Model Serving for Local Deployment
Llama Chinese community. The Llama 3 online demo and fine-tuned models are now open, the latest Llama 3 learning materials are aggregated in real time, and all code has been updated for Llama 3. Building the best Chinese Llama large model, fully open source and available for commercial use.
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, all in a user-friendly interface.
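To make the roofline idea concrete, a back-of-the-envelope sketch: in single-batch decoding, each FP16 weight matrix contributes roughly 2·d² FLOPs but also about 2·d² bytes of weight traffic, so arithmetic intensity sits near 1 FLOP/byte, far left of a typical GPU ridge point. The peak compute and bandwidth numbers below are illustrative assumptions.

```python
# Sketch: classify one decode-step matmul as compute- or memory-bound via a roofline.
peak_flops = 312e12      # FP16 tensor-core peak, FLOP/s (assumed)
peak_bw    = 2.0e12      # HBM bandwidth, bytes/s (assumed)

d = 4096                 # hidden size
flops = 2 * d * d        # one [1, d] x [d, d] matmul during decoding
bytes_moved = 2 * d * d  # FP16 weights read once (activation traffic ignored)

intensity  = flops / bytes_moved                    # ~1 FLOP/byte
attainable = min(peak_flops, intensity * peak_bw)   # roofline: min(compute roof, memory roof)
ridge      = peak_flops / peak_bw                   # ~156 FLOP/byte for these peaks

print(f"intensity = {intensity:.1f} FLOP/B, ridge = {ridge:.0f} FLOP/B")
print("memory-bound" if intensity < ridge else "compute-bound")
```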
A collection of extensions for Vitis and Intel FPGA OpenCL to improve developer quality of life.
A library for efficient similarity search and clustering of dense vectors.
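For reference, the basic Faiss pattern for exact L2 nearest-neighbor search; the dimensionality and dataset sizes are arbitrary, and approximate index types can be swapped in to trade accuracy for speed.

```python
# Sketch: build a flat (exact) L2 index and query its nearest neighbors.
import numpy as np
import faiss

d = 128                                          # vector dimensionality
xb = np.random.rand(10000, d).astype("float32")  # database vectors
xq = np.random.rand(5, d).astype("float32")      # query vectors

index = faiss.IndexFlatL2(d)          # exact L2 search (brute force)
index.add(xb)                         # add database vectors
distances, ids = index.search(xq, 4)  # 4 nearest neighbors per query
print(ids)
```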
《开源大模型食用指南》 (A Hands-On Guide to Open-Source Large Models): tutorials tailored for Chinese beginners on quickly fine-tuning (full-parameter / LoRA) and deploying open-source large language models (LLMs) and multimodal large models (MLLMs), both domestic and international, in a Linux environment.