Starred repositories
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
Official release of InternLM2.5 base and chat models, with 1M-token context support
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference
Reference implementation of the Megalodon 7B model
PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" (https://arxiv.org/abs/2404.07143)
Unofficial PyTorch/🤗 Transformers (Gemma/Llama3) implementation of "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention"
Ring Attention implementation with FlashAttention
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch
A context window 32 times longer than vanilla Transformers and up to 4 times longer than memory-efficient Transformers.
Official inference library for Mistral models
Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
A library for efficient similarity search and clustering of dense vectors.
Making large AI models cheaper, faster and more accessible
Large World Model -- Modeling Text and Video with Million-Token Context
XVERSE-13B: A multilingual large language model developed by XVERSE Technology Inc.
VideoSys: An easy and efficient system for video generation
Open-Sora: Democratizing Efficient Video Production for All
A high-throughput and memory-efficient inference and serving engine for LLMs
Retrieval and Retrieval-augmented LLMs
Large Language Model Text Generation Inference
DLRover: An Automatic Distributed Deep Learning System
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Efficient Training (including pre-training and fine-tuning) for Big Models
This repository contains code and tooling for the Abacus.AI LLM Context Expansion project. Also included are evaluation scripts and benchmark tasks that evaluate a model's information retrieval capabilities.
YaRN: Efficient Context Window Extension of Large Language Models
Code for Scaling Laws of RoPE-based Extrapolation