Stars
Fetch an academic paper or web article and send it to the reMarkable tablet with a single command
Data and tools for generating and inspecting OLMo pre-training data.
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
A high-throughput and memory-efficient inference and serving engine for LLMs
A Toolkit for Distributional Control of Generative Models
prompt2model - Generate Deployable Models from Natural Language Instructions
Instruct-tune LLaMA on consumer hardware
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
SoTA Abstract Meaning Representation (AMR) parsing with word-node alignments in PyTorch. Includes checkpoints and other tools such as statistical significance Smatch.
SummScreen: A Dataset for Abstractive Screenplay Summarization (ACL 2022)
Code for Massive-scale Decoding for Text Generation using Lattices
The repo containing the Critical Role Dungeons and Dragons Dataset.
SacreROUGE is a library dedicated to the use and development of text generation evaluation metrics with an emphasis on summarization.
Data and software for building the ACL Anthology.
UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation
Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction
A tool for holistic analysis of language generation systems
A Spanish Reddit dialogues corpus, constructed from Reddit comments posted in 2019.
Code for the EMNLP 2019 paper "Text Summarization with Pretrained Encoders"
Joint Extraction & Compression Text Summarization
Code for ACL 2020 paper: "Extractive Summarization as Text Matching"
BARTScore: Evaluating Generated Text as Text Generation