Stars
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba
A comparable corpus of Kalaallisut and Danish web-crawled sentences, along with some noisy aligned texts and code for MT finetuning experiments between Kalaallisut and English. Currently looking to…
Unsupervised Language Model Pre-training for French
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Collections of CS PhD Application Fee Waivers of schools in North America
[ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834)
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
Unsupervised text tokenizer for Neural Network-based text generation.
Head tracking software for MS Windows, Linux, and Apple OSX
Data augmentation for NLP, presented at EMNLP 2019
Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction
Code for the ACL 2022 paper "Efficient Unsupervised Sentence Compression by Fine-tuning Transformers with Reinforcement Learning"
Code and models used in "MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases".
Graph Convolutional Networks for Text Classification. AAAI 2019
An Incremental Learning, Continual Learning, and Life-Long Learning Repository
Facebook Low Resource (FLoRes) MT Benchmark
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
Byte-Pair Encoding (BPE) (subword-based tokenization) algorithm implementations from scratch with Python
HuBERT content encoders for: A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion