Stars
Make huge neural nets fit in memory
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
Finetune Llama 3.3, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory
Does Refusal Training in LLMs Generalize to the Past Tense? [NeurIPS 2024 Safe Generative AI Workshop (Oral)]
Code accompanying the paper "Massive Activations in Large Language Models"
Diffusion Classifier leverages pretrained diffusion models to perform zero-shot classification without additional training
This repository provides the code and model checkpoints for the AIMv1 and AIMv2 research projects.
PyTorch code and models for the DINOv2 self-supervised learning method.
Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating, and serving LLMs in JAX/Flax.
Code and data accompanying our arXiv paper "Faithful Chain-of-Thought Reasoning".
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Test-Time Adaptation via Conjugate Pseudo-Labels
Tools for understanding how transformer predictions are built layer-by-layer
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
A prize for finding tasks that cause large language models to show inverse scaling
DeblurSR: Event-Based Motion Deblurring Under the Spiking Representation (AAAI 2024)
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Instruct-tune LLaMA on consumer hardware