Hackable implementation of state-of-the-art open-source large language models released under the Apache 2.0 license.
Supports popular public checkpoints such as:
- Meta AI Llama 2
- Stability AI FreeWilly2
- TII UAE Falcon
- OpenLM Research OpenLLaMA
- LMSYS Vicuna and LongChat
- Together RedPajama-INCITE
- EleutherAI Pythia
- Stability AI StableLM
This implementation builds on Lit-LLaMA and nanoGPT, and it's powered by Lightning Fabric ⚡.
This repository follows the main principle of openness through clarity.
Lit-GPT is:
- Simple: Single-file implementation without boilerplate.
- Correct: Numerically equivalent to the original model.
- Optimized: Runs fast on consumer hardware or at scale.
- Open-source: No strings attached.
Avoiding code duplication is not a goal. Readability and hackability are.
Join our Discord to build high-performance, truly open-source models for the common benefit of the community.
Clone the repo
git clone https://github.com/Lightning-AI/lit-gpt
cd lit-gpt
Lit-GPT currently relies on flash attention from PyTorch nightly. Until PyTorch 2.1 is released, you'll need to install the nightly build manually. Luckily, that is straightforward:
On CUDA
pip install --index-url https://download.pytorch.org/whl/nightly/cu118 --pre 'torch>=2.1.0dev'
On CPU (including Macs)
pip install --index-url https://download.pytorch.org/whl/nightly/cpu --pre 'torch>=2.1.0dev'
(Optional) install Flash Attention 2
MAX_JOBS=4 pip install 'flash-attn>=2.0.0.post1' --no-build-isolation
All good, now install the dependencies:
pip install -r requirements.txt
You are all set! 🎉
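To confirm that the installed PyTorch build satisfies the 2.1 requirement mentioned above, a quick check along these lines may help. This is a minimal sketch and not part of Lit-GPT: the helper names `parse_major_minor` and `is_new_enough` are hypothetical, and the check only looks at the major and minor version numbers.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as '2.1.0.dev20230801+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_new_enough(version: str, minimum: Tuple[int, int] = (2, 1)) -> bool:
    """Return True if the version is at least the given (major, minor)."""
    return parse_major_minor(version) >= minimum


try:
    import torch
except ImportError:
    print("PyTorch is not installed in this environment")
else:
    # Report the installed version and whether a CUDA device is visible.
    print("torch version:", torch.__version__)
    print("meets >=2.1 requirement:", is_new_enough(torch.__version__))
    print("CUDA available:", torch.cuda.is_available())
```

Nightly builds report versions such as `2.1.0.dev...`, so comparing only the major and minor numbers treats them as satisfying the requirement.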
To generate text predictions, you need to download the model weights. If you don't have them, check out our guide.
Run inference:
python generate/base.py --prompt "Hello, my name is"
This will run the 3B pre-trained model and require ~7 GB of GPU memory using the bfloat16 datatype.
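As a rough sanity check on the ~7 GB figure, the memory taken by the weights alone can be estimated from the parameter count and the bytes per parameter. The sketch below is illustrative and not part of Lit-GPT: the helper name `weight_memory_gb` is hypothetical, "3B" is assumed to mean exactly 3 billion parameters, and activations, the KV cache, and CUDA context overhead are ignored.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params: int, dtype: str = "bfloat16") -> float:
    """Return the memory needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: "3B" is treated as exactly 3 billion parameters.
print(weight_memory_gb(3_000_000_000))             # bfloat16 weights
print(weight_memory_gb(3_000_000_000, "float32"))  # float32 weights, for comparison
```

Under these assumptions the bfloat16 weights take about 6 GB, which is consistent with a ~7 GB total once runtime overhead is added.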
See the full guide for generating samples from the model.
You can also chat with the model interactively:
python chat/base.py
Lit-GPT supports 4-bit quantization (as in QLoRA), LLM.int8, and GPTQ.int4 inference; follow this guide to use them.
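To give a feel for why quantization helps, the following sketch estimates weight storage at different bit widths. It is a back-of-the-envelope calculation, not Lit-GPT code: `quantized_weight_gb` is a hypothetical helper, the 3 billion parameter count is an assumption, and the extra storage for quantization scales and zero points, as well as any layers kept in higher precision, is ignored.

```python
def quantized_weight_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10**9 bytes) at a given bit width."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 3_000_000_000  # assumption: a round 3B-parameter model

# Compare 16-bit weights with 8-bit and 4-bit quantized weights.
for label, bits in [("bfloat16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>8}: {quantized_weight_gb(NUM_PARAMS, bits):.1f} GB")
```

Real savings will be somewhat smaller than this idealized estimate, since quantized formats store additional metadata per block of weights.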
We provide simple training scripts (finetune/adapter.py, finetune/adapter_v2.py, and finetune/lora.py) that instruction-tune a pretrained model on the Alpaca dataset.
- Download the data and generate an instruction tuning dataset:
python scripts/prepare_alpaca.py
- Run the finetuning script
For example, you can use
Adapter (Zhang et al. 2023):
python finetune/adapter.py
or Adapter v2 (Gao et al. 2023):
python finetune/adapter_v2.py
or LoRA (Hu et al. 2021):
python finetune/lora.py
(See tutorials/finetune_adapter for details on the differences between the two adapter methods.)
Finetuning requires at least one GPU with ~12 GB of memory (e.g., an RTX 3060).
It is expected that you have downloaded the pretrained weights as described above. More details about each finetuning method and how you can apply it to your own data can be found in our technical how-to guides.
These technical tutorials illustrate how to run the finetuning code.
Looking for conceptual tutorials and explanations? We have some additional articles below:
Porting from Lit-LLaMA in progress 👷
We are on a quest towards fully open-source AI.
Join us and start contributing, especially in the following areas:
We welcome all individual contributors, regardless of their level of experience or hardware. Your contributions are valuable, and we are excited to see what you can accomplish in this collaborative and supportive environment.
Unsure about contributing? Check out our guide, Contributing to Lit-LLaMA: A Hitchhiker’s Guide to the Quest for Fully Open-Source AI. The same guidelines apply to Lit-GPT.
Don't forget to join our Discord!
- @karpathy for nanoGPT
- @EleutherAI for GPT-NeoX
- @TimDettmers for bitsandbytes
- @IST-DASLab for GPTQ
- @Microsoft for LoRA
- @tridao for Flash Attention 2
Lit-GPT is released under the Apache 2.0 license.