Skip to content

Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, 4-bit and 8-bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

License

Notifications You must be signed in to change notification settings

ssghost/lit-gpt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

53 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Lit-LLaMA

⚡ Lit-LLaMA ️

cpu-tests license Discord

⚡ Lit-LLaMA ️

Independent implementation of LLaMA that is fully open source under the Apache 2.0 license.

This implementation builds on nanoGPT.

Why?

The original LLaMA code is GPL licensed which means any project using it must also be released under GPL.

This "taints" any other code and prevents meaningful academic and commercial use.

Lit-LLaMA solves that for good.

 

Design principles

Lit-LLaMA is:

  • Simple: Single-file implementation without boilerplate.
  • Correct: Numerically equivalent to the original model.
  • Optimized: Runs on consumer hardware or at scale.
  • Open-source: No strings attached.

Get involved!

Join our Discord to build high-performance, truly open-source models for the common benefit of the community.

 

Setup

Clone the repo

git clone https://github.com/Lightning-AI/lit-llama
cd lit-llama

install dependencies

pip install -r requirements.txt

You are all set! 🎉

Use the model

To generate text predictions, download the model weights following the instructions on the official LLaMA repository. Now you should have a folder like this:

checkpoints/llama
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── 13B
│   ...
├── tokenizer_checklist.chk
└── tokenizer.model

Convert the weights to the Lit-LLaMA format:

python scripts/convert_checkpoint.py \
    --output_dir checkpoints/lit-llama \
    --ckpt_dir checkpoints/llama \
    --tokenizer_path checkpoints/llama/tokenizer.model \
    --model_size 7B

Run inference:

python generate.py --prompt "Hello, my name is"

This will run the 7B model and require ~26 GB of GPU memory (A100 GPU).

Run Lit-LLaMA on consumer devices

For GPUs with less memory, enable quantization (--quantize true). This will take longer to load but require ~8GB of memory. This can run on any consumer GPU.

python generate.py --quantize true --prompt "Hello, my name is"

See python generate.py --help for more options.

 

Get involved!

We're in a quest towards fully open source AI, especially focusing on models in the 5-20B range, trained using the LLaMA approach (smaller models trained for longer).

Lit-LLaMA

Join us and start contributing, especially on the following areas:

  • Pre-training
  • Fine-tuning (full and LoRA)
  • Quantization
  • Sparsification

Look at train.py for a starting point towards pre-training / fine-tuning using Lightning Fabric.

Don't forget to join our Discord!

Acknowledgements

License

Lit-LLaMA is released under the Apache 2.0 license.

About

Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, 4-bit and 8-bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 98.9%
  • Other 1.1%