⚡ Lit LLaMA 🦙

Lit-LLaMA is an independent implementation of LLaMA based on nanoGPT, and released under the Apache 2.0 license.

Backstory: Meta released the original LLaMA code under the GPL license, which means any project containing LLaMA code must also be released under the GPL. We've seen LLaMA code leaking into the Apache 2.0 / BSD / MIT deep learning ecosystem, and that is a real problem. Lit-LLaMA solves it for good.

Lit-LLaMA is:

  • Simple, single-file, no boilerplate
  • Numerically equivalent to the original model
  • Optimized to run on consumer hardware or at scale
  • Open source, with no strings attached

Installation

Clone the repo

git clone https://github.com/Lightning-AI/lit-llama
cd lit-llama

Create a new Python environment

With venv

python -m venv lit-llama
source lit-llama/bin/activate

or Anaconda/Miniconda:

conda create -n lit-llama python=3.10
conda activate lit-llama

Install dependencies

pip install -r requirements.txt
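
Optionally, sanity-check the environment (this assumes PyTorch was pulled in by requirements.txt):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"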

You are all set! 🎉

Inference

To generate text predictions, you first need to download the model weights following the instructions on the official LLaMA repository. After you've done that, you should have a folder like this:

checkpoints/llama
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── 13B
│   ...
├── tokenizer_checklist.chk
└── tokenizer.model

You need to convert the weights to the Lit-LLaMA format by running:

python scripts/convert_checkpoint.py \
    --output_dir checkpoints/lit-llama \
    --ckpt_dir checkpoints/llama \
    --tokenizer_path checkpoints/llama/tokenizer.model \
    --model_size 7B

You can now run inference:

python scripts/generate.py --prompt "Hello, my name is"

This will run the 7B model and requires roughly 26 GB of GPU memory (e.g. a single A100).
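As a back-of-the-envelope check (our illustration, not part of the repository), the weights alone account for most of that footprint: 7 billion parameters at 4 bytes each in float32 come to about 26 GiB.

# Rough memory estimate for the weights alone; activations and the KV cache add more.
for name, nbytes in [("float32", 4), ("float16/bfloat16", 2), ("int8", 1)]:
    print(f"7B weights in {name}: ~{7e9 * nbytes / 2**30:.1f} GiB")

# 7B weights in float32: ~26.1 GiB   (matches the ~26 GB figure above)
# 7B weights in float16/bfloat16: ~13.0 GiB
# 7B weights in int8: ~6.5 GiB       (why the quantized setup below fits in ~8 GB)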

Run Lit-LLaMA on consumer devices

If you have a GPU with less memory, you can enable quantization with --quantize true. Loading takes longer, but inference then requires only ~8 GB of memory and runs on any modern consumer GPU.
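For example, combining it with the prompt flag from above (check python scripts/generate.py --help for the exact flag spellings in your checkout):

python scripts/generate.py --prompt "Hello, my name is" --quantize true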

See python scripts/generate.py --help for more options.

Want to contribute?

We're on a quest toward fully open-source AI, focusing especially on models in the 5-20B parameter range, trained using the LLaMA philosophy.

Join us and start contributing, especially in the following areas:

  • Pre-training
  • Fine-tuning (full and LoRA)
  • Quantization
  • Sparsification

Look at train.py for a starting point for pre-training and fine-tuning with Lightning Fabric; a minimal sketch of a Fabric training loop follows.
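If Fabric is new to you, here is a minimal sketch of the kind of training loop train.py builds on, using a toy model and random data (our illustration, assuming lightning >= 2.0 is installed; the real script wires up LLaMA instead):

import torch
from lightning.fabric import Fabric

# Toy stand-ins; train.py uses the actual LLaMA model and a real dataset.
model = torch.nn.Linear(128, 128)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

fabric = Fabric(accelerator="auto", devices=1)  # handles device placement and precision
fabric.launch()
model, optimizer = fabric.setup(model, optimizer)

for step in range(10):
    batch = fabric.to_device(torch.randn(8, 128))
    loss = torch.nn.functional.mse_loss(model(batch), batch)
    optimizer.zero_grad()
    fabric.backward(loss)  # use instead of loss.backward() so Fabric can scale/shard
    optimizer.step()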

Acknowledgements

Lit-LLaMA builds on nanoGPT by Andrej Karpathy.

License

Lit-LLaMA is released under the Apache 2.0 license.
