⚡ Lit-LLaMA

Independent implementation of LLaMA that is fully open source under the Apache 2.0 license.

This implementation builds on nanoGPT.

Why ⚡ Lit-LLaMA

The original LLaMA code is GPL-licensed, which means any project using it must also be released under GPL.

This "taints" any other code and prevents meaningful academic and commercial use.

Lit-LLaMA solves that for good.

 

Design principles

Lit-LLaMA is:

  • Simple: single-file implementation without boilerplate.
  • Correct: numerically equivalent to the original model.
  • Optimized: runs on consumer hardware or at scale.
  • Open-source: no strings attached.

Contribute

Join our Discord to build high-performance, truly open-source models for the common benefit of the community.

 

Setup

Clone the repo

git clone https://github.com/Lightning-AI/lit-llama
cd lit-llama

and install dependencies

pip install -r requirements.txt

You are all set! 🎉

Use the model

To generate text predictions, you first need to download the model weights following the instructions on the official LLaMA repository. After you've done that, you should have a folder like this:

checkpoints/llama
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── 13B
│   ...
├── tokenizer_checklist.chk
└── tokenizer.model

You need to convert the weights to the Lit-LLaMA format by running:

python scripts/convert_checkpoint.py \
    --output_dir checkpoints/lit-llama \
    --ckpt_dir checkpoints/llama \
    --tokenizer_path checkpoints/llama/tokenizer.model \
    --model_size 7B
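
If you want to sanity-check the conversion, the converted checkpoint should load as a regular PyTorch state dict. The filename below is an assumption, not guaranteed by convert_checkpoint.py; list the contents of --output_dir for the actual name:

import torch

# Inspect the converted weights. The path is an assumed example;
# check the output directory after conversion to confirm it.
ckpt = torch.load("checkpoints/lit-llama/7B/lit-llama.pth", map_location="cpu")
for name, tensor in list(ckpt.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)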

You can now run inference:

python scripts/generate.py --prompt "Hello, my name is"

This will run using the 7B model and will require roughly 26 GB of GPU memory (A100 GPU).
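
If you prefer to drive the model from Python rather than through the script, the rough pattern used by scripts/generate.py looks like the sketch below. The module layout, class names, and encode() arguments are assumptions based on this repo; treat scripts/generate.py as the source of truth.

import torch
from lit_llama import LLaMA, Tokenizer  # assumed module layout

# Assumed checkpoint path from the conversion step above;
# the tokenizer path matches the folder layout shown earlier.
model = LLaMA.from_name("7B")
model.load_state_dict(torch.load("checkpoints/lit-llama/7B/lit-llama.pth"))
model.eval().cuda()

tokenizer = Tokenizer("checkpoints/llama/tokenizer.model")
idx = tokenizer.encode("Hello, my name is", bos=True, eos=False).cuda()
# ... then sample tokens autoregressively from the model's logits,
# as scripts/generate.py does.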

Run Lit-LLaMA on consumer devices

If you have a GPU with less memory, you can enable quantization with --quantize true, which takes longer to load but requires only ~8 GB of memory. It will run on any modern consumer GPU.

python scripts/generate.py --quantize true --prompt "Hello, my name is"

See python scripts/generate.py --help for more options.

 

Want to contribute?

We're on a quest toward fully open-source AI, with a particular focus on models in the 5-20B range, trained using the LLaMA approach (smaller models trained for longer).


Join us and start contributing, especially in the following areas:

  • Pre-training
  • Fine-tuning (full and LoRA)
  • Quantization
  • Sparsification

Look at train.py for a starting point for pre-training / fine-tuning using Lightning Fabric.
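
For orientation, here is a minimal sketch of the Lightning Fabric training pattern that train.py builds on. The model and data are toy stand-ins, not the real LLaMA setup:

import torch
from lightning.fabric import Fabric

# Fabric handles device placement and distributed setup.
fabric = Fabric(accelerator="auto", devices=1)
fabric.launch()

model = torch.nn.Linear(128, 128)  # stand-in for the LLaMA module
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model, optimizer = fabric.setup(model, optimizer)

for step in range(100):
    x = torch.randn(8, 128, device=fabric.device)  # toy batch
    loss = model(x).pow(2).mean()                  # toy objective
    optimizer.zero_grad()
    fabric.backward(loss)                          # replaces loss.backward()
    optimizer.step()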

Don't forget to join our Discord!

Acknowledgements

  • @karpathy for nanoGPT
  • Lightning AI for Lightning Fabric

License

Lit-LLaMA is released under the Apache 2.0 license.
