Add TinyLlama model config (Lightning-AI#671)
Co-authored-by: Carlos Mocholí <[email protected]>
awaelchli and carmocca authored Oct 25, 2023
1 parent b23eac5 commit 41e287d
Showing 3 changed files with 52 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -43,6 +43,7 @@ Supports the following popular model checkpoints:
| Meta AI [Code Llama](tutorials/download_code_llama.md) | 7B, 13B, 34B | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) |
| Microsoft Research [phi-1.5](tutorials/download_phi15.md) | 1.3B | [Li et al. 2023](https://arxiv.org/abs/2309.05463) |
| Mistral AI [Mistral](tutorials/download_mistral.md) | 7B | [Mistral website](https://mistral.ai/) |
| [TinyLlama](tutorials/download_tinyllama.md) | 1.1B | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama) |

This implementation builds on [Lit-LLaMA](https://github.com/lightning-AI/lit-llama) and [nanoGPT](https://github.com/karpathy/nanoGPT), and it's **powered by [Lightning Fabric](https://lightning.ai/docs/fabric/stable/)**.

26 changes: 26 additions & 0 deletions lit_gpt/config.py
@@ -1037,4 +1037,30 @@ def norm_class(self) -> Type:
configs.append(copy)


############
# TinyLlama
############
tiny_llama = [
    dict(
        org="PY007",  # TODO: update this to the real organization
        name="TinyLlama-1.1B-intermediate-step-480k-1T",  # TODO: make this a short name: tiny-llama-1b
        block_size=2048,
        vocab_size=32000,
        padding_multiple=64,
        n_layer=22,
        n_head=32,
        n_embd=2048,
        rotary_percentage=1.0,
        parallel_residual=False,
        bias=False,
        _norm_class="RMSNorm",  # original TinyLlama uses FusedRMSNorm
        norm_eps=1e-5,
        _mlp_class="LLaMAMLP",
        intermediate_size=5632,
        n_query_groups=4,
    ),
]
configs.extend(tiny_llama)


name_to_config = {config["name"]: config for config in configs}
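
Once registered in `name_to_config`, the new entry can be retrieved through the usual config lookup. A minimal sketch (assuming `lit_gpt` is importable and `Config.from_name` resolves names via `name_to_config`, as above):

```python
from lit_gpt import Config

# Fetch the TinyLlama entry registered above and inspect a few fields,
# including the grouped-query attention setting (32 heads, 4 KV groups).
config = Config.from_name("TinyLlama-1.1B-intermediate-step-480k-1T")
print(config.n_layer, config.n_head, config.n_query_groups)  # 22 32 4
```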
25 changes: 25 additions & 0 deletions tutorials/download_tinyllama.md
@@ -0,0 +1,25 @@
## Download TinyLlama weights

[TinyLlama 1.1B](https://github.com/jzhang38/TinyLlama/) is Apache 2.0 licensed and can be used without restrictions.
It is still in development; at the time of writing, checkpoints trained on up to 1T tokens are available.
The goal is to train it for ~3 epochs on 3T tokens in total. For details on the pretraining schedule and progress, see the official [README](https://github.com/jzhang38/TinyLlama/tree/main).


To use the TinyLlama 1.1B model checkpoint, which requires about 5 GB of disk space, download the weights and convert them to the lit-gpt format:

```bash
pip install huggingface_hub

python scripts/download.py --repo_id PY007/TinyLlama-1.1B-intermediate-step-480k-1T

python scripts/convert_hf_checkpoint.py \
--checkpoint_dir checkpoints/PY007/TinyLlama-1.1B-intermediate-step-480k-1T
```
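
Optionally, you can verify that the converted checkpoint loads. This is only a sketch, assuming the conversion script wrote `lit_config.json` and `lit_model.pth` into the checkpoint directory and that `Config.from_json` is available in your lit-gpt version:

```python
import torch
from lit_gpt import GPT, Config

checkpoint_dir = "checkpoints/PY007/TinyLlama-1.1B-intermediate-step-480k-1T"

# Rebuild the model from the converted config and load the converted weights.
config = Config.from_json(f"{checkpoint_dir}/lit_config.json")
model = GPT(config)
state_dict = torch.load(f"{checkpoint_dir}/lit_model.pth")
model.load_state_dict(state_dict)
print(f"{sum(p.numel() for p in model.parameters()) / 1e9:.2f}B parameters")  # ~1.1B
```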

You're done! To chat with the model, run:

```bash
pip install sentencepiece

python chat/base.py --checkpoint_dir checkpoints/PY007/TinyLlama-1.1B-intermediate-step-480k-1T
```
