Pythia 14M and 31M (Lightning-AI#783)
Andrei-Aksionov authored Nov 27, 2023
1 parent e05fc4a commit 409cee8
Showing 3 changed files with 27 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -36,7 +36,7 @@ Supports the following popular model checkpoints:
| LMSYS [Vicuna](tutorials/download_vicuna.md) | 7B, 13B, 33B | [Li et al. 2023](https://lmsys.org/blog/2023-03-30-vicuna/) |
| LMSYS [LongChat](tutorials/download_longchat.md) | 7B, 13B | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/) |
| Together [RedPajama-INCITE](tutorials/download_redpajama_incite.md) | 3B, 7B | [Together 2023](https://together.ai/blog/redpajama-models-v1) |
-| EleutherAI [Pythia](tutorials/download_pythia.md) | {70,160,410}M, {1,1.4,2.8,6.9,12}B | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373) |
+| EleutherAI [Pythia](tutorials/download_pythia.md) | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373) |
| StabilityAI [StableLM](tutorials/download_stablelm.md) | 3B, 7B | [Stability AI 2023](https://github.com/Stability-AI/StableLM) |
| Platypus | 7B, 13B, 70B | [Lee, Hunter, and Ruiz 2023](https://arxiv.org/abs/2308.07317) |
| NousResearch Nous-Hermes | 7B, 13B, 70B | [Org page](https://huggingface.co/NousResearch) |
23 changes: 23 additions & 0 deletions lit_gpt/config.py
@@ -170,6 +170,26 @@ def norm_class(self) -> Type:
# EleutherAI Pythia
####################
pythia = [
# https://huggingface.co/EleutherAI/pythia-14m/blob/main/config.json
dict(
name="pythia-14m",
hf_config=dict(org="EleutherAI", name="pythia-14m"),
block_size=512,
n_layer=6,
n_embd=128,
n_head=4,
padding_multiple=128,
),
# https://huggingface.co/EleutherAI/pythia-31m/blob/main/config.json
dict(
name="pythia-31m",
hf_config=dict(org="EleutherAI", name="pythia-31m"),
block_size=1024,
n_layer=6,
n_embd=256,
n_head=8,
padding_multiple=128,
),
# https://huggingface.co/EleutherAI/pythia-70m/blob/main/config.json
dict(
name="pythia-70m",
@@ -248,6 +268,9 @@ def norm_class(self) -> Type:
]
configs.extend(pythia)
for c in pythia:
# "pythia-14m" and "pythia-31m" don't have deduped version
if c["name"] in ("pythia-14m", "pythia-31m"):
continue
copy = deepcopy(c)
copy["name"] = f"{c['name']}-deduped"
copy["hf_config"]["name"] = f"{c['hf_config']['name']}-deduped"
4 changes: 3 additions & 1 deletion tutorials/download_pythia.md
@@ -3,7 +3,7 @@
EleutherAI's project Pythia combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. Weights are released under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).

For detailed info on the models, their training, and their behavior, please see the [Pythia repository](https://github.com/EleutherAI/pythia).
-It includes a suite of 8 checkpoints (weights) on 2 different datasets: [The Pile](https://pile.eleuther.ai/), as well as The Pile with deduplication applied.
+It includes a suite of 8 checkpoints (weights) on 2 different datasets: [The Pile](https://pile.eleuther.ai/), as well as The Pile with deduplication applied. In addition, there are two small models that come only in non-deduplicated form: `Pythia-14m` and `Pythia-31m`.

To see all the available checkpoints for Pythia, run:

@@ -14,6 +14,8 @@ python scripts/download.py | grep pythia
which will print

```text
EleutherAI/pythia-14m
EleutherAI/pythia-31m
EleutherAI/pythia-70m
EleutherAI/pythia-160m
EleutherAI/pythia-410m
...
```
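Once the new entries show up in that listing, either model can be fetched and converted like the other Pythia sizes. The commands below are a sketch following the download/convert pattern used elsewhere in this tutorial; the script names and flags (`scripts/download.py --repo_id`, `scripts/convert_hf_checkpoint.py --checkpoint_dir`) are assumed from the existing lit-gpt workflow rather than shown in this diff.

```bash
# Illustrative example with the smallest new checkpoint; pythia-31m works the same way.
python scripts/download.py --repo_id EleutherAI/pythia-14m

# Convert the downloaded Hugging Face weights into the lit-gpt checkpoint format.
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/EleutherAI/pythia-14m
```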
