Commit

Add rename date
haileyschoelkopf authored Apr 4, 2023
1 parent 6f60f83 commit ceb8e52
Showing 1 changed file with 3 additions and 2 deletions.
README.md (5 changes: 3 additions & 2 deletions)
@@ -41,9 +41,10 @@ We also upload the pre-tokenized data files and a script to reconstruct the data
- We remedied a minor inconsistency that existed in the original suite: all models of size 2.8B parameters or smaller had a learning rate (LR) schedule that decayed to a minimum of 10% of the starting LR, while the 6.9B and 12B models used an LR schedule that decayed to a minimum LR of 0. In the redone training runs, we rectified this inconsistency: all models are now trained with the LR decaying to a minimum of 0.1× their maximum LR (see the sketch after this list).
- The new `EleutherAI/pythia-1b` is trained in bf16, because in fp16 the model was corrupted by loss spikes late in training.

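As a concrete illustration, here is a minimal sketch of a schedule with a 10% floor, as described in the first item above; the cosine shape, the warmup handling, and all names are illustrative assumptions rather than this repository's actual training configuration:

```python
import math

def lr_at_step(step, max_steps, max_lr, min_lr_ratio=0.1, warmup_steps=0):
    """Linear warmup, then cosine decay from max_lr down to min_lr_ratio * max_lr."""
    if warmup_steps and step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    min_lr = min_lr_ratio * max_lr
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# The redone runs decay to 10% of the peak LR at every model size
# (min_lr_ratio=0.1 here); the old 6.9B and 12B runs instead decayed
# to zero (min_lr_ratio=0.0).
```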
The old models ("V0") remain available at [https://huggingface.co/models?other=pythia_v0](https://huggingface.co/models?other=pythia_v0).
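If you would rather enumerate the V0 checkpoints programmatically than browse that page, a short sketch using `huggingface_hub` follows; it assumes the V0 repositories carry the `pythia_v0` tag that the link above filters on:

```python
from huggingface_hub import HfApi

# List every model on the Hub tagged `pythia_v0` (the same filter as the URL above).
for model in HfApi().list_models(filter="pythia_v0"):
    print(model.id)
```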

[January 20, 2023]
On January 20, 2023, we chose to rename the *Pythia* model suite to better reflect the inclusion of both embedding-layer and unembedding-layer parameters in our total parameter counts, in line with many other model suites, and because we believe this convention better reflects the on-device memory usage of these models. See [https://huggingface.co/EleutherAI/pythia-410m-deduped#naming-convention-and-parameter-count](https://huggingface.co/EleutherAI/pythia-410m-deduped#naming-convention-and-parameter-count) for more details.
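As a rough illustration of the new convention, the following back-of-the-envelope count uses configuration values from the pythia-410m model card (24 layers, hidden size 1024, vocabulary padded to 50304) and a standard approximate per-layer formula; treat it as a sketch, not an exact count:

```python
vocab_size = 50304  # tokenizer vocabulary, padded for efficiency
d_model = 1024      # hidden size
n_layers = 24

embed = vocab_size * d_model     # input embedding matrix
unembed = vocab_size * d_model   # output projection (untied from the input embedding)
body = n_layers * 12 * d_model ** 2  # ~4*d^2 attention + ~8*d^2 MLP, per layer

print(f"transformer body only:     {body / 1e6:.0f}M")  # ~302M
print(f"with (un)embedding layers: {(body + embed + unembed) / 1e6:.0f}M")  # ~405M -> 'pythia-410m'
```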

## Quickstart
