This repository contains the muP implementation suggested in this EleutherAI blog. The base transformer implementation is based on GPT-2, as provided in this ARENA exercise.
In `modules.py`, you can search for the string `cfg.apply_muP` to see the differences between the standard and muP implementations.
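For orientation, here is a minimal sketch of the kind of changes such a flag typically guards, following the usual muP recipe for Adam (attention logit scaling, width-dependent init, output-logit scaling, and per-parameter learning rates). The helper names, `cfg.d_model_base`, `cfg.init_std`, and the parameter-grouping heuristic below are illustrative assumptions and may not match the actual code in `modules.py`:

```python
import torch
import torch.nn as nn

def width_mult(cfg):
    # Width relative to an assumed base width (name is illustrative).
    return cfg.d_model / cfg.d_model_base

def attn_scores(q, k, cfg):
    scores = q @ k.transpose(-2, -1)
    if cfg.apply_muP:
        # muP: 1/d_head attention scaling instead of the standard 1/sqrt(d_head).
        return scores / cfg.d_head
    return scores / cfg.d_head ** 0.5

def init_hidden_weight(weight, cfg):
    std = cfg.init_std
    if cfg.apply_muP:
        # Hidden-layer init std shrinks with width so activations stay O(1).
        std = cfg.init_std / width_mult(cfg) ** 0.5
    nn.init.normal_(weight, mean=0.0, std=std)

def unembed(x, W_U, cfg):
    logits = x @ W_U
    if cfg.apply_muP:
        # Output logits are scaled down by the width multiplier.
        logits = logits / width_mult(cfg)
    return logits

def make_optimizer(model, cfg):
    if not cfg.apply_muP:
        return torch.optim.AdamW(model.parameters(), lr=cfg.lr)
    hidden, other = [], []
    for name, p in model.named_parameters():
        # Matrix-like (hidden) weights get a 1/width_mult Adam learning rate;
        # embeddings, biases, and norm parameters keep the base learning rate.
        (hidden if p.ndim >= 2 and "embed" not in name else other).append(p)
    return torch.optim.AdamW([
        {"params": hidden, "lr": cfg.lr / width_mult(cfg)},
        {"params": other, "lr": cfg.lr},
    ])
```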
- Run `pip install -r requirements.txt`.
- Create a `.env` file with the fields `HF_TOKEN` and `WANDB_API_KEY`, containing your HuggingFace access token and W&B API key (see the example after this list).
- To run the coordinate check test, run `./run_grid_coord.sh` (the idea behind the check is sketched after this list).
- To start training over a grid of widths and learning rates, run `./run_grid_train.sh` (a hypothetical launcher sketch also follows this list).
- To visualize the W&B logs, check `notebook.ipynb`.
- To train and experiment with different configurations, check `train.py`.
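The `.env` file only needs the two fields mentioned above; the values below are placeholders:

```
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
WANDB_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```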
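The coordinate check is the standard muP sanity test: train models of several widths for a handful of steps and verify that per-layer activation scales stay roughly constant as width grows; under a mis-specified parameterization they typically blow up or shrink. Below is a minimal, self-contained sketch of the measurement with a toy MLP, not the actual logic in `run_grid_coord.sh`:

```python
import torch
import torch.nn as nn

def coord_check(widths=(256, 512, 1024, 2048), steps=5, lr=1e-3, seed=0):
    """Record mean |activation| of a hidden layer across widths and training steps."""
    results = {}
    for d in widths:
        torch.manual_seed(seed)
        model = nn.Sequential(
            nn.Linear(64, d), nn.ReLU(),
            nn.Linear(d, 64), nn.ReLU(),
            nn.Linear(64, 10),
        )
        opt = torch.optim.AdamW(model.parameters(), lr=lr)
        x = torch.randn(32, 64)
        y = torch.randint(0, 10, (32,))
        acts = []
        for _ in range(steps):
            # Hidden activations after the first (width-d) layer.
            h = model[1](model[0](x))
            acts.append(h.abs().mean().item())
            loss = nn.functional.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        results[d] = acts
    return results

# Under a correct muP setup the recorded activation scales stay roughly flat
# across widths; under standard parameterization they drift apart as width grows.
print(coord_check())
```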
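`./run_grid_train.sh` sweeps widths and learning rates; a hypothetical Python equivalent of such a launcher is sketched below. The `train.py` flags (`--d_model`, `--lr`) and grid values are assumptions for illustration and may not match the actual interface, so check `train.py` for the real configuration options:

```python
import itertools
import subprocess

# Hypothetical grid; under muP the best learning rate found at a small width
# should transfer to larger widths, which is what the sweep is meant to show.
widths = [256, 512, 1024, 2048]
learning_rates = [2 ** -k for k in range(6, 14)]

for d_model, lr in itertools.product(widths, learning_rates):
    # Flag names are assumptions; adapt them to the actual CLI of train.py.
    subprocess.run(
        ["python", "train.py", f"--d_model={d_model}", f"--lr={lr}"],
        check=True,
    )
```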