This repository contains the muP implementation suggested in this EleutherAI blog. The base transformer implementation is based on GPT-2, as provided in this ARENA exercise.
In `modules.py`, you can search for the string `cfg.apply_muP` to see the differences between the standard and muP implementations.
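For orientation, here is a minimal sketch of the kind of changes such a flag typically guards, following the usual muP recipe for Adam (attention logit scaling, width-dependent init, output-logit scaling, and per-parameter learning rates). The helper names, `cfg.d_model_base`, `cfg.init_std`, and the parameter-grouping heuristic below are illustrative assumptions and may not match the actual code in `modules.py`:

```python
import torch
import torch.nn as nn

def width_mult(cfg):
    # Width relative to an assumed base width (name is illustrative).
    return cfg.d_model / cfg.d_model_base

def attn_scores(q, k, cfg):
    scores = q @ k.transpose(-2, -1)
    if cfg.apply_muP:
        # muP: 1/d_head attention scaling instead of the standard 1/sqrt(d_head).
        return scores / cfg.d_head
    return scores / cfg.d_head ** 0.5

def init_hidden_weight(weight, cfg):
    std = cfg.init_std
    if cfg.apply_muP:
        # Hidden-layer init std shrinks with width so activations stay O(1).
        std = cfg.init_std / width_mult(cfg) ** 0.5
    nn.init.normal_(weight, mean=0.0, std=std)

def unembed(x, W_U, cfg):
    logits = x @ W_U
    if cfg.apply_muP:
        # Output logits are scaled down by the width multiplier.
        logits = logits / width_mult(cfg)
    return logits

def make_optimizer(model, cfg):
    if not cfg.apply_muP:
        return torch.optim.AdamW(model.parameters(), lr=cfg.lr)
    hidden, other = [], []
    for name, p in model.named_parameters():
        # Matrix-like (hidden) weights get a 1/width_mult Adam learning rate;
        # embeddings, biases, and norm parameters keep the base learning rate.
        (hidden if p.ndim >= 2 and "embed" not in name else other).append(p)
    return torch.optim.AdamW([
        {"params": hidden, "lr": cfg.lr / width_mult(cfg)},
        {"params": other, "lr": cfg.lr},
    ])
```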
- Run `pip install -r requirements.txt`.
- Create a `.env` file with the fields `HF_TOKEN` and `WANDB_API_KEY`, containing your HuggingFace access token and W&B API key (see the example after this list).
- To run the coordinate check test, run `./run_grid_coord.sh` (the idea behind the check is sketched after this list).
- To start training over a grid of widths and learning rates, run `./run_grid_train.sh` (a hypothetical launcher sketch also follows this list).
- To visualize the W&B logs, check `notebook.ipynb`.
- To train and experiment with different configurations, check `train.py`.
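The `.env` file only needs the two fields mentioned above; the values below are placeholders:

```
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
WANDB_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```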
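The coordinate check is the standard muP sanity test: train models of several widths for a handful of steps and verify that per-layer activation scales stay roughly constant as width grows; under a mis-specified parameterization they typically blow up or shrink. Below is a minimal, self-contained sketch of the measurement with a toy MLP, not the actual logic in `run_grid_coord.sh`:

```python
import torch
import torch.nn as nn

def coord_check(widths=(256, 512, 1024, 2048), steps=5, lr=1e-3, seed=0):
    """Record mean |activation| of a hidden layer across widths and training steps."""
    results = {}
    for d in widths:
        torch.manual_seed(seed)
        model = nn.Sequential(
            nn.Linear(64, d), nn.ReLU(),
            nn.Linear(d, 64), nn.ReLU(),
            nn.Linear(64, 10),
        )
        opt = torch.optim.AdamW(model.parameters(), lr=lr)
        x = torch.randn(32, 64)
        y = torch.randint(0, 10, (32,))
        acts = []
        for _ in range(steps):
            # Hidden activations after the first (width-d) layer.
            h = model[1](model[0](x))
            acts.append(h.abs().mean().item())
            loss = nn.functional.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        results[d] = acts
    return results

# Under a correct muP setup the recorded activation scales stay roughly flat
# across widths; under standard parameterization they drift apart as width grows.
print(coord_check())
```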
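`./run_grid_train.sh` sweeps widths and learning rates; a hypothetical Python equivalent of such a launcher is sketched below. The `train.py` flags (`--d_model`, `--lr`) and grid values are assumptions for illustration and may not match the actual interface, so check `train.py` for the real configuration options:

```python
import itertools
import subprocess

# Hypothetical grid; under muP the best learning rate found at a small width
# should transfer to larger widths, which is what the sweep is meant to show.
widths = [256, 512, 1024, 2048]
learning_rates = [2 ** -k for k in range(6, 14)]

for d_model, lr in itertools.product(widths, learning_rates):
    # Flag names are assumptions; adapt them to the actual CLI of train.py.
    subprocess.run(
        ["python", "train.py", f"--d_model={d_model}", f"--lr={lr}"],
        check=True,
    )
```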