Implementation of two ALBERT techniques on the GPT2 model:
- Embedding matrix factorization
- Cross-layer parameter sharing (both techniques are sketched right after this list)
- Standard GPT2 pretrained on wikitext-103-v1 (baseline)
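The full implementation lives in the repo's ALGPT2LMHeadModel; the block below is only a minimal PyTorch sketch of the two ideas. The class names, the `embed_dim=128` choice, and the overall wiring are illustrative assumptions, not the repo's actual API.

```python
import torch
import torch.nn as nn
from transformers import GPT2Config
from transformers.models.gpt2.modeling_gpt2 import GPT2Block


class FactorizedEmbedding(nn.Module):
    """ALBERT-style factorization: a V x H embedding matrix becomes V x E plus E x H."""

    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, embed_dim)            # V x E, with E << H
        self.proj = nn.Linear(embed_dim, hidden_dim, bias=False)  # E x H projection

    def forward(self, input_ids):
        return self.proj(self.wte(input_ids))


class SharedBlockGPT2(nn.Module):
    """ALBERT-style cross-layer sharing: one GPT2 block applied n_layer times."""

    def __init__(self, config: GPT2Config, embed_dim: int = 128):
        super().__init__()
        self.embed = FactorizedEmbedding(config.vocab_size, embed_dim, config.n_embd)
        self.wpe = nn.Embedding(config.n_positions, config.n_embd)
        self.block = GPT2Block(config)       # a single block, reused for every layer
        self.n_iter = config.n_layer         # default: 12 passes through the block
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)

    def forward(self, input_ids):
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        hidden = self.embed(input_ids) + self.wpe(positions)
        for _ in range(self.n_iter):         # same parameters on every pass
            out = self.block(hidden)
            hidden = out[0] if isinstance(out, tuple) else out
        return self.lm_head(self.ln_f(hidden))
```

Sharing means the 12 transformer layers contribute only one block's worth of parameters, and the factorized embedding replaces the vocab-size x hidden matrix with two much smaller ones, which is where the differences in parameter counts in the table below come from.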
The trained models:
Model Description | HuggingFace Hub | Parameter Count | WANDB Alias |
---|---|---|---|
ALGPT2 with embedding factorization and parameter sharing | Link | 13,503,744 | devilish-goosebump-96 |
ALGPT2 with parameter sharing only | Link | 46,473,216 | ritualistic-mummy-97 |
Standard GPT2 pretraining | Link | 122,356,992 | enchanted-bones-98 |
- Trained on 2 x Nvidia RTX 6000 Ada 48 GB GPUs
- Batch size of 60
- 6 epochs
- Learning rate of 6e-4 (see the training-loop sketch after this list)
- Tokenized the data with a BPE-based tokenizer that we pretrained on the same dataset
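As a rough illustration of how these settings map onto a plain PyTorch loop; the real training code is in run_model.py, and the dataset here is a random stand-in for the tokenized wikitext data.

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import GPT2Config, GPT2LMHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"

config = GPT2Config(n_positions=256)                # matches --sequence_max_length 256
model = GPT2LMHeadModel(config).to(device)
optimizer = AdamW(model.parameters(), lr=6e-4)      # matches --learning_rate 0.0006

# Stand-in for the BPE-tokenized wikitext data (random token ids).
train_dataset = torch.randint(0, config.vocab_size, (1000, 256))
loader = DataLoader(train_dataset, batch_size=60, shuffle=True)  # matches --batch_size 60

for epoch in range(6):                              # matches --num_of_epochs 6
    for input_ids in loader:
        input_ids = input_ids.to(device)
        loss = model(input_ids=input_ids, labels=input_ids).loss  # GPT2 shifts labels internally
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```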
Training the tokenizer on wikitext variants
```bash
python train_tokenizer.py --dataset_path [wikitext-103-v1 or wikitext-2-v1]
```
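train_tokenizer.py is the reference implementation; purely as an illustration, a byte-level BPE tokenizer for wikitext-103-v1 could be trained with the `tokenizers` and `datasets` libraries roughly like this (the vocabulary size, special token, and output directory below are assumptions, not the script's actual values):

```python
import os

from datasets import load_dataset
from tokenizers import ByteLevelBPETokenizer

dataset = load_dataset("wikitext", "wikitext-103-v1", split="train")

tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(
    (row["text"] for row in dataset),
    vocab_size=50257,                       # assumption: GPT2's default vocabulary size
    special_tokens=["<|endoftext|>"],       # assumption: GPT2's end-of-text token
)

out_dir = "tokenizer-wikitext-103-v1"       # hypothetical output directory
os.makedirs(out_dir, exist_ok=True)
tokenizer.save_model(out_dir)
```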
Running the pretraining
```bash
python run_model.py --model_class_name [GPT2LMHeadModel or ALGPT2LMHeadModel] --batch_size 60 --num_of_epochs 6 --sequence_max_length 256 --learning_rate 0.0006 --device gpu --save_steps 2000 --dataset_path [wikitext-103-v1 or wikitext-2-v1] --tokenizer_path [wikitext-103-v1 or wikitext-2-v1] [--factorized_embeds]
```
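For example, assuming `--factorized_embeds` is the switch that enables the embedding factorization, the run that uses both techniques on wikitext-103-v1 would look like:

```bash
python run_model.py --model_class_name ALGPT2LMHeadModel --batch_size 60 --num_of_epochs 6 --sequence_max_length 256 --learning_rate 0.0006 --device gpu --save_steps 2000 --dataset_path wikitext-103-v1 --tokenizer_path wikitext-103-v1 --factorized_embeds
```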
For the full pretraining report on WANDB, click here.
In this experiment, we check the effect of running the shared-parameter layer for a variable number of iterations, rather than the constant 12 iterations (blocks) used during training. For the report on WANDB, click here.
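In terms of the SharedBlockGPT2 sketch at the top of this README (again, not the repo's actual API), the sweep amounts to overriding the number of shared-block passes at evaluation time:

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def eval_loss_for_n_iterations(model, input_ids, n_iter):
    """Next-token cross-entropy when the shared block is applied n_iter times."""
    model.n_iter = n_iter                   # override the default of 12 passes
    logits = model(input_ids)
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    ).item()


# Example sweep over a held-out batch of token ids:
# for n in (4, 8, 12, 16, 24):
#     print(n, eval_loss_for_n_iterations(model, batch, n))
```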