- Clone this repository
git clone git@github.com:kazemnejad/len_gen_lm.git
- Create a conda environment
conda create -n len_gen_lm python=3.9
conda activate len_gen_lm
- Install requirements
# Install pytorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# Install other requirements
pip install -r requirements.txt
- Fill in the environment variables in env.sh with:
# We save checkpoints and logs here. It should be shared network storage accessible from all nodes.
export PROJECT_DIR=/path/to/network/storage/projects/len_gen_lm
# Go to comet.ml and get your API token
export COMET_API_KEY="..."
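Before launching a run, it can help to confirm that the setup steps above took effect. The following is a minimal sketch, not something the repository provides: `check_env` and `check_torch` are hypothetical helper names, and it assumes `env.sh` has been sourced into the current shell and that `python` resolves to the interpreter of the conda environment.

```shell
# Hypothetical helpers for illustration; they are not shipped with the repository.

# Report any variable from env.sh that is unset, empty, or still the "..." placeholder.
check_env() {
    missing=0
    for name in PROJECT_DIR COMET_API_KEY; do
        eval "value=\${$name:-}"   # POSIX-compatible indirect lookup
        if [ -z "$value" ] || [ "$value" = "..." ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Print the installed PyTorch version and whether CUDA is usable, or a fallback message.
check_torch() {
    python -c 'import torch; print(torch.__version__, torch.cuda.is_available())' 2>/dev/null \
        || echo "torch not importable"
}
```

One possible use, from the repository root, is `. ./env.sh && check_env && check_torch`, which stops at the first failing check.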
./run_training.sh <pe> <size>
<pe> can be chosen from:
- alibi: Alibi
- none: NoPE
<size> can be chosen from:
- 100m
- 300m
- 1b
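Since both arguments come from small fixed sets, a sweep over every combination is easy to script. This is a sketch rather than part of the repository: `list_runs` is a hypothetical name, and the values are the ones read from the lists above. It only prints the commands, so nothing is launched until the output has been reviewed.

```shell
# Hypothetical helper: print one training command per <pe>/<size> combination.
list_runs() {
    for pe in alibi none; do
        for size in 100m 300m 1b; do
            echo "./run_training.sh $pe $size"
        done
    done
}

list_runs
```

Piping the output to `sh` (`list_runs | sh`) from the repository root would then run the six configurations one after another.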
Each node needs at least:
- CPU: 6 cores
- Memory: 32GB
Training uses all GPUs available on the node, so the more GPUs the node has, the faster training will be.
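To see how many GPUs a node offers before starting a run, the count can be probed from the shell. A minimal sketch, assuming a node with NVIDIA drivers installed: `count_gpus` is a hypothetical helper, and it relies on `nvidia-smi -L`, which prints one line per GPU.

```shell
# Hypothetical helper: count GPUs listed by nvidia-smi, or print 0 if it is unavailable.
count_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Each physical GPU appears as a line starting with "GPU <index>:".
        nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
    else
        echo 0
    fi
}

echo "GPUs visible to nvidia-smi: $(count_gpus)"
```

If `CUDA_VISIBLE_DEVICES` is set, PyTorch sees only the devices listed there, so the number of GPUs actually used for training may be lower than this count.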