diff --git a/README.md b/README.md
index e0742f68..d53ff695 100644
--- a/README.md
+++ b/README.md
@@ -127,6 +127,8 @@ For the old RWKV-2: see the release here for a 27M params model on enwik8 with 0
 ### Training / Fine-tuning
 
+pip install deepspeed==0.7.0 // pip install pytorch-lightning==1.9.2 // torch 1.13.1+cu117
+
 **Training RWKV-4 from scratch:** run train.py, which by default uses the enwik8 dataset (unzip https://data.deepai.org/enwik8.zip). You will be training the "GPT" version because it's parallelizable and faster to train. RWKV-4 can extrapolate, so training with ctxLen 1024 can work for ctxLen of 2500+. You can fine-tune the model with longer ctxLen and it can quickly adapt to longer ctxLens.
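
For reference, the dependency line added by this diff expands to an environment setup along these lines. This is a sketch: the pinned versions come straight from the added line, but the PyTorch extra-index URL for cu117 wheels is an assumption, not part of the diff.

```bash
# Pinned environment for RWKV-4 training (versions taken from the added README line).
# The extra-index URL for cu117 wheels is an assumption, not stated in the diff.
pip install torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install deepspeed==0.7.0
pip install pytorch-lightning==1.9.2
```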
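The "training from scratch" paragraph reduces to a few shell steps, sketched below. Assumptions: wget/unzip are just one way to fetch the dataset, and train.py is taken to pick up enwik8 from the working directory, as the README's "by default" wording implies.

```bash
# Fetch and unpack the default enwik8 dataset next to train.py (per the README).
wget https://data.deepai.org/enwik8.zip
unzip enwik8.zip
# Train the "GPT" version from scratch using the defaults baked into train.py.
python train.py
```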