diff --git a/README.md b/README.md
index e0742f68..d53ff695 100644
--- a/README.md
+++ b/README.md
@@ -127,6 +127,8 @@ For the old RWKV-2: see the release here for a 27M params model on enwik8 with 0
 ### Training / Fine-tuning
 
+pip install deepspeed==0.7.0 // pip install pytorch-lightning==1.9.2 // torch 1.13.1+cu117
+
 **Training RWKV-4 from scratch:** run train.py, which by default uses the enwik8 dataset (unzip https://data.deepai.org/enwik8.zip). You will be training the "GPT" version because it's parallelizable and faster to train. RWKV-4 can extrapolate, so training with ctxLen 1024 can work for ctxLen of 2500+. You can fine-tune the model with longer ctxLen and it can quickly adapt to longer ctxLens.
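
For reference, the dependency line added by this diff expands to an environment setup along these lines. This is a sketch: the pinned versions come straight from the added line, but the PyTorch extra-index URL for cu117 wheels is an assumption, not part of the diff.

```bash
# Pinned environment for RWKV-4 training (versions taken from the added README line).
# The extra-index URL for cu117 wheels is an assumption, not stated in the diff.
pip install torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install deepspeed==0.7.0
pip install pytorch-lightning==1.9.2
```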
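The "training from scratch" paragraph reduces to a few shell steps, sketched below. Assumptions: wget/unzip are just one way to fetch the dataset, and train.py is taken to pick up enwik8 from the working directory, as the README's "by default" wording implies.

```bash
# Fetch and unpack the default enwik8 dataset next to train.py (per the README).
wget https://data.deepai.org/enwik8.zip
unzip enwik8.zip
# Train the "GPT" version from scratch using the defaults baked into train.py.
python train.py
```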