Update README.md

Core00-GS · Mar 18, 2024 · 389fe96
1 parent 2923b21
commit 389fe96
Showing 1 changed file with 4 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -32,9 +32,11 @@ Use https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v5/make_data.py to prepare
 
 The "epoch" in train.py is a "mini-epoch" (not a real epoch; it exists only for convenience), and 1 mini-epoch = 40320 * ctx_len tokens.
 
-For example, if your binidx has 1498226207 tokens and ctxlen=4096, set "--my_exit_tokens 1498226207" (this will override epoch_count), and it will be 1498226207/(40320 * 4096) = 9.07 mini-epochs. The trainer will auto-exit after "--my_exit_tokens" tokens.
+For example, if your binidx has 1498226207 tokens and ctxlen=4096, set "--my_exit_tokens 1498226207" (this will override epoch_count), and it will be 1498226207/(40320 * 4096) = 9.07 mini-epochs. The trainer will auto-exit after "--my_exit_tokens" tokens. Set "--magic_prime" to the largest 3n+2 prime smaller than datalen/ctxlen-1 (= 1498226207/4096-1 = 365776), which is "--magic_prime 365759" in this case.
 
-Set "--magic_prime" to the largest 3n+2 prime smaller than datalen/ctxlen-1 (= 1498226207/4096-1 = 365776), which is "--magic_prime 365759" in this case.
+Simple: repeat your SFT data 3 or 4 times in make_data.py. More repetition leads to overfitting.
+
+Advanced: repeat your SFT data 3 or 4 times in your jsonl (note that make_data.py shuffles all jsonl items), add some base data (such as slimpajama) to your jsonl, and repeat it only 1 time in make_data.py.
 
 **Train RWKV-6**: use /RWKV-v5/ and add --my_testing "x060" to demo-training-prepare.sh and demo-training-run.sh
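
The mini-epoch count and the "--magic_prime" value from the example above can be reproduced with a short script. This is only a sketch: the function names `mini_epochs` and `magic_prime` are hypothetical (they are not taken from train.py or make_data.py), the primality check is plain trial division, and the search is assumed to start at floor(datalen/ctxlen) - 1 and walk downward.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; fast enough for values of this size."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def mini_epochs(exit_tokens: int, ctx_len: int, samples_per_mini_epoch: int = 40320) -> float:
    """Number of mini-epochs, using 1 mini-epoch = 40320 * ctx_len tokens."""
    return exit_tokens / (samples_per_mini_epoch * ctx_len)


def magic_prime(data_len: int, ctx_len: int) -> int:
    """Largest prime of the form 3n+2 not exceeding floor(data_len / ctx_len) - 1.

    Assumption: the starting bound is included in the search.
    """
    upper = data_len // ctx_len - 1
    for candidate in range(upper, 1, -1):
        if candidate % 3 == 2 and is_prime(candidate):
            return candidate
    raise ValueError("dataset too small to find a 3n+2 prime")


if __name__ == "__main__":
    data_len = 1498226207  # tokens in the binidx, from the example above
    ctx_len = 4096
    print(f"mini-epochs: {mini_epochs(data_len, ctx_len):.2f}")  # 9.07
    print(f"--magic_prime {magic_prime(data_len, ctx_len)}")     # 365759
```

With a different dataset, only `data_len` and `ctx_len` need to change; the printed prime is the value to pass as "--magic_prime".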