Commit

Update README.md
BlinkDL authored May 5, 2023
1 parent c3a5880 commit 254239b
Showing 1 changed file with 4 additions and 4 deletions: README.md
@@ -49,10 +49,6 @@ More RWKV projects: https://github.com/search?o=desc&q=rwkv&s=updated&type=Repos

A cool paper (Spiking Neural Network) using RWKV: https://github.com/ridgerchu/SpikeGPT

-ChatRWKV with RWKV 14B ctx8192:
-
-![RWKV-chat](RWKV-chat.png)
-
You are welcome to join the RWKV Discord (https://discord.gg/bDSBUMeFpc) to build upon it. We now have plenty of potential compute (A100 40G GPUs), thanks to Stability and EleutherAI, so if you have interesting ideas I can run them.

![RWKV-eval2](RWKV-eval2.png)
@@ -61,6 +57,10 @@ RWKV [loss vs token position] for 10000 ctx4k+ documents in Pile. RWKV 1B5-4k is

![RWKV-ctxlen](RWKV-ctxlen.png)

+ChatRWKV with RWKV 14B ctx8192:
+
+![RWKV-chat](RWKV-chat.png)
+
I believe the RNN is a better candidate for foundation models, because: (1) it is friendlier to ASICs (no KV cache); (2) it is friendlier to RL; (3) when we write, our brain works more like an RNN; (4) the universe is also like an RNN (because of locality). Transformers are non-local models.

RWKV-3 1.5B on an A40 (tf32) = a constant 0.015 sec/token, tested using simple PyTorch code (no CUDA), 45% GPU utilization, 7823 MB VRAM.
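
On the "no KV cache" point in the paragraph above, here is a minimal sketch (illustrative only; the hidden size and the toy update rule are assumptions, not RWKV's actual code) of why a recurrent model keeps a fixed-size state while attention's key/value cache grows with context length:

```python
import torch

d = 512                          # hypothetical hidden size
state = torch.zeros(d)           # RNN-style state: fixed size at every step
kv_cache = []                    # attention-style cache: grows with each token

for t in range(1024):
    x = torch.randn(d)                        # stand-in for a token embedding
    state = torch.tanh(state + x)             # recurrent update: O(1) memory per token
    kv_cache.append((x.clone(), x.clone()))   # attention must keep every past key/value

print(state.numel())                                      # 512, constant regardless of context
print(sum(k.numel() + v.numel() for k, v in kv_cache))    # grows linearly with the number of tokens
```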
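
And a rough sketch of how a sec/token figure like the one above can be measured with plain PyTorch (the model step and sizes here are placeholders, not the repo's benchmark code):

```python
import time
import torch

@torch.no_grad()
def secs_per_token(step, state, n_tokens=100):
    """Average wall-clock seconds per token for a recurrent `step` function."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()             # start from an idle GPU
    start = time.time()
    for _ in range(n_tokens):
        state = step(state)                  # one step per generated token
    if torch.cuda.is_available():
        torch.cuda.synchronize()             # wait for queued GPU work before stopping the clock
    return (time.time() - start) / n_tokens

# Toy stand-in for a real model step (illustration only, not RWKV's forward pass).
device = "cuda" if torch.cuda.is_available() else "cpu"
d = 2048
w = torch.randn(d, d, device=device)
state0 = torch.randn(d, device=device)
print(f"{secs_per_token(lambda s: torch.tanh(w @ s), state0):.4f} sec/token")
```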
