Skip to content

RWKV is a RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

License

Notifications You must be signed in to change notification settings

fengyunzaidushi/RWKV-LM-LoRA

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LoRA fork of RWKV-LM

A RWKV-LM fork, added with LoRA finetuning support. Currently only RWKV-v4neo is supported. The LoRA module is self-implemented to work with the TorchScript JIT. Existing RWKV-v4neo models/checkpoints should work out of the box. Seperate storage of LoRA weights is not supported yet: this means the finetuned model contains all from the original model, thus self-contained but larger.

To finetune an existing model with LoRA, just work like full finetuning but with the LoRA options:

python3 train.py \
  --load_model <pretrained base model> \
  --proj_dir <place to save checkpoints> \
  --data_file <data for finetune> \
  --data_type <data type for finetune> \
  --vocab_size 50277 --ctx_len 1024 --epoch_steps 1000 --epoch_count 1000 --epoch_begin 0 --epoch_save 5 --micro_bsz 2 --n_layer 24 --n_embd 1024 --pre_ffn 0 --head_qk 0 --lr_init 1e-5 --lr_final 1e-5 --warmup_steps 0 --beta1 0.9 --beta2 0.999 --adam_eps 1e-8 --accelerator gpu --devices 1 --precision bf16 --strategy deepspeed_stage_2 --grad_cp 0 \ # all your familiar options
  --lora --lora_r 8 --lora_alpha 32 --lora_dropout 0.01

The r, alpha and dropout options are up to your choice.

To use the finetuned model, use chat.py as usual with the checkpoints in your specified proj_dir, but remember to align the LoRA-corresponded options with what you have specified during training!

args.lora_r = 8
args.lora_alpha = 32

TODOs

  • Adaptor support
  • Seperate model merging to allow LoRA pretrained models to be used with other RWKV inference implementation (especially ChatRWKV)

About

RWKV is a RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 93.0%
  • Cuda 5.7%
  • C++ 1.3%