A fork of RWKV-LM with LoRA finetuning support added. Currently only RWKV-v4neo is supported. The LoRA module is implemented from scratch so that it works with the TorchScript JIT, and existing RWKV-v4neo models/checkpoints work out of the box. Separate storage of LoRA weights is not supported yet: a finetuned checkpoint contains everything from the original model, so it is self-contained but larger.
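For reference, a TorchScript-scriptable LoRA linear layer generally looks like the sketch below. The class name, initialization, and defaults are illustrative assumptions (they mirror the example hyperparameters used later), not this fork's exact code:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoraLinear(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_features: int, out_features: int,
                 r: int = 8, alpha: int = 32, dropout: float = 0.01):
        super().__init__()
        # Frozen base weight, loaded from the pretrained model.
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.weight.requires_grad_(False)
        # Trainable low-rank update B @ A, scaled by alpha / r.
        self.lora_A = nn.Parameter(torch.empty(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Typed signature keeps the module compatible with torch.jit.script.
        base = F.linear(x, self.weight)
        lora = F.linear(F.linear(self.dropout(x), self.lora_A), self.lora_B)
        return base + lora * self.scaling
```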
To finetune an existing model with LoRA, proceed as with full finetuning, but add the LoRA options:
```bash
python3 train.py \
    --load_model <pretrained base model> \
    --proj_dir <place to save checkpoints> \
    --data_file <data for finetune> \
    --data_type <data type for finetune> \
    --vocab_size 50277 --ctx_len 1024 --epoch_steps 1000 --epoch_count 1000 \
    --epoch_begin 0 --epoch_save 5 --micro_bsz 2 --n_layer 24 --n_embd 1024 \
    --pre_ffn 0 --head_qk 0 --lr_init 1e-5 --lr_final 1e-5 --warmup_steps 0 \
    --beta1 0.9 --beta2 0.999 --adam_eps 1e-8 --accelerator gpu --devices 1 \
    --precision bf16 --strategy deepspeed_stage_2 --grad_cp 0 \
    --lora --lora_r 8 --lora_alpha 32 --lora_dropout 0.01  # LoRA options; everything above is the familiar full-finetuning setup
```
The `r`, `alpha`, and `dropout` values are up to you.
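In standard LoRA (assumed to apply here as well), the low-rank update is scaled by `alpha / r`, so the command above uses a scaling factor of 32 / 8 = 4; if you change `r`, you may want to adjust `alpha` to keep that ratio where you want it:

```python
# Effective strength of the B @ A update in standard LoRA (an assumption here).
lora_r, lora_alpha = 8, 32
scaling = lora_alpha / lora_r   # 32 / 8 = 4.0 for the example command above
```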
To use the finetuned model, run `chat.py` as usual with the checkpoints saved in your specified `proj_dir`, but remember to set the LoRA-related options to match what you specified during training:
```python
args.lora_r = 8
args.lora_alpha = 32
```
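If `lora_r` does not match, the low-rank tensors in the checkpoint will not have the shapes the model expects and loading will fail; `--lora_dropout` does not need to be mirrored here, since dropout layers are inactive at inference. An optional sanity check you could run beforehand (a sketch assuming the common `lora_A` naming with shape `(r, in_features)`; not part of chat.py, and the path is a placeholder):

```python
import torch

# Hypothetical check: confirm the checkpoint's LoRA rank matches args.lora_r.
state = torch.load("<your finetuned checkpoint>.pth", map_location="cpu")
for name, tensor in state.items():
    if name.endswith("lora_A"):
        assert tensor.shape[0] == args.lora_r, f"{name}: rank {tensor.shape[0]} != {args.lora_r}"
```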
Still to come:

- Adapter support
- Separate model merging, so that LoRA-trained models can be used with other RWKV inference implementations (especially ChatRWKV)
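Until that merging step exists, it can in principle be done by hand in the usual LoRA way: add the scaled low-rank product `(alpha / r) * B @ A` onto each frozen weight and save a plain checkpoint. A rough sketch, assuming the checkpoint stores the factors as `...lora_A` / `...lora_B` next to each `...weight` (an assumption about this fork's layout; paths are placeholders):

```python
import torch

lora_r, lora_alpha = 8, 32                 # must match the values used for training
scaling = lora_alpha / lora_r

state = torch.load("<finetuned checkpoint>.pth", map_location="cpu")
# Keep everything except the LoRA factors themselves.
merged = {k: v for k, v in state.items() if ".lora_" not in k}

for key in state:
    if key.endswith(".lora_A"):
        prefix = key[: -len("lora_A")]     # e.g. "blocks.0.att.key."
        merged[prefix + "weight"] = (
            state[prefix + "weight"] + scaling * (state[prefix + "lora_B"] @ state[key])
        )

torch.save(merged, "<merged model>.pth")
```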