feat: remove llama as arguments and rename LlamaTokenizer to AutoTokenizer (alibaba#10)

Co-authored-by: xianyan.xianyanjia <[email protected]>
SeaOfOcean authored Sep 5, 2023
1 parent 233d34b commit 8b41131
Showing 9 changed files with 36 additions and 31 deletions.
16 changes: 8 additions & 8 deletions docs/en/tutorial.md
@@ -40,7 +40,7 @@ Organize the question-response pairs of SFT data into a jsonl file, where each l…
{'query': question, 'response': reply}
```
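For illustration, a minimal sketch of producing such a jsonl file in Python (not from the repository; the sample pairs and output path are hypothetical, and the commands below already prepare the Anthropic data):

```python
import json

# Hypothetical question-response pairs in the format described above.
sft_samples = [
    {"query": "How do I boil an egg?", "response": "Place it in boiling water for about 8 minutes."},
    {"query": "What is the capital of France?", "response": "The capital of France is Paris."},
]

# jsonl: one JSON object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for sample in sft_samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```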

Taking the example of Anthropic's helpful&harmless data, use the following code to store it in `$DATASET_ROOT/sft/train.json`.
Taking the example of Anthropic's helpful&harmless data, use the following code to store it in `$DATASET_ROOT/sft/train.jsonl`.

```bash
cd ${CHATLEARN}/examples/megatron/step1_sft/
@@ -62,12 +62,12 @@ TRANSFORMERS_CKPT_PATH=path-to-transformer-model \
MEGATRON_LLAMA_CKPT_PATH=path-to-megatron-model \
TP=8 \
PP=1 \
bash examples/pai/llama/convert_transformers_to_megatron.sh
bash examples/pai/tools/convert_transformers_to_megatron.sh
```

## 1.3 Start SFT Training

[Aliyun PAI DLC](https://www.aliyun.com/activity/bigdata/pai-dlc)[2] can conveniently and efficiently support training for various tasks. The script below is an example of SFT training. The `DATASET_PATH` is the path to the SFT training set, such as `$DATASET_ROOT/sft/train.json`. In this example, we assume that the tokenizer's path is the same as the model checkpoint's path.
[Aliyun PAI DLC](https://www.aliyun.com/activity/bigdata/pai-dlc)[2] can conveniently and efficiently support training for various tasks. The script below is an example of SFT training. The `DATASET_PATH` is the path to the SFT training set, such as `$DATASET_ROOT/sft/train.jsonl`. In this example, we assume that the tokenizer's path is the same as the model checkpoint's path.

```bash
export CHATLEARN=path-to-chatlearn
@@ -105,15 +105,15 @@ The Reward model refers to the model that serves as a proxy for human evaluation…

## 2.1 Prepare Training Data

1. First, prepare samples that pair a question with several different responses and organize them into a json file. Each line in the json file is one Reward model training data sample in the following Python dictionary format:
1. First, prepare samples that pair a question with several different responses and organize them into a jsonl file. Each line in the jsonl file is one Reward model training data sample in the following Python dictionary format:

```json
{'query': question, 'response': [reply 1, reply 2, ...], 'score': [score1, score2, ...]}
```

The score value indicates the quality of the corresponding response: higher scores indicate higher quality and closer alignment with human preference, as illustrated in the sketch below.
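For example, one training record could be assembled as in the following minimal sketch (the replies and scores are hypothetical; `score[i]` rates `response[i]`):

```python
import json

# One hypothetical Reward-model training sample: several candidate replies
# to the same query, each with a quality score (higher = closer to human preference).
record = {
    "query": "How can I sleep better?",
    "response": [
        "Keep a consistent bedtime and avoid screens before sleep.",
        "Just drink a lot of coffee in the evening.",
    ],
    "score": [1.0, 0.0],
}

# Append the record as one line of the jsonl training file.
with open("rm_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```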

2. Taking the example of Anthropic's helpful&harmless data, use the following code to store it in `$DATASET_ROOT/rm/train.json` and `$DATASET_ROOT/rm/dev.json`.
2. Taking the example of Anthropic's helpful&harmless data, use the following code to store it in `$DATASET_ROOT/rm/train.jsonl` and `$DATASET_ROOT/rm/dev.jsonl`.

```bash
cd ${CHATLEARN}/examples/megatron/step2_reward/
@@ -150,7 +150,7 @@ RLHF refers to the process of trying different responses on a dataset consisting…
{"prompt": prompt}
```

2. Taking Anthropic's helpful & harmless data as an example, use the following code to store the dataset in `$DATASET_ROOT/rlhf/train.json` and `$DATASET_ROOT/rlhf/dev.json`:
2. Taking Anthropic's helpful & harmless data as an example, use the following code to store the dataset in `$DATASET_ROOT/rlhf/train.jsonl` and `$DATASET_ROOT/rlhf/dev.jsonl`:
```bash
cd ${CHATLEARN}/examples/megatron/step3_rlhf/
DATASET_ROOT=path-to-dataset-root
@@ -163,7 +163,7 @@ python prepare_data.py $DATASET_ROOT
```bash
export CHATLEARN=path-to-chatlearn
export MEGATRON=path-to-megatron-lm-extension
export DATASET_PATH=$DATASET_ROOT/rlhf/train.json
export DATASET_PATH=$DATASET_ROOT/rlhf/train.jsonl

cd ${CHATLEARN}/examples/megatron/step3_rlhf

@@ -193,7 +193,7 @@ cd $MEGATRON
MEGATRON_CKPT_PATH=ckpt-to-rlhf-policy-ckpt \
VOCAB_FILE=path-to-vocab-file \
TRANSFORMERS_CKPT_PATH=path-to-transformers-ckpt-path \
bash examples/pai/llama/convert_megatron_to_tranformers.sh
bash examples/pai/tools/convert_megatron_to_tranformers.sh
```
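Once converted, the checkpoint can be loaded through the standard Hugging Face `AutoTokenizer`/`AutoModelForCausalLM` interface, which matches the `AutoTokenizer` tokenizer type introduced elsewhere in this commit. A minimal, hypothetical usage sketch (the checkpoint path and prompt template are placeholders, not files from the repository):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "path-to-transformers-ckpt-path"  # the TRANSFORMERS_CKPT_PATH produced above

tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16)

# Generate a reply for a single HH-style prompt.
prompt = "Human: How do I stay focused while studying?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```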

We evaluated the performance of LLaMA on the HH dataset, both after SFT and after RLHF, using the GPT-4 API via MT-Bench. The results show that RLHF improves the model's average performance compared to SFT, with significant gains in Humanities, Math, Roleplay, STEM, and Writing. The gains observed here come from a Reward model trained on the open-source HH dataset; customizing the Reward model helps achieve better results.
18 changes: 9 additions & 9 deletions docs/zh/tutorial.md
@@ -40,7 +40,7 @@ SFT refers to the process of fine-tuning a pretrained language model with labeled dialogue data…
{'query': question, 'response': reply}
```

Taking Anthropic's helpful&harmless data as an example, the following code will store it in `$DATASET_ROOT/sft/train.json`.
Taking Anthropic's helpful&harmless data as an example, the following code will store it in `$DATASET_ROOT/sft/train.jsonl`.

```bash
cd ${CHATLEARN}/examples/megatron/step1_sft/
@@ -61,12 +61,12 @@ TRANSFORMERS_CKPT_PATH=path-to-transformer-model \
MEGATRON_LLAMA_CKPT_PATH=path-to-megatron-model \
TP=8 \
PP=1 \
bash examples/pai/llama/convert_transformers_to_megatron.sh
bash examples/pai/tools/convert_transformers_to_megatron.sh
```

## 1.3 Start SFT Training

[Aliyun PAI DLC](https://www.aliyun.com/activity/bigdata/pai-dlc)[2] can conveniently and efficiently support training for various tasks. The script below is an example of SFT training, where `DATASET_PATH` is the path to the SFT training set, e.g. `$DATASET_ROOT/sft/train.json`. In this example, we assume the tokenizer is stored at the same path as the model checkpoint.
[Aliyun PAI DLC](https://www.aliyun.com/activity/bigdata/pai-dlc)[2] can conveniently and efficiently support training for various tasks. The script below is an example of SFT training, where `DATASET_PATH` is the path to the SFT training set, e.g. `$DATASET_ROOT/sft/train.jsonl`. In this example, we assume the tokenizer is stored at the same path as the model checkpoint.

```bash
export CHATLEARN=path-to-chatlearn
@@ -103,15 +103,15 @@ The Reward model serves as a proxy for human evaluation in RLHF, scoring the replies the model produces…

## 2.1 Prepare Training Data

1. First, prepare samples that pair a question with several different replies and organize them into a json file, where each line of the json file is one Reward model training data sample in the following Python dictionary format:
1. First, prepare samples that pair a question with several different replies and organize them into a jsonl file, where each line of the jsonl file is one Reward model training data sample in the following Python dictionary format:

```json
{'query': question, 'response': [reply 1, reply 2, ...], 'score': [score1, score2, ...]}
```

A higher score means the corresponding reply is of higher quality and closer to human preference.

2. Taking Anthropic's helpful&harmless data as an example, the following code will store `$DATASET_ROOT/rm/train.json` and `$DATASET_ROOT/rm/dev.json`.
2. Taking Anthropic's helpful&harmless data as an example, the following code will store `$DATASET_ROOT/rm/train.jsonl` and `$DATASET_ROOT/rm/dev.jsonl`.

```bash
cd ${CHATLEARN}/examples/megatron/step2_reward/
@@ -141,13 +141,13 @@ bash llama_reward.sh
RLHF refers to the process of trying different responses on an instruction-only dataset and learning from the reward signals that the Reward model assigns to those responses.
## 3.1 Prepare Training Data

1. First, prepare an instruction dataset to be explored and organize it into a json file, where each line of the json file is one instruction in the following format:
1. First, prepare an instruction dataset to be explored and organize it into a jsonl file, where each line of the jsonl file is one instruction in the following format:

```json
{"prompt": question}
```

2. Taking Anthropic's helpful&harmless data as an example, the following code will store `$DATASET_ROOT/rlhf/train.json` and `$DATASET_ROOT/rlhf/dev.json`:
2. Taking Anthropic's helpful&harmless data as an example, the following code will store `$DATASET_ROOT/rlhf/train.jsonl` and `$DATASET_ROOT/rlhf/dev.jsonl`:
```bash
cd ${CHATLEARN}/examples/megatron/step3_rlhf/
DATASET_ROOT=path-to-dataset-root
@@ -160,7 +160,7 @@ python prepare_data.py $DATASET_ROOT
```bash
export CHATLEARN=path-to-chatlearn
export MEGATRON=path-to-megatron-lm-extension
export DATASET_PATH=$DATASET_ROOT/rlhf/train.json
export DATASET_PATH=$DATASET_ROOT/rlhf/train.jsonl

cd ${CHATLEARN}/examples/megatron/step3_rlhf

@@ -188,7 +188,7 @@ cd $MEGATRON
MEGATRON_CKPT_PATH=ckpt-to-rlhf-policy-ckpt \
VOCAB_FILE=path-to-vocab-file \
TRANSFORMERS_CKPT_PATH=path-to-transformers-ckpt-path \
bash examples/pai/llama/convert_megatron_to_tranformers.sh
bash examples/pai/tools/convert_megatron_to_tranformers.sh
```

We evaluated LLaMA on the HH dataset, both after SFT and after RLHF, using the GPT-4 API via MT-Bench. Compared with the SFT model, RLHF improves the model's average performance, with significant gains in Humanities, Math, Roleplay, STEM, and Writing. The gains here come from a Reward model trained on the open-source HH dataset; using a custom Reward model helps achieve better results.
2 changes: 1 addition & 1 deletion examples/megatron/models/old_policy_inference.py
Expand Up @@ -88,7 +88,7 @@ def build_dataset(self, train_prompts):
'''
args = get_args()
max_prompt_length = (
args.max_position_embeddings - args.max_new_tokens
args.seq_length - args.max_new_tokens
)
# TODO: read from files
prompts_dataset = PromptPipeline(
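The change above budgets the prompt length against the configured training sequence length (`seq_length`) rather than `max_position_embeddings`, reserving room for `max_new_tokens` of generation. A tiny sketch of that arithmetic with hypothetical values:

```python
# Hypothetical values; in the repository these come from Megatron's get_args().
seq_length = 1024      # total token budget per sample
max_new_tokens = 512   # tokens reserved for the generated response

# Prompts are truncated so prompt + generated tokens fit within seq_length.
max_prompt_length = seq_length - max_new_tokens
assert max_prompt_length > 0, "max_new_tokens must be smaller than seq_length"
print(max_prompt_length)  # -> 512
```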
7 changes: 4 additions & 3 deletions examples/megatron/step1_sft/llama_sft.sh
@@ -69,6 +69,8 @@ gbs=$(($gbs * $dp))
mkdir -p $CHECKPOINT_PATH


MODEL_ARGS="--no-position-embedding --disable-bias-linear --swiglu --untie-embeddings-and-output-weights --use-rotary-position-embeddings --tokenizer-type AutoTokenizer"

log_file=$CHECKPOINT_PATH/stderr_$NODE_RANK.log

export CUDA_DEVICE_MAX_CONNECTIONS=1
@@ -106,8 +108,6 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS \
--init-method-std 0.006 \
--tensorboard-dir $CHECKPOINT_PATH \
--num-workers 8 \
--llama \
--tokenizer-type LLAMATokenizer \
--vocab-file $TOKENIZER_PATH \
--make-vocab-size-divisible-by 32 \
--ffn-hidden-size $INTERMEDIATE_SIZE \
@@ -122,4 +122,5 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS \
--bf16 \
--use-distributed-optimizer \
--adaptive-parallel-strategy-on-checkpoint \
--sequence-parallel 2>&1 | tee -a ${log_file}
--sequence-parallel \
$MODEL_ARGS 2>&1 | tee -a ${log_file}
10 changes: 6 additions & 4 deletions examples/megatron/step2_reward/llama_reward.sh
@@ -16,7 +16,7 @@ pip install sentencepiece
[[ -z "${DATASET_PATH}" ]] && { echo "DATASET_PATH is not set"; exit 1; }


export PYTHONPATH=${PYTHONPATH}:${MEGATRON}:${CHATLEARN}/examples/megatron
export PYTHONPATH=${PYTHONPATH}:${MEGATRON}:${CHATLEARN}:${CHATLEARN}/examples/megatron

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE \
--nnodes $WORLD_SIZE \
@@ -63,6 +63,9 @@ gbs=$(($gbs * $dp))

CHECKPOINT_PATH=${CHATLEARN}/output/step2_reward/llamasft_hh_rm_$(date +%F)_gpt_${MODEL_SIZE}_${NNODES}w${GPUS_PER_NODE}g_tp${tp}_pp${pp}_mb${mb}_seqlen${seq_len}


MODEL_ARGS="--no-position-embedding --disable-bias-linear --swiglu --untie-embeddings-and-output-weights --use-rotary-position-embeddings --tokenizer-type AutoTokenizer"

mkdir -p $CHECKPOINT_PATH

echo $PARALLEL_ARGS
@@ -107,8 +110,6 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS \
--init-method-std 0.006 \
--tensorboard-dir $CHECKPOINT_PATH \
--num-workers 8 \
--llama \
--tokenizer-type LLAMATokenizer \
--vocab-file $TOKENIZER_PATH \
--make-vocab-size-divisible-by 32 \
--ffn-hidden-size $INTERMEDIATE_SIZE \
@@ -122,4 +123,5 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS \
--use-flash-attn \
--bf16 \
--use-distributed-optimizer \
--sequence-parallel 2>&1 | tee -a ${log_file}
--sequence-parallel \
$MODEL_ARGS 2>&1 | tee -a ${log_file}
1 change: 0 additions & 1 deletion examples/megatron/step3_rlhf/configs/gpt/base.yaml
@@ -4,7 +4,6 @@ vocab_file: ${vocab_file}
merge_file: ${merge_file}
bf16: True
seq_length: ${max_seq_len}
out_seq_length: ${max_seq_len}
fix_kl_coef: ${fix_kl_coef:True}

log_dir: ${log_dir}
@@ -6,7 +6,6 @@ num_attention_heads: ${reward_num_attention_heads}
use_distributed_optimizer: True
tensor_model_parallel_size: ${reward_tp}
pipeline_model_parallel_size: 1
out_seq_length: ${max_seq_len}
seq_length: ${max_seq_len}
max_position_embeddings: ${max_seq_len}

10 changes: 7 additions & 3 deletions examples/megatron/step3_rlhf/configs/llama/base.yaml
@@ -1,14 +1,18 @@
llama: True
tokenizer_type: LLAMATokenizer
# llama config
add_position_embedding: False
add_bias_linear: False
swiglu: True
use_rotary_position_embeddings: True
untie_embeddings_and_output_weights: True
tokenizer_type: AutoTokenizer

vocab_file: ${vocab_file}
num_layers: 40
hidden_size: 5120
num_attention_heads: 40
max_position_embeddings: 1024
bf16: True
seq_length: 1024
out_seq_length: 1024
fix_kl_coef: ${fix_kl_coef:True}
log_dir: ${log_dir}
exp_name: ${exp_name:test}
@@ -2,5 +2,5 @@ includes:
- base_inference.yaml
- reward_shared.yaml

tokenizer_type: LLAMATokenizer
tokenizer_type: AutoTokenizer
reward_bias: 0
