feat: remove llama as arguments and rename LlamaTokenizer to AutoTokenizer (alibaba#10)

Co-authored-by: xianyan.xianyanjia <[email protected]>
SeaOfOcean authored Sep 5, 2023
1 parent 233d34b commit 8b41131
Showing 9 changed files with 36 additions and 31 deletions.
16 changes: 8 additions & 8 deletions docs/en/tutorial.md
@@ -40,7 +40,7 @@ Organize the question-response pairs of SFT data into a jsonl file, where each l…
{'query': question, 'response': reply}
```
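For illustration, a minimal sketch of producing such a jsonl file in Python (not from the repository; the sample pairs and output path are hypothetical, and the commands below already prepare the Anthropic data):

```python
import json

# Hypothetical question-response pairs in the format described above.
sft_samples = [
    {"query": "How do I boil an egg?", "response": "Place it in boiling water for about 8 minutes."},
    {"query": "What is the capital of France?", "response": "The capital of France is Paris."},
]

# jsonl: one JSON object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for sample in sft_samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```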

Taking the example of Anthropic's helpful&harmless data, use the following code to store it in `$DATASET_ROOT/sft/train.json`.
Taking the example of Anthropic's helpful&harmless data, use the following code to store it in `$DATASET_ROOT/sft/train.jsonl`.

```bash
cd ${CHATLEARN}/examples/megatron/step1_sft/
@@ -62,12 +62,12 @@ TRANSFORMERS_CKPT_PATH=path-to-transformer-model \
MEGATRON_LLAMA_CKPT_PATH=path-to-megatron-model \
TP=8 \
PP=1 \
bash examples/pai/llama/convert_transformers_to_megatron.sh
bash examples/pai/tools/convert_transformers_to_megatron.sh
```

## 1.3 Start SFT Training

[Aliyun PAI DLC](https://www.aliyun.com/activity/bigdata/pai-dlc)[2] can conveniently and efficiently support training for various tasks. The script below is an example of SFT training. The `DATASET_PATH` is the path to the SFT training set, such as `$DATASET_ROOT/sft/train.json`. In this example, we assume that the tokenizer's path is the same as the model checkpoint's path.
[Aliyun PAI DLC](https://www.aliyun.com/activity/bigdata/pai-dlc)[2] can conveniently and efficiently support training for various tasks. The script below is an example of SFT training. The `DATASET_PATH` is the path to the SFT training set, such as `$DATASET_ROOT/sft/train.jsonl`. In this example, we assume that the tokenizer's path is the same as the model checkpoint's path.

```bash
export CHATLEARN=path-to-chatlearn
@@ -105,15 +105,15 @@ The Reward model refers to the model that serves as a proxy for human evaluation…

## 2.1 Prepare Training Data

1. First, prepare samples that pair a question with several different responses and organize them into a json file. Each line in the json file is one Reward model training data sample in the following Python dictionary format:
1. First, prepare samples that pair a question with several different responses and organize them into a jsonl file. Each line in the jsonl file is one Reward model training data sample in the following Python dictionary format:

```json
{'query': question, 'response': [reply 1, reply 2, ...], 'score': [score1, score2, ...]}
```

The score value indicates the quality of the corresponding response: higher scores indicate higher quality and closer alignment with human preference, as illustrated in the sketch below.
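For example, one training record could be assembled as in the following minimal sketch (the replies and scores are hypothetical; `score[i]` rates `response[i]`):

```python
import json

# One hypothetical Reward-model training sample: several candidate replies
# to the same query, each with a quality score (higher = closer to human preference).
record = {
    "query": "How can I sleep better?",
    "response": [
        "Keep a consistent bedtime and avoid screens before sleep.",
        "Just drink a lot of coffee in the evening.",
    ],
    "score": [1.0, 0.0],
}

# Append the record as one line of the jsonl training file.
with open("rm_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```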

2. Taking the example of Anthropic's helpful&harmless data, use the following code to store it in `$DATASET_ROOT/rm/train.json` and `$DATASET_ROOT/rm/dev.json`.
2. Taking the example of Anthropic's helpful&harmless data, use the following code to store it in `$DATASET_ROOT/rm/train.jsonl` and `$DATASET_ROOT/rm/dev.jsonl`.

```bash
cd ${CHATLEARN}/examples/megatron/step2_reward/
@@ -150,7 +150,7 @@ RLHF refers to the process of trying different responses on a dataset consisting…
{"prompt": prompt}
```

2. Taking Anthropic's helpful & harmless data as an example, use the following code to store the dataset in `$DATASET_ROOT/rlhf/train.json` and `$DATASET_ROOT/rlhf/dev.json`:
2. Taking Anthropic's helpful & harmless data as an example, use the following code to store the dataset in `$DATASET_ROOT/rlhf/train.jsonl` and `$DATASET_ROOT/rlhf/dev.jsonl`:
```bash
cd ${CHATLEARN}/examples/megatron/step3_rlhf/
DATASET_ROOT=path-to-dataset-root
@@ -163,7 +163,7 @@ python prepare_data.py $DATASET_ROOT
```bash
export CHATLEARN=path-to-chatlearn
export MEGATRON=path-to-megatron-lm-extension
export DATASET_PATH=$DATASET_ROOT/rlhf/train.json
export DATASET_PATH=$DATASET_ROOT/rlhf/train.jsonl

cd ${CHATLEARN}/examples/megatron/step3_rlhf

@@ -193,7 +193,7 @@ cd $MEGATRON
MEGATRON_CKPT_PATH=ckpt-to-rlhf-policy-ckpt \
VOCAB_FILE=path-to-vocab-file \
TRANSFORMERS_CKPT_PATH=path-to-transformers-ckpt-path \
bash examples/pai/llama/convert_megatron_to_tranformers.sh
bash examples/pai/tools/convert_megatron_to_tranformers.sh
```
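Once converted, the checkpoint can be loaded through the standard Hugging Face `AutoTokenizer`/`AutoModelForCausalLM` interface, which matches the `AutoTokenizer` tokenizer type introduced elsewhere in this commit. A minimal, hypothetical usage sketch (the checkpoint path and prompt template are placeholders, not files from the repository):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "path-to-transformers-ckpt-path"  # the TRANSFORMERS_CKPT_PATH produced above

tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16)

# Generate a reply for a single HH-style prompt.
prompt = "Human: How do I stay focused while studying?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```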

We evaluated the performance of LLaMA on the HH dataset, both after SFT and after RLHF, using the GPT-4 API via MT-Bench. The results show that RLHF improves the model's average performance compared to SFT, with significant gains in Humanities, Math, Roleplay, STEM, and Writing. The gains observed here come from a Reward model trained on the open-source HH dataset; customizing the Reward model helps achieve better results.
18 changes: 9 additions & 9 deletions docs/zh/tutorial.md
@@ -40,7 +40,7 @@ SFT refers to the process of fine-tuning a pretrained language model with labeled dialogue data…
{'query': question, 'response': reply}
```

Taking Anthropic's helpful&harmless data as an example, the following code will store it in `$DATASET_ROOT/sft/train.json`.
Taking Anthropic's helpful&harmless data as an example, the following code will store it in `$DATASET_ROOT/sft/train.jsonl`.

```bash
cd ${CHATLEARN}/examples/megatron/step1_sft/
@@ -61,12 +61,12 @@ TRANSFORMERS_CKPT_PATH=path-to-transformer-model \
MEGATRON_LLAMA_CKPT_PATH=path-to-megatron-model \
TP=8 \
PP=1 \
bash examples/pai/llama/convert_transformers_to_megatron.sh
bash examples/pai/tools/convert_transformers_to_megatron.sh
```

## 1.3 Start SFT Training

[Aliyun PAI DLC](https://www.aliyun.com/activity/bigdata/pai-dlc)[2] can conveniently and efficiently support training for various tasks. The script below is an example of SFT training, where `DATASET_PATH` is the path to the SFT training set, e.g. `$DATASET_ROOT/sft/train.json`. In this example, we assume the tokenizer is stored at the same path as the model checkpoint.
[Aliyun PAI DLC](https://www.aliyun.com/activity/bigdata/pai-dlc)[2] can conveniently and efficiently support training for various tasks. The script below is an example of SFT training, where `DATASET_PATH` is the path to the SFT training set, e.g. `$DATASET_ROOT/sft/train.jsonl`. In this example, we assume the tokenizer is stored at the same path as the model checkpoint.

```bash
export CHATLEARN=path-to-chatlearn
@@ -103,15 +103,15 @@ The Reward model serves as a proxy for human evaluation in RLHF, scoring the replies the model produces…

## 2.1 Prepare Training Data

1. First, prepare samples that pair a question with several different replies and organize them into a json file, where each line of the json file is one Reward model training data sample in the following Python dictionary format:
1. First, prepare samples that pair a question with several different replies and organize them into a jsonl file, where each line of the jsonl file is one Reward model training data sample in the following Python dictionary format:

```json
{'query': question, 'response': [reply 1, reply 2, ...], 'score': [score1, score2, ...]}
```

A higher score means the corresponding reply is of higher quality and closer to human preference.

2. Taking Anthropic's helpful&harmless data as an example, the following code will store `$DATASET_ROOT/rm/train.json` and `$DATASET_ROOT/rm/dev.json`.
2. Taking Anthropic's helpful&harmless data as an example, the following code will store `$DATASET_ROOT/rm/train.jsonl` and `$DATASET_ROOT/rm/dev.jsonl`.

```bash
cd ${CHATLEARN}/examples/megatron/step2_reward/
@@ -141,13 +141,13 @@ bash llama_reward.sh
RLHF refers to the process of trying different responses on an instruction-only dataset and learning from the reward signals that the Reward model assigns to those responses.
## 3.1 Prepare Training Data

1. First, prepare an instruction dataset to be explored and organize it into a json file, where each line of the json file is one instruction in the following format:
1. First, prepare an instruction dataset to be explored and organize it into a jsonl file, where each line of the jsonl file is one instruction in the following format:

```json
{"prompt": question}
```

2. Taking Anthropic's helpful&harmless data as an example, the following code will store `$DATASET_ROOT/rlhf/train.json` and `$DATASET_ROOT/rlhf/dev.json`:
2. Taking Anthropic's helpful&harmless data as an example, the following code will store `$DATASET_ROOT/rlhf/train.jsonl` and `$DATASET_ROOT/rlhf/dev.jsonl`:
```bash
cd ${CHATLEARN}/examples/megatron/step3_rlhf/
DATASET_ROOT=path-to-dataset-root
@@ -160,7 +160,7 @@ python prepare_data.py $DATASET_ROOT
```bash
export CHATLEARN=path-to-chatlearn
export MEGATRON=path-to-megatron-lm-extension
export DATASET_PATH=$DATASET_ROOT/rlhf/train.json
export DATASET_PATH=$DATASET_ROOT/rlhf/train.jsonl

cd ${CHATLEARN}/examples/megatron/step3_rlhf

@@ -188,7 +188,7 @@ cd $MEGATRON
MEGATRON_CKPT_PATH=ckpt-to-rlhf-policy-ckpt \
VOCAB_FILE=path-to-vocab-file \
TRANSFORMERS_CKPT_PATH=path-to-transformers-ckpt-path \
bash examples/pai/llama/convert_megatron_to_tranformers.sh
bash examples/pai/tools/convert_megatron_to_tranformers.sh
```

We evaluated LLaMA on the HH dataset, both after SFT and after RLHF, using the GPT-4 API via MT-Bench. Compared with the SFT model, RLHF improves the model's average performance, with significant gains in Humanities, Math, Roleplay, STEM, and Writing. The gains here come from a Reward model trained on the open-source HH dataset; using a custom Reward model helps achieve better results.
2 changes: 1 addition & 1 deletion examples/megatron/models/old_policy_inference.py
Expand Up @@ -88,7 +88,7 @@ def build_dataset(self, train_prompts):
'''
args = get_args()
max_prompt_length = (
args.max_position_embeddings - args.max_new_tokens
args.seq_length - args.max_new_tokens
)
# TODO: read from files
prompts_dataset = PromptPipeline(
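The change above budgets the prompt length against the configured training sequence length (`seq_length`) rather than `max_position_embeddings`, reserving room for `max_new_tokens` of generation. A tiny sketch of that arithmetic with hypothetical values:

```python
# Hypothetical values; in the repository these come from Megatron's get_args().
seq_length = 1024      # total token budget per sample
max_new_tokens = 512   # tokens reserved for the generated response

# Prompts are truncated so prompt + generated tokens fit within seq_length.
max_prompt_length = seq_length - max_new_tokens
assert max_prompt_length > 0, "max_new_tokens must be smaller than seq_length"
print(max_prompt_length)  # -> 512
```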
7 changes: 4 additions & 3 deletions examples/megatron/step1_sft/llama_sft.sh
@@ -69,6 +69,8 @@ gbs=$(($gbs * $dp))
mkdir -p $CHECKPOINT_PATH


MODEL_ARGS="--no-position-embedding --disable-bias-linear --swiglu --untie-embeddings-and-output-weights --use-rotary-position-embeddings --tokenizer-type AutoTokenizer"

log_file=$CHECKPOINT_PATH/stderr_$NODE_RANK.log

export CUDA_DEVICE_MAX_CONNECTIONS=1
@@ -106,8 +108,6 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS \
--init-method-std 0.006 \
--tensorboard-dir $CHECKPOINT_PATH \
--num-workers 8 \
--llama \
--tokenizer-type LLAMATokenizer \
--vocab-file $TOKENIZER_PATH \
--make-vocab-size-divisible-by 32 \
--ffn-hidden-size $INTERMEDIATE_SIZE \
@@ -122,4 +122,5 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS \
--bf16 \
--use-distributed-optimizer \
--adaptive-parallel-strategy-on-checkpoint \
--sequence-parallel 2>&1 | tee -a ${log_file}
--sequence-parallel \
$MODEL_ARGS 2>&1 | tee -a ${log_file}
10 changes: 6 additions & 4 deletions examples/megatron/step2_reward/llama_reward.sh
@@ -16,7 +16,7 @@ pip install sentencepiece
[[ -z "${DATASET_PATH}" ]] && { echo "DATASET_PATH is not set"; exit 1; }


export PYTHONPATH=${PYTHONPATH}:${MEGATRON}:${CHATLEARN}/examples/megatron
export PYTHONPATH=${PYTHONPATH}:${MEGATRON}:${CHATLEARN}:${CHATLEARN}/examples/megatron

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE \
--nnodes $WORLD_SIZE \
@@ -63,6 +63,9 @@ gbs=$(($gbs * $dp))

CHECKPOINT_PATH=${CHATLEARN}/output/step2_reward/llamasft_hh_rm_$(date +%F)_gpt_${MODEL_SIZE}_${NNODES}w${GPUS_PER_NODE}g_tp${tp}_pp${pp}_mb${mb}_seqlen${seq_len}


MODEL_ARGS="--no-position-embedding --disable-bias-linear --swiglu --untie-embeddings-and-output-weights --use-rotary-position-embeddings --tokenizer-type AutoTokenizer"

mkdir -p $CHECKPOINT_PATH

echo $PARALLEL_ARGS
@@ -107,8 +110,6 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS \
--init-method-std 0.006 \
--tensorboard-dir $CHECKPOINT_PATH \
--num-workers 8 \
--llama \
--tokenizer-type LLAMATokenizer \
--vocab-file $TOKENIZER_PATH \
--make-vocab-size-divisible-by 32 \
--ffn-hidden-size $INTERMEDIATE_SIZE \
@@ -122,4 +123,5 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS \
--use-flash-attn \
--bf16 \
--use-distributed-optimizer \
--sequence-parallel 2>&1 | tee -a ${log_file}
--sequence-parallel \
$MODEL_ARGS 2>&1 | tee -a ${log_file}
1 change: 0 additions & 1 deletion examples/megatron/step3_rlhf/configs/gpt/base.yaml
@@ -4,7 +4,6 @@ vocab_file: ${vocab_file}
merge_file: ${merge_file}
bf16: True
seq_length: ${max_seq_len}
out_seq_length: ${max_seq_len}
fix_kl_coef: ${fix_kl_coef:True}

log_dir: ${log_dir}
@@ -6,7 +6,6 @@ num_attention_heads: ${reward_num_attention_heads}
use_distributed_optimizer: True
tensor_model_parallel_size: ${reward_tp}
pipeline_model_parallel_size: 1
out_seq_length: ${max_seq_len}
seq_length: ${max_seq_len}
max_position_embeddings: ${max_seq_len}

10 changes: 7 additions & 3 deletions examples/megatron/step3_rlhf/configs/llama/base.yaml
@@ -1,14 +1,18 @@
llama: True
tokenizer_type: LLAMATokenizer
# llama config
add_position_embedding: False
add_bias_linear: False
swiglu: True
use_rotary_position_embeddings: True
untie_embeddings_and_output_weights: True
tokenizer_type: AutoTokenizer

vocab_file: ${vocab_file}
num_layers: 40
hidden_size: 5120
num_attention_heads: 40
max_position_embeddings: 1024
bf16: True
seq_length: 1024
out_seq_length: 1024
fix_kl_coef: ${fix_kl_coef:True}
log_dir: ${log_dir}
exp_name: ${exp_name:test}
@@ -2,5 +2,5 @@ includes:
- base_inference.yaml
- reward_shared.yaml

tokenizer_type: LLAMATokenizer
tokenizer_type: AutoTokenizer
reward_bias: 0
