Merge pull request FlagAI-Open#370 from Anhforth/merge_aquila
Merge aquila
ftgreat authored Jun 12, 2023
2 parents 1ac57a0 + 98e8a84 commit 72ed48b
Showing 10 changed files with 48 additions and 32 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -142,14 +142,14 @@ git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
- [Optional] For ZeRO optimizers, install [DEEPSPEED](https://github.com/microsoft/DeepSpeed)
- [Optional] For ZeRO optimizers, install [DEEPSPEED](https://github.com/microsoft/DeepSpeed) (>= 0.7.7)
```
git clone https://github.com/microsoft/DeepSpeed
cd DeepSpeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e .
ds_report # check the DeepSpeed status
```
- [Optional] For BMTrain training, install [BMTrain](https://github.com/OpenBMB/BMTrain)
- [Optional] For BMTrain training, install [BMTrain](https://github.com/OpenBMB/BMTrain) (>= 0.2.2)
```
git clone https://github.com/OpenBMB/BMTrain
cd BMTrain
@@ -160,7 +160,7 @@ python setup.py install
pip install bminf
```
- [Optional] For Flash Attention, install [Flash-attention](https://github.com/HazyResearch/flash-attention)
- [Optional] For Flash Attention, install [Flash-attention](https://github.com/HazyResearch/flash-attention) (>=1.0.2)
```
pip install flash-attn
```
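For readers who prefer pinned wheels over building from source, the version floors added in this commit can also be expressed directly as pip version specifiers. This is a minimal alternative sketch, assuming the PyPI package names `deepspeed`, `bmtrain`, and `flash-attn`; building from source as shown above remains the documented path.
```
# Optional: install the pinned releases from PyPI instead of building from source
pip install "deepspeed>=0.7.7" "bmtrain>=0.2.2" "flash-attn>=1.0.2"
ds_report  # verify the DeepSpeed build afterwards
```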
6 changes: 3 additions & 3 deletions README_zh.md
@@ -131,14 +131,14 @@ git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
- [可选] 使用 ZeRO 优化器,需要安装 [DEEPSPEED](https://github.com/microsoft/DeepSpeed)
- [可选] 使用 ZeRO 优化器,需要安装 [DEEPSPEED](https://github.com/microsoft/DeepSpeed) (>= 0.7.7)
```
git clone https://github.com/microsoft/DeepSpeed
cd DeepSpeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e .
ds_report # 检查deepspeed的状态
```
- [可选] 开启BMTrain训练,需要安装 [BMTrain](https://github.com/OpenBMB/BMTrain)
- [可选] 开启BMTrain训练,需要安装 [BMTrain](https://github.com/OpenBMB/BMTrain) (>= 0.2.2)
```
git clone https://github.com/OpenBMB/BMTrain
cd BMTrain
@@ -150,7 +150,7 @@ python setup.py install
pip install bminf
```
- [可选] 对于FlashAttention, 需要安装[Flash-attention](https://github.com/HazyResearch/flash-attention)
- [可选] 对于FlashAttention, 需要安装[Flash-attention](https://github.com/HazyResearch/flash-attention) (>=1.0.2)
```
pip install flash-attn
```
1 change: 1 addition & 0 deletions examples/Aquila/Aquila-chat/Aquila-chat.yaml
@@ -8,5 +8,6 @@ bmt_pre_load: True

save_optim: True
save_rng: True
lora: True
enable_sft_dataset_dir: '../datasets/'
enable_sft_dataset_file: 'convo_v2.jsonl'
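The new `lora: True` switch enables [LoRA](https://github.com/microsoft/LoRA)-style low-rank adapters for SFT (see the parameter table in the Aquila-chat README below). As a rough, generic illustration of what the flag turns on — this is not FlagAI's actual implementation — a LoRA layer freezes the pretrained weight and trains only a small low-rank update:
```
# Generic LoRA sketch for illustration only; FlagAI's internal implementation may differ.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                  # freeze the pretrained weight
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # frozen full-rank path + trainable low-rank update (B @ A)
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")              # only the A/B matrices train
```
Because only the adapter matrices receive gradients, optimizer state and checkpoint deltas shrink sharply, which is what the README below describes as reducing fine-tuning cost.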
19 changes: 11 additions & 8 deletions examples/Aquila/Aquila-chat/README.md
@@ -11,9 +11,13 @@ AquilaChat-7B是在Aquila-7B模型的基础上,进行SFT微调后的支持中
AquilaChat-7B is a conversational language model that supports Chinese-English dialogue. It is based on the Aquila-7B model and fine-tuned with SFT. The AquilaChat-7B model was developed by the Beijing Academy of Artificial Intelligence.


我们的模型也同时支持[Huggingface平台](https://huggingface.co/BAAI)
<!-- 我们的模型也同时支持[Huggingface平台](https://huggingface.co/BAAI)。
We also support [Huggingface](https://huggingface.co/BAAI).
We also support [Huggingface](https://huggingface.co/BAAI). -->

运行Aquila-7B系列需要内存30G, 显存18G,生成最大长度200 token。

To run the Aquila-7B series you need about 30GB of RAM and 18GB of GPU memory; the maximum generation length is 200 tokens.
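A rough inference sketch fitting within those limits is shown below. It assumes FlagAI's `AutoLoader`/`Predictor` interface and an `aquilachat-7b` checkpoint unpacked under `./checkpoints_in`; argument names may differ across FlagAI releases, so treat this as illustrative rather than authoritative.
```
# Hedged sketch: load AquilaChat-7B through FlagAI and cap generation at 200 tokens.
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

loader = AutoLoader("lm", model_dir="./checkpoints_in", model_name="aquilachat-7b")
model = loader.get_model()
tokenizer = loader.get_tokenizer()
model.eval()
model.half().cuda()          # fp16 keeps the 7B model within ~18GB of GPU memory

predictor = Predictor(model, tokenizer)
prompt = "北京为什么是中国的首都?"
print(predictor.predict_generate_randomsample(prompt, out_max_length=200, top_p=0.95))
```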

AquilaChat模型主要为了验证基础模型能力,您可以根据自己需要对模型进行使用,修改和商业化,但必须遵守所有国家的法律法规,并且对任何第三方使用者提供Aquila系列模型的来源以及Aquila系列模型协议的副本。

@@ -45,21 +49,19 @@ The tokenizer used in the Aquila model was trained from scratch by us and suppor
| LlaMA | 32000 | sp(bpe)|1805| 1257|1970 |
| Aquila | 100000 | bpe|1575 | 477|1679 |

Aquila系列模型均可在24G显卡上运行。

The Aquila series models can all run on a 24GB graphics card.

## 训练数据集/Training data

我们采用了一系列高质量中英文数据集来训练和微调我们的对话语言模型,并且在不断更新迭代。
我们采用了一系列高质量中英文数据集来训练和微调我们的对话语言模型,并且在不断更新迭代。Aquila 系列模型的预训练数据和SFT数据不开源,但数据分布情况将在官方技术报告中展现(预计6月底发布,敬请期待)。

We used a series of high-quality Chinese and English datasets to train and fine-tune our conversational language model, and continuously updated it through iterations.
We used a series of high-quality Chinese and English datasets to train and fine-tune our conversational language model, and continuously updated it through iterations. The pre-training data and SFT data of the Aquila series models are not open-sourced, but the data distribution will be presented in the official technical report (expected to be released by the end of June, stay tuned).

我们额外支持了两种多模态的指令: 文图生成和图片编辑,数据集格式请参考[这里](https://github.com/FlagAI-Open/FlagAI/blob/master/examples/Aquila/Aquila-chat/data/sft_samples.jsonl)

We have added support for two additional multimodal instructions: text-to-image generation and image editing. Please refer to the dataset format [here](https://github.com/FlagAI-Open/FlagAI/blob/master/examples/Aquila/Aquila-chat/data/sft_samples.jsonl).




## 使用方式/How to use

### 1. 推理/Inference
@@ -186,6 +188,7 @@ Create a new directory named `aquilachat-7b` inside `./checkpoints_in`. Place th

| 参数名 Parameter | 类型 Type | 描述 Description |
|--------------------------------|------------|-------------------------------------------------------|
| lora | bool | 是否启用[LoRA](https://github.com/microsoft/LoRA)来减少微调成本;Whether to enable [LoRA](https://github.com/microsoft/LoRA) to reduce fine-tuning costs |
| batch_size | int | 每次迭代训练时,从数据集中抽取的样本数。一般来说,它越大,处理速度越快,但会占用更多的内存; The number of samples extracted from the dataset for each iteration during training. Generally, a larger batch size can speed up processing but may also consume more memory |
| gradient_accumulation_steps | int | 在更新模型权重之前,要对多个小批次进行梯度计算的次数。主要应用于GPU显存较小的情况下,可以使用小的batch_size,通过梯度累积达到与大batch_size相同的效果; The number of mini-batches whose gradients are accumulated before the model weights are updated. Mainly used when GPU memory is limited: a small batch_size combined with gradient accumulation gives the same effect as a large batch_size |
| lr | float | 指控制模型更新参数时的步长或速率。学习率过高可能导致模型不收敛,而学习率过低则可能导致训练时间过长或者陷入局部最优解; The step size or rate at which the model updates its parameters during training. A high learning rate may cause the model not to converge, while a low learning rate may result in long training times or being stuck in a local optimum |
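Putting the documented switches together, a tuned `Aquila-chat.yaml` might look like the sketch below. The `lora`, `enable_sft_dataset_dir`, and `enable_sft_dataset_file` keys come from the yaml diff above; treating `batch_size`, `gradient_accumulation_steps`, and `lr` as keys of the same file with exactly these names and values is an assumption based on the parameter table, not on the repository defaults.
```
# Hypothetical Aquila-chat.yaml sketch; the first three keys/values are assumed.
batch_size: 4
gradient_accumulation_steps: 8        # effective batch size = 4 * 8
lr: 2.0e-5
bmt_pre_load: True
save_optim: True
save_rng: True
lora: True
enable_sft_dataset_dir: '../datasets/'
enable_sft_dataset_file: 'convo_v2.jsonl'
```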
2 changes: 1 addition & 1 deletion examples/Aquila/Aquila-chat/bmtrain_mgpu.sh
@@ -1,6 +1,6 @@
# Defined by User
export TRIGGER_FILE=bmtrain_mgpu.sh
export SCRIPT_FILE=aquila_sft.py
export SCRIPT_FILE=aquila_chat.py

# ENVS
export PROJ_HOME=$PWD
19 changes: 12 additions & 7 deletions examples/Aquila/Aquila-code/README.md
@@ -15,9 +15,14 @@ The Aquila language model inherits the architectural design advantages of GPT-3
| AquilaCode-7B-NV | 25.0 | 37.9 | 61.9 |


我们的模型也同时支持[Huggingface平台](https://huggingface.co/BAAI)
<!-- 我们的模型也同时支持[Huggingface平台](https://huggingface.co/BAAI)。
We also support [Huggingface](https://huggingface.co/BAAI). -->

运行Aquila-7B系列需要内存30G, 显存18G,生成最大长度200 token。

To run the Aquila-7B series you need about 30GB of RAM and 18GB of GPU memory; the maximum generation length is 200 tokens.

We also support [Huggingface](https://huggingface.co/BAAI).

## 模型细节/Model details

@@ -50,15 +55,15 @@ We used different tokenizers to extract ten thousand data samples from English,
| LlaMA | 32000 | sp(bpe)|1805| 1257|1970 |
| Aquila | 100000 | bpe|1575 | 477|1679 |

Aquila系列模型均可在24G显卡上运行。

The Aquila series models can all run on a 24GB graphics card.

## 训练数据集/Training data
`AquilaCode-7B-NV`和`AquilaCode-7B-TS`模型训练使用了[starcoderdata](https://huggingface.co/datasets/bigcode/starcoderdata)中的 shell, sql, C, C++, Java, Javascript, Python, git-commits, github-issues, jupyter-scripts, jupyter-structured-text 数据

The `AquilaCode-7B-NV` and `AquilaCode-7B-TS` models were continue-pretrained on [starcoderdata](https://huggingface.co/datasets/bigcode/starcoderdata) (shell, sql, C, C++, Java, Javascript, Python, git-commits, github-issues, jupyter-scripts, jupyter-structured-text).

Aquila 系列模型的预训练数据不开源,但数据分布情况将在官方技术报告中展现(预计6月底发布,敬请期待)。

The pre-training data of the Aquila series models are not open-sourced, but the data distribution will be presented in the official technical report (expected to be released by the end of June, stay tuned).
## 使用方式/How to use

### 1. 推断/Inference
@@ -131,7 +136,7 @@ Create a new directory named `aquilacode-7b-NV` (or `aquilacode-7b-TS`) inside `.
* `cd /examples/Aquila/Aquila-code`
* 配置`hostfile`文件, 参考[这里](../../../doc_zh/TUTORIAL_8_ENVIRONMENT_SETUP.md#a配置hostfilehostfile-中的v100-1-与sshconfig-对应) ; Configure the `hostfile` file, refer to [here](../../../docs/TUTORIAL_8_ENVIRONMENT_SETUP.md)
* 配置`bmtrain_mgpu.sh`文件, 将`SCRIPT_FILE`改成`aquila_sft_code.py`; configure the `bmtrain_mgpu.sh` file, change `SCRIPT_FILE` to `aquila_sft_code.py`
* (可选) 在`Aquila-sft.yaml`文件里更改参数 ; (optional) change parameters in `Aquila-sft-code.yaml`
* (可选) 在`Aquila-chat.yaml`文件里更改参数 ; (optional) change parameters in `Aquila-chat.yaml`

| 参数名 Parameter | 类型 Type | 描述 Description |
|--------------------------------|------------|-------------------------------------------------------|
@@ -144,7 +149,7 @@ Create a new directory named `aquilacode-7b-NV` (or `aquilacode-7b-TS`) inside `.

#### Step 3: 启动可监督微调/Start SFT
```
bash dist_trigger_docker.sh hostfile Aquila-sft.yaml [aquilacode-7b-nv/aquilacode-7b-ts] [实验名]
bash dist_trigger_docker.sh hostfile Aquila-code.yaml [aquilacode-7b-nv/aquilacode-7b-ts] [实验名]
```
接下来会输出下列信息,注意`NODES_NUM`应该与节点数相等,`LOGFILE`是模型运行的日志文件;The following information will be output. Note that `NODES_NUM` should be equal to the number of nodes, and `LOGFILE` is the log file for the model run.

14 changes: 9 additions & 5 deletions examples/Aquila/Aquila-pretrain/README.md
@@ -7,9 +7,14 @@ Aquila语言大模型在技术上继承了GPT-3、LLaMA等的架构设计优点
The Aquila language model inherits the architectural design strengths of GPT-3 and LLaMA, swaps in a batch of more efficient underlying operator implementations, and redesigns the tokenizer for Chinese-English bilingual support. It upgrades the BMTrain parallel training method, achieving nearly 8x the training efficiency of Megatron+DeepSpeed ZeRO-2 during Aquila training. The Aquila language model is trained from scratch on high-quality Chinese and English corpora. Through data quality control and various training optimizations, it achieves better performance than other open-source models with smaller datasets and shorter training times. It is also the first large-scale open-source language model that supports Chinese-English bilingual knowledge, allows commercial licensing, and complies with domestic data regulations.


我们同时也支持[Huggingface平台](https://huggingface.co/BAAI)
<!-- 我们同时也支持[Huggingface平台](https://huggingface.co/BAAI)。
We also support [Huggingface](https://huggingface.co/BAAI). -->

运行Aquila-7B系列需要内存30G, 显存18G,生成最大长度200 token。

To run the Aquila-7B series you need about 30GB of RAM and 18GB of GPU memory; the maximum generation length is 200 tokens.

We also support [Huggingface](https://huggingface.co/BAAI).

## 模型细节/Model details

@@ -38,9 +43,6 @@ The tokenizer used in the Aquila model was trained from scratch by us and suppor
| LlaMA | 32000 | sp(bpe)|1805| 1257|1970 |
| Aquila | 100000 | bpe|1575 | 477|1679 |

Aquila系列模型均可在24G显卡上运行。

The Aquila series models can all run on a 24GB graphics card.


## 训练数据集/Training data
@@ -49,7 +51,9 @@ Aquila预训练使用了Pile,[RedPajama-Data-1T](https://huggingface.co/datase
The Aquila-7B model was pretrained on Pile, [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), [Wikipedia](https://huggingface.co/datasets/wikipedia), [C4](https://huggingface.co/datasets/c4), the Wudao Corpus, e-books, patents, encyclopedias, forums, GitHub data, etc. Details are given in the figure below.

<!-- ![Screenshot](../img/data_dist.png) -->
Aquila 系列模型的预训练数据不开源,但数据分布情况将在官方技术报告中展现(预计6月底发布,敬请期待)。

The pre-training data of the Aquila series models are not open-sourced, but the data distribution will be presented in the official technical report (expected to be released by the end of June, stay tuned).


## 使用方式/How to use
13 changes: 8 additions & 5 deletions examples/Aquila/README.md
@@ -6,9 +6,13 @@ Aquila语言大模型在技术上继承了GPT-3、LLaMA等的架构设计优点
The Aquila language model inherits the architectural design strengths of GPT-3 and LLaMA, swaps in a batch of more efficient underlying operator implementations, and redesigns the tokenizer for Chinese-English bilingual support. It upgrades the BMTrain parallel training method, achieving nearly 8x the training efficiency of Megatron+DeepSpeed ZeRO-2 during Aquila training. The Aquila language model is trained from scratch on high-quality Chinese and English corpora. Through data quality control and various training optimizations, it achieves better performance than other open-source models with smaller datasets and shorter training times. It is also the first large-scale open-source language model that supports Chinese-English bilingual knowledge, allows commercial licensing, and complies with domestic data regulations.


我们同时也支持[Huggingface平台](https://huggingface.co/BAAI)
<!-- 我们同时也支持[Huggingface平台](https://huggingface.co/BAAI)。
We also support [Huggingface](https://huggingface.co/BAAI).
We also support [Huggingface](https://huggingface.co/BAAI). -->

运行Aquila-7B系列需要内存30G, 显存18G,生成最大长度200 token。

To run the Aquila-7B series you need about 30GB of RAM and 18GB of GPU memory; the maximum generation length is 200 tokens.

## 模型细节/Model details

@@ -37,17 +41,16 @@ The tokenizer used in the Aquila model was trained from scratch by us and suppor
| LlaMA | 32000 | sp(bpe)|1805| 1257|1970 |
| Aquila | 100000 | bpe|1575 | 477|1679 |

Aquila系列模型均可在24G显卡上运行。

The Aquila series models can all run on a 24GB graphics card.

## 训练数据集/Training data
Aquila预训练使用了Pile,[RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), [Wikipedia](https://huggingface.co/datasets/wikipedia), [C4](https://huggingface.co/datasets/c4), 悟道中文数据集、电子书、专利、百科、论坛, github数据等, 详情可见下图。

The Aquila-7B model was pretrained on Pile, [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), [Wikipedia](https://huggingface.co/datasets/wikipedia), [C4](https://huggingface.co/datasets/c4), the Wudao Corpus, e-books, patents, encyclopedias, forums, GitHub data, etc. Details are given in the figure below.

<!-- ![Screenshot](./img/data_dist.png) -->
Aquila 系列模型的预训练数据不开源,但数据分布情况将在官方技术报告中展现(预计6月底发布,敬请期待)。

The pre-training data of the Aquila series models are not open-sourced, but the data distribution will be presented in the official technical report (expected to be released by the end of June, stay tuned).


## 使用方式/How to use
Empty file modified examples/Aquila/img/data_dist.png
100644 → 100755
Empty file added flagai/model/tools/__init__.py
Empty file.
