Commit 26202a2

removed some parts

Signed-off-by: ftgreat <[email protected]>
ftgreat committed Jun 9, 2023
1 parent 0d7d046 commit 26202a2

Showing 6 changed files with 31 additions and 86 deletions.
30 changes: 9 additions & 21 deletions examples/Aquila/Aquila-code/README_AquilaCode.md
@@ -8,19 +8,6 @@ Aquila语言大模型在技术上继承了GPT-3、LLaMA等的架构设计优点

The Aquila language model inherits the architectural design strengths of GPT-3 and LLaMA, swaps in a set of more efficient low-level operator implementations, and redesigns the tokenizer for Chinese-English bilingual support. It upgrades the BMTrain parallel training method, achieving nearly 8 times the training efficiency of Megatron+DeepSpeed ZeRO-2 during Aquila's training. The Aquila language model is trained from scratch on high-quality Chinese and English corpora. Through data quality control and various training optimizations, it achieves better performance than other open-source models with smaller datasets and shorter training times. It is also the first large-scale open-source language model that supports Chinese-English bilingual knowledge, allows commercial licensing, and complies with domestic data regulations.

<!-- AquilaCode-7B-NV是在Aquila-7B模型的基础上,经过代码数据的继续预训练得到的基础代码模型。此模型由智源研究院研发。在主流评测数据集上的评测结果如下
AquilaCode-7B-nv is a foundational code model obtained by further pretraining on code data based on the Aquila-7B model. It was developed by Beijing Academy of Artificial Intelligence. The evaluation results on mainstream benchmark datasets are as follows:
| 名称/Name | MMLU_Chinese_EM | CLUE-EM |MMLU-EM| BoolQ-EM| TruthfulQA-EM |IMDB-EM| RAFT-EM|
| ----- | ---- | ----- | ---- | ----- | ---- | ----- | ----- |
| [AquilaCode-7B-nv](https://model.baai.ac.cn/model-detail/xxxxx) | 0.xxx | 0.xxx|0.xxx | 0.xxx|0.xxx | -->


<!-- 您可以在[FlagEval基础模型评测平台](https://flageval.baai.ac.cn/#/home) 查看更多评测指标
You can view [FlagEval Model Evaluation Platform](https://flageval.baai.ac.cn/#/home) for more details -->



我们的模型也同时支持[Huggingface平台](hflink)
@@ -29,12 +16,13 @@ We also support [Huggingface](hflink)

## 模型细节/Model details

| Model | License | Commercial use? | GPU | Model link
| :---------------- | :------- | :-- |:-- | :-- |
| Aquila-7B | Apache 2.0 | ✅ | Nvidia-A100 | https://model.baai.ac.cn/model-detail/100098
| AquilaCode-7B-NV | Apache 2.0 | ✅ | Nvidia-A100 | https://model.baai.ac.cn/model-detail/100102
| AquilaCode-7B-TS | Apache 2.0 | ✅ | Tianshu-BI-V100 | https://model.baai.ac.cn/model-detail/100099
| AquilaChat-7B | Apache 2.0 | ✅ | Nvidia-A100 | https://model.baai.ac.cn/model-detail/100101
| 模型/Model | 状态/State | 能否商用/Commercial use? | 所用显卡/GPU |
| :---------------- | :------- | :-- | :-- |
| Aquila-7B | 已发布/Released | ✅ | Nvidia-A100 |
| Aquila-30B | 敬请期待/Coming soon | ✅ | Nvidia-A100 |
| <font color=red>AquilaCode-7B-NV</font> | 已发布/Released | ✅ | Nvidia-A100 |
| <font color=red>AquilaCode-7B-TS</font> | 已发布/Released | ✅ | Tianshu-BI-V100 |
| AquilaChat-7B | 已发布/Released | ✅ | Nvidia-A100 |

我们使用了一系列更高效的底层算子来辅助模型训练,其中包括参考[flash-attention](https://github.com/HazyResearch/flash-attention)的方法并替换了一些中间计算,同时还使用了RMSNorm。在此基础上,我们应用了[BMtrain](https://github.com/OpenBMB/BMTrain)技术进行轻量化的并行训练,该技术采用了数据并行、ZeRO(零冗余优化器)、优化器卸载、检查点和操作融合、通信-计算重叠等方法来优化模型训练过程。

We use a set of more efficient low-level operators to assist model training, including methods adapted from [flash-attention](https://github.com/HazyResearch/flash-attention) that replace some intermediate computations, along with RMSNorm. On top of this, we apply [BMtrain](https://github.com/OpenBMB/BMTrain) for lightweight parallel training, which uses data parallelism, ZeRO (the Zero Redundancy Optimizer), optimizer offloading, checkpointing with operator fusion, and communication-computation overlap to optimize the training process.
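As a rough illustration of the normalization mentioned above, here is a minimal PyTorch sketch of RMSNorm. It is a generic reference implementation, not the fused operator actually used during training:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Minimal RMSNorm sketch: scales by the root-mean-square of the
    activations instead of the mean/variance used by LayerNorm."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 1 / sqrt(mean(x^2) + eps); no mean subtraction, no bias term
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

# Usage: normalize a batch of hidden states
h = torch.randn(2, 16, 4096)
print(RMSNorm(4096)(h).shape)  # torch.Size([2, 16, 4096])
```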

@@ -125,9 +113,9 @@

### 2. 可监督微调/Supervised Fine-tuning(SFT)
#### Step 1: 配置模型/Setup Checkpoints
`./checkpoints_in`里新建`aquilacode-7b-nv`(或`aquilacode-7b-ts`)目录。将微调后的checkpoint,以及原始`aquilacode-7b-nv`模型里的其余文件,包括`config.json`, `merges.txt`, `vocab.json`, `special_tokens_map.json`放进去
`./checkpoints_in`里新建`aquilacode-7b-NV`(或`aquilacode-7b-TS`)目录。将微调后的checkpoint,以及原始`aquilacode-7b-NV/aquilacode-7b-TS`模型里的其余文件,包括`config.json`, `merges.txt`, `vocab.json`, `special_tokens_map.json`放进去

Create a new directory named `aquilacode-7b-nv` (or `aquilacode-7b-ts`) inside `./checkpoints_in`. Place the fine-tuned checkpoint and all other files from the original `aquilacode-7b-nv` model, including `config.json`, `merges.txt`, `vocab.json`, and `special_tokens_map.json`, into this directory.
Create a new directory named `aquilacode-7b-NV` (or `aquilacode-7b-TS`) inside `./checkpoints_in`. Place the fine-tuned checkpoint and all other files from the original `aquilacode-7b-NV/aquilacode-7b-TS` model, including `config.json`, `merges.txt`, `vocab.json`, and `special_tokens_map.json`, into this directory.
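A minimal Python sketch of this step, assuming the original model files live at `./aquilacode-7b-NV` and the fine-tuned checkpoint at `./output/checkpoint.pt` (both paths are hypothetical; adjust them to your own layout):

```python
import os
import shutil

src = "./aquilacode-7b-NV"             # assumed location of the original model
dst = "./checkpoints_in/aquilacode-7b-NV"
os.makedirs(dst, exist_ok=True)

# Copy the tokenizer/config files listed above from the original model.
for name in ["config.json", "merges.txt", "vocab.json", "special_tokens_map.json"]:
    shutil.copy(os.path.join(src, name), dst)

# Place the fine-tuned checkpoint alongside them (target filename assumed).
shutil.copy("./output/checkpoint.pt", os.path.join(dst, "pytorch_model.bin"))
```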

#### Step 2: 修改参数/Modify Parameters
* `cd /examples/Aquila/Aquila-code`
10 changes: 4 additions & 6 deletions examples/Aquila/Aquila-code/generate_code.py
@@ -5,19 +5,20 @@
import os
import argparse
import sys
sys.path.append("/data2/yzd/workspace/FlagAI")
from flagai import mpu
from flagai.auto_model.auto_loader import AutoLoader
import random
import numpy as np
from flagai.model.predictor.predictor import Predictor
from flagai.data.tokenizer import Tokenizer


model_dir = "./checkpoints_in"
# model_dir = "../converted_models_ldwang"
device = "cuda"

print(f"building model...")
loader = AutoLoader("lm", model_name="aquilacode-7b-nv",
loader = AutoLoader("lm", model_name="aquilacode-7b-ts",
use_cache=True,
model_dir=model_dir)

@@ -35,9 +36,7 @@

max_new_tokens = 256

texts = ["#补全代码\ndef quick_sort(x):",
'"""\n向用户询问他们的名字并说“你好”\m"""',
'"""\nAsk the user for their name and say "Hello\n""""' ]
texts = ["#补全代码\ndef quick_sort(x):"]

for text in texts:
input_ids = tokenizer.encode_plus_non_glm(text)["input_ids"][:-1]
@@ -52,4 +51,3 @@
print(res)
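A side note on the loop above: `encode_plus_non_glm` appears to append a trailing special token to `input_ids`, which is why the script drops the final id before generation. A hedged sketch of that step (verify the behavior for your tokenizer version):

```python
text = "#补全代码\ndef quick_sort(x):"  # prompt taken from the script above
ids = tokenizer.encode_plus_non_glm(text)["input_ids"]
prompt_ids = ids[:-1]  # assumption: the last id is an end-of-text marker
print(f"{len(ids)} encoded tokens, {len(prompt_ids)} fed to the model")
```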



22 changes: 8 additions & 14 deletions examples/Aquila/Aquila-pretrain/README_Aquila.md
@@ -23,20 +23,14 @@ You can view [FlagEval Model Evaluation Platform](https://flageval.baai.ac.cn/#/
We also support [Huggingface](hflink)

## 模型细节/Model details
<!-- | Model | License | Commercial use? | Pretraining length [tokens] | Pretraining compute (GPU days) | GPU
| :---------------- | :------- | :-- |:-- | :-- | :-- |
| Aquila-7B | Apache 2.0 | ✅ | 400B | dx22x8 | Nvidia-A100 |
| Aquila-33B | Apache 2.0 | ✅ | xx | xx | Nvidia-A100 |
| AquilaCode-7B-nv | Apache 2.0 | ✅ | 235B | 14x8x8 | Nvidia-A100 |
| AquilaCode-7B-ts | Apache 2.0 | ✅ | 75B | 9x32x8 | Tianshu-BI-V100 |
| AquilaChat-7B | Apache 2.0 | ✅ | 15万条 | 8/24x1x8 | Nvidia-A100 | -->

| Model | License | Commercial use? | GPU | Model link
| :---------------- | :------- | :-- |:-- | :-- |
| <font color=red>Aquila-7B </font> | Apache 2.0 | ✅ | Nvidia-A100 | https://model.baai.ac.cn/model-detail/100098
| AquilaCode-7B-nv | Apache 2.0 | ✅ | Nvidia-A100 | https://model.baai.ac.cn/model-detail/100102
| AquilaCode-7B-ts | Apache 2.0 | ✅ | Tianshu-BI-V100 | https://model.baai.ac.cn/model-detail/100099
| AquilaChat-7B | Apache 2.0 | ✅ | Nvidia-A100 | https://model.baai.ac.cn/model-detail/100101

| 模型/Model | 状态/State | 能否商用/Commercial use? | 所用显卡/GPU |
| :---------------- | :------- | :-- | :-- |
| <font color=red>Aquila-7B</font> | 已发布/Released | ✅ | Nvidia-A100 |
| <font color=red>Aquila-30B</font> | 敬请期待/Coming soon | ✅ | Nvidia-A100 |
| AquilaCode-7B-NV | 已发布/Released | ✅ | Nvidia-A100 |
| AquilaCode-7B-TS | 已发布/Released | ✅ | Tianshu-BI-V100 |
| AquilaChat-7B | 已发布/Released | ✅ | Nvidia-A100 |

我们使用了一系列更高效的底层算子来辅助模型训练,其中包括参考[flash-attention](https://github.com/HazyResearch/flash-attention)的方法并替换了一些中间计算,同时还使用了RMSNorm。在此基础上,我们升级了[BMtrain](https://github.com/OpenBMB/BMTrain)技术进行轻量化的并行训练,该技术采用了数据并行、ZeRO(零冗余优化器)、优化器卸载、检查点和操作融合、通信-计算重叠等方法来优化模型训练过程。

We use a set of more efficient low-level operators to assist model training, including methods adapted from [flash-attention](https://github.com/HazyResearch/flash-attention) that replace some intermediate computations, along with RMSNorm. On top of this, we upgraded [BMtrain](https://github.com/OpenBMB/BMTrain) for lightweight parallel training, which uses data parallelism, ZeRO (the Zero Redundancy Optimizer), optimizer offloading, checkpointing with operator fusion, and communication-computation overlap to optimize the training process.
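To illustrate the checkpointing idea mentioned above: activation checkpointing discards a block's intermediate activations after the forward pass and recomputes them during backward, trading compute for memory. The sketch below uses PyTorch's generic `torch.utils.checkpoint` utility, not BMTrain's own implementation:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(x)

blocks = nn.ModuleList(Block() for _ in range(4))
x = torch.randn(8, 1024, requires_grad=True)

h = x
for blk in blocks:
    # Activations inside blk are dropped after forward and recomputed
    # during backward, cutting peak memory at the cost of extra compute.
    h = checkpoint(blk, h, use_reentrant=False)
h.sum().backward()
```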

40 changes: 2 additions & 38 deletions examples/Aquila/Aquila-pretrain/generate.py
@@ -14,59 +14,23 @@
model_name=model_name,
use_cache=True)
model = loader.get_model()
# import pdb;pdb.set_trace()
tokenizer = loader.get_tokenizer()
# from flagai.model.aquila_model import AQUILAModel
# model = AQUILAModel.from
# tokenizer = Tokenizer.from_pretrained('aquila-7b', cache_dir='./checkpoints_in/aquila-7b')
# pl_sd = torch.load('./checkpoints_in/aquila-7b/pytorch_model.bin', map_location="cpu")
# if "state_dict" in pl_sd:
# sd = pl_sd["state_dict"]
# else:
# sd = pl_sd
# model.load_state_dict(sd, strict=True)

model.eval()
model.half()
# with torch.cuda.device(0):
# model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30)

model.cuda()

predictor = Predictor(model, tokenizer)

texts = [
# "I am ",
#"1月7日,五华区召开“中共昆明市五华区委十届三次全体(扩大)会议”,",
#"1月7日,五华区召开“中共昆明市五华区委十届三次全体(扩大)会议”,区委书记金幼和作了《深入学习贯彻党的十八大精神,奋力开创五华跨越发展新局面》的工作报告。",
#"拥有美丽身材是大多数女人追求的梦想,甚至有不少mm为了实现这个梦而心甘情愿付出各种代价,",
#"2007年乔布斯向人们展示iPhone并宣称它将会改变世界",
#"从前有座山,",
#"如何摆脱无效焦虑?",
"北京在哪儿?",
#"北京",
#"汽车EDR是什么",
#"My favorite animal is",
#"今天天气不错",
#"如何评价许嵩?",
#"汽车EDR是什么",
#"给妈妈送生日礼物,怎么选好?",
#"1加1等于18497是正确的吗?",
#"如何给汽车换胎?",
#"以初春、黄山为题,做一首诗。",
#"What is machine learning?",
#"Machine learning is",
#"Nigerian billionaire Aliko Dangote says he is planning a bid to buy the UK Premier League football club.",
#"The capital of Germany is the city of ",
"汽车EDR是什么",
]


for text in texts:
print('-'*80)
text = f'{text}' #base
text = f'{text}'
print(f"text is {text}")
#out = predictor.predict_generate_randomsample(text, out_max_length=200,top_p=0.95)
with torch.no_grad():
# out = predictor.predict_generate_randomsample(text, out_max_length=200, temperature=0)
out = predictor.predict_generate_randomsample(text, out_max_length=200,top_p=0.95)
print(f"pred is {out}")
14 changes: 8 additions & 6 deletions examples/Aquila/Aquila-sft/README_AquilaChat.md
@@ -33,12 +33,14 @@ AquilaChat模型主要为了验证基础模型能力,您可以根据自己需
The AquilaChat model was primarily developed to verify the capabilities of the foundational model. You may use, modify, and commercialize the model according to your needs, but you must comply with all applicable laws and regulations in your country. Additionally, you must provide the source of the Aquila series models and a copy of the Aquila series model license to any third-party users.

## 模型细节/Model details
| Model | License | Commercial use? | GPU | Model link
| :---------------- | :------- | :-- |:-- | :-- |
|Aquila-7B | Apache 2.0 | ✅ | Nvidia-A100 | https://model.baai.ac.cn/model-detail/100098
| AquilaCode-7B-nv | Apache 2.0 | ✅ | Nvidia-A100 | https://model.baai.ac.cn/model-detail/100102
| AquilaCode-7B-ts | Apache 2.0 | ✅ | Tianshu-BI-V100 | https://model.baai.ac.cn/model-detail/100099
| AquilaChat-7B | Apache 2.0 | ✅ | Nvidia-A100 | https://model.baai.ac.cn/model-detail/100101

| 模型/Model | 状态/State | 能否商用/Commercial use? | 所用显卡/GPU |
| :---------------- | :------- | :-- | :-- |
| Aquila-7B | 已发布/Released | ✅ | Nvidia-A100 |
| Aquila-30B | 敬请期待/Coming soon | ✅ | Nvidia-A100 |
| AquilaCode-7B-NV | 已发布/Released | ✅ | Nvidia-A100 |
| AquilaCode-7B-TS | 已发布/Released | ✅ | Tianshu-BI-V100 |
| <font color=red>AquilaChat-7B</font> | 已发布/Released | ✅ | Nvidia-A100 |


我们使用了一系列更高效的底层算子来辅助模型训练,其中包括参考[flash-attention](https://github.com/HazyResearch/flash-attention)的方法并替换了一些中间计算,同时还使用了RMSNorm。在此基础上,我们应用了[BMtrain](https://github.com/OpenBMB/BMTrain)技术进行轻量化的并行训练,该技术采用了数据并行、ZeRO(零冗余优化器)、优化器卸载、检查点和操作融合、通信-计算重叠等方法来优化模型训练过程。

We use a set of more efficient low-level operators to assist model training, including methods adapted from [flash-attention](https://github.com/HazyResearch/flash-attention) that replace some intermediate computations, along with RMSNorm. On top of this, we apply [BMtrain](https://github.com/OpenBMB/BMTrain) for lightweight parallel training, which uses data parallelism, ZeRO (the Zero Redundancy Optimizer), optimizer offloading, checkpointing with operator fusion, and communication-computation overlap to optimize the training process.
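As a toy illustration of communication-computation overlap, the sketch below issues an asynchronous gradient all-reduce and performs unrelated computation before waiting on it. This is a generic `torch.distributed` pattern that assumes an already-initialized process group; it is not BMTrain's actual scheduler:

```python
import torch
import torch.distributed as dist

def overlapped_allreduce_step(grad: torch.Tensor, next_input: torch.Tensor):
    # Launch the gradient all-reduce without blocking...
    handle = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)
    # ...and overlap it with computation that does not depend on `grad`
    # (a stand-in matmul here, e.g. the next layer's backward pass).
    _ = next_input @ next_input.T
    handle.wait()  # synchronize before the optimizer consumes `grad`
    return grad
```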
1 change: 0 additions & 1 deletion flagai/model/base_model.py
@@ -85,7 +85,6 @@ def _load_state_dict_into_model(cls,
sd = pl_sd
if "global_step" in pl_sd:
print(f"Global Step: {pl_sd['global_step']}")
import pdb;pdb.set_trace()
m, u = model.load_state_dict(sd, strict=True)
if len(m) > 0 and verbose:
print("missing keys:")
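For context on the `m, u` pair above: PyTorch's `load_state_dict` returns `(missing_keys, unexpected_keys)`, but with `strict=True` a key mismatch raises before the return value can be inspected, so the lists are only informative under `strict=False`. A hedged sketch (`model` is assumed to be an already-constructed FlagAI model):

```python
import torch

pl_sd = torch.load("./checkpoints_in/aquila-7b/pytorch_model.bin", map_location="cpu")
sd = pl_sd.get("state_dict", pl_sd)  # unwrap {"state_dict": ...} checkpoints

# strict=False reports mismatches instead of raising, so the returned
# lists can actually be printed the way the method above intends.
missing, unexpected = model.load_state_dict(sd, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
```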
