update the documents and remove redundant files (FlagAI-Open#97)
* added diffusion model
* fixed download path
* added online version of diffusion
* removed unused code
* solved PR issue
* solved unicode issue
* solved unicode issue
* fixed PR issue
* fixed PR issue
* fixed issue
* updated required packages
* updated requirements.txt
* fixed import error
* updated the import
* separated the cn_clip module
* updated the packages
* updated the package versions
* updated torchvision version
* fixed tokenizer import
* recovered change
* deleted Bert.py
* reformatted the code
* fixed JSON read warning and reformatted code
* updated the changes
* removed local path
* added README file
* fixed cpm3 inference bugs
* fixed cpm3 model bugs
* updated README.md
* updated the diffusion README and comments
* removed bmtrain install to avoid errors
* added xlmroberta
* added FlagStudio
* enabled online loading
* improved README and changed filename
* changed the filename
* added README file for AltCLIP
* removed pdb
* added model_dir
* updated
* added more requirements
* updated PR info
* modified README
* modified README
* added the RightBrain AI credit for our long image generation technology
* updated the README
* fixed bugs
* tokenizer
* deleted vocab.txt
* fixed tokenizer bugs
* added Hugging Face open-source address to docs
* deleted some files (FlagAI-Open#95)
* modified docs
* modified LAION link and turned parameter descriptions into a table
* modified file name (FlagAI-Open#96)
* updated the docs
* modified the docs
* updated
* updated the docs
* solved license link issue
* modified class names

Signed-off-by: root <root@zhadong-4mhn8-8819-worker-0.yanzhaodong.baaishare-sailing.svc.kubebrain.local>
Signed-off-by: BAAI-OpenPlatform <[email protected]>
Signed-off-by: Anhforth <[email protected]>
Signed-off-by: zhaohu xing <[email protected]>
Signed-off-by: ZhaodongYan1 <[email protected]>
Co-authored-by: root <root@zhadong-4mhn8-8819-worker-0.yanzhaodong.baaishare-sailing.svc.kubebrain.local>
Co-authored-by: shunxing1234 <[email protected]>
Co-authored-by: zhaohu xing <[email protected]>
Co-authored-by: Anhforth <[email protected]>
Co-authored-by: Zac Liu <[email protected]>
Co-authored-by: zhaohu xing <[email protected]>
Co-authored-by: ZhaodongYan1 <[email protected]>
8 people authored Nov 12, 2022
1 parent 0cc600c commit a44e7b8
Showing 5 changed files with 88 additions and 86 deletions.
49 changes: 33 additions & 16 deletions examples/AltCLIP/README.md
@@ -1,27 +1,44 @@



# AltCLIP

## 简介/Overview

我们提出了一个简单高效的方法去训练更加优秀的双语CLIP模型,命名为AltCLIP。AltCLIP基于 [Stable Diffusion](https://github.com/CompVis/stable-diffusion) 训练,训练数据来自 [WuDao数据集](https://data.baai.ac.cn/details/WuDaoCorporaText) 和 [LAION](https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6plus)。

AltCLIP模型可以为本项目中的AltDiffusion模型提供支持,关于AltDiffusion模型的具体信息可查看[此教程](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltDiffusion/README.md)

模型代码已经在 [FlagAI](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP) 上开源,权重位于我们搭建的 [modelhub](https://model.baai.ac.cn/model-detail/100075) 上。我们还提供了微调,推理,验证的脚本,欢迎试用。

首次运行AltCLIP时,下列权重将会自动从modelhub上下载。

The first time AltCLIP is run, the following weights will be downloaded automatically from modelhub.

| 模型名称 Model name | 大小 Size | 描述 Description |
| ------------------- | --------- | -------------------------------------------------- |
| AltCLIP | 3.22G | 我们的双语AltCLIP模型;Our bilingual AltCLIP model |



We propose a simple and efficient method to train a better bilingual CLIP model, named AltCLIP. AltCLIP is trained based on [Stable Diffusion](https://github.com/CompVis/stable-diffusion), with training data from the [WuDao dataset](https://data.baai.ac.cn/details/WuDaoCorporaText) and [LAION](https://huggingface.co/datasets/laion/laion2B-en).

The AltCLIP model can provide support for the AltDiffusion model in this project. Specific information on the AltDiffusion model can be found in [this tutorial](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltDiffusion/README.md).

The model code has been open-sourced on [FlagAI](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP) and the weights are hosted on our [modelhub](https://model.baai.ac.cn/model-detail/100075). We also provide scripts for fine-tuning, inference, and validation, so feel free to try them out.



## 训练/Training

训练共有两个阶段。
在平行知识蒸馏阶段,我们只是使用平行语料文本来进行蒸馏(平行语料相对于图文对更容易获取且数量更大)。在双语对比学习阶段,我们使用少量的中-英 图像-文本对(一共约2百万)来训练我们的文本编码器以更好地适应图像编码器。

There are two phases of training.
In the parallel knowledge distillation phase, we use only parallel corpus texts for distillation (parallel corpora are easier to obtain and much larger in volume than image-text pairs). In the bilingual contrastive learning phase, we use a small amount of Chinese-English image-text pairs (about 2 million in total) to train our text encoder to better fit the image encoder.
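
To make the first phase concrete, the sketch below illustrates the kind of parallel-text distillation described above. The encoder and projection names are illustrative assumptions, not the project's actual training code.

```python
import torch
import torch.nn.functional as F

def distillation_step(teacher_text_encoder, student_text_encoder,
                      projection, en_tokens, zh_tokens, optimizer):
    # The frozen teacher (e.g. the original CLIP text encoder) provides
    # the target embedding for the English side of each parallel pair.
    with torch.no_grad():
        target = teacher_text_encoder(en_tokens)

    # The multilingual student encodes both sides of the pair; a linear
    # projection maps its outputs into the teacher's embedding space.
    pred_en = projection(student_text_encoder(en_tokens))
    pred_zh = projection(student_text_encoder(zh_tokens))

    # MSE pulls both languages toward the same teacher embedding, so
    # parallel sentences end up close together in the shared space.
    loss = F.mse_loss(pred_en, target) + F.mse_loss(pred_zh, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```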



## 下游效果/Performance

<table>
<tr>
@@ -183,20 +200,20 @@
</tr>
</table>

![image-20221111172255521](https://raw.githubusercontent.com/920232796/test/master/image.png)




## 可视化效果/Visualization effects

基于AltCLIP,我们还开发了AltDiffusion模型,可视化效果如下。

Based on AltCLIP, we have also developed the AltDiffusion model, visualized as follows.

![](https://raw.githubusercontent.com/920232796/test/master/image7.png)

## 模型推理/Inference

```python
import torch
@@ -237,7 +254,7 @@ with torch.no_grad():
print(text_probs.cpu().numpy()[0].tolist())
```
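
The snippet above is collapsed by the diff view. A self-contained sketch of the full flow is given below; it assumes the loader accessors (`get_model`, `get_tokenizer`, `get_transform`) and the HF-CLIP-style feature methods used elsewhere in FlagAI examples, so treat the exact calls as assumptions rather than the verbatim script.

```python
import torch
from PIL import Image
from flagai.auto_model.auto_loader import AutoLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download (on first run) and load the model, tokenizer and image transform.
loader = AutoLoader(
    task_name="txt_img_matching",
    model_dir="./checkpoints/",
    model_name="AltCLIP-XLMR-L",
)
model = loader.get_model().to(device).eval()
tokenizer = loader.get_tokenizer()
transform = loader.get_transform()  # assumed torchvision-style transform returning a tensor

image = transform(Image.open("./dog.jpg")).unsqueeze(0).to(device)
# The tokenizer is assumed to follow the HF call convention.
tokens = tokenizer(["a dog", "一只狗", "a cat"], padding=True,
                   return_tensors="pt").to(device)

with torch.no_grad():
    image_features = model.get_image_features(image)
    text_features = model.get_text_features(**tokens)
    # Normalize, then softmax over image-text similarities.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    text_probs = (image_features @ text_features.T).softmax(dim=-1)

print(text_probs.cpu().numpy()[0].tolist())
```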

## CLIP微调/Finetuning

微调采用cifar10数据集,并使用FlagAI的Trainer快速开始训练过程。

Fine-tuning uses the cifar10 dataset, and FlagAI's Trainer is used to start the training process quickly.

@@ -264,7 +281,7 @@ classes = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

auto_loader = AutoLoader(
task_name="txt_img_matching",
model_dir="/sharefs/baai-mrnd/xingzhaohu/",
model_dir="./checkpoints/",
model_name="AltCLIP-XLMR-L" # Load the checkpoints from Modelhub(model.baai.ac.cn/models)
)

@@ -303,7 +320,7 @@ if __name__ == "__main__":



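The diff collapses most of the fine-tuning script; the outline below sketches how the pieces fit together with FlagAI's Trainer. The Trainer arguments and the collate step are illustrative assumptions; consult the full script in the repository for the exact settings.

```python
import torch
from torchvision.datasets import CIFAR10
from flagai.auto_model.auto_loader import AutoLoader
from flagai.trainer import Trainer

device = "cuda" if torch.cuda.is_available() else "cpu"

auto_loader = AutoLoader(
    task_name="txt_img_matching",
    model_dir="./checkpoints/",
    model_name="AltCLIP-XLMR-L",
)
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()
transform = auto_loader.get_transform()  # assumed accessor, as above

# CIFAR-10 images, later paired with prompts built from their class names.
dataset = CIFAR10(root="./data", train=True, download=True)

trainer = Trainer(
    env_type="pytorch",      # single-process PyTorch training
    pytorch_device=device,
    epochs=5,                # illustrative hyperparameters
    batch_size=4,
    lr=1e-4,
    log_interval=10,
)

# A collate_fn would transform each image and tokenize a prompt such as
# "a photo of a {classname}" before handing batches to the model:
# trainer.train(model, train_dataset=dataset, collate_fn=collate_fn)
```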
## 模型验证/Evaluation

我们提供了可以直接运行的验证脚本,在cifar10数据集上进行验证。

We provide a validation script that can be run directly to evaluate the model on the cifar10 dataset.
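
The validation script itself is collapsed in this diff. The sketch below shows the zero-shot evaluation pattern it follows: encode one prompt per class, then pick the best-matching class for each image. Names and the prompt template are illustrative assumptions.

```python
import torch

@torch.no_grad()
def zeroshot_accuracy(model, tokenizer, transform, dataset, classes, device):
    # One text embedding per class, built from a simple prompt template.
    tokens = tokenizer([f"a photo of a {c}" for c in classes],
                       padding=True, return_tensors="pt").to(device)
    text_features = model.get_text_features(**tokens)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    correct = total = 0
    for image, label in dataset:
        image_features = model.get_image_features(
            transform(image).unsqueeze(0).to(device))
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        # Predicted class = text embedding with highest cosine similarity.
        pred = (image_features @ text_features.T).argmax(dim=-1).item()
        correct += int(pred == label)
        total += 1
    return correct / total
```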

99 changes: 43 additions & 56 deletions examples/AltDiffusion/README.md
@@ -1,37 +1,36 @@

# 模型信息/Model Information



我们使用 [AltCLIP](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP/README.md) 作为text encoder,基于 [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion) 训练了双语Diffusion模型,训练数据来自 [WuDao数据集](https://data.baai.ac.cn/details/WuDaoCorporaText) 和 [LAION](https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6plus)。

我们的版本在中英文对齐方面表现非常出色,是目前市面上开源的最强版本,保留了原版stable diffusion的大部分能力,并且在某些例子上比有着比原版模型更出色的能力。

AltDiffusion 模型由名为 AltCLIP 的双语 CLIP 模型支持,该模型也可在本项目中访问。您可以阅读 [此教程](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP/README.md) 了解更多信息。



AltDiffusion模型现在支持线上演示,点击 [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/BAAI/FlagStudio) 在线试玩!

Our model performs very well on Chinese-English alignment and is the strongest open-source version available today; it retains most of the capabilities of the original Stable Diffusion and in some cases performs even better than the original model.

We used [AltCLIP](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP/README.md) as the text encoder and trained a bilingual diffusion model based on [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion), with training data from the [WuDao dataset](https://data.baai.ac.cn/details/WuDaoCorporaText) and [LAION](https://huggingface.co/datasets/laion/laion2B-en).

AltDiffusion model is backed by a bilingual CLIP model named AltCLIP, which is also accessible in FlagAI. You can read [this tutorial](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP/README.md) for more information.

AltDiffusion now supports an online demo. Try it out by clicking [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/BAAI/FlagStudio)!

# 模型权重/Model Weights

第一次运行AltDiffusion模型时会自动从 [这里](https://model.baai.ac.cn/model-detail/100076) 下载如下权重:

The following weights are automatically downloaded from [here](https://model.baai.ac.cn/model-detail/100076) when the AltDiffusion model is run for the first time:

| 模型名称 Model name | 大小 Size | 描述 Description |
|------------------------------|---------|-------------------------------------------------------|
| StableDiffusionSafetyChecker | 1.13G | 图片的安全检查器;Safety checker for image |
| AltDiffusion | 8.0G | 我们的双语AltDiffusion模型; Our bilingual AltDiffusion model |
| AltCLIP | 3.22G | 我们的双语AltCLIP模型;Our bilingual AltCLIP model |


# 示例/Example

以下示例将为文本输入`Anime portrait of natalie portman as an anime girl by stanley artgerm lau, wlop, rossdraws, james jean, andrei riabovitchev, marc simonetti, and sakimichan, trending on artstation` 在目录`./AltDiffusionOutputs`下生成图片结果。

The following example will generate images in the directory `./AltDiffusionOutputs` for the text prompt `Anime portrait of natalie portman as an anime girl by stanley artgerm lau, wlop, rossdraws, james jean, andrei riabovitchev, marc simonetti, and sakimichan, trending on artstation`.

@@ -58,59 +57,45 @@ predictor = Predictor(model)
predictor.predict_generate_images(prompt)
```
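
Only the last two lines survive the collapsed diff; a fuller sketch of the example is below. The loader arguments and the Predictor import path are assumptions based on other FlagAI examples, not the verbatim script.

```python
import torch
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor  # assumed import path

device = "cuda" if torch.cuda.is_available() else "cpu"

# Downloads AltDiffusion, AltCLIP and the safety checker on first run.
loader = AutoLoader(task_name="text2img",   # assumed task name
                    model_name="AltDiffusion",
                    model_dir="./checkpoints")
model = loader.get_model()
model.eval()
model.to(device)

prompt = ("Anime portrait of natalie portman as an anime girl by stanley "
          "artgerm lau, wlop, rossdraws, james jean, andrei riabovitchev, "
          "marc simonetti, and sakimichan, trending on artstation")

predictor = Predictor(model)
# Images are written to ./AltDiffusionOutputs by default.
predictor.predict_generate_images(prompt)
```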


您可以在`predict_generate_images`函数里通过改变参数来调整设置,具体信息如下:

More parameters of `predict_generate_images` that you can adjust are listed below:


| 参数名 Parameter | 类型 Type | 描述 Description |
|--------------------------------|------------|-------------------------------------------------------|
| prompt | str | 提示文本; The prompt text |
| out_path | str | 输出路径; The output path to save images |
| n_samples | int | 输出图片数量; Number of images to be generated |
| skip_grid | bool | 如果为True,会跳过将所有图片拼接为一张网格图的步骤; If set to true, the image gridding step will be skipped |
| ddim_step | int | DDIM模型的步数; Number of steps in the DDIM model |
| plms | bool | 如果为True,则会使用PLMS采样器; If set to true, the PLMS sampler will be applied instead of the DDIM sampler |
| scale | float | 这个值决定了文本在多大程度上影响生成的图片,值越大影响力越强; This value determines how strongly the prompt influences the generated images |
| H | int | 图片的高度; Height of image |
| W | int | 图片的宽度; Width of image |
| C | int | 图片的channel数; Number of channels of generated images |
| seed | int | 随机种子; Random seed number |
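
As a usage illustration, a call that overrides several of these defaults might look like the following (all values are illustrative):

```python
predictor.predict_generate_images(
    prompt="一只带着帽子的小狗",   # bilingual prompts are supported
    out_path="./AltDiffusionOutputs",
    n_samples=4,        # generate four candidate images
    skip_grid=False,    # keep the stitched grid of all samples
    ddim_step=50,       # number of DDIM sampling steps
    plms=True,          # use the PLMS sampler instead of DDIM
    scale=7.5,          # how strongly the prompt steers generation
    H=512, W=512,       # output resolution
    C=4,                # latent channel count
    seed=42,            # fix the seed for reproducible results
)
```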


注意:模型推理要求一张显存至少10G的GPU。

Note that model inference requires a GPU with at least 10 GB of memory.


# 更多生成结果/More Results

## 中英文对齐能力/Chinese and English alignment ability

### prompt:dark elf princess, highly detailed, d & d, fantasy, highly detailed, digital painting, trending on artstation, concept art, sharp focus, illustration, art by artgerm and greg rutkowski and fuji choko and viktoria gavrilenko and hoang lap
### 英文生成结果/Generated results from English prompts

![image](./imgs/en_暗黑精灵.png)

### prompt:黑暗精灵公主,非常详细,幻想,非常详细,数字绘画,概念艺术,敏锐的焦点,插图
### 中文生成结果/Generated results from Chinese prompts
![image](./imgs/cn_暗黑精灵.png)

## 中文表现能力/The performance for Chinese prompts

### prompt:带墨镜的男孩肖像,充满细节,8K高清
![image](./imgs/小男孩.png)
@@ -119,7 +104,7 @@
### prompt:带墨镜的中国男孩肖像,充满细节,8K高清
![image](./imgs/cn_小男孩.png)

## 长图生成能力/The ability to generate long images

### prompt: 一只带着帽子的小狗
### 原版 stable diffusion/Original Stable Diffusion:
@@ -130,8 +115,10 @@

注: 此处长图生成技术由右脑科技(RightBrain AI)提供。

Note: The long image generation technology here is provided by RightBrain AI.

# 许可/License

该模型通过 [CreativeML Open RAIL-M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) 获得许可。作者对您生成的输出不主张任何权利,您可以自由使用它们并对它们的使用负责,不得违反本许可中的规定。该许可证禁止您分享任何违反任何法律、对他人造成伤害、传播任何可能造成伤害的个人信息、传播错误信息和针对弱势群体的任何内容。您可以出于商业目的修改和使用模型,但必须包含相同使用限制的副本。有关限制的完整列表,请[阅读许可证](https://huggingface.co/spaces/CompVis/stable-diffusion-license)。

The model is licensed with a [CreativeML Open RAIL-M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license). The authors claim no rights on the outputs you generate; you are free to use them, and you are accountable for their use, which must not go against the provisions set in this license. The license forbids you from sharing any content that violates any laws, does harm to a person, disseminates any personal information meant for harm, spreads misinformation, or targets vulnerable groups. You can modify and use the model for commercial purposes, but a copy of the same use restrictions must be included. For the full list of restrictions please [read the license](https://huggingface.co/spaces/CompVis/stable-diffusion-license).