Commit

update wukong-huahua inpaint (mindspore-lab#54)
* update inpaint

* update download path
LoomisChen authored Apr 18, 2023
1 parent 68ccf23 commit a3cb11a
Showing 9 changed files with 524 additions and 63 deletions.
27 changes: 20 additions & 7 deletions vision/wukong-huahua/README.md
@@ -49,7 +49,7 @@ Wukong-Huahua是基于扩散模型的中文文生图大模型,由**华为诺

Download the Wukong-Huahua pretrained weights [wukong-huahua-ms.ckpt](https://download.mindspore.cn/toolkits/minddiffusion/wukong-huahua/wukong-huahua-ms.ckpt) into the wukong-huahua/models/ directory.

-For fine-tuning, we provide sample data that illustrates the expected format. Download it [here](https://opt-release.obs.cn-central-221.ovaijisuan.com:443/wukonghuahua/dataset.tar.gz).
+For fine-tuning, we provide sample data that illustrates the expected format. Download it [here](https://download.mindspore.cn/toolkits/minddiffusion/wukong-huahua/dataset.tar.gz).

#### Inference

@@ -62,7 +62,7 @@ python txt2img.py --prompt [input text] --ckpt_path [ckpt_path] --ckpt_name [ckp
```
or
```shell
-bash scripts/infer.sh
+bash scripts/run_txt2img.sh
```

Higher resolutions require more device memory. On an Ascend 910 chip, we can generate two 1024x768 images or sixteen 512x512 images at once.
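@@ -85,8 +85,6 @@ bash scripts/run_train.sh
bash scripts/run_train_parallel.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [RANK_TABLE_FILE]
```

-
-
### Task 2: Personalized text-to-image generation

Given 3-5 photos of the same subject, 25-35 minutes of personalized fine-tuning yields an image generation model customized to that subject.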
@@ -95,8 +93,6 @@ bash scripts/run_train_parallel.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,

![Personalization training data: cat](demo/个性化训练数据-猫.jpg)

-
-
Results: images of the subject generated in a variety of styles:

![Personalization results: cat](demo/个性化生成效果-猫.jpg)
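@@ -105,7 +101,7 @@ bash scripts/run_train_parallel.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,

1. Download the Wukong-Huahua pretrained weights [wukong-huahua-ms.ckpt](https://download.mindspore.cn/toolkits/minddiffusion/wukong-huahua/wukong-huahua-ms.ckpt) into the wukong-huahua/models/ directory.
2. Training data: 3-5 photos of the same subject (training photos should be 512*512; prefer a single, clean background with the subject clearly prominent).
-3. Prepare 200 regularization images. If the training subject is a dog, prepare 200 images of various other dogs; these can be generated with a general-purpose model or collected manually. We provide 200 regularization images for each of four classes: man, woman, dog, and cat. Download them [here](https://opt-release.obs.cn-central-221.ovaijisuan.com:443/wkhh-db/dataset/reg_data.rar).
+3. Prepare 200 regularization images. If the training subject is a dog, prepare 200 images of various other dogs; these can be generated with a general-purpose model or collected manually. We provide 200 regularization images for each of four classes: man, woman, dog, and cat. Download them [here](https://download.mindspore.cn/toolkits/minddiffusion/wukong-huahua/reg_data.rar).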
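
The data requirements in the list above (3-5 subject photos, 200 regularization images) can be checked before a fine-tuning run. The following is a minimal sketch, not part of the repository: the function names, the flat folder layout, and the accepted file suffixes are all assumptions made for illustration. It only counts files and does not verify the 512*512 image size.

```python
from pathlib import Path

# Assumption: images live as ordinary files in two flat folders.
# The suffix set is illustrative, not taken from the repository.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(folder):
    """Count image files directly inside `folder` (no recursion)."""
    return sum(
        1
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def check_personalization_data(subject_dir, reg_dir, reg_expected=200):
    """Return a list of problems; an empty list means the counts look fine."""
    problems = []
    n_subject = count_images(subject_dir)
    if not 3 <= n_subject <= 5:
        problems.append(f"expected 3-5 subject photos, found {n_subject}")
    n_reg = count_images(reg_dir)
    if n_reg < reg_expected:
        problems.append(
            f"expected {reg_expected} regularization images, found {n_reg}"
        )
    return problems
```

Calling `check_personalization_data("data/my_cat", "data/reg_cat")` with hypothetical folder names returns an empty list when both counts match the README's guidance.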

#### Personalized fine-tuning

Expand All @@ -123,3 +119,20 @@ bash scripts/run_db_train.sh
bash scripts/infer.sh
```

### 任务三:图像编辑任务

### 准备checkpoint

下载Wukong-Huahua预训练参数 [wukong-huahua-inpaint-ms.ckpt](https://download.mindspore.cn/toolkits/minddiffusion/wukong-huahua/wukong-huahua-inpaint-ms.ckpt) 至 wukong-huahua/models/ 目录

#### 推理生成

要进行图像编辑,可以运行 inpaint.py 或者直接使用默认参数运行 run_inpaint.sh.

```shell
python inpaint.py --prompt [prompt] --img [origin image path] --mask [mask image path]
```
或者
```shell
bash scripts/run_inpaint.sh
```
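
When the inpainting script is driven from another program rather than typed by hand, the command line shown above can be assembled as an argument list. This is a rough sketch: only the `--prompt`, `--img`, and `--mask` flags come from the README, while the helper name, the prompt, and the file paths are hypothetical.

```python
import shlex


def build_inpaint_command(prompt, image_path, mask_path):
    """Build the argument list for inpaint.py from the three README flags."""
    return [
        "python", "inpaint.py",
        "--prompt", prompt,
        "--img", str(image_path),
        "--mask", str(mask_path),
    ]


# Hypothetical inputs, for illustration only.
command = build_inpaint_command(
    "a cat wearing a hat", "inputs/cat.png", "inputs/cat_mask.png"
)

# shlex.join quotes the prompt so the printed line is safe to paste in a shell.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` from the wukong-huahua directory avoids shell quoting problems with prompts that contain spaces.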
63 changes: 63 additions & 0 deletions vision/wukong-huahua/configs/wukong-huahua_inpaint_inference.yaml
@@ -0,0 +1,63 @@
+model:
+  target: ldm.models.diffusion.ddpm.LatentInpaintDiffusion
+  params:
+    linear_start: 0.00085
+    linear_end: 0.0120
+    num_timesteps_cond: 1
+    log_every_t: 200
+    timesteps: 1000
+    first_stage_key: "image"
+    cond_stage_key: "caption"
+    image_size: 64
+    channels: 4
+    cond_stage_trainable: false # Note: different from the one we trained before
+    conditioning_key: hybrid # important
+    monitor: val/loss_simple_ema
+    scale_factor: 0.18215
+    finetune_keys: null
+    use_ema: false
+
+    unet_config:
+      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
+      params:
+        image_size: 64 # unused
+        in_channels: 9 # 4 data + 4 downscaled image + 1 mask
+        out_channels: 4
+        model_channels: 320
+        attention_resolutions: [ 4, 2, 1 ]
+        num_res_blocks: 2
+        channel_mult: [ 1, 2, 4, 4 ]
+        num_heads: 8
+        use_spatial_transformer: true
+        transformer_depth: 1
+        context_dim: 768
+        use_checkpoint: true
+        legacy: false
+        use_fp16: True
+
+    first_stage_config:
+      target: ldm.models.autoencoder.AutoencoderKL
+      params:
+        embed_dim: 4
+        monitor: val/rec_loss
+        use_fp16: True
+        ddconfig:
+          double_z: true
+          z_channels: 4
+          resolution: 512
+          in_channels: 3
+          out_ch: 3
+          ch: 128
+          ch_mult:
+          - 1
+          - 2
+          - 4
+          - 4
+          num_res_blocks: 2
+          attn_resolutions: []
+          dropout: 0.0
+
+    cond_stage_config:
+      target: ldm.modules.encoders.modules.FrozenCLIPEmbedder_ZH
+      params:
+        use_fp16: True
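
Two numbers in this config follow from the others, and a short calculation makes the relationship explicit. The sketch below assumes the autoencoder halves the spatial size between consecutive `ch_mult` levels, as in the standard latent-diffusion `AutoencoderKL`; that behaviour is not confirmed for this port, and the helper names are hypothetical. The channel breakdown is taken from the comment on `in_channels`.

```python
def latent_resolution(resolution, ch_mult):
    """Spatial size of the autoencoder latent.

    Assumption: one 2x downsampling between consecutive ch_mult levels,
    so there are len(ch_mult) - 1 halvings in total.
    """
    return resolution // 2 ** (len(ch_mult) - 1)


def inpaint_unet_in_channels(z_channels, mask_channels=1):
    """Channels fed to the UNet: noisy latent + encoded image + mask."""
    return z_channels + z_channels + mask_channels


# Values copied from the config above.
size = latent_resolution(resolution=512, ch_mult=[1, 2, 4, 4])
channels = inpaint_unet_in_channels(z_channels=4)
print(size, channels)  # 64 9
```

Under that assumption the results agree with `image_size: 64` and `in_channels: 9`, so changing `resolution`, `ch_mult`, or `z_channels` would require updating those two fields as well.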