update wukong-huahua inpaint (mindspore-lab#54)

* update inpaint
* update download path
ly-study-py · Apr 18, 2023 · a3cb11a
1 parent 68ccf23
commit a3cb11a

Showing 9 changed files with 524 additions and 63 deletions.
diff --git a/vision/wukong-huahua/README.md b/vision/wukong-huahua/README.md
@@ -49,7 +49,7 @@ Wukong-Huahua是基于扩散模型的中文文生图大模型，由**华为诺
 
 Download the Wukong-Huahua pretrained weights [wukong-huahua-ms.ckpt](https://download.mindspore.cn/toolkits/minddiffusion/wukong-huahua/wukong-huahua-ms.ckpt) to the wukong-huahua/models/ directory.
 
-For fine-tuning, we provide sample data to illustrate the expected format. Click [here](https://opt-release.obs.cn-central-221.ovaijisuan.com:443/wukonghuahua/dataset.tar.gz) to download it.
+For fine-tuning, we provide sample data to illustrate the expected format. Click [here](https://download.mindspore.cn/toolkits/minddiffusion/wukong-huahua/dataset.tar.gz) to download it.
 
 #### Inference
 
@@ -62,7 +62,7 @@ python txt2img.py --prompt [input text] --ckpt_path [ckpt_path] --ckpt_name [ckp
 ```
 or
 ```shell
-bash scripts/infer.sh
+bash scripts/run_txt2img.sh
 ```
 
 Higher resolutions require more device memory. On an Ascend 910 chip, we can generate either two 1024x768 images or sixteen 512x512 images at a time.
@@ -85,8 +85,6 @@ bash scripts/run_train.sh
 bash scripts/run_train_parallel.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [RANK_TABLE_FILE]
 ```
 
-
-
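 ### Task 2: Personalized Text-to-Image Generation
 
 Given 3-5 photos of the same subject, 25-35 minutes of personalized fine-tuning yields an image generation model customized for that subject.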
@@ -95,8 +93,6 @@ bash scripts/run_train_parallel.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,
 
 ![Personalized training data: cat](demo/个性化训练数据-猫.jpg)
 
-
-
 Results: images of the subject generated in a variety of styles:
 
 ![Personalized generation results: cat](demo/个性化生成效果-猫.jpg)
@@ -105,7 +101,7 @@ bash scripts/run_train_parallel.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,
 
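 1. Download the Wukong-Huahua pretrained weights [wukong-huahua-ms.ckpt](https://download.mindspore.cn/toolkits/minddiffusion/wukong-huahua/wukong-huahua-ms.ckpt) to the wukong-huahua/models/ directory.
 2. Prepare the training data: 3-5 photos of the same subject (each 512*512, preferably with a single clean background and the subject prominent).
-3. Prepare 200 regularization images. For example, if the training subject is a dog, prepare 200 images of various other dogs; these can be generated with a general-purpose model or collected manually. We provide 200 regularization images for each of four classes: man, woman, dog, and cat. Click [here](https://opt-release.obs.cn-central-221.ovaijisuan.com:443/wkhh-db/dataset/reg_data.rar) to download them.
+3. Prepare 200 regularization images. For example, if the training subject is a dog, prepare 200 images of various other dogs; these can be generated with a general-purpose model or collected manually. We provide 200 regularization images for each of four classes: man, woman, dog, and cat. Click [here](https://download.mindspore.cn/toolkits/minddiffusion/wukong-huahua/reg_data.rar) to download them.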
 
 #### Personalized Fine-tuning
 
@@ -123,3 +119,20 @@ bash scripts/run_db_train.sh
 bash scripts/infer.sh
 ```
 
+### Task 3: Image Editing
+
+### Prepare the checkpoint
+
+Download the Wukong-Huahua pretrained weights [wukong-huahua-inpaint-ms.ckpt](https://download.mindspore.cn/toolkits/minddiffusion/wukong-huahua/wukong-huahua-inpaint-ms.ckpt) to the wukong-huahua/models/ directory.
+
+#### Inference
+
+To edit an image, run inpaint.py, or run run_inpaint.sh directly to use the default parameters.
+
+```shell
+python inpaint.py --prompt [prompt] --img [origin image path] --mask [mask image path]
+```
+or
+```shell
+bash scripts/run_inpaint.sh
+```
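The `inpaint.py` invocation added above can also be assembled from Python, which avoids shell quoting problems when the prompt contains spaces. The following is a minimal sketch, not part of the commit: `build_inpaint_command` is a hypothetical helper, the prompt text is made up, and pairing the two demo files added in this commit as image and mask is inferred from their filenames. Only the `--prompt`, `--img`, and `--mask` flags shown in the README are used.

```python
import shlex


def build_inpaint_command(prompt, image_path, mask_path):
    """Return the argument list for the inpaint.py call shown in the README.

    Hypothetical helper: it only arranges the three documented flags.
    """
    return [
        "python", "inpaint.py",
        "--prompt", prompt,
        "--img", image_path,
        "--mask", mask_path,
    ]


# Demo files added by this commit; treating them as image and mask is an assumption.
demo_dir = "demo/inpaint"
command = build_inpaint_command(
    prompt="a cat sitting on a bench",  # hypothetical prompt
    image_path=f"{demo_dir}/overture-creations-5sI6fQgYIuo.png",
    mask_path=f"{demo_dir}/overture-creations-5sI6fQgYIuo_mask.png",
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

To actually launch the script, the list could be passed to `subprocess.run(command, check=True)` from the `vision/wukong-huahua` directory, assuming the checkpoint has already been downloaded to `models/`.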
diff --git a/vision/wukong-huahua/configs/wukong-huahua_inpaint_inference.yaml b/vision/wukong-huahua/configs/wukong-huahua_inpaint_inference.yaml
@@ -0,0 +1,63 @@
+model:
+  target: ldm.models.diffusion.ddpm.LatentInpaintDiffusion
+  params:
+    linear_start: 0.00085
+    linear_end: 0.0120
+    num_timesteps_cond: 1
+    log_every_t: 200
+    timesteps: 1000
+    first_stage_key: "image"
+    cond_stage_key: "caption"
+    image_size: 64
+    channels: 4
+    cond_stage_trainable: false   # Note: different from the one we trained before
+    conditioning_key: hybrid   # important
+    monitor: val/loss_simple_ema
+    scale_factor: 0.18215
+    finetune_keys: null
+    use_ema: false
+
+    unet_config:
+      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
+      params:
+        image_size: 64 # unused
+        in_channels: 9  # 4 data + 4 downscaled image + 1 mask
+        out_channels: 4
+        model_channels: 320
+        attention_resolutions: [ 4, 2, 1 ]
+        num_res_blocks: 2
+        channel_mult: [ 1, 2, 4, 4 ]
+        num_heads: 8
+        use_spatial_transformer: true
+        transformer_depth: 1
+        context_dim: 768
+        use_checkpoint: true
+        legacy: false
+        use_fp16: True
+
+    first_stage_config:
+      target: ldm.models.autoencoder.AutoencoderKL
+      params:
+        embed_dim: 4
+        monitor: val/rec_loss
+        use_fp16: True
+        ddconfig:
+          double_z: true
+          z_channels: 4
+          resolution: 512
+          in_channels: 3
+          out_ch: 3
+          ch: 128
+          ch_mult:
+          - 1
+          - 2
+          - 4
+          - 4
+          num_res_blocks: 2
+          attn_resolutions: []
+          dropout: 0.0
+
+    cond_stage_config:
+      target: ldm.modules.encoders.modules.FrozenCLIPEmbedder_ZH
+      params:
+        use_fp16: True
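The numbers in this config are linked: the UNet's `in_channels: 9` follows from the comment "4 data + 4 downscaled image + 1 mask", and `image_size: 64` matches the autoencoder's `resolution: 512` once its downsampling is applied. The sketch below checks that arithmetic. It assumes the autoencoder halves the spatial size once between consecutive `ch_mult` levels, as in the original latent diffusion `AutoencoderKL`; whether this MindSpore port does exactly the same is not confirmed by the diff. Both function names are hypothetical.

```python
def latent_size(resolution, ch_mult):
    """Spatial size of the latent, assuming one 2x downsample between
    consecutive ch_mult levels (an assumption about the autoencoder)."""
    return resolution // 2 ** (len(ch_mult) - 1)


def inpaint_unet_in_channels(z_channels, mask_channels=1):
    """Channel count of the UNet input for inpainting: the noisy latent,
    the latent of the image to be edited, and the mask."""
    return z_channels + z_channels + mask_channels


# Values copied from the first_stage_config section of the YAML above.
vae = {"resolution": 512, "z_channels": 4, "ch_mult": [1, 2, 4, 4]}

size = latent_size(vae["resolution"], vae["ch_mult"])
channels = inpaint_unet_in_channels(vae["z_channels"])

print(f"UNet input shape per sample: ({channels}, {size}, {size})")
```

Under these assumptions the result agrees with the config: a 64x64 latent with 9 input channels, while `out_channels` stays at 4 because the UNet only predicts noise for the latent itself.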
diff --git a/vision/wukong-huahua/demo/inpaint/overture-creations-5sI6fQgYIuo.png b/vision/wukong-huahua/demo/inpaint/overture-creations-5sI6fQgYIuo.png
diff --git a/vision/wukong-huahua/demo/inpaint/overture-creations-5sI6fQgYIuo_mask.png b/vision/wukong-huahua/demo/inpaint/overture-creations-5sI6fQgYIuo_mask.png