V2v (TMElyralab#3)
* add video2video
add musev_referencenet_pose

* update datas
itechmusic authored Mar 27, 2024
1 parent 48eb08a commit f58a159
Showing 10 changed files with 59 additions and 26 deletions.
14 changes: 11 additions & 3 deletions README.md
@@ -326,12 +326,13 @@ cd MuseV
```bash
git clone https://huggingface.co/TMElyralab/MuseV ./checkpoints
```
- `motion`: text2video model trained on tiny `ucf101` and tiny `webvid` dataset, approximately 60K videos text pairs.
- `musev/unet`: only has and train `unet` motion module.
- `motion`: text2video model, trained on tiny `ucf101` and tiny `webvid` dataset, approximately 60K videos text pairs.
- `musev/unet`: only has and train `unet` motion module, need less gpu memory.
- `musev_referencenet`: train `unet` module, `referencenet`, `IPAdapter`
- `unet`: `motion` module, which has `to_k`, `to_v` in `Attention` layer refer to `IPAdapter`
- `referencenet`: similar to `AnimateAnyone`
- `ip_adapter_image_proj.bin`: images clip emb project layer, refer to `IPAdapter`
- `musev_referencenet_pose`: based on `musev_referencenet`, fix `referencenet` and `controlnet_pose`, train `unet motion` and `IPAdapter`
- `t2i/sd1.5`: text2image model; its parameters are frozen when training the motion module.
- majicmixRealv6Fp16: example, could be replaced with other t2i base. download from [majicmixRealv6Fp16](https://civitai.com/models/43331/majicmix-realistic)
- `IP-Adapter/models`: download from [IPAdapter](https://huggingface.co/h94/IP-Adapter/tree/main)
@@ -388,12 +389,19 @@ python scripts/inference/video2video.py --sd_model_name majicmixRealv6Fp16 --un

Most parameters are the same as for `musev_text2video`. The parameters specific to `video2video` are:
1. `video_path` must be set in `test_data`. `rgb video` and `controlnet_middle_video` are currently supported.
- `need_video2video`: whether `rgb` video influence initial noise.
- `which2video`: whether `rgb` video influence initial noise, more than controlnet condition. If True, redraw video.
- `controlnet_name`: whether to use a `controlnet condition`, such as `dwpose,depth`.
- `video_is_middle`: whether `video_path` is an `rgb video` or a `controlnet_middle_video`. Can be set for every `test_data` in `test_data_path`.
- `video_has_condition`: whether `condition_images` is aligned with the first frame of `video_path`. If not, first generate `condition_images`, then align by concatenation. Set in `test_data`.

### musev_referencenet_pose
only used for `pose2video`
```bash
python scripts/inference/video2video.py --sd_model_name majicmixRealv6Fp16 --unet_model_name musev_referencenet --referencenet_model_name musev_referencenet --ip_adapter_model_name musev_referencenet -test_data_path ./configs/tasks/example.yaml --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder --output_dir ./output --n_batch 1 --controlnet_name dwpose_body_hand --which2video "video_middle" --target_datas wavehand --fps 12 --time_size 12
```
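The long `pose2video` command is easier to audit with one flag per line. The sketch below rebuilds the same argument list in Python and prints it as a shell command. Every flag and value is copied from the command in this section; the `build_pose2video_command` helper and its `flags` dict are hypothetical names introduced only for illustration, not part of MuseV.

```python
import shlex


def build_pose2video_command(output_dir="./output"):
    """Return the argv list for the pose2video example (hypothetical helper)."""
    # Flags and values are copied verbatim from the README command,
    # including the single-dash spelling of -test_data_path.
    flags = {
        "--sd_model_name": "majicmixRealv6Fp16",
        "--unet_model_name": "musev_referencenet",
        "--referencenet_model_name": "musev_referencenet",
        "--ip_adapter_model_name": "musev_referencenet",
        "-test_data_path": "./configs/tasks/example.yaml",
        "--vision_clip_extractor_class_name": "ImageClipVisionFeatureExtractor",
        "--vision_clip_model_path": "./checkpoints/IP-Adapter/models/image_encoder",
        "--output_dir": output_dir,
        "--n_batch": "1",
        "--controlnet_name": "dwpose_body_hand",
        "--which2video": "video_middle",
        "--target_datas": "wavehand",
        "--fps": "12",
        "--time_size": "12",
    }
    argv = ["python", "scripts/inference/video2video.py"]
    for name, value in flags.items():
        argv += [name, value]
    return argv


if __name__ == "__main__":
    # Print a copy-pasteable shell command.
    print(shlex.join(build_pose2video_command()))
```

The returned list can also be passed directly to `subprocess.run(argv, check=True)`, which avoids shell quoting issues.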

### musev
only has motion module, no referencenet, need less gpu memory.
#### text2video
```bash
python scripts/inference/text2video.py --sd_model_name majicmixRealv6Fp16 --unet_model_name musev -test_data_path ./configs/tasks/example.yaml --output_dir ./output --n_batch 1 --target_datas yongen --time_size 12 --fps 12
10 changes: 10 additions & 0 deletions configs/model/ip_adapter.py
@@ -53,4 +53,14 @@
"clip_embeddings_dim": 1024,
"desp": "",
},
"musev_referencenet_pose": {
"ip_image_encoder": os.path.join(IPAdapterModelDir, "image_encoder"),
"ip_ckpt": os.path.join(
MotionDir, "musev_referencenet_pose/ip_adapter_image_proj.bin"
),
"ip_scale": 1.0,
"clip_extra_context_tokens": 4,
"clip_embeddings_dim": 1024,
"desp": "",
},
}
8 changes: 6 additions & 2 deletions configs/model/motion_model.py
@@ -9,10 +9,14 @@
MODEL_CFG = {
"musev": {
"unet": os.path.join(MotionDIr, "musev"),
"desp": "",
"desp": "only train unet motion module, fix t2i",
},
"musev_referencenet": {
"unet": os.path.join(MotionDIr, "musev_referencenet"),
"desp": "",
"desp": "train referencenet, IPAdapter and unet motion module, fix t2i",
},
"musev_referencenet_pose": {
"unet": os.path.join(MotionDIr, "musev_referencenet_pose"),
"desp": "train unet motion module and IPAdapter, fix t2i and referencenet",
},
}
48 changes: 29 additions & 19 deletions configs/tasks/example.yaml
@@ -1,5 +1,5 @@
# text/image2video
- condition_images: /cfs-datasets/projects/ProjectV/data/lol/yongen.jpeg
- condition_images: ./data/images/yongen.jpeg
eye_blinks_factor: 1.8
height: 1308
img_length_ratio: 0.957
@@ -10,7 +10,7 @@
refer_image: ${.condition_images}
video_path: null
width: 736
- condition_images: /cfs-datasets/projects/ProjectV/data/lol/jinkesi2.jpeg
- condition_images: ./data/images/jinkesi2.jpeg
eye_blinks_factor: 1.8
height: 714
img_length_ratio: 1.25
@@ -21,7 +21,7 @@
refer_image: ${.condition_images}
video_path: null
width: 563
- condition_images: /cfs-datasets/projects/ProjectV/data/scenes/seaside4.jpeg
- condition_images: ./data/images/seaside4.jpeg
eye_blinks_factor: 1.8
height: 317
img_length_ratio: 2.221
@@ -31,7 +31,7 @@
refer_image: ${.condition_images}
video_path: null
width: 564
- condition_images: /cfs-datasets/projects/ProjectV/data/portraits/real_girl_seaside2.jpeg
- condition_images: ./data/images/real_girl_seaside2.jpeg
eye_blinks_factor: 1.8
height: 1029
img_length_ratio: 0.958
@@ -41,7 +41,7 @@
refer_image: ${.condition_images}
video_path: null
width: 735
- condition_images: /cfs-datasets/projects/ProjectV/data/portraits/seaside_girl.jpeg
- condition_images: ./data/images/seaside_girl.jpeg
eye_blinks_factor: 1.8
height: 736
img_length_ratio: 0.957
@@ -51,7 +51,7 @@
refer_image: ${.condition_images}
video_path: null
width: 736
- condition_images: /cfs-datasets/projects/ProjectV/data/portraits/boy_play_guitar.jpeg
- condition_images: ./data/images/boy_play_guitar.jpeg
eye_blinks_factor: 1.8
height: 846
img_length_ratio: 1.248
@@ -61,7 +61,7 @@
refer_image: ${.condition_images}
video_path: null
width: 564
- condition_images: /cfs-datasets/projects/ProjectV/data/portraits/girl_play_guitar2.jpeg
- condition_images: ./data/images/girl_play_guitar2.jpeg
eye_blinks_factor: 1.8
height: 1002
img_length_ratio: 1.248
@@ -71,7 +71,7 @@
refer_image: ${.condition_images}
video_path: null
width: 564
- condition_images: /cfs-datasets/projects/ProjectV/data/portraits/boy_play_guitar2.jpeg
- condition_images: ./data/images/boy_play_guitar2.jpeg
eye_blinks_factor: 1.8
height: 630
img_length_ratio: 1.676
@@ -81,7 +81,7 @@
refer_image: ${.condition_images}
video_path: null
width: 420
- condition_images: /cfs-datasets/projects/ProjectV/data/portraits/girl_play_guitar4.jpeg
- condition_images: ./data/images/girl_play_guitar4.jpeg
eye_blinks_factor: 1.8
height: 846
img_length_ratio: 1.248
@@ -91,7 +91,7 @@
refer_image: ${.condition_images}
video_path: null
width: 564
- condition_images: /cfs-datasets/projects/ProjectV/data/famous_images/dufu.jpeg
- condition_images: ./data/images/dufu.jpeg
eye_blinks_factor: 1.8
height: 500
img_length_ratio: 1.495
@@ -102,18 +102,18 @@
refer_image: ${.condition_images}
video_path: null
width: 471
- condition_images: /cfs-datasets/projects/ProjectV/data/famous_images/Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg
- condition_images: ./data/images/Mona_Lisa..jpg
eye_blinks_factor: 1.8
height: 894
img_length_ratio: 1.173
ipadapter_image: ${.condition_images}
name: Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched
name: Mona_Lisa.
prompt: (masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face,
soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3)
refer_image: ${.condition_images}
video_path: null
width: 600
- condition_images: /cfs-datasets/projects/ProjectV/data/famous_images/Portrait-of-Dr.-Gachet.jpg
- condition_images: ./data/images/Portrait-of-Dr.-Gachet.jpg
eye_blinks_factor: 1.8
height: 985
img_length_ratio: 0.88
@@ -124,7 +124,7 @@
refer_image: ${.condition_images}
video_path: null
width: 800
- condition_images: /cfs-datasets/projects/ProjectV/data/famous_images/Self-Portrait-with-Cropped-Hair.jpg
- condition_images: ./data/images/Self-Portrait-with-Cropped-Hair.jpg
eye_blinks_factor: 1.8
height: 565
img_length_ratio: 1.246
@@ -135,7 +135,7 @@
refer_image: ${.condition_images}
video_path: null
width: 848
- condition_images: /cfs-datasets/projects/ProjectV/data/famous_images/The-Laughing-Cavalier.jpg
- condition_images: ./data/images/The-Laughing-Cavalier.jpg
eye_blinks_factor: 1.8
height: 1462
img_length_ratio: 0.587
@@ -148,7 +148,7 @@
width: 1200

# scene
- condition_images: /cfs-datasets/projects/ProjectV/data/scenes/waterfall4.jpeg
- condition_images: ./data/images/waterfall4.jpeg
eye_blinks_factor: 1.8
height: 846
img_length_ratio: 1.248
@@ -159,7 +159,7 @@
refer_image: ${.condition_images}
video_path: null
width: 564
- condition_images: /cfs-datasets/projects/ProjectV/data/scenes/river.jpeg
- condition_images: ./data/images/river.jpeg
eye_blinks_factor: 1.8
height: 736
img_length_ratio: 0.957
@@ -169,7 +169,7 @@
refer_image: ${.condition_images}
video_path: null
width: 736
- condition_images: /cfs-datasets/projects/ProjectV/data/scenes/seaside2.jpeg
- condition_images: ./data/images/seaside2.jpeg
eye_blinks_factor: 1.8
height: 1313
img_length_ratio: 0.957
@@ -183,10 +183,20 @@
# video2video
- name: "bilibili_queencard"
prompt: "(best quality), ((masterpiece)), (highres), illustration, original, extremely detailed wallpaper"
video_path: /cfs-datasets/projects/ProjectV/data/bilibili_queencard_s2_8s.mp4
video_path: ./data/bilibili_queencard_s2_8s.mp4
condition_images: null
refer_image: ${.condition_images}
ipadapter_image: ${.condition_images}
height: 1280
width: 720
img_length_ratio: 1.0

- name: "wavehand"
prompt: "(best quality), ((masterpiece)), (highres), illustration, original, extremely detailed wallpaper"
video_path: ./data/source_video/bilibili_queencard_s2_8s.mp4
condition_images: null
refer_image: ${.condition_images}
ipadapter_image: ${.condition_images}
height: 1280
width: 720
img_length_ratio: 1.0
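The `${.condition_images}` values in these entries look like OmegaConf-style relative interpolations, where the leading dot refers to a sibling key of the same entry. Assuming the file is loaded that way (the loader is not shown in this commit), the sketch below mimics the behaviour with the standard library only, to show what the new `wavehand` entry resolves to. `resolve_sibling_refs` is a hypothetical helper that handles only whole-value references to sibling keys.

```python
import re

# Matches a value that is exactly "${.some_key}" (a sibling reference).
_SIBLING_REF = re.compile(r"^\$\{\.(\w+)\}$")


def resolve_sibling_refs(entry):
    """Return a copy of entry with "${.key}" values replaced by entry[key]."""
    resolved = {}
    for key, value in entry.items():
        match = _SIBLING_REF.match(value) if isinstance(value, str) else None
        resolved[key] = entry[match.group(1)] if match else value
    return resolved


# The new "wavehand" entry from example.yaml, written as a Python dict.
wavehand = {
    "name": "wavehand",
    "video_path": "./data/source_video/bilibili_queencard_s2_8s.mp4",
    "condition_images": None,  # YAML null
    "refer_image": "${.condition_images}",
    "ipadapter_image": "${.condition_images}",
    "height": 1280,
    "width": 720,
    "img_length_ratio": 1.0,
}

resolved = resolve_sibling_refs(wavehand)
print(resolved["refer_image"], resolved["ipadapter_image"])
```

Under this reading, both `refer_image` and `ipadapter_image` resolve to null for the video2video entries, because `condition_images` is null there, while the image2video entries reuse their `condition_images` path.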
Binary file removed data/source_video/bilibili_queencard_s2_8s.mp4
Binary file not shown.
Binary file removed data/source_video/dance_boy.mp4
Binary file not shown.
Binary file removed data/source_video/yuhewei_dance_8s.mp4
Binary file not shown.
1 change: 1 addition & 0 deletions musev/models/ip_adapter_loader.py
@@ -84,6 +84,7 @@ def load_ip_adapter_image_proj_by_name(
if model_name in [
"IPAdapter",
"musev_referencenet",
"musev_referencenet_pose",
]:
ip_adapter_image_proj = ImageProjModel(
cross_attention_dim=cross_attention_dim,
3 changes: 2 additions & 1 deletion musev/models/unet_loader.py
@@ -242,6 +242,7 @@ def load_unet_by_name(
)
elif model_name in [
"musev_referencenet",
"musev_referencenet_pose",
]:
unet = load_unet(
sd_unet_model=sd_unet_model,
@@ -267,6 +268,6 @@ def load_unet_by_name(
)
else:
raise ValueError(
f"unsupport model_name={model_name}, only support musev, musev_referencenet"
f"unsupport model_name={model_name}, only support musev, musev_referencenet, musev_referencenet_pose"
)
return unet
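In this change the new model name is added both to the `elif` list and to the hand-written error message. One way to keep the two in step, offered only as a sketch and not as the project's code, is to derive the message from a single table of supported names. The loader functions below are hypothetical stand-ins that return labels; the real `load_unet` arguments are not reproduced here.

```python
def _load_motion_only(model_name):
    # Hypothetical stand-in for the branch that handles "musev".
    return f"motion-only unet for {model_name}"


def _load_with_referencenet(model_name):
    # Hypothetical stand-in for the branch shared by the referencenet variants.
    return f"referencenet-style unet for {model_name}"


# Single source of truth: model name -> loader.
UNET_LOADERS = {
    "musev": _load_motion_only,
    "musev_referencenet": _load_with_referencenet,
    "musev_referencenet_pose": _load_with_referencenet,
}


def load_unet_by_name_sketch(model_name):
    """Dispatch on model_name; the error text is built from the table."""
    try:
        loader = UNET_LOADERS[model_name]
    except KeyError:
        supported = ", ".join(UNET_LOADERS)
        raise ValueError(
            f"unsupported model_name={model_name}, only support {supported}"
        ) from None
    return loader(model_name)
```

With this layout, registering another variant is a one-line change to `UNET_LOADERS`, and the error message updates automatically.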
1 change: 0 additions & 1 deletion requirements.txt
@@ -88,7 +88,6 @@ IProgress==0.4
markupsafe==2.0.1
xlsxwriter
cuid
git+https://github.com/tencent-ailab/IP-Adapter.git
git+https://github.com/tencent-ailab/IP-Adapter.git@main
git+https://github.com/openai/CLIP.git
git+https://github.com/TMElyralab/MMCM.git