forked from fudan-generative-vision/hallo

1. Add film showcases. 2. Add community resources. 3. Update roadmap. 4. Optimize format.

Showing 1 changed file with 90 additions and 28 deletions.

<a href='https://fudan-generative-vision.github.io/hallo/#/'><img src='https://img.shields.io/badge/Project-HomePage-Green'></a>
<a href='https://arxiv.org/pdf/2406.08801'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
<a href='https://huggingface.co/fudan-generative-ai/hallo'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a>
<a href='https://www.modelscope.cn/models/fudan-generative-vision/Hallo/summary'><img src='https://img.shields.io/badge/Modelscope-Model-purple'></a>
<a href='assets/wechat.jpeg'><img src='https://badges.aleen42.com/src/wechat.svg'></a>
</div>

<br>

## 📸 Showcase

https://github.com/fudan-generative-vision/hallo/assets/17402682/294e78ef-c60d-4c32-8e3c-7f8d6934c6bd

### 🎬 Honoring Classic Films

<table class="center">
<tr>
<td style="text-align: center"><b>Devil Wears Prada</b></td>
<td style="text-align: center"><b>Green Book</b></td>
<td style="text-align: center"><b>Infernal Affairs</b></td>
</tr>
<tr>
<td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Devil_Wears_Prada-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Devil_Wears_Prada_GIF.gif"></a></td>
<td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Green_Book-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Green_Book_GIF.gif"></a></td>
<td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/无间道-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Infernal_Affairs_GIF.gif"></a></td>
</tr>
<tr>
<td style="text-align: center"><b>Patch Adams</b></td>
<td style="text-align: center"><b>Tough Love</b></td>
<td style="text-align: center"><b>Shawshank Redemption</b></td>
</tr>
<tr>
<td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Patch_Adams-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Patch_Adams_GIF.gif"></a></td>
<td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Tough_Love-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Tough_Love_GIF.gif"></a></td>
<td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Shawshank-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Shawshank_GIF.gif"></a></td>
</tr>
</table>

Explore [more examples](https://fudan-generative-vision.github.io/hallo).

## 📰 News

- **`2024/06/15`**: ✨✨✨ Released some images and audios for inference testing on [🤗Huggingface](https://huggingface.co/datasets/fudan-generative-ai/hallo_inference_samples).
- **`2024/06/15`**: 🎉🎉🎉 Launched the first version on 🫡[GitHub](https://github.com/fudan-generative-vision/hallo).

## 🤝 Community Resources

Explore the resources developed by our community to enhance your experience with Hallo:

- [Demo on Huggingface](https://huggingface.co/spaces/multimodalart/hallo) - Check out this easy-to-use Gradio demo by [@multimodalart](https://huggingface.co/multimodalart).
- [hallo-webui](https://github.com/daswer123/hallo-webui) - Explore the WebUI created by [@daswer123](https://github.com/daswer123).
- [hallo-for-windows](https://github.com/sdbds/hallo-for-windows) - Utilize Hallo on Windows with the guide by [@sdbds](https://github.com/sdbds).
- [ComfyUI-Hallo](https://github.com/AIFSH/ComfyUI-Hallo) - Integrate Hallo with the ComfyUI tool by [@AIFSH](https://github.com/AIFSH).

Thanks to all of them.

Join our community and explore these amazing resources to make the most of Hallo. Enjoy and elevate your creative projects!

## 🔧️ Framework

![abstract](assets/framework_1.jpg)
![framework](assets/framework_2.jpg)

## ⚙️ Installation

- System requirement: Ubuntu 20.04/Ubuntu 22.04, CUDA 12.1
- Tested GPUs: A100
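
A typical environment setup looks roughly like the sketch below (an illustration only, assuming a conda environment, Python 3.10, and a `requirements.txt` at the repository root — see the full upstream README for the exact steps):

```bash
# Clone the repository and enter it
git clone https://github.com/fudan-generative-vision/hallo
cd hallo

# Create and activate an isolated Python environment (Python version is an assumption)
conda create -n hallo python=3.10
conda activate hallo

# Install the project's Python dependencies
pip install -r requirements.txt
```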

Besides, ffmpeg is also needed:

```bash
apt-get install ffmpeg
```

## 🗝️️ Usage

The inference entrypoint script is `scripts/inference.py`. Before testing your own cases, complete the following preparations:

1. [Download all required pretrained models](#download-pretrained-models).
2. [Prepare source image and driving audio pairs](#prepare-inference-data).
3. [Run inference](#run-inference).

### 📥 Download Pretrained Models

You can easily get all pretrained models required by inference from our [HuggingFace repo](https://huggingface.co/fudan-generative-ai/hallo).
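
The simplest route is to clone the whole model repo into `./pretrained_models`; the `git lfs install` step is an assumption and only matters if the weights are tracked with Git LFS:

```bash
# Make sure large weight files are fetched (assumption: weights use Git LFS)
git lfs install

# Clone every required pretrained model into ./pretrained_models
git clone https://huggingface.co/fudan-generative-ai/hallo pretrained_models
```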

Or you can download them separately from their source repos:

- [hallo](https://huggingface.co/fudan-generative-ai/hallo/tree/main/hallo): Our checkpoints, consisting of the denoising UNet, face locator, and image & audio projection.
- [audio_separator](https://huggingface.co/huangjackson/Kim_Vocal_2): Kim\_Vocal\_2 MDX-Net vocal removal model. (_Thanks to [KimberleyJensen](https://github.com/KimberleyJensen)_)
- [insightface](https://github.com/deepinsight/insightface/tree/master/python-package#model-zoo): 2D and 3D face analysis models placed into `pretrained_models/face_analysis/models/`. (_Thanks to deepinsight_)
- [face landmarker](https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task): Face detection & mesh model from [mediapipe](https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker#models) placed into `pretrained_models/face_analysis/models`.
- [motion module](https://github.com/guoyww/AnimateDiff/blob/main/README.md#202309-animatediff-v2): Motion module from [AnimateDiff](https://github.com/guoyww/AnimateDiff). (_Thanks to [guoyww](https://github.com/guoyww)_)
- [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse): Weights intended to be used with the diffusers library. (_Thanks to [stabilityai](https://huggingface.co/stabilityai)_)
- [StableDiffusion V1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5): Initialized and fine-tuned from Stable-Diffusion-v1-2. (_Thanks to [runwayml](https://huggingface.co/runwayml)_)
- [wav2vec](https://huggingface.co/facebook/wav2vec2-base-960h): wav audio to vector model from [Facebook](https://huggingface.co/facebook/wav2vec2-base-960h).

Finally, these pretrained models should be organized as follows:
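
As a rough guide, the top-level layout mirrors the model sources listed above (a sketch only — exact subdirectories and file names may differ, so check the upstream README for the full tree):

```
pretrained_models/
|-- audio_separator/          # Kim_Vocal_2 MDX-Net vocal removal weights
|-- face_analysis/
|   `-- models/               # insightface models + mediapipe face landmarker
|-- hallo/                    # denoising UNet, face locator, image & audio projection
|-- motion_module/            # AnimateDiff motion module
|-- sd-vae-ft-mse/
|-- stable-diffusion-v1-5/
`-- wav2vec/                  # facebook/wav2vec2-base-960h (includes vocab.json)
```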

### 🛠️ Prepare Inference Data

Hallo has a few simple requirements for input data:

For the driving audio:

2. It must be in English, since our training datasets are only in this language.
3. Ensure the vocals are clear; background music is acceptable.

We have provided [some samples](examples/) for your reference.
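
If your own audio is not already a clean WAV file, a conversion along the following lines usually works (an illustrative ffmpeg call; the file names are placeholders and the mono 16 kHz target is an assumption based on the wav2vec audio model listed above):

```bash
# Convert arbitrary audio to mono 16 kHz WAV (placeholder file names)
ffmpeg -i my_audio.mp3 -ac 1 -ar 16000 driving_audio.wav
```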

### 🎮 Run Inference

Simply run `scripts/inference.py` and pass `source_image` and `driving_audio` as input:
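
A minimal sketch (the flag names are assumed from the parameter names above, and the paths point at the provided sample folder as placeholders):

```bash
# Illustrative invocation; adjust paths and flags to your setup
python scripts/inference.py \
  --source_image examples/reference_images/1.jpg \
  --driving_audio examples/driving_audios/1.wav
```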

## 📅️ Roadmap

| Status | Milestone                                                                                               |    ETA     |
| :----: | :------------------------------------------------------------------------------------------------------ | :--------: |
|   ✅   | **[Inference source code meets everyone on GitHub](https://github.com/fudan-generative-vision/hallo)**   | 2024-06-15 |
|   ✅   | **[Pretrained models on Huggingface](https://huggingface.co/fudan-generative-ai/hallo)**                 | 2024-06-15 |
|   🚧   | **[Optimizing inference performance]()**                                                                  | 2024-06-23 |
|   🚧   | **[Optimizing performance on images with a resolution of 256x256]()**                                     | 2024-06-23 |
|   🚀   | **[Improving the model's performance on Mandarin Chinese]()**                                             | 2024-06-25 |
|   🚀   | **[Releasing data preparation and training scripts]()**                                                   | 2024-06-28 |

<details>
<summary>Other Enhancements</summary>

- [ ] Enhancement: Test and ensure compatibility with the Windows operating system. [#39](https://github.com/fudan-generative-vision/hallo/issues/39)
- [ ] Bug: Output video may lose several frames. [#41](https://github.com/fudan-generative-vision/hallo/issues/41)
- [ ] Bug: Sound volume affecting inference results (audio normalization).
- [ ] Enhancement: Inference code logic optimization.
- [ ] Enhancement: Enhance performance on low resolutions (256x256) to support more efficient usage.

</details>

## 📝 Citation

If you find our work useful for your research, please consider citing the paper:

```
@misc{xu2024hallo,
  title={Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation},
  author={Mingwang Xu and Hui Li and Qingkun Su and Hanlin Shang and Liwei Zhang and Ce Liu and Jingdong Wang and Yao Yao and Siyu Zhu},
  year={2024},
  eprint={2406.08801},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

## 🌟 Opportunities Available

Multiple research positions are open at the **Generative Vision Lab, Fudan University**! These include:

Interested individuals are encouraged to contact us at [[email protected]](mailto://[email protected]) for further information.

## ⚠️ Social Risks and Mitigations

The development of portrait image animation technologies driven by audio inputs poses social risks, such as the ethical implications of creating realistic portraits that could be misused for deepfakes. To mitigate these risks, it is crucial to establish ethical guidelines and responsible use practices. Privacy and consent concerns also arise from using individuals' images and voices. Addressing these involves transparent data usage policies, informed consent, and safeguarding privacy rights. By addressing these risks and implementing mitigations, the research aims to ensure the responsible and ethical development of this technology.

## 👏 Community Contributors

Thank you to all the contributors who have helped to make this project better!

<a href="https://github.com/fudan-generative-vision/hallo/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=fudan-generative-vision/hallo" />
</a>