Commit

docs: update README
1. Add film showcases.
2. Add community resources.
3. Update roadmap.
4. Optimize format.
AricGamma committed Jun 18, 2024
1 parent 31c91b1 commit 11f0892
Showing 1 changed file with 90 additions and 28 deletions.
118 changes: 90 additions & 28 deletions README.md
@@ -24,28 +24,68 @@
<a href='https://fudan-generative-vision.github.io/hallo/#/'><img src='https://img.shields.io/badge/Project-HomePage-Green'></a>
<a href='https://arxiv.org/pdf/2406.08801'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
<a href='https://huggingface.co/fudan-generative-ai/hallo'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a>
<a href='https://www.modelscope.cn/models/fudan-generative-vision/Hallo/summary'><img src='https://img.shields.io/badge/Modelscope-Model-purple'></a>
<a href='assets/wechat.jpeg'><img src='https://badges.aleen42.com/src/wechat.svg'></a>
</div>

<br>

## 📸 Showcase


https://github.com/fudan-generative-vision/hallo/assets/17402682/294e78ef-c60d-4c32-8e3c-7f8d6934c6bd

### 🎬 Honoring Classic Films

<table class="center">
<tr>
<td style="text-align: center"><b>Devil Wears Prada</b></td>
<td style="text-align: center"><b>Green Book</b></td>
<td style="text-align: center"><b>Infernal Affairs</b></td>
</tr>
<tr>
<td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Devil_Wears_Prada-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Devil_Wears_Prada_GIF.gif"></a></td>
<td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Green_Book-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Green_Book_GIF.gif"></a></td>
<td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/无间道-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Infernal_Affairs_GIF.gif"></a></td>
</tr>
<tr>
<td style="text-align: center"><b>Patch Adams</b></td>
<td style="text-align: center"><b>Tough Love</b></td>
<td style="text-align: center"><b>Shawshank Redemption</b></td>
</tr>
<tr>
<td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Patch_Adams-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Patch_Adams_GIF.gif"></a></td>
<td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Tough_Love-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Tough_Love_GIF.gif"></a></td>
<td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Shawshank-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Shawshank_GIF.gif"></a></td>
</tr>
</table>

Explore [more examples](https://fudan-generative-vision.github.io/hallo).

## 📰 News

- **`2024/06/15`**: ✨✨✨ Released sample images and audio clips for inference testing on [🤗Huggingface](https://huggingface.co/datasets/fudan-generative-ai/hallo_inference_samples).
- **`2024/06/15`**: 🎉🎉🎉 Launched the first version on 🫡[GitHub](https://github.com/fudan-generative-vision/hallo).

## 🤝 Community Resources

Explore the resources developed by our community to enhance your experience with Hallo:

- [Demo on Huggingface](https://huggingface.co/spaces/multimodalart/hallo) - Check out this easy-to-use Gradio demo by [@multimodalart](https://huggingface.co/multimodalart).
- [hallo-webui](https://github.com/daswer123/hallo-webui) - Explore the WebUI created by [@daswer123](https://github.com/daswer123).
- [hallo-for-windows](https://github.com/sdbds/hallo-for-windows) - Utilize Hallo on Windows with the guide by [@sdbds](https://github.com/sdbds).
- [ComfyUI-Hallo](https://github.com/AIFSH/ComfyUI-Hallo) - Integrate Hallo with the ComfyUI tool by [@AIFSH](https://github.com/AIFSH).

Thanks to all of them.

Join our community and explore these resources to make the most of Hallo. Enjoy, and elevate your creative projects!

## 🔧️ Framework

![abstract](assets/framework_1.jpg)
![framework](assets/framework_2.jpg)

## ⚙️ Installation

- System requirements: Ubuntu 20.04/Ubuntu 22.04, CUDA 12.1
- Tested GPUs: A100
@@ -69,15 +109,15 @@ Besides, ffmpeg is also needed:

```bash
apt-get install ffmpeg
```
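Because inference depends on ffmpeg being installed, a quick preflight check can save a failed run. The following is a minimal sketch using only the Python standard library; `missing_tools` is a hypothetical helper written for illustration and is not part of the Hallo codebase.

```python
import shutil


def missing_tools(names):
    """Return the names from `names` that cannot be found on PATH."""
    # shutil.which returns None when an executable is not on PATH.
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["ffmpeg"])
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("ffmpeg found on PATH")
```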

## 🗝️️ Usage

The inference entry point script is `scripts/inference.py`. Before testing your own cases, complete the following steps:

1. [Download all required pretrained models](#download-pretrained-models).
2. [Prepare source image and driving audio pairs](#prepare-inference-data).
3. [Run inference](#run-inference).

### 📥 Download Pretrained Models

You can easily get all pretrained models required by inference from our [HuggingFace repo](https://huggingface.co/fudan-generative-ai/hallo).

@@ -91,12 +131,12 @@ `git clone https://huggingface.co/fudan-generative-ai/hallo pretrained_models`
Or you can download them separately from their source repos:

- [hallo](https://huggingface.co/fudan-generative-ai/hallo/tree/main/hallo): Our checkpoints, which consist of the denoising UNet, the face locator, and the image & audio projection modules.
- [audio_separator](https://huggingface.co/huangjackson/Kim_Vocal_2): Kim\_Vocal\_2 MDX-Net vocal removal model. (_Thanks to [KimberleyJensen](https://github.com/KimberleyJensen)_)
- [insightface](https://github.com/deepinsight/insightface/tree/master/python-package#model-zoo): 2D and 3D face analysis models, placed into `pretrained_models/face_analysis/models/`. (_Thanks to deepinsight_)
- [face landmarker](https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task): Face detection & mesh model from [mediapipe](https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker#models) placed into `pretrained_models/face_analysis/models`.
- [motion module](https://github.com/guoyww/AnimateDiff/blob/main/README.md#202309-animatediff-v2): motion module from [AnimateDiff](https://github.com/guoyww/AnimateDiff). (_Thanks to [guoyww](https://github.com/guoyww)_).
- [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse): Weights are intended to be used with the diffusers library. (_Thanks to [stabilityai](https://huggingface.co/stabilityai)_)
- [StableDiffusion V1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5): Initialized and fine-tuned from Stable-Diffusion-v1-2. (_Thanks to [runwayml](https://huggingface.co/runwayml)_)
- [wav2vec](https://huggingface.co/facebook/wav2vec2-base-960h): wav audio to vector model from [Facebook](https://huggingface.co/facebook/wav2vec2-base-960h).

Finally, these pretrained models should be organized as follows:
@@ -137,7 +177,7 @@

```
| `-- vocab.json
```
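To catch a missing or misplaced checkpoint before launching inference, you could compare the directory against the expected layout. This is a hedged sketch: only `face_analysis/models` is confirmed by the model list above, while the other names in `EXPECTED_ENTRIES` are assumptions derived from the model names and should be checked against the full directory tree in the README. `missing_entries` is a hypothetical helper.

```python
from pathlib import Path

# ASSUMPTION: expected sub-paths under pretrained_models/. Only
# "face_analysis/models" is stated in the README; the remaining names are
# guesses based on the model list and may differ from the real tree.
EXPECTED_ENTRIES = (
    "hallo",
    "audio_separator",
    "face_analysis/models",
    "motion_module",
    "sd-vae-ft-mse",
    "stable-diffusion-v1-5",
    "wav2vec",
)


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected sub-paths that do not exist under `root`."""
    base = Path(root)
    # pathlib accepts forward slashes in the joined segment on all platforms.
    return [entry for entry in expected if not (base / entry).exists()]


if __name__ == "__main__":
    missing = missing_entries("pretrained_models")
    if missing:
        print("Missing under pretrained_models/:", ", ".join(missing))
    else:
        print("All expected entries are present.")
```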

### 🛠️ Prepare Inference Data

Hallo has a few simple requirements for input data:

@@ -153,9 +193,9 @@ For the driving audio:
2. It must be in English since our training datasets are only in this language.
3. Ensure the vocals are clear; background music is acceptable.
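Language and vocal clarity have to be judged by ear, but the basic properties of a driving audio file can be checked programmatically. Assuming the audio is a WAV file (the wav2vec model listed above consumes WAV audio), the standard-library `wave` module is enough for a sketch. `describe_wav` is a hypothetical helper, and nothing here implies a required sample rate or channel count.

```python
import sys
import wave


def describe_wav(path):
    """Return channel count, sample rate, sample width, and duration of a WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # Duration in seconds; guard against a zero frame rate.
            "duration_s": frames / rate if rate else 0.0,
        }


if __name__ == "__main__":
    # Hypothetical usage: python describe_wav.py path/to/driving_audio.wav
    for audio_path in sys.argv[1:]:
        print(audio_path, describe_wav(audio_path))
```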

We have provided [some samples](examples/) for your reference.

### 🎮 Run Inference

Simply run `scripts/inference.py`, passing `source_image` and `driving_audio` as input:

@@ -189,31 +229,45 @@ options:

```
face region
```
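To script several runs, you might wrap the entry point from Python. This sketch assumes `scripts/inference.py` accepts `--source_image` and `--driving_audio` command-line flags and is launched from the repository root; the exact invocation sits in the part of the diff not shown here, so verify it with `python scripts/inference.py --help`. `build_inference_command` and `run_inference` are hypothetical helpers, not part of the Hallo codebase.

```python
import subprocess
import sys


def build_inference_command(source_image, driving_audio,
                            script="scripts/inference.py"):
    """Build the argv list for one inference run.

    ASSUMPTION: the script accepts --source_image and --driving_audio flags.
    """
    return [
        sys.executable,
        script,
        "--source_image", str(source_image),
        "--driving_audio", str(driving_audio),
    ]


def run_inference(source_image, driving_audio):
    """Run one inference job; raises CalledProcessError on a non-zero exit."""
    cmd = build_inference_command(source_image, driving_audio)
    return subprocess.run(cmd, check=True)
```

Passing an argument list rather than a shell string avoids quoting problems with paths that contain spaces. With hypothetical paths, `run_inference("my_inputs/ref.jpg", "my_inputs/speech.wav")` would launch a single run.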

## 📅️ Roadmap

| Status | Milestone | ETA |
| :----: | :---------------------------------------------------------------------------------------------------- | :--------: |
|| **[Inference source code meet everyone on GitHub](https://github.com/fudan-generative-vision/hallo)** | 2024-06-15 |
|| **[Pretrained models on Huggingface](https://huggingface.co/fudan-generative-ai/hallo)** | 2024-06-15 |
| 🚧 | **[Optimizing Inference Performance]()** | 2024-06-23 |
| 🚧 | **[Optimizing Performance on images with a resolution of 256x256.]()** | 2024-06-23 |
| 🚀 | **[Improving the model's performance on Mandarin Chinese]()** | 2024-06-25 |
| 🚀 | **[Releasing data preparation and training scripts]()** | 2024-06-28 |

<details>
<summary>Other Enhancements</summary>

- [ ] Enhancement: Test and ensure compatibility with Windows operating system. [#39](https://github.com/fudan-generative-vision/hallo/issues/39)
- [ ] Bug: Output video may lose several frames. [#41](https://github.com/fudan-generative-vision/hallo/issues/41)
- [ ] Bug: Sound volume affecting inference results (audio normalization).
- [ ] Enhancement: Inference code logic optimization.
- [ ] Enhancement: Improve performance at low resolutions (256x256) to support more efficient usage.

</details>

## 📝 Citation

If you find our work useful for your research, please consider citing the paper:

```
@misc{xu2024hallo,
title={Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation},
author={Mingwang Xu and Hui Li and Qingkun Su and Hanlin Shang and Liwei Zhang and Ce Liu and Jingdong Wang and Yao Yao and Siyu Zhu},
year={2024},
eprint={2406.08801},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

## 🌟 Opportunities Available

Multiple research positions are open at the **Generative Vision Lab, Fudan University**! These include:

@@ -224,6 +278,14 @@ Multiple research positions are open at the **Generative Vision Lab, Fudan Unive

Interested individuals are encouraged to contact us at [[email protected]](mailto://[email protected]) for further information.

## ⚠️ Social Risks and Mitigations

The development of portrait image animation technologies driven by audio inputs poses social risks, such as the ethical implications of creating realistic portraits that could be misused for deepfakes. To mitigate these risks, it is crucial to establish ethical guidelines and responsible use practices. Privacy and consent concerns also arise from using individuals' images and voices. Addressing these involves transparent data usage policies, informed consent, and safeguarding privacy rights. By addressing these risks and implementing mitigations, the research aims to ensure the responsible and ethical development of this technology.

## 👏 Community Contributors

Thank you to all the contributors who have helped to make this project better!

<a href="https://github.com/fudan-generative-vision/hallo/graphs/contributors">
<img src="https://contrib.rocks/image?repo=fudan-generative-vision/hallo" />
</a>
