diff --git a/README.md b/README.md
index 1bb976a6..5edb33ab 100644
--- a/README.md
+++ b/README.md
@@ -74,7 +74,8 @@ Besides, ffmpeg is also needed:
-The inference entrypoint script is `scripts/inference.py`. Before testing your cases, there are two preparations that need to be completed:
+The inference entrypoint script is `scripts/inference.py`. Before testing your cases, there are three preparations that need to be completed:
 
 1. [Download all required pretrained models](#download-pretrained-models).
-2. [Run inference](#run-inference).
+2. [Prepare source image and driving audio pairs](#prepare-inference-data).
+3. [Run inference](#run-inference).
 
 ## Download pretrained models
@@ -136,6 +137,24 @@ Finally, these pretrained models should be organized as follows:
 | `-- vocab.json
 ```
 
+## Prepare Inference Data
+
+Hallo has a few simple requirements for input data:
+
+For the source image:
+
+1. It should be cropped into a square.
+2. The face should be the main focus, making up 50%-70% of the image.
+3. The face should be facing forward, with a rotation angle of less than 30° (no side profiles).
+
+For the driving audio:
+
+1. It must be in WAV format.
+2. It must be in English, since our training datasets are only in this language.
+3. Ensure the vocals are clear; background music is acceptable.
+
+We have provided some samples for your reference.
+
 ## Run inference
 
 Simply run `scripts/inference.py` and pass `source_image` and `driving_audio` as input:
@@ -177,6 +196,7 @@ options:
 | ✅ | **[Inference source code meets everyone on GitHub](https://github.com/fudan-generative-vision/hallo)** | 2024-06-15 |
 | ✅ | **[Pretrained models on Huggingface](https://huggingface.co/fudan-generative-ai/hallo)** | 2024-06-15 |
 | 🚀🚀🚀 | **[Training: data preparation and training scripts]()** | 2024-06-25 |
+| 🚀🚀🚀 | **[Optimize inference performance for Mandarin]()** | TBD |
 
 # Citation
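
The source-image requirements added in the `Prepare Inference Data` section above (a square crop with the face as the main focus) can be partially automated. The helper below is an illustrative sketch and not part of the Hallo codebase; it computes a centered square crop box in the `(left, top, right, bottom)` convention used by Pillow's `Image.crop`:

```python
def center_square_crop_box(width, height):
    """Return the (left, top, right, bottom) box of the largest centered square.

    The box follows PIL's Image.crop convention: right and bottom are exclusive.
    """
    side = min(width, height)      # largest square that fits in the image
    left = (width - side) // 2     # center the square horizontally
    top = (height - side) // 2     # center the square vertically
    return (left, top, left + side, top + side)

# For a 1920x1080 frame, the crop keeps the central 1080x1080 region:
print(center_square_crop_box(1920, 1080))  # (420, 0, 1500, 1080)
```

The resulting box can be passed directly to `PIL.Image.crop`; whether the face then fills 50%-70% of the frame still needs to be checked by eye or with a face detector.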
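
Since the driving audio must be a WAV file, a quick sanity check with Python's standard `wave` module can catch format problems before launching inference. This helper is an illustrative sketch with an assumed name, not a Hallo API:

```python
import wave

def describe_wav(path):
    """Open a WAV file and report its channels, sample rate, and duration.

    wave.open raises wave.Error for non-WAV input, which makes this a quick
    way to verify a driving audio file meets the format requirement.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate  # duration in seconds
    return {"channels": channels, "sample_rate": rate, "duration_s": duration}
```

Note this only validates the container format; the clarity of the vocals and the language of the speech still have to be checked by listening.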