An advanced AI-powered video generation tool that creates realistic talking avatars from images and audio.
- 🎭 Avatar Generation: Create realistic talking avatars from static images
- 🗣️ Voice Processing: Advanced audio diarization using pyannote.audio
- 🎬 Video Synthesis: High-quality video generation with customizable settings
- 🔄 Multi-pose Support: Generate videos with multiple facial poses
- 🎨 Background Customization: Flexible background handling options
- Python 3.8+
- CUDA-compatible GPU (recommended)
- FFmpeg
- Hugging Face account and access token
- Set up Python environment:
conda create --name hallo-there python=3.8
conda activate hallo-there
- Clone the repository:
git clone https://github.com/hiktan44/hallo-there.git
cd hallo-there
- Install dependencies:
pip install -r requirements.txt
pip install .
- Install FFmpeg:
- Linux:
sudo apt-get install ffmpeg
- Windows: Download from official FFmpeg website and add to system PATH
- Create a Hugging Face access token:
  - Visit the Hugging Face token settings page (https://huggingface.co/settings/tokens)
  - Generate a new token with the required permissions
- Set up diarization:
python diarization.py -access_token <YOUR_HUGGING_FACE_TOKEN>
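pyannote.audio pipelines conventionally emit speaker turns in RTTM format. As an illustrative sketch (the exact output format of `diarization.py` may differ), a helper like the following could turn RTTM lines into `(speaker, start, end)` segments:

```python
# Hypothetical helper: parse pyannote-style RTTM lines into (speaker, start, end)
# segments. Assumes the diarization step writes standard RTTM records; verify
# against the actual files in diarization/ before relying on this.

def parse_rttm(lines):
    """Return a list of (speaker, start_sec, end_sec) tuples from RTTM lines."""
    segments = []
    for line in lines:
        fields = line.split()
        # RTTM: SPEAKER <file> <chan> <start> <duration> <ortho> <stype> <name> ...
        if len(fields) >= 8 and fields[0] == "SPEAKER":
            start = float(fields[3])
            duration = float(fields[4])
            segments.append((fields[7], start, start + duration))
    return segments

if __name__ == "__main__":
    sample = [
        "SPEAKER input_audio 1 0.50 3.20 <NA> <NA> SPEAKER_00 <NA> <NA>",
        "SPEAKER input_audio 1 3.70 2.10 <NA> <NA> SPEAKER_01 <NA> <NA>",
    ]
    for spk, s, e in parse_rttm(sample):
        print(f"{spk}: {s:.2f}s - {e:.2f}s")
```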
hallo-there/
├── source_images/ # Input images (512x512)
├── audio/ # Input audio files
├── diarization/ # Diarization output
├── output/ # Generated video clips
└── docs/ # Documentation
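The layout above can be created in one step. A minimal sketch (directory names taken from the tree above):

```python
# Create the expected project layout if it does not already exist.
# Directory names mirror the tree shown above.
import os

DIRS = ["source_images", "audio", "diarization", "output", "docs"]

def scaffold(root="."):
    """Create each expected directory under root; no-op if already present."""
    for name in DIRS:
        os.makedirs(os.path.join(root, name), exist_ok=True)

if __name__ == "__main__":
    scaffold()
```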
- Prepare source images:
  - 512x512 pixel squares
  - Face should occupy 50-70% of the image
  - Place in the source_images/ directory
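The two image requirements can be checked before running the pipeline. The sketch below is hypothetical (the project does not ship such a helper, and obtaining the face bounding box, e.g. from a face detector, is outside its scope):

```python
# Hypothetical pre-flight check for a source image: verify it is 512x512 and
# that a detected face bounding box covers 50-70% of the frame. The face box
# must come from elsewhere (e.g. a face detector).

def image_ok(width, height, face_box, min_ratio=0.50, max_ratio=0.70):
    """face_box is (x, y, w, h) in pixels; returns True if the image qualifies."""
    if (width, height) != (512, 512):
        return False
    _, _, w, h = face_box
    ratio = (w * h) / (width * height)
    return min_ratio <= ratio <= max_ratio

print(image_ok(512, 512, (100, 80, 380, 400)))   # face ~58% of frame -> True
print(image_ok(512, 512, (200, 200, 100, 100)))  # face ~4% of frame  -> False
```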
- Prepare audio:
  - Convert to WAV format
  - Place at audio/input_audio.wav
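The WAV conversion can be scripted around FFmpeg. A minimal sketch; the 16 kHz mono settings are illustrative defaults I am assuming, not requirements stated by the project:

```python
# Build (and optionally run) the ffmpeg command to convert any input file to
# WAV. Sample rate and channel count are assumed defaults, adjust as needed.
import subprocess

def wav_convert_cmd(src, dst="audio/input_audio.wav", rate=16000, channels=1):
    return [
        "ffmpeg", "-y",        # overwrite output if it exists
        "-i", src,             # input file (any format ffmpeg supports)
        "-ar", str(rate),      # resample
        "-ac", str(channels),  # set channel count
        dst,
    ]

if __name__ == "__main__":
    cmd = wav_convert_cmd("recording.mp3")
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually convert
```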
- Generate video:
python generate_videos.py
python combine_videos.py
- -mode full: Enable subtle head movements during silence
- -background custom: Use custom background image
- -quality high: Generate higher quality output
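The options above could be wired up with argparse. This is a sketch of how such a parser might look, not the actual interface of generate_videos.py; flag names follow the single-dash style shown in this README:

```python
# Illustrative argument parser covering the options listed above. The real
# generate_videos.py may define these flags differently.
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="Generate talking-avatar videos")
    p.add_argument("-mode", choices=["full"], default=None,
                   help="'full' enables subtle head movements during silence")
    p.add_argument("-background", choices=["custom"], default=None,
                   help="'custom' uses a custom background image")
    p.add_argument("-quality", choices=["high"], default=None,
                   help="'high' generates higher quality output")
    return p

if __name__ == "__main__":
    args = build_parser().parse_args(["-mode", "full", "-quality", "high"])
    print(args.mode, args.background, args.quality)
```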
Detailed documentation is available in the docs/ directory.
Contributions welcome! Please read CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see LICENSE file for details.
- pyannote.audio for audio diarization
- Hugging Face for AI models and infrastructure