Hallo There! 🎥

An advanced AI-powered video generation tool that creates realistic talking avatars from images and audio.

🌟 Features

  • 🎭 Avatar Generation: Create realistic talking avatars from static images
  • 🗣️ Voice Processing: Advanced audio diarization using pyannote.audio
  • 🎬 Video Synthesis: High-quality video generation with customizable settings
  • 🔄 Multi-pose Support: Generate videos with multiple facial poses
  • 🎨 Background Customization: Flexible background handling options

📋 Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (recommended)
  • FFmpeg
  • Hugging Face account and access token
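A quick way to confirm the first and third prerequisites before installing is a small check script. This is a minimal sketch and not part of the repository: the function name `check_prerequisites` is hypothetical, and it deliberately does not test for a GPU or a Hugging Face token.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Compare only (major, minor) against the documented minimum version.
    if sys.version_info[:2] < tuple(min_python):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems
```

Calling `check_prerequisites()` and printing each returned entry shows what is missing. An empty list means Python and FFmpeg are both in place.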

🚀 Installation

  1. Set up a Python environment:
conda create --name hallo-there
conda activate hallo-there
  2. Clone the repository:
git clone https://github.com/hiktan44/hallo-there.git
cd hallo-there
  3. Install dependencies:
pip install -r requirements.txt
pip install .
  4. Install FFmpeg:
  • Linux: sudo apt-get install ffmpeg
  • Windows: Download from the official FFmpeg website and add it to the system PATH

⚙️ Configuration

  1. Create a Hugging Face access token (under Access Tokens in your Hugging Face account settings).

  2. Set up diarization:

python diarization.py -access_token <YOUR_HUGGING_FACE_TOKEN>
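To avoid pasting the token directly into the shell, the command above could be assembled from an environment variable. The sketch below is illustrative only: the helper `diarization_command` and the choice of `HF_TOKEN` as the variable name are assumptions, and only the `-access_token` flag comes from this README.

```python
import os
import sys


def diarization_command(env=None, script="diarization.py"):
    """Build the diarization command line, reading the token from HF_TOKEN (assumed name)."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set the HF_TOKEN environment variable to your access token")
    # Use the current interpreter so the command runs inside the active environment.
    return [sys.executable, script, "-access_token", token]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root.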

📁 Project Structure

hallo-there/
├── source_images/   # Input images (512x512)
├── audio/           # Input audio files
├── diarization/     # Diarization output
├── output/          # Generated video clips
└── docs/            # Documentation
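Whether the scripts create these working directories themselves is not stated here, so it may be worth making sure they exist before the first run. The following is a small sketch under that assumption; `ensure_layout` and `EXPECTED_DIRS` are hypothetical names, not part of the repository.

```python
from pathlib import Path

# Working directories taken from the tree above (docs/ ships with the repository).
EXPECTED_DIRS = ("source_images", "audio", "diarization", "output")


def ensure_layout(root="."):
    """Create any missing working directories under root and return the names created."""
    created = []
    for name in EXPECTED_DIRS:
        path = Path(root) / name
        if not path.is_dir():
            path.mkdir(parents=True)
            created.append(name)
    return created
```

Running `ensure_layout()` from the repository root is idempotent: a second call creates nothing and returns an empty list.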

🎮 Usage

  1. Prepare source images:

    • 512x512 pixel squares
    • Face should occupy 50-70% of image
    • Place in source_images/ directory
  2. Prepare audio:

    • Convert to WAV format
    • Place in audio/input_audio.wav
  3. Generate video:

python generate_videos.py
python combine_videos.py
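The input preparation in steps 1 and 2 can be scripted. The sketch below is one possible approach, not the project's own tooling: all function names are hypothetical, `prepare_image` assumes Pillow is installed, and the conversion relies on FFmpeg picking the WAV format from the output file extension. A plain center crop does not guarantee that the face fills 50-70% of the frame, so the result should still be checked by eye.

```python
import subprocess


def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def prepare_image(src, dst, size=512):
    """Center-crop src to a square and resize it to size x size (requires Pillow)."""
    from PIL import Image  # imported lazily so the other helpers work without Pillow

    with Image.open(src) as img:
        box = center_crop_box(*img.size)
        img.crop(box).resize((size, size), Image.LANCZOS).save(dst)


def wav_command(src, dst="audio/input_audio.wav"):
    """Build an ffmpeg command that converts src to WAV, overwriting dst if present."""
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert_to_wav(src, dst="audio/input_audio.wav"):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(wav_command(src, dst), check=True)
```

For example, `prepare_image("portrait.jpg", "source_images/host.png")` followed by `convert_to_wav("episode.mp3")` would produce inputs in the locations listed above. The file names here are placeholders.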

🔧 Advanced Options

  • -mode full: Enable subtle head movements during silence
  • -background custom: Use custom background image
  • -quality high: Generate higher quality output

📚 Documentation

Detailed documentation is available in the docs/ directory.

🤝 Contributing

Contributions welcome! Please read CONTRIBUTING.md for guidelines.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

🙏 Acknowledgments

  • pyannote.audio for audio diarization
  • Hugging Face for AI models and infrastructure

About

Multi-person podcast audio to videocast
