⚠️ IMPORTANT: This project currently uses Moondream 2B (2025-01-09 release) via the Hugging Face Transformers library. We will migrate to the official Moondream client libraries once they become available for this version.
- Overview
- Sample Output
- Features
- Prerequisites
- Installation
- Usage
- Output
- Troubleshooting
- Performance Notes
- Dependencies
- Model Details
- License
## Overview

This project uses the Moondream 2B model to detect faces and their gaze directions in videos. It processes videos frame by frame, visualizing face detections and gaze directions.
## Sample Output

| Input Video | Processed Output |
| --- | --- |
## Features

- Face detection in video frames
- Gaze direction tracking
- Real-time visualization (see the sketch after this list) with:
  - Colored bounding boxes for faces
  - Gradient lines showing gaze direction
  - Gaze target points
- Supports multiple faces per frame
- Processes all common video formats (.mp4, .avi, .mov, .mkv)
- Uses Moondream 2 (2025-01-09 release) via Hugging Face Transformers
  - Note: will be migrated to the official client libraries in future updates
- No authentication required
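The visualization described above can be reproduced with plain OpenCV. The following is a minimal sketch rather than the project's actual code; the face box and gaze point are assumed to arrive as pixel coordinates from the detection step:

```python
import cv2
import numpy as np

def draw_gaze(frame, face_box, gaze_point, color=(0, 200, 255)):
    """Draw a face bounding box and a gradient line toward the gaze target.

    face_box: (x_min, y_min, x_max, y_max) in pixels (assumed format).
    gaze_point: (x, y) pixel coordinates of the gaze target (assumed format).
    """
    x_min, y_min, x_max, y_max = (int(v) for v in face_box)
    cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), color, 2)

    # Start the gaze line at the center of the face box.
    start = np.array([(x_min + x_max) / 2, (y_min + y_max) / 2])
    end = np.array(gaze_point, dtype=float)

    # Approximate a gradient by drawing short segments that fade toward the target.
    n_segments = 20
    for i in range(n_segments):
        p0 = start + (end - start) * (i / n_segments)
        p1 = start + (end - start) * ((i + 1) / n_segments)
        fade = 1.0 - i / n_segments
        seg_color = tuple(int(c * fade) for c in color)
        cv2.line(frame, tuple(int(v) for v in p0), tuple(int(v) for v in p1), seg_color, 2)

    # Mark the point the person is looking at.
    cv2.circle(frame, tuple(int(v) for v in end), 5, color, -1)
    return frame
```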
## Prerequisites

- Python 3.8 or later
- CUDA-capable GPU recommended (but CPU mode works too)
- FFmpeg installed on your system
## Installation

1. Install system dependencies:

   ```bash
   # Ubuntu/Debian
   sudo apt-get update && sudo apt-get install -y libvips42 libvips-dev ffmpeg

   # CentOS/RHEL
   sudo yum install vips vips-devel ffmpeg

   # macOS
   brew install vips ffmpeg
   ```
2. Clone and set up the project:

   ```bash
   git clone https://github.com/vikhyat/moondream.git
   cd moondream/recipes/gaze-detection-video
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```
### Windows Setup

Windows setup requires a few additional steps for proper GPU support and libvips installation.
1. Clone the repository:

   ```bash
   git clone [repository-url]
   cd moondream/recipes/gaze-detection-video
   ```
2. Create and activate a virtual environment:

   ```powershell
   python -m venv venv
   .\venv\Scripts\activate
   ```
3. Install PyTorch with CUDA support:

   ```bash
   # For NVIDIA GPUs
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
   ```
4. Install libvips. Download the appropriate version for your system architecture:

   | Architecture | VIPS Version to Download |
   | --- | --- |
   | 32-bit x86 | vips-dev-w32-all-8.16.0.zip |
   | 64-bit x64 | vips-dev-w64-all-8.16.0.zip |

   - Extract the ZIP file
   - Copy all DLL files from `vips-dev-8.16\bin` to either:
     - your project's root directory (easier), OR
     - `C:\Windows\System32` (requires admin privileges)
   - Add to PATH:
     - Open System Properties → Advanced → Environment Variables
     - Under System Variables, find PATH
     - Add the full path to the `vips-dev-8.16\bin` directory
5. Install FFmpeg:

   - Download a build from https://ffmpeg.org/download.html#build-windows
   - Extract it and add the `bin` folder to your system PATH (similar to step 4) or copy its contents to the project root directory
6. Install other dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Usage

1. Place your input videos in the `input` directory:
   - Supported formats: .mp4, .avi, .mov, .mkv
   - The directory will be created automatically if it doesn't exist

2. Run the script:

   ```bash
   python gaze-detection-video.py
   ```

3. The script will (see the sketch below for the overall flow):
   - Process all videos in the input directory
   - Show progress bars for each video
   - Save processed videos to the `output` directory with the prefix `processed_`
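For orientation, the overall flow looks roughly like the sketch below. This is not the actual script: `process_frame` is a hypothetical stand-in for the model inference and drawing steps, and the real script's codec and options may differ.

```python
from pathlib import Path

import cv2
from tqdm import tqdm

VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}

input_dir, output_dir = Path("input"), Path("output")
input_dir.mkdir(exist_ok=True)   # created automatically if missing
output_dir.mkdir(exist_ok=True)

for video_path in sorted(input_dir.iterdir()):
    if video_path.suffix.lower() not in VIDEO_EXTS:
        continue

    cap = cv2.VideoCapture(str(video_path))
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    out_path = output_dir / f"processed_{video_path.name}"
    writer = cv2.VideoWriter(str(out_path), cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

    for _ in tqdm(range(total), desc=video_path.name):
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(process_frame(frame))  # hypothetical: detection + drawing

    cap.release()
    writer.release()
```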
## Output

- Processed videos are saved as `output/processed_[original_name].[ext]`
- Each frame in the output video shows:
  - Colored boxes around detected faces
  - Lines indicating gaze direction
  - Points showing where each person is looking
## Troubleshooting

1. CUDA/GPU issues:
   - Ensure you have CUDA installed for GPU support
   - The script will automatically fall back to CPU if no GPU is available (see the sketch after this list)

2. Memory issues:
   - If processing large videos, ensure you have enough RAM
   - Consider reducing the video resolution if needed

3. libvips errors:
   - Make sure libvips is properly installed for your OS
   - Check that your system PATH includes libvips

4. Video format issues:
   - Ensure FFmpeg is installed and on your system PATH
   - Try converting problematic videos to MP4 format
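The CPU fallback mentioned above is the standard PyTorch device-selection pattern; a minimal sketch (the script's exact dtype handling is an assumption):

```python
import torch

# Prefer the GPU when CUDA is available, otherwise run on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
# Half precision is common on GPU; full precision is safer on CPU (assumption).
dtype = torch.float16 if device == "cuda" else torch.float32
```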
## Performance Notes

- GPU processing is significantly faster than CPU
- Processing time depends on:
  - Video resolution
  - Number of faces per frame
  - Frame rate
  - Available computing power
## Dependencies

- transformers (for Moondream 2 model access)
- torch
- opencv-python
- pillow
- matplotlib
- numpy
- tqdm
- pyvips
- accelerate
- einops
## Model Details

⚠️ IMPORTANT: This project currently uses Moondream 2 (2025-01-09 release) via the Hugging Face Transformers library. We will migrate to the official Moondream client libraries once they become available for this version.

The model is loaded using:
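A minimal sketch of the usual Transformers loading call for this release; the exact arguments in the script may differ (the dtype choice here is an assumption):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",        # Hugging Face model ID
    revision="2025-01-09",        # pin the release noted above
    trust_remote_code=True,       # Moondream ships custom modeling code
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,  # assumption
)
if torch.cuda.is_available():
    model = model.to("cuda")
```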