⚠️ IMPORTANT: This project currently uses Moondream 2B (2025-01-09 release) via the Hugging Face Transformers library. We will migrate to the official Moondream client libraries once they become available for this version.
- Overview
- Sample Output
- Features
- Prerequisites
- Installation
- Usage
- Output
- Troubleshooting
- Performance Notes
- Dependencies
- Model Details
- License
## Overview

This project uses the Moondream 2B model to detect faces and their gaze directions in videos. It processes videos frame by frame, visualizing face detections and gaze directions.
## Sample Output

| Input Video | Processed Output |
| --- | --- |
## Features

- Face detection in video frames
- Gaze direction tracking
- Real-time visualization (see the sketch after this list) with:
  - Colored bounding boxes for faces
  - Gradient lines showing gaze direction
  - Gaze target points
- Supports multiple faces per frame
- Processes all common video formats (.mp4, .avi, .mov, .mkv)
- Uses Moondream 2 (2025-01-09 release) via Hugging Face Transformers
  - Note: will be migrated to the official client libraries in future updates
- No authentication required
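The visualization described above can be reproduced with plain OpenCV. The following is a minimal sketch rather than the project's actual code; the face box and gaze point are assumed to arrive as pixel coordinates from the detection step:

```python
import cv2
import numpy as np

def draw_gaze(frame, face_box, gaze_point, color=(0, 200, 255)):
    """Draw a face bounding box and a gradient line toward the gaze target.

    face_box: (x_min, y_min, x_max, y_max) in pixels (assumed format).
    gaze_point: (x, y) pixel coordinates of the gaze target (assumed format).
    """
    x_min, y_min, x_max, y_max = (int(v) for v in face_box)
    cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), color, 2)

    # Start the gaze line at the center of the face box.
    start = np.array([(x_min + x_max) / 2, (y_min + y_max) / 2])
    end = np.array(gaze_point, dtype=float)

    # Approximate a gradient by drawing short segments that fade toward the target.
    n_segments = 20
    for i in range(n_segments):
        p0 = start + (end - start) * (i / n_segments)
        p1 = start + (end - start) * ((i + 1) / n_segments)
        fade = 1.0 - i / n_segments
        seg_color = tuple(int(c * fade) for c in color)
        cv2.line(frame, tuple(int(v) for v in p0), tuple(int(v) for v in p1), seg_color, 2)

    # Mark the point the person is looking at.
    cv2.circle(frame, tuple(int(v) for v in end), 5, color, -1)
    return frame
```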
## Prerequisites

- Python 3.8 or later
- CUDA-capable GPU recommended (but CPU mode works too)
- FFmpeg installed on your system
## Installation

1. Install system dependencies:

   ```bash
   # Ubuntu/Debian
   sudo apt-get update && sudo apt-get install -y libvips42 libvips-dev ffmpeg

   # CentOS/RHEL
   sudo yum install vips vips-devel ffmpeg

   # macOS
   brew install vips ffmpeg
   ```
2. Clone and set up the project:

   ```bash
   git clone https://github.com/vikhyat/moondream.git
   cd moondream/recipes/gaze-detection-video
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```
### Windows Setup

Windows setup requires a few additional steps for proper GPU support and libvips installation.
1. Clone the repository:

   ```bash
   git clone [repository-url]
   cd moondream/recipes/gaze-detection-video
   ```
2. Create and activate a virtual environment:

   ```powershell
   python -m venv venv
   .\venv\Scripts\activate
   ```
3. Install PyTorch with CUDA support:

   ```bash
   # For NVIDIA GPUs
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
   ```
4. Install libvips. Download the appropriate version for your system architecture:

   | Architecture | VIPS Version to Download |
   | --- | --- |
   | 32-bit x86 | vips-dev-w32-all-8.16.0.zip |
   | 64-bit x64 | vips-dev-w64-all-8.16.0.zip |

   - Extract the ZIP file
   - Copy all DLL files from `vips-dev-8.16\bin` to either:
     - your project's root directory (easier), OR
     - `C:\Windows\System32` (requires admin privileges)
   - Add to PATH:
     - Open System Properties → Advanced → Environment Variables
     - Under System Variables, find PATH
     - Add the full path to the `vips-dev-8.16\bin` directory
5. Install FFmpeg:

   - Download a build from https://ffmpeg.org/download.html#build-windows
   - Extract it and add the `bin` folder to your system PATH (similar to step 4) or copy its contents to the project root directory
6. Install other dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Usage

1. Place your input videos in the `input` directory:
   - Supported formats: .mp4, .avi, .mov, .mkv
   - The directory will be created automatically if it doesn't exist

2. Run the script:

   ```bash
   python gaze-detection-video.py
   ```

3. The script will (see the sketch below for the overall flow):
   - Process all videos in the input directory
   - Show progress bars for each video
   - Save processed videos to the `output` directory with the prefix `processed_`
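For orientation, the overall flow looks roughly like the sketch below. This is not the actual script: `process_frame` is a hypothetical stand-in for the model inference and drawing steps, and the real script's codec and options may differ.

```python
from pathlib import Path

import cv2
from tqdm import tqdm

VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}

input_dir, output_dir = Path("input"), Path("output")
input_dir.mkdir(exist_ok=True)   # created automatically if missing
output_dir.mkdir(exist_ok=True)

for video_path in sorted(input_dir.iterdir()):
    if video_path.suffix.lower() not in VIDEO_EXTS:
        continue

    cap = cv2.VideoCapture(str(video_path))
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    out_path = output_dir / f"processed_{video_path.name}"
    writer = cv2.VideoWriter(str(out_path), cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

    for _ in tqdm(range(total), desc=video_path.name):
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(process_frame(frame))  # hypothetical: detection + drawing

    cap.release()
    writer.release()
```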
## Output

- Processed videos are saved as `output/processed_[original_name].[ext]`
- Each frame in the output video shows:
  - Colored boxes around detected faces
  - Lines indicating gaze direction
  - Points showing where each person is looking
## Troubleshooting

1. CUDA/GPU issues:
   - Ensure you have CUDA installed for GPU support
   - The script will automatically fall back to CPU if no GPU is available (see the sketch after this list)

2. Memory issues:
   - If processing large videos, ensure you have enough RAM
   - Consider reducing the video resolution if needed

3. libvips errors:
   - Make sure libvips is properly installed for your OS
   - Check that your system PATH includes libvips

4. Video format issues:
   - Ensure FFmpeg is installed and on your system PATH
   - Try converting problematic videos to MP4 format
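The CPU fallback mentioned above is the standard PyTorch device-selection pattern; a minimal sketch (the script's exact dtype handling is an assumption):

```python
import torch

# Prefer the GPU when CUDA is available, otherwise run on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
# Half precision is common on GPU; full precision is safer on CPU (assumption).
dtype = torch.float16 if device == "cuda" else torch.float32
```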
## Performance Notes

- GPU processing is significantly faster than CPU
- Processing time depends on:
  - Video resolution
  - Number of faces per frame
  - Frame rate
  - Available computing power
## Dependencies

- transformers (for Moondream 2 model access)
- torch
- opencv-python
- pillow
- matplotlib
- numpy
- tqdm
- pyvips
- accelerate
- einops
## Model Details

⚠️ IMPORTANT: This project currently uses Moondream 2 (2025-01-09 release) via the Hugging Face Transformers library. We will migrate to the official Moondream client libraries once they become available for this version.

The model is loaded using:
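A minimal sketch of the usual Transformers loading call for this release; the exact arguments in the script may differ (the dtype choice here is an assumption):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",        # Hugging Face model ID
    revision="2025-01-09",        # pin the release noted above
    trust_remote_code=True,       # Moondream ships custom modeling code
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,  # assumption
)
if torch.cuda.is_available():
    model = model.to("cuda")
```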