The idea is to build an application for real-time face detection and recognition using TensorFlow and a laptop's webcam. The face recognition model should be easy to update online to add new targets.
- TensorFlow 1.7 and Python 3
- Everything should be dockerized and easy to reproduce!
Just type:
docker run -it --rm -p 5000:5000 btwardow/tf-face-recognition:1.0.0
Then go to https://localhost:5000/ (or type it in your browser) to get face detection (without recognition for now).
Note: HTTPS is required by many modern browsers to transfer video outside of localhost without changing unsafe settings in the browser.
For development, use the main target from the Makefile in the project's root directory:
make
docker run --rm -it -p 5000:5000 -v /$(pwd):/workspace btwardow/tf-face-recognition:dev
This volume mapping is very convenient for development and testing purposes.
To use GPU power, there is a dedicated Dockerfile.gpu.
Running the application without Docker is useful for development. Below is a quick how-to for *nix environments.
Create a virtual environment (with Conda) and install the requirements:
conda create -y -n face_recognition_36 python=3.6
source activate face_recognition_36
pip install -r requirements_dev.txt
Download the pre-trained models:
mkdir ~/pretrained_models
cp docker/download*.py ~/pretrained_models
cd ~/pretrained_models
python download.py
python download_vggace2.py
After downloading, the ~/pretrained_models directory should look like this:
(face_recognition_36) b.twardowski@172-16-170-27:~/pretrained_models » tree
.
├── 20180402-114759
│ ├── 20180402-114759.pb
│ ├── model-20180402-114759.ckpt-275.data-00000-of-00001
│ ├── model-20180402-114759.ckpt-275.index
│ └── model-20180402-114759.meta
├── 20180402-114759.zip
├── det1.npy
├── det2.npy
├── det3.npy
├── download.py
└── download_vggace2.py
Then, to start the server, go to the ./server directory and type:
PYTHONPATH=".." python server.py
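As noted above, browsers require HTTPS to transfer video from anywhere other than localhost. If you want to test the locally started server from another device, below is a minimal sketch of serving over HTTPS with a throwaway certificate, assuming a Flask-style server (the actual setup in server.py may differ; ssl_context='adhoc' needs the pyOpenSSL package):

# Sketch (assumption: Flask-based server): HTTPS with a self-signed certificate.
from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    # The real server serves the video page; this is just a placeholder.
    return 'Face recognition demo'

if __name__ == '__main__':
    # 'adhoc' generates a self-signed certificate on the fly; the browser will
    # warn about it, but camera capture will then work outside localhost.
    app.run(host='0.0.0.0', port=5000, ssl_context='adhoc')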
Everything should be dockerized and easy to reproduce. This makes things interesting, even for a toy project in the computer vision area. Why?
- building a model / playing around in Jupyter/Python - that's easy
- inference on data grabbed from the host machine's camera inside Docker - that's tricky!
Why is it hard to grab data from a camera device from Docker? You can read about it here. The main reason: Docker is not built for such things, so it doesn't make life easier here. Of course, a few possibilities are mentioned, like streaming from the host MBP using ffmpeg, or preparing a custom VirtualBox boot2docker.iso image and passing the MBP webcam through. But none of them sounds right. All of them require the additional effort of installing something from brew or configuring VirtualBox (assuming you already have Docker installed on your OSX).
The good side of having this as a web app is that you can try it out on your mobile phone, which is very convenient for testing and demos.
Face detection finds faces in the video and marks their boundaries. These areas can later be used for the face recognition task. To detect faces, a pre-trained MTCNN network is used.
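As a rough illustration of this step, here is a minimal detection sketch using the align.detect_face module from the facenet project (the det1/det2/det3.npy files downloaded earlier are its weights); the minsize, threshold, and factor values below are common defaults and not necessarily the ones used in server.py:

# Sketch: detect faces in a single RGB frame with facenet's MTCNN implementation.
import cv2
import tensorflow as tf
import align.detect_face  # module from the facenet project

with tf.Graph().as_default():
    sess = tf.Session()
    # Builds the three cascaded networks (P-Net, R-Net, O-Net); passing None
    # makes it load det1/det2/det3.npy from the detect_face module's directory.
    pnet, rnet, onet = align.detect_face.create_mtcnn(sess, None)

minsize = 20                  # smallest face (in pixels) to look for
threshold = [0.6, 0.7, 0.7]   # per-stage score thresholds
factor = 0.709                # image pyramid scale factor

frame = cv2.imread('frame.jpg')[:, :, ::-1]  # OpenCV loads BGR, MTCNN expects RGB
bounding_boxes, _ = align.detect_face.detect_face(
    frame, minsize, pnet, rnet, onet, threshold, factor)

for x1, y1, x2, y2, score in bounding_boxes:
    print('face at (%d, %d)-(%d, %d), score %.2f' % (x1, y1, x2, y2, score))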
Face recognition uses embeddings from the VGGFace2 network plus a KNN model implemented in TensorFlow.
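Roughly speaking, each detected face crop is turned into an embedding vector by the pre-trained network, and new faces are classified by a nearest-neighbour vote among the stored example embeddings. Here is a simplified sketch of that idea; the tensor names are the ones facenet's frozen graphs usually expose, and the simple NumPy vote stands in for the TensorFlow KNN used by the project:

# Sketch: embed aligned 160x160 face crops with the frozen VGGFace2 model and
# classify them by nearest neighbours against stored example embeddings.
import os
import numpy as np
import tensorflow as tf

MODEL_PB = os.path.expanduser(
    '~/pretrained_models/20180402-114759/20180402-114759.pb')

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(MODEL_PB, 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

images = graph.get_tensor_by_name('input:0')            # (n, 160, 160, 3), prewhitened
embeddings = graph.get_tensor_by_name('embeddings:0')   # (n, 512) L2-normalized vectors
phase_train = graph.get_tensor_by_name('phase_train:0')

def embed(face_crops, sess):
    # Compute embeddings for a batch of aligned, prewhitened face crops.
    return sess.run(embeddings, {images: face_crops, phase_train: False})

def knn_predict(query, example_embeddings, example_labels, k=3):
    # Majority vote among the k nearest stored embeddings (Euclidean distance).
    dists = np.linalg.norm(example_embeddings - query, axis=1)
    votes = [example_labels[i] for i in np.argsort(dists)[:k]]
    return max(set(votes), key=votes.count)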
In order to get your face recognized, a few examples have to be provided to the algorithm first (currently at least 10).
When you see the application working and correctly detecting faces, just click the Capture Examples button.
While capturing examples for face detection, there has to be a single face in the video!
After 10 examples are collected, we can type the person's name and upload them to the server.
As a result, we see the current status of the classification examples:
And from now on, the new person is recognized. In this example, it's CoverGirl.
If you are interested in the classification details, please check out this notebook, which explains in detail how it works (e.g. the threshold for recognition).
You can run a Jupyter notebook from Docker; just type:
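In short, the threshold turns the nearest-neighbour classifier into an open-set one: if even the closest stored embedding is farther away than some cutoff distance, the face is reported as unknown instead of being forced onto a known person. A tiny illustrative sketch (the cutoff value here is made up; the notebook derives the real one):

import numpy as np

def classify_with_threshold(query, example_embeddings, example_labels, cutoff=1.0):
    # Return the nearest example's label, or None when no example is close enough.
    dists = np.linalg.norm(example_embeddings - query, axis=1)
    nearest = int(np.argmin(dists))
    if dists[nearest] > cutoff:   # all known faces are too far away
        return None               # -> report the face as unknown
    return example_labels[nearest]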
docker run --rm -it -p 8888:8888 btwardow/tf-face-recognition:1.0.0 /run_jupyter.sh --allow-root
- face detection with a pre-trained MTCNN network
- training a face recognition classifier (pre-trained embeddings + classifier) based on the provided examples
- model updates directly from the browser
- save & clear classification model from the browser
- check if detection can be done faster and, if so, re-implement it (optimize MTCNN for inference?)
- try porting it to TensorFlow.js (as skeptical as I am about crunching numbers in JavaScript...)
Many thanks to the creators of the facenet project, which provides pre-trained models for VGGFace2. Great job!