Scalable reverse image search built on Kubernetes and Elasticsearch
Pavlov Match makes it easy to search for images that look similar to each other. Using a state-of-the-art perceptual hash, it is invariant to scaling and 90-degree rotations. Its HTTP API is quick to integrate and flexible enough for a number of reverse image search applications. Kubernetes and Elasticsearch allow Match to scale to billions of images with ease while giving you full control over where your data is stored. Match uses the awesome ascribe/image-match under the hood for most of the image search legwork.
If you already have Elasticsearch running:
$ docker run -e ELASTICSEARCH_URL=https://daisy.us-west-1.es.amazonaws.com -p 8888:80 -it pavlov/match
If you want to run Elasticsearch in another Docker container and link it to the pavlov/match container (use the -p option to publish the containers' ports to the host):
$ docker run --name my_elasticsearch_db -p 59200:9200 elasticsearch
$ docker run --link my_elasticsearch_db:elasticsearch -p 8888:80 pavlov/match
or, if you have docker-compose installed on your system, type:
$ docker-compose up
(All of these commands can also be run using make. Take a look at the Makefile to see the available options.)
Match is packaged as a Docker image (pavlov/match on Docker Hub), making it highly portable and scalable to billions of images. You can configure a few options using environment variables:
- ELASTICSEARCH_URL (default: http://elasticsearch)
  A URL pointing to the Elasticsearch database where image signatures are stored. If you don't want to host your own Elasticsearch cluster, consider using AWS Elasticsearch Service. That's what we use. The default matches the elasticsearch alias used with --link in the docker run example above, so linked containers work without extra configuration.
- ELASTICSEARCH_INDEX (default: images)
  The index in the Elasticsearch database where image signatures are stored.
- ELASTICSEARCH_DOC_TYPE (default: images)
  The document type used for storing image signatures.
- WORKER_COUNT (default: 4)
  The number of gunicorn worker processes to run in each Docker container.
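The lookup rule in the list above (an environment variable, when set, overrides the documented default) can be sketched in Python. This is a hypothetical helper for a launcher or test script, not Match's own startup code; the name resolve_config is illustrative.

```python
import os
from typing import Mapping

# Defaults as documented in the environment variable list above.
DEFAULTS = {
    "ELASTICSEARCH_URL": "http://elasticsearch",
    "ELASTICSEARCH_INDEX": "images",
    "ELASTICSEARCH_DOC_TYPE": "images",
    "WORKER_COUNT": "4",
}


def resolve_config(env: Mapping[str, str] = os.environ) -> dict:
    """Return effective settings; a set environment variable wins over its default."""
    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # WORKER_COUNT is a process count, so validate it as a positive integer.
    workers = int(config["WORKER_COUNT"])
    if workers < 1:
        raise ValueError("WORKER_COUNT must be at least 1")
    config["WORKER_COUNT"] = workers
    return config
```

For example, resolve_config({"WORKER_COUNT": "8"}) yields 8 workers while the other three settings keep their defaults.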
Match is particularly awesomesauce when deployed on Kubernetes, the container orchestration system. spread makes it easy to get Match up and running quickly:
$ go get rsprd.com/spread/cmd/spread
$ git clone https://github.com/pavlovml/match
$ cd match
$ vim .k2e/secret.yml # configure me
$ spread deploy .
You can configure the service, replication controller, and secret like so:
# match-service.yml
apiVersion: v1
kind: Service
metadata:
  namespace: default
  name: match
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
  selector:
    app: match
# match-rc.yml
apiVersion: v1
kind: ReplicationController
metadata:
  namespace: default
  name: match
spec:
  replicas: 2
  selector:
    app: match
  template:
    metadata:
      labels:
        app: match
    spec:
      containers:
        - name: match
          image: pavlov/match:latest
          ports:
            - containerPort: 80
          env:
            - name: WORKER_COUNT
              valueFrom:
                secretKeyRef:
                  name: match
                  key: worker-count
            - name: ELASTICSEARCH_URL
              valueFrom:
                secretKeyRef:
                  name: match
                  key: elasticsearch.url
            - name: ELASTICSEARCH_INDEX
              valueFrom:
                secretKeyRef:
                  name: match
                  key: elasticsearch.index
            - name: ELASTICSEARCH_DOC_TYPE
              valueFrom:
                secretKeyRef:
                  name: match
                  key: elasticsearch.doc-type
# match-secret.yml
apiVersion: v1
kind: Secret
metadata:
  namespace: default
  name: match
data:
  # 4, base64 encoded
  worker-count: NA==
  # https://daisy.us-west-1.es.amazonaws.com (change me)
  elasticsearch.url: aHR0cHM6Ly9kYWlzeS51cy13ZXN0LTEuZXMuYW1hem9uYXdzLmNvbQ==
  # images
  elasticsearch.index: aW1hZ2Vz
  # images
  elasticsearch.doc-type: aW1hZ2Vz
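Every value under data: in the secret must be base64-encoded, as its comments show. A small Python sketch (the helper name encode_secret_values is hypothetical) produces those values from plain text. It also avoids a common mistake: piping echo into base64 without -n encodes a trailing newline along with the value.

```python
import base64


def encode_secret_values(values: dict) -> dict:
    """Base64-encode plain-text values for the data: section of a Kubernetes Secret."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }


secret_data = encode_secret_values({
    "worker-count": "4",
    "elasticsearch.url": "https://daisy.us-west-1.es.amazonaws.com",  # change me
    "elasticsearch.index": "images",
    "elasticsearch.doc-type": "images",
})

# Print lines ready to paste under data: in match-secret.yml.
for key, value in secret_data.items():
    print(f"  {key}: {value}")
```

Kubernetes Secrets also accept plain-text values under stringData, which the API server encodes for you.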
Match has a simple HTTP API. All request parameters are specified via application/x-www-form-urlencoded or multipart/form-data.
Adds an image signature to the database.
- url or image (required)
  The image to add to the database. It may be provided as a URL via url or as a multipart/form-data file upload via image.
- filepath (required)
  The path under which to store the image in the database. If another image already exists at the given path, it will be overwritten.
- metadata (default: None)
  An arbitrary JSON object containing metadata to attach to the image.
{
  "status": "ok",
  "error": [],
  "method": "add",
  "result": []
}
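As an example of calling add from Python, the sketch below builds a form-encoded request that adds an image by URL and attaches metadata. The base URL http://localhost:8888 assumes the -p 8888:80 mapping from the docker run example, and the /add path and POST verb are assumptions, since this section does not state them. The request is built, not sent.

```python
import json
import urllib.parse
import urllib.request

MATCH_URL = "http://localhost:8888"  # assumption: host port from the docker run example


def build_add_request(image_url: str, filepath: str, metadata=None) -> urllib.request.Request:
    """Build (but do not send) a form-encoded request that adds an image by URL."""
    fields = {"url": image_url, "filepath": filepath}
    if metadata is not None:
        # metadata is documented as a JSON object, so serialize it into one form field.
        fields["metadata"] = json.dumps(metadata)
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        f"{MATCH_URL}/add",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",  # assumed HTTP verb
    )
```

To send it, pass the request to urllib.request.urlopen and decode the JSON body; a "status" of "ok" matches the sample response above.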
Deletes an image signature from the database.
- filepath (required)
  The path of the image signature in the database.
{
  "status": "ok",
  "error": [],
  "method": "delete",
  "result": []
}
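Every response shares the same envelope (status, error, method, result), and for delete the status is the only signal, since result is empty. A sketch of a helper that checks the envelope and returns result; treating any status other than "ok", or a non-empty error list, as a failure is an assumption, because this section only shows successful responses. The names unwrap and MatchError are hypothetical.

```python
import json


class MatchError(RuntimeError):
    """Raised when a Match response reports a failure."""

    def __init__(self, method, errors):
        super().__init__(f"{method} failed: {errors}")
        self.method = method
        self.errors = errors


def unwrap(response_text: str) -> list:
    """Validate the common response envelope and return its result list."""
    payload = json.loads(response_text)
    # Assumption: a non-"ok" status or a non-empty error list means the call failed.
    if payload.get("status") != "ok" or payload.get("error"):
        raise MatchError(payload.get("method"), payload.get("error", []))
    return payload.get("result", [])
```

Centralizing the check keeps callers from silently ignoring a failed delete, whose successful result is indistinguishable from an empty one.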
Searches for a similar image in the database. Scores range from 0 to 100, with 100 being a perfect match.
- url or image (required)
  The image to search for. It may be provided as a URL via url or as a multipart/form-data file upload via image.
- all_orientations (default: true)
  Whether to also search for 90-degree rotations of the image.
{
  "status": "ok",
  "error": [],
  "method": "search",
  "result": [
    {
      "score": 99.0,
      "filepath": "http://static.wixstatic.com/media/0149b5_345c8f862e914a80bcfcc98fcd432e97.jpg_srz_614_709_85_22_0.50_1.20_0.00_jpg_srz"
    }
  ]
}
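Because search scores run from 0 to 100, a caller usually applies its own cutoff before treating a hit as a match. The sketch below parses a search response and keeps hits at or above a threshold, best first. The function name matches_above and the default cutoff of 90 are illustrative choices, not values Match prescribes.

```python
import json


def matches_above(response_text: str, min_score: float = 90.0) -> list:
    """Return (filepath, score) pairs from a search response, highest score first."""
    payload = json.loads(response_text)
    if payload.get("status") != "ok":
        raise RuntimeError(f"search failed: {payload.get('error')}")
    hits = [
        (hit["filepath"], hit["score"])
        for hit in payload["result"]
        if hit["score"] >= min_score
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

The right cutoff depends on your data: near-duplicate detection tolerates a high threshold, while "visually similar" browsing may want a lower one.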
Compares two images, returning a score for their similarity. Scores range from 0 to 100, with 100 being a perfect match.
- url1 or image1, url2 or image2 (required)
  The images to compare. Each may be provided as a URL via url1/url2 or as a multipart/form-data file upload via image1/image2.
{
  "status": "ok",
  "error": [],
  "method": "compare",
  "result": [
    {
      "score": 99.0
    }
  ]
}
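When the two images are local files rather than URLs, the compare parameters go in a multipart/form-data body. Python's standard library has no multipart encoder, so the sketch below builds one by hand. The base URL, the /compare path, and the POST verb are assumptions, and the helper names are hypothetical; the request is built, not sent.

```python
import urllib.request
import uuid

MATCH_URL = "http://localhost:8888"  # assumption: host port from the docker run example


def encode_multipart(files: dict) -> tuple:
    """Encode {field: (filename, data)} as multipart/form-data.

    Returns (body, content_type_header).
    """
    # A random 32-hex-digit boundary is very unlikely to occur inside image bytes.
    boundary = uuid.uuid4().hex
    chunks = []
    for field, (filename, data) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        chunks.append(header.encode("utf-8") + data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_compare_request(first: bytes, second: bytes) -> urllib.request.Request:
    """Build (but do not send) a compare request that uploads two images."""
    body, content_type = encode_multipart({
        "image1": ("image1.jpg", first),
        "image2": ("image2.jpg", second),
    })
    return urllib.request.Request(
        f"{MATCH_URL}/compare",  # assumed endpoint path
        data=body,
        headers={"Content-Type": content_type},
        method="POST",  # assumed HTTP verb
    )
```

With the third-party requests library, passing files= to requests.post builds an equivalent body for you.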
Counts the number of image signatures in the database.
{
  "status": "ok",
  "error": [],
  "method": "count",
  "result": [420]
}
Lists the file paths for the image signatures in the database.
- offset (default: 0)
  The position in the database at which to begin listing image paths.
- limit (default: 20)
  The number of image paths to retrieve.
{
  "status": "ok",
  "error": [],
  "method": "list",
  "result": [
    "http://img.youtube.com/vi/iqPqylKy-bY/0.jpg",
    "https://i.ytimg.com/vi/zbjIwBggt2k/hqdefault.jpg",
    "https://s-media-cache-ak0.pinimg.com/736x/3d/67/6d/3d676d3f7f3031c9fd91c10b17d56afe.jpg"
  ]
}
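Because list returns at most limit paths per call, enumerating the whole database means stepping offset forward until a short page comes back. A sketch, where fetch_page is a hypothetical callable that performs the list request with the given offset and limit and returns its result array:

```python
from typing import Callable, Iterator, List


def iter_filepaths(
    fetch_page: Callable[[int, int], List[str]], limit: int = 20
) -> Iterator[str]:
    """Yield every stored filepath by walking offset/limit pages."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        # A page shorter than limit (including an empty one) means we reached the end.
        if len(page) < limit:
            return
        offset += limit
```

Offset paging can skip or repeat paths if images are added or deleted mid-scan. If the endpoint maps offset/limit onto Elasticsearch's from/size (an assumption), Elasticsearch's default index.max_result_window of 10,000 also limits how deep offset can go.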
Checks the health of the server.
{
  "status": "ok",
  "error": [],
  "method": "ping",
  "result": []
}
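A deploy script or monitor can poll ping before routing traffic to a new container. A minimal sketch, assuming the endpoint is served at /ping on the host port from the docker run example (both assumptions); any network error, timeout, or unparseable body counts as unhealthy. The names ping_ok and is_healthy are hypothetical.

```python
import json
import urllib.request

MATCH_URL = "http://localhost:8888"  # assumption: host port from the docker run example


def ping_ok(response_text: str) -> bool:
    """Return True if a ping response body reports status "ok"."""
    try:
        payload = json.loads(response_text)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def is_healthy(base_url: str = MATCH_URL, timeout: float = 2.0) -> bool:
    """Call the (assumed) /ping endpoint and report whether the server is healthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return ping_ok(resp.read().decode("utf-8", errors="replace"))
    except OSError:  # covers URLError, HTTPError, refused connections, and timeouts
        return False
```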
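To build, run, and push the Docker image yourself, set ELASTICSEARCH_URL and use the Makefile targets:
$ export ELASTICSEARCH_URL=https://daisy.us-west-1.es.amazonaws.com
$ make build
$ make run
$ make push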
Match is based on ascribe/image-match, which in turn is based on the paper "An image signature for any kind of image" by Goldberg et al. There is an existing reference implementation which may be better suited to your needs.
Match itself is released under the BSD 3-Clause license. ascribe/image-match is released under the Apache 2.0 license.