This code allows you to train the Visnet model. Visnet, trained on Flipkart's proprietary internal dataset, powers Visual Recommendations at Flipkart. On the publicly available Street2Shop dataset, Visnet achieves state-of-the-art results. Here is the link to the arXiv tech report.
In this repo, we have open-sourced the following:
- Training prototxts of Visnet
- Triplet sampling code, to generate the training files
- A CUDA-based fast K-Nearest Neighbor Search library
- Other auxiliary scripts, such as code to process the Street2Shop dataset, sample triplets, etc.
We plan to add other useful scripts soon, such as:
- Our modifications to Caffe: the image augmentation layer and the triplet accuracy layer, which aid the training of Visnet
Visnet is a Convolutional Neural Network (CNN) trained using a triplet-based deep ranking paradigm. It contains a deep CNN modelled after the VGG-16 network, coupled with parallel shallow convolution layers, so that it captures both high-level and low-level image details simultaneously.
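The ranking objective is not spelled out here. As a rough illustration, a triplet hinge loss of the kind commonly used in deep ranking can be sketched in NumPy. The squared Euclidean distance, the margin value, and the function name are assumptions made for this example and are not taken from the Visnet prototxts.

```python
import numpy as np


def triplet_hinge_loss(q, p, n, margin=1.0):
    """Mean hinge loss over a batch of triplet embeddings.

    q, p, n are arrays of shape (batch, dim) holding the embeddings of the
    query, positive and negative images. The margin is an assumed value.
    """
    q, p, n = (np.asarray(x, dtype=np.float64) for x in (q, p, n))
    d_pos = np.sum((q - p) ** 2, axis=1)  # squared distance query-positive
    d_neg = np.sum((q - n) ** 2, axis=1)  # squared distance query-negative
    # A triplet contributes loss only when the negative is not at least
    # `margin` farther from the query than the positive.
    return float(np.mean(np.maximum(0.0, margin + d_pos - d_neg)))
```

A triplet whose positive is already much closer to the query than its negative contributes zero loss, so training focuses on the triplets that are still ranked incorrectly or too narrowly.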
To train, you need a set of triplets <q,p,n> (query, positive and negative images). For compatibility with Caffe's ImageData layer, you need 3 triplet files (one each for q, p and n). The lines in those files must be aligned: line i in each file should correspond to the i'th triplet.
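As a minimal sketch of the file layout described above, the following helper writes three line-aligned files from a list of triplets. The function name, the output file names (q.txt, p.txt, n.txt) and the dummy label are hypothetical. Each line is written as a path followed by an integer label, the format the ImageData layer reads; scripts/sampler.py may format its output differently.

```python
import os
from contextlib import ExitStack


def write_triplet_files(triplets, out_dir, label=0):
    """Write (q, p, n) image paths into three line-aligned files.

    Line i of q.txt, p.txt and n.txt together describes the i'th triplet.
    The label is a placeholder (assumption); the ranking loss does not
    need a class label.
    """
    os.makedirs(out_dir, exist_ok=True)
    names = ("q", "p", "n")  # hypothetical file names
    paths = [os.path.join(out_dir, name + ".txt") for name in names]
    with ExitStack() as stack:
        files = [stack.enter_context(open(path, "w")) for path in paths]
        for triplet in triplets:
            # Write one line per file so the three files stay aligned.
            for handle, image_path in zip(files, triplet):
                handle.write("%s %d\n" % (image_path, label))
    return paths
```

Writing all three files in a single loop keeps them aligned by construction, which is easier to trust than sorting or shuffling each file separately.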
If you wish to train Visnet on the Street2Shop dataset, you need to:
- Download the Street2Shop dataset (this contains only the image URLs)
- Download the Street2Shop images (have a look at scripts/image_downloader.py)
- Format the data using scripts/create_structured_images.py and scripts/create_wtbi_crops.py
- Use scripts/sampler.py to sample the triplet files
- Change visnet/train.prototxt to point to the location of your triplet files
- Run training using Caffe
We provide PyCaffe code for feature extraction (scripts/feature_extractor.py) and a CUDA-based fast nearest-neighbor search (scripts/cuda_knn.py).
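To show what the nearest-neighbor step computes once features are extracted, here is a brute-force CPU reference in NumPy. It is not the code in scripts/cuda_knn.py. The function name, the argument layout and the use of squared Euclidean distance are assumptions made for this sketch.

```python
import numpy as np


def knn_search(queries, database, k):
    """Return indices and squared distances of the k nearest database rows.

    queries has shape (num_queries, dim); database has shape (num_items, dim).
    """
    queries = np.asarray(queries, dtype=np.float64)
    database = np.asarray(database, dtype=np.float64)
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, computed for all pairs at once.
    q_sq = np.sum(queries ** 2, axis=1, keepdims=True)
    d_sq = np.sum(database ** 2, axis=1)
    dists = q_sq - 2.0 * (queries @ database.T) + d_sq
    # A full sort is simple but not the fastest choice for large databases.
    idx = np.argsort(dists, axis=1)[:, :k]
    return idx, np.take_along_axis(dists, idx, axis=1)
```

The pairwise distance matrix is a single matrix product plus two norm vectors, which is the part of the computation that maps well onto a GPU.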