neuraltalk/example_images at master · MLStudy/neuraltalk


Files in this folder:
7EGRMwN.jpg
89pUfSc.jpg
QmG3nS6.jpg
Readme.md
UbVIl1e.jpg
animals.jpg
cat.jpg
cobra.jpg
diving.jpg
dogdinner.jpg
frog.jpg
gWEHGwf.jpg
hole.jpg
japanese.jpg
jump.jpg
koala.jpg
mic.jpg
pope.jpg
pose.jpg
qjujW6d.png
result.html
result_struct.json
seal.jpg
tasks.txt
vgg_feats.mat
work.jpg

Readme.md

Here we explain how the framework can be used to predict sentences for arbitrary images.

Copy all images you want to predict for into one folder. For example, this folder contains several images collected from Reddit's r/photoshopbattles.
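
The copying step can be scripted. The following is a minimal Python sketch, not part of neuraltalk: collect_images and the IMAGE_EXTS set of accepted extensions are hypothetical names chosen for illustration, and only files sitting directly inside each source folder are considered.

```python
import shutil
from pathlib import Path

# Extensions treated as images (an assumption; adjust to your data).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def collect_images(src_dirs, dest_dir):
    """Copy every image found directly in src_dirs into dest_dir.

    Hypothetical helper. Returns the sorted, de-duplicated list of copied
    file names; a file with the same name as one already in dest_dir
    overwrites it.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in src_dirs:
        for path in sorted(Path(src).iterdir()):
            # Skip subfolders and non-image files such as notes or Readmes.
            if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
                shutil.copy2(path, dest / path.name)
                copied.append(path.name)
    return sorted(set(copied))
```
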
Extract the CNN features for all images with the Matlab script provided in matlab_features_reference, as described in the Readme file in that folder. I want to eventually allow people to extract features with Python, but for now this step has to go through Matlab. The Matlab script needs to be pointed to a file tasks.txt that you should create in the same folder; it lists the images you wish to process, in some order. This folder contains an example tasks.txt. The Matlab script will extract the features into a file called vgg_feats.mat.
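
Writing tasks.txt by hand is error-prone for large folders. Below is a minimal sketch that generates it, assuming the file simply lists one image file name per line, as the example tasks.txt in this folder does; write_tasks_file is a hypothetical helper, not something the repository provides. Sorting the names keeps the order reproducible, which presumably matters because the features in vgg_feats.mat follow the order in which tasks.txt lists the images.

```python
from pathlib import Path

# Extensions treated as images (an assumption; adjust to your data).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def write_tasks_file(folder, tasks_name="tasks.txt"):
    """Write tasks.txt listing every image in folder, one name per line.

    Hypothetical helper; assumes the one-name-per-line layout of the
    example tasks.txt. Returns the names in the order they were written.
    """
    folder = Path(folder)
    names = sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
    # Each name ends with a newline; an empty folder yields an empty file.
    (folder / tasks_name).write_text("".join(n + "\n" for n in names))
    return names
```
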
Now that we have the features, we can run the prediction! Use the script predict_on_images.py. The script takes the path to a model checkpoint and the path to the folder that holds the images, tasks.txt, and the features file vgg_feats.mat. An example invocation is: python predict_on_images.py lstm_model.p -r example_images. The script writes the file result.html, which you can open in your browser to visualize the results.
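
Before running predict_on_images.py it can help to confirm that the folder holds everything the script expects. The sketch below is a hypothetical pre-flight check (check_folder and predict_command are illustrative names, not part of the repository). It assumes tasks.txt has one image name per line, and it builds the example invocation from above using the current interpreter (sys.executable) in place of a bare python.

```python
import sys
from pathlib import Path


def check_folder(folder):
    """Return a list of problems that would stop prediction on folder.

    Hypothetical helper: checks for tasks.txt, vgg_feats.mat and every
    image named in tasks.txt (assumed one name per line). An empty list
    means the folder looks ready.
    """
    folder = Path(folder)
    problems = []
    for required in ("tasks.txt", "vgg_feats.mat"):
        if not (folder / required).is_file():
            problems.append(f"missing {required}")
    tasks = folder / "tasks.txt"
    if tasks.is_file():
        for line in tasks.read_text().splitlines():
            name = line.strip()
            # Blank lines are ignored; every other line must name a file.
            if name and not (folder / name).is_file():
                problems.append(f"missing image {name}")
    return problems


def predict_command(checkpoint, folder):
    """Build the example invocation as an argument list for subprocess."""
    return [sys.executable, "predict_on_images.py", str(checkpoint), "-r", str(folder)]
```

If check_folder("example_images") returns an empty list, the command can be run with subprocess.run(predict_command("lstm_model.p", "example_images"), check=True) from the directory that contains predict_on_images.py.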

Note that the models are trained on a particular dataset (e.g. the COCO dataset), so if you show them images unlike those they saw during training, they will produce garbage. Along with the sentence predictions, I also show the log probabilities. When this value is low (e.g. -10), the model is confused about the image and likely won't make very good predictions. Conversely, higher values (such as -7) indicate that the model is relatively more confident in the outcome.
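
The two example values can be compared directly. Assuming the displayed scores are natural-log probabilities, a sentence scored -7 is e^3, roughly 20 times, more probable under the model than one scored -10. The sketch below turns a score into a rough label, using the -10 and -7 values from the text as cut-offs, and ranks candidate sentences. The helper names and the use of those two values as thresholds are illustrative assumptions, not calibrated settings from neuraltalk.

```python
import math

# Cut-offs borrowed from the examples in the text; rough guides only.
LOW_CONFIDENCE = -10.0
HIGH_CONFIDENCE = -7.0


def confidence_label(logprob):
    """Map a sentence log probability to a rough confidence label."""
    if logprob <= LOW_CONFIDENCE:
        return "low"
    if logprob >= HIGH_CONFIDENCE:
        return "high"
    return "medium"


def relative_likelihood(logprob_a, logprob_b):
    """How many times more probable sentence a is than b (natural logs)."""
    return math.exp(logprob_a - logprob_b)


def rank_predictions(preds):
    """Sort (sentence, logprob) pairs from most to least confident."""
    return sorted(preds, key=lambda p: p[1], reverse=True)
```

For example, relative_likelihood(-7, -10) evaluates to about 20.09, and confidence_label(-8.5) falls between the two cut-offs and returns "medium".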
