- Answers are mapped to a 1000-word vocabulary, which covers 87% of the answers across the training and validation datasets.
- The LSTM+VIS model is defined in vis_lstm.py. The input tensors for training are fc7 features, questions (word indices, up to 22 words), and answers (one-hot encoded vectors of size 1000). The model depicted in the figure is implemented with 2 LSTM layers by default (num_layers is configurable).
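
The answer vocabulary and input encodings described above can be sketched in plain Python. This is only an illustration, not the repository's actual preprocessing code: the function names, the padding index `0`, and the choice to pad at the end of the question are assumptions made for the example.

```python
from collections import Counter

MAX_QUESTION_LEN = 22     # questions are word indices of up to 22 words
ANSWER_VOCAB_SIZE = 1000  # answers are mapped to a 1000-word vocabulary


def build_answer_vocab(answers, size=ANSWER_VOCAB_SIZE):
    """Map the `size` most frequent answers to integer indices."""
    counts = Counter(answers)
    return {ans: idx for idx, (ans, _) in enumerate(counts.most_common(size))}


def coverage(answers, vocab):
    """Fraction of answers that fall inside the vocabulary."""
    return sum(a in vocab for a in answers) / len(answers)


def one_hot(index, size=ANSWER_VOCAB_SIZE):
    """One-hot vector of length `size` with a 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def pad_question(word_indices, max_len=MAX_QUESTION_LEN, pad_index=0):
    """Truncate to `max_len`, then pad at the end (padding side is an assumption)."""
    clipped = list(word_indices)[:max_len]
    return clipped + [pad_index] * (max_len - len(clipped))


# Toy data: with a vocabulary of size 2, only "yes" and "no" are kept.
toy_answers = ["yes", "yes", "no", "red", "yes", "no", "2"]
toy_vocab = build_answer_vocab(toy_answers, size=2)
toy_coverage = coverage(toy_answers, toy_vocab)  # 5 of 7 answers are covered
```

On the real dataset, the same coverage calculation with a vocabulary of 1000 answers is what yields the 87% figure quoted above.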
## Sample Predictions
The fun part! Try it for yourself. Make sure you have TensorFlow installed. Download the data files and the trained model from [this link][9] and save them in the ```Data/``` directory. To test the model on an image, run:
```
python predict.py --image_path="Data/sample.jpg" --question="Which animal is this?" --model_path="Data/model7.ckpt"
```
| Image | Question | Top Answers (left to right) |
| ------------- |:-------------:| -----:|
| | What color is the signal? | red, green, yellow |
| | What animal is this? | giraffe, cow, horse |
| | What animal is this? | cat, dog, giraffe |
| | What color is the frisbee that is in the dog's mouth? | white, brown, red |
| | What color is the frisbee that is upside down? | red, white, blue |
| | What are they playing with? | frisbee, soccer ball, soccer |
| | What is in the standing person's hand? | bat, glove, ball |

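
The "Top Answers" column lists the highest-ranked answers from left to right. As a rough sketch of how such a ranking can be produced, assuming the model emits one score per entry in the answer vocabulary (how `predict.py` does this internally is not shown here, and `softmax`, `top_answers`, and the toy scores below are hypothetical):

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities in a numerically stable way."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_answers(scores, index_to_answer, k=3):
    """Return the k answers with the highest probability, best first."""
    probs = softmax(scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [index_to_answer[i] for i in ranked[:k]]


# Toy vocabulary and scores, for illustration only.
index_to_answer = ["red", "green", "yellow", "blue"]
scores = [2.5, 1.0, 0.3, -1.2]
print(top_answers(scores, index_to_answer))  # ['red', 'green', 'yellow']
```

Because softmax preserves the order of the scores, sorting the raw scores would give the same ranking; the probabilities are only needed if they are reported alongside the answers.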
## References
- [Exploring Models and Data for Image Question Answering][1]