Image Captioning is the process of generating a textual description of an image. It uses both Natural Language Processing and Computer Vision to generate the captions. The image is encoded into a feature vector with a ResNet50 model and decoded with an LSTM followed by a time-distributed layer.
https://www.kaggle.com/shadabhussain/flickr8k
This dataset consists of 8,000 images, split into train (6,000 images), validation (1,000 images), and test (1,000 images) sets. Each image has 5 captions with similar meaning.
The Convolutional Neural Network (CNN) can be thought of as the encoder. The input image is passed through the CNN to extract features, and the last hidden state of the CNN is connected to the decoder. I used ResNet50 to encode the images by removing the top layers and feeding the flattened feature vector to the decoder.
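A minimal sketch of this encoder step, assuming TensorFlow/Keras and 224x224 input images (the `encode_image` helper and file path are illustrative):

```python
# Encoder sketch: ResNet50 without its classification head, used as a
# fixed feature extractor (assumes TensorFlow/Keras is installed).
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# pooling='avg' collapses the final feature map into a single 2048-d vector
encoder = ResNet50(weights='imagenet', include_top=False, pooling='avg')

def encode_image(path):
    """Return a 2048-d feature vector for one image."""
    img = img_to_array(load_img(path, target_size=(224, 224)))
    img = preprocess_input(np.expand_dims(img, axis=0))
    return encoder.predict(img, verbose=0)[0]  # shape: (2048,)
```

Each image is encoded once up front, and the resulting vectors are what the decoder sees.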
The Long Short Term Memory (LSTM) network acts as the decoder and performs language modelling at the word level. Its first time step receives the encoded output from the encoder together with the START token vector.
The output from the last hidden state of the CNN (encoder) is given to the first time step of the decoder. We set x1 = START vector and the desired label y1 = first word in the sequence. Analogously, we set x2 = word vector of the first word and expect the network to predict the second word. Finally, on the last step, xT = last word and the target label yT = END token.
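A hedged sketch of how such a decoder could be wired up in Keras for this teacher-forcing setup; the vocabulary size, caption length, embedding size, and the way the image vector is merged with the word sequence are assumptions, not the exact model used here:

```python
# Decoder sketch (assumed architecture): the ResNet50 vector is projected and
# repeated across time steps, concatenated with the embedded caption words,
# and fed through an LSTM with a time-distributed softmax over the vocabulary.
from tensorflow.keras.layers import (Input, Dense, Embedding, LSTM,
                                     TimeDistributed, RepeatVector, concatenate)
from tensorflow.keras.models import Model

vocab_size, max_len, embed_dim = 8000, 40, 256   # assumed placeholder values

# Image branch: project the 2048-d encoder output and repeat it per time step
img_in = Input(shape=(2048,))
img_feat = RepeatVector(max_len)(Dense(embed_dim, activation='relu')(img_in))

# Text branch: embed the ground-truth caption words (teacher forcing)
txt_in = Input(shape=(max_len,))
txt_feat = Embedding(vocab_size, embed_dim)(txt_in)

# Merge both streams and model the sequence with an LSTM
x = concatenate([img_feat, txt_feat])
x = LSTM(256, return_sequences=True)(x)
out = TimeDistributed(Dense(vocab_size, activation='softmax'))(x)

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

During training, the input caption at each step is the true previous word and the target at that step is the next word, exactly as described above.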
At inference time, the image representation is again provided to the first time step of the decoder. We set x1 = START vector and compute the distribution over the first word y1, sample a word from that distribution (or pick the argmax), set its embedding as x2, and repeat this process until the END token is generated.
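A greedy-decoding sketch of that loop, assuming the model defined above and hypothetical `word_to_idx` / `idx_to_word` lookups with `<start>` and `<end>` tokens:

```python
# Greedy decoding sketch: feed the words generated so far, take the argmax of
# the distribution at the current step, and stop at <end> or the length limit.
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(model, img_vec, word_to_idx, idx_to_word, max_len=40):
    words = ['<start>']
    for t in range(max_len - 1):
        seq = [word_to_idx[w] for w in words if w in word_to_idx]
        seq = pad_sequences([seq], maxlen=max_len, padding='post')
        preds = model.predict([img_vec[np.newaxis, :], seq], verbose=0)
        next_word = idx_to_word[int(np.argmax(preds[0, t]))]  # argmax at step t
        if next_word == '<end>':
            break
        words.append(next_word)
    return ' '.join(words[1:])
```

Sampling from the predicted distribution instead of taking the argmax, or keeping several candidates with beam search, would be drop-in variations of the same loop.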