simple-image-caption-with-Pytorch

This is a reimplementation of the basic image captioning architecture (CNN-RNN). CNN: ResNet18, RNN: LSTM, dataset: MSCOCO, toolkit: PyTorch.

Directory


Background

Image captioning is a set of techniques that helps a computer understand a given picture and describe it in natural language.


Algorithm

  1. Extract features from the input images with a convolutional neural network (in this work, a pretrained ResNet18); a minimal sketch is given after the figure below.
  • Input: batch of images of shape (N, C, H, W)
  • Output: batch of features of shape (N, D)
    N: batch size, C: image channels (RGB), H: image height, W: image width, D: feature dimension (512)

(Figure: feature extraction with the pretrained ResNet18.)
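
A minimal sketch of this encoder in PyTorch (the class name `EncoderCNN` and the frozen backbone are illustrative assumptions, not necessarily how this repository implements it):

```python
import torch
import torchvision.models as models

class EncoderCNN(torch.nn.Module):
    """Pretrained ResNet18 with its final classification layer removed,
    so each image is mapped to a 512-d feature vector."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet18(pretrained=True)
        # Keep everything up to (and including) the average pool; drop the fc layer.
        self.backbone = torch.nn.Sequential(*list(resnet.children())[:-1])
        for p in self.backbone.parameters():
            p.requires_grad = False          # use ResNet18 as a frozen feature extractor

    def forward(self, images):               # images: (N, C, H, W)
        feats = self.backbone(images)        # (N, 512, 1, 1)
        return feats.flatten(1)              # (N, D) with D = 512

encoder = EncoderCNN().eval()
images = torch.randn(4, 3, 224, 224)         # dummy batch: N=4, C=3, H=W=224
with torch.no_grad():
    features = encoder(images)
print(features.shape)                        # torch.Size([4, 512]) -> (N, D)
```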

  2. Encode each sentence into an index vector with a dictionary, adding `<start>`, `<end>`, and `<pad>` tokens; see the sketch after the figure below.
  • Input: batch of strings of shape (N, *)
  • Output: batch of vectors of shape (N, L)
    N: batch size, *: length of the sentence, L: fixed length of the encoded vector

(Figure: caption encoding with the dictionary and special tokens.)
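
A minimal sketch of this encoding step, assuming a toy word-to-index dictionary and a fixed length L = 10 (both are assumptions for illustration; the actual vocabulary is built from the MSCOCO captions):

```python
# Toy dictionary; the real one maps every vocabulary word to an index.
word2idx = {'<pad>': 0, '<start>': 1, '<end>': 2, '<unk>': 3,
            'a': 4, 'dog': 5, 'runs': 6, 'on': 7, 'the': 8, 'beach': 9}

def encode_caption(caption, word2idx, max_len=10):
    """Turn a raw caption string into a fixed-length vector of word indices."""
    tokens = ['<start>'] + caption.lower().split() + ['<end>']
    ids = [word2idx.get(tok, word2idx['<unk>']) for tok in tokens]
    ids = ids[:max_len]                                 # truncate long captions
    ids += [word2idx['<pad>']] * (max_len - len(ids))   # pad short captions to length L
    return ids

print(encode_caption('a dog runs on the beach', word2idx))
# [1, 4, 5, 6, 7, 8, 9, 2, 0, 0]  -> shape (L,) per caption, (N, L) per batch
```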

  3. Use a long short-term memory (LSTM) network as the RNN to generate the caption; a sketch is given after the figure below.
  • Input: batch of encoded captions of shape (N, L, C)
  • Initial hidden state: extracted features of shape (N, D)
  • Output: (N, L, C)
    C: dictionary size

(Figure: LSTM caption generation.)
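
A minimal sketch of the decoder, assuming the caption enters as word indices through an embedding layer rather than as one-hot vectors, with illustrative layer sizes (`embed_dim`, `hidden_dim`); the image feature initialises the LSTM hidden and cell states, and a linear layer maps each step to scores over the dictionary of size C:

```python
import torch
import torch.nn as nn

class DecoderRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, feat_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.init_h = nn.Linear(feat_dim, hidden_dim)    # image feature -> h0
        self.init_c = nn.Linear(feat_dim, hidden_dim)    # image feature -> c0
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)      # hidden state -> dictionary scores

    def forward(self, features, captions):               # (N, D), (N, L)
        h0 = self.init_h(features).unsqueeze(0)          # (1, N, hidden_dim)
        c0 = self.init_c(features).unsqueeze(0)
        emb = self.embed(captions)                       # (N, L, embed_dim)
        out, _ = self.lstm(emb, (h0, c0))                # (N, L, hidden_dim)
        return self.fc(out)                              # (N, L, C)

decoder = DecoderRNN(vocab_size=10)
scores = decoder(torch.randn(4, 512), torch.randint(0, 10, (4, 7)))
print(scores.shape)                                      # torch.Size([4, 7, 10]) -> (N, L, C)
```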


Example result

The experiment metrics are as follows:
(Figure: evaluation metrics.)

Several generated captions:
(Figure: example generated captions.)

