dian3f/image-captioning

PyTorch implementation of text recognition based on Fairseq

 
 


Introduction

Update 06.11, 2019: I have rewritten this image captioning repo on top of Fairseq. For a stable version, refer to the crnn branch, which has pre-trained model checkpoints. The current branch is under construction. Suggestions and cooperation on the Fairseq text recognition project are very welcome.

It provides reference implementations of image captioning / text recognition models, including a CRNN trained with CTC loss and an attention-based decoder.

Features

  • All features of Fairseq
  • Flexible configuration of the convolutional and recurrent layers in the CRNN
  • Positional encoding of images (see the sketch after this list)
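
As a rough illustration of the positional-encoding feature, the sketch below adds a 2D sinusoidal positional encoding to a backbone feature map. It is a minimal sketch under assumptions: the helper name pos_encode_2d and the channel layout are made up here and need not match the repo's actual implementation.

import math
import torch

def pos_encode_2d(feat):
    # feat: (batch, channels, height, width) feature map from the CNN backbone.
    # The first half of the channels encodes the row index, the second half the
    # column index, each with the usual sine/cosine scheme.
    b, c, h, w = feat.shape
    assert c % 4 == 0, "channel count must be divisible by 4"
    quarter = c // 4
    div = torch.exp(torch.arange(quarter, dtype=torch.float32)
                    * (-math.log(10000.0) / quarter))
    ys = torch.arange(h, dtype=torch.float32).unsqueeze(1) * div   # (h, quarter)
    xs = torch.arange(w, dtype=torch.float32).unsqueeze(1) * div   # (w, quarter)
    pe = torch.zeros(c, h, w)
    pe[:quarter] = torch.sin(ys).t().unsqueeze(2).expand(-1, -1, w)
    pe[quarter:2 * quarter] = torch.cos(ys).t().unsqueeze(2).expand(-1, -1, w)
    pe[2 * quarter:3 * quarter] = torch.sin(xs).t().unsqueeze(1).expand(-1, h, -1)
    pe[3 * quarter:] = torch.cos(xs).t().unsqueeze(1).expand(-1, h, -1)
    return feat + pe.to(feat.device).unsqueeze(0)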

Requirements and Installation

  • PyTorch (there is a bug in nn.CTCLoss that is fixed in the nightly build; see the sanity check after this list)
  • Python version >= 3.6
  • Fairseq version >= 0.6.2
  • For training new models, you'll also need an NVIDIA GPU and NCCL
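
Since the CRNN recipe below relies on torch.nn.CTCLoss, a quick sanity check like the following (random data, illustrative shapes only) can confirm that the installed build computes the loss without hitting the bug:

import torch
import torch.nn as nn

# T time steps, N batch size, C classes; index 0 is the CTC blank symbol.
T, N, C = 32, 4, 11
log_probs = torch.randn(T, N, C).log_softmax(2)            # (T, N, C)
targets = torch.randint(1, C, (N, 14), dtype=torch.long)   # (N, S), no blanks
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 14, dtype=torch.long)

loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
print(loss.item())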

Usage

  • Navigate (cd) to the root of the toolbox [YOUR_IMAGE_CAPTIONING_ROOT].

Annotation file format

Each line in the annotation file has the following format:

img_path char1 char2 char3 char4 char5 ...

where each char is a single character of the target sequence.

For example, suppose the task is to recognize digits in an image and the alphabet is "0123456789". If an image named "00120_00091.jpg" in the folder [DATA]/images contains the string "99353361056742", then [DATA]/train.txt or [DATA]/valid.txt should contain the line:

00120_00091.jpg 9 9 3 5 3 3 6 1 0 5 6 7 4 2
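
For reference, an annotation file in this format can be produced with a few lines of Python; the samples list and the output filename below are made up for illustration:

# Each line: image filename, then the label split into space-separated characters.
samples = [
    ("00120_00091.jpg", "99353361056742"),
    ("00120_00092.jpg", "31415926535"),
]

with open("train.txt", "w") as f:
    for img_name, label in samples:
        f.write(img_name + " " + " ".join(label) + "\n")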

Training

Training strategy (Attention):

python -m image_captioning.train [DATA] \
    --task image_captioning --arch decoder_attention \
    --decoder-embed-dim 384 --backbone densenet121 \
    --decoder-layers 2 --batch-size 16 --dropout 0.0 \
    --max-epoch 100 --criterion cross_entropy --num-workers 4 \
    --optimizer adam --adam-eps 1e-04 --lr 0.001 --min-lr 1e-09 \
    --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --weight-decay 0.0 \
    --no-token-crf --save-interval 1

Training strategy (CRNN):

python -m image_captioning.train [DATA] \
    --task image_captioning --arch decoder_crnn \
    --decoder-layers 2 --batch-size 16 \
    --max-epoch 50 --criterion ctc_loss --num-workers 4 \
    --optimizer adam --adam-eps 1e-04 --lr 0.001 --min-lr 1e-09 \
    --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --weight-decay 0.0 \
    --save-interval 1

Testing

Use trained model to test (Attention):

python -m image_captioning.generate [DATA] \
    --arch decoder_attention --path checkpoints/checkpoint_best.pt \
    --decoder-embed-dim 384 --backbone densenet121 \
    --task image_captioning \
    --buffer-size 4 --num-workers 4 --gen-subset valid \
    --beam 1 --batch-size 4 --quiet

Use trained model to test (CRNN):

python -m image_captioning.generate [DATA] \
    --arch decoder_crnn --path checkpoints/checkpoint_best.pt \
    --task image_captioning --criterion ctc_loss \
    --sacrebleu \
    --buffer-size 4 --num-workers 4 --gen-subset valid \
    --batch-size 4 --quiet
