This software is a PyTorch implementation of the Convolutional Recurrent Neural Network (CRNN) described in the paper:
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition, Baoguang Shi, Xiang Bai, Cong Yao, PAMI 2017 [arXiv]
The following architectures are implemented (the names in parentheses are the values accepted by --arch, i.e. args.arch):
- DenseNet + CTCLoss (densenet_cifar, densenet121)
- ResNet + CTCLoss (resnet_cifar)
- MobileNetV2 + CTCLoss (mobilenetv2_cifar)
- ShuffleNetV2 + CTCLoss (shufflenetv2_cifar)
In order to run this toolbox you will need:
- Python3 (tested with Python 3.6+)
- PyTorch deep learning framework (tested with version 1.0.1)
The demo reads an example image and recognizes its text content. See the demo notebook for all the details.
Example image:
Expected output:
-停--下--来--,--看--着--那--些--握--着------ => 停下来,看着那些握着
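The raw string on the left of the expected output looks like a per-frame CTC prediction, with "-" standing for the blank symbol. A minimal sketch of how such a string collapses into the final text, assuming greedy CTC decoding (merge consecutive repeats, then drop blanks); the function name `ctc_collapse` is hypothetical and not taken from the toolbox:

```python
BLANK = "-"  # assumed blank symbol, as it appears in the raw output above


def ctc_collapse(raw: str, blank: str = BLANK) -> str:
    """Collapse a per-frame CTC output string into the final text."""
    decoded = []
    previous = None
    for symbol in raw:
        # Keep a symbol only if it is not blank and differs from the previous frame.
        if symbol != blank and symbol != previous:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


raw_output = "-停--下--来--,--看--着--那--些--握--着------"
print(ctc_collapse(raw_output))  # 停下来,看着那些握着
```

A blank between two identical symbols keeps both of them, which is how CTC represents genuinely doubled characters.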
- Navigate (cd) to the root of the toolbox [YOUR_CRNN_ROOT].
- Resize each image to a height of 32; the width should be divisible by 8.
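The resizing rule above can be sketched as a small helper that computes the target size for an image. This is an illustration only: keeping the aspect ratio and rounding the width to the nearest multiple of 8 are assumptions, and the toolbox itself may pad, crop, or use a fixed width instead. The name `target_size` is hypothetical.

```python
from typing import Tuple

TARGET_HEIGHT = 32   # required height, from the text above
WIDTH_MULTIPLE = 8   # the width must be divisible by this, from the text above


def target_size(width: int, height: int) -> Tuple[int, int]:
    """Return (new_width, new_height) for an image of the given size."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Scale the width by the same factor that brings the height to 32.
    scaled_width = width * TARGET_HEIGHT / height
    # Round to the nearest multiple of 8, never going below one multiple.
    # (Assumption: the toolbox may handle the width differently.)
    multiples = int(scaled_width / WIDTH_MULTIPLE + 0.5)
    new_width = max(WIDTH_MULTIPLE, multiples * WIDTH_MULTIPLE)
    return new_width, TARGET_HEIGHT


print(target_size(560, 64))  # (280, 32)
print(target_size(100, 32))  # (104, 32)
```

The returned size can then be passed to the resize function of whichever image library is in use.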
The dataset follows YCG09's SynthText; the image size is 32x280. The original images can be downloaded from BaiduYun (pw: lu7m); untar them into the directory [DATASET_ROOT_DIR].
Each line of the annotation file has the format:
img_path encode1 encode2 encode3 encode4 encode5 ...
where encode1, encode2, ... are the encoded labels of the text sequence in the image.
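A minimal sketch of reading one annotation line in the format above. It assumes that the fields are separated by whitespace, that the image path contains no spaces, and that each encode is an integer label; none of this is confirmed by the toolbox, and the function name and the example path are hypothetical.

```python
from typing import List, Tuple


def parse_annotation_line(line: str) -> Tuple[str, List[int]]:
    """Split 'img_path encode1 encode2 ...' into a path and a list of labels."""
    parts = line.split()
    if len(parts) < 2:
        raise ValueError("expected an image path followed by at least one encode")
    img_path, encodes = parts[0], parts[1:]
    # Assumption: every encode is an integer label.
    return img_path, [int(e) for e in encodes]


# Hypothetical example line; real paths and labels come from the dataset.
path, labels = parse_annotation_line("images/0001.jpg 12 7 431 90\n")
print(path, labels)  # images/0001.jpg [12, 7, 431, 90]
```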
The alphabet contains 5989 characters in total (Chinese characters, English letters, digits and punctuation) and can be downloaded from OneDrive or BaiduYun (pw: d654). Put the downloaded file alphabet_decode_5990.txt into the directory [DATASET_ROOT_DIR].
Pre-trained models, trained with the densenet121 architecture, can be found on OneDrive or BaiduYun (pw: riuh). P.S. The current pre-trained model is rough; I hope to have time to improve it later.
Training strategy:
python ./main.py --arch densenet121 --alphabet [DATASET_ROOT_DIR]/alphabet_decode_5990.txt --dataset-root [DATASET_ROOT_DIR] --lr 5e-5 --optimizer rmsprop --gpu-id 0 --not-pretrained
Use a trained model to test:
python ./main.py --arch densenet121 --alphabet [DATASET_ROOT_DIR]/alphabet_decode_5990.txt --dataset-root [DATASET_ROOT_DIR] --lr 5e-5 --optimizer rmsprop --gpu-id 0 --resume densenet121_pretrained.pth.tar --test-only
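To make the flags in the two commands easier to read, here is a hypothetical argparse sketch that accepts the same options. The flag names and the --arch values come from this README; the types, defaults, and help texts are assumptions, and the real main.py may define them differently.

```python
import argparse

# Architecture names listed earlier in this README.
ARCHS = ["densenet_cifar", "densenet121", "resnet_cifar",
         "mobilenetv2_cifar", "shufflenetv2_cifar"]


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser; types and defaults are assumptions."""
    parser = argparse.ArgumentParser(description="CRNN train/test (sketch)")
    parser.add_argument("--arch", choices=ARCHS, default="densenet121")
    parser.add_argument("--alphabet", required=True, help="path to the alphabet file")
    parser.add_argument("--dataset-root", required=True, help="dataset directory")
    parser.add_argument("--lr", type=float, default=5e-5, help="learning rate")
    parser.add_argument("--optimizer", default="rmsprop")
    parser.add_argument("--gpu-id", type=int, default=0)
    parser.add_argument("--not-pretrained", action="store_true")
    parser.add_argument("--resume", default="", help="checkpoint to load")
    parser.add_argument("--test-only", action="store_true")
    return parser


# Parse the arguments of the test command shown above.
args = build_parser().parse_args([
    "--arch", "densenet121",
    "--alphabet", "[DATASET_ROOT_DIR]/alphabet_decode_5990.txt",
    "--dataset-root", "[DATASET_ROOT_DIR]",
    "--lr", "5e-5", "--optimizer", "rmsprop", "--gpu-id", "0",
    "--resume", "densenet121_pretrained.pth.tar", "--test-only",
])
print(args.arch, args.lr, args.test_only)  # densenet121 5e-05 True
```

The training and test commands differ only in their last flags: --not-pretrained for training, and --resume with --test-only for evaluating a saved checkpoint.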