awesome-ocr

Some awesome OCR papers.

Text spotting

2023

CVPR 2023

Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, Dacheng Tao. DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting[C]. CVPR 2023; code:[code]

2022

CVPR 2022

Xiang Zhang, Yongwen Su, Subarna Tripathi, Zhuowen Tu. Text Spotting Transformers[C]. CVPR 2022; code:[code]
Mingxin Huang, Yuliang Liu, Zhenghao Peng, Chongyu Liu, Dahua Lin, Shenggao Zhu, Nicholas Yuan, Kai Ding, Lianwen Jin. SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition[C]. CVPR 2022; code:[code]
Yair Kittenplon, Inbal Lavi, Sharon Fogel, Yarin Bar, R. Manmatha, Pietro Perona. Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer[C]. CVPR 2022;

2017

Kang C, Kim G, Yoo S I. Detection and Recognition of Text Embedded in Online Images via Neural Context Models[C]//AAAI. 2017: 4103-4110.
code:[code]
Bartz C, Yang H, Meinel C. STN-OCR: A single Neural Network for Text Detection and Text Recognition[J]. arXiv preprint arXiv:1707.08831, 2017.
code:[code]
Busta M, Neumann L, Matas J. Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework[C]//CVPR 2017: 2204-2212.[code]

2016

Gómez L, Karatzas D. Textproposals: a text-specific selective search algorithm for word spotting in the wild[J]. Pattern Recognition, 2017, 70: 60-74.[code]

2014

Almazán J, Gordo A, Fornés A, et al. Word spotting and recognition with embedded attributes[J]. IEEE transactions on pattern analysis and machine intelligence, 2014, 36(12): 2552-2566.
code:[code]
Jaderberg M, Vedaldi A, Zisserman A. Deep features for text spotting[C]//European conference on computer vision. Springer, Cham, 2014: 512-528.
code:[code]

Text Detection

AAAI 2023

Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Bo Du, Dacheng Tao. DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer[C]. AAAI 2023; code:[code]

2022

CVPR 2022

Shangbang Long, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, Michalis Raptis. Towards End-to-End Unified Scene Text Detection and Layout Analysis[C]. CVPR 2022; code:[code]
Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao. Vision-Language Pre-Training for Boosting Scene Text Detectors[C]. CVPR 2022; code:[code]
Jingqun Tang, Wenqing Zhang, Hongye Liu, MingKun Yang, Bo Jiang, Guanglong Hu, Xiang Bai. Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection[C]. CVPR 2022; code:[code]
Xixi Xu, Zhongang Qi, Jianqi Ma, Honglun Zhang, Ying Shan, Xiaohu Qie. BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild[C]. CVPR 2022; code:[code]

2021

【CentripetalText】 PKU. Sheng, Tao, Jie Chen, and Zhouhui Lian. CentripetalText: An Efficient Text Instance Representation for Scene Text Detection, NeurIPS 2021
【MAYOR】 UCAS. Qin, Xugong, Weiping Wang et al. Mask is all you need: Rethinking mask r-cnn for dense and arbitrary-shaped scene text detection, ACMMM 2021.
【PCR】 Dai, Pengwen, et al. Progressive Contour Regression for Arbitrary-Shape Scene Text Detection, CVPR 2021. code:[code]
【MOST】 He, Minghang, Xiang Bai et al. MOST: A Multi-Oriented Scene Text Detector with Localization Refinement, CVPR 2021.
【FCENet】Zhu, Yiqin, LianWen Jin et al. Fourier contour embedding for arbitrary-shaped text detection, CVPR 2021.
【STKM】 Wan, Qi, Haoqin Ji, and Linlin Shen. Self-Attention Based Text Knowledge Mining for Text Detection, CVPR 2021. code:code
【Video Text Detection】Feng, Wei, Cheng-lin Liu et al. Semantic-Aware Video Text Detection, CVPR 2021.
【TextSeg】 Xu, Xingqian, et al. Rethinking text segmentation: A novel dataset and a text-specific refinement approach, CVPR 2021. code:code

2019

【ALCHEMY】 CMU, PKU, MEGVII. Alchemy: Techniques for Rectification Based Irregular Scene Text Recognition, arXiv 2019
【CRAFT】 Clova AI Research, NAVER Corp. Character Region Awareness for Text Detection, CVPR19
【TIoU-metric】South China University of Technology. Tightness-Aware Evaluation Protocol for Scene Text Detection, CVPR19
【PSENet】 Nanjing University. Shape Robust Text Detection With Progressive Scale Expansion Network, CVPR19
【Curve-Text-Detector】 Yuliang L, Lianwen J, Shuaitao Z, et al. Detecting Curve Text in the Wild: New Dataset and New Solution

2018

【TextBoxes++】 Minghui Liao, Baoguang Shi, and Xiang Bai. TextBoxes++: A Single-Shot Oriented Scene Text Detector, TIP 2018
【Multi-Oriented】Lyu, Pengyuan and Yao, Cong and Wu, Wenhao and Yan, Shuicheng and Bai, Xiang. Multi-oriented scene text detection via corner localization and region segmentation, CVPR 2018

2017

【TextBoxes】 Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu. TextBoxes: A Fast Text Detector with a Single Deep Neural Network
【Seglink】 Shi B, Bai X, Belongie S. Detecting Oriented Text in Natural Images by Linking Segments[J]. arXiv preprint arXiv:1703.06520, 2017.
code:[code]
【EAST】 Zhou X, Yao C, Wen H, et al. EAST: An Efficient and Accurate Scene Text Detector[J]. arXiv preprint arXiv:1704.03155, 2017.
code:[code]
【SSTD】 He P, Huang W, He T, et al. Single shot text detector with regional attention[C]//The IEEE International Conference on Computer Vision (ICCV). 2017.
code:[code;code]

2016

【CTPN】Tian Z, Huang W, He T, et al. Detecting text in natural image with connectionist text proposal network[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 56-72.
code:[code;cuda8-caffe;official;ocr_detection_ctpn;keras_ocr]
dataset:[ICDAR 2011; ICDAR 2013; ICDAR 2015; SWT; Multilingual dataset]

2015

Gomez L, Karatzas D. Object proposals for text extraction in the wild[C]//Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 2015: 206-210.[code]
Busta M, Neumann L, Matas J. Fastext: Efficient unconstrained scene text detector[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1206-1214.[code]
Zhang Z, Shen W, Yao C, et al. Symmetry-based text line detection in natural scenes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 2558-2567.
code:[code]

2010

Epshtein B, Ofek E, Wexler Y. Detecting text in natural scenes with stroke width transform[C]//Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010: 2963-2970.
code:[code]

Text image recognition

2022

CVPR 2022

Chang Liu, Chun Yang, Xu-Cheng Yin. Open-Set Text Recognition via Character-Context Decoupling[C]. CVPR 2022; code:[code]
Caiyuan Zheng, Hui Li, Seon-Min Rhee, Seungju Han, Jae-Joon Han, Peng Wang. Pushing the Performance Limit of Scene Text Recognizer Without Human Annotation[C]. CVPR 2022; code:[code]
Canjie Luo, Lianwen Jin, Jingdong Chen. SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization[C]. CVPR 2022; code:[code]

2020

[1] [CVPR-2020] R. Litman, O. Anschel, S. Tsiper, R. Litman, S. Mazor, and R. Manmatha, “SCATTER: selective context attentional scene text recognizer,” in Proceedings of CVPR, 2020. paper

[2] [CVPR-2020] D. Yu, X. Li, C. Zhang, J. Han, J. Liu, and E. Ding, “Towards accurate scene text recognition with semantic reasoning networks,” in Proceedings of CVPR, 2020. paper

[3] [ICVGIP-2018] Gupta A, Vedaldi A, Zisserman A. "Learning to read by spelling: Towards unsupervised text recognition," in Proceedings of ICVGIP, 2018. paper

[4] [CVPR-2020] Wan Z, Zhang J, Zhang L, et al, "On Vocabulary Reliance in Scene Text Recognition," in Proceedings of CVPR, 2020. paper

[5] [ECAI-2020] Bleeker M, de Rijke M, "Bidirectional Scene Text Recognition with a Single Decoder," in Proceedings of ECAI, 2020. paper code

[6] [arXiv-2019] Bartz C, Bethge J, Yang H, et al, "KISS: Keeping It Simple for Scene Text Recognition," CoRR abs/1911.08400, 2019. paper code

[7] [arXiv-2020] Zhang C, Xu Y, Cheng Z, et al, "SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition," CoRR abs/2005.13117, 2020. paper

[8] [arXiv-2020] Lin J, Cheng Z, Bai F, et al, "Text Recognition in Real Scenarios with a Few Labeled Samples," CoRR abs/2006.12209, 2020. paper

[9] [ECCV-2020] Zhang C, Gupta A, Zisserman A. "Adaptive Text Recognition through Visual Matching," in Proceedings of ECCV, 2020. paper code

[10] [ECCV-2020] Zhang H, Yao Q, Yang M, et al, "AutoSTR: Efficient Backbone Search for Scene Text Recognition," in Proceedings of ECCV, 2020. paper code

[11] [ECCV-2020] Yan R, Huang Y, "PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit," in Proceedings of ECCV, 2020. paper

[12] [ECCV-2020] Yue X, Kuang Z, Lin C, et al, "RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition," in Proceedings of ECCV, 2020. paper

[13] [CVPR-2020] Zhi Qiao, Yu Zhou, Dongbao Yang, Yucan Zhou, and Weiping Wang. 2020. SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition. In Proceedings of CVPR. paper code

2019

[1] South China University of Technology. Aggregation Cross-Entropy for Sequence Recognition, CVPR19[C]
【MORAN】 Canjie Luo, Lianwen Jin, Zenghui Sun. A Multi-Object Rectified Attention Network for Scene Text Recognition[J]. arXiv preprint arXiv:1901.03003.
[code: Canjie-Luo/MORAN_v2]
【SAR】 Hui Li*, Peng Wang*, Chunhua Shen, Guyu Zhang. Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition[C]. [code: wangpengnorman/SAR-Strong-Baseline-for-Text-Recognition]

2018

2017

Wojna Z, Gorban A, Lee D S, et al. Attention-based Extraction of Structured Information from Street View Imagery[J]. arXiv preprint arXiv:1704.03549, 2017.
code:[official;similar]

2016

He P, Huang W, Qiao Y, et al. Reading Scene Text in Deep Convolutional Sequences[C]//AAAI. 2016: 3501-3508.
code:[code]
Raj D, SAHU S, Anand A. Learning local and global contexts using a convolutional recurrent network model for relation classification in biomedical text[C]//Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). 2017: 311-321.
code:[code]
Smith R, Gu C, Lee D S, et al. End-to-end interpretation of the french street name signs dataset[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 411-426.
code:[code]

2015

Zhong Z, Jin L, Xie Z. High performance offline handwritten chinese character recognition using googlenet and directional feature maps[C]//Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 2015: 846-850.
code:[code]
【CRNN】Shi B, Bai X, Yao C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 39(11): 2298-2304.
code:【1 - official】; 【2 - crnn.pytorch】; 【3 - unfinished】; 【4 - crnn.pytorch-chinese】; 【5 - crnn+stn-tf】; 【6 - lstm+ctc】; 【7 - ctpn+crnn-merge-cannot-train】; 【8 - crnn-mnist-keras】; 【9 - crnn-tf】; 【10 - crnn-tf-could-be-better】; 【11 - crnn.mxnet】; 【12 - crnn-tf-estimators】; 【13 - crnn-attention-tf】; 【14 - crnn.caffe】; 【15 - chinese.ocr-ctpn+crnn-tf+pytorch】; 【16 - another.crnn-attentive pooling】; 【17 - crnn-tf-music】; 【18 - crnn-tf-developing】; 【19 - crnn-torch】; 【20 - crnn-tf-developing】; 【21 - chinese-ocr-keras】; 【22 - crnn-tf-developing】; 【23 - ctpn+crnn-cannot-train-7】; 【24 - crnn-pytorch】; 【25 - cnn+lstm+ctc-tf】; 【26 - crnn-tf-resnet】; 【27 - caffe_ocr】

Synthetic Text data

2022

Yizhi Wang, Guo Pu, Wenhan Luo, Yexin Wang, Pengfei Xiong, Hongwen Kang, Zhouhui Lian. Aesthetic Text Logo Synthesis via Content-Aware Layout Inferring[C]. CVPR 2022; code:[code]

2020

【Synthetic 3D data】 [CVPR-2020] S. Long and C. Yao, “UnrealText: Synthesizing realistic scene text images from the unreal world,” in Proceedings of CVPR, 2020. paper

2016

【Synthetic data】Gupta A, Vedaldi A, Zisserman A. Synthetic data for text localisation in natural images[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2315-2324.
code:[official;vgg;other]

Application with Scene Text

Hao Wang, Junchao Liao, Tianheng Cheng, Zewen Gao, Hao Liu, Bo Ren, Xiang Bai, Wenyu Liu. Knowledge Mining With Scene Text for Fine-Grained Recognition[C]. CVPR 2022; code:[code]

Mengjun Cheng, Yipeng Sun, Longchao Wang, Xiongwei Zhu, Kun Yao, Jie Chen, Guoli Song, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang. ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval[C]. CVPR 2022; code:[code]

Image Text Super-resolution

2022

Jianqi Ma, Zhetong Liang, Lei Zhang. A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-Resolution[C]. CVPR 2022; code:[code]

Text Style Transfer

2019

[1] Peking University. Typography with Decor: Intelligent Text Style Transfer, CVPR19
[2] Peking University. DynTypo: Example-Based Dynamic Text Effects Transfer, CVPR19

OCR + VQA

2022

Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha. LaTr: Layout-Aware Transformer for Scene-Text VQA[C]. CVPR 2022; code:[code]

2019

[1] Facebook. Towards VQA Models That Can Read, CVPR19

Handwritten Mathematical Expression Recognition

2022

Ye Yuan, Xiao Liu, Wondimu Dikubab, Hui Liu, Zhilong Ji, Zhongqin Wu, Xiang Bai. Syntax-Aware Network for Handwritten Mathematical Expression Recognition[C]. CVPR 2022; code:[code]

Document

2022

CVPR 2022

Hao Liu, Xin Li, Bing Liu, Deqiang Jiang, Yinsong Liu, Bo Ren. Neural Collaborative Graph Machines for Table Structure Recognition[C]. CVPR 2022; code:[code]

Chuhui Xue, Zichen Tian, Fangneng Zhan, Shijian Lu, Song Bai. Fourier Document Restoration for Robust Document Dewarping and Recognition[C]. CVPR 2022; code:[code]
Xiangwei Jiang, Rujiao Long, Nan Xue, Zhibo Yang, Cong Yao, Gui-Song Xia. Revisiting Document Image Dewarping by Grid Regularization[C]. CVPR 2022; code:[code]
Zhangxuan Gu, Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang, Ming Gu, Liqing Zhang. XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding[C]. CVPR 2022; code:[code]

2017

Kil T, Seo W, Koo H I, et al. Robust Document Image Dewarping Method Using Text-Lines and Line Segments[C]//2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2017, 1: 865-870.
[code:xellows1305/Document-Image-Dewarping]

Survey

2020

[ACM Computing Surveys-2020] X. Chen, L. Jin, Y. Zhu, C. Luo, and T. Wang, “Text Recognition in the Wild: A Survey,” ACM Computing Surveys (CSUR) 2020. paper code

2016

Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1): 19-36.

Datasets

Three websites maintain lists of datasets for different data types:
1 - www.iapr-tc11.org
2 - tc11.cvc.uab.es
3 - rrc.cvc.uab.es

2017 COCO-Text
2017 DeTEXT
2017 DOST
2017 FSNS
2017 MLT
2017 IEHHR
2011-2015 Born-Digital Images
2013-2015 Focused Scene Text
2013-2015 Text in Videos
2015 Incidental Scene Text
ICDAR Chinese 2017
- more than 12,000 images. Most of the images are collected in the wild by phone cameras.
- Task: Chinese Text in the Wild.
Chinese Text in the Wild 2017
- 32,285 high resolution images, 1,018,402 character instances, 3,850 character categories, 6 kinds of attributes
Total-Text 2017
- 1,555 images, 11,459 text instances, including curved text
SCUT_FORU_DB_Release 2016
- FORU contains two parts, which are Chinese2k and English2k dataset, respectively.
SynthText in the Wild Dataset 2016
- 800 thousand images, 8 million synthetic word instances.
- Each text instance is annotated with its text-string, word-level and character-level bounding-boxes.
COCO-Text (Computer Vision Group, Cornell) 2016
- 63,686 images, 173,589 text instances, 3 fine-grained text attributes.
- Task: text location and recognition
- COCO-Text API
USTB-SV1k 2014
- 1000 (500 for training and 500 for testing) street view (patch) images from 6 USA cities
Synthetic Word Dataset (Oxford, VGG) 2014
- 9 million images covering 90k English words
- Task: text recognition, segmentation
- download
IIIT 5K-Words 2012
- 5000 images from Scene Texts and born-digital (2k training and 3k testing images)
- Each image is a cropped word image of scene text with case-insensitive labels
- Task: text recognition
- download
StanfordSynth (Stanford, AI Group) 2012
- Small single-character images of 62 characters (0-9, a-z, A-Z)
- Task: text recognition
- download
MSRA Text Detection 500 Database (MSRA-TD500) 2012
- 500 natural images (resolutions of the images vary from 1296x864 to 1920x1280)
- Chinese, English or mixture of both
- Task: text detection
OSTD 2011
- The download link could not be found.
Traffic Guide Panel Text Dataset, TGPT 2016
- 3841 high-resolution individual images, 2315 containing traffic guide panel level annotations (1911 for training and 404 for testing, and all the testing images are manually labeled with ground truth tight text region bounding boxes), 1526 containing no traffic signs.
Street View Text (SVT) 2010
- 350 high resolution images (average size 1260 × 860) (100 images for training and 250 images for testing)
- Only word level bounding boxes are provided with case-insensitive labels
- Task: text location
KAIST Scene_Text Database 2010
- 3000 images of indoor and outdoor scenes containing text
- Korean, English (Number), and Mixed (Korean + English + Number)
- Task: text location, segmentation and recognition
Chars74k 2009
- Over 74K images from natural images, as well as a set of synthetically generated characters
- Small single-character images of 62 characters (0-9, a-z, A-Z)
- Task: text recognition
ICDAR Benchmark Datasets

Dataset	Description	Competition Paper
ICDAR 2015	1000 training images and 500 testing images	`paper`
ICDAR 2013	229 training images and 233 testing images	`paper`
ICDAR 2011	229 training images and 255 testing images	`paper`
ICDAR 2005	1001 training images and 489 testing images	`paper`
ICDAR 2003	181 training images and 251 testing images (word level and character level)	`paper`

AprilYapingZhang/awesome-ocr
