- Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, Dacheng Tao. DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting[C]. CVPR 2023; code:[code]
- Xiang Zhang, Yongwen Su, Subarna Tripathi, Zhuowen Tu. Text Spotting Transformers[C]. CVPR 2022; code:[code]
- Mingxin Huang, Yuliang Liu, Zhenghao Peng, Chongyu Liu, Dahua Lin, Shenggao Zhu, Nicholas Yuan, Kai Ding, Lianwen Jin. SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition[C]. CVPR 2022; code:[code]
- Yair Kittenplon, Inbal Lavi, Sharon Fogel, Yarin Bar, R. Manmatha, Pietro Perona. Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer[C]. CVPR 2022;
- Kang C, Kim G, Yoo S I. Detection and Recognition of Text Embedded in Online Images via Neural Context Models[C]//AAAI. 2017: 4103-4110. code:[code]
- Bartz C, Yang H, Meinel C. STN-OCR: A single Neural Network for Text Detection and Text Recognition[J]. arXiv preprint arXiv:1707.08831, 2017. code:[code]
- Busta M, Neumann L, Matas J. Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework[C]//ICCV 2017: 2204-2212.[code]
- Gómez L, Karatzas D. Textproposals: a text-specific selective search algorithm for word spotting in the wild[J]. Pattern Recognition, 2017, 70: 60-74.[code]
- Almazán J, Gordo A, Fornés A, et al. Word spotting and recognition with embedded attributes[J]. IEEE transactions on pattern analysis and machine intelligence, 2014, 36(12): 2552-2566. code:[code]
- Jaderberg M, Vedaldi A, Zisserman A. Deep features for text spotting[C]//European conference on computer vision. Springer, Cham, 2014: 512-528. code:[code]
- Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Bo Du, Dacheng Tao. DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer[C]. AAAI 2023; code:[code]
- Shangbang Long, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, Michalis Raptis. Towards End-to-End Unified Scene Text Detection and Layout Analysis[C]. CVPR 2022; code:[code]
- Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao. Vision-Language Pre-Training for Boosting Scene Text Detectors[C]. CVPR 2022; code:[code]
- Jingqun Tang, Wenqing Zhang, Hongye Liu, MingKun Yang, Bo Jiang, Guanglong Hu, Xiang Bai. Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection[C]. CVPR 2022; code:[code]
- Xixi Xu, Zhongang Qi, Jianqi Ma, Honglun Zhang, Ying Shan, Xiaohu Qie. BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild[C]. CVPR 2022; code:[code]
- 【CentripetalText】 PKU. Sheng, Tao, Jie Chen, and Zhouhui Lian. CentripetalText: An Efficient Text Instance Representation for Scene Text Detection, NeurIPS 2021
- 【MAYOR】 UCAS. Qin, Xugong, Weiping Wang et al. Mask is all you need: Rethinking Mask R-CNN for dense and arbitrary-shaped scene text detection, ACMMM 2021.
- 【PCR】 Dai, Pengwen, et al. Progressive Contour Regression for Arbitrary-Shape Scene Text Detection, CVPR 2021. code:[code]
- 【MOST】 He, Minghang, Xiang Bai et al. MOST: A Multi-Oriented Scene Text Detector with Localization Refinement, CVPR 2021.
- 【FCENet】 Zhu, Yiqin, Lianwen Jin et al. Fourier contour embedding for arbitrary-shaped text detection, CVPR 2021.
- 【STKM】 Wan, Qi, Haoqin Ji, and Linlin Shen. Self-Attention Based Text Knowledge Mining for Text Detection, CVPR 2021. code:[code]
- 【Video Text Detection】 Feng, Wei, Cheng-Lin Liu et al. Semantic-Aware Video Text Detection, CVPR 2021.
- 【TextSeg】 Xu, Xingqian, et al. Rethinking text segmentation: A novel dataset and a text-specific refinement approach, CVPR 2021. code:[code]
- 【ALCHEMY】 CMU, PKU, MEGVII. Alchemy: Techniques for Rectification Based Irregular Scene Text Recognition, arXiv 2019
- 【CRAFT】 Clova AI Research, NAVER Corp. Character Region Awareness for Text Detection, CVPR19
- 【TIoU-metric】South China University of Technology. Tightness-Aware Evaluation Protocol for Scene Text Detection, CVPR19
- 【PSENet】 Nanjing University. Shape Robust Text Detection With Progressive Scale Expansion Network, CVPR19
- 【Curve-Text-Detector】 Liu Y, Jin L, Zhang S, et al. Detecting Curve Text in the Wild: New Dataset and New Solution
- 【TextBoxes++】 Minghui Liao, Baoguang Shi, Xiang Bai. TextBoxes++: A Single-Shot Oriented Scene Text Detector, TIP 2018
- 【Multi-Oriented】 Pengyuan Lyu, Cong Yao, Wenhao Wu, Shuicheng Yan, Xiang Bai. Multi-oriented scene text detection via corner localization and region segmentation, CVPR 2018
- 【TextBoxes】 Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu. TextBoxes: A Fast Text Detector with a Single Deep Neural Network
- 【SegLink】 Shi B, Bai X, Belongie S. Detecting Oriented Text in Natural Images by Linking Segments[J]. arXiv preprint arXiv:1703.06520, 2017. code:[code]
- 【EAST】 Zhou X, Yao C, Wen H, et al. EAST: An Efficient and Accurate Scene Text Detector[J]. arXiv preprint arXiv:1704.03155, 2017. code:[code]
- 【SSTD】 He P, Huang W, He T, et al. Single shot text detector with regional attention[C]//The IEEE International Conference on Computer Vision (ICCV). 2017. code:[code;code]
- 【CTPN】Tian Z, Huang W, He T, et al. Detecting text in natural image with connectionist text proposal network[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 56-72.
code:[code;cuda8-caffe;official;ocr_detection_ctpn;keras_ocr]
dataset:[ICDAR 2011; ICDAR 2013; ICDAR 2015; SWT; Multilingual dataset]
- Gomez L, Karatzas D. Object proposals for text extraction in the wild[C]//Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 2015: 206-210.[code]
- Busta M, Neumann L, Matas J. Fastext: Efficient unconstrained scene text detector[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1206-1214.[code]
- Zhang Z, Shen W, Yao C, et al. Symmetry-based text line detection in natural scenes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 2558-2567.
code:[code]
- Epshtein B, Ofek E, Wexler Y. Detecting text in natural scenes with stroke width transform[C]//Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010: 2963-2970.
code:[code]
- Chang Liu, Chun Yang, Xu-Cheng Yin. Open-Set Text Recognition via Character-Context Decoupling[C]. CVPR 2022; code:[code]
- Caiyuan Zheng, Hui Li, Seon-Min Rhee, Seungju Han, Jae-Joon Han, Peng Wang. Pushing the Performance Limit of Scene Text Recognizer Without Human Annotation[C]. CVPR 2022; code:[code]
- Canjie Luo, Lianwen Jin, Jingdong Chen. SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization[C]. CVPR 2022; code:[code]
[1] [CVPR-2020] R. Litman, O. Anschel, S. Tsiper, R. Litman, S. Mazor, and R. Manmatha, “SCATTER: selective context attentional scene text recognizer,” in Proceedings of CVPR, 2020. paper
[2] [CVPR-2020] D. Yu, X. Li, C. Zhang, J. Han, J. Liu, and E. Ding, “Towards accurate scene text recognition with semantic reasoning networks,” in Proceedings of CVPR, 2020. paper
[3] [ICVGIP-2018] Gupta A, Vedaldi A, Zisserman A. "Learning to read by spelling: Towards unsupervised text recognition," in Proceedings of ICVGIP, 2018. paper
[4] [CVPR-2020] Wan Z, Zhang J, Zhang L, et al, "On Vocabulary Reliance in Scene Text Recognition," in Proceedings of CVPR, 2020. paper
[5] [ECAI-2020] Bleeker M, de Rijke M, "Bidirectional Scene Text Recognition with a Single Decoder," in Proceedings of ECAI, 2020. paper code
[6] [arXiv-2019] Bartz C, Bethge J, Yang H, et al, "KISS: Keeping It Simple for Scene Text Recognition," CoRR abs/1911.08400, 2019. paper code
[7] [arXiv-2020] Zhang C, Xu Y, Cheng Z, et al, "SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition," CoRR abs/2005.13117, 2020. paper
[8] [arXiv-2020] Lin J, Cheng Z, Bai F, et al, "Text Recognition in Real Scenarios with a Few Labeled Samples," CoRR abs/2006.12209, 2020. paper
[9] [ECCV-2020] Zhang C, Gupta A, Zisserman A. "Adaptive Text Recognition through Visual Matching," in Proceedings of ECCV, 2020. paper code
[10] [ECCV-2020] Zhang H, Yao Q, Yang M, et al, "AutoSTR: Efficient Backbone Search for Scene Text Recognition," in Proceedings of ECCV, 2020. paper code
[11] [ECCV-2020] Yan R, Huang Y, "PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit," in Proceedings of ECCV, 2020. paper
[12] [ECCV-2020] Yue X, Kuang Z, Lin C, et al, "RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition," in Proceedings of ECCV, 2020. paper
[13] [CVPR-2020] Z. Qiao, Y. Zhou, D. Yang, Y. Zhou, and W. Wang, “SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition,” in Proceedings of CVPR, 2020. paper code
- [1] South China University of Technology. Aggregation Cross-Entropy for Sequence Recognition, CVPR19
- 【MORAN】 Canjie Luo, Lianwen Jin, Zenghui Sun. A Multi-Object Rectified Attention Network for Scene Text Recognition[J]. arXiv preprint arXiv:1901.03003. [code: Canjie-Luo/MORAN_v2]
- 【SAR】 Hui Li*, Peng Wang*, Chunhua Shen, Guyu Zhang. Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition[C]. [code: wangpengnorman/SAR-Strong-Baseline-for-Text-Recognition]
- Wojna Z, Gorban A, Lee D S, et al. Attention-based Extraction of Structured Information from Street View Imagery[J]. arXiv preprint arXiv:1704.03549, 2017.
code:[official;similar]
- He P, Huang W, Qiao Y, et al. Reading Scene Text in Deep Convolutional Sequences[C]//AAAI. 2016: 3501-3508. code:[code]
- Raj D, Sahu S, Anand A. Learning local and global contexts using a convolutional recurrent network model for relation classification in biomedical text[C]//Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). 2017: 311-321. code:[code]
- Smith R, Gu C, Lee D S, et al. End-to-end interpretation of the French Street Name Signs dataset[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 411-426. code:[code]
- Zhong Z, Jin L, Xie Z. High performance offline handwritten Chinese character recognition using GoogLeNet and directional feature maps[C]//Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 2015: 846-850. code:[code]
- 【CRNN】 Shi B, Bai X, Yao C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 39(11): 2298-2304.
code:【1 - official】; 【2 - crnn.pytorch】; 【3 - unfinished】; 【4 - crnn.pytorch-chinese】; 【5 - crnn+stn-tf】; 【6 - lstm+ctc】; 【7 - ctpn+crnn-merge-cannot-train】; 【8 - crnn-mnist-keras】; 【9 - crnn-tf】; 【10 - crnn-tf-could-be-better】; 【11 - crnn.mxnet】; 【12 - crnn-tf-estimators】; 【13 - crnn-attention-tf】; 【14 - crnn.caffe】; 【15 - chinese.ocr-ctpn+crnn-tf+pytorch】; 【16 - another.crnn-attentive pooling】; 【17 - crnn-tf-music】; 【18 - crnn-tf-developing】; 【19 - crnn-torch】; 【20 - crnn-tf-developing】; 【21 - chinese-ocr-keras】; 【22 - crnn-tf-developing】; 【23 - ctpn+crnn-cannot-train-7】; 【24 - crnn-pytorch】; 【25 - cnn+lstm+ctc-tf】; 【26 - crnn-tf-resnet】; 【27 - caffe_ocr】
- Yizhi Wang, Guo Pu, Wenhan Luo, Yexin Wang, Pengfei Xiong, Hongwen Kang, Zhouhui Lian. Aesthetic Text Logo Synthesis via Content-Aware Layout Inferring[C]. CVPR 2022; code:[code]
- 【Synthetic 3D data】 [CVPR-2020] S. Long and C. Yao, “UnrealText: Synthesizing realistic scene text images from the unreal world,” in Proceedings of CVPR, 2020. paper
- 【Synthetic data】Gupta A, Vedaldi A, Zisserman A. Synthetic data for text localisation in natural images[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2315-2324.
code:[official;vgg;other]
- Hao Wang, Junchao Liao, Tianheng Cheng, Zewen Gao, Hao Liu, Bo Ren, Xiang Bai, Wenyu Liu. Knowledge Mining With Scene Text for Fine-Grained Recognition[C]. CVPR 2022; code:[code]
- Mengjun Cheng, Yipeng Sun, Longchao Wang, Xiongwei Zhu, Kun Yao, Jie Chen, Guoli Song, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang. ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval[C]. CVPR 2022; code:[code]
- Jianqi Ma, Zhetong Liang, Lei Zhang. A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-Resolution[C]. CVPR 2022; code:[code]
- [1] Peking University. Typography with Decor: Intelligent Text Style Transfer, CVPR19
- [2] Peking University. DynTypo: Example-Based Dynamic Text Effects Transfer, CVPR19
- Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha. LaTr: Layout-Aware Transformer for Scene-Text VQA[C]. CVPR 2022; code:[code]
- [1] Facebook. Towards VQA Models That Can Read, CVPR19
- Ye Yuan, Xiao Liu, Wondimu Dikubab, Hui Liu, Zhilong Ji, Zhongqin Wu, Xiang Bai. Syntax-Aware Network for Handwritten Mathematical Expression Recognition[C]. CVPR 2022; code:[code]
- Hao Liu, Xin Li, Bing Liu, Deqiang Jiang, Yinsong Liu, Bo Ren. Neural Collaborative Graph Machines for Table Structure Recognition[C]. CVPR 2022; code:[code]
- Chuhui Xue, Zichen Tian, Fangneng Zhan, Shijian Lu, Song Bai. Fourier Document Restoration for Robust Document Dewarping and Recognition[C]. CVPR 2022; code:[code]
- Xiangwei Jiang, Rujiao Long, Nan Xue, Zhibo Yang, Cong Yao, Gui-Song Xia. Revisiting Document Image Dewarping by Grid Regularization[C]. CVPR 2022; code:[code]
- Zhangxuan Gu, Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang, Ming Gu, Liqing Zhang. XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding[C]. CVPR 2022; code:[code]
- Kil T, Seo W, Koo H I, et al. Robust Document Image Dewarping Method Using Text-Lines and Line Segments[C]//2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2017, 1: 865-870.
[code:xellows1305/Document-Image-Dewarping]
- [ACM Computing Surveys-2020] X. Chen, L. Jin, Y. Zhu, C. Luo, and T. Wang, “Text Recognition in the Wild: A Survey," ACM Computing Surveys (CSUR) 2020. paper code
- Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1): 19-36.
There are three websites that maintain lists of datasets covering various data types:
1 - www.iapr-tc11.org
2 - tc11.cvc.uab.es
3 - rrc.cvc.uab.es
-
2017 COCO-Text
2017 DeTEXT
2017 DOST
2017 FSNS
2017 MLT
2017 IEHHR
2011-2015 Born-Digital Images
2013-2015 Focused Scene Text
2013-2015 Text in Videos
2015 Incidental Scene Text
-
ICDAR Chinese
2017
- More than 12,000 images, most of them collected in the wild with phone cameras.
- Task: Chinese Text in the Wild.
-
- 32,285 high resolution images, 1,018,402 character instances, 3,850 character categories, 6 kinds of attributes
-
Total-Text
2017
- 1,555 images, 11,459 text instances, including curved text
-
SCUT_FORU_DB_Release
2016
- FORU contains two parts, which are Chinese2k and English2k dataset, respectively.
-
SynthText in the Wild Dataset
2016
- 800 thousand images, 8 million synthetic word instances.
- Each text instance is annotated with its text-string, word-level and character-level bounding-boxes.
-
COCO-Text (Computer Vision Group, Cornell)
2016
- 63,686 images, 173,589 text instances, 3 fine-grained text attributes.
- Task: text location and recognition
COCO-Text API
-
USTB-SV1k
2014
- 1000 (500 for training and 500 for testing) street view (patch) images from 6 USA cities
-
Synthetic Word Dataset (Oxford, VGG)
2014
- 9 million images covering 90k English words
- Task: text recognition, segmentation
download
-
IIIT 5K-Words
2012
- 5000 images from Scene Texts and born-digital (2k training and 3k testing images)
- Each image is a cropped word image of scene text with case-insensitive labels
- Task: text recognition
download
-
StanfordSynth(Stanford, AI Group)
2012
- Small single-character images of 62 characters (0-9, a-z, A-Z)
- Task: text recognition
download
-
MSRA Text Detection 500 Database (MSRA-TD500)
2012
- 500 natural images (resolutions vary from 1296x864 to 1920x1280)
- Chinese, English or mixture of both
- Task: text detection
-
OSTD
2011
- Download link could not be found
-
Traffic Guide Panel Text Dataset (TGPT)
2016
- 3,841 high-resolution individual images: 2,315 contain traffic guide panel level annotations (1,911 for training and 404 for testing; all testing images are manually labeled with ground-truth tight text region bounding boxes), and 1,526 contain no traffic signs.
-
- 350 high resolution images (average size 1260 × 860) (100 images for training and 250 images for testing)
- Only word level bounding boxes are provided with case-insensitive labels
- Task: text location
-
KAIST Scene_Text Database
2010
- 3000 images of indoor and outdoor scenes containing text
- Korean, English (Number), and Mixed (Korean + English + Number)
- Task: text location, segmentation and recognition
-
Chars74k
2009
- Over 74K images from natural images, as well as a set of synthetically generated characters
- Small single-character images of 62 characters (0-9, a-z, A-Z)
- Task: text recognition
-
ICDAR Benchmark Datasets
Dataset | Description | Competition Paper |
---|---|---|
ICDAR 2015 | 1000 training images and 500 testing images | paper |
ICDAR 2013 | 229 training images and 233 testing images | paper |
ICDAR 2011 | 229 training images and 255 testing images | paper |
ICDAR 2005 | 1001 training images and 489 testing images | paper |
ICDAR 2003 | 181 training images and 251 testing images (word level and character level) | paper |