X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsens…

Python 1,031 111 Updated Feb 27, 2023

WenmuZhou / OCR_DataSet

Collects and organizes OCR datasets and unifies their annotation format for use in experiments

Python 892 191 Updated Nov 28, 2023

zhang0jhon / AttentionOCR

Scene text recognition

Python 837 260 Updated Dec 11, 2019

lukemelas / PyTorch-Pretrained-ViT

Vision Transformer (ViT) in PyTorch

Python 804 127 Updated Mar 2, 2022

ethz-asl / hfnet

From Coarse to Fine: Robust Hierarchical Localization at Large Scale with HF-Net (https://arxiv.org/abs/1812.03506)

Python 799 187 Updated Jul 25, 2024

jiangxiluning / FOTS.PyTorch

FOTS PyTorch Implementation

Python 644 193 Updated Feb 14, 2023

MhLiao / MaskTextSpotterV3

The code of "Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting"

Python 629 121 Updated Jan 20, 2022

aimagelab / mammoth

An Extendible (General) Continual Learning Framework based on PyTorch - official codebase of Dark Experience for General Continual Learning

Python 605 110 Updated Dec 27, 2024

jzbjyb / FLARE

Forward-Looking Active REtrieval-augmented generation (FLARE)

Python 599 55 Updated Nov 20, 2023

raoyongming / DenseCLIP

[CVPR 2022] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Python 523 41 Updated Sep 15, 2023

stanford-futuredata / ARES

Automated Evaluation of RAG Systems

Python 522 54 Updated Nov 4, 2024

Bartzi / stn-ocr

Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

Python 498 137 Updated Jan 2, 2018

A-bone1 / Attention-ocr-Chinese-Version

Attention OCR Based on TensorFlow

Python 432 141 Updated May 14, 2018

JamesQFreeman / LoRA-ViT

Low rank adaptation for Vision Transformer

Python 374 19 Updated Mar 18, 2024

NVlabs / geomapnet

Geometry-Aware Learning of Maps for Camera Localization (CVPR2018)

Python 350 79 Updated Jul 27, 2021

ShoufaChen / AdaptFormer

[NeurIPS 2022] Implementation of "AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition"

Python 335 20 Updated Sep 16, 2022