OleehyO
Stars

ocr

19 repositories

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…

Python 47,016 8,040 Updated Mar 6, 2025

Tesseract Open Source OCR Engine (main repository)

C++ 65,072 9,720 Updated Feb 12, 2025

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Python 13,672 1,090 Updated Jan 18, 2025

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python 9,300 601 Updated Feb 21, 2025

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Python 390 48 Updated Feb 15, 2025

A synthetic data generator for text recognition

Python 3,415 998 Updated Jul 18, 2024

Official Implementation of SynthTIGER (Synthetic Text Image Generator), ICDAR 2021

Python 509 103 Updated Jun 14, 2024

Handwriting Synthesis with RNNs ✏️

Python 4,449 614 Updated Jan 11, 2024

Collects and organizes OCR-related datasets and unifies their annotation format for use in experiments

Python 897 194 Updated Nov 28, 2023

DocBank: A Benchmark Dataset for Document Layout Analysis

Python 598 72 Updated Aug 12, 2024

This repo is used to release the ArxivFormula dataset.

Python 24 2 Updated Nov 12, 2024

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

Python 276 27 Updated Dec 26, 2024

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Python 6,939 469 Updated Jan 3, 2025

This repository contains a paper collection of the methods for document image processing, including appearance enhancement, deshadowing, dewarping, deblurring, binarization and so on.

218 12 Updated Feb 12, 2025

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.

Python 27,416 2,112 Updated Mar 4, 2025

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 7,077 622 Updated Feb 10, 2025

Convert PDF to markdown + JSON quickly with high accuracy

Python 21,769 1,341 Updated Mar 4, 2025

An extremely fast LaTeX formatter written in Rust

Rust 424 26 Updated Feb 10, 2025