Document
ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation
AI tool to build charts based on text input
SuperSonic is the next-generation AI+BI platform that unifies Chat BI (powered by LLM) and Headless BI (powered by semantic layer) paradigms.
PyTorch deep learning models for document classification
DocLLM: A layout-aware generative language model for multimodal document understanding
An easy way to extract information from documents
This repository contains demos I made with the Transformers library by HuggingFace.
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev…
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
TF-ID: Table/Figure IDentifier for academic papers
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Quick exploration into fine-tuning Florence-2
Data processing with ML, LLM and Vision LLM
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
A system for agentic LLM-powered data processing and ETL
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool.
Vision infrastructure to turn complex documents into RAG/LLM-ready data
Detect and extract tables to Markdown and CSV
A simple screen-parsing tool towards a pure vision-based GUI agent
Open-source platform for extracting structured data from documents using AI.
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.