Auto-Table-Extract: A System To Identify And Extract Tables From PDF To Excel

Read the published paper: Research Paper on Auto-Table-Extract

Introduction

The auto-table-extract system identifies tabular data within PDF documents and extracts all of the tabular information into an Excel file.

Table detection and extraction is the process of identifying the tables in a document and extracting the cells each table contains.

The auto-table-extract system consists of three main modules: 1) document conversion, 2) layout analysis, and 3) table detection and extraction.
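
PDFMiner supplies the characters, text lines, and text boxes that the later steps work with. As a rough illustration of what grouping characters into text lines involves (this is neither the project's code nor PDFMiner's actual algorithm), the sketch below merges character boxes into lines when they overlap vertically and sit close together horizontally. The `Box` type, the function names, and the thresholds `min_overlap`, `max_gap`, and `space_gap` are hypothetical; PDFMiner exposes comparable settings such as `line_overlap` and `char_margin` in its `LAParams` class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A bounding box in PDF coordinates (origin bottom-left, y grows upward)."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def vertical_overlap(a: Box, b: Box) -> float:
    """Shared height of two boxes as a fraction of the shorter box's height."""
    shared = min(a.y1, b.y1) - max(a.y0, b.y0)
    shorter = min(a.y1 - a.y0, b.y1 - b.y0)
    return max(shared, 0.0) / shorter if shorter > 0 else 0.0


def group_chars_into_lines(chars: list[Box], min_overlap: float = 0.5,
                           max_gap: float = 1.0,
                           space_gap: float = 0.25) -> list[Box]:
    """Merge character boxes into text-line boxes (hypothetical helper).

    A character joins an existing line when it overlaps the line vertically by
    at least `min_overlap` and starts no more than `max_gap` character heights
    to the right of the line's end. A gap wider than `space_gap` heights is
    treated as a word break. All three thresholds are illustrative.
    """
    lines: list[Box] = []
    for ch in sorted(chars, key=lambda c: c.x0):  # process left to right
        height = ch.y1 - ch.y0
        for i, line in enumerate(lines):
            gap = ch.x0 - line.x1
            if vertical_overlap(line, ch) >= min_overlap and gap <= max_gap * height:
                sep = " " if gap > space_gap * height else ""
                lines[i] = Box(line.x0, min(line.y0, ch.y0),
                               max(line.x1, ch.x1), max(line.y1, ch.y1),
                               line.text + sep + ch.text)
                break
        else:  # no compatible line found: start a new one
            lines.append(ch)
    # Reading order: top to bottom, then left to right.
    return sorted(lines, key=lambda ln: (-ln.y1, ln.x0))
```

Because a wide horizontal gap starts a new line, cells in different columns of the same table row come out as separate text lines, which is the input a table-detection step needs.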

The two methods used for identification and extraction are:

Table_with_Border (for tables with fully recognizable borders): determines the tables using the coordinates of the text lines, characters, and text boxes provided by PDFMiner.
Table_without_Border (for partially bordered or borderless tables): uses a clustering method together with the coordinates of the text lines to determine the table and extract its contents.
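
To make the difference between the two methods concrete, here is a minimal sketch of each, assuming the text lines have already been reduced to boxes with coordinates. It is an illustration, not the project's Table_with_Border or Table_without_Border code. `grid_from_borders` assumes the x positions of the vertical borders and the y positions of the horizontal borders are already known (for example, derived from PDFMiner's `LTLine` or `LTRect` objects) and places each text box in the cell containing its center. `grid_by_clustering` has no borders to work with, so it groups box left edges into columns and vertical centers into rows using simple gap-based (single-linkage) clustering. The README does not say which clustering method the project uses, so that choice, the thresholds `col_gap` and `row_gap`, and all names here are assumptions.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A text line's bounding box in PDF coordinates (y grows upward)."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def _empty_grid(n_rows: int, n_cols: int) -> list[list[str]]:
    return [["" for _ in range(n_cols)] for _ in range(n_rows)]


def grid_from_borders(boxes: list[Box], col_edges: list[float],
                      row_edges: list[float]) -> list[list[str]]:
    """Bordered tables: place each box in the cell that contains its center.

    col_edges: sorted x positions of the vertical borders (left to right).
    row_edges: sorted y positions of the horizontal borders (bottom to top).
    """
    n_rows, n_cols = len(row_edges) - 1, len(col_edges) - 1
    grid = _empty_grid(n_rows, n_cols)
    # Top to bottom, left to right, so multi-line cells keep reading order.
    for b in sorted(boxes, key=lambda bx: (-bx.y1, bx.x0)):
        cx, cy = (b.x0 + b.x1) / 2, (b.y0 + b.y1) / 2
        col = bisect_right(col_edges, cx) - 1
        row_from_bottom = bisect_right(row_edges, cy) - 1
        if 0 <= col < n_cols and 0 <= row_from_bottom < n_rows:
            row = n_rows - 1 - row_from_bottom  # row 0 is the top row
            grid[row][col] = (grid[row][col] + " " + b.text).strip()
    return grid


def cluster_1d(values: list[float], max_gap: float) -> list[float]:
    """Single-linkage clustering on a line: a sorted value within max_gap of
    the previous value joins its cluster. Returns the cluster means."""
    clusters: list[list[float]] = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return [sum(c) / len(c) for c in clusters]


def grid_by_clustering(boxes: list[Box], col_gap: float = 20.0,
                       row_gap: float = 5.0) -> list[list[str]]:
    """Borderless tables: infer columns from left edges (assumes left-aligned
    columns) and rows from vertical centers, then snap each box to the nearest."""
    cols = cluster_1d([b.x0 for b in boxes], col_gap)
    rows = cluster_1d([(b.y0 + b.y1) / 2 for b in boxes], row_gap)
    rows.reverse()  # PDF y grows upward; put the top row first
    grid = _empty_grid(len(rows), len(cols))
    for b in sorted(boxes, key=lambda bx: (-bx.y1, bx.x0)):
        cy = (b.y0 + b.y1) / 2
        col = min(range(len(cols)), key=lambda i: abs(cols[i] - b.x0))
        row = min(range(len(rows)), key=lambda i: abs(rows[i] - cy))
        grid[row][col] = (grid[row][col] + " " + b.text).strip()
    return grid
```

For a small two-column table, both functions return the same grid, e.g. `[["Name", "Age"], ["Bob", "42"], ["Alice", "7"]]`; the borderless version simply has to guess the column and row positions that the bordered version reads from the ruling lines.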

Finally, the extracted data is loaded into a Pandas DataFrame, which is used to write the Excel sheet. The output of the auto-table-extract system is therefore an Excel document containing the table data extracted from the PDF.
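
A minimal sketch of this last step, assuming each extracted table is a list of rows of cell strings whose first row is the header. Writing one sheet per table, and the function and sheet names, are illustrative choices rather than the project's documented behavior. `pd.ExcelWriter` needs an Excel engine such as openpyxl installed to write `.xlsx` files.

```python
import pandas as pd


def grid_to_dataframe(grid: list[list[str]]) -> pd.DataFrame:
    """Build a DataFrame from one extracted table, using its first row as the header."""
    header, *body = grid
    return pd.DataFrame(body, columns=header)


def write_tables_to_excel(grids: list[list[list[str]]], path: str) -> None:
    """Write each table to its own sheet (Table_1, Table_2, ...) of an .xlsx file.

    pandas needs an Excel engine such as openpyxl installed for this step.
    """
    with pd.ExcelWriter(path) as writer:
        for i, grid in enumerate(grids, start=1):
            grid_to_dataframe(grid).to_excel(writer, sheet_name=f"Table_{i}", index=False)
```

For example, `write_tables_to_excel([[["Name", "Age"], ["Bob", "42"]]], "output.xlsx")` would produce a workbook with a single sheet, `Table_1`, holding one header row and one data row.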

Source repository: rohit-sahoo/auto-table-extract