Table-with-broken-lines-OCR

This is a Python notebook that detects a table and extracts it from a scanned PDF or image.

For scanned documents, especially old ones, some table lines can be broken, which makes the table very hard for typical table OCR libraries to read. This repository deals with that issue by first detecting the table with the Layout Parser library, then performing OCR on the whole document, detecting the contours and cells of the table, and finally assigning each piece of text to its cell using its coordinates. I found that this method works much better than applying OCR to each detected contour cell.
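The last step, assigning OCR text to cells by coordinates, can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the names `Word`, `assign_words_to_cells`, and the `(x, y, width, height)` box format are assumptions made for the example. It presumes the word boxes come from a whole-page OCR pass and the cell boxes from contour detection, and it places each word in the cell that contains the word's center point.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed box format: (x, y, width, height) in page pixel coordinates.
Box = Tuple[int, int, int, int]


@dataclass
class Word:
    """One OCR result: recognized text plus its bounding box (hypothetical type)."""
    text: str
    box: Box


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def contains(cell: Box, point: Tuple[float, float]) -> bool:
    """True if the point lies inside the cell box."""
    x, y, w, h = cell
    px, py = point
    return x <= px < x + w and y <= py < y + h


def assign_words_to_cells(words: List[Word], cells: List[Box]) -> Dict[int, str]:
    """Map each cell index to the text of the words whose centers fall inside it."""
    grouped: Dict[int, List[Word]] = {i: [] for i in range(len(cells))}
    for word in words:
        point = center(word.box)
        for i, cell in enumerate(cells):
            if contains(cell, point):
                grouped[i].append(word)
                break  # a word belongs to at most one cell

    result: Dict[int, str] = {}
    for i, cell_words in grouped.items():
        # Simplified reading order: sort by top edge, then by left edge.
        cell_words.sort(key=lambda w: (w.box[1], w.box[0]))
        result[i] = " ".join(w.text for w in cell_words)
    return result


# Two side-by-side cells and three OCR words, given out of reading order.
cells = [(0, 0, 100, 50), (100, 0, 100, 50)]
words = [
    Word("Total", (10, 10, 40, 20)),
    Word("42", (120, 12, 20, 18)),
    Word("cost", (55, 10, 30, 20)),
]
result = assign_words_to_cells(words, cells)
print(result)  # {0: 'Total cost', 1: '42'}
```

Using the word's center rather than requiring the full box to fit inside the cell tolerates OCR boxes that slightly overlap a cell border. The sort by top edge is a simplification: words on the same text line whose boxes start at slightly different heights may need to be grouped into lines first.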

Repository files: README.md, Table_recognition_then_OCR_Boujlida.ipynb


codgas/Table-with-broken-lines-OCR
