For scanned documents, especially old ones, some table lines can be broken, which makes the table very hard for typical table OCR libraries to read. In this repository I deal with this issue by first detecting the table with the LayoutParser library, then performing OCR on the whole document, detecting the contours and cells of the table, and finally assigning the recognized texts to their respective cells using their coordinates. I found that this method works much better than applying OCR to each detected contour cell separately.
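The notebook itself is not reproduced here. What follows is a minimal, hypothetical sketch of the final step only: assigning whole-page OCR words to table cells by their coordinates. It assumes the OCR engine returns word-level bounding boxes and the contour step returns one box per cell. The `Box`/`Word` types, the helper names `reading_order`, `assign_words_to_cells` and `cells_to_rows`, the "word center falls inside the cell" rule, and the `line_tol`/`row_tol` pixel tolerances are all illustrative assumptions, not taken from the repository.

```python
# Hypothetical sketch: names, tolerances and the center-in-cell rule are
# assumptions for illustration, not the repository's actual implementation.
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box in pixels: (x1, y1) is top-left, (x2, y2) is bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

    def contains(self, x, y):
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


@dataclass
class Word:
    """One word from whole-page OCR, with its bounding box."""
    text: str
    box: Box


def reading_order(words, line_tol=8):
    """Group words into text lines (centers within line_tol px vertically), each line left-to-right."""
    lines = []  # list of (first_center_y, [words])
    for w in sorted(words, key=lambda w: w.box.center()[1]):
        cy = w.box.center()[1]
        if lines and abs(cy - lines[-1][0]) <= line_tol:
            lines[-1][1].append(w)
        else:
            lines.append((cy, [w]))
    return [w for _, line in lines for w in sorted(line, key=lambda w: w.box.x1)]


def assign_words_to_cells(words, cells, line_tol=8):
    """Return one string per cell: the words whose box center falls inside that cell.

    Words whose center is outside every cell (e.g. text around the table) are dropped.
    """
    buckets = [[] for _ in cells]
    for w in words:
        cx, cy = w.box.center()
        for i, cell in enumerate(cells):
            if cell.contains(cx, cy):
                buckets[i].append(w)
                break  # first matching cell wins
    return [" ".join(w.text for w in reading_order(b, line_tol)) for b in buckets]


def cells_to_rows(cells, texts, row_tol=10):
    """Arrange cell texts into rows: cells whose top edges are within row_tol px share a row."""
    order = sorted(range(len(cells)), key=lambda i: (cells[i].y1, cells[i].x1))
    rows, current, row_top = [], [], None
    for i in order:
        if row_top is not None and abs(cells[i].y1 - row_top) > row_tol:
            rows.append(current)
            current, row_top = [], None
        if row_top is None:
            row_top = cells[i].y1
        current.append(i)
    if current:
        rows.append(current)
    return [[texts[i] for i in sorted(row, key=lambda i: cells[i].x1)] for row in rows]


# Toy 2x2 table: cell boxes as a contour step might return them, words as whole-page OCR might.
cells = [Box(0, 0, 100, 30), Box(100, 0, 200, 30), Box(0, 30, 100, 60), Box(100, 30, 200, 60)]
words = [
    Word("Name", Box(10, 5, 50, 25)),
    Word("Age", Box(110, 5, 140, 25)),
    Word("Ada", Box(10, 35, 40, 55)),
    Word("Lovelace", Box(45, 35, 95, 55)),
    Word("36", Box(110, 35, 130, 55)),
    Word("Page", Box(10, 80, 50, 95)),  # outside the table, so ignored
]
texts = assign_words_to_cells(words, cells)
rows = cells_to_rows(cells, texts)
print(rows)  # [['Name', 'Age'], ['Ada Lovelace', '36']]
```

Using each word's center rather than requiring the whole word box to fit inside a cell tolerates words that slightly cross a broken or skewed cell border. Merged cells that span several rows or columns would need extra handling that this sketch omits.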
codgas/Table-with-broken-lines-OCR
About
This is a Python notebook that detects a table in a scanned PDF or image and extracts it.