Carpenter

Creates tables out of images

Carpenter takes images that contain tables as pixels and tries to convert them to structured data (e.g. HTML tables).

If you need to extract tables out of text PDFs, have a look at Tabula.

Installation

You need OpenCV and the OpenCV Python bindings. On OS X, brew install opencv installs OpenCV, but you still have to make the Python library available (look at the last lines of brew's output). Copying the cv*.(so|py) files to your virtual env's site-packages folder is enough.
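To confirm that the bindings are visible to the interpreter you plan to use, a quick check along these lines may help. It is only a sketch: the helper name opencv_available is made up for this example, and it assumes the bindings are exposed as the cv2 module.

```python
import importlib.util


def opencv_available() -> bool:
    """Return True if a `cv2` module can be located by this interpreter.

    Assumption: the OpenCV bindings are importable under the name `cv2`.
    """
    return importlib.util.find_spec("cv2") is not None


if __name__ == "__main__":
    if opencv_available():
        print("OpenCV Python bindings found")
    else:
        print("cv2 not found; check that the cv*.so / cv*.py files are in site-packages")
```

Run it inside the activated virtual env so that the check uses the same site-packages folder as Carpenter.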

Carpenter Make Tables (Command line)

python make_tables.py [options] imagefile.png > output.html
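If you have many images, the same command can be driven from a small script. The following is a minimal sketch, not part of Carpenter: the helper names are hypothetical, it assumes make_tables.py sits in the current directory and runs under the same interpreter as the script, and it passes no options because the available ones are not listed here.

```python
import subprocess
import sys
from pathlib import Path


def build_command(image_path, script="make_tables.py", options=()):
    """Build the argument list for one make_tables.py run."""
    return [sys.executable, script, *options, str(image_path)]


def convert_image(image_path, output_path, script="make_tables.py", options=()):
    """Run make_tables.py on one image and save its stdout as an HTML file."""
    command = build_command(image_path, script, options)
    with open(output_path, "wb") as out:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(command, stdout=out, check=True)


def convert_folder(folder, pattern="*.png"):
    """Convert every matching image in `folder`, writing NAME.html next to it."""
    for image in sorted(Path(folder).glob(pattern)):
        convert_image(image, image.with_suffix(".html"))
```

Calling convert_folder("scans") would then write one HTML file per PNG in a (hypothetical) scans folder.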

Carpenter Workshop (Web Interface)

The Carpenter Workshop aims to make it easy to extract the same kinds of tables out of multiple PDF files by detecting table layouts and applying predefined extraction steps. It's not there yet, though.

It requires libpoppler with the pdftohtml/pdfimages commands and ImageMagick with convert. You also have to install the web dependencies with

pip install -r pip-requirements.txt

Start the workshop with

python open_workshop.py

Also run the Carpenter task worker through Celery, backed by a message broker of your choosing (RabbitMQ in this example):

rabbitmq-server &
celeryd -l INFO -I carpenter.tasks

The workshop's configuration defaults are in carpenter.default_settings and can be overridden with something like

export CARPENTER_SETTINGS=my_settings.cfg
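The defaults-plus-environment-variable pattern can be sketched with the standard library alone. This is an illustration of the general idea, not Carpenter's actual loading code: the keys in DEFAULTS are invented, and treating the .cfg file as Python with upper-case names is an assumption borrowed from how Flask's app.config.from_envvar behaves.

```python
import os

# Hypothetical defaults; the real ones live in carpenter.default_settings.
DEFAULTS = {"DEBUG": False, "UPLOAD_FOLDER": "uploads"}


def load_settings(defaults, env_var="CARPENTER_SETTINGS"):
    """Return a copy of `defaults`, overridden by the file named in `env_var`.

    Assumption: the settings file is plain Python and only upper-case
    names count as settings.
    """
    settings = dict(defaults)
    path = os.environ.get(env_var)
    if path:
        namespace = {}
        with open(path) as handle:
            exec(compile(handle.read(), path, "exec"), namespace)
        settings.update(
            {key: value for key, value in namespace.items() if key.isupper()}
        )
    return settings
```

Under these assumptions, a my_settings.cfg containing the single line DEBUG = True would flip that one value and leave every other default untouched.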

Carpenter Tools (Python library)

There are a couple of modules inside carpenter that can be used more or less independently to accomplish some carpentry tasks:

carpenter.bench: Takes a PDF file and extracts pages and images
carpenter.ruler: Detects horizontal and vertical lines in an image
carpenter.paper: Takes horizontal and vertical lines and creates a table structure out of them
carpenter.cutter: Cuts table cells out of images
carpenter.plane: Runs OCR on extracted table cell images
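To give a feel for how the ruler, paper and cutter steps fit together, here is a toy version in pure Python. It is a sketch under strong assumptions and does not use or mirror the real modules' API: the function names are invented, the image is a nested list of 0/1 values (1 = dark pixel), and ruling lines are assumed to be exactly one pixel thick and to span the whole image. The real modules work on actual images via OpenCV.

```python
# A 5x7 "image": rows 0, 2, 4 and columns 0, 3, 6 are ruling lines.
# The extra dark pixel at row 1, column 4 stands in for cell content.
IMAGE = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]


def find_lines(image, axis, min_fill=0.9):
    """Ruler step: indices of rows (axis=0) or columns (axis=1) that are mostly dark."""
    lines = image if axis == 0 else list(zip(*image))
    return [i for i, line in enumerate(lines) if sum(line) >= min_fill * len(line)]


def build_cells(h_lines, v_lines):
    """Paper step: rows of (top, left, bottom, right) boxes between neighbouring lines."""
    return [
        [(top + 1, left + 1, bottom, right) for left, right in zip(v_lines, v_lines[1:])]
        for top, bottom in zip(h_lines, h_lines[1:])
    ]


def cut_cell(image, box):
    """Cutter step: crop one cell; bottom and right are exclusive bounds."""
    top, left, bottom, right = box
    return [list(row[left:right]) for row in image[top:bottom]]


if __name__ == "__main__":
    horizontal = find_lines(IMAGE, axis=0)
    vertical = find_lines(IMAGE, axis=1)
    for row in build_cells(horizontal, vertical):
        print([cut_cell(IMAGE, box) for box in row])
```

Each cropped cell would then be handed to OCR, which is the job carpenter.plane is described as doing; that step is left out of the sketch.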

For usage see make_tables.py, carpenter.tasks and the code itself.
