Excalibur is a web interface to extract tabular data from PDFs, written in Python 3! It is powered by Camelot.
Note: Excalibur only works with text-based PDFs and not scanned documents. (As Tabula explains, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)
After installation with pip, you can initialize the metadata database using:
$ excalibur initdb
And then start the webserver using:
$ excalibur webserver
That's it! Now you can go to http://localhost:5000 and extract data tables from your PDFs using the web interface! Check out the usage section of the documentation for step-by-step instructions.
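If the page does not load, it can help to confirm that something is actually listening on that address. The snippet below is a minimal sketch using only the Python standard library; the function name `webserver_is_up` is made up for this example and is not part of Excalibur. It assumes the default address shown above.

```python
import urllib.error
import urllib.request


def webserver_is_up(url: str = "http://localhost:5000", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`, False otherwise.

    Hypothetical helper for illustration; not part of Excalibur.
    """
    # Ignore any configured HTTP proxy, since the target is a local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


if __name__ == "__main__":
    print("up" if webserver_is_up() else "not reachable")
```

A result of "not reachable" usually means `excalibur webserver` is not running, or is listening on a different host or port.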
Note: You can also download executables for Windows and Linux from the releases page!
- Excalibur gives you complete control over your data. All file storage and processing happens on your own local or remote machine.
- Excalibur can be configured with MySQL and Celery for parallel and distributed workloads. By default, SQLite and multiprocessing are used for sequential workloads.
- You can save table extraction rules as presets and apply them on different PDFs to extract tables with similar structures. (in v0.3.0)
- You can extract tables from multiple PDFs in one go using an extraction rule by starting jobs. (in v0.4.0)
Excalibur uses Camelot under the hood. You can check out its comparison with other PDF table extraction libraries and tools.
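To give a rough idea of what this looks like at the library level, here is a minimal sketch that calls Camelot directly and applies one set of extraction settings to several PDFs. It is not Excalibur's internal code or its preset format: the `rule` dictionary, the helper names, and the output file naming are assumptions made for this example. Only `camelot.read_pdf` and the `df` attribute of each table come from Camelot.

```python
from pathlib import Path


def csv_name(pdf_path, index):
    """Build an output file name such as 'foo-table-2.csv' (naming is an assumption)."""
    return f"{Path(pdf_path).stem}-table-{index}.csv"


def extract_tables(pdf_paths, rule, out_dir="tables"):
    """Apply the same Camelot settings to each PDF and write every table to CSV.

    `rule` is a plain dict of keyword arguments for camelot.read_pdf,
    for example {"pages": "1", "flavor": "lattice"}.
    """
    import camelot  # imported lazily so this module loads without Camelot installed

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in pdf_paths:
        tables = camelot.read_pdf(str(pdf), **rule)
        for i, table in enumerate(tables, start=1):
            target = out / csv_name(pdf, i)
            # Each table exposes its contents as a pandas DataFrame.
            table.df.to_csv(target, index=False, header=False)
            written.append(target)
    return written
```

With Camelot installed, a call such as `extract_tables(["a.pdf", "b.pdf"], {"pages": "1", "flavor": "lattice"})` would write one CSV per detected table into `tables/`; the file names here are placeholders.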
If Excalibur solves your PDF table extraction needs, please consider supporting its development by becoming a patron!
After installing Ghostscript, which is one of Camelot's requirements (see its install instructions), you can use pip to install Excalibur:
$ pip install excalibur-py
Alternatively, to install from the source code, install Ghostscript and then clone the repo using:
$ git clone https://www.github.com/camelot-dev/excalibur
and install Excalibur using pip:
$ cd excalibur
$ pip install .
Great documentation is available at http://excalibur-py.readthedocs.io/.
The Contributor's Guide has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.
You can check the latest sources with:
$ git clone https://www.github.com/camelot-dev/excalibur
You can install the development dependencies easily, using pip:
$ pip install "excalibur-py[dev]"
After installation, you can run tests using:
$ python setup.py test
Excalibur uses Semantic Versioning. For the available versions, see the tags on this repository. For the changelog, you can check out HISTORY.md.
This project is licensed under the MIT License; see the LICENSE file for details.