Sample of using OCR in Python with PDFs
The sample PDFs are from:
https://idrh.ku.edu/sites/idrh.ku.edu/files/files/tutorials/pdf/Text-searchable.pdf
https://idrh.ku.edu/sites/idrh.ku.edu/files/files/tutorials/pdf/Non-text-searchable.pdf
- Clone the repo.
- Go inside the folder.
cd pythonOCR
- Create a virtual environment (tested with Python 3.6).
python3.6 -m venv venv
- Activate the virtual env.
source venv/bin/activate
- Install the requirements.
pip install -r requirements.txt
- Install tesseract-ocr. Go to the official Tesseract repo and follow the installation instructions for your OS.
- Put the PDFs in the samples folder.
- Run it!
python Main.py
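
Before running, it can help to confirm that the setup steps above took effect. The following is a minimal sketch using only the standard library; it is not part of Main.py, and the helper names `find_sample_pdfs` and `tesseract_available` are hypothetical. It assumes the PDFs live in a folder named `samples` in the current directory and that the Tesseract install put an executable named `tesseract` on your PATH.

```python
import shutil
from pathlib import Path


def find_sample_pdfs(samples_dir="samples"):
    """Return the PDF files found directly inside samples_dir (hypothetical helper)."""
    folder = Path(samples_dir)
    if not folder.is_dir():
        # A missing folder is treated the same as an empty one.
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def tesseract_available():
    """Return True if an executable named `tesseract` is on PATH (assumed name)."""
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    pdfs = find_sample_pdfs()
    print("tesseract on PATH:", tesseract_available())
    print("PDFs found in samples:", [p.name for p in pdfs])
```

If the first line prints False or the list is empty, revisit the Tesseract installation step or the samples folder before running Main.py.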