Extract COVID-related information from images, PDFs and text using Google's Tesseract-OCR, PDFMiner.six and regular expressions.
- For building APIs
- For creating bots (Telegram, Twitter...)
- For automating COVID data extraction, and more!
python3 image_to_text.py <path_to_image_file>
python3 image_to_text.py
- Python 3.9.4
- OpenCV (Contrib) [4.5.1.48]
- PyTesseract [0.3.7]
- Pillow [8.2.0]
- Numpy [1.20.2]
pip3 install -r requirements.txt
Install the Tesseract training data (English) according to your OS:
pacman -S tesseract-data-eng
Download eng.traineddata and configure the Python script accordingly.
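As a rough illustration of what "configure the Python script accordingly" could look like, the sketch below passes a custom tessdata folder to PyTesseract through Tesseract's `--tessdata-dir` option. The helper names `tessdata_config` and `ocr_image` and the example folder are hypothetical and are not taken from `image_to_text.py`; adapt them to wherever you saved `eng.traineddata`.

```python
from pathlib import Path


def tessdata_config(tessdata_dir):
    """Build the Tesseract option string that points at a custom tessdata folder.

    Hypothetical helper for illustration; not part of image_to_text.py.
    """
    return f'--tessdata-dir "{tessdata_dir}"'


def ocr_image(image_path, tessdata_dir):
    """Run English OCR on an image using eng.traineddata from tessdata_dir."""
    # Fail early with a clear message if the training data is missing.
    traineddata = Path(tessdata_dir) / "eng.traineddata"
    if not traineddata.is_file():
        raise FileNotFoundError(f"eng.traineddata not found in {tessdata_dir}")

    # Imported lazily so tessdata_config() works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(
            image, lang="eng", config=tessdata_config(tessdata_dir)
        )
```

If Tesseract is installed system-wide with its language data (for example through the `pacman` package above), the extra `config` argument is usually unnecessary.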
- Oxygen (if Oxygen, Cylinder, Cans, Concentrator or Refill is in text)
- Verified (if source is verified)
- Plasma (if keyword is used in text)
- Email (all emails)
- Age (extracts age)
- Blood-Groups (all blood groups)
- Phone-Numbers (all phone numbers)
- Required (if Required keyword is used in text)
- Help (if Help keyword is used in text)
- Food (breakfast, lunch, dinner)
- Urgent (if Urgent or Required keyword is used in text)
- ICU-Beds (if ICU or Bed keyword is used in text)
- Ventilator (if Ventilator keyword is used in text)
- Ambulance (if Ambulance keyword is used in text)
- Without (if without keyword is used in text)
- Free (if Free keyword is used in text)
- Report (if Report keyword is used in text)
- Fabiflu (if Fabi-flu or Fabiflu is in text)
- Medicine (if Medicine in text)
- Vaccine (if vaccine or vaccination is in text)
- Remdesivir (if Remdesivir is in text)
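The fields above can be approximated with plain regular expressions. The following is a minimal sketch covering a subset of them, not the project's actual patterns: the names `KEYWORD_PATTERNS` and `extract_info`, the 10-digit phone format, and the blood-group notation (`B+`, `AB-`) are all assumptions made for illustration.

```python
import re

# Keyword flags: True when any listed word appears in the text (case-insensitive).
# Assumed patterns for a subset of the fields; the real script may differ.
KEYWORD_PATTERNS = {
    "Oxygen": r"\b(?:oxygen|cylinders?|cans|concentrators?|refills?)\b",
    "Plasma": r"\bplasma\b",
    "ICU-Beds": r"\b(?:icu|beds?)\b",
    "Ventilator": r"\bventilators?\b",
    "Urgent": r"\b(?:urgent|required)\b",
    "Fabiflu": r"\bfabi-?flu\b",
    "Vaccine": r"\bvaccin(?:e|ation)\b",
    "Remdesivir": r"\bremdesivir\b",
}

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Assumption: 10-digit numbers with an optional +<country code> prefix.
PHONE_RE = re.compile(r"(?<!\d)(?:\+\d{1,3}[\s-]?)?\d{10}(?!\d)")
# Assumption: blood groups are written in upper case, e.g. "B+" or "AB-".
BLOOD_RE = re.compile(r"(?<![A-Za-z0-9])(?:AB|A|B|O)[+-]")
AGE_RE = re.compile(r"\bage[d]?\s*[:\-]?\s*(\d{1,3})\b", re.IGNORECASE)


def extract_info(text):
    """Return keyword flags and extracted values found in text."""
    info = {
        name: bool(re.search(pattern, text, re.IGNORECASE))
        for name, pattern in KEYWORD_PATTERNS.items()
    }
    info["Email"] = EMAIL_RE.findall(text)
    info["Phone-Numbers"] = PHONE_RE.findall(text)
    info["Blood-Groups"] = BLOOD_RE.findall(text)
    info["Age"] = AGE_RE.findall(text)
    return info


if __name__ == "__main__":
    sample = (
        "Urgent: B+ plasma required for patient, age 45. "
        "Call +91 9876543210 or mail help@example.org"
    )
    for field, value in extract_info(sample).items():
        print(f"{field}: {value}")
```

Word boundaries (`\b`) keep short keywords such as "bed" or "icu" from matching inside longer words, which matters for noisy OCR output.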