COVID-Text-Extractor

Extract COVID-related information from images, PDFs and text using Google's Tesseract OCR, pdfminer.six and regular expressions.

How can it be used?

For building APIs
For creating bots (Telegram, Twitter...)
For automating COVID data extraction, and more!

USAGE:

python3 image_to_text.py <path_to_image_file>

To extract text from a random image in the test directory:

python3 image_to_text.py
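The two invocations above differ only in whether a path is given. A minimal sketch of that fallback follows; it is not the code in image_to_text.py. The function name `pick_image` and the list of image extensions are assumptions made for illustration.

```python
import random
import sys
from pathlib import Path
from typing import Optional, Sequence

# Assumed set of image extensions; the real script may accept others.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def pick_image(argv: Sequence[str], test_dir: str = "test") -> Optional[Path]:
    """Return the path given on the command line, else a random image from test_dir."""
    if len(argv) > 1:
        # An explicit path always wins over the random fallback.
        return Path(argv[1])
    folder = Path(test_dir)
    if not folder.is_dir():
        return None
    candidates = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    return random.choice(candidates) if candidates else None


if __name__ == "__main__":
    print(pick_image(sys.argv))
```

Returning None when no image is found lets the caller decide how to report the problem instead of failing inside the helper.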

Requirements:

Python 3.9.4
OpenCV (Contrib) [4.5.1.48]
PyTesseract [0.3.7]
Pillow [8.2.0]
Numpy [1.20.2]

Installing requirements...

pip3 install -r requirements.txt
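After installing, it can help to confirm that the installed versions match the ones listed under Requirements. The sketch below uses only the standard library. The PyPI distribution names in `PINNED` are assumptions, since the contents of requirements.txt are not shown here.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the Requirements list; distribution names are assumed.
PINNED = {
    "opencv-contrib-python": "4.5.1.48",
    "pytesseract": "0.3.7",
    "Pillow": "8.2.0",
    "numpy": "1.20.2",
}


def check_versions(pinned):
    """Map each package to (expected, installed); installed is None if missing."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        status = "ok" if installed == expected else "differs"
        print(f"{name}: expected {expected}, installed {installed} ({status})")
```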

IMPORTANT Dependency:

Install the Tesseract training data (English) for your OS.

For Arch Linux Users (mine is Manjaro):

pacman -S tesseract-data-eng

For Other Users:

Download eng.traineddata and configure the Python script accordingly.
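One possible way to point the script at a manually downloaded eng.traineddata is to pass the folder to Tesseract through PyTesseract's `config` argument. This is a sketch under assumptions, not the project's own configuration step. The helper names and the folder layout are hypothetical.

```python
from pathlib import Path


def has_english_data(tessdata_dir: str) -> bool:
    """Check that eng.traineddata is present in the given folder."""
    return (Path(tessdata_dir) / "eng.traineddata").is_file()


def tessdata_config(tessdata_dir: str) -> str:
    """Build the extra-config string that tells Tesseract where its data lives."""
    return f'--tessdata-dir "{tessdata_dir}"'


def ocr_image(image_path: str, tessdata_dir: str) -> str:
    """Run English OCR on one image using the given tessdata folder."""
    # Imported lazily so the helpers above work without the OCR stack installed.
    import pytesseract
    from PIL import Image

    if not has_english_data(tessdata_dir):
        raise FileNotFoundError(f"eng.traineddata not found in {tessdata_dir}")
    return pytesseract.image_to_string(
        Image.open(image_path), lang="eng", config=tessdata_config(tessdata_dir)
    )
```

If the tesseract binary itself is not on the PATH, `pytesseract.pytesseract.tesseract_cmd` can be set to its full path before calling `ocr_image`.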

What it Currently Extracts in Texts and Images:

Oxygen (if Oxygen, Cylinder, Cans, Concentrator or Refill is in text)
Verified (if source is verified)
Plasma (if keyword is used in text)
Email (all emails)
Age (extracts age)
Blood-Groups (All blood groups)
Phone-Numbers (All phone numbers)
Required (If Required keyword is used in text)
Help (if Help keyword is used in text)
Food (breakfast, lunch, dinner)
Urgent (If urgent or required keyword is used in text)
ICU-Beds (if ICU or Bed keyword is used in text)
Ventilator (if Ventilator keyword is used in text)
Ambulance (if Ambulance keyword is used in text)
Without (if without keyword is used in text)
Free (if Free keyword is used in text)
Report (if Report keyword is used in text)
Fabiflu (if Fabi-flu or Fabiflu is in text)
Medicine (if Medicine in text)
Vaccine (if vaccine or vaccination is in text)
Remdesivir (if Remdesivir is in text)
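The list above mixes two kinds of result: keyword flags and extracted values. A rough sketch of both, covering only a few of the labels, could look like the following. The patterns are illustrative and are not the project's own. In particular, the phone pattern assumes 10-digit numbers with an optional country code.

```python
import re

# Keyword flags: True if any listed keyword appears (case-insensitive).
KEYWORD_PATTERNS = {
    "Oxygen": r"\b(oxygen|cylinders?|cans|concentrators?|refills?)\b",
    "Plasma": r"\bplasma\b",
    "Fabiflu": r"\bfabi-?flu\b",
    "Vaccine": r"\b(vaccines?|vaccination)\b",
    "Remdesivir": r"\bremdesivir\b",
}

# Value extractors (illustrative patterns, see the assumptions above).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
BLOOD_GROUP = re.compile(r"(?<![A-Za-z])(AB|A|B|O)([+-])(?!\w)", re.IGNORECASE)
PHONE = re.compile(r"(?<![\d+])(?:\+\d{1,3}[\s-]?)?\d{10}(?!\d)")


def extract(text: str) -> dict:
    """Return keyword flags plus lists of emails, blood groups and phone numbers."""
    result = {
        label: bool(re.search(pattern, text, re.IGNORECASE))
        for label, pattern in KEYWORD_PATTERNS.items()
    }
    result["Email"] = EMAIL.findall(text)
    result["Blood-Groups"] = [
        m.group(1).upper() + m.group(2) for m in BLOOD_GROUP.finditer(text)
    ]
    result["Phone-Numbers"] = PHONE.findall(text)
    return result


if __name__ == "__main__":
    sample = ("Urgent: O+ plasma and oxygen cylinder required. "
              "Call +91 9876543210 or mail help@example.org")
    print(extract(sample))
```

Keeping the patterns in a dictionary keyed by label makes it easy to add or adjust a keyword without touching the extraction logic.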
