
This project uses OpenCV and Tesseract OCR to extract and annotate text from passport images. It preprocesses images, extracts key details (passport number, name, nationality, etc.), and handles errors. The script outputs the extracted information in JSON format.


4TechSadiq/Passport-Scraper


Passport Scraper

This project uses OpenCV and Tesseract OCR to extract and annotate textual information from Indian passport images. The script processes the image to enhance text recognition accuracy and extracts key details such as passport number, full name, surname, nationality, country code, date of birth, date of issue, date of expiry, gender, place of birth, and place of issue.

Features:

  • Image Preprocessing: Converts the image to grayscale, sharpens it, and applies denoising to improve text detection accuracy.
  • Text Detection and Extraction: Utilizes Tesseract OCR to detect and extract text from the preprocessed image.
  • Information Parsing: Uses regular expressions to identify and extract specific information such as passport number, dates, and names from the detected text.
  • Error Handling: Incorporates comprehensive error handling to manage and report potential issues during text detection and parsing.
  • Annotated Output: Draws rectangles around detected text and annotates the image with the recognized text for visual verification.

Installation

  1. Install Dependencies:

    • Ensure you have OpenCV and Tesseract installed.
    • Install required Python packages using pip:
      pip install opencv-python pytesseract numpy
  2. Tesseract Installation:

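    • Download and install the Tesseract OCR engine for your operating system, and make sure the tesseract executable is on your PATH. The pytesseract package is only a Python wrapper around it.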
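
To confirm that the Tesseract installation is visible to Python, a small check along these lines may help. The helper name `find_tesseract` is hypothetical. The commented lines show how pytesseract can be pointed at a specific binary through its `tesseract_cmd` setting when the executable is not on the PATH.

```python
import shutil
from typing import Optional


def find_tesseract(binary_name: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(binary_name)


tesseract_path = find_tesseract()
if tesseract_path is None:
    print("Tesseract not found on PATH; install it or set the path manually.")
else:
    print(f"Using Tesseract at {tesseract_path}")
    # With pytesseract installed, the wrapper can be pointed at this binary:
    # import pytesseract
    # pytesseract.pytesseract.tesseract_cmd = tesseract_path
```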

Usage

  1. Run the Script:

    • Save your passport image somewhere the script can read it.
    • Set the image_path variable in the script to that image's path.
    • Run the script:
      python main.py
  2. Output:

    • The script will print the extracted passport information in JSON format.
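
As a rough illustration of how regular-expression parsing can turn OCR text into the JSON output, consider the sketch below. The patterns are assumptions: a passport number of one uppercase letter followed by seven digits, and dates written as DD/MM/YYYY in the order birth, issue, expiry. The function name `parse_passport_text` and the sample text are made up for this example and are not taken from the project.

```python
import json
import re

# Assumed formats, not taken from the project's source
PASSPORT_NO_RE = re.compile(r"\b[A-Z][0-9]{7}\b")
DATE_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def parse_passport_text(ocr_text: str) -> str:
    """Extract a few fields from OCR text and return them as a JSON string."""
    number = PASSPORT_NO_RE.search(ocr_text)
    dates = DATE_RE.findall(ocr_text)

    info = {"passport_number": number.group(0) if number else None}
    # Assumption: dates appear in the order birth, issue, expiry
    labels = ["date_of_birth", "date_of_issue", "date_of_expiry"]
    for i, label in enumerate(labels):
        # Missing fields are reported as null rather than raising an error
        info[label] = dates[i] if i < len(dates) else None
    return json.dumps(info, indent=2)


# Made-up OCR text for demonstration only
sample = "REPUBLIC OF INDIA\nP IND Z1234567\n01/02/1990\n10/03/2020\n09/03/2030"
print(parse_passport_text(sample))
```

Assigning dates by position is fragile when OCR drops or misreads a line, so a real parser would likely anchor each date to a nearby field label instead.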

Example

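# Path to the passport image to process
image_path = "path/to/your/passport/image.jpg"
# Extract the passport details; the result is in JSON format
passport_info_json = extract_passport_info(image_path)
print(passport_info_json)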
