Project Title: Image to Text Converter
Project Author: sonu gupta
Date: 30 March 2024
The "Image to Text Converter" project aims to develop a Python program that can extract text from images using Optical Character Recognition (OCR) techniques. This report provides an overview of the project objectives, functionality, implementation details, and future enhancements.
The project utilizes Python libraries such as PIL (Python Imaging Library) and pytesseract to process images and perform text extraction. The primary functionality includes loading images from URLs or local file paths, applying OCR to extract text, and displaying the extracted text.
The key functionalities of the project include:
- Loading images from URLs or local file paths.
- Utilizing pytesseract to perform OCR and extract text from images.
- Displaying the extracted text to the console.
The project relies on the following Python libraries:
- PIL (Python Imaging Library): For image processing tasks.
- pytesseract: A Python wrapper for Tesseract-OCR Engine.
- requests: For making HTTP requests to download images.
- io: For working with streams and binary data.
Dependencies:
- Tesseract-OCR Engine: Required for performing OCR on images.
The project consists of a single Python script (img2text.ipynb
) containing the following components:
- Functions for loading images from URLs or local file paths.
- Function for extracting text from images using OCR.
- Iteration over a list of image URLs for processing multiple images.
- Sample image URLs stored in the
links
list for testing.
- Connecting to drive
import io
from google.colab import drive
drive.mount("/content/drive/")
- Implement a function to extract text from an image.
from PIL import Image
import pytesseract
import requests
from io import BytesIO
def load_image(image_url):
if image_url.startswith("https://"):
# Download the image from the URL
response = requests.get(image_url)
img = Image.open(BytesIO(response.content))
else:
# Assuming image_url is a local file path
img = Image.open(image_url)
return img
def extract_text_from_image_url(image_url):
img = load_image(image_url)
# Configure pytesseract to use the Google Colab Tesseract installation path
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'
# Extract text from the image
text = pytesseract.image_to_string(img)
print(text)
- Iterate over the links and call the extract_text_from_image_url function for each link
# List of links
links = [
"/content/drive/MyDrive/TextImg/sample.png",
"/content/drive/MyDrive/TextImg/kangaroo-reading.jpg",
"https://replicate.delivery/pbxt/Jj87qg6dTft3R5kFIzda2vorF3epnzwJpv96PsKcgkdZipLV/figure-65.png",
"https://i0.wp.com/reliancevitamin.com/wp-content/uploads/2021/02/Reliance-Vitamin-PreWorkout-SFP.png?resize=534%2C607&ssl=1"
]
for link in links:
print("\nOUTPUT: "+link+" \n")
extract_text_from_image_url(link)
The project includes the following test cases to verify the functionality:
- Test Case 1: Local file path to a sample image.
- Test Case 2: URL to an image with complex content.
- Test Case 3: URL to an image with poor quality.
- Test Case 4: URL to a non-text image.
- Test Case 5: URL to an image with handwritten text.
-
Input: Image URLs stored in a list (
links
). -
img 1 (local storage)
- img 2 (public url)
- img 3 (Handwritten image)
-
Output: Extracted text from each image, printed to the console.
-
sample output of img 1
OUTPUT: /content/drive/MyDrive/TextImg/kangaroo-reading.jpg
The Kangaroo
There are four different types of kangaroo species — the red
kangaroo, the eastern gray kangaroo, the western gray
kangaroo and the antilopine kangaroo. They are the only large
animal that uses hopping as their primary method of locomotion.
Its long tail provides balance as they hop. They can hop around
quickly on two legs or walk around slowly on all four legs.
Kangaroos can jump very high, sometimes three times their own
height and they can swim. On land they move their hind legs
together, however, in water they kick each leg independently to
swim. They eat grass, leaves, herbs and roots. They need little
water and can go for months without drinking.
Kangaroosare social animals which stay in groups of at least three
or four individuals. Some groups can have up to one hundred
kangaroos. Most of them are active at night.
They usually have one young annually. Females have a pouch in
which babies live and drink milk for eight months. Kangaroos
usually live about six years in the wild. There are more kangaroos
than humans in Australia.
© wwwenglishunite.com
- sample output of img 2
OUTPUT: https://replicate.delivery/pbxt/Jj87qg6dTft3R5kFIzda2vorF3epnzwJpv96PsKcgkdZipLV/figure-65.png
It was the best of
times, it was the worst
of times, it was the age
of wisdom, it was the
age of foolishness...
- sample output of img 1
OUTPUT: /content/drive/MyDrive/TextImg/sample-handwritten-2.png
Hello,
Simply Natect has Sevelapect
iperesl re fe PRroypcitary robot se
Technolgy te Bete. your message
a enveloyes er ee genni ne
real Pen. zr a Comp (ef ely
swab stirgntthable. Kittin & temcane
ken aot wy.
Tey us today!
Singh “Note!
- Compatibility and installation issues with Tesseract-OCR Engine.
- Handling different types of images with varying quality and languages.
- Dealing with network-related errors during image download.
- Clear instructions provided for installing and configuring Tesseract-OCR Engine.
- Techniques such as image preprocessing and adjusting OCR parameters implemented to improve accuracy.
- Error handling mechanisms added to handle network-related errors.
- Metrics such as accuracy, processing time, and resource usage evaluated for text extraction process.
- Testing conducted with diverse image datasets to assess robustness and reliability.
- Integration with machine learning models for advanced preprocessing and recognition techniques.
- Support for additional image formats and sources.
- Implementation of a user-friendly interface for enhanced usability.
The "Image to Text Converter" project demonstrates the effectiveness of OCR techniques in extracting text from images. With further enhancements and optimizations, the project has the potential to serve as a valuable tool for various applications, contributing to increased productivity and efficiency.
-
PIL (Python Imaging Library)
- Official documentation: PIL Documentation
- Tutorial on using PIL for image processing: PIL Tutorial
-
pytesseract
- GitHub repository: pytesseract GitHub
- Official documentation: pytesseract Documentation
- Tutorial on using pytesseract for OCR: pytesseract Tutorial
-
requests
- Official documentation: requests Documentation
- Tutorial on making HTTP requests with requests library: requests Tutorial
-
io:
- Official documentation: io Documentation
- Tutorial on working with streams and binary data using io module: io Tutorial
-
Tesseract-OCR Engine
- GitHub repository: Tesseract-OCR GitHub
- Official documentation: Tesseract Documentation
- Installation guide for Tesseract-OCR: Tesseract Installation