prahalad12345/UDScan

UDScan

UDScan, or Unstructured Data Scan, is an OCR system that converts images, streams, and PDFs into text.

Glossary

| Term | Description |
| --- | --- |
| OCR (Optical Character Recognition) | A technology that recognizes text within a digital image. Targets typewritten text, one glyph or character at a time. |
| OWR (Optical Word Recognition) | Targets typewritten text, one word at a time. |
| ICR (Intelligent Character Recognition) | Targets handwritten print script or cursive text, one glyph or character at a time. Usually involves machine learning. |
| IWR (Intelligent Word Recognition) | Targets handwritten print script or cursive text, one word at a time. This is especially useful for languages where glyphs are not separated in cursive script. |
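
The difference between recognizing "one glyph at a time" and "one word at a time" comes down to how a line of text is segmented before recognition. The sketch below is only an illustration of that idea, not UDScan's implementation: it splits a tiny binary bitmap into column spans using a vertical projection profile, and the hypothetical `min_gap` parameter decides whether narrow gaps (between glyphs) or only wide gaps (between words) count as boundaries.

```python
def column_runs(bitmap, min_gap):
    """Split a binary text-line bitmap into (start, end) column spans.

    A new span starts when at least `min_gap` blank columns separate
    two inked columns. `end` is exclusive.
    """
    width = len(bitmap[0])
    # Vertical projection: a column is inked if any row has a 1 in it.
    inked = [any(row[x] for row in bitmap) for x in range(width)]

    spans = []
    start = None     # first column of the current span
    last_ink = None  # most recent inked column in the current span
    for x, has_ink in enumerate(inked):
        if not has_ink:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The blank run is wide enough to close the current span.
            spans.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        spans.append((start, last_ink + 1))
    return spans


# Toy text line: '#' is ink, '.' is background.
ROWS = [
    "##.##...##",
    "#..#....#.",
]
LINE = [[1 if c == "#" else 0 for c in row] for row in ROWS]

glyph_spans = column_runs(LINE, min_gap=1)  # glyph-level segmentation
word_spans = column_runs(LINE, min_gap=3)   # word-level segmentation
```

With a gap threshold of one column the line yields three glyph-sized spans; with a threshold of three columns the first two glyphs merge into a single word-sized span. Cursive scripts often have no blank columns between glyphs at all, which is why word-level recognition is useful there.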

Problem solved by UDScan

  • UDScan is a framework for digitizing printed text so that it can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as machine learning, text-to-speech, and text mining.

  • Converts an image, PDF, or video into text.
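
Because UDScan accepts images, PDFs, and videos, some step has to decide which kind of input it was given. The following is a minimal sketch of that decision using Python's standard `mimetypes` module. It assumes dispatch by file extension; the function name `classify_input` and the `PREPROCESSING` notes are hypothetical and inferred only from the dependency list, not taken from the UDScan source.

```python
import mimetypes

# Assumed mapping from input kind to a preprocessing step, based only on
# the third-party libraries this README lists.
PREPROCESSING = {
    "image": "pass the image to OCR directly",
    "pdf": "render each page to an image (pdf2image), then OCR",
    "video": "extract key frames (Katna), then OCR each frame",
}


def classify_input(path):
    """Return 'image', 'pdf', 'video', or 'unsupported' for a file path."""
    mime, _encoding = mimetypes.guess_type(path)
    if mime is None:
        return "unsupported"
    if mime == "application/pdf":
        return "pdf"
    major = mime.split("/", 1)[0]
    if major in ("image", "video"):
        return major
    return "unsupported"


if __name__ == "__main__":
    for name in ("scan.png", "report.pdf", "clip.mp4", "notes.txt"):
        kind = classify_input(name)
        print(name, "->", kind, "|", PREPROCESSING.get(kind, "no pipeline"))
```

Guessing from the extension is cheap but can be fooled by misnamed files; inspecting the file's leading bytes is more robust when inputs are untrusted.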

UDScan Architecture

[Figure: UDScan architecture]

Ground-Level Data Flow Diagram

[Figure: UDScan level 0 data flow diagram]

Level 1 Data Flow Diagram

[Figure: UDScan level 1 data flow diagram]

Prerequisites

Make sure you have installed all of the following prerequisites on your development machine:

3rd Party Library Dependencies

UDScan uses the following third-party tools and libraries:

| 3rd Party | Reference Link |
| --- | --- |
| OpenCV | https://opencv.org/ |
| MegaMimes | https://github.com/trumpowen/MegaMimes |
| Tesseract-OCR | https://github.com/tesseract-ocr/tesseract |
| Katna | https://katna.readthedocs.io/en/latest/ |
| pdf2image | https://pypi.org/project/pdf2image/ |

Installation

Install the libraries that UDScan depends on.

C++ installations:
1) OpenCV:
   Linux: https://docs.opencv.org/master/d7/d9f/tutorial_linux_install.html
   Windows: https://learnopencv.com/install-opencv-on-windows/
   macOS: https://learnopencv.com/install-opencv-4-on-macos/

2) MegaMimes:
   https://github.com/trumpowen/MegaMimes

3) Tesseract-OCR:
   https://tesseract-ocr.github.io/tessdoc/Compiling-–-GitInstallation.html

Python installations:
pip install katna
pip install opencv-python
pip install pdf2image
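
Once everything is installed, a quick check can confirm that the Python packages are importable and the command-line tools are on `PATH`. This is a minimal sketch, not part of UDScan. The import names `cv2` (for `opencv-python`) and `Katna` (for `katna`), and the binary names `tesseract` and `make`, are the usual ones but are assumptions about your environment.

```python
import importlib.util
import shutil

# Import names for the pip packages listed above (assumed defaults).
PYTHON_MODULES = ("cv2", "pdf2image", "Katna")
# Command-line tools expected on PATH (assumed names).
BINARIES = ("tesseract", "make")


def check_prerequisites(modules=PYTHON_MODULES, binaries=BINARIES):
    """Return a dict mapping each prerequisite to True if it was found."""
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[f"python:{name}"] = importlib.util.find_spec(name) is not None
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH.
        report[f"bin:{name}"] = shutil.which(name) is not None
    return report


if __name__ == "__main__":
    for item, found in check_prerequisites().items():
        print(f"{item:20s} {'ok' if found else 'MISSING'}")
```

The check only covers the Python packages and executables; the C++ libraries (OpenCV and MegaMimes) are verified when the project is compiled.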

Running UDScan

UDScan can be run with the following steps:

1) mkdir ./src/page
2) Change the input file in src/driver.cpp
3) make run
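
Steps 1 and 3 can be scripted. The sketch below is a hypothetical helper, not something shipped with UDScan. It assumes that both `mkdir ./src/page` and `make run` are meant to be executed from the repository root; step 2, editing `src/driver.cpp`, is still done by hand.

```python
import subprocess
from pathlib import Path


def prepare_and_run(repo_root, dry_run=True):
    """Create src/page under repo_root and run `make run` there.

    With dry_run=True the directory is still created, but the build
    command is only returned, not executed.
    """
    root = Path(repo_root)
    # Step 1: equivalent of `mkdir ./src/page`, safe to repeat.
    (root / "src" / "page").mkdir(parents=True, exist_ok=True)

    # Step 3: assumes the Makefile lives in the repository root.
    command = ["make", "run"]
    if not dry_run:
        subprocess.run(command, cwd=root, check=True)
    return command


if __name__ == "__main__":
    # Run from the directory that contains the UDScan checkout.
    print("Would run:", " ".join(prepare_and_run(".", dry_run=True)))
```

Passing `dry_run=False` runs the build and raises `subprocess.CalledProcessError` if `make` exits with a non-zero status.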

License

UDScan is completely free and open source.
