UDScan

UDScan (Unstructured Data Scan) is an OCR system that converts printed text in images, streams, and PDFs into machine-readable text.

Glossary

Term	Description
OCR (Optical Character Recognition)	A technology that recognizes text within a digital image. Targets typewritten text, one glyph or character at a time.
OWR (Optical Word Recognition)	Targets typewritten text, one word at a time.
ICR (Intelligent Character Recognition)	Targets handwritten print script or cursive text, one glyph or character at a time. Usually involves machine learning.
IWR (Intelligent Word Recognition)	Targets handwritten print script or cursive text, one word at a time. This is especially useful for languages where glyphs are not separated in cursive script.

Problem solved by UDScan

UDScan is a framework for digitizing printed text so that it can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as machine learning, text-to-speech, and text mining.
It converts an image, PDF, or video that has not yet been through OCR into text.
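
Because UDScan accepts images, PDFs, and video, the first step in any such pipeline is deciding which kind of input it has been given. The sketch below illustrates that routing step in Python using only the standard library. It is not UDScan's own code (the project itself is C++ and lists MegaMimes for MIME detection); the function name `classify_input` and the returned labels are assumptions made for illustration.

```python
import mimetypes
from pathlib import Path


def classify_input(path: str) -> str:
    """Return 'image', 'pdf', 'video', or 'unsupported' for a file path.

    Assumption: the input kind is inferred from the file extension via a
    guessed MIME type. UDScan's real detection logic may differ.
    """
    mime, _ = mimetypes.guess_type(Path(path).name)
    if mime is None:
        return "unsupported"
    if mime == "application/pdf":
        return "pdf"
    top_level = mime.split("/", 1)[0]
    if top_level in ("image", "video"):
        return top_level
    return "unsupported"


if __name__ == "__main__":
    # Hypothetical file names, used only to show the routing.
    for name in ("billl.jpg", "report.pdf", "clip.mp4", "notes.txt"):
        print(name, "->", classify_input(name))
```

A PDF would then be rasterized into page images and a video reduced to key frames before OCR, which is presumably why pdf2image and Katna appear among the dependencies.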

UDScan Architecture

Ground level Data flow Diagram

Level 1 Data flow Diagram

Prerequisites

Make sure you have installed all of the following prerequisites on your development machine:

Git - Download & Install Git. OSX and Linux machines typically have this already installed.
Python - Download & Install Python3 - Minimum requirement 3.8.x
C++ - Install g++ if on Windows: https://www3.cs.stonybrook.edu/~alee/g++/g++.html
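
A quick way to confirm these prerequisites is to check the Python version and look for the tools on `PATH`. The following is a minimal sketch; the helper names are hypothetical, and `make` is included as an assumption because the run step later in this README uses `make run`.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in this README
# Assumption: `make` is needed as well, since UDScan is started with `make run`.
REQUIRED_TOOLS = ("git", "g++", "make")


def python_ok(version=None) -> bool:
    """Return True if the given (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) >= MIN_PYTHON


def find_tools(tools=REQUIRED_TOOLS) -> dict:
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    for tool, location in find_tools().items():
        print(f"{tool}: {location or 'NOT FOUND'}")
```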

3rd Party Library Dependencies

UDScan uses the following 3rd party tools/libraries:

3rd Party	Reference Link
OpenCV	https://opencv.org/
MegaMimes	https://github.com/trumpowen/MegaMimes
Tesseract-OCR	https://github.com/tesseract-ocr/tesseract
Katna	https://katna.readthedocs.io/en/latest/
pdf2image	https://pypi.org/project/pdf2image/

Installation

Install the following libraries before building UDScan.

C++ Installations:
1) OpenCV:
   Linux: https://docs.opencv.org/master/d7/d9f/tutorial_linux_install.html
   Windows: https://learnopencv.com/install-opencv-on-windows/
   macOS: https://learnopencv.com/install-opencv-4-on-macos/

2) MegaMimes:
   https://github.com/trumpowen/MegaMimes

3) Tesseract-OCR:
   https://tesseract-ocr.github.io/tessdoc/Compiling-–-GitInstallation.html

Python Installations:
pip install katna
pip install opencv-python
pip install pdf2image
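
After running the pip commands, it can help to confirm that each package is importable. The sketch below checks this without actually importing the packages. The pip-name to module-name mapping is an assumption based on how these packages are usually imported (`Katna`, `cv2`, `pdf2image`), and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Assumed mapping from pip package name to importable module name.
PACKAGES = {
    "katna": "Katna",
    "opencv-python": "cv2",
    "pdf2image": "pdf2image",
}


def missing_packages(packages=PACKAGES) -> list:
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All Python dependencies found.")
```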

Running UDScan

UDScan can be executed using the following steps:

1) mkdir ./src/page
2) Change the input file in src/driver.cpp
3) make run
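
The run steps can also be scripted. The following is a minimal sketch that assumes it is pointed at the repository root and that the input file in src/driver.cpp has already been edited by hand (step 2 is not automated here). The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def prepare_page_dir(repo_root) -> Path:
    """Create src/page under the repository root if it does not exist yet."""
    page_dir = Path(repo_root) / "src" / "page"
    # exist_ok avoids the error a plain `mkdir` raises on a second run.
    page_dir.mkdir(parents=True, exist_ok=True)
    return page_dir


def run_udscan(repo_root) -> int:
    """Prepare the page directory, then run `make run` in the repo root.

    Returns the exit code of make. Assumes `make` is on PATH.
    """
    prepare_page_dir(repo_root)
    result = subprocess.run(["make", "run"], cwd=repo_root)
    return result.returncode


if __name__ == "__main__":
    raise SystemExit(run_udscan("."))
```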

License

UDScan is completely free and open-source.
