OCRDocumentClassifier

This dataset is the output of the OCR stage of a data pipeline, and its word order comes directly from the OCR layer. Trained document classification models on TF-IDF features using Random Forest, Linear SVC, and a Sequential neural network.

The given CSV file contains raw OCR data read from PDF soft copies. Each line consists of a document type in the first column, followed by the hashed OCR words delimited by spaces. Each hash corresponds to a unique word, and the words are assumed to appear in their original order. A sample line looks like this:

CANCELLATION NOTICE,641356219cbc f95d0bea231b ... [lots more words] ... 52102c70348d b32153b8b30c
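A line in this format can be split into its label and token list with a few lines of Python. This is a minimal sketch, not the code from the notebook: the helper name `parse_line` is hypothetical, the example line is a shortened stand-in built from the hashes shown above, and it assumes the document type itself never contains a comma.

```python
def parse_line(line: str) -> tuple[str, list[str]]:
    """Split one CSV line into (document_type, hashed_words).

    Assumption: the first comma separates the label from the OCR text.
    """
    label, _, text = line.rstrip("\n").partition(",")
    return label.strip(), text.split()


# Shortened stand-in for a real row; the real rows contain many more hashes.
example = "CANCELLATION NOTICE,641356219cbc f95d0bea231b 52102c70348d b32153b8b30c"
doc_type, hashed_words = parse_line(example)
print(doc_type, len(hashed_words))
```

Splitting on the first comma only keeps the rest of the line intact, so the space-delimited hashes stay in their original order.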

1. Trained a document classification model.
2. Deployed my model to a public cloud platform (AWS) as a webservice.

Accuracies of the algorithms tested:

Linear SVC: 0.816023166023166
Logistic Regression: 0.7010939510939511
Seq Neural Network: 0.8566
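For orientation, the TF-IDF plus Linear SVC combination could be wired up roughly as follows with scikit-learn. This is only a sketch under assumptions: the tiny corpus, its hashed tokens, and the "BILL" label are made up for illustration, and the vectorizer settings are not taken from the notebook, so it will not reproduce the accuracies listed here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up placeholder data: these are not rows from the real dataset.
docs = [
    "aa11 bb22 cc33 aa11",
    "aa11 cc33 dd44",
    "ee55 ff66 gg77",
    "ee55 gg77 hh88 ee55",
]
labels = ["CANCELLATION NOTICE", "CANCELLATION NOTICE", "BILL", "BILL"]

# Treat every whitespace-delimited hash as one token and keep its case as-is.
model = make_pipeline(
    TfidfVectorizer(token_pattern=r"\S+", lowercase=False),
    LinearSVC(),
)
model.fit(docs, labels)

# Classify a new space-delimited string of hashed words.
predicted = model.predict(["aa11 bb22"])[0]
print(predicted)
```

Wrapping the vectorizer and classifier in one pipeline means the same fitted vocabulary is applied at prediction time, which matters once the model sits behind a webservice.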

Chose the best-performing model to create a RESTful API using Flask, and also created a UI template for real-time document analysis.

Webservice specifications:

RESTful API
Respects the content-type header (application/json and text/html at minimum; other types are a bonus)
Discoverable from root path
The URL-encoded GET parameter "words" returns the predicted document type in the field "prediction" and, as a bonus, its confidence in the field "confidence"
HTML pages should be readable by a human and allow for action, i.e. an input field, a submit button, etc.
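The request and response handling described in these specifications can be sketched independently of the web framework. The example below uses only the Python standard library rather than Flask; the function names `extract_words` and `render_response`, the localhost URL, and the confidence value are hypothetical, and it assumes the requested type arrives as a plain header string.

```python
import html
import json
from urllib.parse import parse_qs, urlsplit


def extract_words(url: str) -> list[str]:
    """Decode the URL-encoded "words" GET parameter into a list of hashes."""
    query = parse_qs(urlsplit(url).query)
    return query.get("words", [""])[0].split()


def render_response(requested_type: str, prediction: str, confidence: float) -> tuple[str, str]:
    """Return (content_type, body) as HTML or JSON, depending on the header value."""
    if "text/html" in requested_type:
        # Human-readable page with an input field and a submit button.
        body = (
            "<html><body>"
            f"<p>Prediction: {html.escape(prediction)} "
            f"(confidence {confidence:.2f})</p>"
            '<form method="get" action="/">'
            '<input type="text" name="words">'
            '<input type="submit" value="Classify">'
            "</form></body></html>"
        )
        return "text/html", body
    # Default to JSON with the two fields named in the specification.
    payload = {"prediction": prediction, "confidence": confidence}
    return "application/json", json.dumps(payload)


words = extract_words("http://localhost/?words=641356219cbc%20f95d0bea231b")
content_type, body = render_response("application/json", "CANCELLATION NOTICE", 0.9)
print(words, content_type, body)
```

In a Flask app, a view on the root path would call functions like these with the incoming query string and header, passing in the prediction produced by the trained model.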


kartiikthakur/DataChallenge2--ML-DevOps
