NLP Frequency Analysis

NLP Frequency Analysis is a web application that lets a user upload a file and view the 25 most frequent words in it, along with their counts. The user can also review the ten most recent submissions, each stored with its frequency analysis data (original text, stop words setting, and resulting word frequencies).

Overview

Users switch between the FrequencyCount view and the FrequencyAnalysis view using a navigation bar. On the FrequencyCount tab, they can upload a file and process it to view the 25 most frequent words with their counts, optionally excluding stopwords. On the FrequencyAnalysis tab, they can switch between several menus displaying the ten most recent frequency analyses.

View Deployed Version

A deployed version of the web app runs on Heroku at:

https://humanpracticefrequencyanalysis.herokuapp.com/

Here are the page views:

FrequencyCount Page

Word frequencies Page

FrequencyAnalysis Page

How to run

First, install the requirements listed in requirements.txt:

pip install -r requirements.txt

Run:

python server.py

Open your browser and go to:

http://127.0.0.1:5000/

You can then use the navigation bar and the different menus to navigate.

👍 Ready to process some files!

Libraries/frameworks you used

Flask Backend

I used Flask to write the backend for the web app. Alongside Flask, I used MongoDB to persist analyses, so that the last N records entered into the database can be fetched. To use MongoDB from Python, I used PyMongo.
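Fetching the last N records could look roughly like the sketch below. It is a minimal illustration, not the code in server.py: the function name `fetch_recent_analyses` is hypothetical, and it assumes documents use MongoDB's default ObjectId `_id`, which increases roughly with insertion time, so sorting on it in descending order returns the newest records first.

```python
from typing import Any, Dict, List

DESCENDING = -1  # same value as pymongo.DESCENDING


def fetch_recent_analyses(collection: Any, n: int = 10) -> List[Dict[str, Any]]:
    """Return the n most recently inserted documents, newest first.

    `collection` is expected to behave like a PyMongo collection:
    find() returns a cursor that supports sort() and limit().
    """
    # Assumption: default ObjectId _id values, which grow with insertion time.
    cursor = collection.find().sort("_id", DESCENDING).limit(n)
    return list(cursor)
```

With PyMongo, the argument would be a collection object obtained from a `MongoClient`; the database and collection names depend on how the app is configured and are not shown here.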

Database

MongoDB, which stores each record as a document of key-value pairs; this was very efficient in this case.

Web App Frontend

I used a mix of HTML, Bootstrap, jQuery and JavaScript to display the results on the frontend.

Text Processing

I used NLTK (the Natural Language Toolkit) to tokenize the contents of the uploaded file, to find the 25 most common words, and to remove stopwords when needed.
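The counting step can be sketched as follows. To keep the example runnable without downloading NLTK data, it swaps in a regular-expression tokenizer and a tiny hand-written stopword set in place of NLTK's tokenizer and English stopword list, so its output will not match the app's exactly. The function name `top_words` and the `STOPWORDS` set are illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; a stand-in for NLTK's much longer English list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on"}


def top_words(text, n=25, remove_stopwords=True):
    """Return up to n (word, count) pairs, most frequent first."""
    # Lowercase, then keep runs of letters with an optional inner apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return Counter(tokens).most_common(n)


if __name__ == "__main__":
    sample = "The cat and the hat. The cat sat."
    print(top_words(sample, n=3))
    print(top_words(sample, n=3, remove_stopwords=False))
```

The `remove_stopwords` flag mirrors the stop words setting stored with each analysis.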

Stemming

After normalizing the data, I used Porter's stemming algorithm, based on the original paper: http://tartarus.org/martin/PorterStemmer. I simplified it, since we only need to cater for two cases:

Regularly conjugated English verbs. For example, consider "talk", "talks", "talking", and "talked" to all be forms of "talk", and "passes", "passed", and "passing" to all be forms of "pass".
Regularly pluralized English nouns. For example, consider "cat" and "cats" to be forms of "cat".

These two cases are covered by Step 1 of Porter's algorithm implemented in server.py.
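As a rough illustration of how Step 1 handles these two cases, here is a minimal sketch of Steps 1a and 1b of Porter's algorithm. It is written from the published rules and may differ from the implementation in this repository; all function names are illustrative, and Step 1c and the later steps are left out.

```python
VOWELS = "aeiou"


def is_consonant(word, i):
    """Porter's definition: 'y' counts as a vowel when it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions (Porter's m)."""
    m, prev_vowel = 0, False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def contains_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_double_consonant(stem):
    return (len(stem) >= 2 and stem[-1] == stem[-2]
            and is_consonant(stem, len(stem) - 1))


def ends_cvc(stem):
    """True for consonant-vowel-consonant endings whose last letter is not w, x or y."""
    n = len(stem)
    return (n >= 3 and is_consonant(stem, n - 3) and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1) and stem[-1] not in "wxy")


def step1a(word):
    """Plurals and third-person verbs: passes -> pass, cats -> cat."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def tidy(stem):
    """Repair the stem after removing -ed / -ing."""
    if stem.endswith(("at", "bl", "iz")):
        return stem + "e"
    if ends_double_consonant(stem) and stem[-1] not in "lsz":
        return stem[:-1]
    if measure(stem) == 1 and ends_cvc(stem):
        return stem + "e"
    return stem


def step1b(word):
    """Past tense and gerunds: talked -> talk, passing -> pass."""
    if word.endswith("eed"):
        return word[:-1] if measure(word[:-3]) > 0 else word
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[:-len(suffix)]
            return tidy(stem) if contains_vowel(stem) else word
    return word


def stem(word):
    word = word.lower()
    if len(word) <= 2:  # very short words are left alone in this sketch
        return word
    return step1b(step1a(word))


if __name__ == "__main__":
    for w in ["talk", "talks", "talking", "talked",
              "passes", "passed", "passing", "cat", "cats"]:
        print(w, "->", stem(w))
```

Applying the stemmer to each token before counting groups the different forms of a word under one entry.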

Repository: GideonCheruiyot/FrequencyAnalysis