kylemassimilian/lab3

lab3

For this lab, the program handles two cases when it is asked to analyze a text. In the first case, the text has already been analyzed. In the second case, the program is analyzing the text for the first time. To check which case I'm dealing with, I use the md5 package to synchronously hash the name of the file.
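The hashing step can be sketched as follows. This is a minimal illustration, not the lab's code: it uses Node's built-in `crypto` module in place of the md5 package, and `hashFileName` is a hypothetical helper name.

```javascript
// Assumption: Node's built-in crypto module stands in for the md5 package.
const crypto = require('node:crypto');

// hashFileName is a hypothetical helper, not a name taken from the lab code.
// It returns the MD5 digest of the file name as a 32-character hex string.
function hashFileName(fileName) {
  return crypto.createHash('md5').update(fileName, 'utf8').digest('hex');
}

// The same file name always produces the same key, so the key can be
// used to look up an earlier analysis of that file.
const key = hashFileName('texts/sample.txt');
console.log(key);
```

Because the key is derived from the file name rather than the file contents, a file that is edited but keeps its name maps to the same key.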

If that hash already exists in the SQL table, which I check using the db.get command for SQLite3, then I only have to return values that I have already calculated, which is much faster than the lengthy process of tokenizing. In this case, I pass into the callback the stored SQL values for words, sentences, characters, and the two readability indices.
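The two-case lookup can be sketched like this. To keep the example runnable without a database, a `Map` stands in for the SQLite table that the lab queries with db.get; `analyzeOnce` and `computeStats` are hypothetical names, and the callback shape is an assumption.

```javascript
// Assumption: a Map stands in for the SQLite table queried with db.get.
const table = new Map();

// analyzeOnce and computeStats are hypothetical names used for illustration.
// callback receives (err, stats, fromCache).
function analyzeOnce(hash, computeStats, callback) {
  const row = table.get(hash);
  if (row !== undefined) {
    // Case 1: the text was analyzed before, so return the stored values.
    callback(null, row, true);
    return;
  }
  // Case 2: first analysis. Compute, report, then store a new row.
  const stats = computeStats();
  callback(null, stats, false);
  table.set(hash, stats);
}

// Usage: the expensive computation runs only on the first call.
const compute = () => ({ words: 8, sentences: 2, characters: 24 });
analyzeOnce('demo-hash', compute, (err, stats, fromCache) => {
  console.log(fromCache, stats.words);
});
analyzeOnce('demo-hash', compute, (err, stats, fromCache) => {
  console.log(fromCache, stats.words);
});
```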

However, in the second case, in which the hash does not exist in the SQL table, I have some analyzing to do. Using the tokenize and tokenize-english packages, I compute the values I need. I use the regular expressions for numbers and letters suggested in the hints to extract only the numbers and only the letters, and I take the lengths of those two lists to get the number of numbers and the number of letters. I then find the number of characters by adding these two values. Next, I find the number of words using tokenize. For the sentence count, I use split and join so that new lines are not counted as new sentences. Using the numbers of words, sentences, letters, and numbers, I call the colemanLiau and automatedReadabilityIndex functions to compute the two readability indices. I then pass all the values into the callback so that I can generate a report on this text. Finally, I insert all of these values into the SQL table in a new row, since this text has not been analyzed before.
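The counting and index steps can be sketched as below. This is an approximation under stated assumptions: simple whitespace and punctuation splitting stands in for the tokenize and tokenize-english packages, `countText` is a hypothetical helper, and the parameter lists of `colemanLiau` and `automatedReadabilityIndex` are assumed rather than taken from the lab. The formulas are the standard published ones for the two indices.

```javascript
// countText is a hypothetical helper. Whitespace and punctuation splitting
// stand in for the tokenize and tokenize-english packages used in the lab.
function countText(text) {
  const letters = (text.match(/[a-zA-Z]/g) || []).length;
  const numbers = (text.match(/[0-9]/g) || []).length;
  const characters = letters + numbers;
  // A word is any whitespace-separated chunk containing a letter or digit.
  const words = text.split(/\s+/).filter((w) => /[a-zA-Z0-9]/.test(w)).length;
  // Join the lines first so a line break alone does not end a sentence.
  const joined = text.split('\n').join(' ');
  const sentences = joined
    .split(/[.!?]+/)
    .filter((s) => /[a-zA-Z0-9]/.test(s)).length;
  return { letters, numbers, characters, words, sentences };
}

// Coleman-Liau: 0.0588 * L - 0.296 * S - 15.8, where L is letters per
// 100 words and S is sentences per 100 words. Assumed signature.
function colemanLiau(letters, words, sentences) {
  const L = (letters / words) * 100;
  const S = (sentences / words) * 100;
  return 0.0588 * L - 0.296 * S - 15.8;
}

// ARI: 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43,
// with characters counted as letters plus numbers. Assumed signature.
function automatedReadabilityIndex(letters, numbers, words, sentences) {
  return 4.71 * ((letters + numbers) / words) + 0.5 * (words / sentences) - 21.43;
}

const counts = countText('The cat sat.\nIt ate 2 fish\ntoday.');
console.log(counts);
console.log(colemanLiau(counts.letters, counts.words, counts.sentences));
console.log(
  automatedReadabilityIndex(counts.letters, counts.numbers, counts.words, counts.sentences)
);
```

In the sample text the second sentence spans two lines, and joining the lines before splitting keeps it counted as one sentence.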

In the callback, I build one long string that contains the counts of words, characters, and sentences along with the readability indices that were passed into the callback.
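A report of this kind might be assembled as in the sketch below. The function name `formatReport`, the field names on `stats`, and the exact wording and rounding of the report are all assumptions made for illustration.

```javascript
// formatReport is a hypothetical name; the stats field names are assumed.
function formatReport(stats) {
  return [
    `Words: ${stats.words}`,
    `Characters: ${stats.characters}`,
    `Sentences: ${stats.sentences}`,
    `Coleman-Liau index: ${stats.colemanLiau.toFixed(2)}`,
    `Automated Readability Index: ${stats.ari.toFixed(2)}`,
  ].join('\n');
}

console.log(
  formatReport({ words: 8, characters: 24, sentences: 2, colemanLiau: 9.25, ari: 9.5 })
);
```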
