rakesh-lagare/NLP-Based-Data-Preprocessing


Data-Preprocessing on Research-Papers

Data Normalization

Data preprocessing transforms raw data into useful information for knowledge gain through classifying, sorting, merging, retrieving, transmitting, or recording. It can be done manually or by computer, and it can also be automated.

One such form of data preprocessing is data cleaning. Here, the following steps are applied to obtain the preprocessed data:

  • Remove square brackets
  • Remove non-ASCII characters from the list of tokenized words
  • Convert all characters to lowercase in the list of tokenized words
  • Remove stopwords
  • Remove punctuation from the list of tokenized words
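
The cleaning steps above can be sketched in Python roughly as follows. This is a minimal illustration using only the standard library, not necessarily the code in this repository. The function names, the regex-based tokenizer, and the tiny STOPWORDS set are all assumptions made for the example, and "remove square brackets" is assumed to mean dropping bracketed spans such as citation markers.

```python
import re
import unicodedata

# Hypothetical, deliberately small stopword set for illustration only;
# a real pipeline would normally use a fuller list.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "and", "to", "for"}


def remove_square_brackets(text):
    # Assumption: drop the brackets together with their content, e.g. "[12]".
    return re.sub(r"\[[^\]]*\]", "", text)


def tokenize(text):
    # Simple tokenizer: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def remove_non_ascii(words):
    # Decompose accented characters, then discard anything outside ASCII.
    return [
        unicodedata.normalize("NFKD", w).encode("ascii", "ignore").decode("ascii")
        for w in words
    ]


def to_lowercase(words):
    return [w.lower() for w in words]


def remove_stopwords(words):
    return [w for w in words if w not in STOPWORDS]


def remove_punctuation(words):
    # Strip punctuation characters and drop tokens that end up empty.
    stripped = (re.sub(r"[^\w\s]", "", w) for w in words)
    return [w for w in stripped if w]


def preprocess(text):
    # Apply the steps in the order listed above.
    words = tokenize(remove_square_brackets(text))
    words = remove_non_ascii(words)
    words = to_lowercase(words)
    words = remove_stopwords(words)
    return remove_punctuation(words)


print(preprocess("The Caf\u00e9 [1] is OPEN, and the results are good!"))
# prints ['cafe', 'open', 'results', 'good']
```

Stopwords are removed after lowercasing so that "The" and "the" are treated alike, and punctuation is stripped last so that tokens left empty by earlier steps are dropped as well.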

The steps above are applied to files in the polled directory: poll the directory for files and preprocess the contents of each file.
After preprocessing, write the content to a new file in the same or a different directory.
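
One possible shape for the polling loop is sketched below, again with the standard library only. The names poll_once and poll_forever, the ".txt" filter, the ".clean.txt" output suffix, and the polling interval are assumptions for illustration; the clean argument stands for whatever function performs the cleaning steps on a file's text.

```python
import time
from pathlib import Path


def poll_once(input_dir, output_dir, clean, seen):
    """Process each new .txt file in input_dir and return the paths written."""
    in_path, out_path = Path(input_dir), Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_path.glob("*.txt")):
        # Skip files already handled, and our own outputs when the input
        # and output directories are the same.
        if src.name in seen or src.name.endswith(".clean.txt"):
            continue
        dst = out_path / f"{src.stem}.clean.txt"
        dst.write_text(clean(src.read_text(encoding="utf-8")), encoding="utf-8")
        seen.add(src.name)
        written.append(dst)
    return written


def poll_forever(input_dir, output_dir, clean, interval_seconds=5.0):
    """Keep checking input_dir for new files until interrupted."""
    seen = set()
    while True:
        poll_once(input_dir, output_dir, clean, seen)
        time.sleep(interval_seconds)
```

Calling poll_once("papers", "papers_clean", str.lower, set()) would process the directory a single time, with str.lower standing in for a real cleaning function. Keeping the set of seen file names outside poll_once lets repeated polls skip files that were already written.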
