The stopword module removes stopwords, or in other words, blacklists words.
A redlist is the opposite: a list of words that you don't want to "go extinct". It is not generic but tied to a specific text corpus. It is manually maintained and applied automatically.
When using stopword-trainer you have a source of text. If this text corpus is not static but growing, you can retrain the stopword list whenever new text is added to the corpus.
For every source of text there are some words that you wouldn't consider stopwords but that may end up defined as such. To keep such a word permanently out of the blacklist when retraining, you add it to the redlist.
The combination of the raw stopword data (all words in the corpus plus every word's stopwordiness), the redlist, and the cutOff number (how many words to use as stopwords) produces a stopword list for a given text corpus.
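To make the idea concrete, here is a minimal sketch of how the three inputs could be combined. It is not the actual API of stopword or stopword-trainer: the function name `buildStopwordList`, the `{ word, stopwordiness }` data shape, and the sample scores are all assumptions made for illustration. The sketch also assumes the redlist is applied before the cutOff, so that protected words do not use up slots in the final list.

```javascript
// Hypothetical helper, not part of stopword or stopword-trainer.
// rawStopwordData: assumed shape [{ word, stopwordiness }], one entry per corpus word.
// redlist: words that must never become stopwords for this corpus.
// cutOff: how many words to use as stopwords.
function buildStopwordList(rawStopwordData, redlist, cutOff) {
  const protectedWords = new Set(redlist.map((w) => w.toLowerCase()));
  return rawStopwordData
    // Drop redlisted words first so they never reach the blacklist.
    .filter((entry) => !protectedWords.has(entry.word.toLowerCase()))
    // Highest stopwordiness first (filter returned a copy, so the input is not mutated).
    .sort((a, b) => b.stopwordiness - a.stopwordiness)
    // Keep only the top cutOff words.
    .slice(0, cutOff)
    .map((entry) => entry.word);
}

// Made-up sample data for illustration only.
const rawStopwordData = [
  { word: 'the', stopwordiness: 0.98 },
  { word: 'and', stopwordiness: 0.95 },
  { word: 'who', stopwordiness: 0.9 },
  { word: 'of', stopwordiness: 0.88 },
  { word: 'whale', stopwordiness: 0.1 },
];
const redlist = ['who']; // manually maintained, specific to this corpus

const stopwords = buildStopwordList(rawStopwordData, redlist, 3);
console.log(stopwords); // [ 'the', 'and', 'of' ]
```

Because the redlist is stored separately from the trained data, retraining on a grown corpus only replaces `rawStopwordData`; the same redlist is applied again automatically.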