Neural Networks for Text Classification

This is an example of text classification using typical neural networks. The model can be chosen from the following:

  • LSTM
  • CNN + MLP
  • BoW + MLP
  • Character-based variants of the above
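
As a rough illustration of what separates the word-based models from their character-based variants, the sketch below tokenizes the same sentence both ways and builds bag-of-words counts. The function names, the lowercasing, and the whitespace splitting are assumptions made for illustration; the preprocessing in this example's actual code may differ.

```python
from collections import Counter


def tokenize(text, char_based=False):
    """Split text into characters if char_based, else into whitespace-separated words.

    Lowercasing and whitespace splitting are illustrative assumptions.
    """
    text = text.strip().lower()
    if char_based:
        return list(text)
    return text.split()


def bag_of_words(tokens):
    """Count token occurrences, discarding order (the BoW view of a text)."""
    return Counter(tokens)


sentence = "A great movie"
word_tokens = tokenize(sentence)
char_tokens = tokenize(sentence, char_based=True)
print(word_tokens)
print(char_tokens)
print(bag_of_words(tokenize("good good bad")))
```

A character-based model sees a much longer sequence drawn from a much smaller vocabulary, which is the main trade-off between the two input types.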

The dataset can also be chosen from the following:

  • DBPedia Ontology dataset (dbpedia): Predict the ontology class of a Wikipedia article from its abstract.
  • IMDB Movie Review Dataset (imdb.binary, imdb.fine): Predict the sentiment of a movie review. The .binary classes are positive/negative. The .fine classes are ratings [0-1]/[2-3]/[7-8]/[9-10].
  • TREC Question Classification (TREC): Predict the answer type of a factoid question.
  • Stanford Sentiment Treebank (stsa.binary, stsa.fine): Predict the sentiment of a movie review. The .binary classes are positive/negative. The .fine classes are [negative]/[somewhat negative]/[neutral]/[somewhat positive]/[positive].
  • Customer Review Datasets (custrev): Predict the sentiment (positive/negative) of a product review.
  • MPQA Opinion Corpus (mpqa): Predict the opinion polarity of a phrase.
  • Scale Movie Review Dataset (rt-polarity): Predict the sentiment (positive/negative) of a movie review.
  • Subjectivity datasets (subj): Predict the subjectivity (subjective/objective) of a sentence about a movie.

Some of the datasets are downloaded from @harvardnlp's repository. Thank you.

How to Run

To train a model:

python train_text_classifier.py -g 0 --dataset stsa.binary --model cnn

The output directory result contains:

  • best_model.npz: a snapshot of the model that achieved the best validation accuracy during training
  • vocab.json: the model's vocabulary dictionary as a JSON file
  • args.json: the model's setup as a JSON file, which also contains the paths of the model and vocabulary
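
The saved JSON files can also be inspected from your own scripts. The following is a minimal sketch, not the example's own loading code. It assumes that args.json stores the vocabulary path under a key named `vocab_path`, that vocab.json maps each token to an integer ID, and that unknown tokens are represented by an `<unk>` entry; check the generated files for the real key names before relying on it.

```python
import json


def load_json(path):
    """Read a JSON file such as args.json or vocab.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def encode(tokens, vocab, unk_token="<unk>"):
    """Map tokens to integer IDs, falling back to the unknown-token ID.

    The "<unk>" entry is an assumption about the vocabulary layout.
    """
    unk_id = vocab[unk_token]
    return [vocab.get(token, unk_id) for token in tokens]
```

With a finished training run, this could be used as `setup = load_json("result/args.json")`, then `vocab = load_json(setup["vocab_path"])`, and finally `encode("a great movie".split(), vocab)`, again assuming the `vocab_path` key exists.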

To apply the saved model to your sentences, feed the sentences through stdin:

cat sentences_to_be_classifed.txt | python run_text_classifier.py -g 0 --model-setup result/args.json

The classification results are written to stdout.
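
The same pipeline can be driven from Python instead of the shell. The sketch below wraps the command shown above with `subprocess`. The helper names `build_command` and `classify` are hypothetical, and it assumes one sentence per input line; because the output format is not specified here, the stdout lines are returned unparsed.

```python
import subprocess
import sys


def build_command(setup_path="result/args.json", gpu=0):
    """Build the run_text_classifier.py command line shown above."""
    return [
        sys.executable, "run_text_classifier.py",
        "-g", str(gpu),
        "--model-setup", setup_path,
    ]


def classify(sentences, setup_path="result/args.json", gpu=0):
    """Feed sentences through stdin (one per line, an assumption)
    and return the raw stdout lines of the classifier."""
    proc = subprocess.run(
        build_command(setup_path, gpu),
        input="\n".join(sentences) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the script exits with an error
    )
    return proc.stdout.splitlines()
```

Calling `classify(["a great movie", "a dull plot"])` from the directory containing run_text_classifier.py and a trained result directory would then return whatever the script prints for those sentences.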