amazon_reviews_textrank

Amazon reviews classification using tfidf and Topic Modeling

Here is a relatively quick attempt to build TextRank using networkx's PageRank. I compare the results with the proper implementation here.
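For orientation, here is a minimal sketch of the idea, not the code in this repo: build a sentence similarity graph and rank its nodes with `networkx.pagerank`. It assumes the sentences are already represented as rows of a numpy array and that cosine similarity is the edge weight; the function name `textrank_order` is hypothetical.

```python
import networkx as nx
import numpy as np


def textrank_order(sentence_vectors: np.ndarray) -> list[int]:
    """Return sentence indices ordered from most to least central.

    Hypothetical helper for illustration; the repo's Summarizer may differ.
    """
    # Cosine similarity between every pair of sentence vectors.
    norms = np.linalg.norm(sentence_vectors, axis=1, keepdims=True)
    unit = sentence_vectors / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T

    # Keep edge weights non-negative and drop self-similarity.
    sim = np.clip(sim, 0.0, None)
    np.fill_diagonal(sim, 0.0)

    # Each sentence is a node; non-zero similarities become weighted edges.
    graph = nx.from_numpy_array(sim)
    scores = nx.pagerank(graph, weight="weight")
    return sorted(scores, key=scores.get, reverse=True)
```

A summary would then be the top few sentences in that order, for example `[sentences[i] for i in textrank_order(vectors)[:3]]`.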

As with most of the code throughout this repo, the code is not meant to be production-ready, but readable, so one can see what is happening. You might find some of the helper functions useful for your own tasks.

The order of the .py scripts is:

  1. prepare_data.py: simple manipulation and sentence tokenization
  2. sentence_vectors.py: build sentence vectors by averaging word vectors
  3. reviews_summary.py: summarize reviews using the class Summarizer in summarize.py
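As a rough illustration of step 2, a sentence vector can be taken as the mean of the vectors of its words. The sketch below assumes word vectors are available as a plain dict from token to numpy array; the name `sentence_vector` and the toy vectors are hypothetical and not taken from sentence_vectors.py.

```python
import numpy as np


def sentence_vector(tokens, word_vectors, dim):
    """Average the vectors of the tokens that have a known embedding.

    Hypothetical helper for illustration only.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        # No known words: fall back to a zero vector of the right size.
        return np.zeros(dim)
    return np.mean(known, axis=0)


# Toy 2-dimensional embeddings, made up for the example.
toy_vectors = {
    "good": np.array([1.0, 0.0]),
    "battery": np.array([0.0, 1.0]),
}
vec = sentence_vector(["good", "battery", "xyz"], toy_vectors, dim=2)
```

Out-of-vocabulary tokens such as `"xyz"` are simply skipped here; other choices (a shared unknown-word vector, tf-idf weighting) are equally reasonable.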

Easy.

As one might expect, the SummaNLP implementation works better than mine. There are explanatory notebooks in the notebooks directory.