A bit of everything about text and nlp [IN PROGRESS]

jrzaurin/nlp-stuff

NLP stuff

This is where I collect work related to NLP.

Each directory should contain a README file to guide you through the code. So far, this is what I have included:

  1. amazon_reviews_classification_without_DL

    Predicting the review score for Amazon reviews (Shoes, Clothes and Jewelry) using tf-idf, LDA and EnsembleTopics, with LightGBM for the final classification and hyperopt for hyper-parameter optimization. I placed special emphasis on the text preprocessing.

  2. amazon_reviews_classification_with_EDA

    Amazon reviews classification using tf-idf and the augmentation techniques from EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks (Jason Wei and Kai Zou, 2019), with LightGBM for the final classification and hyperopt for hyper-parameter optimization. Following the philosophy of the previous exercise, I placed some emphasis on the text preprocessing, in particular on the use of certain tokenizers.

  3. amazon_reviews_classification_HAN

    Amazon reviews classification (score prediction) using Hierarchical Attention Networks (Zichao Yang et al., 2016). I have also used a number of dropout mechanisms from the paper Regularizing and Optimizing LSTM Language Models (Stephen Merity, Nitish Shirish Keskar and Richard Socher, 2017).

  4. amazon_reviews_textrank

    The simplest text summarization approach, using the PageRank algorithm via the networkx package, and a comparison of the results with the proper TextRank implementation described in Variations of the Similarity Function of TextRank for Automated Summarization (Federico Barrios et al., 2016).

  5. rnn_character_tagging

    Character-level tagging using RNNs, with the aim of differentiating between, for example, programming languages or writing styles. The code here is based on a post by Nadbor.

  6. 20_newsgroup_classification_cnn_tf

    This directory contains very old TensorFlow code. My aim back then was simply to illustrate 3 different ways of building a convolutional neural network for text classification using TensorFlow. The last time I checked (October 2019) the code still ran, but if you run it you will get every possible warning to upgrade. This directory exists mostly so that I can keep track of what I have done.
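
To give a flavour of the graph-based summarization idea behind item 4, here is a minimal, dependency-free sketch. It is not the code in the repository: the names `split_sentences`, `cosine_similarity`, `pagerank` and `summarize` are hypothetical, the sentence splitter and bag-of-words similarity are simplifying assumptions, and PageRank is written as a plain power iteration rather than calling networkx, so scores may differ from what networkx would return (for instance in how isolated sentences are handled).

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption, not the repo's tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Simplified power-iteration PageRank on a symmetric weighted adjacency matrix.

    Sentences with no similar neighbours pass on no rank; this is a
    simplification and not how networkx treats dangling nodes.
    """
    n = len(weights)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in weights]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, n_sentences=1):
    """Return the top-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    bags = [Counter(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    n = len(sentences)
    # Sentences are nodes; edge weights are pairwise similarities.
    weights = [
        [cosine_similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]


review = (
    "These shoes are comfortable. "
    "The shoes are comfortable and the price is fair. "
    "The price is fair. "
    "Delivery took a week."
)
print(summarize(review, n_sentences=1))
```

In this toy review the second sentence shares words with both the first and the third, so it sits at the centre of the similarity graph and is the one selected. Swapping `cosine_similarity` for another measure is the kind of variation that Barrios et al. study.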

For any comments or suggestions, please email [email protected] or, even better, open an issue.