This repository collects my NLP-related work.
Each directory should contain a README file to guide you through the code. So far, this is what I have included:
- amazon_reviews_classification_without_DL: Predicting the review score for Amazon reviews (Shoes, Clothes and Jewelry) using tf-idf, LDA and EnsembleTopics, along with lightGBM and hyperopt for the final classification and hyper-parameter optimization. I placed special emphasis on the text preprocessing.
- amazon_reviews_classification_with_EDA: Amazon reviews classification using tf-idf and the techniques from EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks (Jason Wei and Kai Zou, 2019), along with lightGBM and hyperopt for the final classification and hyper-parameter optimization. Following the philosophy of the previous exercise, I placed some emphasis on the text preprocessing, in particular on the use of certain tokenizers.
- amazon_reviews_classification_HAN: Amazon reviews classification (score prediction) using Hierarchical Attention Networks (Zichao Yang et al., 2016). I also used a number of dropout mechanisms from Regularizing and Optimizing LSTM Language Models (Stephen Merity, Nitish Shirish Keskar and Richard Socher, 2017).
- amazon_reviews_textrank: A very simple text summarization approach using the PageRank algorithm via the networkx package, comparing the results with the proper TextRank implementation from Variations of the Similarity Function of TextRank for Automated Summarization (Federico Barrios et al., 2016).
- rnn_character_tagging: Tagging at character level using RNNs, with the aim of differentiating, for example, different coding languages or writing styles. The code here is based on a post by Nadbor.
- 20_newsgroup_classification_cnn_tf: This directory contains very old Tensorflow code. My aim back then was simply to illustrate 3 different ways of building a convolutional neural network for text classification using Tensorflow. The last time I checked (October 2019) the code still ran, but if you run it you will get every possible warning to upgrade. This directory is mostly for me to keep track of the things I do.
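
To give a flavour of the idea behind amazon_reviews_textrank, here is a minimal sketch of PageRank-based extractive summarization. It is not the code in that directory. The function names are hypothetical, the regex sentence splitter and the word-overlap similarity are simplifying assumptions, and a small power iteration stands in for the networkx PageRank call so that the snippet has no dependencies.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def similarity(a, b):
    """Shared unique words, normalised by the log of each sentence's vocabulary size."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if not wa or not wb:
        return 0.0
    denom = math.log(len(wa)) + math.log(len(wb))
    if denom <= 0:  # both sentences contain a single distinct word
        return 0.0
    return len(wa & wb) / denom


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on an adjacency matrix.

    Simplification: nodes without any edge keep only the teleport share;
    their mass is not redistributed to the other nodes.
    """
    n = len(weights)
    scores = [1.0 / n] * n
    strength = [sum(row) for row in weights]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = sum(
                weights[j][i] / strength[j] * scores[j]
                for j in range(n)
                if strength[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def summarize(text, n_sentences=2):
    """Return the n highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    n = len(sentences)
    # Build a symmetric similarity graph with no self-loops.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(review_text, n_sentences=2)` on a multi-sentence review returns the two sentences that share the most vocabulary with the rest of the text. Swapping `similarity` for another function is the kind of variation studied in the Barrios et al. paper.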
For any comments or suggestions, please email [email protected] or, even better, open an issue.