Topic modeling is a core text analysis technique because "What are these documents about?" is one of the most basic questions you can ask of a corpus. In this assignment, you will build an NMF topic model, an LSA topic model, and an LDA topic model, then compare the resulting topic allocations. You will work with the Brown University corpus (the Brown corpus) available in nltk. Its documents are already assigned to categories, so you can also compare your models to the official classification.
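One way to compare a model's topic allocations with the official categories is a simple cross-tabulation. The sketch below is only an illustration: the helper names `crosstab` and `purity` are hypothetical, the topic ids and category labels are made-up sample data, and it assumes each document has already been assigned to its single highest-weighted topic.

```python
from collections import Counter, defaultdict

def crosstab(topic_ids, categories):
    """Count how many documents of each category land in each topic."""
    table = defaultdict(Counter)
    for topic, category in zip(topic_ids, categories):
        table[topic][category] += 1
    return table

def purity(table):
    """Share of documents whose category is the majority category of their topic."""
    total = sum(sum(counts.values()) for counts in table.values())
    majority = sum(max(counts.values()) for counts in table.values())
    return majority / total

# Made-up sample data: one dominant topic id and one category per document.
topic_ids = [0, 0, 0, 1, 1, 1, 1]
categories = ["news", "news", "hobbies", "romance", "romance", "news", "romance"]

table = crosstab(topic_ids, categories)
for topic in sorted(table):
    print(topic, dict(table[topic]))
print("purity:", purity(table))
```

A purity close to 1 would suggest the topics line up with the categories. Lower values are not necessarily a failure, since topics can legitimately cut across categories.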
- Create a repository under your GitHub account from this template: https://github.com/roozbehsadeghian/ads-tm-topic-modeling. Instructions can be found here. Make your repository public or add your instructor’s GitHub account as a collaborator.
- The notebook “Topic Models.ipynb” holds detailed instructions for the assignment. In that notebook, you are asked to do the following:
  - Run pre-written code exploring the Brown corpus.
  - Fit an NMF model and interpret it.
  - Fit an LSA model and interpret it.
  - Fit an LDA model and interpret it.
- Work through the notebook, performing each step it asks of you. Use and extend the code from the chapters of your textbook.
Deliverables:
- When you have finished your code, print your notebook as a PDF and upload the PDF to Canvas.
- Commit your code and push the changes to GitHub so your instructor has access to the .ipynb notebook file and any other code you create.