LDA Topic Modeling of PERC Papers
This project uses Latent Dirichlet Allocation (LDA) to thematically analyze the physics education research literature, as represented by the Physics Education Research Conference (PERC) Proceedings from 2001 to 2018.
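As a rough illustration of what LDA does, the sketch below fits a tiny topic model with collapsed Gibbs sampling using only the Python standard library. It is not the code in this repository. The notebook relies on Gensim, whose LdaModel fits LDA with online variational Bayes instead. The function name lda_gibbs, the hyperparameter defaults (alpha, beta, iterations), and the toy documents are hypothetical choices made for illustration. Each document is modeled as a mixture of topics, and each topic is modeled as a distribution over words. The sampler repeatedly reassigns every token to a topic in proportion to how common that topic is in the token's document and how common the word is in that topic.

```python
import random

# Hypothetical illustration: collapsed Gibbs sampling for LDA (stdlib only).
# This is not the repository's Gensim-based pipeline.
def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """docs: list of token lists. Returns (doc_topic, topic_word)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens assigned to each topic
    z = []                              # current topic of every token

    # Start from random topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the current token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed per-document topic proportions and per-topic word distributions.
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {vocab[v]: (n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)}
        for k in range(K)
    ]
    return doc_topic, topic_word


# Toy corpus (hypothetical): two loose themes, mechanics and belonging.
docs = [
    "students force motion force energy".split(),
    "energy motion force students".split(),
    "identity community students belonging".split(),
    "belonging identity community equity".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(topic_word):
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

On a real corpus such as the PERC Proceedings, the documents would first be tokenized and stripped of stopwords. The number of topics would also need to be chosen with care, and this toy example does not address that choice.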
The code in this repository is described in a paper submitted for publication: Tor Ole B. Odden, Alessandro Marin, and Marcos D. Caballero, Thematic Analysis of 18 Years of PERC Proceedings using Natural Language Processing (2020). The paper is available on arXiv: arxiv.org/abs/2001.10753.
To run the main notebook, PERC_TopicModeling.ipynb, first install the required packages:
pip install -r requirements.txt --user
The required packages include Gensim (unsupervised semantic modelling of text), NLTK (Natural Language Toolkit), LDAVis (interactive topic model visualization), and scikit-learn, along with standard data analysis libraries such as pandas, numpy, and matplotlib.
Questions can be directed to Tor Ole Odden.