This project is inspired by the ICLR2021-OpenReviewData project. My aim is to crawl, visualize, and analyze submission data from ICLR 2024 on OpenReview in order to understand trends in current machine learning research: which topics are emerging, where the field as a whole is heading, and how its research focus has shifted.
The data source is the ICLR 2024 OpenReview website. For each submitted paper I extract its keywords, reviewer ratings, and final decision. Together these show which topics are most common, how reviewers scored papers on those topics, and what the acceptance trends at the conference look like.
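Once keywords and decisions are extracted, acceptance trends per topic come down to counting. The sketch below is illustrative only: it assumes each paper is a dict with a `keywords` list and a `decision` string, and that accepted decisions start with "Accept" (as in labels such as "Accept (poster)"). Neither the field names nor the label format are taken from this repository's TSV files, so adjust them to the real columns.

```python
from collections import Counter


def acceptance_by_keyword(papers, min_count=1):
    """Return {keyword: (n_submissions, acceptance_rate)}.

    Assumes (hypothetically) that each paper is a dict with a 'keywords' list
    and a 'decision' string, and that accepted decisions start with 'Accept'.
    """
    submitted = Counter()
    accepted = Counter()
    for paper in papers:
        is_accepted = paper["decision"].strip().lower().startswith("accept")
        # Normalize and de-duplicate so a paper counts once per keyword.
        for kw in {k.strip().lower() for k in paper["keywords"] if k.strip()}:
            submitted[kw] += 1
            if is_accepted:
                accepted[kw] += 1
    # Most frequent keywords first; drop rare ones below min_count.
    return {
        kw: (n, accepted[kw] / n)
        for kw, n in submitted.most_common()
        if n >= min_count
    }


# Toy records for illustration only (not real ICLR 2024 data).
papers = [
    {"keywords": ["Diffusion Models", "generative models"], "decision": "Accept (poster)"},
    {"keywords": ["diffusion models"], "decision": "Reject"},
    {"keywords": ["LLM"], "decision": "Accept (spotlight)"},
]
for kw, (n, rate) in acceptance_by_keyword(papers).items():
    print(f"{kw}: {n} submissions, {rate:.0%} accepted")
```

Raising `min_count` filters out keywords used by only a handful of papers, whose acceptance rates would otherwise be very noisy.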
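- Install the requirements (argparse is part of Python's standard library and does not need to be installed):
pip install selenium pandas wordcloud nltk imageio tqdm
Then download the NLTK data from a Python session: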
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('stopwords')
- Run crawl_paperlist.py to crawl the list of papers (takes about 0.5 h):
python crawl_paperlist.py
- Run crawl_reviews.py to crawl the reviews of the papers (takes about 0.5 h):
python crawl_reviews.py
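ICLR reviewer scores on OpenReview are usually displayed as strings such as "6: marginally above the acceptance threshold". Assuming the crawled ratings keep that form (check ratings.tsv to confirm), a minimal sketch for turning them into numbers and summarizing one paper's reviews could look like this. The helper names are hypothetical and not part of crawl_reviews.py.

```python
import re
import statistics

# Leading integer of a rating string, e.g. "6" in "6: marginally above ...".
_SCORE = re.compile(r"^\s*(\d+)")


def parse_score(raw):
    """Extract the numeric score from a rating string (assumed 'N: text' format)."""
    match = _SCORE.match(raw)
    if match is None:
        raise ValueError(f"unrecognized rating: {raw!r}")
    return int(match.group(1))


def summarize(raw_ratings):
    """Return (mean, population std, min, max) of one paper's reviewer scores."""
    scores = [parse_score(r) for r in raw_ratings]
    return (
        statistics.mean(scores),
        statistics.pstdev(scores),
        min(scores),
        max(scores),
    )


# Hypothetical review scores for a single paper.
print(summarize([
    "6: marginally above the acceptance threshold",
    "8: accept, good paper",
    "3: reject, not good enough",
]))
```

The population standard deviation is used as a simple measure of reviewer disagreement; with only three or four reviews per paper, it is a rough signal at best.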
The extracted paper list and corresponding ratings are saved as:
+ paperlist_2024.tsv (2,401 submissions in total)
+ ratings.tsv (2,401 submissions in total)
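For analysis, the two TSV files can be joined per paper. Their column headers are not documented here, so this sketch assumes both files share an identifier column, called `id` purely for illustration; replace it with the real header. It uses only the standard `csv` module.

```python
import csv


def load_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def join_on(left, right, key="id"):
    """Inner-join two lists of dicts on `key` (hypothetical column name).

    Rows without a match in `right` are dropped; on duplicate non-key fields,
    values from `right` win.
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]
```

Usage, assuming an `id` column in both files: `merged = join_on(load_tsv("paperlist_2024.tsv"), load_tsv("ratings.tsv"), key="id")`. An inner join makes any mismatch between the two files visible as a drop in `len(merged)`.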
- Run visualization.ipynb and build_keyword_graph.ipynb to build the keyword graph and visualize it.
- Run Interactive.py to interact with the keyword graph from the command line:
python Interactive.py
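The notebooks are not reproduced here, so the following is only a sketch of one common way to build a keyword graph, not necessarily what build_keyword_graph.ipynb does. Nodes are keywords weighted by how many papers use them, and an edge links two keywords that appear on the same paper, weighted by the number of such papers. The `neighbors` function is a hypothetical stand-in for the kind of lookup an interactive CLI might offer; it is not Interactive.py's interface.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_keyword_graph(keyword_lists):
    """Build a weighted co-occurrence graph from per-paper keyword lists.

    Returns (node_counts, adjacency), where adjacency[a][b] is the number of
    papers that list both a and b.
    """
    node_counts = Counter()
    adjacency = defaultdict(Counter)
    for keywords in keyword_lists:
        # Normalize and de-duplicate; sorting makes pair order deterministic.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        node_counts.update(unique)
        for a, b in combinations(unique, 2):
            adjacency[a][b] += 1
            adjacency[b][a] += 1
    return node_counts, adjacency


def neighbors(adjacency, keyword, top_k=5):
    """Return the top_k keywords that most often co-occur with `keyword`."""
    # .get avoids inserting an empty entry into the defaultdict.
    return adjacency.get(keyword.strip().lower(), Counter()).most_common(top_k)


# Toy keyword lists for illustration only.
counts, graph = build_keyword_graph([
    ["Diffusion Models", "Image Generation"],
    ["diffusion models", "Image Generation", "Text-to-Image"],
    ["Large Language Models", "Reasoning"],
])
print(counts.most_common(2))
print(neighbors(graph, "Diffusion Models"))
```

Storing the graph as nested `Counter`s keeps the sketch dependency-free; for layout and plotting, the same counts can be loaded into a dedicated graph library.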