In this repository, we compile existing peer review datasets. The repository accompanies the paper "What Can Natural Language Processing Do for Peer Review?"
The number of scientific articles produced every year is growing steadily, making quality control crucial for scientists and the public good. Peer review, a widely used process in which each submission is evaluated by several independent experts in the field, is hard, time-consuming, and prone to error. Because the artifacts involved in peer review, such as manuscripts, reviews, and discussions, are largely text-based, natural language processing (NLP) has great potential to improve the reviewing process. However, it is essential to identify where help is needed, where NLP can assist, and where it should stand aside. The paper aims to provide a foundation for future efforts in NLP for peer review assistance. We discuss peer review as a general process, exemplified by reviewing at AI conferences, detail each step from manuscript submission to camera-ready revision, and discuss the associated challenges and opportunities for NLP assistance, illustrated by existing work.

This repository serves as a jumping-off point for researchers, listing the available datasets along with a brief summary of each.
If you have any suggestions for additional datasets, please submit a pull request or email us at [email protected].
Year | Title | Data URL | Summary |
---|---|---|---|
2023 | MOPRD: A multidisciplinary open peer review dataset | Data | Multidisciplinary open peer review dataset encompassing every stage of the process. |
2023 | ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews | Data | Small-scale corpus of peer review comments paired with the paper edits made in response to them. |
2023 | Testing for Reviewer Anchoring in Peer Review: A Randomized Controlled Trial | Data | Controlled peer-review experiment on reviewer anchoring in rebuttals. |
2023 | Automatic Analysis of Substantiation in Scientific Peer Reviews | Data | Peer reviews with claims and associated evidence for detecting lack of substantiation. |
2023 | Overview of PragTag-2023: Low-Resource Multi-Domain Pragmatic Tagging of Peer Reviews | Data | Shared task extending NLPEER with pragmatic tag annotations for studying tagging across domains. |
2023 | When Reviewers Lock Horns: Finding Disagreements in Scientific Peer Reviews | Data | Identifying contradictions between reviewers. |
2023 | PolitePEER: does peer review hurt? A dataset to gauge politeness intensity in the peer reviews | Data | Dataset for gauging politeness intensity in peer reviews, supporting the development of politeness indicators. |
2023 | NLPEER: A Unified Resource for the Computational Study of Peer Review | Data | Benchmark dataset combining new and existing datasets for cross-domain peer review assistance. |
2023 | ArgSciChat (Argumentative Dialogues on Scientific Papers) | Data | Argumentative dialogues for studying dialogue agents and information needs from abstracts. |
2023 | Summarizing Multiple Documents with Conversational Structure for Meta-Review Generation | Data | Frames meta-review generation as summarization of the full review thread, including conflicting information. |
2023 | A Dataset on Malicious Paper Bidding in Peer Review | Data | Controlled peer-review experiment on collusion ring behavior. |
2023 | ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing | Data | Small-scale data to evaluate LLM reviewing. |
2023 | Exploring Jiu-Jitsu Argumentation for Writing Peer Review Rebuttals | Data | Dataset of review statements with attitude roots, themes, and canonical rebuttals. |
2023 | A Gold Standard Dataset for the Reviewer Assignment Problem | Data | Researchers' own evaluations of their expertise in reviewing papers. |
2022 | Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond | Data | Dataset of papers from the ACL Anthology revised by professional editors. |
2022 | ReAct (A Review Comment Dataset for Actionability) | Data | Review comment actionability classification. |
2022 | HedgePeer (Uncertainty Detection) | Data | Uncertainty detection in peer review texts using hedge cues and spans. |
2022 | Can we automate scientific reviewing? | Data | Dataset and analysis for automatic review generation, with insights into applications, review quality, and remaining challenges. |
2022 | DISAPERE: A Dataset for Discourse Structure in Peer Review Discussions | Data | Multi-layer-annotated dataset for studying discourse structure between rebuttals and reviews. |
2022 | Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review | Data | Annotated reviews with pragmatic tags, links, and paper version alignment for intertextual study. |
2022 | arXivEdits: Understanding the Human Revision Process in Scientific Writing | Data | Corpus of arXiv preprints with aligned edits labeled with intentions. |
2021 | COMPARE: A Taxonomy and Dataset of Comparison Discussions in Peer Reviews | Data | Identifying comparison sentences in peer reviews and taxonomy of comparison discussions. |
2021 | Argument Mining Driven Analysis of Peer-Reviews | Data, Data | Argument mining for extracting relevant information from reviews. |
2020 | Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment | Data | Controlled peer-review experiment on strategic reviewing. |
2020 | APE: Argument Pair Extraction from Peer Review and Rebuttal via Multi-task Learning | Data | Annotated dataset linking arguments in review reports to author rebuttals. |
2019 | Argument Mining for Understanding Peer Reviews | Data | Peer reviews annotated by argumentative units and their types. |
2019 | Does my rebuttal matter? | Data | Dataset of anonymized review scores before and after rebuttal from ACL 2018. |
2018 | A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications | Data | One of the first datasets of peer reviews from the NLP community. |
2017 | AAMAS Bidding Data | Data | Anonymized bidding data from the AAMAS conference. |
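
The datasets above are distributed in a variety of formats (JSON/JSONL, CSV, or via dataset libraries), so there is no single loading recipe. As a minimal sketch of how one might start exploring a downloaded dataset, the Python snippet below assumes a hypothetical `reviews.jsonl` file with one review per line and a `text` field; adapt the path and field names to the schema of the dataset you actually use.

```python
# Minimal sketch for exploring a downloaded peer review dataset.
# The file name `reviews.jsonl` and the `text` field are hypothetical
# placeholders; the datasets in the table each use their own schema.
import json
from pathlib import Path


def load_records(path: str):
    """Yield one JSON object per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    reviews = list(load_records("reviews.jsonl"))  # hypothetical path
    lengths = [len(r.get("text", "").split()) for r in reviews]
    avg = sum(lengths) / len(lengths) if lengths else 0.0
    print(f"{len(reviews)} reviews, average length {avg:.1f} words")
```

If you use this collection or the accompanying paper, please cite:
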
@misc{kuznetsov2024natural,
title={What Can Natural Language Processing Do for Peer Review?},
author={Ilia Kuznetsov and Osama Mohammed Afzal and Koen Dercksen and Nils Dycke and Alexander Goldberg and Tom Hope and Dirk Hovy and Jonathan K. Kummerfeld and Anne Lauscher and Kevin Leyton-Brown and Sheng Lu and Mausam and Margot Mieskes and Aurélie Névéol and Danish Pruthi and Lizhen Qu and Roy Schwartz and Noah A. Smith and Thamar Solorio and Jingyan Wang and Xiaodan Zhu and Anna Rogers and Nihar B. Shah and Iryna Gurevych},
year={2024},
eprint={2405.06563},
archivePrefix={arXiv},
primaryClass={cs.CL}
}