# Datasets in Peer Review

In this repository, we compile existing datasets in peer review. The repository accompanies the paper "What Can Natural Language Processing Do for Peer Review?"

The number of scientific articles produced every year is growing steadily, making quality control crucial for scientists and the public good. Peer review, a widely used process in which each submission is evaluated by several independent experts in the field, is hard, time-consuming, and prone to error. As the artifacts involved in peer review, such as manuscripts, reviews, and discussions, are largely text-based, natural language processing (NLP) has great potential to improve the reviewing process. However, it is essential to identify where help is needed, where NLP can assist, and where it should stand aside.

The paper aims to provide a foundation for future efforts in NLP for peer review assistance. We discuss peer review as a general process, exemplified in particular by reviewing at AI conferences. We detail each step of the process, from manuscript submission to camera-ready revision, and discuss the associated challenges and opportunities for NLP assistance, illustrated by existing work. This repository serves as a jumping-off point for researchers, with a list of available datasets and a brief summary of each.

If you have any suggestions for additional datasets, please submit a pull request or email us at [email protected].

## Resource List

| Year | Title | Data URL | Summary |
|------|-------|----------|---------|
| 2023 | MOPRD: A multidisciplinary open peer review dataset | Data | Multidisciplinary open peer review dataset encompassing every stage of the process. |
| 2023 | ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews | Data | Small-scale corpus of peer review comments paired with the paper edits made in response to them. |
| 2023 | Testing for Reviewer Anchoring in Peer Review: A Randomized Controlled Trial | Data | Controlled peer-review experiment on reviewer anchoring in rebuttals. |
| 2023 | Automatic Analysis of Substantiation in Scientific Peer Reviews | Data | Peer reviews with claims and associated evidence for detecting lack of substantiation. |
| 2023 | Overview of PragTag-2023: Low-Resource Multi-Domain Pragmatic Tagging of Peer Reviews | Data | Shared task extending NLPEER with pragmatic tag annotations for studying tagging across domains. |
| 2023 | When Reviewers Lock Horns: Finding Disagreements in Scientific Peer Reviews | Data | Identifying contradictions between reviewers. |
| 2023 | PolitePEER: does peer review hurt? A dataset to gauge politeness intensity in the peer reviews | Data | Politeness-annotated peer reviews for developing politeness classifiers and indicators. |
| 2023 | NLPEER: A Unified Resource for the Computational Study of Peer Review | Data | Benchmark combining new and existing datasets for cross-domain peer review assistance. |
| 2023 | ArgSciChat (Argumentative Dialogues on Scientific Papers) | Data | Argumentative dialogues for studying dialogue agents and information needs from abstracts. |
| 2023 | Summarizing Multiple Documents with Conversational Structure for Meta-Review Generation | Data | Frames the meta-review as a summary of the full discussion thread, including conflicting information. |
| 2023 | A Dataset on Malicious Paper Bidding in Peer Review | Data | Controlled peer-review experiment on collusion-ring behavior. |
| 2023 | ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing | Data | Small-scale data for evaluating LLM reviewing. |
| 2023 | Exploring Jiu-Jitsu Argumentation for Writing Peer Review Rebuttals | Data | Review statements annotated with attitude roots, themes, and canonical rebuttals. |
| 2023 | A Gold Standard Dataset for the Reviewer Assignment Problem | Data | Researchers' own evaluations of their expertise in reviewing papers. |
| 2022 | Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond | Data | Papers from the ACL Anthology revised by professional editors. |
| 2022 | ReAct (A Review Comment Dataset for Actionability) | Data | Review comment actionability classification. |
| 2022 | HedgePeer (Uncertainty Detection) | Data | Uncertainty detection in peer review texts using hedge cues and spans. |
| 2022 | Can we automate scientific reviewing? | Data | Insights into applications and review quality, with challenges for review generation. |
| 2022 | DISAPERE: A Dataset for Discourse Structure in Peer Review Discussions | Data | Multi-layer-annotated dataset for studying discourse structure between rebuttals and reviews. |
| 2022 | Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review | Data | Reviews annotated with pragmatic tags, links, and paper version alignment for intertextual study. |
| 2022 | arXivEdits: Understanding the Human Revision Process in Scientific Writing | Data | Corpus of arXiv preprints with aligned edits labeled with intentions. |
| 2021 | COMPARE: A Taxonomy and Dataset of Comparison Discussions in Peer Reviews | Data | Identifying comparison sentences in peer reviews; taxonomy of comparison discussions. |
| 2021 | Argument Mining Driven Analysis of Peer-Reviews | Data, Data | Argument mining for extracting relevant information from reviews. |
| 2020 | Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment | Data | Controlled peer-review experiment on strategic reviewing. |
| 2020 | APE: Argument Pair Extraction from Peer Review and Rebuttal via Multi-task Learning | Data | Annotated dataset linking arguments in review reports to author rebuttals. |
| 2019 | Argument Mining for Understanding Peer Reviews | Data | Peer reviews annotated with argumentative units and their types. |
| 2019 | Does my rebuttal matter? | Data | Anonymized review scores before and after rebuttal from ACL 2018. |
| 2018 | A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications | Data | One of the first peer review datasets from the NLP community. |
| 2017 | AAMAS Bidding Data | Data | Anonymized bidding data from the AAMAS conference. |

## Citation

@misc{kuznetsov2024natural,
      title={What Can Natural Language Processing Do for Peer Review?}, 
      author={Ilia Kuznetsov and Osama Mohammed Afzal and Koen Dercksen and Nils Dycke and Alexander Goldberg and Tom Hope and Dirk Hovy and Jonathan K. Kummerfeld and Anne Lauscher and Kevin Leyton-Brown and Sheng Lu and Mausam and Margot Mieskes and Aurélie Névéol and Danish Pruthi and Lizhen Qu and Roy Schwartz and Noah A. Smith and Thamar Solorio and Jingyan Wang and Xiaodan Zhu and Anna Rogers and Nihar B. Shah and Iryna Gurevych},
      year={2024},
      eprint={2405.06563},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
