Authors: Ziming Luo*, Zonglin Yang*, Zexin Xu, Wei Yang, Xinya Du
This is a repository for organizing papers, code, and other resources related to large language models for the scientific research process.
Schematic overview of the scientific research pipeline covered in this survey. This cyclical process begins with scientific hypothesis discovery, followed by experiment planning and implementation, paper writing, and finally peer reviewing of papers. The experiment planning stage consists of optimizing experiment design and executing research tasks, while the paper writing stage consists of citation text generation, related work generation, and drafting & writing. The papers listed for each stage include both task-specific methods and evaluation benchmarks. Note that some papers may appear in both categories.
If you have any suggestions (missing papers, new papers, key researchers, or typos), please feel free to open a pull request. Simply letting us know the title of a paper is also a great contribution. You can do this by opening an issue or contacting us directly via email.
- LLMs for Scientific Hypothesis Discovery
- LLMs for Experiment Planning and Implementation
- LLMs for Scientific Paper Writing
- LLMs for Peer Reviewing
-
SciMON SciMON: Scientific Inspiration Machines Optimized for Novelty (May. 23, 2023; ACL 2024)
-
MOOSE Large Language Models for Automated Open-domain Scientific Hypotheses Discovery (Sep. 6, 2023; ICML AI4Science Workshop Best Poster Award; ACL 2024)
-
MCR Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design (Oct. 22, 2023; EMNLP 2023)
-
Large language models are zero shot hypothesis proposers (Nov. 10, 2023; COLM 2024)
-
FunSearch Mathematical discoveries from program search with large language models (Dec. 14, 2023; Nature)
-
ChemReasoner ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback (Feb. 15, 2024; ICML 2024)
-
SGA LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery (May. 16, 2024; ICML 2024)
-
AIScientist The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (Aug. 12, 2024)
-
MLR-Copilot MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents (Aug. 26, 2024)
-
IGA Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers (Sep. 6, 2024)
-
SciAgents SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning (Sep. 9, 2024)
-
Scideator Scideator: Human-LLM Scientific Idea Generation Grounded in Research-Paper Facet Recombination (Sep. 23, 2024)
-
MOOSE-Chem MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses (Oct. 9, 2024; ICLR 2025)
-
VirSci Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation (Oct. 12, 2024)
-
CoI Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents (Oct. 17, 2024)
-
Nova Nova: An Iterative Planning and Search Approach to Enhance Novelty and Diversity of LLM Generated Ideas (Oct. 18, 2024)
-
Coscientist Autonomous chemical research with large language models (Dec. 20, 2023)
-
ChemCrow Augmenting large language models with chemistry tools (May. 08, 2024)
-
CRISPR-GPT CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments (Apr. 27, 2024)
-
Navigating Complexity Navigating Complexity: Orchestrated Problem Solving with Multi-Agent LLMs (Jul. 10, 2024)
-
HuggingGPT HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face (Dec. 03, 2024)
-
AutoGen AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework (Oct. 03, 2023)
-
LLM-RDF An automatic end-to-end chemical synthesis development platform powered by large language models (Nov. 23, 2024)
-
Simulating Expert Discussions with Multi-agent for Enhanced Scientific Problem Solving (Jan. 23, 2024)
-
Data-Juicer Data-Juicer: A One-Stop Data Processing System for Large Language Models (Dec. 20, 2023)
-
Jellyfish Jellyfish: A Large Language Model for Data Preprocessing (Oct. 28, 2024)
-
Can Large Language Models Transform Computational Social Science? (Feb. 26, 2024)
-
CAAFE Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering (Sep. 28, 2023)
-
Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks (Jun. 19, 2023)
-
Training Socially Aligned Language Models in Simulated Human Society (Oct. 28, 2023)
-
ESM-1b Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences (Dec. 16, 2020)
-
ESM-2 Evolutionary-scale prediction of atomic-level protein structure with a language model (Mar. 16, 2023)
-
Controllable protein design with language models (Aug. 22, 2022)
-
PALM-H3 De novo generation of SARS-CoV-2 antibody CDRH3 with a pre-trained generative large language model (Aug. 10, 2024)
-
Coscientist Autonomous chemical research with large language models (Dec. 20, 2023)
-
ChemCrow Augmenting large language models with chemistry tools (May. 08, 2024)
-
Efficient Evolutionary Search Over Chemical Space with Large Language Models (Jul. 02, 2024)
-
ChatDrug Conversational Drug Editing Using Retrieval and Domain Feedback (May. 29, 2023)
-
DrugAssist DrugAssist: A Large Language Model for Molecule Optimization (Dec. 28, 2023)
-
Bayesian Optimization of Catalysts With In-context Learning (Apr. 18, 2024)
-
Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design (Oct. 22, 2023)
-
ChemReasoner CHEMREASONER: Heuristic Search over a Large Language Model’s Knowledge Space using Quantum-Chemical Feedback (Dec. 09, 2024)
-
Automated Statistical Model Discovery with Language Models (Jun. 22, 2024)
-
MentaLLaMA MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models (Feb. 04, 2024)
-
Can Large Language Models Serve as Data Analysts? A Multi-Agent Assisted Approach for Qualitative Data Analysis (Feb. 02, 2024)
-
Opening a conversation on responsible environmental data science in the age of large language models (May. 09, 2024)
-
DSBench DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? (Sep. 12, 2024)
-
AutoGen AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework (Oct. 03, 2023)
-
LLM-in-the-loop: Leveraging Large Language Model for Thematic Analysis (Oct. 23, 2023)
-
SUPER SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories (Sep. 11, 2024) [arXiv](https://arxiv.org/abs/2409.07440)
-
MLE-bench MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering (Dec. 20, 2023)
-
ScienceAgentBench ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery (Oct. 07, 2024)
-
Spider2-V Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? (Jul. 15, 2024)
-
MLAgentBench MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation (Oct. 05, 2023)
-
DiscoveryWorld DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents (Jun. 10, 2024)
-
DSBench DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? (Sep. 12, 2024)
-
DS-1000 DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation (Nov. 18, 2022)
-
LAB-Bench LAB-Bench: Measuring Capabilities of Language Models for Biology Research (Jul. 14, 2024)
-
AgentBench AgentBench: Evaluating LLMs as Agents (Aug. 07, 2023)
-
TaskBench TaskBench: Benchmarking Large Language Models for Task Automation (Nov. 30, 2023)
-
CORE-Bench CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark (Sep. 17, 2024)
-
Automatic Generation of Citation Texts in Scholarly Papers: A Pilot Study (Jul. 30, 2020)
-
Explaining Relationships Among Research Papers (Feb. 20, 2024)
-
AutoCite AutoCite: Multi-Modal Representation Fusion for Contextual Citation Generation (Mar. 08, 2021)
-
BACO BACO: A Background Knowledge- and Content-Based Framework for Citing Sentence Generation (Aug. 1, 2021)
-
Controllable Citation Sentence Generation with Language Models (Nov. 14, 2022)
-
Intent-Controllable Citation Text Generation (May. 21, 2022)
-
Shallow Synthesis of Knowledge in GPT-Generated Texts: A Case Study in Automatic Related Work Composition (Feb. 19, 2024)
-
Leveraging Large Language Models for Literature Review Tasks - A Case Study Using ChatGPT (Dec. 20, 2023)
-
LitLLM LitLLM: A Toolkit for Scientific Literature Review (Feb. 02, 2024)
-
HiReview HiReview: Hierarchical Taxonomy-Driven Automatic Literature Review Generation (Oct. 02, 2024)
-
Towards a Unified Framework for Reference Retrieval and Related Work Generation (Dec. 06, 2023)
-
Automating Research Synthesis with Domain-Specific Large Language Model Fine-Tuning (Apr. 08, 2024)
-
Reinforced Subject-Aware Graph Neural Network for Related Work Generation (Jul. 26, 2024)
-
Toward Structured Related Work Generation with Novelty Statements (Jul. 26, 2024)
-
Generating Scientific Definitions with Controllable Complexity (May. 22, 2022)
-
SciCap SciCap: Generating Captions for Scientific Figures (Nov. 07, 2021)
-
CoAuthor CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities (Apr. 29, 2022)
-
Autonomous LLM-driven research from data to human-verifiable research papers (Apr. 24, 2024)
-
PaperRobot PaperRobot: Incremental Draft Generation of Scientific Ideas (Jun. 28, 2019)
-
AutoSurvey AutoSurvey: Large Language Models Can Automatically Write Surveys (Jun. 10, 2024)
-
AI Scientist The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (Aug. 12, 2024)
-
CycleResearcher CycleResearcher: Improving Automated Research via Automated Review (Oct. 28, 2024)
-
Enabling Large Language Models to Generate Text with Citations (Dec. 06, 2023)
-
CiteBench: A Benchmark for Scientific Citation Text Generation (Dec. 06, 2023)
-
SciGen SciGen: a Dataset for Reasoning-Aware Text Generation from Scientific Tables (May. 23, 2024)
-
SciXGen SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation (Nov. 7, 2021)
-
LLM-Review-Sys The Emergence of Large Language Models (LLM) as a Tool in Literature Reviews: An LLM Automated Systematic Review (Sep. 6, 2024)
-
NLP-for-Peer-Review What Can Natural Language Processing Do for Peer Review? (May. 10, 2024)
-
A Friend or a Foe? Artificial Intelligence in Scientific Writing: A Friend or a Foe? (Apr. 20, 2024)
-
Increasing-Use-of-LLMs Mapping the Increasing Use of LLMs in Scientific Papers (Apr. 1, 2024)
-
Monitoring AI-Modified Content Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews (Mar. 11, 2024)
-
Emerging Plagiarism Emerging Plagiarism in Peer-Review Evaluation Reports: A Tip of the Iceberg? (Feb. 29, 2024)
-
Substantiation-Analysis Automatic Analysis of Substantiation in Scientific Peer Reviews (Nov. 20, 2023)
-
PR4PR Peer Reviews of Peer Reviews: A Randomized Controlled Trial and Other Experiments (Nov. 16, 2023)
-
Can-LLM-Provide-Useful-Feedback? Can large language models provide useful feedback on research papers? A large-scale empirical analysis (Oct. 3, 2023)
-
GPT4-Review-Study GPT-4 is Slightly Helpful for Peer-Review Assistance: A Pilot Study (Jun. 16, 2023)
-
SEA Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis (Jul. 9, 2024)
-
SWIF2T Automated Focused Feedback Generation for Scientific Writing Assistance (May. 30, 2024)
-
CGI2 Scientific Opinion Summarization: Paper Meta-Review Generation Dataset, Methods, and Evaluation (May. 24, 2024)
-
LLM-MetaReview Prompting LLMs to Compose Meta-Review Drafts from Peer-Review Narratives of Scholarly Manuscripts (Feb. 23, 2024)
-
Reviewer2 Reviewer2: Optimizing Review Generation Through Prompt Generation (Feb. 16, 2024)
-
MARG MARG: Multi-Agent Review Generation for Scientific Papers (Jan. 8, 2024)
-
ReviewRobot ReviewRobot: Explainable Paper Review Generation Based on Knowledge Synthesis (INLG(ACL)2020)
[
](https://github.com/EagleW/Review
-
AI-Mediated Peer Review A Critical Examination of the Ethics of AI-Mediated Peer Review (Sep. 2, 2024)
-
AgentReview AGENTREVIEW: Exploring Peer Review Dynamics with LLM Agents (Jun. 18, 2024)
-
ReviewerGPT ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing (Jun. 1, 2024)
-
ReviewFlow ReviewFlow: Intelligent Scaffolding to Support Academic Peer Reviewing (Feb. 5, 2024)
-
HumanInTheLoop-AI-Reviewing Human-in-the-loop AI Reviewing: Feasibility, Opportunities, and Risks (Jan. 1, 2024)
-
CocoSciSum CocoSciSum: A Scientific Summarization Toolkit with Compositional Controllability (EMNLP2023)
-
PaperMage PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents (ACL2023)
-
PaperQA2 Language agents achieve superhuman synthesis of scientific knowledge (Sep. 10, 2023)
-
ChatGPT-Journal-Reviews ChatGPT and the Future of Journal Reviews (Sep. 29, 2023)
-
CARE CARE: Collaborative AI-Assisted Reading Environment (Feb. 24, 2023)
-
CritiqueReview LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing (Jun. 24, 2024)
-
ORSUM Scientific Opinion Summarization: Paper Meta-Review Generation Dataset, Methods, and Evaluation (May. 24, 2024)
-
RR-MCQ Is LLM a Reliable Reviewer? A Comprehensive Evaluation of LLM on Automatic Paper Reviewing Tasks (ACL2024)
-
Reviewer2 Reviewer2: Optimizing Review Generation Through Prompt Generation (Feb. 16, 2024)
-
ASAP-Review Can We Automate Scientific Reviewing? (Jan. 30, 2024)
-
PeerSum Summarizing Multiple Documents with Conversational Structure for Meta-Review Generation (May. 2, 2023)
-
MOPRD MOPRD: A Multidisciplinary Open Peer Review Dataset (Dec. 9, 2022)
-
NLPeer NLPeer: A Unified Resource for the Computational Study of Peer Review (Nov. 12, 2022)
-
MReD MReD: A Meta-Review Dataset for Structure-Controllable Text Generation (Findings(ACL)2022)
-
PeerRead A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications (Apr. 25, 2018)
If you find this repository useful in your research, please consider citing:
@misc{luo2025llm4srsurveylargelanguage,
title={LLM4SR: A Survey on Large Language Models for Scientific Research},
author={Ziming Luo and Zonglin Yang and Zexin Xu and Wei Yang and Xinya Du},
year={2025},
eprint={2501.04306},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.04306},
}