LLM-Agents-Papers

✍️ Description

Last Updated: 2024/05/25

A repo listing papers related to LLM-based agents. It covers:

Survey
Planning, Feedback&Reflection, Memory Mechanism
Role Playing, Game Playing, Tool Usage&Human-Agent Interaction
Benchmark&Evaluation, Environment&Platform
Agent Framework, Multi-Agent System
Agent Fine-tuning

💛 Recommendation

For more comprehensive reading, we also recommend other paper lists:

zjunlp/LLMAgentPapers: Must-read Papers on Large Language Model Agents.
teacherpeterpan/self-correction-llm-papers: This is a collection of research papers for Self-Correcting Large Language Models with Automated Feedback.
Paitesanshi/LLM-Agent-Survey: A Survey on LLM-based Autonomous Agents.
woooodyy/llm-agent-paper-list: Must-read papers for LLM-based agents.
git-disl/awesome-LLM-game-agent-papers: Must-read papers for LLM-based Game agents.

📰 Papers

Survey

[2024/05/16] Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents | [paper] | [code]
[2024/04/17] The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey | [paper] | [code]
[2024/04/17] Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions | [paper] | [code]
[2024/04/03] Empowering Biomedical Discovery with AI Agents | [paper] | [code]
[2024/04/02] A Survey on Large Language Model-Based Game Agents | [paper] | [code]
[2024/03/26] Large Language Models for Human-Robot Interaction: Opportunities and Risks | [paper] | [code]
[2024/03/07] Promising and worth-to-try future directions for advancing state-of-the-art surrogates methods of agent-based models in social and health computational sciences | [paper] | [code]
[2024/02/28] Large Language Models and Games: A Survey and Roadmap | [paper] | [code]
[2024/02/28] A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems | [paper] | [code]
[2024/02/07] Can Large Language Model Agents Simulate Human Trust Behaviors? | [paper] | [code]
[2024/02/06] Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science | [paper] | [code]
[2024/02/05] Understanding the planning of LLM agents: A survey | [paper] | [code]
[2024/02/02] Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions | [paper] | [code]
[2024/01/01] If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents | [paper] | [code]
[2023/12/31] A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots | [paper] | [code]
[2023/12/19] Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives | [paper] | [code]
[2023/09/14] The Rise and Potential of Large Language Model Based Agents: A Survey | [paper] | [code]
[2023/08/22] A Survey on Large Language Model based Autonomous Agents | [paper] | [code]
[2023/06/27] Next Steps for Human-Centered Generative AI: A Technical Perspective | [paper] | [code]
[2023/04/06] Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions | [paper] | [code]

Planning

[2024/04/28] Logic Agent: Enhancing Validity with Logic Rule Invocation | [paper] | [code]
[2024/04/21] Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following | [paper] | [code]
[2024/03/13] AutoGuide: Automated Generation and Selection of State-Aware Guidelines for Large Language Model Agents | [paper] | [code]
[2024/03/12] AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production | [paper] | [code]
[2024/03/11] Strength Lies in Differences! Towards Effective Non-collaborative Dialogues via Tailored Strategy Planning | [paper] | [code]
[2024/03/10] TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision | [paper] | [code]
[2024/03/05] KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents | [paper] | [code]
[2024/03/05] Language Guided Exploration for RL Agents in Text Environments | [paper] | [code]
[2024/02/29] PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval | [paper] | [code]
[2024/02/28] Data Interpreter: An LLM Agent For Data Science | [paper] | [code]
[2024/02/18] What's the Plan? Evaluating and Developing Planning-Aware Techniques for LLMs | [paper] | [code]
[2024/02/18] PreAct: Predicting Future in ReAct Enhances Agent's Planning Ability | [paper] | [code]
[2024/02/16] When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | [paper] | [code]
[2024/02/09] Introspective Planning: Guiding Language-Enabled Agents to Refine Their Own Uncertainty | [paper] | [code]
[2024/02/06] RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents | [paper] | [code]
[2024/02/02] TravelPlanner: A Benchmark for Real-World Planning with Language Agents | [paper] | [code]
[2024/01/10] AUTOACT: Automatic Agent Learning from Scratch via Self-Planning | [paper] | [code]
[2023/11/19] TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems | [paper] | [code]
[2023/10/09] Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena | [paper] | [code]
[2023/08/07] TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents | [paper] | [code]
[2023/05/26] AdaPlanner: Adaptive Planning from Feedback with Language Models | [paper] | [code]
[2023/05/24] Reasoning with Language Model is Planning with World Model | [paper] | [code]
[2023/05/24] Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning | [paper] | [code]
[2023/03/29] Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks | [paper] | [code]
[2023/02/03] Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | [paper] | [code]
[2022/12/08] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | [paper] | [code]

Feedback&Reflection

[2024/03/18] QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback based Self-Correction | [paper] | [code]
[2024/03/17] Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback | [paper] | [code]
[2024/03/08] ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues | [paper] | [code]
[2024/03/04] Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents | [paper] | [code]
[2024/02/27] Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization | [paper] | [code]
[2024/02/26] SelectIT: Selective Instruction Tuning for Large Language Models via Uncertainty-Aware Self-Reflection | [paper] | [code]
[2024/02/24] Empowering Large Language Model Agents through Action Learning | [paper] | [code]
[2024/02/22] Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning | [paper] | [code]
[2024/02/19] A Critical Evaluation of AI Feedback for Aligning Large Language Models | [paper] | [code]
[2024/02/06] AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | [paper] | [code]
[2024/02/02] StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | [paper] | [code]
[2024/02/01] Generation, Distillation and Evaluation of Motivational Interviewing-Style Reflections with a Foundational Language Model | [paper] | [code]
[2023/12/18] CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update | [paper] | [code]
[2023/11/14] The ART of LLM Refinement: Ask, Refine, and Trust | [paper] | [code]
[2023/10/31] Learning From Mistakes Makes LLM Better Reasoner | [paper] | [code]
[2023/08/01] SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning | [paper] | [code]
[2023/07/27] PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | [paper] | [code]
[2023/05/26] AdaPlanner: Adaptive Planning from Feedback with Language Models | [paper] | [code]
[2023/05/22] Making Language Models Better Tool Learners with Execution Feedback | [paper] | [code]
[2023/04/11] Teaching Large Language Models to Self-Debug | [paper] | [code]
[2023/03/30] Self-Refine: Iterative Refinement with Self-Feedback | [paper] | [code]
[2023/02/03] Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | [paper] | [code]

Memory Mechanism

[2024/04/15] Memory Sharing for Large Language Model based Agents | [paper] | [code]
[2024/02/27] Evaluating Very Long-Term Conversational Memory of LLM Agents | [paper] | [code]
[2024/02/19] Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations | [paper] | [code]
[2024/02/07] InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory | [paper] | [code]
[2024/02/06] RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents | [paper] | [code]
[2023/12/22] Empowering Working Memory for Large Language Model Agents | [paper] | [code]
[2023/12/22] Evolving Large Language Model Assistant with Long-Term Conditional Memory | [paper] | [code]
[2023/11/10] JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models | [paper] | [code]
[2023/10/16] CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization | [paper] | [code]
[2023/06/06] ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory | [paper] | [code]
[2023/05/31] Monotonic Location Attention for Length Generalization | [paper] | [code]
[2023/05/26] Randomized Positional Encodings Boost Length Generalization of Transformers | [paper] | [code]
[2023/05/25] Landmark Attention: Random-Access Infinite Context Length for Transformers | [paper] | [code]
[2023/05/24] Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration | [paper] | [code]
[2023/05/24] Adapting Language Models to Compress Contexts | [paper] | [code]
[2023/05/23] RET-LLM: Towards a General Read-Write Memory for Large Language Models | [paper] | [code]
[2023/05/22] RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text | [paper] | [code]
[2023/05/19] ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings | [paper] | [code]
[2023/05/17] MemoryBank: Enhancing Large Language Models with Long-Term Memory | [paper] | [code]
[2023/05/15] Small Models are Valuable Plug-ins for Large Language Models | [paper] | [code]
[2023/05/02] Unlimiformer: Long-Range Transformers with Unlimited Length Input | [paper] | [code]
[2023/05/01] Learning to Reason and Memorize with Self-Notes | [paper] | [code]
[2023/04/27] ChatLog: Recording and Analyzing ChatGPT Across Time | [paper] | [code]
[2023/04/26] Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System | [paper] | [code]
[2023/04/21] Emergent and Predictable Memorization in Large Language Models | [paper] | [code]
[2023/03/17] CoLT5: Faster Long-Range Transformers with Conditional Computation | [paper] | [code]

Role Playing

[2024/05/12] Exploring the Potential of Conversational AI Support for Agent-Based Social Simulation Model Design | [paper] | [code]
[2024/05/10] LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play | [paper] | [code]
[2024/05/06] Large Language Models (LLMs) as Agents for Augmented Democracy | [paper] | [code]
[2024/05/02] GAIA: A General AI Assistant for Intelligent Accelerator Operations | [paper] | [code]
[2024/05/01] "Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time | [paper] | [code]
[2024/04/30] PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games | [paper] | [code]
[2024/04/30] Large Language Model Agent for Fake News Detection | [paper] | [code]
[2024/04/27] CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments | [paper] | [code]
[2024/04/26] Large Language Model Agent as a Mechanical Designer | [paper] | [code]
[2024/04/25] Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents | [paper] | [code]
[2024/04/22] How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO | [paper] | [code]
[2024/04/19] Cooperative Sentiment Agents for Multimodal Sentiment Analysis | [paper] | [code]
[2024/04/19] Towards Human-centered Proactive Conversational Agents | [paper] | [code]
[2024/04/13] LLMSat: A Large Language Model-Based Goal-Oriented Agent for Autonomous Space Exploration | [paper] | [code]
[2024/04/10] Apollonion: Profile-centric Dialog Agent | [paper] | [code]
[2024/04/09] SurveyAgent: A Conversational System for Personalized and Efficient Research Survey | [paper] | [code]
[2024/03/31] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model | [paper] | [code]
[2024/03/29] DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries | [paper] | [code]
[2024/03/23] EduAgent: Generative Student Agents in Learning | [paper] | [code]
[2024/03/22] CACA Agent: Capability Collaboration based AI Agent | [paper] | [code]
[2024/03/19] Characteristic AI Agents via Large Language Models | [paper] | [code]
[2024/03/15] VideoAgent: Long-form Video Understanding with Large Language Model as Agent | [paper] | [code]
[2024/03/05] ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary | [paper] | [code]
[2024/03/05] SimuCourt: Building Judicial Decision-Making Agents with Real-world Judgement Documents | [paper] | [code]
[2024/03/02] SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code | [paper] | [code]
[2024/02/29] On the Decision-Making Abilities in Role-Playing using Large Language Models | [paper] | [code]
[2024/02/28] Prospect Personalized Recommendation on Large Language Model-based Agent Platform | [paper] | [code]
[2024/02/28] Data Interpreter: An LLM Agent For Data Science | [paper] | [code]
[2024/02/27] BASES: Large-scale Web Search User Simulation with Large Language Model based Agents | [paper] | [code]
[2024/02/26] Language Agents as Optimizable Graphs | [paper] | [code]
[2024/02/26] Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation | [paper] | [code]
[2024/02/25] Understanding Public Perceptions of AI Conversational Agents: A Cross-Cultural Analysis | [paper] | [code]
[2024/02/25] Bootstrapping Cognitive Agents with a Large Language Model | [paper] | [code]
[2024/02/23] On the Multi-turn Instruction Following for Conversational Web Agents | [paper] | [code]
[2024/02/22] Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation | [paper] | [code]
[2024/02/21] Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent | [paper] | [code]
[2024/02/20] Can Large Language Models be Used to Provide Psychological Counselling? An Analysis of GPT-4-Generated Responses Using Role-play Dialogues | [paper] | [code]
[2024/02/20] Soft Self-Consistency Improves Language Model Agents | [paper] | [code]
[2024/02/20] CHATATC: Large Language Model-Driven Conversational Agents for Supporting Strategic Air Traffic Flow Management | [paper] | [code]
[2024/02/19] Polarization of Autonomous Generative AI Agents Under Echo Chambers | [paper] | [code]
[2024/02/19] LLM Agents for Psychology: A Study on Gamified Assessments | [paper] | [code]
[2024/02/19] Stick to your Role! Stability of Personal Values Expressed in Large Language Models | [paper] | [code]
[2024/02/19] WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment | [paper] | [code]
[2024/02/18] Modelling Political Coalition Negotiations Using LLM-based Agents | [paper] | [code]
[2024/02/17] Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents | [paper] | [code]
[2024/02/15] Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients | [paper] | [code]
[2024/02/13] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | [paper] | [code]
[2024/02/06] Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies | [paper] | [code]
[2024/02/06] Can Generative Agents Predict Emotion? | [paper] | [code]
[2024/02/05] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models | [paper] | [code]
[2024/02/05] GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models | [paper] | [code]
[2024/02/04] NavHint: Vision and Language Navigation Agent with a Hint Generator | [paper] | [code]
[2024/02/02] TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution | [paper] | [code]
[2024/02/01] Executable Code Actions Elicit Better LLM Agents | [paper] | [code]
[2024/01/31] LLMs Simulate Big Five Personality Traits: Further Evidence | [paper] | [code]
[2024/01/29] Assistive Large Language Model Agents for Socially-Aware Negotiation Dialogues | [paper] | [code]
[2024/01/09] Agent Alignment in Evolving Social Norms | [paper] | [code]
[2023/12/28] Experiential Co-Learning of Software-Developing Agents | [paper] | [code]
[2023/12/27] Automating Knowledge Acquisition for Content-Centric Cognitive Agents Using LLMs | [paper] | [code]
[2023/12/21] ChatGPT as a commenter to the news: can LLMs generate human-like opinions? | [paper] | [code]
[2023/12/19] Can ChatGPT be Your Personal Medical Assistant? | [paper] | [code]
[2023/12/06] LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem | [paper] | [code]
[2023/11/28] War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars | [paper] | [code]
[2023/11/23] Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach | [paper] | [code]
[2023/11/10] Smart Agent-Based Modeling: On the Use of Large Language Models in Computer Simulations | [paper] | [code]
[2023/10/01] RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models | [paper] | [code]
[2023/09/08] Unleashing the Power of Graph Learning through LLM-based Autonomous Agents | [paper] | [code]
[2023/09/05] Cognitive Architectures for Language Agents | [paper] | [code]
[2023/08/22] Towards an On-device Agent for Text Rewriting | [paper] | [code]
[2023/08/14] ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate | [paper] | [code]
[2023/08/10] LLM As DBA | [paper] | [code]
[2023/07/24] To Infinity and Beyond: SHOW-1 and Showrunner Agents in Multi-Agent Simulations | [paper] | [code]
[2023/06/28] Inferring the Goals of Communicating Agents from Actions and Instructions | [paper] | [code]
[2023/05/30] Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate | [paper] | [code]
[2023/05/27] SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks | [paper] | [code]
[2023/05/26] Training Socially Aligned Language Models in Simulated Human Society | [paper] | [code]
[2023/05/25] Role-Play with Large Language Models | [paper] | [code]
[2023/05/17] Tree of Thoughts: Deliberate Problem Solving with Large Language Models | [paper] | [code]
[2023/05/09] TidyBot: Personalized Robot Assistance with Large Language Models | [paper] | [code]
[2023/05/02] The Role of Summarization in Generative Agents: A Preliminary Perspective | [paper] | [code]
[2023/04/26] Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models | [paper] | [code]
[2023/04/24] ChatLLM Network: More brains, More intelligence | [paper] | [code]
[2023/04/21] Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback | [paper] | [code]
[2023/04/19] Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models | [paper] | [code]
[2023/04/15] Self-collaboration Code Generation via ChatGPT | [paper] | [code]
[2023/04/07] Generative Agents: Interactive Simulacra of Human Behavior | [paper] | [code]
[2023/03/31] CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society | [paper] | [code]
[2022/12/08] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | [paper] | [code]

Game Playing

[2024/05/23] Human-Agent Cooperation in Games under Incomplete Information through Natural Language Communication | [paper] | [code]
[2024/05/08] LLMs with Personalities in Multi-issue Negotiation Games | [paper] | [code]
[2024/04/03] Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game | [paper] | [code]
[2024/03/26] Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies | [paper] | [code]
[2024/02/19] LLM Agents for Psychology: A Study on Gamified Assessments | [paper] | [code]
[2024/02/13] Large Language Models as Minecraft Agents | [paper] | [code]
[2024/02/12] Large Language Models as Agents in Two-Player Games | [paper] | [code]
[2024/02/07] Can Large Language Model Agents Simulate Human Trust Behaviors? | [paper] | [code]
[2024/02/04] Enhance Reasoning for Large Language Models in the Game Werewolf | [paper] | [code]
[2024/02/02] PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language Models | [paper] | [code]
[2023/12/29] Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game | [paper] | [code]
[2023/11/10] JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models | [paper] | [code]
[2023/10/31] Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models | [paper] | [code]
[2023/09/29] Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4 | [paper] | [code]
[2023/09/18] MindAgent: Emergent Gaming Interaction | [paper] | [code]
[2023/09/10] An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents | [paper] | [code]
[2023/09/09] Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf | [paper] | [code]
[2023/08/23] Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis | [paper] | [code]
[2023/05/31] Recursive Metropolis-Hastings Naming Game: Symbol Emergence in a Multi-agent System based on Probabilistic Generative Models | [paper] | [code]
[2023/05/26] Playing repeated games with Large Language Models | [paper] | [code]
[2023/05/25] Voyager: An Open-Ended Embodied Agent with Large Language Models | [paper] | [code]
[2023/05/25] Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory | [paper] | [code]
[2023/05/19] Examining the Inter-Consistency of Large Language Models: An In-depth Analysis via Debate | [paper] | [code]
[2023/05/17] Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback | [paper] | [code]
[2023/05/08] Knowledge-enhanced Agents for Interactive Text Games | [paper] | [code]
[2023/03/29] Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks | [paper] | [code]
[2023/02/03] Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | [paper] | [code]

Tool Usage&Human-Agent Interaction

[2024/05/23] Human-Agent Cooperation in Games under Incomplete Information through Natural Language Communication | [paper] | [code]
[2024/05/17] Latent State Estimation Helps UI Agents to Reason | [paper] | [code]
[2024/05/02] CACTUS: Chemistry Agent Connecting Tool-Usage to Science | [paper] | [code]
[2024/05/01] Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning | [paper] | [code]
[2024/05/01] "Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time | [paper] | [code]
[2024/04/23] Aligning LLM Agents by Learning Latent Preference from User Edits | [paper] | [code]
[2024/04/16] Search Beyond Queries: Training Smaller Language Models for Web Interactions via Reinforcement Learning | [paper] | [code]
[2024/04/09] SurveyAgent: A Conversational System for Personalized and Efficient Research Survey | [paper] | [code]
[2024/04/04] AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent | [paper] | [code]
[2024/03/12] AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production | [paper] | [code]
[2024/03/05] InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | [paper] | [code]
[2024/03/05] Android in the Zoo: Chain-of-Action-Thought for GUI Agents | [paper] | [code]
[2024/02/27] BASES: Large-scale Web Search User Simulation with Large Language Model based Agents | [paper] | [code]
[2024/02/26] Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models | [paper] | [code]
[2024/02/23] On the Multi-turn Instruction Following for Conversational Web Agents | [paper] | [code]
[2024/02/20] Large Language Model-based Human-Agent Collaboration for Complex Task Solving | [paper] | [code]
[2024/02/20] AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning | [paper] | [code]
[2024/02/18] SciAgent: Tool-augmented Language Models for Scientific Reasoning | [paper] | [code]
[2024/02/18] Shaping Human-AI Collaboration: Varied Scaffolding Levels in Co-writing with Language Models | [paper] | [code]
[2024/02/17] Human-AI Interactions in the Communication Era: Autophagy Makes Large Models Achieving Local Optima | [paper] | [code]
[2024/02/16] ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages | [paper] | [code]
[2024/02/14] Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications | [paper] | [code]
[2024/02/09] CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models | [paper] | [code]
[2024/02/08] UFO: A UI-Focused Agent for Windows OS Interaction | [paper] | [code]
[2024/02/06] AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | [paper] | [code]
[2024/01/11] EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction | [paper] | [code]
[2024/01/03] GPT-4V(ision) is a Generalist Web Agent, if Grounded | [paper] | [code]
[2023/12/21] Team Flow at DRC2023: Building Common Ground and Text-based Turn-taking in a Travel Agent Spoken Dialogue System | [paper] | [code]
[2023/12/21] AppAgent: Multimodal Agents as Smartphone Users | [paper] | [code]
[2023/12/18] CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update | [paper] | [code]
[2023/12/14] CogAgent: A Visual Language Model for GUI Agents | [paper] | [code]
[2023/11/19] TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems | [paper] | [code]
[2023/10/18] MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models | [paper] | [code]
[2023/10/13] AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems | [paper] | [code]
[2023/10/12] A Zero-Shot Language Agent for Computer Control with Structured Reflection | [paper] | [code]
[2023/09/02] ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models | [paper] | [code]
[2023/08/07] TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents | [paper] | [code]
[2023/06/05] When Large Language Model based Agent Meets User Behavior Analysis: A Novel User Simulation Paradigm | [paper] | [code]

Benchmark&Evaluation

[2024/05/23] ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation | [paper] | [code]
[2024/05/23] AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents | [paper] | [code]
[2024/05/16] Speaker Verification in Agent-Generated Conversations | [paper] | [code]
[2024/05/13] AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | [paper] | [code]
[2024/05/01] WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting | [paper] | [code]
[2024/04/23] Evaluating Tool-Augmented Agents in Remote Sensing Platforms | [paper] | [code]
[2024/04/15] MMInA: Benchmarking Multihop Multimodal Internet Agents | [paper] | [code]
[2024/04/11] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | [paper] | [code]
[2024/04/09] AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | [paper] | [code]
[2024/03/20] RoleInteract: Evaluating the Social Interaction of Role-Playing Agents | [paper] | [code]
[2024/03/18] How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments | [paper] | [code]
[2024/03/18] Tur[k]ingBench: A Challenge Benchmark for Web Agents | [paper] | [code]
[2024/03/13] Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation | [paper] | [code]
[2024/03/05] InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | [paper] | [code]
[2024/02/27] Benchmarking Data Science Agents | [paper] | [code]
[2024/02/27] OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web | [paper] | [code]
[2024/02/18] Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation | [paper] | [code]
[2024/02/18] MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization | [paper] | [code]
[2024/01/02] CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation | [paper] | [code]
[2023/12/28] How Far Are We from Believable AI Agents? A Framework for Evaluating the Believability of Human Behavior Simulation | [paper] | [code]
[2023/12/26] RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models | [paper] | [code]
[2023/11/17] Testing Language Model Agents Safely in the Wild | [paper] | [code]
[2023/11/16] ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks | [paper] | [code]
[2023/11/15] ToolTalk: Evaluating Tool-Usage in a Conversational Setting | [paper] | [code]
[2023/10/24] FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions | [paper] | [code]
[2023/10/09] Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena | [paper] | [code]
[2023/10/02] SmartPlay : A Benchmark for LLMs as Intelligent Agents | [paper] | [code]
[2023/09/18] MindAgent: Emergent Gaming Interaction | [paper] | [code]
[2023/08/11] BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents | [paper] | [code]
[2023/08/07] AgentBench: Evaluating LLMs as Agents | [paper] | [code]
[2023/07/31] HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution | [paper] | [code]

Environment&Platform

[2024/05/23] AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents | [paper] | [code]
[2024/04/01] Rapid Mobile App Development for Generative AI Agents on MIT App Inventor | [paper] | [code]
[2024/03/28] MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs | [paper] | [code]
[2024/03/26] Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies | [paper] | [code]
[2023/03/14] CB2: Collaborative Natural Language Interaction Research Platform | [paper] | [code]

Agent Framework

[2024/04/11] Behavior Trees Enable Structured Programming of Language Model Agents | [paper] | [code]
[2024/04/05] Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents | [paper] | [code]
[2024/03/29] ITCMA: A Generative Agent Based on a Computational Consciousness Structure | [paper] | [code]
[2024/03/18] QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback based Self-Correction | [paper] | [code]
[2024/02/26] RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation | [paper] | [code]
[2024/02/26] Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering | [paper] | [code]
[2024/02/22] Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering | [paper] | [code]
[2024/02/17] KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph | [paper] | [code]
[2024/01/05] AFSPP: Agent Framework for Shaping Preference and Personality with Large Language Models | [paper] | [code]
[2023/11/02] ProAgent: From Robotic Process Automation to Agentic Process Automation | [paper] | [code]
[2023/09/29] Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency | [paper] | [code]
[2023/09/14] Agents: An Open-source Framework for Autonomous Language Agents | [paper] | [code]
[2023/08/22] ProAgent: Building Proactive Cooperative AI with Large Language Models | [paper] | [code]
[2023/06/09] Mind2Web: Towards a Generalist Agent for the Web | [paper] | [code]

Multi-Agent System

[2024/05/23] CityGPT: Towards Urban IoT Learning, Analysis and Interaction with Multi-Agent System | [paper] | [code]
[2024/05/20] (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts | [paper] | [code]
[2024/05/17] LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions | [paper] | [code]
[2024/05/07] Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework | [paper] | [code]
[2024/05/06] Conformity, Confabulation, and Impersonation: Persona Inconstancy in Multi-Agent LLM Collaboration | [paper] | [code]
[2024/05/05] Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation | [paper] | [code]
[2024/04/28] ComposerX: Multi-Agent Symbolic Music Composition with LLMs | [paper] | [code]
[2024/04/25] Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents | [paper] | [code]
[2024/04/23] BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis | [paper] | [code]
[2024/04/23] CT-Agent: Clinical Trial Multi-Agent with Large Language Model-based Reasoning | [paper] | [code]
[2024/04/14] Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation | [paper] | [code]
[2024/04/12] Leveraging Multi-AI Agents for Cross-Domain Knowledge Discovery | [paper] | [code]
[2024/04/10] MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education | [paper] | [code]
[2024/04/09] Large Language Models to the Rescue: Deadlock Resolution in Multi-Robot Systems | [paper] | [code]
[2024/04/08] 360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System | [paper] | [code]
[2024/04/06] MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | [paper] | [code]
[2024/04/03] Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game | [paper] | [code]
[2024/04/02] Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization | [paper] | [code]
[2024/04/02] CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models | [paper] | [code]
[2024/04/01] TraveLER: A Multi-LMM Agent Framework for Video Question-Answering | [paper] | [code]
[2024/03/28] MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation | [paper] | [code]
[2024/03/26] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution | [paper] | [code]
[2024/03/21] Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering | [paper] | [code]
[2024/03/20] Agent Group Chat: An Interactive Group Chat Simulacra For Better Eliciting Collective Emergent Behavior | [paper] | [code]
[2024/03/19] Embodied LLM Agents Learn to Cooperate in Organized Teams | [paper] | [code]
[2024/03/18] How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments | [paper] | [code]
[2024/03/12] Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations | [paper] | [code]
[2024/03/02] AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks | [paper] | [code]
[2024/02/28] Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key? | [paper] | [code]
[2024/02/26] Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation | [paper] | [code]
[2024/02/26] Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering | [paper] | [code]
[2024/02/26] LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments | [paper] | [code]
[2024/02/21] LLM Based Multi-Agent Generation of Semi-structured Documents from Semantic Templates in the Public Administration Domain | [paper] | [code]
[2024/02/20] What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents | [paper] | [code]
[2024/02/18] Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation | [paper] | [code]
[2024/02/18] LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration | [paper] | [code]
[2024/02/15] TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation | [paper] | [code]
[2024/02/03] More Agents Is All You Need | [paper] | [code]
[2024/02/02] A Multi-Agent Conversational Recommender System | [paper] | [code]
[2024/01/27] ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning | [paper] | [code]
[2024/01/11] Combating Adversarial Attacks with Multi-Agent Debate | [paper] | [code]
[2024/01/08] MARG: Multi-Agent Review Generation for Scientific Papers | [paper] | [code]
[2024/01/08] SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems | [paper] | [code]
[2024/01/08] Why Solving Multi-agent Path Finding with Large Language Model has not Succeeded Yet | [paper] | [code]
[2023/12/20] AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | [paper] | [code]
[2023/12/01] Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games | [paper] | [code]
[2023/10/31] Multi-Agent Consensus Seeking via Large Language Models | [paper] | [code]
[2023/10/25] MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning | [paper] | [code]
[2023/10/10] MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents | [paper] | [code]
[2023/10/03] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View | [paper] | [code]
[2023/09/22] Learning to Coordinate with Anyone | [paper] | [code]
[2023/09/18] MindAgent: Emergent Gaming Interaction | [paper] | [code]
[2023/08/21] AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents | [paper] | [code]
[2023/08/03] InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent | [paper] | [code]
[2023/08/01] MetaGPT: Meta Programming for Multi-Agent Collaborative Framework | [paper] | [code]
[2023/07/16] Communicative Agents for Software Development | [paper] | [code]
[2023/07/11] Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration | [paper] | [code]
[2023/07/05] Building Cooperative Embodied Agents Modularly with Large Language Models | [paper] | [code]
[2023/06/05] Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents | [paper] | [code]

Agent Fine-tuning

[2024/05/16] Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | [paper] | [code]
[2024/05/01] Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning | [paper] | [code]
[2024/04/17] Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent | [paper] | [code]
[2024/04/16] Search Beyond Queries: Training Smaller Language Models for Web Interactions via Reinforcement Learning | [paper] | [code]
[2024/04/05] Social Skill Training with Large Language Models | [paper] | [code]
[2024/04/02] CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models | [paper] | [code]
[2024/03/29] Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning | [paper] | [code]
[2024/03/21] ReAct Meets ActRe: Autonomous Annotation of Agent Trajectories for Contrastive Self-Training | [paper] | [code]
[2024/03/19] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | [paper] | [code]
[2024/03/18] EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents | [paper] | [code]
[2024/02/23] AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning | [paper] | [code]
[2024/02/21] Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent | [paper] | [code]
[2024/02/19] A Critical Evaluation of AI Feedback for Aligning Large Language Models | [paper] | [code]
[2024/02/18] Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents | [paper] | [code]
[2024/02/17] Training Language Model Agents without Modifying Language Models | [paper] | [code]
[2024/01/10] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training | [paper] | [code]
[2024/01/10] Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk | [paper] | [code]
[2024/01/05] From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models | [paper] | [code]
[2023/12/22] Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning | [paper] | [code]
[2023/12/20] Machine Mindset: An MBTI Exploration of Large Language Models | [paper] | [code]
[2023/11/28] Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld | [paper] | [code]
[2023/10/19] AgentTuning: Enabling Generalized Agent Abilities for LLMs | [paper] | [code]
[2023/10/09] FireAct: Toward Language Agent Fine-tuning | [paper] | [code]
[2023/10/01] Adapting LLM Agents Through Communication | [paper] | [code]

tiandiao123/LLM-Agents-Papers