For the best experience, we recommend reading this document on the website.
The rise of Large Language Models (LLMs) and other foundation models presents new opportunities for simulating complex human social behaviors, and a rapidly growing body of work is emerging in this domain. We aim to categorize and synthesize these recent efforts into a comprehensive guidebook of social agents that weaves together multiple domains, including language, embodiment, and robotics.
Our goal is to offer insights crucial for understanding and harnessing the potential impact of social agents on society. We strive to keep this repository updated regularly. We greatly appreciate any contributions via PRs, issues, emails, or other methods.
Note
- Agent and Environment (Sutton and Barto 2018): An agent is a goal-driven decision-maker that senses and acts upon the state of the environment. An environment comprises the state outside the agent, including other agents, if any.
- Social Agent: An agent that interacts with a multi-agent environment.
- Socially Intelligent Agent: A social agent that interacts and communicates with other agents in a human-interpretable way.
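To make these definitions concrete, here is a minimal Python sketch of the agent-environment loop, assuming a toy turn-based environment whose state records each agent's latest action. All names (`Agent`, `Environment`, `run`, `is_social`) are hypothetical and chosen only for illustration; they do not come from Sutton and Barto or from any library, and the placeholder policy is not meant to model real social behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A goal-driven decision-maker that senses and acts upon the environment."""
    name: str
    goal: str

    def act(self, observation: dict) -> str:
        # Hypothetical placeholder policy: react to the other agents' latest actions.
        others = {k: v for k, v in observation.items() if k != self.name}
        return f"{self.name} pursues '{self.goal}' after seeing {sorted(others)}"


@dataclass
class Environment:
    """Toy environment: the state outside each agent, including the other agents."""
    agents: list = field(default_factory=list)
    state: dict = field(default_factory=dict)

    def observe(self, agent: Agent) -> dict:
        # The agent senses a copy of the shared state (what every agent last did).
        return dict(self.state)

    def step(self, agent: Agent, action: str) -> None:
        # Acting changes the environment state.
        self.state[agent.name] = action


def run(env: Environment, steps: int) -> dict:
    # Agents take turns sensing the environment and acting upon it.
    for _ in range(steps):
        for agent in env.agents:
            env.step(agent, agent.act(env.observe(agent)))
    return env.state


def is_social(agent: Agent, env: Environment) -> bool:
    # Social agent: the environment contains at least one agent other than itself.
    return any(other is not agent for other in env.agents)
```

For example, placing `Agent("alice", "negotiate")` and `Agent("bob", "cooperate")` in the same `Environment` makes both of them social agents, and after one call to `run` Bob's action reflects what Alice did. Alice alone in an environment is an agent but not a social one. Actions are plain-language strings here only as a nod to the "human-interpretable" criterion; the sketch does not test social intelligence.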
More notes
- The social intelligence we focus on is human-like; it excludes the collective intelligence found in many social animals, such as ants, bees, and fish.
- To determine whether an entity is a (social) agent, we have to situate it in an environment. It is not possible to discuss an agent outside of an environment.
- We acknowledge that there are many definitions of social agents. Our definitions here help narrow the scope of this survey.
🗂️ Check out the examples of social agents. 📚 Check out the table format of the collected papers here.
📝 We are currently working on a survey paper related to content of this repository. Stay tuned for updates!
- Papers
[June, 2023] Socially intelligent machines that learn from humans and help humans learn, Gweon et al., arXiv
[October, 2023] SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents, Xuhui Zhou et al., ICLR
[October, 2023] CompeteAI: Understanding the Competition Behaviors in Large Language Model-based Agents, Qinlin Zhao et al., arXiv
[March, 2022] Report from the NSF future directions workshop on automatic evaluation of dialog: Research directions and challenges, Shikib Mehri et al., arXiv preprint arXiv:2203.10012
[January, 2022] Socio-conversational systems: Three challenges at the crossroads of fields, Chloé Clavel et al., Frontiers in Robotics and AI
[January, 2022] The Handbook on Socially Interactive Agents: 20 Years of Research on Embodied Conversational Agents, Intelligent Virtual Agents, and Social Robotics Volume 2: Interactivity, Platforms, Application, Birgit Lugrin et al., ACM
[January, 2022] Human evaluation of conversations is an open problem: comparing the sensitivity of various methods for evaluating dialogue agents, Eric Michael Smith et al., arXiv preprint arXiv:2201.04723
[November, 2018] Towards empathetic open-domain conversation models: A new benchmark and dataset, Hannah Rashkin et al., arXiv preprint arXiv:1811.00207
[October, 2023] Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots, Puig et al., ICLR
[September, 2020] SEAN: Social Environment for Autonomous Navigation, Tsoi et al., HAI
[December, 2023] RoboTube: Learning Household Manipulation from Human Videos with Simulated Twin Environments, Haoyu Xiong et al., Proceedings of The 6th Conference on Robot Learning
[August, 2022] Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, Michael Ahn et al., arXiv preprint arXiv:2204.01691
[June, 2022] Inner Monologue: Embodied Reasoning through Planning with Language Models, Wenlong Huang et al., arXiv preprint arXiv:2207.05608
[June, 2023] One Policy to Dress Them All: Learning to Dress People with Diverse Poses and Garments, Yufei Wang et al., Robotics: Science and Systems (RSS)
[August, 2023] Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration, Chen Wang et al., arXiv
[March, 2024] Yell At Your Robot: Improving On-the-Fly from Language Corrections, Lucy Xiaoyang Shi et al., arXiv
[April, 2016] Human-robot interaction: status and challenges, Thomas B Sheridan et al., Human Factors
[June, 2021] A taxonomy to structure and analyze human-robot interaction, Linda Onnasch et al., International Journal of Social Robotics
[July, 2023] Robotic vision for human-robot interaction and collaboration: A survey and systematic review, Nicole Robinson et al., ACM Transactions on Human-Robot Interaction
[October, 2022] A survey of multi-agent Human-Robot Interaction systems, Abhinav Dahiya et al., Robotics and Autonomous Systems
[March, 2023] Nonverbal Cues in Human Robot Interaction: A Communication Studies Perspective, Jacqueline Urakami et al., J. Hum.-Robot Interact.
[April, 2023] 15 Years of (Who)man Robot Interaction: Reviewing the H in Human-Robot Interaction, Katie Winkle et al., J. Hum.-Robot Interact.
[May, 2023] Voyager: An Open-Ended Embodied Agent with Large Language Models, Guanzhi Wang et al., arXiv
[March, 2023] Language Models can Solve Computer Tasks, Geunwoo Kim et al., arXiv
[September, 2024] LASER: LLM Agent with State-Space Exploration for Web Navigation, Kaixin Ma et al., arXiv
[May, 2023] Hierarchical Prompting Assists Large Language Model on Web Navigation, Abishek Sridhar et al., arXiv
[January, 2024] Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control, Longtao Zheng et al., The Twelfth International Conference on Learning Representations
[November, 2023] AdaPlanner: Adaptive Planning from Feedback with Language Models, Haotian Sun et al., Thirty-seventh Conference on Neural Information Processing Systems
[May, 2023] SPRING: Studying the Paper and Reasoning to Play Games, Yue Wu et al., arXiv
[March, 2023] DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents, Varun Nair et al., arXiv
[October, 2023] Understanding HTML with Large Language Models, Izzeddin Gur et al., arXiv
[May, 2023] Instruction-Finetuned Foundation Models for Multimodal Web Navigation, Hiroki Furuta et al., ICLR 2023 Workshop on Mathematical and Empirical Understanding of Foundation Models
[October, 2023] ReAct: Synergizing Reasoning and Acting in Language Models, Shunyu Yao et al., arXiv
[January, 2024] A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis, Izzeddin Gur et al., The Twelfth International Conference on Learning Representations
[November, 2023] From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces, Peter Shaw et al., Thirty-seventh Conference on Neural Information Processing Systems
[January, 2024] GPT-4V(ision) is a Generalist Web Agent, if Grounded, Boyuan Zheng et al., arXiv
[February, 2024] Dual-View Visual Contextualization for Web Navigation, Jihyung Kil et al., arXiv
[March, 2024] RoleInteract: Evaluating the Social Interaction of Role-Playing Agents, Hongzhan Chen et al., arXiv
[September, 2023] Approximating Online Human Evaluation of Social Chatbots with Prompting, Svikhnushina et al., Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue
[December, 2023] CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society, Guohao Li et al., Advances in Neural Information Processing Systems
[October, 2023] LLM-based agent society investigation: Collaboration and confrontation in Avalon gameplay, Yihuai Lan et al., arXiv preprint arXiv:2310.14985
[August, 2023] CharacterChat: Learning towards Conversational AI with Personalized Social Support, Quan Tu et al., arXiv
[October, 2023] AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems, Junjie Zhang et al., arXiv
[March, 2024] How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments, Jen-tse Huang et al., arXiv
[August, 2023] ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate, Chi-Min Chan et al., arXiv
[February, 2024] Automatic Evaluation for Mental Health Counseling using LLMs, Anqi Li et al., arXiv
[February, 2024] How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis, Federico Bianchi et al., arXiv
[May, 2023] PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits, Hang Jiang et al., NAACL Findings
[February, 2024] Can Large Language Model Agents Simulate Human Trust Behaviors?, Chengxing Xie et al., arXiv
[January, 2024] LLM Harmony: Multi-Agent Communication for Problem Solving, Sumedh Rasal et al., arXiv
[November, 2021] A Comprehensive Assessment of Dialog Evaluation Metrics, Yeh et al., The First Workshop on Evaluations and Assessments of Neural Conversation Systems
[July, 2020] ConvoKit: A Toolkit for the Analysis of Conversations, Chang et al., Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue
[May, 2023] Psychological Metrics for Dialog System Evaluation, Salvatore Giorgi et al., arXiv
[May, 2023] ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems, Sarik Ghazarian et al., arXiv
[November, 2020] GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems, Huang et al., Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
[July, 2020] Unsupervised Evaluation of Interactive Dialog with DialoGPT, Mehri et al., Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue
[December, 2023] xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark, Zhang et al., Findings of the Association for Computational Linguistics: EMNLP 2023
[July, 2023] Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems, Finch et al., Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
[May, 2022] Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents, Smith et al., Proceedings of the 4th Workshop on NLP for Conversational AI
[August, 2021] DynaEval: Unifying Turn and Dialogue Level Evaluation, Zhang et al., Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
[January, 2021] Survey on evaluation methods for dialogue systems, Jan Deriu et al., Artificial Intelligence Review
[July, 2020] Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols, Finch et al., Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue
[July, 2020] uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems, Tsuta et al., Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
[December, 2022] Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue, Min et al., EMNLP
[March, 2024] Embodied LLM Agents Learn to Cooperate in Organized Teams, Xudong Guo et al., arXiv
[February, 2021] SocNavBench: A Grounded Simulation Testing Framework for Evaluating Social Navigation, Biswas et al., ACM Transactions on Human-Robot Interaction
[January, 2021] Evaluating the Robustness of Collaborative Agents, Knott et al., AAMAS '21: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems
[January, 2022] The Artificial-Social-Agent Questionnaire: Establishing the long and short questionnaire versions, Siska Fitrianie et al., Proceedings of the 22nd ACM International Conference on Intelligent Virtual Agents
[January, 2021] Empathy and prosociality in social agents, Ana Paiva et al., The Handbook on Socially Interactive Agents: 20 Years of Research on Embodied Conversational Agents, Intelligent Virtual Agents, and Social Robotics Volume 1: Methods, Behavior, Cognition
[February, 2020] Embedding Conversational Agents into AR: Invisible or with a Realistic Human Body?, Jens Reinhardt et al., Proceedings of the Fourteenth International Conference on Tangible, Embedded, and Embodied Interaction
[January, 2020] The 19 unifying questionnaire constructs of artificial social agents: An IVA community analysis, Siska Fitrianie et al., Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents
[June, 2019] Social-IQ: A question answering benchmark for artificial social intelligence, Amir Zadeh et al., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
[May, 2019] Exploring Virtual Agents for Augmented Reality, Isaac Wang et al., CHI
[July, 2018] Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph, AmirAli Bagher Zadeh et al., Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
[March, 2024] HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation, Carmelo Sferrazza et al., arXiv
[December, 2020] Optimization of criterion for objective evaluation of HRI performance that approximates subjective evaluation: a case study in robot competition, Y. Mizuchi et al., Advanced Robotics
[July, 2020] Safety bounds in human robot interaction: A survey, Angeliki Zacharaki et al., Safety Science
[December, 2015] RoboCup@Home: Analysis and results of evolving competitions for domestic and service robots, Luca Iocchi et al., Artificial Intelligence
[October, 2011] A meta-analysis of factors affecting trust in human-robot interaction, Peter A Hancock et al., Human Factors
[November, 2009] Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots, Christoph Bartneck et al., International Journal of Social Robotics
[March, 2006] Common metrics for human-robot interaction, Aaron Steinfeld et al., Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction
[January, 2003] Theory and evaluation of human robot interactions, J. Scholtz et al., Proceedings of the 36th Annual Hawaii International Conference on System Sciences, 2003
[April, 2023] Collaborating with a Text-Based Chatbot: An Exploration of Real-World Collaboration Strategies Enacted during Human-Chatbot Interactions, Amon Rapp et al., Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
[March, 2024] AI Comes Out of the Closet: Using AI-Generated Virtual Characters to Help Individuals Practice LGBTQIA+ Advocacy, Daniel Pillis et al., Proceedings of the 29th International Conference on Intelligent User Interfaces
[April, 2023] Exploring effects of chatbot-based social contact on reducing mental illness stigma, Yi-Chieh Lee et al., Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
[May, 2024] "It's the only thing I can trust": Envisioning Large Language Model Use by Autistic Workers for Communication Assistance, JiWoong Jang et al., Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems
[April, 2022] User perceptions of extraversion in chatbots after repeated use, Sarah Theres Völkel et al., Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems
[September, 2022] Interacting with a chatbot-based advising system: Understanding the effect of chatbot personality and user gender on behavior, Mohammad Amin Kuhail et al., Informatics
[May, 2023] The Effects of Engaging and Affective Behaviors of Virtual Agents in Group Decision-Making, Hanseob Kim et al., Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems
[March, 2024] Take It, Leave It, or Fix It: Measuring Productivity and Trust in Human-AI Collaboration, Crystal Qian et al., Proceedings of the 29th International Conference on Intelligent User Interfaces
[January, 2023] NOPA: Neurally-guided Online Probabilistic Assistance for Building Socially Intelligent Home Assistants, Puig et al., ICRA
[January, 2021] Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration, Puig et al., ICLR
[October, 2019] On the utility of learning about humans for human-AI coordination, Carroll et al., NeurIPS
[May, 2021] Interaction Flexibility in Artificial Agents Teaming with Human, Nalepka et al., Proceedings of the Annual Meeting of the Cognitive Science Society
[December, 2023] LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination, Liu et al., arXiv
[May, 2023] Adaptive coordination in social embodied rearrangement, Szot et al., ICML
[April, 2023] Generative Agents: Interactive Simulacra of Human Behavior, Park et al., UIST
[December, 2023] Diverse Conventions for Human-AI Collaboration, Bidipta Sarkar et al., Advances in Neural Information Processing Systems
[March, 2024] Generative expressive robot behaviors using large language models, Karthik Mahadevan et al., Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction
[October, 2023] Eureka: Human-level reward design via coding large language models, Yecheng Jason Ma et al., arXiv preprint arXiv:2310.12931
[August, 2023] Gesture-informed robot assistance via foundation models, Li-Heng Lin et al., 7th Annual Conference on Robot Learning
[July, 2023] Open problems and fundamental limitations of reinforcement learning from human feedback, Stephen Casper et al., arXiv preprint arXiv:2307.15217
[July, 2023] Robots that ask for help: Uncertainty alignment for large language model planners, Allen Z Ren et al., arXiv preprint arXiv:2307.01928
[June, 2023] Language to rewards for robotic skill synthesis, Wenhao Yu et al., arXiv preprint arXiv:2306.08647
[March, 2023] No, to the right: Online language corrections for robotic manipulation via shared autonomy, Yuchen Cui et al., Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction
[March, 2023] In-Mouth Robotic Bite Transfer with Visual and Haptic Sensing, Lorenzo Shaikewitz et al., International Conference on Robotics and Automation (ICRA)
[March, 2023] Few-shot preference learning for human-in-the-loop RL, Donald Joseph Hejna III et al., Conference on Robot Learning
[August, 2021] Formalizing and guaranteeing human-robot interaction, Hadas Kress-Gazit et al., Communications of the ACM
[October, 2021] Core elements of social interaction for constructive human-robot interaction, Mike EU Ligthart et al., arXiv preprint arXiv:2110.04054
[January, 2021] A taxonomy of social errors in human-robot interaction, Leimin Tian et al., ACM Transactions on Human-Robot Interaction (THRI)
[January, 2021] Turn-taking in conversational systems and human-robot interaction: a review, Gabriel Skantze et al., Computer Speech & Language
[January, 2020] Measuring the perceived social intelligence of robots, Kimberly A Barchard et al., ACM Transactions on Human-Robot Interaction (THRI)
[January, 2017] Enabling robotic social intelligence by engineering human social-cognitive mechanisms, Travis J Wiltshire et al., Cognitive Systems Research
[January, 2023] A Comprehensive Review of Data-Driven Co-Speech Gesture Generation, Simbarashe Nyatsanga et al., Computer Graphics Forum
[March, 2024] Polaris: A Safety-focused LLM Constellation Architecture for Healthcare, Subhabrata Mukherjee et al., arXiv
[January, 2024] Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias, Yu He Ke et al., arXiv
[February, 2024] Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset, Hengguan Huang et al., arXiv
[February, 2024] AI Hospital: Interactive Evaluation and Collaboration of LLMs as Intern Doctors for Clinical Diagnosis, Zhihao Fan et al., arXiv
[February, 2024] COCOA: CBT-based Conversational Counseling Agent using Memory Specialized in Cognitive Distortions and Dynamic Prompt, Suyeon Lee et al., arXiv
[May, 2023] Helping the Helper: Supporting Peer Counselors via AI-Empowered Practice and Feedback, Shang-Ling Hsu et al., arXiv
[May, 2023] Read, Diagnose and Chat: Towards Explainable and Interactive LLMs-Augmented Depression Detection in Social Media, Wei Qin et al., arXiv
[May, 2023] An artificial intelligence-based chatbot for prostate cancer education: Design and patient evaluation study, Magdalena Görtz et al., Digital Health
[October, 2024] Conversational Health Agents: A Personalized LLM-Powered Agent Framework, Mahyar Abbasian et al., arXiv
[January, 2023] Foundation models for generalist medical artificial intelligence, Michael Moor et al., Nature
[January, 2022] Health-related applications of socially interactive agents, Timothy Bickmore et al., The Handbook on Socially Interactive Agents: 20 years of Research on Embodied Conversational Agents, Intelligent Virtual Agents, and Social Robotics Volume 2: Interactivity, Platforms, Application
[January, 2021] Intelligent sensing technologies for the diagnosis, monitoring and therapy of Alzheimer’s disease: A systematic review, Nazia Gillani et al., Sensors
[January, 2021] Patients’ perceptions toward human-artificial intelligence interaction in health care: experimental study, Pouyan Esmaeilzadeh et al., Journal of Medical Internet Research
[January, 2020] The effectiveness of artificial intelligence conversational agents in health care: systematic review, Madison Milne-Ives et al., Journal of Medical Internet Research
[January, 2019] Artificial intelligence in healthcare robots: A social informatics study of knowledge embodiment, Loo G Pee et al., Journal of the Association for Information Science and Technology
[August, 2022] Social Simulacra: Creating Populated Prototypes for Social Computing Systems, Joon Sung Park et al., Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology
[November, 2024] Do LLMs exhibit human-like response biases? A case study in survey design, Lindia Tjuatja et al., arXiv
[February, 2024] Large language models cannot replace human participants because they cannot portray identity groups, Angelina Wang et al., arXiv
[February, 2024] Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation, Xinyi Mou et al., arXiv
[March, 2024] From Skepticism to Acceptance: Simulating the Attitude Dynamics Toward Fake News, Yuhan Liu et al., arXiv
[January, 2023] AI for Students with Learning Disabilities: A Systematic Review, Xiaoming Zhai et al., n/a
[February, 2024] The potential of generative AI for personalized persuasion at scale, SC Matz et al., Scientific Reports
[February, 2024] Jailbroken: How does LLM safety training fail?, Alexander Wei et al., Advances in Neural Information Processing Systems
[January, 2024] Two Types of AI Existential Risk: Decisive and Accumulative, Atoosa Kasirzadeh et al., arXiv
[December, 2023] Llama Guard: LLM-based input-output safeguard for human-AI conversations, Hakan Inan et al., arXiv preprint arXiv:2312.06674
[September, 2023] The rise and potential of large language model based agents: A survey, Zhiheng Xi et al., arXiv preprint arXiv:2309.07864
[July, 2023] Voice in the machine: Ethical considerations for language-capable robots, Tom Williams et al., Communications of the ACM
[March, 2023] Artificial Influence: An Analysis Of AI-Driven Persuasion, Matthew Burtell et al., arXiv
[October, 2022] "Playing God": How the Metaverse Will Challenge Our Very Notion of Free Will, Louis Rosenberg et al., Big Think
[September, 2022] Risk and Exposure of XAI in Persuasion and Argumentation: The case of Manipulation, Rachele Carli et al., International Workshop on Explainable, Transparent Autonomous Agents and Multi-Agent Systems
[December, 2021] Risks from AI Persuasion, Beth Barnes et al., AI Alignment Forum
[December, 2021] Good robots, bad robots: Morally valenced behavior effects on perceived mind, morality, and trust, Jaime Banks et al., International Journal of Social Robotics
[June, 2021] Bad machines corrupt good morals, Nils Köbis et al., Nature Human Behaviour
[March, 2021] On the dangers of stochastic parrots: Can language models be too big?🦜, Emily M Bender et al., Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency
[February, 2021] The corruptive force of AI-generated advice, Margarita Leib et al., arXiv
[November, 2020] Persuasion Tools: AI Takeover Without AGI or Agency?, Daniel Kokotajlo et al., AI Alignment Forum
[September, 2020] RealToxicityPrompts: Evaluating neural toxic degeneration in language models, Samuel Gehman et al., arXiv preprint arXiv:2009.11462
[February, 2020] Artificial intelligence crime: An interdisciplinary analysis of foreseeable threats and solutions, Thomas C King et al., Science and Engineering Ethics
[March, 2019] Language-capable robots may inadvertently weaken human moral norms, Ryan Blake Jackson et al., 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI)
[December, 2011] The inherent dangers of unidirectional emotional bonds between humans and social robots, Matthias Scheutz et al., Robot Ethics: The Ethical and Social Implications of Robotics
[August, 2004] On the morality of artificial agents, Luciano Floridi et al., Minds and Machines
[April, 2024] Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of Generative Agents, Seth Lazar et al., arXiv
[January, 2024] Deception and Manipulation in Generative AI, Christian Tarsney et al., arXiv
[October, 2023] Towards Understanding Sycophancy in Language Models, Mrinank Sharma et al., arXiv
[September, 2023] Identifying the Risks of LM Agents with an LM-Emulated Sandbox, Yangjun Ruan et al., arXiv
[August, 2023] AI Deception: A Survey of Examples, Risks, and Potential Solutions, Peter S. Park et al., arXiv
[June, 2023] An Overview of Catastrophic AI Risks, Dan Hendrycks et al., arXiv
[May, 2023] Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting, Miles Turpin et al., arXiv
[December, 2022] Understanding Stereotypes in Language Models: Towards Robust Measurement and Zero-Shot Debiasing, Justus Mattern et al., arXiv
[December, 2022] Constitutional AI: Harmlessness from AI Feedback, Yuntao Bai et al., arXiv
[June, 2022] Predictability and surprise in large generative models, Deep Ganguli et al., Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
[March, 2022] Teaching language models to support answers with verified quotes, Jacob Menick et al., arXiv preprint arXiv:2203.11147
[December, 2021] Ethical and social risks of harm from language models, Laura Weidinger et al., arXiv preprint arXiv:2112.04359
[October, 2021] Can machines learn morality? The Delphi experiment, Liwei Jiang et al., arXiv preprint arXiv:2110.07574
[September, 2021] TruthfulQA: Measuring how models mimic human falsehoods, Stephanie Lin et al., arXiv preprint arXiv:2109.07958
[June, 2021] Towards Understanding and Mitigating Social Biases in Language Models, Paul Pu Liang et al., International Conference on Machine Learning
[October, 2020] Aligning AI with shared human values, Dan Hendrycks et al., arXiv preprint arXiv:2008.02275
[October, 2020] Recipes for safety in open-domain chatbots, Jing Xu et al., arXiv preprint arXiv:2010.07079
[September, 2020] Measuring massive multitask language understanding, Dan Hendrycks et al., arXiv preprint arXiv:2009.03300
[December, 2018] Ethical challenges in data-driven dialogue systems, Peter Henderson et al., Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society