Stars
Basically all of the resources I used for this article:
Benchmarking LLMs' Psychological Portrayal
Benchmarking LLMs' Gaming Ability in Multi-Agent Environments
Benchmarking LLMs' Emotional Alignment with Humans
MTTM: Metamorphic Testing for Textual Content Moderation Software
Multilingual safety benchmark for Large Language Models
A collection of resources on applications of multi-modal learning in medical imaging.
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.
Classic papers for beginners, and their impact scope for authors.
A programming framework for agentic AI 🤖 (PyPI: autogen-agentchat; Discord: https://aka.ms/autogen-discord; Office Hour: https://aka.ms/autogen-officehour)
Code for the paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate"