Stars
MTTM: Metamorphic Testing for Textual Content Moderation Software
basically all the things I used for this article
Code and data for our paper "On the Resilience of Multi-Agent Systems with Malicious Agents"
A novel approach to improve the safety of large language models, enabling them to transition effectively from an unsafe to a safe state.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Benchmarking LLMs' Gaming Ability in Multi-Agent Environments
TransferAttack is a PyTorch framework to boost adversarial transferability for image classification.
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
SpyGame: An interactive multi-agent framework to evaluate intelligence with large language models :D
Multilingual safety benchmark for Large Language Models
Benchmarking LLMs' Psychological Portrayal
✨✨ Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
👨‍💻 An awesome curated list of the best code LLMs for research.
A framework to evaluate the generalization capability of safety alignment for LLMs
Benchmarking LLMs' Emotional Alignment with Humans
Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs.
[TACL 2024] MAPS enables LLMs🤖 to mimic the human😁 translation process.
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
MAD: The first work to explore Multi-Agent Debate with Large Language Models :D
The ParroT framework enhances and regulates translation abilities during chat, based on open-source LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) and human-written translation and evaluation data.
A preliminary evaluation of ChatGPT/GPT-4 for machine translation.
This is the tool released with the ICSE 2022 paper "Static Inference Meets Deep Learning: A Hybrid Type Inference Approach for Python"
Must-read papers on graph neural networks (GNN)
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)