- The Chinese University of Hong Kong
- Hong Kong SAR
- https://gregxmhu.github.io/
Starred repositories
A fast + lightweight implementation of the GCG algorithm in PyTorch
Repo for NeurIPS 2024 paper "Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes"
Memory Mosaics are networks of associative memories working in concert to achieve a prediction task.
Guide: Finetune GPT2-XL (1.5 billion parameters) and GPT-NEO (2.7B) on a single GPU with Hugging Face Transformers using DeepSpeed
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains a medical LLM, implementing continued pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.
An index of algorithms for reinforcement learning from human feedback (RLHF)
Related works and background techniques for OpenAI o1
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
A curated list of reinforcement learning with human feedback resources (continually updated)
List of papers on hallucination detection in LLMs.
Set of tools to assess and improve LLM security.
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
Code for our NeurIPS2023 accepted paper: RADAR: Robust AI-Text Detection via Adversarial Learning. We tested RADAR on 8 LLMs including Vicuna and LLaMA. The results show that RADAR can attain good …
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
TAP: An automated jailbreaking method for black-box LLMs
Code for visualizing the loss landscape of neural nets
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
[ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".
[ICML 2021] Break-It-Fix-It: Unsupervised Learning for Program Repair
A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
[CCS'24] A dataset of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines. Created by Prithiviraj Damodaran. Open to pull reque…
Curation of prompts that are known to be adversarial to large language models
Prompt attack and defense, prompt injection, reverse-engineering notes and examples