Stars

LLM-attack

27 repositories
Python · 33 stars · 10 forks · Updated Aug 10, 2024

Universal and Transferable Attacks on Aligned Language Models

Python · 3,690 stars · 502 forks · Updated Aug 2, 2024
Python · 91 stars · 11 forks · Updated Nov 13, 2023
Python · 5 stars · Updated May 27, 2024
Jupyter Notebook · 163 stars · 15 forks · Updated Nov 26, 2023

[ICLR 2024] Official implementation of the paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".

Python · 292 stars · 46 forks · Updated Jan 22, 2025

A curation of awesome tools, documents and projects about LLM Security.

1,069 stars · 117 forks · Updated Jan 17, 2025

An unofficial implementation of AutoDAN attack on LLMs (arXiv:2310.15140)

Python · 35 stars · 8 forks · Updated Feb 8, 2024

A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide researchers, practitioners, and enthusiasts with insights i…

1,177 stars · 58 forks · Updated Feb 3, 2025

Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!

HTML · 283 stars · 19 forks · Updated Oct 10, 2024

Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

Python · 450 stars · 57 forks · Updated Sep 24, 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Jupyter Notebook · 542 stars · 75 forks · Updated Aug 16, 2024

[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"

Python · 134 stars · 13 forks · Updated Feb 20, 2024

We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.

Python · 276 stars · 31 forks · Updated Feb 23, 2024

Restore safety in fine-tuned language models through task arithmetic

Python · 26 stars · 2 forks · Updated Mar 28, 2024

Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"

Python · 63 stars · 6 forks · Updated Feb 27, 2024

[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning

Python · 89 stars · 4 forks · Updated May 23, 2024

Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"

Python · 84 stars · 8 forks · Updated Sep 5, 2024
Python · 44 stars · 6 forks · Updated May 9, 2024

[ACL 2024] Official repo of the paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`

Python · 59 stars · 14 forks · Updated Dec 9, 2024

Official implementation of the paper "DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers"

JavaScript · 46 stars · 9 forks · Updated Aug 25, 2024

A fast + lightweight implementation of the GCG algorithm in PyTorch

Python · 178 stars · 40 forks · Updated Feb 9, 2025

JAILJUDGE: A comprehensive evaluation benchmark which includes a wide range of risk scenarios with complex malicious prompts (e.g., synthetic, adversarial, in-the-wild, and multi-language scenarios…

Python · 34 stars · Updated Dec 13, 2024

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)

Python · 117 stars · 9 forks · Updated Nov 30, 2024

ChatGPT DAN and other jailbreak prompts

6,804 stars · 621 forks · Updated Aug 17, 2024