Stars
A lightweight library for large language model (LLM) jailbreaking defense.
Universal and Transferable Attacks on Aligned Language Models
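Hand-written interview problems (not LeetCode) for LLMs (the main focus) and for search, advertising, and recommendation AI algorithms, such as Self-Attention and AUC. These generally test a candidate's overall ability more than LeetCode does, and sit closer to real business needs and fundamentals.
LLM algorithm engineer interview questions (with answers): common questions and concept explanations. "LLM interview questions", "algorithm role interviews", "common interview questions", "LLM algorithm interviews", "LLM application basics"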
A series of technical reports on Slow Thinking with LLM
Source code for COLING'25 paper "Monte Carlo Tree Search Based Prompt Autogeneration for Jailbreak Attacks against LLMs".
Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
An easy-to-use Python framework to generate adversarial jailbreak prompts.
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
MCTS-Enhanced AI: A Monte Carlo Tree Search algorithm for iterative response refinement using large language models.
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
Chinese safety prompts for evaluating and improving the safety of LLMs.
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
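"Stones from other hills may serve to polish jade": Fudan's Baize Intelligence team (复旦白泽智能) releases JADE-DB, a demo dataset targeting open-source LLMs from China and commercial LLMs from abroad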
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
An open source implementation of CLIP.