
Starred repositories

Jailbreak artifacts for JailbreakBench

54 8 Updated Nov 6, 2024

A fast + lightweight implementation of the GCG algorithm in PyTorch

Python 184 40 Updated Feb 9, 2025
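
For context on the entry above: the core of GCG is to backpropagate the adversarial loss into one-hot token vectors and use those gradients to rank candidate single-token substitutions in the suffix. The sketch below illustrates that ranking step in plain PyTorch; `model`, the slice arguments, and the function name are illustrative assumptions, not the repository's API.

```python
# Illustrative sketch of one GCG ranking step; not the repository's API.
# Assumes a Hugging Face causal LM `model`, a 1-D tensor `input_ids` containing
# prompt + adversarial suffix + target, and Python slices marking the suffix
# and target positions.
import torch.nn.functional as F

def gcg_candidate_scores(model, input_ids, suffix_slice, target_slice):
    """Return a (suffix_len, vocab_size) gradient matrix; more negative entries
    correspond to more promising single-token substitutions."""
    embed_weights = model.get_input_embeddings().weight              # (V, d)
    one_hot = F.one_hot(input_ids, embed_weights.size(0)).to(embed_weights.dtype)
    one_hot.requires_grad_(True)
    inputs_embeds = one_hot @ embed_weights                          # differentiable embedding lookup
    logits = model(inputs_embeds=inputs_embeds.unsqueeze(0)).logits[0]

    # Cross-entropy of the target tokens given the preceding context
    # (logits at position t predict token t + 1, hence the shift by one).
    shift_logits = logits[target_slice.start - 1 : target_slice.stop - 1]
    loss = F.cross_entropy(shift_logits, input_ids[target_slice])
    loss.backward()

    return one_hot.grad[suffix_slice]                                # (suffix_len, V)
```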

Repo for NeurIPS 2024 paper "Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes"

Python 2 Updated Nov 14, 2024

Memory Mosaics are networks of associative memories working in concert to achieve a prediction task.

Python 39 3 Updated Jan 30, 2025

Guide: Finetune GPT-2 XL (1.5 billion parameters) and GPT-Neo (2.7B) on a single GPU with Hugging Face Transformers using DeepSpeed

Python 437 74 Updated Jun 14, 2023
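
The general recipe that guide describes (Hugging Face `Trainer` plus a DeepSpeed ZeRO config) looks roughly like the sketch below; the dataset, hyperparameters, and the contents of `ds_config.json` are placeholders, not the guide's exact settings.

```python
# Minimal sketch: fine-tuning a causal LM with Hugging Face Transformers + DeepSpeed.
# "ds_config.json" is a placeholder ZeRO config (e.g. stage 2/3 with CPU offload);
# launch with the deepspeed launcher so the config is picked up.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "gpt2-xl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, max_length=512,
                    padding="max_length")
    out["labels"] = out["input_ids"].copy()   # causal LM: labels mirror the inputs
    return out

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="gpt2-xl-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    fp16=True,
    deepspeed="ds_config.json",   # ZeRO offload is what makes single-GPU training feasible
)

Trainer(model=model, args=args, train_dataset=dataset).train()
```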

MedicalGPT: Training Your Own Medical GPT Model with a ChatGPT-style Training Pipeline. Trains a medical large language model, covering incremental pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.

Python 3,628 533 Updated Feb 27, 2025
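
The pipeline above ends in preference-optimization stages such as DPO. As a point of reference (not MedicalGPT's actual training code), the DPO objective reduces to a pairwise loss on sequence log-probabilities under the policy and a frozen reference model; a minimal sketch:

```python
# Generic sketch of the DPO loss (Rafailov et al., 2023); not MedicalGPT's code.
# Inputs are summed token log-probs of the chosen/rejected responses under the
# policy and under a frozen reference model; beta controls KL strength.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```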

An index of algorithms for reinforcement learning from human feedback (RLHF)

92 2 Updated Apr 17, 2024

Related works and background techniques for OpenAI o1

214 9 Updated Jan 7, 2025

[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning

Python 89 4 Updated May 23, 2024

Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding

Jupyter Notebook 119 10 Updated Jul 19, 2024

A curated list of reinforcement learning with human feedback resources (continually updated)

3,756 232 Updated Feb 19, 2025

List of papers on hallucination detection in LLMs.

782 61 Updated Feb 22, 2025

Set of tools to assess and improve LLM security.

Python 2,923 486 Updated Feb 14, 2025

A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).

1,203 76 Updated Feb 24, 2025

Code for our NeurIPS2023 accepted paper: RADAR: Robust AI-Text Detection via Adversarial Learning. We tested RADAR on 8 LLMs including Vicuna and LLaMA. The results show that RADAR can attain good …

Jupyter Notebook 47 3 Updated Mar 19, 2024

This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.

Python 542 44 Updated Mar 10, 2024

TAP: An automated jailbreaking method for black-box LLMs

Python 146 23 Updated Dec 10, 2024

Code for visualizing the loss landscape of neural nets

Python 2,923 409 Updated Apr 5, 2022
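
The basic recipe behind such plots is to perturb the trained weights along two random directions and evaluate the loss on a grid. A generic sketch follows (it omits the filter normalization used in the paper; `model`, `criterion`, and `data_loader` are assumed to exist):

```python
# Generic 2-D loss-landscape scan: perturb trained weights along two random
# directions and record the loss on a grid. Illustrative only; the paper's
# code additionally applies filter-wise normalization to the directions.
import torch

@torch.no_grad()
def loss_surface(model, criterion, data_loader, steps=11, radius=1.0, device="cpu"):
    base = [p.detach().clone() for p in model.parameters()]
    d1 = [torch.randn_like(p) for p in base]
    d2 = [torch.randn_like(p) for p in base]
    alphas = torch.linspace(-radius, radius, steps)
    surface = torch.zeros(steps, steps)

    for i, a in enumerate(alphas):
        for j, b in enumerate(alphas):
            for p, w, u, v in zip(model.parameters(), base, d1, d2):
                p.copy_(w + a * u + b * v)        # move to the grid point
            total, n = 0.0, 0
            for x, y in data_loader:
                x, y = x.to(device), y.to(device)
                total += criterion(model(x), y).item() * x.size(0)
                n += x.size(0)
            surface[i, j] = total / n

    for p, w in zip(model.parameters(), base):     # restore the trained weights
        p.copy_(w)
    return surface
```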

🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.

14,277 1,428 Updated Feb 13, 2023

[ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".

Python 296 46 Updated Jan 22, 2025

[ICML 2021] Break-It-Fix-It: Unsupervised Learning for Program Repair

Python 113 26 Updated Apr 20, 2023

A collection of open-source datasets for training instruction-following LLMs (ChatGPT, LLaMA, Alpaca)

1,109 58 Updated Jan 4, 2024

[CCS'24] A dataset of 15,140 ChatGPT prompts collected from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).

Jupyter Notebook 2,973 274 Updated Dec 24, 2024

Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

1,688 134 Updated Sep 19, 2023
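
If I recall correctly, this preference data is mirrored on the Hugging Face Hub as `Anthropic/hh-rlhf` with `chosen`/`rejected` text fields; a minimal loading sketch under that assumption:

```python
# Minimal sketch: loading the Anthropic HH-RLHF preference pairs.
# Assumes the Hub mirror "Anthropic/hh-rlhf" exposing "chosen"/"rejected" fields;
# adjust the dataset name or split if the mirror differs.
from datasets import load_dataset

hh = load_dataset("Anthropic/hh-rlhf", split="train")
example = hh[0]
print(example["chosen"][:200])    # preferred conversation (truncated)
print(example["rejected"][:200])  # dispreferred conversation (truncated)
```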

A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines. Created by Prithiviraj Damodaran. Open to pull reque…

Python 887 149 Updated Jan 7, 2024

Curation of prompts that are known to be adversarial to large language models

179 10 Updated Feb 12, 2023

Prompt attack-defense, prompt injection, and reverse-engineering notes and examples | prompt adversarial and jailbreak examples and notes

159 20 Updated Feb 25, 2025