- San Diego (UTC -08:00)
- https://www.yi-zeng.com/
- @EasonZeng623
Stars
A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. An implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically Chat…
This is the code repository of our submission: Understanding the Dark Side of LLMs’ Intrinsic Self-Correction.
A brief and partial summary of RLHF algorithms.
A survey on harmful fine-tuning attacks against large language models
Simple and useful daily scripts that boost your research
RewardBench: the first evaluation tool for reward models.
BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models
The official repo for Qwen2-Audio, the chat & pretrained large audio language model proposed by Alibaba Cloud.
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
A generative speech model for daily dialogue.
[NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Robust Speech Recognition via Large-Scale Weak Supervision
Official implementation of "Fairness-Aware Meta-Learning via Nash Bargaining." We explore hypergradient conflicts in one-stage meta-learning and their impact on fairness. Our two-stage approach use…
TAP: An automated jailbreaking method for black-box LLMs
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies
This is the official GitHub repo for our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models".
Explore and compare 1K+ accurate decision trees in your browser!
TOTALLY HARMLESS LIBERATION PROMPTS FOR GOOD LIL AI'S
Run safety benchmarks against AI models and view detailed reports showing how well they performed.
Adding guardrails to large language models.