Awesome Large Language Model Unlearning


This repository tracks the latest research on machine unlearning in large language models (LLMs). The goal is to offer a comprehensive list of papers, datasets, and resources relevant to the topic.

Note

If you believe your paper on LLM unlearning is not included, or if you find a mistake, typo, or information that is not up to date, please open an issue, and I will address it as soon as possible.

If you want to add a new paper, feel free to either open an issue or create a pull request.

Table of Contents

  • Papers
    • Methods
    • Surveys and Position Papers
    • Blog Posts
  • Datasets

Datasets

  • TOFU
    • Description: A synthetic QA dataset about fictitious authors, generated by GPT-4. The dataset comes with three retain/forget splits: 99/1, 95/5, and 90/10 (in percent). It also includes questions about real authors and world facts to evaluate the loss of general knowledge after unlearning (see the loading sketch after this list).
    • Links: arXiv, Hugging Face
  • WMDP
    • Description: A benchmark for assessing hazardous knowledge in biology, chemistry, and cybersecurity, containing about 4,000 multiple-choice questions in a style similar to MMLU. It also provides text corpora for the three domains.
    • Links: arXiv, Hugging Face
  • MMLU Subsets
    • Description: A task proposed alongside the WMDP dataset. The goal is to unlearn three categories of the MMLU dataset while retaining closely related ones: economics (retaining econometrics, among others), physics (retaining math, among others), and law (retaining jurisprudence, among others). The task requires high-precision unlearning, because the retain sets are closely related to the forget categories.
    • Links: arXiv, Hugging Face
  • arXiv, GitHub, and copyrighted books corpus
    • Description: A dataset for evaluating approximate unlearning algorithms on pre-trained LLMs. It contains forget and retain splits for each category, together with both in-distribution and general retain sets. The dataset is designed for unlearning directly on pre-trained models, since its documents are randomly sampled from the pre-training corpus of Yi.
    • Links: arXiv, Hugging Face
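
All of the datasets above are distributed through the Hugging Face Hub, so the usual entry point is the `datasets` library. The snippet below is a minimal loading sketch, assuming the Hub IDs `locuslab/TOFU` and `cais/wmdp` and the config names shown; the exact identifiers may differ, so check each dataset card before use.

```python
from datasets import load_dataset

# Minimal loading sketch. The Hub IDs ("locuslab/TOFU", "cais/wmdp") and
# the config/split names below are assumptions; verify them against the
# dataset cards on Hugging Face before relying on them.

# TOFU: the 90/10 split pairs a "forget10" set with a "retain90" set.
forget = load_dataset("locuslab/TOFU", "forget10", split="train")
retain = load_dataset("locuslab/TOFU", "retain90", split="train")
print(forget[0])  # question/answer pairs about fictitious authors

# WMDP: one multiple-choice config per domain (bio, chem, cyber).
wmdp_bio = load_dataset("cais/wmdp", "wmdp-bio", split="test")
print(wmdp_bio[0])  # {"question": ..., "choices": [...], "answer": ...}
```

A typical unlearning experiment then fine-tunes on the forget split while monitoring performance on the retain split (and, for TOFU, on the real-author and world-fact questions) to measure how much general capability survives.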
