A curated list of awesome AI security-related frameworks, attacks, tools, and papers. Inspired by awesome-machine-learning.
If you want to contribute, create a PR or contact me @ottosulin.
- NIST AI Risk Management Framework
- ISO/IEC 42001:2023 Artificial Intelligence Management System
- ISO/IEC 23894:2023 Information technology — Artificial intelligence — Guidance on risk management
- Google Secure AI Framework
- NIST AI 100-2e2023
- AVIDML
- MITRE ATLAS
- ISO/IEC 22989:2022 Information technology — Artificial intelligence — Artificial intelligence concepts and terminology
- Malware Env for OpenAI Gym - makes it possible to write agents that learn to manipulate PE files (e.g., malware) to achieve some objective (e.g., bypass AV) based on a reward provided by taking specific manipulation actions
- Deep-pwning - a lightweight framework for experimenting with machine learning models with the goal of evaluating their robustness against a motivated adversary
- Counterfit - generic automation layer for assessing the security of machine learning systems
- DeepFool - A simple and accurate method to fool deep neural networks
- garak - security probing tool for LLMs
- Snaike-MLFlow - MLflow red team toolsuite
- HackGPT - A tool using ChatGPT for hacking
- Charcuterie - code execution techniques for ML or ML adjacent libraries
- OffsecML Playbook - A collection of offensive and adversarial TTPs with proofs of concept
- Exploring the Space of Adversarial Images
- [Adversarial Machine Learning Library (Ad-lib)](https://github.com/vu-aml/adlib) - Game-theoretic adversarial machine learning library providing a set of learner and adversary modules
- EasyEdit - Modify an LLM's ground truths
- BadDiffusion - Official repo to reproduce the paper "How to Backdoor Diffusion Models?" published at CVPR 2023
- PrivacyRaven - privacy testing library for deep learning systems
- Guardrail.ai - Guardrails is a Python package that lets a user add structure, type and quality guarantees to the outputs of large language models (LLMs)
- ProtectAI's model scanner - Security scanner detecting serialized ML Models performing suspicious actions
- rebuff - Prompt Injection Detector
- langkit - LangKit is an open-source text metrics toolkit for monitoring language models. The toolkit includes various security-related metrics that can be used to detect attacks
- StringSifter - A machine learning tool that ranks strings based on their relevance for malware analysis
- Python Differential Privacy Library
- Diffprivlib - The IBM Differential Privacy Library
- PLOT4ai - Privacy Library Of Threats 4 Artificial Intelligence, a threat modeling library to help you build responsible AI
- TenSEAL - A library for doing homomorphic encryption operations on tensors
- SyMPC - A Secure Multiparty Computation companion library for Syft
- PyVertical - Privacy Preserving Vertical Federated Learning
- OWASP ML TOP 10
- OWASP LLM TOP 10
- OWASP AI Security and Privacy Guide
- OWASP WrongSecrets LLM exercise
- NIST AIRC - NIST Trustworthy & Responsible AI Resource Center
- ENISA Multilayer Framework for Good Cybersecurity Practices for AI
- The MLSecOps Top 10 by Institute for Ethical AI & Machine Learning
- High Dimensional Spaces, Deep Learning and Adversarial Examples
- Adversarial Task Allocation
- Robust Physical-World Attacks on Deep Learning Models
- The Space of Transferable Adversarial Examples
- RHMD: Evasion-Resilient Hardware Malware Detectors
- Generic Black-Box End-to-End Attack against RNNs and Other API Calls Based Malware Classifiers
- Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks
- Can you fool AI with adversarial examples on a visual Turing test?
- Explaining and Harnessing Adversarial Examples
- Delving into adversarial attacks on deep policies
- Crafting Adversarial Input Sequences for Recurrent Neural Networks
- Practical Black-Box Attacks against Machine Learning
- Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN
- Data Driven Exploratory Attacks on Black Box Classifiers in Adversarial Domains
- Fast Feature Fool: A data independent approach to universal adversarial perturbations
- Simple Black-Box Adversarial Perturbations for Deep Networks
- Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning
- One pixel attack for fooling deep neural networks
- FedMLSecurity: A Benchmark for Attacks and Defenses in Federated Learning and LLMs
- Jailbroken: How Does LLM Safety Training Fail?
- Bad Characters: Imperceptible NLP Attacks
- Universal and Transferable Adversarial Attacks on Aligned Language Models
- Exploring the Vulnerability of Natural Language Processing Models via Universal Adversarial Texts
- Adversarial Examples Are Not Bugs, They Are Features
- Adversarial Attacks on Tables with Entity Swap
- Stealing Machine Learning Models via Prediction APIs
- On the Risks of Stealing the Decoding Algorithms of Language Models
- Adversarial Demonstration Attacks on Large Language Models
- Looking at the Bag is not Enough to Find the Bomb: An Evasion of Structural Methods for Malicious PDF Files Detection
- Adversarial Generative Nets: Neural Network Attacks on State-of-the-Art Face Recognition
- Query Strategies for Evading Convex-Inducing Classifiers
- Adversarial Prompting for Black Box Foundation Models
- Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers
- GPTs Don’t Keep Secrets: Searching for Backdoor Watermark Triggers in Autoregressive Language Models
- Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
- BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT
- Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization
- Efficient Label Contamination Attacks Against Black-Box Learning Models
- Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning
- UOR: Universal Backdoor Attacks on Pre-trained Language Models
- Analyzing And Editing Inner Mechanisms of Backdoored Language Models
- How to Backdoor Diffusion Models?
- On the Exploitability of Instruction Tuning
- Defending against Insertion-based Textual Backdoor Attacks via Attribution
- A Gradient Control Method for Backdoor Attacks on Parameter-Efficient Tuning
- BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements
- Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models
- BadPrompt: Backdoor Attacks on Continuous Prompts
- Extracting training data from diffusion models
- Prompt Stealing Attacks Against Text-to-Image Generation Models
- Are Diffusion Models Vulnerable to Membership Inference Attacks?
- Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures
- Multi-step Jailbreaking Privacy Attacks on ChatGPT
- Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models
- ProPILE: Probing Privacy Leakage in Large Language Models
- DeepPayload: Black-box Backdoor Attack on Deep Learning Models through Neural Payload Injection
- Black Box Adversarial Prompting for Foundation Models
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
- Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models
- Jailbreaker: Automated Jailbreak Across Multiple Large Language Model Chatbots
- (Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs
- Summoning Demons: The Pursuit of Exploitable Bugs in Machine Learning
- capAI - A Procedure for Conducting Conformity Assessment of AI Systems in Line with the EU Artificial Intelligence Act
- A Study on Robustness and Reliability of Large Language Model Code Generation