Most existing defenses against data poisoning assume access to a set of clean data (referred to as the base set). While this assumption has long been taken for granted, given the fast-growing research on stealthy data poisoning techniques, we find that defenders cannot identify a clean base set within a contaminated dataset using existing methods, including manual inspection.
The above figure shows the human inspection results for data poisoning attacks. Labels and images marked in red indicate attributes that may be manipulated under that attack category, while green indicates the attribute remains intact. For each of the three attack types, we report the rate of misclassifying clean samples as poisoned (FPR) and poisoned samples as clean (FNR). The results reveal that humans cannot identify poisoned samples with high precision. In particular, manual inspection's performance in identifying Feature-Only attacks (e.g., clean-label backdoor attacks) is only marginally better than random selection.
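For concreteness, the two error rates above can be computed as in the following minimal sketch; the function and variable names are illustrative and not part of the repository:

```python
import numpy as np

def inspection_error_rates(is_poisoned, flagged_as_poisoned):
    """Compute FPR/FNR of an inspector, treating 'poisoned' as the positive class.

    is_poisoned:         boolean array, ground-truth poison labels
    flagged_as_poisoned: boolean array, the inspector's decisions
    """
    is_poisoned = np.asarray(is_poisoned, dtype=bool)
    flagged = np.asarray(flagged_as_poisoned, dtype=bool)
    # FPR: fraction of clean samples wrongly flagged as poisoned
    fpr = float(np.mean(flagged[~is_poisoned])) if (~is_poisoned).any() else 0.0
    # FNR: fraction of poisoned samples missed by the inspector
    fnr = float(np.mean(~flagged[is_poisoned])) if is_poisoned.any() else 0.0
    return fpr, fnr
```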
Given the above-identified challenge of obtaining a clean base set with high precision, we take a step further and propose META-SIFT to resolve it. Our evaluation shows that META-SIFT can robustly sift out a clean base set (of size 1000 or more) with 100% precision and zero variance under a wide range of poisoning attacks. The selected base set is large enough to enable successful defenses when plugged into existing AI-security defense techniques (e.g., robust training for mitigating label-noise attacks, trojan-net detection, backdoor removal, or backdoor sample detection).
- Quickly sift out clean subsets (about 80 seconds on CIFAR-10 with 5 GPUs)
- No need for pre-training any model
- Effective against most existing poisoning attack settings (evaluated against 16 existing label-flipping, backdoor, and data poisoning attacks)
- Applicable to most existing datasets (evaluated on CIFAR-10, GTSRB, PubFig, ImageNet)
- Can be adopted as an off-the-shelf tool to enable existing defense algorithms in settings with no clean base set access
- Python >= 3.6
- PyTorch >= 1.10.1
- Torchvision >= 0.11.2
- Imageio >= 2.9.0
Use the trojan_backdoor_detect_gtsrb.ipynb notebook for a quick start with the Meta-Sift method (demonstrated on the GTSRB dataset). The default setting runs on the GTSRB dataset with BadNets as the attack method.
There are several optional arguments in the `args`:

- `corruption_type`: The poisoning method.
- `corruption_ratio`: The poison rate of the poisoning method.
- `tar_lab`: The target label of the attack (if the attack does not target all labels).
- `repeat_rounds`: The number of Sifters to use when selecting clean subsets; default 5.
- `warmup_round`: The number of epochs for warm-up before training the Sifters; default 1.
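As a concrete illustration, these arguments might be configured along the following lines. The argument names follow the list above, but the specific values (other than `repeat_rounds=5` and `warmup_round=1`) are illustrative assumptions rather than the repository's exact defaults:

```python
import argparse

# Illustrative only: the concrete default values for corruption_type,
# corruption_ratio, and tar_lab below are assumptions.
parser = argparse.ArgumentParser(description="Meta-Sift quick-start arguments")
parser.add_argument("--corruption_type", type=str, default="badnets",
                    help="The poisoning method to simulate (assumed value)")
parser.add_argument("--corruption_ratio", type=float, default=0.1,
                    help="Fraction of the dataset that is poisoned (assumed value)")
parser.add_argument("--tar_lab", type=int, default=0,
                    help="Target label of the attack (assumed value)")
parser.add_argument("--repeat_rounds", type=int, default=5,
                    help="Number of Sifters trained and aggregated")
parser.add_argument("--warmup_round", type=int, default=1,
                    help="Warm-up epochs before training the Sifters")
args = parser.parse_args([])  # empty list so the defaults are used, e.g. inside a notebook
```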
The whole process of Meta-Sift consists of two stages: the Training Stage and the Identification Stage. Multiple (m) Sifters are used in the Identification Stage to reduce the randomness introduced by SGD and the randomized sample dilution; accordingly, the Training Stage is repeated m times with different random seeds to obtain m Sifters. Each Sifter contains two structures working as a pair: a model θ and an MW-Net ψ. One iteration of the Training Stage consists of four steps: a Virtual-update of θ; Gradient Sampling using the meta-gradient-sampler Γ; a Meta-update of ψ; and then an Actual-update of θ. The Training Stage terminates after only one iteration. The trained Sifters are then used in the Identification Stage to assign weights to the diluted data from the dataset. Finally, Meta-Sift aggregates the results from the multiple Sifters, and clean samples are sifted out by inspecting the high-value end of the aggregated weights.
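To make the four steps more concrete, below is a minimal, self-contained PyTorch sketch of one Sifter's training iteration and of the final aggregation across Sifters. It uses a toy linear classifier in place of θ, a small MLP as the MW-Net ψ, and approximates the meta-gradient-sampler Γ with a simple top-k magnitude mask; these simplifications and all function/variable names are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MWNet(nn.Module):
    """MW-Net psi: maps a per-sample loss value to a weight in (0, 1)."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, per_sample_loss):
        return torch.sigmoid(self.net(per_sample_loss.unsqueeze(1))).squeeze(1)


def classifier(params, x):
    """Toy linear model standing in for theta; params is a dict of tensors with requires_grad=True."""
    return x @ params["w"] + params["b"]


def mask_topk(grad, keep_frac):
    """Stand-in for the meta-gradient-sampler Gamma: keep only the largest-magnitude coordinates."""
    flat = grad.flatten()
    k = max(1, int(keep_frac * flat.numel()))
    idx = flat.abs().topk(k).indices
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(grad)


def sifter_training_iteration(params, psi, opt_psi, train_x, train_y, diluted_x, diluted_y,
                              inner_lr=0.1, keep_frac=0.5):
    """One Training Stage iteration: Virtual-update, Gradient Sampling, Meta-update, Actual-update."""
    # Step 1: Virtual-update of theta -- one differentiable SGD step on the
    # weighted loss, where per-sample weights come from MW-Net psi.
    per_sample = F.cross_entropy(classifier(params, train_x), train_y, reduction="none")
    w = psi(per_sample.detach())  # weights carry the graph back to psi's parameters
    weighted_loss = (w * per_sample).mean()
    grads = torch.autograd.grad(weighted_loss, list(params.values()), create_graph=True)
    virtual = {k: p - inner_lr * g for (k, p), g in zip(params.items(), grads)}

    # Step 2: Gradient Sampling -- compute psi's meta-gradient on the diluted
    # data through the virtual model, then sample it with Gamma (here a top-k mask).
    meta_loss = F.cross_entropy(classifier(virtual, diluted_x), diluted_y)
    psi_grads = torch.autograd.grad(meta_loss, list(psi.parameters()))
    sampled_grads = [mask_topk(g, keep_frac) for g in psi_grads]

    # Step 3: Meta-update of psi with the sampled meta-gradient.
    opt_psi.zero_grad()
    for p, g in zip(psi.parameters(), sampled_grads):
        p.grad = g
    opt_psi.step()

    # Step 4: Actual-update of theta, re-weighting the training loss with the
    # freshly updated psi and taking a plain SGD step on the real parameters.
    per_sample = F.cross_entropy(classifier(params, train_x), train_y, reduction="none")
    with torch.no_grad():
        w = psi(per_sample.detach())
    actual_loss = (w * per_sample).mean()
    actual_grads = torch.autograd.grad(actual_loss, list(params.values()))
    with torch.no_grad():
        for p, g in zip(params.values(), actual_grads):
            p -= inner_lr * g


def identification_stage(sifters, data_x, data_y, num_select=1000):
    """Average MW-Net weights across the m trained Sifters; the highest-weight
    samples form the sifted clean base set."""
    scores = torch.zeros(len(data_x))
    with torch.no_grad():
        for params, psi in sifters:  # each Sifter is a (params, psi) pair
            per_sample = F.cross_entropy(classifier(params, data_x), data_y, reduction="none")
            scores += psi(per_sample)
    scores /= len(sifters)
    return scores.topk(min(num_select, len(data_x))).indices
```

In the actual repository, θ and ψ would be the full model and MW-Net trained as described above, and the returned high-weight indices would serve as the clean base set handed to downstream defenses.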