This repository contains the code implementation for the paper:
Delta-Influence: Unlearning Poisons via Influence Functions
Follow the step-by-step guide below to replicate our experiments on "cifar10+badnet", which includes all the code for the attack, detection, unlearning, and evaluation stages.
Notebooks for other "{dataset}+{attack}" combinations will be added in the future (the combinations currently provided are "cifar10+badnet", "cifar100+frequency attack", and "imagenette+witches' brew"). The pipeline is essentially the same across combinations, so you can readily try different datasets, attack methods, and unlearning algorithms.
conda create -n delta-influence-env python=3.12
conda activate delta-influence-env
pip install -e .
Credits: We use Kronfluence to compute the influence matrix and Corrective-Unlearning-Bench for unlearning, so please make sure both are installed before moving on.
"poison_dataset.ipynb" shows how to inject BadNet poisons into the cifar10 dataset and provides training scripts for obtaining the victim model.
"delta_influence.ipynb" implements the delta-influence algorithm, which returns the training examples most responsible for the poisoning behavior.
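For readers unfamiliar with BadNet, the core poisoning step can be sketched as follows: stamp a fixed trigger patch onto a random subset of training images and relabel those images to an attacker-chosen target class. This is a minimal, hypothetical sketch assuming uint8 images of shape (N, H, W, C) and a white square trigger in the bottom-right corner; the function name `badnet_poison` and its parameters (`poison_ratio`, `patch_size`) are illustrative and do not necessarily match what "poison_dataset.ipynb" uses.

```python
import numpy as np


def badnet_poison(images, labels, target_label, poison_ratio=0.1, patch_size=3, seed=0):
    """Hypothetical BadNet-style poisoning sketch.

    images: uint8 array of shape (N, H, W, C); labels: int array of shape (N,).
    Returns poisoned copies of images and labels plus the sorted poisoned indices.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()  # never modify the caller's arrays in place
    labels = labels.copy()
    n_poison = int(round(poison_ratio * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Assumed trigger: a white (255) square in the bottom-right corner.
    images[idx, -patch_size:, -patch_size:, :] = 255
    # Flip the poisoned examples to the attacker's target class.
    labels[idx] = target_label
    return images, labels, np.sort(idx)


# Usage on dummy data: 100 black 32x32 RGB images, 10% poisoned toward class 0.
x = np.zeros((100, 32, 32, 3), dtype=np.uint8)
y = np.arange(100) % 10
px, py, idx = badnet_poison(x, y, target_label=0, poison_ratio=0.1)
print(len(idx))  # -> 10
```

A model trained on `(px, py)` learns to associate the patch with the target class, which is the behavior the detection methods below try to trace back to the poisoned examples.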
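To illustrate what "most responsible examples" means once an influence matrix is available, the sketch below ranks training examples by their summed influence over a set of poisoned test queries and returns the top k. This is a plain ranking heuristic offered as a hedged illustration, not the delta-influence algorithm itself. It assumes an influence array of shape (n_query, n_train) where larger scores mean more responsibility; that layout and the name `top_k_responsible` are assumptions, not the repository's or Kronfluence's API.

```python
import numpy as np


def top_k_responsible(influence, k):
    """Hypothetical ranking of training examples by total influence.

    influence: array of shape (n_query, n_train); influence[q, t] is the assumed
    influence of training example t on query q (larger = more responsible).
    Returns indices of the k training examples with the largest summed score,
    ordered from most to least influential.
    """
    total = influence.sum(axis=0)               # aggregate over queries -> (n_train,)
    order = np.argsort(-total, kind="stable")   # descending; ties keep index order
    return order[:k]


# Usage: two poisoned queries, four training examples.
scores = np.array([[0.1, 5.0, -1.0, 2.0],
                   [0.2, 4.0, -2.0, 3.0]])
print(top_k_responsible(scores, k=2))  # -> [1 3]
```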
We also provide implementations of other popular detection methods, as well as the threshold baseline mentioned in the paper:
- Activation Clustering
- Spectral Signature
- Frequency Analysis (built based on https://github.com/YiZeng623/frequency-backdoor)
- Influence Threshold
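As an example of how one of these baselines works, the standard Spectral Signatures method (Tran et al., 2018) scores each training example of a given class by the squared projection of its centered feature representation onto the top singular vector of that class's features, then flags the highest-scoring 1.5·eps fraction, where eps is an assumed upper bound on the poison rate. The sketch below is a generic, hypothetical version of that technique operating on a precomputed feature matrix; it is not necessarily how this repository's implementation extracts features or sets the removal budget.

```python
import numpy as np


def spectral_signature_scores(features):
    """Outlier scores for the examples of ONE class.

    features: array (n, d) of (e.g., penultimate-layer) representations.
    Returns the squared projection of each centered row onto the top
    right-singular vector of the centered feature matrix.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[0]  # direction of largest variance; its sign does not matter
    return (centered @ top) ** 2


def flag_suspicious(features, eps, multiplier=1.5):
    """Return indices of the multiplier * eps fraction with the highest scores."""
    scores = spectral_signature_scores(features)
    n_remove = int(multiplier * eps * len(features))
    return np.argsort(-scores)[:n_remove]
```

In practice this is run separately on each class (typically the suspected target class), since poisons only form a separable sub-population relative to clean examples sharing their label.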
For the ablation studies, see the notebooks "modify_images.ipynb" and "modify_labels.ipynb".
For each combination of "{dataset}+{attack}+{detection}", we compare the unlearning effectiveness of 5 different corrective unlearning methods: