Implementation of a privacy evaluation framework for synthetic data publishing
The module attack_models so far includes:

- MIAttackClassifier: a privacy adversary that implements a membership inference attack (MIA) against a generative model and can be used to evaluate the risk of linkability. Given a single synthetic dataset output by a generative model, this adversary produces a binary label that predicts whether a target record belongs to the model's training set (see the sketch after this list).
- AttributeInferenceAttack: a privacy adversary that learns to predict the value of an unknown sensitive attribute from a set of known attributes, and uses this knowledge to guess a target record's sensitive value.
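As a rough illustration of how such a generative-model MIA works (a minimal sketch under assumed names, not the module's actual API): shadow generative models are trained on data that either does or does not include the target record, a feature vector is extracted from each resulting synthetic dataset, and a binary classifier learns to distinguish the two cases.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def naive_features(synthetic_df):
    # Column means of a synthetic dataset serve as a crude feature vector;
    # a real attack would use richer summary statistics.
    return synthetic_df.mean(numeric_only=True).to_numpy()

def fit_membership_classifier(shadow_synthetic_sets, target_in_training):
    # Each shadow synthetic dataset was generated from training data that
    # either contained the target record (label 1) or did not (label 0).
    X = np.stack([naive_features(s) for s in shadow_synthetic_sets])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, np.asarray(target_in_training))
    return clf

# Given a published synthetic dataset, the adversary predicts membership:
# clf.predict(naive_features(published_df).reshape(1, -1))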
The module generative_models so far includes:

- IndependentHistogramModel: an independent histogram model adapted from Data Responsibly's DataSynthesiser
- BayesianNetModel: a generative model based on a Bayesian network, adapted from Data Responsibly's DataSynthesiser
- GaussianMixtureModel: a simple Gaussian mixture model taken from the sklearn library (illustrated in the sketch after this list)
- CTGAN: a conditional tabular generative adversarial network that integrates the CTGAN model from CTGAN
- PateGan: a model that builds on the Private Aggregation of Teacher Ensembles (PATE) framework to achieve differential privacy for GANs, adapted from PateGan
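Since GaussianMixtureModel wraps sklearn, the core fit-and-generate step can be illustrated directly with sklearn's API (a minimal sketch on toy numeric data; the wrapper's own interface may differ):

import numpy as np
from sklearn.mixture import GaussianMixture

# Toy numeric data standing in for a real tabular dataset.
rng = np.random.default_rng(0)
raw = np.concatenate([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

# Fit a mixture to the raw data, then sample a synthetic dataset from it.
gmm = GaussianMixture(n_components=2, random_state=0).fit(raw)
synthetic, _ = gmm.sample(n_samples=200)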
The framework and its building blocks have been developed and tested on Python 3.6 and 3.7. We recommend creating a virtual environment to install all dependencies and run the code:
python3 -m venv pyvenv3
source pyvenv3/bin/activate
pip install -r requirements.txt
The PyTorch package to install depends on the version of CUDA (if any) installed on your system. Please refer to the PyTorch website to install the correct package in your virtual environment.
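After installing PyTorch, a quick sanity check (a minimal snippet using standard torch calls) confirms whether it can see your GPU:

import torch

# Reports whether PyTorch detected a usable CUDA device.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))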
The CTGAN model depends on a fork of the original model training algorithm that can be found here. To install the correct version, clone the repository above and run
cd CTGAN
make install
To test your installation, try to run

import ctgan

from within the Python interpreter of your virtual environment.
To run the test suite included in tests, run
python -m unittest discover
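For reference, unittest discovery picks up modules named test_*.py and runs every TestCase method whose name starts with test. A minimal, hypothetical example of that convention (the class and assertion are illustrative, not taken from the actual suite):

import unittest

class TestExample(unittest.TestCase):
    # Discovery runs every method whose name starts with "test".
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

if __name__ == "__main__":
    unittest.main()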
To run an example evaluation of the expected privacy gain with respect to the risk of linkability for all five generative models, run
python mia_cli.py -D data/germancredit -RC runconfig_mia_example.json -O .
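Conceptually, the privacy gain reported by this evaluation compares the adversary's success on the raw data with its success on the synthetic data. A minimal sketch of that comparison (the function names and inputs are assumptions for illustration, not the CLI's internals):

import numpy as np

def attack_accuracy(guesses, true_membership):
    # Fraction of target records whose membership the adversary guessed right.
    return float(np.mean(np.asarray(guesses) == np.asarray(true_membership)))

def privacy_gain(acc_on_raw, acc_on_synthetic):
    # How much publishing synthetic data instead of raw data reduces the
    # adversary's success rate; larger means more privacy gained.
    return acc_on_raw - acc_on_synthetic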
To run an example evaluation of the expected privacy gain with respect to the risk of attribute inference for all five generative models, run
python mleai_cli.py -D data/germancredit -RC runconfig_attr_example.json -O .
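For intuition, an attribute inference adversary of this kind can be approximated with any off-the-shelf classifier: it learns the sensitive column from the known columns and then queries that mapping for a target record. A minimal sketch using sklearn (the column names and data are illustrative assumptions, not taken from the germancredit dataset):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical dataset with two known attributes and one sensitive column.
data = pd.DataFrame({
    "age": [25, 47, 33, 52, 61, 29],
    "hours_per_week": [40, 50, 38, 45, 30, 42],
    "has_bad_credit": [0, 1, 0, 1, 1, 0],  # the sensitive attribute
})

known = ["age", "hours_per_week"]
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(data[known], data["has_bad_credit"])

# The adversary guesses the target's sensitive value from its known attributes.
target = pd.DataFrame({"age": [45], "hours_per_week": [48]})
print(clf.predict(target))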