Added new model (differential_privacy/pate)
Squashed commit of the following:

commit 5a6887b50ce35df0c62fe6a1d4dddf89296b58af
Author: Ilya Mironov <[email protected]>
Date:   Tue Apr 3 13:21:32 2018 -0700

    Renaming pate2_paper to ICLR2018. Adding [email protected] to README.md.

commit 666ac8800a68a965508dec4c148134723f5d2725
Author: Ilya Mironov <[email protected]>
Date:   Tue Apr 3 13:16:54 2018 -0700

    Moving everything under research/differential_privacy/pate

commit 1810be3e31279391b5f339c29c4e25d631409736
Author: Ilya Mironov <[email protected]>
Date:   Mon Apr 2 15:18:49 2018 -0700

    Addressing Ananth's comments:
     - added link to https://github.com/abseil/abseil-py
     - added "pip install absl-py"
     - fixed import statement in core_test.py

    Added requirement of write access.

commit e2f7e01462815bdd957952769edfea8f94407ce6
Author: Ilya Mironov <[email protected]>
Date:   Mon Apr 2 12:32:27 2018 -0700

    Overview section of README.md is revised.

commit 692680868cb1a691ff09f2bce194298603c80467
Author: Ilya Mironov <[email protected]>
Date:   Mon Apr 2 11:55:53 2018 -0700

    Massive restructuring:
    - Files renamed from pate.py to core.py, pate_smooth_sensitivity to smooth_sensitivity.py.
    - Absolute references changed to relative.
    - BUILDs removed.
    - README.md amended.

commit b2dc2246c50a51a6eb455c6736d351387bad4e31
Author: Ilya Mironov <[email protected]>
Date:   Fri Mar 30 10:10:51 2018 -0700

    Editing README files.

commit 4762823e1ca58d2b7a4e28d7bda191aff137a245
Author: Ilya Mironov <[email protected]>
Date:   Thu Mar 29 22:07:25 2018 -0700

    Adding download.py file.

commit e488d1a4a9e066fa83fa948cb5b71fe134a73c12
Author: Ilya Mironov <[email protected]>
Date:   Thu Mar 29 21:44:43 2018 -0700

    Adding the copyright statement.

commit 5b5759c9399914731340031bc668e69bc2c83d48
Author: Ilya Mironov <[email protected]>
Date:   Thu Mar 29 19:07:01 2018 -0700

    Moving everything under research/

commit 50ff075ce1e14700328ebc4f42ee418d7a7874f5
Author: Ilya Mironov <[email protected]>
Date:   Thu Mar 29 18:56:20 2018 -0700

    Initial commit
ilyamironov committed Apr 3, 2018
1 parent abd5042 commit f2b07ae
Showing 17 changed files with 2,785 additions and 3 deletions.
2 changes: 1 addition & 1 deletion CODEOWNERS
@@ -10,7 +10,7 @@
 /research/compression/ @nmjohn
 /research/deeplab/ @aquariusjay @yknzhu @gpapan
 /research/delf/ @andrefaraujo
-/research/differential_privacy/ @panyx0718
+/research/differential_privacy/ @panyx0718 @mironov
 /research/domain_adaptation/ @bousmalis @dmrd
 /research/gan/ @joel-shor
 /research/im2txt/ @cshallue
4 changes: 2 additions & 2 deletions research/README.md
@@ -28,8 +28,8 @@ installation](https://www.tensorflow.org/install).
 pre-trained Residual GRU network.
 - [deeplab](deeplab): deep labelling for semantic image segmentation.
 - [delf](delf): deep local features for image matching and retrieval.
-- [differential_privacy](differential_privacy): privacy-preserving student
-models from multiple teachers.
+- [differential_privacy](differential_privacy): differential privacy for training
+data.
 - [domain_adaptation](domain_adaptation): domain separation networks.
 - [gan](gan): generative adversarial networks.
 - [im2txt](im2txt): image-to-text neural network for image captioning.
56 changes: 56 additions & 0 deletions research/differential_privacy/pate/ICLR2018/README.md
@@ -0,0 +1,56 @@
Scripts in support of the paper "Scalable Private Learning with PATE" by Nicolas
Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, Ulfar
Erlingsson (ICLR 2018, https://arxiv.org/abs/1802.08908).


### Requirements

* Python, version &ge; 2.7
* absl (see [here](https://github.com/abseil/abseil-py), or just type `pip install absl-py`)
* matplotlib
* numpy
* scipy
* sympy (for smooth sensitivity analysis)
* write access to current directory (otherwise, output directories in download.py and *.sh scripts
must be changed)
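
Before running the scripts, it can help to confirm that the packages listed above are importable. The sketch below is not part of this commit; it assumes Python 3.4 or later (for `importlib.util.find_spec`), and the helper name `missing_modules` is hypothetical. The module names are the usual import names of the listed packages (absl-py is imported as `absl`).

```python
import importlib.util

# Import names for the packages in the Requirements list above.
REQUIRED_MODULES = ['absl', 'matplotlib', 'numpy', 'scipy', 'sympy']


def missing_modules(names):
  """Returns the top-level module names that cannot be found."""
  return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
  missing = missing_modules(REQUIRED_MODULES)
  if missing:
    print('Missing packages: ' + ', '.join(missing))
  else:
    print('All required packages are importable.')
```

The check only looks modules up and does not import them, so it does not verify package versions.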

## Reproducing Figures 1 and 5, and Table 2

Before running any of the analysis scripts, create the data/ directory and download votes files by running\
`$ python download.py`

To generate Figures 1 and 5 run\
`$ sh generate_figures.sh`\
The output is written to the figures/ directory.

For Table 2 run (may take several hours)\
`$ sh generate_table.sh`\
The output is written to the console.

For data-independent bounds (for comparing with Table 2), run\
`$ sh generate_table_data_independent.sh`\
The output is written to the console.

## Files in this directory

* generate_figures.sh --- Master script for generating Figures 1 and 5.

* generate_table.sh --- Master script for generating Table 2.

* generate_table_data_independent.sh --- Master script for computing data-independent
bounds.

* rdp_bucketized.py --- Script for producing Figures 1 (right) and 5 (right).

* rdp_cumulative.py --- Script for producing Figure 1 (left, middle), Figure 5
(left), and partition.pdf (a detailed breakdown of privacy costs per
source).

* smooth_sensitivity_table.py --- Script for generating Table 2.

* rdp_flow.py and plot_ls_q.py are currently not used.

* download.py --- Utility script for populating the data/ directory.


All Python files take flags. Run script_name.py --help for help on flags.
43 changes: 43 additions & 0 deletions research/differential_privacy/pate/ICLR2018/download.py
@@ -0,0 +1,43 @@
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Script to download votes files to the data/ directory.
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from six.moves import urllib
import os
import tarfile

FILE_URI = 'https://storage.googleapis.com/pate-votes/votes.gz'
DATA_DIR = 'data/'


def download():
  print('Downloading ' + FILE_URI)
  tar_filename, _ = urllib.request.urlretrieve(FILE_URI)
  print('Unpacking ' + tar_filename)
  with tarfile.open(tar_filename, "r:gz") as tar:
    tar.extractall(DATA_DIR)
  print('Done!')


if __name__ == '__main__':
  if not os.path.exists(DATA_DIR):
    print('Data directory does not exist. Creating ' + DATA_DIR)
    os.makedirs(DATA_DIR)
  download()
44 changes: 44 additions & 0 deletions research/differential_privacy/pate/ICLR2018/generate_figures.sh
@@ -0,0 +1,44 @@
#!/bin/bash
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================


counts_file="data/glyph_5000_teachers.npy"
output_dir="figures/"
executable1="python rdp_bucketized.py"
executable2="python rdp_cumulative.py"

mkdir -p $output_dir

if [ ! -d "$output_dir" ]; then
  echo "Directory $output_dir does not exist."
  exit 1
fi

$executable1 \
--plot=small \
--counts_file=$counts_file \
--plot_file=$output_dir"noisy_thresholding_check_perf.pdf"

$executable1 \
--plot=large \
--counts_file=$counts_file \
--plot_file=$output_dir"noisy_thresholding_check_perf_details.pdf"


$executable2 \
--cache=False \
--counts_file=$counts_file \
--figures_dir=$output_dir
93 changes: 93 additions & 0 deletions research/differential_privacy/pate/ICLR2018/generate_table.sh
@@ -0,0 +1,93 @@
#!/bin/bash
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================


echo "Reproducing Table 2. Takes a couple of hours."

executable="python smooth_sensitivity_table.py"
data_dir="data"

echo
echo "######## MNIST ########"
echo

$executable \
--counts_file=$data_dir"/mnist_250_teachers.npy" \
--threshold=200 \
--sigma1=150 \
--sigma2=40 \
--queries=640 \
--delta=1e-5

echo
echo "######## SVHN ########"
echo

$executable \
--counts_file=$data_dir"/svhn_250_teachers.npy" \
--threshold=300 \
--sigma1=200 \
--sigma2=40 \
--queries=8500 \
--delta=1e-6

echo
echo "######## Adult ########"
echo

$executable \
--counts_file=$data_dir"/adult_250_teachers.npy" \
--threshold=300 \
--sigma1=200 \
--sigma2=40 \
--queries=1500 \
--delta=1e-5

echo
echo "######## Glyph (Confident) ########"
echo

$executable \
--counts_file=$data_dir"/glyph_5000_teachers.npy" \
--threshold=1000 \
--sigma1=500 \
--sigma2=100 \
--queries=12000 \
--delta=1e-8

echo
echo "######## Glyph (Interactive, Round 1) ########"
echo

$executable \
--counts_file=$data_dir"/glyph_round1.npy" \
--threshold=3500 \
--sigma1=1500 \
--sigma2=100 \
--delta=1e-8

echo
echo "######## Glyph (Interactive, Round 2) ########"
echo

$executable \
--counts_file=$data_dir"/glyph_round2.npy" \
--baseline_file=$data_dir"/glyph_round2_student.npy" \
--threshold=3500 \
--sigma1=2000 \
--sigma2=200 \
--teachers=5000 \
--delta=1e-8
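
The invocations in generate_table.sh differ only in their flag values, so the same runs can be described as data. The sketch below is not part of this commit; it shows one possible data-driven driver. The helper names `build_command` and `run_all` are hypothetical. The flag names and values are copied from the script above, and only the first two datasets are listed. By default it prints the commands instead of running them.

```python
import subprocess

# Flag values copied from generate_table.sh (first two datasets only).
EXPERIMENTS = [
    ('MNIST', [('counts_file', 'data/mnist_250_teachers.npy'),
               ('threshold', 200), ('sigma1', 150), ('sigma2', 40),
               ('queries', 640), ('delta', '1e-5')]),
    ('SVHN', [('counts_file', 'data/svhn_250_teachers.npy'),
              ('threshold', 300), ('sigma1', 200), ('sigma2', 40),
              ('queries', 8500), ('delta', '1e-6')]),
]


def build_command(flags, script='smooth_sensitivity_table.py'):
  """Builds the argument list for one run from (flag, value) pairs."""
  return ['python', script] + ['--%s=%s' % (k, v) for k, v in flags]


def run_all(experiments, dry_run=True):
  """Prints each command, and executes it unless dry_run is set."""
  for name, flags in experiments:
    cmd = build_command(flags)
    print('######## %s ########' % name)
    print(' '.join(cmd))
    if not dry_run:
      subprocess.check_call(cmd)


if __name__ == '__main__':
  run_all(EXPERIMENTS)
```

Calling `run_all(EXPERIMENTS, dry_run=False)` would execute the runs; that requires the data/ directory populated by download.py.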
99 changes: 99 additions & 0 deletions research/differential_privacy/pate/ICLR2018/generate_table_data_independent.sh
@@ -0,0 +1,99 @@
#!/bin/bash
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================


echo "Table 2 with data-independent analysis."

executable="python smooth_sensitivity_table.py"
data_dir="data"

echo
echo "######## MNIST ########"
echo

$executable \
--counts_file=$data_dir"/mnist_250_teachers.npy" \
--threshold=200 \
--sigma1=150 \
--sigma2=40 \
--queries=640 \
--delta=1e-5 \
--data_independent
echo
echo "######## SVHN ########"
echo

$executable \
--counts_file=$data_dir"/svhn_250_teachers.npy" \
--threshold=300 \
--sigma1=200 \
--sigma2=40 \
--queries=8500 \
--delta=1e-6 \
--data_independent

echo
echo "######## Adult ########"
echo

$executable \
--counts_file=$data_dir"/adult_250_teachers.npy" \
--threshold=300 \
--sigma1=200 \
--sigma2=40 \
--queries=1500 \
--delta=1e-5 \
--data_independent

echo
echo "######## Glyph (Confident) ########"
echo

$executable \
--counts_file=$data_dir"/glyph_5000_teachers.npy" \
--threshold=1000 \
--sigma1=500 \
--sigma2=100 \
--queries=12000 \
--delta=1e-8 \
--data_independent

echo
echo "######## Glyph (Interactive, Round 1) ########"
echo

$executable \
--counts_file=$data_dir"/glyph_round1.npy" \
--threshold=3500 \
--sigma1=1500 \
--sigma2=100 \
--delta=1e-8 \
--data_independent

echo
echo "######## Glyph (Interactive, Round 2) ########"
echo

$executable \
--counts_file=$data_dir"/glyph_round2.npy" \
--baseline_file=$data_dir"/glyph_round2_student.npy" \
--threshold=3500 \
--sigma1=2000 \
--sigma2=200 \
--teachers=5000 \
--delta=1e-8 \
--order=8.5 \
--data_independent