PipelineDP

PipelineDP is a framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.

To make differential privacy accessible to non-experts, PipelineDP:

- Provides a convenient API familiar to Spark or Beam developers.
- Encapsulates the complexities of differential privacy, such as:
  - protecting outliers and rare categories,
  - generating safe noise,
  - privacy budget accounting.
- Supports many standard computations, such as count, sum, and average.
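To make two of the items above concrete (noise generation and budget accounting), here is a minimal, illustrative sketch in plain Python. It is not PipelineDP's implementation: `split_budget`, `laplace_noise`, and `noisy_count` are hypothetical helpers, the even budget split is an assumption, and sampling noise with ordinary floating-point arithmetic like this is known to be unsafe for real releases, which is one reason to rely on a library instead.

```python
import random


def split_budget(total_epsilon: float, num_aggregations: int) -> list:
    """Naively splits epsilon evenly across aggregations (hypothetical helper)."""
    return [total_epsilon / num_aggregations] * num_aggregations


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draws Laplace(0, scale) noise as the difference of two exponentials."""
    # expovariate takes the rate, i.e. 1 / mean.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, max_contributions: int,
                rng: random.Random) -> float:
    """Adds Laplace noise calibrated to how much one user can change the count."""
    # One user changes the count by at most max_contributions (L1 sensitivity).
    scale = max_contributions / epsilon
    return true_count + laplace_noise(scale, rng)


rng = random.Random(42)  # fixed seed only so the illustration is repeatable
eps_count, eps_sum = split_budget(total_epsilon=1.0, num_aggregations=2)
print(noisy_count(true_count=100, epsilon=eps_count, max_contributions=1, rng=rng))
```

A smaller epsilon per aggregation means a larger noise scale, so requesting more statistics from the same total budget makes each of them noisier.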

Additional information can be found at pipelinedp.io.

Note that this project is still experimental and subject to change. At the moment we do not recommend using it in production systems, as it has not yet been thoroughly tested. You can learn more in the Roadmap section.

The project is a collaboration between OpenMined and Google in an effort to bring Differential Privacy to production.

Getting started

Here are some examples of how to use PipelineDP:

Please check out the codelab for a more detailed demonstration of the API functionality and usage.

Code sample showing private processing on Spark:

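# Imports are assumed from the package layout. `movie_views` is an existing
# Spark RDD of movie-view records and `FLAGS.output_file` is the output path.
import pipeline_dp
from pipeline_dp import SumParams
from pipeline_dp.private_spark import make_private

# Define the privacy budget available for our computation.
budget_accountant = pipeline_dp.NaiveBudgetAccountant(total_epsilon=1,
                                                      total_delta=1e-6)

# Wrap Spark's RDD into its private version. You will use this private wrapper
# for all further processing instead of Spark's RDD. Using the wrapper ensures
# that only private statistics can be released.
private_movie_views = \
    make_private(movie_views, budget_accountant, lambda mv: mv.user_id)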

# Calculate the private sum of ratings per movie
dp_result = private_movie_views.sum(
    SumParams(
              # The aggregation key: we're grouping data by movies
              partition_extractor=lambda mv: mv.movie_id,
              # The value we're aggregating: we're summing up ratings
              value_extractor=lambda mv: mv.rating,

              # Limits to how much one user can contribute:
              # .. at most two movies rated per user
              #    (if there are more, randomly choose two)
              max_partitions_contributed=2,
              # .. at most one rating for each movie
              max_contributions_per_partition=1,
              # .. with minimal rating of "1"
              #    (automatically clip the lesser values to "1")
              min_value=1,
              # .. and maximum rating of "5"
              #    (automatically clip the greater values to "5")
              max_value=5))
budget_accountant.compute_budgets()

# Save the results
dp_result.saveAsTextFile(FLAGS.output_file)
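The contribution limits in the sample above can be illustrated without Spark. The following is a minimal sketch in plain Python that follows the comments in the sample (random choice of partitions, random choice of contributions, clipping); it is not PipelineDP's actual implementation. The function `bound_contributions_and_sum` and the `(user_id, movie_id, rating)` row layout are assumptions made for illustration, and the noise and partition selection that make the result private are deliberately left out.

```python
import random
from collections import defaultdict


def bound_contributions_and_sum(rows, max_partitions_contributed,
                                max_contributions_per_partition,
                                min_value, max_value, rng):
    """Returns per-movie sums after bounding each user's contributions.

    rows: iterable of (user_id, movie_id, rating) tuples (assumed layout).
    """
    # Group each user's ratings by movie (the partition key).
    per_user = defaultdict(lambda: defaultdict(list))
    for user_id, movie_id, rating in rows:
        per_user[user_id][movie_id].append(rating)

    sums = defaultdict(float)
    for partitions in per_user.values():
        movies = list(partitions)
        # Keep at most max_partitions_contributed movies, chosen at random.
        if len(movies) > max_partitions_contributed:
            movies = rng.sample(movies, max_partitions_contributed)
        for movie_id in movies:
            ratings = partitions[movie_id]
            # Keep at most max_contributions_per_partition ratings per movie.
            if len(ratings) > max_contributions_per_partition:
                ratings = rng.sample(ratings, max_contributions_per_partition)
            for rating in ratings:
                # Clip each rating into [min_value, max_value].
                sums[movie_id] += min(max(rating, min_value), max_value)
    return dict(sums)
```

With these bounds in place, one user can change the result by at most `max_partitions_contributed * max_contributions_per_partition * max(abs(min_value), abs(max_value))` in total, which is what allows the added noise to be calibrated.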

Installation

To install PipelineDP without any frameworks:

pip install pipeline-dp

If you would like to run PipelineDP on Apache Spark:

pip install pipeline-dp pyspark

To run PipelineDP on Apache Beam:

pip install pipeline-dp apache-beam

Supported Python versions: 3.8 and later.
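After installing, a quick check like the one below can confirm that the interpreter meets the version requirement and that the packages are importable. This is a small sketch, not part of PipelineDP: `check_environment` is a hypothetical helper, and the import names `pipeline_dp`, `pyspark`, and `apache_beam` are assumed to correspond to the pip packages listed above.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum supported version stated above


def check_environment(optional_backends=("pyspark", "apache_beam")):
    """Reports whether Python is recent enough and which modules are importable."""
    report = {"python_ok": sys.version_info >= MIN_PYTHON}
    for module in ("pipeline_dp",) + tuple(optional_backends):
        # find_spec returns None when a top-level module is not installed.
        report[module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```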

Note for Apple Silicon users: the PipelineDP pip package is currently available only for the x86 architecture. The reason is that PyDP does not provide a pip package for Apple Silicon. It might be possible to compile it from source for Apple Silicon.

Development

To set up a local environment and contribute to the development of PipelineDP, please see our guidelines in CONTRIBUTING.

Support and Community on Slack

If you have questions about PipelineDP, join OpenMined's Slack and check the #differential-privacy channel.
