Fix LaTeX code for ":="
nigoshh authored Jun 22, 2021
1 parent 954a6b1 commit 4a49d57
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions privacy-accuracy.ipynb
@@ -7,11 +7,11 @@
"source": [
"# Modeling the relationship between privacy and accuracy\n",
"\n",
"We adapt an economic model from the paper [Differential Privacy: An Economic Method for Choosing Epsilon](https://doi.org/10.1109/CSF.2014.35) (2014) by Hsu et al. Using the same notation as in the paper, we choose $T := 0.01$ as the desired error, $B := 3000$ as the budget, and $E := 12.5$ as the maximum expected cost per individual; the value for $E$ is the same as in the educational data example in the paper. Unlike in the paper, we cap the number of participants. That is, we assume that there is a maximum number of participants, $N_{\\max} := 10^6$. This is a sensible assumption, since there is a finite number of people on the planet, and possibly only some of them are viable candidates for the study. Using Equation (4) from the paper we then define $N(\\varepsilon) := \\min\\bigl\\{\\lfloor BE^{-1}(e^\\varepsilon - 1)^{-1}\\rfloor,\\ N_{\\max}\\bigr\\}$, the number of participants in the study, as a function of $\\varepsilon$. We use the floor function because the number of participants must be an integer. For the purpose of our analysis, it would not matter if the floor function was omitted; in fact, they do not use it in the paper. Like in the paper, $A(\\varepsilon, N) := 2\\exp\\bigl(-\\frac{NT^2}{12}\\bigr) + \\exp\\bigl(-\\frac{TN\\varepsilon}{2}\\bigr)$ is an upper bound to the probability of exceeding the error guarantee $T$. In the paper, $A(\\varepsilon, N)$ is taken as the accuracy. But here we consider accuracy as something to maximize, not minimize. Therefore, first we clip $A(\\varepsilon, N)$ at $1$, since upper bounds on probabilities are only useful if they are smaller than $1$. Then we subtract this value from $1$. That is, we define accuracy as $1 - \\min\\bigl\\{A\\bigl(\\varepsilon, N(\\varepsilon)\\bigr),\\ 1\\bigr\\}$, a function of $\\varepsilon$. We also plot $\\frac{N(\\varepsilon)}{N_{\\max}}$ as a function of $\\varepsilon$. In the plot, the y axis is scaled; that is, we multiply quantities on the y axis by $100$. 
We do this because we obtain percentages, and most people are more familiar with percentages.\n",
"We adapt an economic model from the paper [Differential Privacy: An Economic Method for Choosing Epsilon](https://doi.org/10.1109/CSF.2014.35) (2014) by Hsu et al. Using the same notation as in the paper, we choose $T \mathrel{\mathop:}= 0.01$ as the desired error, $B \mathrel{\mathop:}= 3000$ as the budget, and $E \mathrel{\mathop:}= 12.5$ as the maximum expected cost per individual; the value for $E$ is the same as in the educational data example in the paper. Unlike in the paper, we cap the number of participants. That is, we assume that there is a maximum number of participants, $N_{\\max} \mathrel{\mathop:}= 10^6$. This is a sensible assumption, since there is a finite number of people on the planet, and possibly only some of them are viable candidates for the study. Using Equation (4) from the paper we then define $N(\\varepsilon) \mathrel{\mathop:}= \\min\\bigl\\{\\lfloor BE^{-1}(e^\\varepsilon - 1)^{-1}\\rfloor,\\ N_{\\max}\\bigr\\}$, the number of participants in the study, as a function of $\\varepsilon$. We use the floor function because the number of participants must be an integer. For the purpose of our analysis, it would not matter if the floor function was omitted; in fact, they do not use it in the paper. Like in the paper, $A(\\varepsilon, N) \mathrel{\mathop:}= 2\\exp\\bigl(-\\frac{NT^2}{12}\\bigr) + \\exp\\bigl(-\\frac{TN\\varepsilon}{2}\\bigr)$ is an upper bound to the probability of exceeding the error guarantee $T$. In the paper, $A(\\varepsilon, N)$ is taken as the accuracy. But here we consider accuracy as something to maximize, not minimize. Therefore, first we clip $A(\\varepsilon, N)$ at $1$, since upper bounds on probabilities are only useful if they are smaller than $1$. Then we subtract this value from $1$. That is, we define accuracy as $1 - \\min\\bigl\\{A\\bigl(\\varepsilon, N(\\varepsilon)\\bigr),\\ 1\\bigr\\}$, a function of $\\varepsilon$. We also plot $\\frac{N(\\varepsilon)}{N_{\\max}}$ as a function of $\\varepsilon$. 
In the plot, the y axis is scaled; that is, we multiply quantities on the y axis by $100$. We do this because we obtain percentages, and most people are more familiar with percentages.\n",
"\n",
"From the plot we see that accuracy is not a monotonically increasing function of $\\varepsilon$; the highest values for accuracy are found in the central part of the plot. Let us think about what happens, going from the rightmost end of the plot towards the left. At first, on the right, higher values of $\\varepsilon$ mean that we do not offer much privacy. As a result, we have to pay a lot of money to convince people to participate. With our fixed budget $B$ we can get only a very small number of participants, so our study is very inaccurate. Going towards the left, as $\\varepsilon$ gets smaller, privacy improves, so people agree to participate for a smaller fee. Then with our fixed budget $B$ we can get more and more participants, and accuracy increases as a result. After we reach the maximum number of participants, decreasing $\\varepsilon$ cannot get us any more participants. Going towards the left, we offer stronger and stronger privacy to the participants we already have. Then the privacy-accuracy tradeoff intrinsic to differential privacy kicks in: accuracy decreases towards zero as $\\varepsilon$ decreases towards zero.\n",
"\n",
"We also calculate accuracy in a completely non-private scenario, modeled as in the same paper referenced above. We take $W := 12500$ as the worst-case cost. The value for $W$ is the same as in the educational data example in the paper. Note that we used the educational data example also in our private scenario. We are being consistent, as if the situation we are in does not change; the only thing that changes is the way we conduct the study: privately (with different values for $\\varepsilon$) or non-privately. Then there's $\\phi$, the estimated fraction of participant whose data will be exposed as a result of the non-private study. We choose $\\phi := 7 \\cdot 10^{-5}$. The cost is now $\\phi W$ per participant, so substituting that into Equation (4) in the paper we obtain $N_{\\mathrm{np}} := B\\bigl(\\phi W\\bigr)^{-1}$, where the subscript $\\mathrm{np}$ stands for “non-private”. Using the same Chernoff bound as in the paper, we obtain $A_{\\mathrm{np}} := 2 \\exp\\bigl(-\\frac{N_{\\mathrm{np}}T^2}{3}\\bigr)$ as an upper bound to the probability of exceeding the error guarantee $T$. Like in the private case, we consider accuracy as something to maximize; we clip $A_{\\mathrm{np}}$ at $1$, since upper bounds on probabilities are only useful if they are less than $1$, and we subtract this value from $1$. That is, we define accuracy as $1 - \\min\\bigl\\{A_{\\mathrm{np}},\\ 1\\bigr\\}$. Again, we multiply this by $100$ to obtain a percentage.\n",
"We also calculate accuracy in a completely non-private scenario, modeled as in the same paper referenced above. We take $W \mathrel{\mathop:}= 12500$ as the worst-case cost. The value for $W$ is the same as in the educational data example in the paper. Note that we used the educational data example also in our private scenario. We are being consistent, as if the situation we are in does not change; the only thing that changes is the way we conduct the study: privately (with different values for $\\varepsilon$) or non-privately. Then there's $\\phi$, the estimated fraction of participant whose data will be exposed as a result of the non-private study. We choose $\\phi \mathrel{\mathop:}= 7 \\cdot 10^{-5}$. The cost is now $\\phi W$ per participant, so substituting that into Equation (4) in the paper we obtain $N_{\\mathrm{np}} \mathrel{\mathop:}= B\\bigl(\\phi W\\bigr)^{-1}$, where the subscript $\\mathrm{np}$ stands for “non-private”. Using the same Chernoff bound as in the paper, we obtain $A_{\\mathrm{np}} \mathrel{\mathop:}= 2 \\exp\\bigl(-\\frac{N_{\\mathrm{np}}T^2}{3}\\bigr)$ as an upper bound to the probability of exceeding the error guarantee $T$. Like in the private case, we consider accuracy as something to maximize; we clip $A_{\\mathrm{np}}$ at $1$, since upper bounds on probabilities are only useful if they are less than $1$, and we subtract this value from $1$. That is, we define accuracy as $1 - \\min\\bigl\\{A_{\\mathrm{np}},\\ 1\\bigr\\}$. Again, we multiply this by $100$ to obtain a percentage.\n",
"\n",
"The fact that we get low accuracy (less than $40\\%$) shows that there are very plausible cases where a private study can be more accurate than its non-private version. We note that changing the value of $\\phi$ influences accuracy very much. But as $\\phi$ grows smaller, accuracy increases. The value we choose for $\\phi$ is smaller than the value in the paper; using the value from the paper we would get even lower accuracy."
]
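
The formulas in the changed cell can be checked numerically. The following is a minimal Python sketch, not code taken from the notebook: the function names (`participants`, `failure_bound`, `accuracy`, `accuracy_non_private`) are hypothetical, and the constants are the values stated in the cell text. It assumes $\varepsilon > 0$, since $N(\varepsilon)$ is undefined at $\varepsilon = 0$.

```python
import math

# Constants as stated in the notebook cell.
T = 0.01         # desired error
B = 3000         # budget
E = 12.5         # maximum expected cost per individual
N_MAX = 10**6    # assumed cap on the number of participants
W = 12500        # worst-case cost (non-private scenario)
PHI = 7e-5       # estimated fraction of participants whose data is exposed


def participants(eps):
    """N(eps) = min(floor(B / (E * (e^eps - 1))), N_MAX), for eps > 0."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    affordable = B / (E * math.expm1(eps))
    # Taking the minimum before the floor gives the same result because
    # N_MAX is an integer, and it avoids calling floor on an infinite value.
    return math.floor(min(affordable, N_MAX))


def failure_bound(eps, n):
    """A(eps, N): upper bound on the probability of exceeding the error T."""
    return 2 * math.exp(-n * T**2 / 12) + math.exp(-T * n * eps / 2)


def accuracy(eps):
    """Accuracy in the private scenario: 1 - min(A(eps, N(eps)), 1)."""
    return 1 - min(failure_bound(eps, participants(eps)), 1)


def accuracy_non_private():
    """Accuracy in the non-private scenario: 1 - min(A_np, 1)."""
    n_np = B / (PHI * W)
    a_np = 2 * math.exp(-n_np * T**2 / 3)
    return 1 - min(a_np, 1)


if __name__ == "__main__":
    for eps in (1e-6, 2.4e-4, 1e-3, 1.0):
        print(f"eps={eps:g}  N={participants(eps)}  accuracy={100 * accuracy(eps):.1f}%")
    print(f"non-private accuracy={100 * accuracy_non_private():.1f}%")
```

Evaluating `accuracy` over a range of $\varepsilon$ values is one way to see the non-monotonic shape described in the cell: accuracy is low for very small and for large $\varepsilon$ and higher in between.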