
Commit

fixing bv mistake
briandalessandro committed Oct 10, 2019
1 parent ecefb2c commit 3de370e
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions ipython/python35/Lecture_BiasVariance_3.ipynb
@@ -233,7 +233,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>This plot should confirm several of our expectations. First, when the complexity of our models increase (measured by the degree of the polynomial), model variance increases. Also, variance decreases as we increase the sample size. Beyond that, note the shape of the curves in the top plot above (for any model degree). We can see that doubling the data (each tick on the x-axis represents a doubling) does not halve the variance. For instance, with $d = 4$ (our one unbiased model family), going from $2^{10}$ to $2^{11}$ records drops the variance around 50%. The bottom chart shows both the sample size and model on the log2 scale. We can see that relationship on the log-log scale is nearly perfectly linear. Every unit increase in log2 scale decreases the log2 of the variance by a constant amount, such that: $log_2(\\hat{\\sigma_m}^2) = -log_2(N) + c$. With a little algebra we get to $\\hat{\\sigma_m}^2 \\propto N^{-1}$. Here we have shown empirically what is well known analytically - that model variance reduces at the rate of $O(N^{-1})$. For reasons that we'll discuss later, investing in more and more data to reduce the variance doesn't always pay off (in terms of reducing error to balance out the additional data cost).\n",
"<p>This plot should confirm several of our expectations. First, when the complexity of our models increase (measured by the degree of the polynomial), model variance increases. Also, variance decreases as we increase the sample size. Beyond that, note the shape of the curves in the top plot above (for any model degree). We can see that doubling the data (each tick on the x-axis represents a doubling) cuts the variance in half. For instance, with $d = 4$ (our one unbiased model family), going from $2^{10}$ to $2^{11}$ records drops the variance around 50%. The bottom chart shows both the sample size and model on the log2 scale. We can see that relationship on the log-log scale is nearly perfectly linear. Every unit increase in log2 scale decreases the log2 of the variance by a constant amount, such that: $log_2(\\hat{\\sigma_m}^2) = -log_2(N) + c$. With a little algebra we get to $\\hat{\\sigma_m}^2 \\propto N^{-1}$. Here we have shown empirically what is well known analytically - that model variance reduces at the rate of $O(N^{-1})$. For reasons that we'll discuss later, investing in more and more data to reduce the variance doesn't always pay off (in terms of reducing error to balance out the additional data cost).\n",
"</p>\n",
"\n",
"## Model Error, Bias and Variance\n",
@@ -289,8 +289,8 @@
"\n",
"<p>Understanding the bias-variance decomposition isn't just a theoretical exercise. Quite the opposite, it is the most fundamental principal that governs model generalizability and its lessons are directly applicable to empirical problems. Here are some key takeaways that we can highlight based on the above analysis that can be directly applied to our work. <br><br>\n",
"\n",
"<b>More data is always better:</b><br><br>\n",
"This is probably an obvious statement based on everything we hear about \"Big Data,\" but now we have a framework to understand why. Having more data examples reduces model estimation variance at the rate of $O(N^{-1})$, which all else being fixed (remember, more data doesn't change the bias or irreducible error given a fixed model family), reduces the total error. An important caveat is that the value of this variance reduction might not be proportional to the costs associated with increased data. This is a very problem specific trade-off that should at least always be considered (remember, data costs can be in actual economic currency or CPUs). \n",
"<b>More data should not hurt, and is likely better:</b><br><br>\n",
"This is probably an obvious statement based on everything we hear about \"Big Data,\" but now we have a framework to understand why. Having more data examples reduces model estimation variance at the rate of $O(N^{-1})$, which all else being fixed (remember, more data doesn't change the bias or irreducible error given a fixed model family), reduces the total error. An important caveat is that the value of this variance reduction might not be proportional to the costs associated with increased data. This is a very problem specific trade-off that should at least always be considered (remember, data costs can be in actual economic currency or CPUs). Also, having more data doesn't guarantee measurable results. There are always diminishing returns, so the improvement will depend on where you are on the sample size vs performance curve.\n",
"\n",
"<b>Model complexity is your friend, but it can stab you in the back if you are not careful:</b><br><br>\n",
"The bias part of the error decomposition is purely a function of your model specification and is independent of the data (specifically, the number of examples, not features, in your data). One can often get better results by using more complex (flexible) modeling algorithms. Example ways to add model complexity are: adding new features to the dataset, using less regularization in linear models, adding non-linear kernel functions to SVMs, using Neural Networks or Decision Tree based methods over linear methods. <br>\n",
@@ -395,21 +395,21 @@
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python [Root]",
"language": "python",
"name": "python3"
"name": "Python [Root]"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
