clean up hypothesis testing

meklf · Jan 10, 2012 · commit d3d8424
1 parent 63e8812
Showing 1 changed file with 37 additions and 31 deletions.
diff --git a/day3/hypothesis_testing.py b/day3/hypothesis_testing.py
@@ -24,20 +24,15 @@
 # contributions.  Statistical techniques will help us determine how
 # true this statement is.
 #   * Imagine two towns that only differ in
-# that one of the twons had "something in the water" the year a bunch
+# that one of the towns had "something in the water" the year a bunch
 # of kids were born.  Did that something in the water affect the
-# height of these kids?  (Note: this situation is not realistic.  it's
+# height of these kids?  (Note: This situation is unrealistic.  It's
 # never the case that the only difference between two communities is
 # the one you want to measure, but it's a nice goal!)  We'll use statistics
 # to determine whether the two communities have meaningfully different heights.
 #
 # <h3>Comparing Averages</h3>
 #
-# show averages for both height and campaign.  they are different.  how different?
-# visual approach: histogram.  impossible.  so show boxplot.  then show code for height boxplot.  then have them make campaign boxplot.
-# then there's a problem: how much overlap is good or bad?  need math for this!
-# t test
-#
 # Let's start by comparing a simple statistic, to see if in the data
 # we observe there's any difference.  We'll start by comparing the
 # average heights of the two towns.  (As an aside: it would help if
@@ -58,15 +53,15 @@
 
 # It looks like town 2's average height (6.35 feet) is higher than
 # town 1 (5.87 feet) by a difference of .479 feet.  This difference is
-# called the ** Effect size **.  Town 2 certainly looks taller than
+# called the ** effect size **.  Town 2 certainly looks taller than
 # Town 1!
-#
+
 # ** Exercise ** Compute the average campaign contribution for the
 # Obama and McCain campaigns from the dataset in day 1.  What's the
 # effect size?  We have an average contribution of $423 for McCain and
 # $192 for Obama, for an effect size of $231.  McCain appears, on
 # average, to have more generous donors.
-#
+
 # Before we fire up the presses on either of these stories, let's look
 # at the data in more depth.
 #
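The effect size described in this hunk is simply the difference between two averages. Below is a minimal sketch using Python's standard `statistics` module; the `effect_size` helper and the `town1`/`town2` heights are hypothetical illustrations, not the tutorial's code or dataset.

```python
from statistics import mean

def effect_size(group_a, group_b):
    """Difference between the two group averages (group_b minus group_a)."""
    return mean(group_b) - mean(group_a)

# Hypothetical heights in feet; NOT the dataset used in this tutorial.
town1 = [5.5, 5.9, 6.0, 6.1]
town2 = [6.2, 6.4, 6.3, 6.5]

print(round(effect_size(town1, town2), 3))
```

Swapping the arguments only flips the sign; the text above reports town 2 minus town 1.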
@@ -107,7 +102,7 @@
 #
 # Not bad!  The buckets are all exactly the same size except for one
 # person of height between 4 and 5 feet in town 1.
-#
+
 # ** Exercise ** Build a histogram for the Obama and McCain campaigns.
 # This is challenging, because there are a large number of outliers
 # that make the histograms difficult to compare.  Add the line
@@ -144,7 +139,7 @@
 # Let's interpret this plot.  We show town 1 on the left and town 2 on
 # the right.  Each town is represented by a box with a red line and
 # whiskers.
-
+#
 # * The red line in the box represents the ** median **, or
 # ** 50th percentile ** value of the distribution.  If we sort the
 # dataset, 50% of the values will be below this line, and 50% will be
@@ -154,7 +149,7 @@
 # represents the ** 75th percentile ** (the value larger than 75% of
 # your dataset).  The difference between the 75th and 25th percentiles
 # is called the ** interquartile range (IQR) **.
-#   * The whiskers represent the "extremes" of your dataset: the
+#   * The whiskers represent the "extremes" of our dataset: the
 #   largest value we're willing to consider in our dataset before
 #   calling it an outlier.  In our case, we set ** whis=1 **,
 #   requesting that the whiskers show the most extreme value at a
@@ -165,14 +160,14 @@
 # image](https://en.wikipedia.org/wiki/File:Boxplot_vs_PDF.svg) might
 # help you interpret the box-and-whiskers plot.
 #
-# Again, we see that the towns' height distributions don't look all
+# As in the histogram, we see that the towns' height distributions don't look all
 # that different from one another.  Generally, if the boxes of each
 # distribution overlap, and you haven't taken something on the order
 # of a buttload (metric units) of measurements, you should doubt the
 # differences in distribution averages.  It looks like a single height
 # measurement for town 1 is pretty far away from the others, and you
 # should investigate such measurements as potential outliers.
-#
+
 # ** Exercise ** Build a box-and-whiskers plot of the McCain and Obama
 # campaign contributions.  Again, outliers make this a difficult task.  With ** whis=1 **, and by setting the y range of the plots like so
 
@@ -182,8 +177,8 @@
 #
 # ![Boxplots of McCain and Obama 2008 campaign contributions](figures/mccain-obama-boxplot.png)
 #
-# Obama is on the left, and McCain on the right.  Man, real data sure
-# is more confusing than fake data.  Obama's box plot is a lot tighter
+# Obama is on the left, and McCain on the right.  Real data sure
+# is more confusing than fake data!  Obama's box plot is a lot tighter
 # than McCain's, who has a larger spread of donation sizes.  Both of
 # Obama's whiskers are visible on this chart, whereas only the top
 # whisker of McCain's plot is visible.  Another feature we haven't
@@ -201,8 +196,10 @@
 #
 # We have two population height averages.  We know that they are
 # different, but charts show that overall the two towns look similar.
-# We have two contribution averages that are also different, but with
-# a murkier story after looking at our box-and-whisker plots.
+# We have two campaign contribution averages that are also different,
+# but with a murkier story after looking at our box-and-whisker plots.
+# How will we definitively say whether the differences we observe are
+# meaningful?
 #
 # In statistics, what we are asking is whether differences we observed
 # are reliable indicators of some trend, or just happened by lucky
@@ -212,10 +209,10 @@
 # reason, we stumbled upon the results we did by chance.
 #
 # There are several tests for statistical significance, each applying
-# to a different question.  Our question is: "Is the difference in
-# averages between the height of people in town 1 and town 2
+# to a different question.  Our question is: "Is the difference
+# between the average height of people in town 1 and town 2
 # statistically significant?"  We ask a similar question about the
-# difference in averages in campaign contributions.  The test that
+# difference in average campaign contributions.  The test that
 # answers this question is the
 # [T-Test](https://en.wikipedia.org/wiki/Student's_t-test).  There are
 # several flavors of T-Test and we will discuss these soon, but for
@@ -241,14 +238,14 @@
 # we might have reached significance.  But, given our current results,
 # let's not jump to conclusions.  After all, it was just food coloring
 # in the water!
-#
+
 # ** Exercise ** Run Welch's T-test on the campaign data.  Is the
 # effect size between McCain and Obama significant?  By our
 # measurements, the p-value reported is within rounding error of 0.
 # That's significant by anyone's measure: there's a near-nonexistent
 # chance we're seeing this difference between the candidates by some
 # random fluke in the universe.  Time to write an article!
-#
+
 # <h3>Can You Have a Very Significant Result?</h3>
 #
 # No.  There is no such thing as "very" or "almost" significant.
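Because significance is a yes/no outcome, the decision rule fits in a tiny function. This is a minimal sketch; the `is_significant` name and its default `alpha=0.05` are illustrative choices (the text notes that some people require .01 instead).

```python
def is_significant(p_value, alpha=0.05):
    """Binary decision: the result either clears the threshold or it does not."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return p_value < alpha

# A tiny p-value is not "more" significant, and a near miss is not "almost" significant.
for p in (0.0001, 0.049, 0.051):
    print(p, is_significant(p))

# A stricter threshold can change the answer, but it is still a yes/no answer.
print(0.03, is_significant(0.03, alpha=0.01))
```

The function returns a `bool`, never a degree of significance.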
@@ -259,6 +256,16 @@
 # the observations we made happened by anything more than random
 # chance.  While people disagree about whether a p-value of .05 or .01
 # is required, they all agree that significance is a binary value.
+
+#
+# Strictly speaking, you've learned about T-Tests at this point.  If
+# you are pressed for time, read [Putting it All
+# Together](#alltogether) below and move on to the next section.  For
+# the overachievers in our midst, there's a lot of important
+# information to follow, and you can keep reading until the
+# end.
+#
+
 #
 # <h3>Types of T-Test</h3> The T-Test has two major flavors: paired
 # and unpaired.
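To see why pairing matters, we can compare the two flavors' t statistics on the same made-up before/after exam scores. This is a minimal standard-library sketch: the scores and helper names are hypothetical, and the unpaired version uses Welch's formula. In practice, SciPy's `scipy.stats.ttest_rel` (paired) and `scipy.stats.ttest_ind` (unpaired) compute these statistics and also report p-values.

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(before, after):
    """Paired t statistic: a one-sample t statistic on each item's change."""
    if len(before) != len(after):
        raise ValueError("paired data needs exactly one 'after' per 'before'")
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def unpaired_welch_t(group_a, group_b):
    """Unpaired (Welch) t statistic: treats the two groups as independent."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_b) - mean(group_a)) / se

# Hypothetical scores for the same four students before and after the course.
before = [60, 70, 80, 90]
after = [65, 76, 84, 96]

print(round(paired_t(before, after), 2))          # every student improved a little
print(round(unpaired_welch_t(before, after), 2))  # student-to-student spread swamps the gain
```

Both statistics share the same numerator (the average improvement), but the paired version divides by the spread of the per-student changes rather than the spread between students, which is how pairing controls for differences between the items measured.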
@@ -268,8 +275,8 @@
 # set of students on an exam before and after teaching them the course
 # content.  To use a paired T-Test, you have to be able to measure an
 # item twice, usually before and after some treatment.  This is the
-# ideal condition: by tracking each measurement in before/after
-# treatments, you control for other potential differences in the items
+# ideal condition: by measuring each item both before and after a
+# treatment, you control for other potential differences in the items
 # you measured, like differences in performance between students.
 #
 # Other times, you are measuring the difference between two sets of
@@ -344,17 +351,16 @@
 # our less conservative Welch's test was unable to give us
 # significance, so we don't expect a more conservative test to
 # magically find significance.
-#
+
 # ** Exercise ** Since we shouldn't be using Welch's T-Test on the
 # campaign contribution data, run the Mann-Whitney U test on the data.
 # Is the difference between the Obama and McCain contributions still
 # significant?
-#
+
 # We got a p-value of about 0, so you will still find the result to be
 # statistically significant.  A+ for you!
 #
-# <h3>Putting it All Together</h3>
-#
+# <a name="alltogether"><h3>Putting it All Together</h3></a>
 # So far, we've learned the steps to test a hypothesis:
 #
 #  * Compute summary statistics, like averages or medians, and see if
@@ -371,4 +377,4 @@
 #
 #  There's a lot more to statistics than T-Tests, which compare two
 #  datasets' averages.  Next, we'll cover correlation between two
-#  datasets using linear regression.
+#  datasets using [linear regression](regression.html).
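The hypothesis-testing steps summarized in this section can be strung together in one short script. This is a minimal standard-library sketch, not the tutorial's own code: the contribution amounts are hypothetical, the helper names (`summarize`, `welch`, `mann_whitney_u`) are made up for illustration, and it computes test statistics only. For p-values, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` (Welch's T-Test) and `scipy.stats.mannwhitneyu(a, b)` are the usual tools.

```python
from math import sqrt
from statistics import mean, median, variance

def summarize(values):
    """Step 1: summary statistics to eyeball before running any test."""
    return {"n": len(values), "mean": mean(values), "median": median(values)}

def welch(group_a, group_b):
    """Step 2: Welch's t statistic plus Welch-Satterthwaite degrees of freedom."""
    va = variance(group_a) / len(group_a)  # squared standard error of mean(a)
    vb = variance(group_b) / len(group_b)  # squared standard error of mean(b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df

def mann_whitney_u(group_a, group_b):
    """Step 3 for skewed data: count pairs where a beats b (ties count half)."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical, heavily skewed contribution amounts in dollars.
candidate_a = [50, 100, 250, 1000, 2300]
candidate_b = [25, 50, 100, 200]

print(summarize(candidate_a), summarize(candidate_b))
print(welch(candidate_a, candidate_b))
print(mann_whitney_u(candidate_a, candidate_b))  # out of a possible 5 * 4 = 20 pairs
```

The U statistic depends only on the ordering of the values, so a few huge contributions can't dominate it the way they dominate an average.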