clean up hypothesis testing
marcua committed Jan 10, 2012
1 parent 63e8812 commit d3d8424
Showing 1 changed file with 37 additions and 31 deletions.
68 changes: 37 additions & 31 deletions day3/hypothesis_testing.py
@@ -24,20 +24,15 @@
# contributions. Statistical techniques will help us determine how
# true this statement is.
# * Imagine two towns that only differ in
-# that one of the twons had "something in the water" the year a bunch
+# that one of the towns had "something in the water" the year a bunch
# of kids were born. Did that something in the water affect the
-# height of these kids? (Note: this situation is not realistic. it's
+# height of these kids? (Note: This situation is unrealistic. It's
# never the case that the only difference between two communities is
# the one you want to measure, but it's a nice goal!) We'll use statistics
# to determine whether the two communities have meaningfully different heights.
#
# <h3>Comparing Averages</h3>
#
-# show averages for both height and campaign. they are different. how different?
-# visual approach: histogram. impossible. so show boxplot. then show code for height boxplot. then have them make campaign boxplot.
-# then there's a problem: how much overlap is good or bad? need math for this!
-# t test
-#
# Let's start by comparing a simple statistic, to see if in the data
# we observe there's any difference. We'll start by comparing the
# average heights of the two towns. (As an aside: it would help if
@@ -58,15 +53,15 @@

# It looks like town 2's average height (6.35 feet) is higher than
# town 1 (5.87 feet) by a difference of .479 feet. This difference is
-# called the ** Effect size **. Town 2 certainly looks taller than
+# called the ** effect size **. Town 2 certainly looks taller than
# Town 1!
#
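# A minimal sketch of the comparison above, assuming Python 3 and its
# standard `statistics` module. The heights below are made up for
# illustration and are not the tutorial's town data, and `effect_size`
# is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical heights in feet (illustrative only, not the real dataset).
town1_heights = [5.4, 5.9, 6.1, 5.7, 6.0]
town2_heights = [6.2, 6.5, 6.0, 6.6, 6.3]


def effect_size(group_a, group_b):
    """Return the difference between the group averages (b minus a)."""
    return mean(group_b) - mean(group_a)


print("Town 1 average: %.2f" % mean(town1_heights))
print("Town 2 average: %.2f" % mean(town2_heights))
print("Effect size: %.2f" % effect_size(town1_heights, town2_heights))
```

# The sign of the result tells you which group is larger. On its own it
# says nothing about whether the gap is reliable.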

# ** Exercise ** Compute the average campaign contribution for the
# Obama and McCain campaigns from the dataset in day 1. What's the
# effect size? We have an average contribution of $423 for McCain and
# $192 for Obama, for an effect size of $231. McCain appears, on
# average, to have more giving donors.
-#
+
# Before we fire up the presses on either of these stories, let's look
# at the data in more depth.
#
@@ -107,7 +102,7 @@
#
# Not bad! The buckets are all exactly the same size except for one
# person of height between 4 and 5 feet in town 1.
-#
+
# ** Exercise ** Build a histogram for the Obama and McCain campaigns.
# This is challenging, because there are a large number of outliers
# that make the histograms difficult to compare. Add the line
@@ -144,7 +139,7 @@
# Let's interpret this plot. We show town 1 on the left and town 2 on
# the right. Each town is represented by a box with a red line and
# whiskers.
-
+#
# * The red line in the box represents the ** median **, or
# ** 50th percentile ** value of the distribution. If we sort the
# dataset, 50% of the values will be below this line, and 50% will be
@@ -154,7 +149,7 @@
# represents the ** 75th percentile ** (the value larger than 75% of
# your dataset). The difference between the 75th and 25th percentile
# is called the ** interquartile range (IQR) **.
-# * The whiskers represent the "extremes" of your dataset: the
+# * The whiskers represent the "extremes" of our dataset: the
# largest value we're willing to consider in our dataset before
# calling it an outlier. In our case, we set ** whis=1 **,
# requesting that we show whiskers the most extreme value at a
@@ -165,14 +160,14 @@
# image](https://en.wikipedia.org/wiki/File:Boxplot_vs_PDF.svg) might
# help you interpret the box-and-whiskers plot.
#
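# The quantities a box-and-whiskers plot draws can be sketched with the
# standard library alone. This assumes Python 3.8+ (for
# `statistics.quantiles`) and assumes that ** whis=1 ** means "whiskers
# reach the most extreme value within 1 x IQR of the box". `box_stats`
# and the heights are hypothetical, and plotting libraries may
# interpolate percentiles slightly differently.

```python
from statistics import quantiles


def box_stats(values, whis=1.0):
    """Compute the numbers behind a box-and-whiskers plot."""
    # "inclusive" interpolates linearly between the sorted data points.
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - whis * iqr
    high_limit = q3 + whis * iqr
    # Whiskers stop at the most extreme data points inside the limits.
    inside = [v for v in values if low_limit <= v <= high_limit]
    outliers = sorted(v for v in values if v < low_limit or v > high_limit)
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical heights in feet, with one unusually short measurement.
heights = [4.5, 5.6, 5.7, 5.8, 5.9, 6.0, 6.1, 6.2, 6.3]
stats = box_stats(heights, whis=1.0)
for name in ("q1", "median", "q3", "whisker_low", "whisker_high", "outliers"):
    print(name, stats[name])
```

# Any value listed under "outliers" would be drawn as a separate point
# beyond the whiskers.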
-# Again, we see that the towns' height distributions don't look all
+# Like in the histogram, we see that the towns' height distributions don't look all
# that different from one another. Generally, if the boxes of each
# distribution overlap, and you haven't taken something on the order
# of a buttload (metric units) of measurements, you should doubt the
# differences in distribution averages. It looks like a single height
# measurement for town 1 is pretty far away from the others, and you
# should investigate such measurements as potential outliers.
-#
+
# ** Exercise ** Build a box-and-whiskers plot of the McCain and Obama
# campaign contributions. Again, outliers make this a difficult task. With ** whis=1 **, and by setting the y range of the plots like so

@@ -182,8 +177,8 @@
#
# ![Boxplots of McCain and Obama 2008 campaign contributions](figures/mccain-obama-boxplot.png)
#
-# Obama is on the left, and McCain on the right. Man, real data sure
-# is more confusing than fake data. Obama's box plot is a lot tighter
+# Obama is on the left, and McCain on the right. Real data sure
+# is more confusing than fake data! Obama's box plot is a lot tighter
# than McCain's, which has a larger spread of donation sizes. Both of
# Obama's whiskers are visible on this chart, whereas only the top
# whisker of McCain's plot is visible. Another feature we haven't
@@ -201,8 +196,10 @@
#
# We have two population height averages. We know that they are
# different, but charts show that overall the two towns look similar.
-# We have two contribution averages that are also different, but with
-# a murkier story after looking at our box-and-whisker plots.
+# We have two campaign contribution averages that are also different,
+# but with a murkier story after looking at our box-and-whisker plots.
+# How will we definitively say whether the differences we observe are
+# meaningful?
#
# In statistics, what we are asking is whether differences we observed
# are reliable indicators of some trend, or just happened by lucky
@@ -212,10 +209,10 @@
# reason, we stumbled upon the results we did by chance.
#
# There are several tests for statistical significance, each applying
-# to a different question. Our question is: "Is the difference in
-# averages between the height of people in town 1 and town 2
+# to a different question. Our question is: "Is the difference
+# between the average height of people in town 1 and town 2
# statistically significant?" We ask a similar question about the
-# difference in averages in campaign contributions. The test that
+# difference in average campaign contributions. The test that
# answers this question is the
# [T-Test](https://en.wikipedia.org/wiki/Student's_t-test). There are
# several flavors of T-Test and we will discuss these soon, but for
@@ -241,14 +238,14 @@
# we might have reached significance. But, given our current results,
# let's not jump to conclusions. After all, it was just food coloring
# in the water!
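# To make the mechanics of Welch's T-Test less of a black box, here is
# a rough sketch of the statistic it is built on, using only the Python
# 3 standard library. `welch_t` and the sample values are hypothetical.
# The sketch stops at the t statistic and degrees of freedom: turning
# those into a p-value needs the t distribution, which a statistics
# library provides (for example, SciPy's `scipy.stats.ttest_ind` with
# `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; unequal variances are allowed.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical measurements, not the tutorial's data.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
print("t = %.3f, degrees of freedom = %.2f" % (t_stat, dof))
```

# A t statistic far from zero, relative to the degrees of freedom,
# corresponds to a small p-value.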
-#
+
# ** Exercise ** Run Welch's T-test on the campaign data. Is the
# effect size between McCain and Obama significant? By our
# measurements, the p-value reported is within rounding error of 0.
# That's significant by anyone's measure: there's a near-nonexistent
# chance we're seeing this difference between the candidates by some
# random fluke in the universe. Time to write an article!
-#
+
# <h3>Can You Have a Very Significant Result?</h3>
#
# No. There is no such thing as "very" or "almost" significant.
@@ -259,6 +256,16 @@
# the observations we made happened by anything more than random
# chance. While people disagree about whether a p-value of .05 or .01
# is required, they all agree that significance is a binary value.
+
+#
+# Strictly speaking, you've learned about T-Tests at this point. If
+# you are pressed for time, read [Putting it all
+# Together](#alltogether) below and move on to the next section. For
+# the overachievers in our midst, there's lots of important
+# information to follow, and you can instead keep reading until the
+# end.
+#
+
#
# <h3>Types of T-Test</h3> The T-Test has two major flavors: paired
# and unpaired.
@@ -268,8 +275,8 @@
# set of students on an exam before and after teaching them the course
# content. To use a paired T-Test, you have to be able to measure an
# item twice, usually before and after some treatment. This is the
-# ideal condition: by tracking each measurement in before/after
-# treatments, you control for other potential differences in the items
+# ideal condition: by having before and after measurements of a
+# treatment, you control for other potential differences in the items
# you measured, like performance between students.
#
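# As a rough illustration of the paired case described above, the
# sketch below computes the paired t statistic from the per-item
# differences, using only the Python 3 standard library. `paired_t`
# and the exam scores are hypothetical, and a p-value would still
# require a t distribution from a statistics library.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Each item is compared with itself, so only the differences matter.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical exam scores for the same four students.
scores_before = [60, 70, 65, 80]
scores_after = [68, 75, 70, 90]
t_stat, dof = paired_t(scores_before, scores_after)
print("t = %.3f, degrees of freedom = %d" % (t_stat, dof))
```

# Because each student serves as their own baseline, differences in
# ability between students drop out of the calculation.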
# Other times, you are measuring the difference between two sets of
@@ -344,17 +351,16 @@
# our less conservative Welch's test was unable to give us
# significance, so we don't expect a more conservative test to
# magically find significance.
-#
+
# ** Exercise ** Since we shouldn't be using Welch's T-Test on the
# campaign contribution data, run the Mann-Whitney U test on the data.
# Is the difference between the Obama and McCain contributions still
# significant?
-#
+
# We got a p-value of about 0, so you will still find the result to be
# statistically significant. A+ for you!
#
-# <h3>Putting it All Together</h3>
-#
+# <a name="alltogether"><h3>Putting it All Together</h3></a>
# So far, we've learned the steps to test a hypothesis:
#
# * Compute summary statistics, like averages or medians, and see if
@@ -371,4 +377,4 @@
#
# There's a lot more to statistics than T-Tests, which compare two
# datasets' averages. Next, we'll cover correlation between two
-# datasets using linear regression.
+# datasets using [linear regression](regression.html).
