
Commit

Merge remote-tracking branch 'origin/statinf' into statinf
WilCrofter committed Nov 5, 2014
2 parents f5484e0 + feac301 commit 2477911
Showing 2 changed files with 17 additions and 11 deletions.
13 changes: 8 additions & 5 deletions Statistical_Inference/CommonDistros/lesson
@@ -7,7 +7,7 @@
Version: 2.2.0

- Class: text
-Output: "Common Distributions. (Slides for this and other Data Science courses may be found at github https://github.com/DataScienceSpecialization/courses/. If you care to use them, they must be downloaded as a zip file and viewed locally. This lesson corresponds to 06_Statistical_Inference/06_Conditional_Probability.)"
+Output: "Common Distributions. (Slides for this and other Data Science courses may be found at github https://github.com/DataScienceSpecialization/courses/. If you care to use them, they must be downloaded as a zip file and viewed locally. This lesson corresponds to 06_Statistical_Inference/06_CommonDistros.)"

- Class: mult_question
Output: Given the title of this lesson, what do you think it will cover?
@@ -17,7 +17,7 @@
Hint: Part of the title is an abbreviation for another word we've seen several times in earlier lessons.

- Class: text
-Output: The first distribution we'll examine is the Bernoulli and it's associated with experiments which have only 2 possible outcomes. These are also called (by people in the know) binary trials.
+Output: The first distribution we'll examine is the Bernoulli which is associated with experiments which have only 2 possible outcomes. These are also called (by people in the know) binary trials.

- Class: mult_question
Output: It might surprise you to learn that you've probably had experience with Bernoulli trials. Which of the following would be a Bernoulli trial?
@@ -40,7 +40,7 @@
AnswerTests: omnitest(correctVal='p^x * (1-p)^(1-x)')
Hint: When x=1, which of the given expressions yields p?

-- Class: cmd_question
+- Class: mult_question
Output: Recall the definition of the expectation of a random variable. Suppose we have a Bernoulli random variable and, as before, the probability it equals 1 (a success) is p and probability it equals 0 (a failure) is 1-p. What is its mean?
AnswerChoices: p; 1-p; p^2; p(1-p)
CorrectAnswer: p
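The hunk above asks for the mean of a Bernoulli variable with PMF p^x * (1-p)^(1-x). The lesson itself is in R, but as a quick sanity check, a minimal Python sketch (variable names are mine) confirming that the expectation works out to p:

```python
# Bernoulli PMF: P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}
def bernoulli_pmf(x, p):
    return p**x * (1 - p)**(1 - x)

p = 0.3  # illustrative success probability; any value in [0, 1] works

# E[X] = sum over outcomes of x * P(X = x), which collapses to p
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
print(mean)  # 0.3
```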
@@ -140,7 +140,7 @@
Output: The converse is also true. If Z is standard normal, i.e., Z ~ N(0,1), then the random variable X defined as X = mu + sigma*Z is normally distributed with mean mu and variance sigma^2, i.e., X ~ N(mu, sigma^2)

- Class: text
-Output: These formulae allow you to easily compute quantiles (and thus percentiles) for ANY normally distributed variable if you know its mean and variance. We'll show how to find the 97.5th percentile of a normal distribution with mean 2 and variance 4.
+Output: These formulae allow you to easily compute quantiles (and thus percentiles) for ANY normally distributed variable if you know its mean and variance. We'll show how to find the 97.5th percentile of a normal distribution with mean 3 and variance 4.

- Class: cmd_question
Output: Again, we can use R's qnorm function and simply specify the mean and standard deviation (the square root of the variance). Do this now. Find the 97.5th percentile of a normal distribution with mean 3 and standard deviation 2.
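For readers following the diff outside an R session, the qnorm call this question expects has a direct stdlib analogue in Python (`statistics.NormalDist`, Python 3.8+); a sketch that also checks the lesson's x = mu + sigma*z formula:

```python
from statistics import NormalDist

# 97.5th percentile of N(mean = 3, sd = 2), the analogue of
# qnorm(.975, mean = 3, sd = 2) in R
q = NormalDist(mu=3, sigma=2).inv_cdf(0.975)

# Equivalently, shift and scale the standard normal quantile,
# as the lesson's X = mu + sigma*Z formula says
z = NormalDist().inv_cdf(0.975)  # roughly 1.96
print(q, 3 + 2 * z)              # both roughly 6.92
```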
@@ -215,11 +215,14 @@
Hint: Type pbinom(5,1000,.01) at the R prompt.

- Class: cmd_question
-Output: Now use the function ppois with lambda equal to n*p to see if you get a similar result.
+Output: Now use the function ppois with quantile equal to 5 and lambda equal to n*p to see if you get a similar result.
CorrectAnswer: ppois(5,1000*.01)
AnswerTests: omnitest(correctExpr='ppois(5,1000*.01)')
Hint: Type ppois(5,1000*.01) at the R prompt.

+- Class: text
+Output: See how they're close? Pretty cool, right? This worked because n was large (1000) and p was small (.01).
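The closeness this added text points out can be verified without R; a stdlib-Python sketch comparing pbinom(5, 1000, .01) against ppois(5, 10), computing both CDFs from their definitions:

```python
from math import comb, exp, factorial

n, p, k = 1000, 0.01, 5

# Binomial CDF P(X <= 5) for X ~ Binomial(1000, .01), i.e. pbinom(5, 1000, .01)
binom_cdf = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Poisson CDF with lambda = n * p = 10, i.e. ppois(5, 10)
lam = n * p
pois_cdf = sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

# Roughly 0.0661 vs 0.0671: close because n is large and p is small
print(binom_cdf, pois_cdf)
```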

- Class: text
Output: Congrats! You've concluded this uncommon lesson on common distributions.

15 changes: 9 additions & 6 deletions Statistical_Inference/ConditionalProbability/lesson
@@ -56,7 +56,7 @@
Output: P(B|A) = P(B&A)/P(A) = P(A|B) * P(B)/P(A). This is a simple form of Bayes' Rule which relates the two conditional probabilities.

- Class: text
-Output: Suppose we don't know P(A) itself, but only know its conditional probabilities, that is, the probability that it occurs if B occurs and the probability that it occurs if B doesn't occur. These are P(A|B) and P(A|~B), respectively. We use ~B to represent not B or B complement.
+Output: Suppose we don't know P(A) itself, but only know its conditional probabilities, that is, the probability that it occurs if B occurs and the probability that it occurs if B doesn't occur. These are P(A|B) and P(A|~B), respectively. We use ~B to represent 'not B' or 'B complement'.

- Class: text
Output: We can then express P(A) = P(A|B) * P(B) + P(A|~B) * P(~B) and substitute this into the denominator of Bayes' Formula.
@@ -71,7 +71,7 @@
Output: Suppose we know the accuracy rates of the test for both the positive case (positive result when the patient has HIV) and negative (negative test result when the patient doesn't have HIV). These are referred to as test sensitivity and specificity, respectively.

- Class: mult_question
-Output: Let 'D' be the event that the patient has HIV, and let '+' indicate a positive test result and '-' a negative. What information do we know?
+Output: Let 'D' be the event that the patient has HIV, and let '+' indicate a positive test result and '-' a negative. What information do we know? Recall that we know the accuracy rates of the HIV test.
AnswerChoices: P(+|D) and P(-|~D); P(+|~D) and P(-|~D); P(+|~D) and P(-|D); P(+|D) and P(-|D)
CorrectAnswer: P(+|D) and P(-|~D)
AnswerTests: omnitest(correctVal='P(+|D) and P(-|~D)')
@@ -91,7 +91,7 @@
Output: We can use the prevalence of HIV in the patient's population as the value for P(D). Note that since P(~D)=1-P(D) and P(+|~D) = 1-P(-|~D) we can calculate P(D|+). In other words, we know values for all the terms on the right side of the equation. Let's do it!

- Class: cmd_question
-Output: Disease prevalence is .001. Test sensitivity is 99.7% and specificity is 98.5%. First compute the numerator, P(+|D)*P(D). (This is also part of the denominator.)
+Output: Disease prevalence is .001. Test sensitivity (+ result with disease) is 99.7% and specificity (- result without disease) is 98.5%. First compute the numerator, P(+|D)*P(D). (This is also part of the denominator.)
CorrectAnswer: .997*.001
AnswerTests: ANY_of_exprs('.997*.001','.001*.997')
Hint: Multiply the test sensitivity by the prevalence.
@@ -103,7 +103,7 @@
Hint: Multiply the complement of test specificity by the complement of prevalence.

- Class: cmd_question
-Output: Now put the pieces together to compute the probability that the patient has the disease given his positive test result.
+Output: Now put the pieces together to compute the probability that the patient has the disease given his positive test result, P(D|+). Plug your last two answers into the formula P(+|D) * P(D) / ( P(+|D) * P(D) + P(+|~D) * P(~D) ) to compute P(D|+).
CorrectAnswer: .000997/(.000997+.014985)
AnswerTests: equiv_val(.06238268)
Hint: Divide (.997*.001) by (.997*.001 + .015*.999)
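Pulling the cmd_question steps in this hunk together, a quick Python check of the full Bayes computation (variable names are mine, not the lesson's; the numbers are the lesson's .001 prevalence, 99.7% sensitivity, and 98.5% specificity):

```python
prevalence = 0.001    # P(D)
sensitivity = 0.997   # P(+|D)
specificity = 0.985   # P(-|~D)

numerator = sensitivity * prevalence              # P(+|D) * P(D) = .000997
false_pos = (1 - specificity) * (1 - prevalence)  # P(+|~D) * P(~D) = .014985
posterior = numerator / (numerator + false_pos)   # P(D|+) by Bayes' Rule

# Roughly 0.0624: even a positive result leaves only about a 6% chance of disease
print(posterior)
```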
@@ -140,7 +140,7 @@
Output: Now to likelihood ratios. Recall Bayes' Formula. P(D|+) = P(+|D) * P(D) / ( P(+|D) * P(D) + P(+|~D) * P(~D) ) and notice that if we replace all occurrences of 'D' with '~D', the denominator doesn't change. This means that if we formed a ratio of P(D|+) to P(~D|+) we'd get a much simpler expression (since the complicated denominators would cancel each other out). Like this....

- Class: text
-Output: P(D|+) / P(~D|+) = P(+|D) * P(D) / P(+|~D) * P(~D) = P(+|D)/P(+|~D) * P(D)/P(~D).
+Output: P(D|+) / P(~D|+) = P(+|D) * P(D) / (P(+|~D) * P(~D)) = P(+|D)/P(+|~D) * P(D)/P(~D).

- Class: mult_question
Output: The left side of the equation represents the post-test odds of disease given a positive test result. The equation says that the post-test odds of disease equals the pre-test odds of disease times
@@ -153,7 +153,10 @@
Output: In other words, a DLR_+ value equal to N indicates that the hypothesis of disease is N times more supported by the data than the hypothesis of no disease.

- Class: text
-Output: Taking the formula above and replacing the '+' signs with '-' yields a formula with the DLR_-. Specifically, P(D|-) / P(~D|-) = P(-|D) * P(D) / P(-|~D) * P(~D). This relates the decrease in the odds of the disease post negative test result to the odds of disease pre-test. Remember that we showed that DLR_- is small.
+Output: Taking the formula above and replacing the '+' signs with '-' yields a formula with the DLR_-. Specifically, P(D|-) / P(~D|-) = P(-|D) / P(-|~D) * P(D)/P(~D). This relates the DECREASE in the odds of the disease post negative test result to the odds of disease pre-test. Remember that we showed that DLR_- is small (less than 1).
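Using the same sensitivity and specificity as the lesson's HIV example, a hedged Python sketch of both diagnostic likelihood ratios and the post-test odds update (names are illustrative, not from the lesson):

```python
sensitivity = 0.997   # P(+|D)
specificity = 0.985   # P(-|~D)
prevalence = 0.001    # P(D)

dlr_pos = sensitivity / (1 - specificity)   # P(+|D) / P(+|~D): large
dlr_neg = (1 - sensitivity) / specificity   # P(-|D) / P(-|~D): small, less than 1

pretest_odds = prevalence / (1 - prevalence)  # P(D) / P(~D)
posttest_odds = dlr_pos * pretest_odds        # odds of disease after a + result

# dlr_pos is roughly 66.5: a positive result multiplies the odds of disease ~66-fold
print(dlr_pos, dlr_neg, posttest_odds)
```

Converting the post-test odds back to a probability, odds/(1+odds), recovers the roughly 6.2% P(D|+) computed earlier in the lesson, which is a useful consistency check on the two formulations.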

+- Class: text
+Output: Let's wrap up now with some basics.

- Class: text
Output: Two events, A and B, are independent if they have no effect on each other. Formally, P(A&B) = P(A)*P(B). It's easy to see that if A and B are independent, then P(A|B)=P(A). The definition is similar for random variables X and Y.
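A tiny enumeration (my example, not the lesson's) showing the P(A&B) = P(A)*P(B) definition at work for one roll of a fair die:

```python
from fractions import Fraction

# Sample space: one roll of a fair die
omega = set(range(1, 7))
A = {2, 4, 6}   # "the roll is even", P(A) = 1/2
B = {1, 2}      # "the roll is at most 2", P(B) = 1/3

def prob(event):
    # Probability under equally likely outcomes
    return Fraction(len(event & omega), len(omega))

# Independence check: P(A & B) equals P(A) * P(B) (both are 1/6)
print(prob(A & B), prob(A) * prob(B))
```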
