
Commit

Merge remote-tracking branch 'origin/statinf' into statinf
WilCrofter committed Nov 5, 2014
2 parents f5484e0 + feac301 commit 2477911
Showing 2 changed files with 17 additions and 11 deletions.
13 changes: 8 additions & 5 deletions Statistical_Inference/CommonDistros/lesson
@@ -7,7 +7,7 @@
Version: 2.2.0

- Class: text
-Output: "Common Distributions. (Slides for this and other Data Science courses may be found at github https://github.com/DataScienceSpecialization/courses/. If you care to use them, they must be downloaded as a zip file and viewed locally. This lesson corresponds to 06_Statistical_Inference/06_Conditional_Probability.)"
+Output: "Common Distributions. (Slides for this and other Data Science courses may be found at github https://github.com/DataScienceSpecialization/courses/. If you care to use them, they must be downloaded as a zip file and viewed locally. This lesson corresponds to 06_Statistical_Inference/06_CommonDistros.)"

- Class: mult_question
Output: Given the title of this lesson, what do you think it will cover?
@@ -17,7 +17,7 @@
Hint: Part of the title is an abbreviation for another word we've seen several times in earlier lessons.

- Class: text
-Output: The first distribution we'll examine is the Bernoulli and it's associated with experiments which have only 2 possible outcomes. These are also called (by people in the know) binary trials.
+Output: The first distribution we'll examine is the Bernoulli which is associated with experiments which have only 2 possible outcomes. These are also called (by people in the know) binary trials.

- Class: mult_question
Output: It might surprise you to learn that you've probably had experience with Bernoulli trials. Which of the following would be a Bernoulli trial?
@@ -40,7 +40,7 @@
AnswerTests: omnitest(correctVal='p^x * (1-p)^(1-x)')
Hint: When x=1, which of the given expressions yields p?

-- Class: cmd_question
+- Class: mult_question
Output: Recall the definition of the expectation of a random variable. Suppose we have a Bernoulli random variable and, as before, the probability it equals 1 (a success) is p and probability it equals 0 (a failure) is 1-p. What is its mean?
AnswerChoices: p; 1-p; p^2; p(1-p)
CorrectAnswer: p
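The hunk above asks for the mean of a Bernoulli variable with PMF p^x * (1-p)^(1-x). The lesson itself is in R, but as a quick sanity check, a minimal Python sketch (variable names are mine) confirming that the expectation works out to p:

```python
# Bernoulli PMF: P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}
def bernoulli_pmf(x, p):
    return p**x * (1 - p)**(1 - x)

p = 0.3  # illustrative success probability; any value in [0, 1] works

# E[X] = sum over outcomes of x * P(X = x), which collapses to p
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
print(mean)  # 0.3
```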
@@ -140,7 +140,7 @@
Output: The converse is also true. If Z is standard normal, i.e., Z ~ N(0,1), then the random variable X defined as X = mu + sigma*Z is normally distributed with mean mu and variance sigma^2, i.e., X ~ N(mu, sigma^2)

- Class: text
-Output: These formulae allow you to easily compute quantiles (and thus percentiles) for ANY normally distributed variable if you know its mean and variance. We'll show how to find the 97.5th percentile of a normal distribution with mean 2 and variance 4.
+Output: These formulae allow you to easily compute quantiles (and thus percentiles) for ANY normally distributed variable if you know its mean and variance. We'll show how to find the 97.5th percentile of a normal distribution with mean 3 and variance 4.

- Class: cmd_question
Output: Again, we can use R's qnorm function and simply specify the mean and standard deviation (the square root of the variance). Do this now. Find the 97.5th percentile of a normal distribution with mean 3 and standard deviation 2.
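For readers following the diff outside an R session, the qnorm call this question expects has a direct stdlib analogue in Python (`statistics.NormalDist`, Python 3.8+); a sketch that also checks the lesson's x = mu + sigma*z formula:

```python
from statistics import NormalDist

# 97.5th percentile of N(mean = 3, sd = 2), the analogue of
# qnorm(.975, mean = 3, sd = 2) in R
q = NormalDist(mu=3, sigma=2).inv_cdf(0.975)

# Equivalently, shift and scale the standard normal quantile,
# as the lesson's X = mu + sigma*Z formula says
z = NormalDist().inv_cdf(0.975)  # roughly 1.96
print(q, 3 + 2 * z)              # both roughly 6.92
```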
@@ -215,11 +215,14 @@
Hint: Type pbinom(5,1000,.01) at the R prompt.

- Class: cmd_question
-Output: Now use the function ppois with lambda equal to n*p to see if you get a similar result.
+Output: Now use the function ppois with quantile equal to 5 and lambda equal to n*p to see if you get a similar result.
CorrectAnswer: ppois(5,1000*.01)
AnswerTests: omnitest(correctExpr='ppois(5,1000*.01)')
Hint: Type ppois(5,1000*.01) at the R prompt.

+- Class: text
+Output: See how they're close? Pretty cool, right? This worked because n was large (1000) and p was small (.01).
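The closeness this added text points out can be verified without R; a stdlib-Python sketch comparing pbinom(5, 1000, .01) against ppois(5, 10), computing both CDFs from their definitions:

```python
from math import comb, exp, factorial

n, p, k = 1000, 0.01, 5

# Binomial CDF P(X <= 5) for X ~ Binomial(1000, .01), i.e. pbinom(5, 1000, .01)
binom_cdf = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Poisson CDF with lambda = n * p = 10, i.e. ppois(5, 10)
lam = n * p
pois_cdf = sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

# Roughly 0.0661 vs 0.0671: close because n is large and p is small
print(binom_cdf, pois_cdf)
```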

- Class: text
Output: Congrats! You've concluded this uncommon lesson on common distributions.

15 changes: 9 additions & 6 deletions Statistical_Inference/ConditionalProbability/lesson
@@ -56,7 +56,7 @@
Output: P(B|A) = P(B&A)/P(A) = P(A|B) * P(B)/P(A). This is a simple form of Bayes' Rule which relates the two conditional probabilities.

- Class: text
-Output: Suppose we don't know P(A) itself, but only know its conditional probabilities, that is, the probability that it occurs if B occurs and the probability that it occurs if B doesn't occur. These are P(A|B) and P(A|~B), respectively. We use ~B to represent not B or B complement.
+Output: Suppose we don't know P(A) itself, but only know its conditional probabilities, that is, the probability that it occurs if B occurs and the probability that it occurs if B doesn't occur. These are P(A|B) and P(A|~B), respectively. We use ~B to represent 'not B' or 'B complement'.

- Class: text
Output: We can then express P(A) = P(A|B) * P(B) + P(A|~B) * P(~B) and substitute this into the denominator of Bayes' Formula.
@@ -71,7 +71,7 @@
Output: Suppose we know the accuracy rates of the test for both the positive case (positive result when the patient has HIV) and negative (negative test result when the patient doesn't have HIV). These are referred to as test sensitivity and specificity, respectively.

- Class: mult_question
-Output: Let 'D' be the event that the patient has HIV, and let '+' indicate a positive test result and '-' a negative. What information do we know?
+Output: Let 'D' be the event that the patient has HIV, and let '+' indicate a positive test result and '-' a negative. What information do we know? Recall that we know the accuracy rates of the HIV test.
AnswerChoices: P(+|D) and P(-|~D); P(+|~D) and P(-|~D); P(+|~D) and P(-|D); P(+|D) and P(-|D)
CorrectAnswer: P(+|D) and P(-|~D)
AnswerTests: omnitest(correctVal='P(+|D) and P(-|~D)')
@@ -91,7 +91,7 @@
Output: We can use the prevalence of HIV in the patient's population as the value for P(D). Note that since P(~D)=1-P(D) and P(+|~D) = 1-P(-|~D) we can calculate P(D|+). In other words, we know values for all the terms on the right side of the equation. Let's do it!

- Class: cmd_question
-Output: Disease prevalence is .001. Test sensitivity is 99.7% and specificity is 98.5%. First compute the numerator, P(+|D)*P(D). (This is also part of the denominator.)
+Output: Disease prevalence is .001. Test sensitivity (+ result with disease) is 99.7% and specificity (- result without disease) is 98.5%. First compute the numerator, P(+|D)*P(D). (This is also part of the denominator.)
CorrectAnswer: .997*.001
AnswerTests: ANY_of_exprs('.997*.001','.001*.997')
Hint: Multiply the test sensitivity by the prevalence.
@@ -103,7 +103,7 @@
Hint: Multiply the complement of test specificity by the complement of prevalence.

- Class: cmd_question
-Output: Now put the pieces together to compute the probability that the patient has the disease given his positive test result.
+Output: Now put the pieces together to compute the probability that the patient has the disease given his positive test result, P(D|+). Plug your last two answers into the formula P(+|D) * P(D) / ( P(+|D) * P(D) + P(+|~D) * P(~D) ) to compute P(D|+).
CorrectAnswer: .000997/(.000997+.014985)
AnswerTests: equiv_val(.06238268)
Hint: Divide (.997*.001) by (.997*.001 + .015*.999)
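Pulling the cmd_question steps in this hunk together, a quick Python check of the full Bayes computation (variable names are mine, not the lesson's; the numbers are the lesson's .001 prevalence, 99.7% sensitivity, and 98.5% specificity):

```python
prevalence = 0.001    # P(D)
sensitivity = 0.997   # P(+|D)
specificity = 0.985   # P(-|~D)

numerator = sensitivity * prevalence              # P(+|D) * P(D) = .000997
false_pos = (1 - specificity) * (1 - prevalence)  # P(+|~D) * P(~D) = .014985
posterior = numerator / (numerator + false_pos)   # P(D|+) by Bayes' Rule

# Roughly 0.0624: even a positive result leaves only about a 6% chance of disease
print(posterior)
```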
@@ -140,7 +140,7 @@
Output: Now to likelihood ratios. Recall Bayes' Formula. P(D|+) = P(+|D) * P(D) / ( P(+|D) * P(D) + P(+|~D) * P(~D) ) and notice that if we replace all occurrences of 'D' with '~D', the denominator doesn't change. This means that if we formed a ratio of P(D|+) to P(~D|+) we'd get a much simpler expression (since the complicated denominators would cancel each other out). Like this....

- Class: text
-Output: P(D|+) / P(~D|+) = P(+|D) * P(D) / P(+|~D) * P(~D) = P(+|D)/P(+|~D) * P(D)/P(~D).
+Output: P(D|+) / P(~D|+) = P(+|D) * P(D) / (P(+|~D) * P(~D)) = P(+|D)/P(+|~D) * P(D)/P(~D).

- Class: mult_question
Output: The left side of the equation represents the post-test odds of disease given a positive test result. The equation says that the post-test odds of disease equals the pre-test odds of disease times
@@ -153,7 +153,10 @@
Output: In other words, a DLR_+ value equal to N indicates that the hypothesis of disease is N times more supported by the data than the hypothesis of no disease.

- Class: text
-Output: Taking the formula above and replacing the '+' signs with '-' yields a formula with the DLR_-. Specifically, P(D|-) / P(~D|-) = P(-|D) * P(D) / P(-|~D) * P(~D). This relates the decrease in the odds of the disease post negative test result to the odds of disease pre-test. Remember that we showed that DLR_- is small.
+Output: Taking the formula above and replacing the '+' signs with '-' yields a formula with the DLR_-. Specifically, P(D|-) / P(~D|-) = P(-|D) / P(-|~D) * P(D)/P(~D). This relates the DECREASE in the odds of the disease post negative test result to the odds of disease pre-test. Remember that we showed that DLR_- is small (less than 1).
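Using the same sensitivity and specificity as the lesson's HIV example, a hedged Python sketch of both diagnostic likelihood ratios and the post-test odds update (names are illustrative, not from the lesson):

```python
sensitivity = 0.997   # P(+|D)
specificity = 0.985   # P(-|~D)
prevalence = 0.001    # P(D)

dlr_pos = sensitivity / (1 - specificity)   # P(+|D) / P(+|~D): large
dlr_neg = (1 - sensitivity) / specificity   # P(-|D) / P(-|~D): small, less than 1

pretest_odds = prevalence / (1 - prevalence)  # P(D) / P(~D)
posttest_odds = dlr_pos * pretest_odds        # odds of disease after a + result

# dlr_pos is roughly 66.5: a positive result multiplies the odds of disease ~66-fold
print(dlr_pos, dlr_neg, posttest_odds)
```

Converting the post-test odds back to a probability, odds/(1+odds), recovers the roughly 6.2% P(D|+) computed earlier in the lesson, which is a useful consistency check on the two formulations.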

+- Class: text
+Output: Let's wrap up now with some basics.

- Class: text
Output: Two events, A and B, are independent if they have no effect on each other. Formally, P(A&B) = P(A)*P(B). It's easy to see that if A and B are independent, then P(A|B)=P(A). The definition is similar for random variables X and Y.
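A tiny enumeration (my example, not the lesson's) showing the P(A&B) = P(A)*P(B) definition at work for one roll of a fair die:

```python
from fractions import Fraction

# Sample space: one roll of a fair die
omega = set(range(1, 7))
A = {2, 4, 6}   # "the roll is even", P(A) = 1/2
B = {1, 2}      # "the roll is at most 2", P(B) = 1/3

def prob(event):
    # Probability under equally likely outcomes
    return Fraction(len(event & omega), len(omega))

# Independence check: P(A & B) equals P(A) * P(B) (both are 1/6)
print(prob(A & B), prob(A) * prob(B))
```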
