minor conflict (extra space)

Merge remote-tracking branch 'origin/statinf' into statinf

Conflicts:
	Regression_Models/Least_Squares_Estimation/initLesson.R
leighrostron · Nov 3, 2014 · a41091f
2 parents c6fbe18 + 56c4d77
commit a41091f

Showing 93 changed files with 6,865 additions and 213 deletions.
diff --git a/Getting_and_Cleaning_Data/Manipulating_Data_with_dplyr/lesson.yaml b/Getting_and_Cleaning_Data/Manipulating_Data_with_dplyr/lesson.yaml
@@ -40,13 +40,13 @@
   Hint: Use library(dplyr) to load the dplyr package.
 
 - Class: cmd_question
-  Output: It's important that you have dplyr version 0.2 or later. To confirm this, type packageVersion("dplyr").
+  Output: It's important that you have dplyr version 0.3 or later. To confirm this, type packageVersion("dplyr").
   CorrectAnswer: packageVersion("dplyr")
   AnswerTests: omnitest(correctExpr='packageVersion("dplyr")')
   Hint: Check what version of dplyr you have with packageVersion("dplyr").
 
 - Class: text
-  Output: If your dplyr version is not at least 0.2, then you should hit the Esc key now, reinstall dplyr, then resume this lesson where you left off.
+  Output: If your dplyr version is not at least 0.3, then you should hit the Esc key now, reinstall dplyr, then resume this lesson where you left off.
 
 - Class: cmd_question
   Output: "The first step of working with data in dplyr is to load the data into what the package authors call a 'data frame tbl' or 'tbl_df'. Use the following code to create a new tbl_df called cran: \n\ncran <- tbl_df(mydf)."
@@ -76,10 +76,13 @@
   Output: 'According to the "Introduction to dplyr" vignette written by the package authors, "The dplyr philosophy is to have small functions that each do one thing well." Specifically, dplyr supplies five ''verbs'' that cover all fundamental data manipulation tasks: select(), filter(), arrange(), mutate(), and summarize().'
 
 - Class: cmd_question
-  Output: Use ?manip to pull up the documentation for these core functions.
-  CorrectAnswer: ?manip
-  AnswerTests: omnitest(correctExpr='?manip')
-  Hint: ?manip will display the documentation for dplyr's five core data manipulation functions.
+  Output: Use ?select to pull up the documentation for the first of these core functions.
+  CorrectAnswer: ?select
+  AnswerTests: omnitest(correctExpr='?select')
+  Hint: ?select will display the documentation for dplyr's select() function.
+
+- Class: text
+  Output: Help files for the other functions are accessible in the same way.
 
 - Class: cmd_question
   Output: As may often be the case, particularly with larger datasets, we are only interested in some of the variables. Use select(cran, ip_id, package, country) to select only the ip_id, package, and country variables from the cran dataset.

diff --git a/R_Programming/Logic/lesson.yaml b/R_Programming/Logic/lesson.yaml
diff --git a/R_Programming/MANIFEST b/R_Programming/MANIFEST
@@ -4,6 +4,7 @@ Vectors
 Missing_Values
 Subsetting_Vectors
 Matrices_and_Data_Frames
+Logic
 lapply_and_sapply
 vapply_and_tapply
 Looking_at_Data

diff --git a/R_Programming/Sequences_of_Numbers/lesson.yaml b/R_Programming/Sequences_of_Numbers/lesson.yaml
@@ -80,7 +80,7 @@
 - Class: cmd_question
   Output: Or maybe we don't care what the increment is and we just want a sequence
     of 30 numbers between 5 and 10. seq(5, 10, length=30) does the trick. Give it
-    shot now and store the result in a new variable called my_seq.
+    a shot now and store the result in a new variable called my_seq.
   CorrectAnswer: my_seq <- seq(5, 10, length=30)
   AnswerTests: omnitest(correctExpr='my_seq <- seq(5, 10, length=30)')
   Hint: 'You''re using the same function here, but changing its arguments for different

diff --git a/R_Programming/Simulation/lesson.yaml b/R_Programming/Simulation/lesson.yaml
@@ -94,7 +94,7 @@
   Hint: Call rbinom() with n = 1, size = 100, and prob = 0.7.
 
 - Class: cmd_question
-  Output: Equivilently, if we want to see all of the 0s and 1s, we can request 100 observations, each of size 1, with success probability of 0.7. Give it a try, assigning the result to a new variable called flips2.
+  Output: Equivalently, if we want to see all of the 0s and 1s, we can request 100 observations, each of size 1, with success probability of 0.7. Give it a try, assigning the result to a new variable called flips2.
   CorrectAnswer: flips2 <- rbinom(100, size = 1, prob = 0.7)
   AnswerTests: match_call('flips2 <- rbinom(100, size = 1, prob = 0.7)')
   Hint: Call rbinom() with n = 100, size = 1, and prob = 0.7 and assign the result to flips2.
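
Note on the question above: the two rbinom() calls are equivalent in distribution, not in the particular numbers drawn. A minimal sketch, assuming an arbitrary seed and a helper variable `heads` that is not part of the lesson:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# One observation: the number of heads in 100 flips of a coin with P(heads) = 0.7.
flips <- rbinom(1, size = 100, prob = 0.7)

# One hundred observations of a single flip each: a vector of 0s and 1s.
flips2 <- rbinom(100, size = 1, prob = 0.7)

# Summing the individual flips gives a head count with the same
# distribution as `flips`, although the two draws are independent.
heads <- sum(flips2)
```

Because `flips` and `flips2` come from separate random draws, `flips` and `heads` will usually differ, but both are counts between 0 and 100 centred near 70.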

diff --git a/R_Programming/vapply_and_tapply/lesson.yaml b/R_Programming/vapply_and_tapply/lesson.yaml
@@ -97,7 +97,7 @@
   Hint: You can see a summary of populations for countries with and without the color red on their flag with tapply(flags$population, flags$red, summary).
 
 - Class: mult_question  
-  Output: What is the median population (in millions) for counties *without* the color red on their flag?
+  Output: What is the median population (in millions) for countries *without* the color red on their flag?
   AnswerChoices: 9.0; 4.0; 27.6; 3.0; 22.1; 0.0
   CorrectAnswer: 3.0
   AnswerTests: omnitest(correctVal= '3.0')

diff --git a/R_Programming_Alt/Logic/lesson.yaml b/R_Programming_Alt/Logic/lesson.yaml
diff --git a/R_Programming_Alt/MANIFEST b/R_Programming_Alt/MANIFEST
@@ -4,6 +4,7 @@ Vectors
 Missing_Values
 Subsetting_Vectors
 Matrices_and_Data_Frames
+Logic
 lapply_and_sapply
 vapply_and_tapply
 Looking_at_Data

diff --git a/R_Programming_Alt/Simulation/lesson.yaml b/R_Programming_Alt/Simulation/lesson.yaml
@@ -94,7 +94,7 @@
   Hint: Call rbinom() with n = 1, size = 100, and prob = 0.7.
 
 - Class: cmd_question
-  Output: Equivilently, if we want to see all of the 0s and 1s, we can request 100 observations, each of size 1, with success probability of 0.7. Give it a try, assigning the result to a new variable called flips2.
+  Output: Equivalently, if we want to see all of the 0s and 1s, we can request 100 observations, each of size 1, with success probability of 0.7. Give it a try, assigning the result to a new variable called flips2.
   CorrectAnswer: flips2 <- rbinom(100, size = 1, prob = 0.7)
   AnswerTests: match_call('flips2 <- rbinom(100, size = 1, prob = 0.7)')
   Hint: Call rbinom() with n = 100, size = 1, and prob = 0.7 and assign the result to flips2.

diff --git a/R_Programming_Alt/vapply_and_tapply/lesson.yaml b/R_Programming_Alt/vapply_and_tapply/lesson.yaml
@@ -97,7 +97,7 @@
   Hint: You can see a summary of populations for countries with and without the color red on their flag with tapply(flags$population, flags$red, summary).
 
 - Class: mult_question  
-  Output: What is the median population (in millions) for counties *without* the color red on their flag?
+  Output: What is the median population (in millions) for countries *without* the color red on their flag?
   AnswerChoices: 9.0; 4.0; 27.6; 3.0; 22.1; 0.0
   CorrectAnswer: 3.0
   AnswerTests: omnitest(correctVal= '3.0')

diff --git a/Regression_Models/Count_Outcomes/lesson.yaml b/Regression_Models/Count_Outcomes/lesson.yaml
@@ -34,7 +34,7 @@
   FigureType: new
 
 - Class: figure
-  Output: "In a Poisson regression, the log of lambda is assumed to be a linear function of the predictors. Since we will try to model the growth of visits to a web site, the log of lambda will be a linear function the date: log(lambda) = b0 + b1*date. This implies that the average number of hits per day, lambda, is exponential in the date: lambda = exp(b0)*exp(b1)^date. Exponential growth is also suggested by the smooth, black curve drawn though the data. Thus exp(b1) would represent the percentage by which visits grow per day."
+  Output: "In a Poisson regression, the log of lambda is assumed to be a linear function of the predictors. Since we will try to model the growth of visits to a web site, the log of lambda will be a linear function of the date: log(lambda) = b0 + b1*date. This implies that the average number of hits per day, lambda, is exponential in the date: lambda = exp(b0)*exp(b1)^date. Exponential growth is also suggested by the smooth, black curve drawn though the data. Thus exp(b1) would represent the percentage by which visits grow per day."
   Figure: hits.R
   FigureType: new
 
@@ -103,7 +103,7 @@
   FigureType: new
 
 - Class: cmd_question
-  Output: "In the figure, the maximum number of visits occurred in late 2012. Visits from the Simply Statistics blog were also at their maximum that day. To find the exact date we can use which.max(hits[,'visits']. Do this now."
+  Output: "In the figure, the maximum number of visits occurred in late 2012. Visits from the Simply Statistics blog were also at their maximum that day. To find the exact date we can use which.max(hits[,'visits']). Do this now."
   CorrectAnswer: which.max(hits[,'visits'])
   AnswerTests: omnitest("which.max(hits[,'visits'])", 704)
   Hint: Type which.max(hits[,'visits']) or something equivalent.
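
Note on the Poisson regression text above: the relation log(lambda) = b0 + b1*date can be checked on simulated data. The sketch below is illustrative only; the coefficients, the seed, and the variable names `visits`, `growth_factor`, and `pct_growth` are assumptions, and it does not use the lesson's `hits` data. It also makes explicit that exp(b1) is the multiplicative daily growth factor, so the percentage growth per day is 100 * (exp(b1) - 1).

```r
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical parameters: log(lambda) = b0 + b1 * date
b0 <- 1
b1 <- 0.01
date <- 0:199

# Mean daily count grows exponentially: lambda = exp(b0) * exp(b1)^date
lambda <- exp(b0 + b1 * date)
visits <- rpois(length(date), lambda)

# A Poisson GLM uses a log link by default, so it fits log(lambda) linearly in date.
fit <- glm(visits ~ date, family = poisson)

# exp(b1) is the factor by which the expected count is multiplied each day.
growth_factor <- unname(exp(coef(fit)["date"]))

# Expressed as a percentage increase per day.
pct_growth <- 100 * (growth_factor - 1)
```

With these made-up parameters the fitted `growth_factor` should land close to exp(0.01), i.e. roughly a 1% increase per day.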

diff --git a/Statistical_Inference/Asymptotics/ACComp.R b/Statistical_Inference/Asymptotics/ACComp.R
@@ -0,0 +1,11 @@
+ACCompar <- function(n){
+ num <- 1:n 
+ den <- n
+ nn <- num+2
+ nd <- den+4
+ nf <- nn/nd
+ of <- num/den
+ scor <- nf<of
+ print(scor)
+ sum(scor)
+}
diff --git a/Statistical_Inference/Asymptotics/ACDemo.R b/Statistical_Inference/Asymptotics/ACDemo.R
@@ -0,0 +1,9 @@
+n <- 20; pvals <- seq(.1, .9, by = .05); nosim <- 1000
+coverage <- sapply(pvals, function(p){
+  phats <- (rbinom(nosim, prob = p, size = n) + 2) / (n + 4)
+  ll <- phats - qnorm(.975) * sqrt(phats * (1 - phats) / n)
+  ul <- phats + qnorm(.975) * sqrt(phats * (1 - phats) / n)
+  mean(ll < p & ul > p)
+})
+g <- ggplot(data.frame(pvals, coverage), aes(x = pvals, y = coverage)) + geom_line(size = 2) + geom_hline(yintercept = 0.95) + ylim(.75, 1.0)
+print(g)
diff --git a/Statistical_Inference/Asymptotics/PoisDemo.R b/Statistical_Inference/Asymptotics/PoisDemo.R
@@ -0,0 +1,10 @@
+lambdavals <- seq(0.005, 0.10, by = .01); nosim <- 1000
+t <- 100
+coverage <- sapply(lambdavals, function(lambda){
+  lhats <- rpois(nosim, lambda = lambda * t) / t
+  ll <- lhats - qnorm(.975) * sqrt(lhats / t)
+  ul <- lhats + qnorm(.975) * sqrt(lhats / t)
+  mean(ll < lambda & ul > lambda)
+})
+g <- ggplot(data.frame(lambdavals, coverage), aes(x = lambdavals, y = coverage)) + geom_line(size = 2) + geom_hline(yintercept = 0.95)+ylim(0, 1.0)
+print(g)
diff --git a/Statistical_Inference/Asymptotics/PoisDemoImpr.R b/Statistical_Inference/Asymptotics/PoisDemoImpr.R
@@ -0,0 +1,10 @@
+lambdavals <- seq(0.005, 0.10, by = .01); nosim <- 1000
+t <- 1000
+coverage <- sapply(lambdavals, function(lambda){
+  lhats <- rpois(nosim, lambda = lambda * t) / t
+  ll <- lhats - qnorm(.975) * sqrt(lhats / t)
+  ul <- lhats + qnorm(.975) * sqrt(lhats / t)
+  mean(ll < lambda & ul > lambda)
+})
+g <- ggplot(data.frame(lambdavals, coverage), aes(x = lambdavals, y = coverage)) + geom_line(size = 2) + geom_hline(yintercept = 0.95)+ylim(0, 1.0)
+print(g)
diff --git a/Statistical_Inference/Asymptotics/WaldDemo.R b/Statistical_Inference/Asymptotics/WaldDemo.R
@@ -0,0 +1,14 @@
+n <- 20
+nosim <- 30
+mywald <- function(p){
+  phats <- rbinom(nosim, prob = p, size = n) / n
+  ll <- phats - qnorm(.975) * sqrt(phats * (1 - phats) / n)
+  ul <- phats + qnorm(.975) * sqrt(phats * (1 - phats) / n)
+  print("Here are the p\' values")
+  print(phats)
+  print("Here are the lower")
+  print(ll)
+  print("Here are the upper")
+  print(ul)
+  mean(ll < p & ul > p)
+}
diff --git a/Statistical_Inference/Asymptotics/WaldFail.R b/Statistical_Inference/Asymptotics/WaldFail.R
@@ -0,0 +1,9 @@
+n <- 20; pvals <- seq(.1, .9, by = .05); nosim <- 1000
+coverage <- sapply(pvals, function(p){
+  phats <- rbinom(nosim, prob = p, size = n) / n
+  ll <- phats - qnorm(.975) * sqrt(phats * (1 - phats) / n)
+  ul <- phats + qnorm(.975) * sqrt(phats * (1 - phats) / n)
+  mean(ll < p & ul > p)
+})
+g <- ggplot(data.frame(pvals, coverage), aes(x = pvals, y = coverage)) + geom_line(size = 2) + geom_hline(yintercept = 0.95) + ylim(.75, 1.0)
+print(g)
diff --git a/Statistical_Inference/Asymptotics/WaldPass.R b/Statistical_Inference/Asymptotics/WaldPass.R
@@ -0,0 +1,9 @@
+n <- 100; pvals <- seq(.1, .9, by = .05); nosim <- 1000
+coverage <- sapply(pvals, function(p){
+  phats <- rbinom(nosim, prob = p, size = n) / n
+  ll <- phats - qnorm(.975) * sqrt(phats * (1 - phats) / n)
+  ul <- phats + qnorm(.975) * sqrt(phats * (1 - phats) / n)
+  mean(ll < p & ul > p)
+})
+g <- ggplot(data.frame(pvals, coverage), aes(x = pvals, y = coverage)) + geom_line(size = 2) + geom_hline(yintercept = 0.95) + ylim(.75, 1.0)
+print(g)
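
Note on the Wald and Agresti-Coull demos above: the coverage comparison can be reproduced without ggplot2 at a single value of p. This is a hedged sketch, not the lesson's code; the helper names `coverage`, `wald_coverage`, and `ac_coverage` and the seed are assumptions. One deliberate difference: ACDemo.R keeps n in the standard error, whereas the usual textbook Agresti-Coull interval uses n + 4 there, which is what this sketch does.

```r
set.seed(2014)  # arbitrary seed for reproducibility
n <- 20; nosim <- 1000; z <- qnorm(.975)

# Fraction of simulated intervals (centre +/- z * se) that contain the true p.
coverage <- function(centre, se, p) {
  mean(centre - z * se < p & centre + z * se > p)
}

# Wald interval: centred at the sample proportion x / n.
wald_coverage <- function(p) {
  phat <- rbinom(nosim, size = n, prob = p) / n
  coverage(phat, sqrt(phat * (1 - phat) / n), p)
}

# Agresti-Coull interval: add two successes and two failures before estimating.
ac_coverage <- function(p) {
  ptilde <- (rbinom(nosim, size = n, prob = p) + 2) / (n + 4)
  coverage(ptilde, sqrt(ptilde * (1 - ptilde) / (n + 4)), p)
}

wald <- wald_coverage(0.1)
ac <- ac_coverage(0.1)
```

At n = 20 and p = 0.1 the Wald interval falls well short of the nominal 95% (it can never cover p when no successes are observed), while the adjusted interval stays close to 95%.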
diff --git a/Statistical_Inference/Asymptotics/cltDice.R b/Statistical_Inference/Asymptotics/cltDice.R
@@ -11,4 +11,5 @@ dat <- data.frame(
   size = factor(rep(c(10, 20, 30), rep(nosim, 3))))
 g <- ggplot(dat, aes(x = x, fill = size)) + geom_histogram(alpha = .20, binwidth=.3, colour = "black", aes(y = ..density..)) 
 g <- g + stat_function(fun = dnorm, size = 2)
-g + facet_grid(. ~ size)
+g <- g + facet_grid(. ~ size)
+print(g)
diff --git a/Statistical_Inference/Asymptotics/cltFairCoin.R b/Statistical_Inference/Asymptotics/cltFairCoin.R
@@ -11,4 +11,5 @@ dat <- data.frame(
   size = factor(rep(c(10, 20, 30), rep(nosim, 3))))
 g <- ggplot(dat, aes(x = x, fill = size)) + geom_histogram(binwidth=.3, colour = "black", aes(y = ..density..)) 
 g <- g + stat_function(fun = dnorm, size = 2)
-g + facet_grid(. ~ size)
+g <- g + facet_grid(. ~ size)
+print(g)
diff --git a/Statistical_Inference/Asymptotics/cltUnfairCoin.R b/Statistical_Inference/Asymptotics/cltUnfairCoin.R
@@ -11,4 +11,5 @@ dat <- data.frame(
   size = factor(rep(c(10, 20, 30), rep(nosim, 3))))
 g <- ggplot(dat, aes(x = x, fill = size)) + geom_histogram(binwidth=.3, colour = "black", aes(y = ..density..)) 
 g <- g + stat_function(fun = dnorm, size = 2)
-g + facet_grid(. ~ size)
+g <- g + facet_grid(. ~ size)
+print(g)
diff --git a/Statistical_Inference/Asymptotics/dependson.txt b/Statistical_Inference/Asymptotics/dependson.txt
@@ -0,0 +1 @@
+ggplot2
diff --git a/Statistical_Inference/Asymptotics/initLesson.R b/Statistical_Inference/Asymptotics/initLesson.R
@@ -1,3 +1,4 @@
+library(ggplot2)
 # Put initialization code in this file.
 coinPlot <- function(n){
   means <- cumsum(sample(0 : 1, n , replace = TRUE)) / (1  : n)