Commit

bring rprog up to speed with tweaks to rprog alt
ncarchedi committed Aug 20, 2014
1 parent 8dff55b commit d491fea
Showing 5 changed files with 44 additions and 16 deletions.
4 changes: 2 additions & 2 deletions R_Programming/Dates_and_Times/lesson.yaml
@@ -10,7 +10,7 @@
Output: R has a special way of representing dates and times, which can be helpful if you're working with data that show how something changes over time (i.e. time-series data) or if your data contain some other temporal information, like dates of birth.

- Class: text
Output: Dates are represented by the 'Date' class and times are represented by the 'POSIXct' and 'POSIXlt' classes. Internally, dates are stored as the number of days since 1970-01-01 and times are stored as either the number of seconds since 1970-01-01 (for 'POSIXct') or a list of seconds, minutes, hours, etc (for 'POSIXlt').
Output: Dates are represented by the 'Date' class and times are represented by the 'POSIXct' and 'POSIXlt' classes. Internally, dates are stored as the number of days since 1970-01-01 and times are stored as either the number of seconds since 1970-01-01 (for 'POSIXct') or a list of seconds, minutes, hours, etc. (for 'POSIXlt').

- Class: cmd_question
Output: Let's start by using d1 <- Sys.Date() to get the current date and store it in the variable d1. (That's the letter 'd' and the number 1.)
@@ -97,7 +97,7 @@
Hint: Type t2 to view its contents.

- Class: cmd_question
Output: The printed format of t2 is identical to that of t1 (except for the slight difference in time). Now unclass() t2 to see how it is different internally.
Output: The printed format of t2 is identical to that of t1. Now unclass() t2 to see how it is different internally.
CorrectAnswer: unclass(t2)
AnswerTests: omnitest(correctExpr='unclass(t2)')
Hint: Use unclass(t2) to view its internal structure.
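
The internal representations described in this lesson can be checked with a small sketch. The fixed dates and the names `d`, `t_ct`, and `t_lt` are made up for illustration (the lesson itself uses Sys.Date() and Sys.time()), and the time zone is pinned to UTC as an assumption so the numbers are reproducible.

```r
# A Date is stored as the number of days since 1970-01-01.
d <- as.Date("1970-01-02")
days_since_epoch <- unclass(d)

# A POSIXct time is stored as the number of seconds since 1970-01-01 (UTC here).
t_ct <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
secs_since_epoch <- as.numeric(t_ct)

# A POSIXlt time is stored as a list of components (sec, min, hour, ...).
t_lt <- as.POSIXlt(t_ct)
lt_fields <- names(unclass(t_lt))

print(days_since_epoch)
print(secs_since_epoch)
print(lt_fields)
print(t_lt$min)
```

Printing the two objects without unclass() shows a similar human-readable format, which is why the lesson asks for unclass() to expose the difference.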
4 changes: 2 additions & 2 deletions R_Programming/Looking_at_Data/lesson.yaml
@@ -28,7 +28,7 @@
Output: It's very common for data to be stored in a data frame. It is the default class for data read into R using functions like read.csv() and read.table(), which you'll learn about in another lesson.

- Class: cmd_question
Output: Since the dataset is stored in a data frame, we know it is rectangular. In other words, it has two dimensions (rows and columns) and fits neatly into a table or spreadsheet. Now use dim(plants) to see exactly how many rows and columns we're dealing with.
Output: Since the dataset is stored in a data frame, we know it is rectangular. In other words, it has two dimensions (rows and columns) and fits neatly into a table or spreadsheet. Use dim(plants) to see exactly how many rows and columns we're dealing with.
CorrectAnswer: dim(plants)
AnswerTests: omnitest(correctExpr='dim(plants)')
Hint: Use dim(plants) to see the exact dimensions of the plants dataset.
@@ -97,7 +97,7 @@
Output: For categorical variables (called 'factor' variables in R), summary() displays the number of times each value (or 'level') occurs in the data. For example, each value of Scientific_Name only appears once, since it is unique to a specific plant. In contrast, the summary for Duration (also a factor variable) tells us that our dataset contains 3031 Perennial plants, 682 Annual plants, etc.

- Class: cmd_question
Output: You can see that R truncated the summary for Active_Growth_Period by including a catch-all category called 'Other'. Since it is a categorical/factor variable, we can see how many times each value occurs in the data with table(plants$Active_Growth_Period).
Output: You can see that R truncated the summary for Active_Growth_Period by including a catch-all category called 'Other'. Since it is a categorical/factor variable, we can see how many times each value actually occurs in the data with table(plants$Active_Growth_Period).
CorrectAnswer: table(plants$Active_Growth_Period)
AnswerTests: omnitest(correctExpr='table(plants$Active_Growth_Period)')
Hint: table(plants$Active_Growth_Period) will display counts for each level of the factor variable.
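
The difference between a truncated summary() and a full table() can be sketched on a small made-up factor. The `plants` dataset is not reproduced here, so `growth` is a hypothetical stand-in, and `maxsum = 3` is chosen only to force the catch-all category on so few levels.

```r
# Hypothetical factor standing in for plants$Active_Growth_Period.
growth <- factor(c("Spring", "Spring", "Spring", "Summer", "Summer",
                   "Fall", "Winter", "Year Round"))

# With more levels than maxsum allows, summary() keeps the most frequent
# levels and lumps the rest into a catch-all "(Other)" entry.
truncated <- summary(growth, maxsum = 3)

# table() always reports a count for every level.
full <- table(growth)

print(truncated)
print(full)
```

In the lesson the truncation comes from summary() on the whole data frame, which applies its own, smaller default limit to factor columns.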
9 changes: 6 additions & 3 deletions R_Programming/Simulation/lesson.yaml
@@ -39,11 +39,14 @@
AnswerTests: match_call('sample(1:20, 10)')
Hint: Type sample(1:20, 10) to sample 10 numbers between 1 and 20, without replacement.

- Class: text
Output: Since the last command sampled without replacement, no number appears more than once in the output.

- Class: cmd_question
Output: LETTERS is a predefined variable in R containing a vector of all 26 letters of the English alphabet. Take a look at it now.
CorrectAnswer: LETTERS
AnswerTests: omnitest(correctExpr='LETTERS')
Hint: Just type LETTERS to print its contents to the console.
Hint: Type LETTERS to print its contents to the console.

- Class: cmd_question
Output: The sample() function can also be used to permute, or rearrange, the elements of a vector. For example, try sample(LETTERS) to permute all 26 letters of the English alphabet.
@@ -91,7 +94,7 @@
Hint: Call rbinom() with n = 1, size = 100, and prob = 0.7.

- Class: cmd_question
Output: Equivalently, if we want to see all of the 0s and 1s, we can perform 100 observations, each of size 1, with success probability of 0.7. Give it a try, assigning the result to a new variable called flips2.
Output: Equivalently, if we want to see all of the 0s and 1s, we can request 100 observations, each of size 1, with success probability of 0.7. Give it a try, assigning the result to a new variable called flips2.
CorrectAnswer: flips2 <- rbinom(100, size = 1, prob = 0.7)
AnswerTests: match_call('flips2 <- rbinom(100, size = 1, prob = 0.7)')
Hint: Call rbinom() with n = 100, size = 1, and prob = 0.7 and assign the result to flips2.
@@ -127,7 +130,7 @@
Hint: Use rnorm(10, mean = 100, sd = 25) to generate 10 random numbers from a normal distribution with mean 100 and standard deviation 25.

- Class: text
Output: Finally, what if we want to simulate 100 groups of random numbers, each containing 5 values generated from a Poisson distribution with mean 10? Let's start with one group of 5 numbers, then I'll show you how to repeat the operation 100 times in a convenient and compact way.
Output: Finally, what if we want to simulate 100 *groups* of random numbers, each containing 5 values generated from a Poisson distribution with mean 10? Let's start with one group of 5 numbers, then I'll show you how to repeat the operation 100 times in a convenient and compact way.

- Class: cmd_question
Output: Generate 5 random values from a Poisson distribution with mean 10. Check out the documentation for rpois() if you need help.
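
The simulation calls touched in this file can be tried together in a short sketch. The seed and the variable names `draw` and `groups` are illustrative assumptions. The hunk above does not show which function the lesson uses to repeat the Poisson draw, so replicate() appears here only as one plausible way to do it.

```r
set.seed(42)  # arbitrary seed, only to make the sketch repeatable

# Sampling without replacement: no value can appear twice.
draw <- sample(1:20, 10)
print(anyDuplicated(draw))  # 0 means no duplicates were found

# One binomial observation of size 100 versus 100 observations of size 1.
flips <- rbinom(1, size = 100, prob = 0.7)
flips2 <- rbinom(100, size = 1, prob = 0.7)
print(flips)
print(sum(flips2))

# One group of 5 Poisson values with mean 10, then 100 such groups.
one_group <- rpois(5, 10)
groups <- replicate(100, rpois(5, 10))
print(dim(groups))  # one column per group
```

Summing flips2 gives a count of successes comparable to flips, since both describe 100 trials with success probability 0.7.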
9 changes: 6 additions & 3 deletions R_Programming/lapply_and_sapply/lesson.yaml
@@ -100,7 +100,7 @@
Output: Therefore, if we want to know the total number of countries (in our dataset) with, for example, the color orange on their flag, we can just add up all of the 1s and 0s in the 'orange' column. Try sum(flags$orange) to see this.
CorrectAnswer: sum(flags$orange)
AnswerTests: omnitest(correctExpr='sum(flags$orange)')
Hint: Use sum(flags$orange) to add up all of the 1s in the 'orange' column.
Hint: Use sum(flags$orange) to add up all of the 1s and 0s in the 'orange' column.

- Class: text
Output: Now we want to repeat this operation for each of the colors recorded in the dataset.
@@ -157,7 +157,7 @@
Output: The range() function returns the minimum and maximum of its first argument, which should be a numeric vector. Use lapply() to apply the range function to each column of flag_shapes. Don't worry about storing the result in a new variable. By now, we know that lapply() always returns a list.
CorrectAnswer: lapply(flag_shapes, range)
AnswerTests: omnitest(correctExpr='lapply(flag_shapes, range)')
Hint: Try lapply(flag_shapes, range).
Hint: Try lapply(flag_shapes, range) to apply the range() function to each column of flag_shapes.

- Class: cmd_question
Output: Do the same operation, but using sapply() and store the result in a variable called range_mat.
@@ -171,14 +171,17 @@
AnswerTests: any_of_exprs('shape_mat', 'print(shape_mat)')
Hint: Type shape_mat to view its contents.

- Class: text
Output: Each column of shape_mat gives the minimum (row 1) and maximum (row 2) number of times its respective shape appears in different flags.

- Class: cmd_question
Output: Use the class() function to confirm that shape_mat is a matrix.
CorrectAnswer: class(shape_mat)
AnswerTests: omnitest(correctExpr='class(shape_mat)')
Hint: class(shape_mat) returns the class of shape_mat.

- Class: text
Output: As we've seen, sapply() always attempts to simplify the result given by lapply(). It has been successful in doing so for each of the examples we've looked at so far. Let's look at an example that where sapply() can't figure out how to simplify the result and thus returns a list, just like lapply().
Output: As we've seen, sapply() always attempts to simplify the result given by lapply(). It has been successful in doing so for each of the examples we've looked at so far. Let's look at an example where sapply() can't figure out how to simplify the result and thus returns a list, no different from lapply().

- Class: cmd_question
Output: When given a vector, the unique() function returns a vector with all duplicate elements removed. In other words, unique() returns a vector of only the 'unique' elements. To see how it works, try unique(c(3, 4, 5, 5, 5, 6, 6)).
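
How sapply() simplifies, and when it cannot, can be sketched on a tiny made-up data frame. The `flags` dataset is not included here, so `shapes` and its values are hypothetical stand-ins for flag_shapes.

```r
# Hypothetical stand-in for the flag_shapes columns.
shapes <- data.frame(circles = c(0, 1, 4),
                     crosses = c(0, 0, 2),
                     stars   = c(1, 50, 3))

# lapply() always returns a list: one min/max pair per column.
range_list <- lapply(shapes, range)

# sapply() can simplify here because every result has length 2,
# giving a matrix with minimums in row 1 and maximums in row 2.
range_mat <- sapply(shapes, range)

# unique() returns a different number of values per column,
# so sapply() cannot simplify and falls back to a list.
unique_vals <- sapply(shapes, unique)

print(range_mat)
print(is.list(unique_vals))
```
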
34 changes: 28 additions & 6 deletions R_Programming/vapply_and_tapply/lesson.yaml
@@ -60,6 +60,12 @@
- Class: text
Output: As a data analyst, you'll often wish to split your data up into groups based on the value of some variable, then apply a function to the members of each group. The next function we'll look at, tapply(), does exactly that.

- Class: cmd_question
Output: Use ?tapply to pull up the documentation.
CorrectAnswer: ?tapply
AnswerTests: any_of_exprs('?tapply', 'help(tapply)')
Hint: Pull up the help file with ?tapply.

- Class: cmd_question
Output: The 'landmass' variable in our dataset takes on integer values between 1 and 6, each of which represents a different part of the world. Use table(flags$landmass) to see how many flags/countries fall into each group.
CorrectAnswer: table(flags$landmass)
@@ -72,27 +78,43 @@
AnswerTests: omnitest(correctExpr="table(flags$animate)")
Hint: Use table(flags$animate) to see how many flags contain an animate image.

- Class: text
Output: This tells us that 39 flags contain an animate object (animate = 1) and 155 do not (animate = 0).

- Class: cmd_question
Output: If you take the arithmetic mean of a bunch of 0s and 1s, you get the proportion of 1s. Use tapply(flags$animate, flags$landmass, mean) to apply the mean function to the 'animate' variable separately for each of the six landmass groups, thus giving us the proportion of flags containing an animate image WITHIN each landmass group.
CorrectAnswer: tapply(flags$animate, flags$landmass, mean)
AnswerTests: omnitest(correctExpr="tapply(flags$animate, flags$landmass, mean)")
Hint: tapply(flags$animate, flags$landmass, mean) will tell us the proportion of flags containing an animate image within each landmass group.

- Class: text
Output: The first landmass group (flags$landmass == 1) corresponds to North America and contains the highest proportion of flags with an animate image (0.4194).
Output: The first landmass group (landmass = 1) corresponds to North America and contains the highest proportion of flags with an animate image (0.4194).

- Class: cmd_question
Output: Similarly, we can look at a summary of population values (in round millions) for countries with and without the color red on their flag with tapply(flags$population, flags$red, summary).
CorrectAnswer: tapply(flags$population, flags$red, summary)
AnswerTests: omnitest(correctExpr="tapply(flags$population, flags$red, summary)")
Hint: You can see a summary of populations for countries with and without the color red on their flag with tapply(flags$population, flags$red, summary).

- Class: mult_question
Output: What is the median population (in millions) for countries *without* the color red on their flag?
AnswerChoices: 9.0; 4.0; 27.6; 3.0; 22.1; 0.0
CorrectAnswer: 3.0
AnswerTests: omnitest(correctVal= '3.0')
Hint: Use your result from the last question. Recall that red = 0 means that the color red is NOT present on a country's flag.

- Class: cmd_question
Output: Lastly, use the same approach to look at a summary of population values for each of the six landmasses.
CorrectAnswer: tapply(flags$population, flags$landmass, summary)
AnswerTests: omnitest(correctExpr="tapply(flags$population, flags$landmass, summary)")
Hint: You can see a summary of populations within each landmass group with tapply(flags$population, flags$landmass, summary).
Hint: "You can see a summary of populations for each of the six landmasses by calling tapply() with three arguments: flags$population, flags$landmass, and summary."

- Class: mult_question
Output: What is the maximum population (in millions) for the fourth landmass group (Africa)?
AnswerChoices: 56; 1010; 119; 5
CorrectAnswer: 56
AnswerTests: omnitest(correctVal= '56')
AnswerChoices: 56.00; 1010.0; 119.0; 5.00; 157.00
CorrectAnswer: 56.00
AnswerTests: omnitest(correctVal= '56.00')
Hint: Use your result from the last question.

- Class: text
Output: In this lesson, you learned how to use vapply() as a safer (and possibly faster) alternative to sapply(), which is most helpful when writing your own functions. You also learned how to use tapply() to split your data into groups based on the value of some variable, then apply a function to each group. These functions will come in handy on your quest to become a better data analyst.
Output: In this lesson, you learned how to use vapply() as a safer alternative to sapply(), which is most helpful when writing your own functions. You also learned how to use tapply() to split your data into groups based on the value of some variable, then apply a function to each group. These functions will come in handy on your quest to become a better data analyst.
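
The split-and-apply pattern from the tapply() questions can be sketched on a few made-up rows. `flags_demo` and all of its values are hypothetical and are not taken from the real flags dataset.

```r
# Hypothetical miniature of the flags data.
flags_demo <- data.frame(
  landmass   = c(1, 1, 1, 2, 2, 3),
  animate    = c(1, 0, 1, 0, 0, 1),
  population = c(10, 3, 56, 4, 0, 119)
)

# The mean of 0s and 1s within each group is the proportion of 1s.
props <- tapply(flags_demo$animate, flags_demo$landmass, mean)

# A function that returns several values per group yields one summary per group.
pop_summary <- tapply(flags_demo$population, flags_demo$landmass, summary)
max_pop_group1 <- pop_summary[["1"]][["Max."]]

print(props)
print(pop_summary)
print(max_pop_group1)
```

Reading a single statistic out of the per-group summaries, as in the last line, mirrors what the multiple-choice questions ask the learner to do by eye.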
