diff --git a/R_Programming/Dates_and_Times/lesson.yaml b/R_Programming/Dates_and_Times/lesson.yaml index ee32821e..9870faab 100644 --- a/R_Programming/Dates_and_Times/lesson.yaml +++ b/R_Programming/Dates_and_Times/lesson.yaml @@ -10,7 +10,7 @@ Output: R has a special way of representing dates and times, which can be helpful if you're working with data that show how something changes over time (i.e. time-series data) or if your data contain some other temporal information, like dates of birth. - Class: text - Output: Dates are represented by the 'Date' class and times are represented by the 'POSIXct' and 'POSIXlt' classes. Internally, dates are stored as the number of days since 1970-01-01 and times are stored as either the number of seconds since 1970-01-01 (for 'POSIXct') or a list of seconds, minutes, hours, etc (for 'POSIXlt'). + Output: Dates are represented by the 'Date' class and times are represented by the 'POSIXct' and 'POSIXlt' classes. Internally, dates are stored as the number of days since 1970-01-01 and times are stored as either the number of seconds since 1970-01-01 (for 'POSIXct') or a list of seconds, minutes, hours, etc. (for 'POSIXlt'). - Class: cmd_question Output: Let's start by using d1 <- Sys.Date() to get the current date and store it in the variable d1. (That's the letter 'd' and the number 1.) @@ -97,7 +97,7 @@ Hint: Type t2 to view its contents. - Class: cmd_question - Output: The printed format of t2 is identical to that of t1 (except for the slight difference in time). Now unclass() t2 to see how it is different internally. + Output: The printed format of t2 is identical to that of t1. Now unclass() t2 to see how it is different internally. CorrectAnswer: unclass(t2) AnswerTests: omnitest(correctExpr='unclass(t2)') Hint: Use unclass(t2) to view its internal structure. 
diff --git a/R_Programming/Looking_at_Data/lesson.yaml b/R_Programming/Looking_at_Data/lesson.yaml index 75479939..d9b2c78a 100644 --- a/R_Programming/Looking_at_Data/lesson.yaml +++ b/R_Programming/Looking_at_Data/lesson.yaml @@ -28,7 +28,7 @@ Output: It's very common for data to be stored in a data frame. It is the default class for data read into R using functions like read.csv() and read.table(), which you'll learn about in another lesson. - Class: cmd_question - Output: Since the dataset is stored in a data frame, we know it is rectangular. In other words, it has two dimensions (rows and columns) and fits neatly into a table or spreadsheet. Now use dim(plants) to see exactly how many rows and columns we're dealing with. + Output: Since the dataset is stored in a data frame, we know it is rectangular. In other words, it has two dimensions (rows and columns) and fits neatly into a table or spreadsheet. Use dim(plants) to see exactly how many rows and columns we're dealing with. CorrectAnswer: dim(plants) AnswerTests: omnitest(correctExpr='dim(plants)') Hint: Use dim(plants) to see the exact dimensions of the plants dataset. @@ -97,7 +97,7 @@ Output: For categorical variables (called 'factor' variables in R), summary() displays the number of times each value (or 'level') occurs in the data. For example, each value of Scientific_Name only appears once, since it is unique to a specific plant. In contrast, the summary for Duration (also a factor variable) tells us that our dataset contains 3031 Perennial plants, 682 Annual plants, etc. - Class: cmd_question - Output: You can see that R truncated the summary for Active_Growth_Period by including a catch-all category called 'Other'. Since it is a categorical/factor variable, we can see how many times each value occurs in the data with table(plants$Active_Growth_Period). + Output: You can see that R truncated the summary for Active_Growth_Period by including a catch-all category called 'Other'. 
Since it is a categorical/factor variable, we can see how many times each value actually occurs in the data with table(plants$Active_Growth_Period). CorrectAnswer: table(plants$Active_Growth_Period) AnswerTests: omnitest(correctExpr='table(plants$Active_Growth_Period)') Hint: table(plants$Active_Growth_Period) will display counts for each level of the factor variable. diff --git a/R_Programming/Simulation/lesson.yaml b/R_Programming/Simulation/lesson.yaml index 0146a2d9..a6ddb04d 100644 --- a/R_Programming/Simulation/lesson.yaml +++ b/R_Programming/Simulation/lesson.yaml @@ -39,11 +39,14 @@ AnswerTests: match_call('sample(1:20, 10)') Hint: Type sample(1:20, 10) to sample 10 numbers between 1 and 20, without replacement. +- Class: text + Output: Since the last command sampled without replacement, no number appears more than once in the output. + - Class: cmd_question Output: LETTERS is a predefined variable in R containing a vector of all 26 letters of the English alphabet. Take a look at it now. CorrectAnswer: LETTERS AnswerTests: omnitest(correctExpr='LETTERS') - Hint: Just type LETTERS to print its contents to the console. + Hint: Type LETTERS to print its contents to the console. - Class: cmd_question Output: The sample() function can also be used to permute, or rearrange, the elements of a vector. For example, try sample(LETTERS) to permute all 26 letters of the English alphabet. @@ -91,7 +94,7 @@ Hint: Call rbinom() with n = 1, size = 100, and prob = 0.7. - Class: cmd_question - Output: Equivilently, if we want to see all of the 0s and 1s, we can perform 100 observations, each of size 1, with success probability of 0.7. Give it a try, assigning the result to a new variable called flips2. + Output: Equivalently, if we want to see all of the 0s and 1s, we can request 100 observations, each of size 1, with success probability of 0.7. Give it a try, assigning the result to a new variable called flips2. 
CorrectAnswer: flips2 <- rbinom(100, size = 1, prob = 0.7) AnswerTests: match_call('flips2 <- rbinom(100, size = 1, prob = 0.7)') Hint: Call rbinom() with n = 100, size = 1, and prob = 0.7 and assign the result to flips2. @@ -127,7 +130,7 @@ Hint: Use rnorm(10, mean = 100, sd = 25) to generate 10 random numbers from a normal distribution with mean 100 and standard deviation 25. - Class: text - Output: Finally, what if we want to simulate 100 groups of random numbers, each containing 5 values generated from a Poisson distribution with mean 10? Let's start with one group of 5 numbers, then I'll show you how to repeat the operation 100 times in a convenient and compact way. + Output: Finally, what if we want to simulate 100 *groups* of random numbers, each containing 5 values generated from a Poisson distribution with mean 10? Let's start with one group of 5 numbers, then I'll show you how to repeat the operation 100 times in a convenient and compact way. - Class: cmd_question Output: Generate 5 random values from a Poisson distribution with mean 10. Check out the documentation for rpois() if you need help. diff --git a/R_Programming/lapply_and_sapply/lesson.yaml b/R_Programming/lapply_and_sapply/lesson.yaml index c43492b6..cedcaf05 100644 --- a/R_Programming/lapply_and_sapply/lesson.yaml +++ b/R_Programming/lapply_and_sapply/lesson.yaml @@ -100,7 +100,7 @@ Output: Therefore, if we want to know the total number of countries (in our dataset) with, for example, the color orange on their flag, we can just add up all of the 1s and 0s in the 'orange' column. Try sum(flags$orange) to see this. CorrectAnswer: sum(flags$orange) AnswerTests: omnitest(correctExpr='sum(flags$orange)') - Hint: Use sum(flags$orange) to add up all of the 1s in the 'orange' column. + Hint: Use sum(flags$orange) to add up all of the 1s and 0s in the 'orange' column. - Class: text Output: Now we want to repeat this operation for each of the colors recorded in the dataset. 
@@ -157,7 +157,7 @@ Output: The range() function returns the minimum and maximum of its first argument, which should be a numeric vector. Use lapply() to apply the range function to each column of flag_shapes. Don't worry about storing the result in a new variable. By now, we know that lapply() always returns a list. CorrectAnswer: lapply(flag_shapes, range) AnswerTests: omnitest(correctExpr='lapply(flag_shapes, range)') - Hint: Try lapply(flag_shapes, range). + Hint: Try lapply(flag_shapes, range) to apply the range() function to each column of flag_shapes. - Class: cmd_question Output: Do the same operation, but using sapply() and store the result in a variable called range_mat. @@ -171,6 +171,9 @@ AnswerTests: any_of_exprs('shape_mat', 'print(shape_mat)') Hint: Type shape_mat to view its contents. +- Class: text + Output: Each column of shape_mat gives the minimum (row 1) and maximum (row 2) number of times its respective shape appears in different flags. + - Class: cmd_question Output: Use the class() function to confirm that shape_mat is a matrix. CorrectAnswer: class(shape_mat) @@ -178,7 +181,7 @@ Hint: class(shape_mat) returns the class of shape_mat. - Class: text - Output: As we've seen, sapply() always attempts to simplify the result given by lapply(). It has been successful in doing so for each of the examples we've looked at so far. Let's look at an example that where sapply() can't figure out how to simplify the result and thus returns a list, just like lapply(). + Output: As we've seen, sapply() always attempts to simplify the result given by lapply(). It has been successful in doing so for each of the examples we've looked at so far. Let's look at an example where sapply() can't figure out how to simplify the result and thus returns a list, no different from lapply(). - Class: cmd_question Output: When given a vector, the unique() function returns a vector with all duplicate elements removed. 
In other words, unique() returns a vector of only the 'unique' elements. To see how it works, try unique(c(3, 4, 5, 5, 5, 6, 6)). diff --git a/R_Programming/vapply_and_tapply/lesson.yaml b/R_Programming/vapply_and_tapply/lesson.yaml index b4262e07..4aea928c 100644 --- a/R_Programming/vapply_and_tapply/lesson.yaml +++ b/R_Programming/vapply_and_tapply/lesson.yaml @@ -60,6 +60,12 @@ - Class: text Output: As a data analyst, you'll often wish to split your data up into groups based on the value of some variable, then apply a function to the members of each group. The next function we'll look at, tapply(), does exactly that. +- Class: cmd_question + Output: Use ?tapply to pull up the documentation. + CorrectAnswer: ?tapply + AnswerTests: any_of_exprs('?tapply', 'help(tapply)') + Hint: Pull up the help file with ?tapply. + - Class: cmd_question Output: The 'landmass' variable in our dataset takes on integer values between 1 and 6, each of which represents a different part of the world. Use table(flags$landmass) to see how many flags/countries fall into each group. CorrectAnswer: table(flags$landmass) @@ -72,6 +78,9 @@ AnswerTests: omnitest(correctExpr="table(flags$animate)") Hint: Use table(flags$animate) to see how many flags contain an animate image. +- Class: text + Output: This tells us that 39 flags contain an animate object (animate = 1) and 155 do not (animate = 0). + - Class: cmd_question Output: If you take the arithmetic mean of a bunch of 0s and 1s, you get the proportion of 1s. Use tapply(flags$animate, flags$landmass, mean) to apply the mean function to the 'animate' variable separately for each of the six landmass groups, thus giving us the proportion of flags containing an animate image WITHIN each landmass group. CorrectAnswer: tapply(flags$animate, flags$landmass, mean) @@ -79,20 +88,33 @@ Hint: tapply(flags$animate, flags$landmass, mean) will tell us the proportion of flags containing an animate image within each landmass group. 
- Class: text - Output: The first landmass group (flags$landmass == 1) corresponds to North America and contains the highest proportion of flags with an animate image (0.4194). + Output: The first landmass group (landmass = 1) corresponds to North America and contains the highest proportion of flags with an animate image (0.4194). - Class: cmd_question Output: Similarly, we can look at a summary of population values (in round millions) for countries with and without the color red on their flag with tapply(flags$population, flags$red, summary). + CorrectAnswer: tapply(flags$population, flags$red, summary) + AnswerTests: omnitest(correctExpr="tapply(flags$population, flags$red, summary)") + Hint: You can see a summary of populations for countries with and without the color red on their flag with tapply(flags$population, flags$red, summary). + +- Class: mult_question + Output: What is the median population (in millions) for countries *without* the color red on their flag? + AnswerChoices: 9.0; 4.0; 27.6; 3.0; 22.1; 0.0 + CorrectAnswer: 3.0 + AnswerTests: omnitest(correctVal= '3.0') + Hint: Use your result from the last question. Recall that red = 0 means that the color red is NOT present on a country's flag. + +- Class: cmd_question + Output: Lastly, use the same approach to look at a summary of population values for each of the six landmasses. CorrectAnswer: tapply(flags$population, flags$landmass, summary) AnswerTests: omnitest(correctExpr="tapply(flags$population, flags$landmass, summary)") - Hint: You can see a summary of populations within each landmass group with tapply(flags$population, flags$landmass, summary). + Hint: "You can see a summary of populations for each of the six landmasses by calling tapply() with three arguments: flags$population, flags$landmass, and summary." - Class: mult_question Output: What is the maximum population (in millions) for the fourth landmass group (Africa)? 
- AnswerChoices: 56; 1010; 119; 5 - CorrectAnswer: 56 - AnswerTests: omnitest(correctVal= '56') + AnswerChoices: 56.00; 1010.0; 119.0; 5.00; 157.00 + CorrectAnswer: 56.00 + AnswerTests: omnitest(correctVal= '56.00') Hint: Use your result from the last question. - Class: text - Output: In this lesson, you learned how to use vapply() as a safer (and possibly faster) alternative to sapply(), which is most helpful when writing your own functions. You also learned how to use tapply() to split your data into groups based on the value of some variable, then apply a function to each group. These functions will come in handy on your quest to become a better data analyst. + Output: In this lesson, you learned how to use vapply() as a safer alternative to sapply(), which is most helpful when writing your own functions. You also learned how to use tapply() to split your data into groups based on the value of some variable, then apply a function to each group. These functions will come in handy on your quest to become a better data analyst.
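Reviewer note: the tapply() prompts touched above rely on the fact that the mean of a vector of 0s and 1s is the proportion of 1s. The sketch below is only an illustration of that idea on a small made-up data frame; `toy` and its values are hypothetical and are not the lesson's `flags` dataset.

```r
# Hypothetical toy data (assumption): NOT the swirl 'flags' dataset.
toy <- data.frame(
  landmass = c(1, 1, 1, 2, 2, 3),  # grouping variable
  animate  = c(1, 0, 1, 0, 0, 1)   # 0/1 indicator
)

# Split 'animate' by 'landmass' and take the mean of each group.
# Because the values are 0s and 1s, each mean is the proportion of 1s.
props <- tapply(toy$animate, toy$landmass, mean)
print(props)

# Counts per group, for comparison with the proportions above.
print(table(toy$landmass))
```

With these made-up values, group 1 has two 1s out of three rows, group 2 has none, and group 3 has a single row equal to 1, so the per-group means should come out as 2/3, 0, and 1.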