add c7 assessments
elmoallistair committed Apr 19, 2021
1 parent ae13352 commit 9ae6e82
Showing 5 changed files with 728 additions and 0 deletions.
@@ -0,0 +1,174 @@
## Weekly challenge 3

Latest Submission Grade: 100%

 

### Question 1

A data analyst is creating a new data frame. Their dataset has dates, currency, and text strings. What characteristic of data frames is this an instance of?

* **Data stored can be many different types**
* Columns should contain the same number of items
* Columns should be named
* Variables should be named

> A data frame is a collection of columns. Characteristics of data frames include: all columns should be named, data stored can be many different types, and all columns should contain the same number of items. The dataset in question has a variety of data types, which is related to the idea that data stored can be many different types.
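
As a minimal sketch of these characteristics, the base R snippet below builds a small data frame with a date column, a currency-like numeric column, and a text column. The `orders` data frame and its values are invented for illustration.

```r
# Toy data frame; names and values are hypothetical.
orders <- data.frame(
  order_date = as.Date(c("2021-04-01", "2021-04-02", "2021-04-03")),
  amount_usd = c(19.99, 5.00, 42.50),
  customer   = c("Ana", "Ben", "Chloe"),
  stringsAsFactors = FALSE
)

# Each named column keeps its own type, and all columns have the same length.
column_types <- sapply(orders, function(col) class(col)[1])
print(column_types)
str(orders)
```
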
 

### Question 2

A data analyst is considering using tibbles instead of basic data frames. What are some of the limitations of tibbles? Select all that apply.

* Tibbles can overload a console
* **Tibbles can never create row names**
* **Tibbles won’t automatically change the names of variables**
* **Tibbles can never change the input type of the data**

> Tibbles are useful when working with large datasets because they make printing easier. But tibbles can never change the input type of the data, create row names, or change the names of variables.
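
The difference can be seen with a small comparison, assuming the `tibble` package is installed. The column names and values are made up for the example.

```r
library(tibble)

# Base data frame: a non-syntactic column name is rewritten by default.
df <- data.frame(`unit price` = c(1.5, 2.0), item = c("a", "b"))
names(df)

# Tibble: the name is kept exactly as typed, the character column stays
# character, and no row names are created.
tb <- tibble(`unit price` = c(1.5, 2.0), item = c("a", "b"))
names(tb)
class(tb$item)
has_rownames(tb)
```
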
 

### Question 3

A data analyst is working with a large data frame. It contains so many columns that they don’t all fit on the screen at once. The analyst wants a quick list of all of the column names to get a better idea of what is in their data. What function should they use?

* **colnames()**
* head()
* str()
* mutate()

> The `colnames()` function returns a character vector of all the column names in a data frame for easy reference.
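
For example, using the ToothGrowth dataset that ships with R (it is not very wide, but the call is the same for any data frame):

```r
# colnames() returns the column names as a character vector.
cols <- colnames(ToothGrowth)
print(cols)
```
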
 

### Question 4

A data analyst is working with the ToothGrowth dataset in R. What code chunk will allow them to get a quick summary of the dataset?

* **`glimpse(ToothGrowth)`**
* `min(ToothGrowth)`
* `separate(ToothGrowth)`
* `colnames(ToothGrowth)`

> The code chunk is `glimpse(ToothGrowth)`. The `glimpse()` function provides the analyst with a quick summary of the data in the ToothGrowth dataset. This function shows what all of the column names are and how many rows there are.
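
A short sketch of the call, assuming the `dplyr` package is installed (`glimpse()` is not part of base R, while ToothGrowth is a built-in dataset):

```r
library(dplyr)

# Prints the number of rows and columns, then one line per column
# with its type and the first few values.
glimpse(ToothGrowth)
```
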
 

### Question 5

A data analyst is working with the penguins dataset. What code chunk does the analyst write to make sure all the column names are unique and consistent and contain only letters, numbers, and underscores?

* `drop_na(penguins)`
* **`clean_names(penguins)`**
* `rename(penguins)`
* `select(penguins)`

> The code chunk is `clean_names(penguins)`. The `clean_names()` function ensures that there are only letters, numbers, and underscores in the names used in the data frame.
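
The penguins column names are already clean, so the effect is easier to see on deliberately messy names. This sketch assumes the `janitor` package (which provides `clean_names()`) is installed; the `messy` data frame is hypothetical.

```r
library(janitor)

# check.names = FALSE keeps the spaces and capitals in the column names.
messy <- data.frame(
  `First Name` = c("Ana", "Ben"),
  `Last Name`  = c("Diaz", "Okoro"),
  check.names = FALSE
)

cleaned <- clean_names(messy)
names(cleaned)
```
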
 

### Question 6

A data analyst is working with the penguins data. They write the following code:

`penguins %>%`

The variable species includes three penguin species: Adelie, Chinstrap, and Gentoo. What code chunk does the analyst add to create a data frame that only includes the Gentoo species?

* `filter(Gentoo == species)`
* `filter(species <- "Gentoo")`
* **`filter(species == "Gentoo")`**
* `filter(species == "Adelie")`

> The code chunk is `filter(species == "Gentoo")`. The `filter()` function lets the analyst keep only the rows that meet a condition. Two equal signs (`==`) mean "exactly equal to," so this condition keeps only the rows for Gentoo penguins. The assignment operator `<-` would not perform a comparison.
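
Put together, the full pipeline might look like the sketch below. It assumes the `dplyr` and `palmerpenguins` packages are installed; the name `gentoo` is chosen here for illustration.

```r
library(dplyr)
library(palmerpenguins)

# Keep only the rows whose species is exactly "Gentoo".
gentoo <- penguins %>%
  filter(species == "Gentoo")

unique(gentoo$species)
```
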
&nbsp;

### Question 7

A data analyst is working with the penguins dataset. They write the following code:

```
penguins %>%
group_by(species) %>%
```

What code chunk does the analyst add to find the mean value for the variable body_mass_g?

* `summarize(=body_mass_g)`
* `summarize(max(body_mass_g))`
* **`summarize(mean(body_mass_g))`**
* `summarize(body_mass_g(mean))`

> The code chunk is `summarize(mean(body_mass_g))`. The `summarize()` function gives high-level information about a dataset. Because the data is grouped by species first, the mean is calculated once for each species.
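
A runnable version of the pipeline is sketched below, assuming `dplyr` and `palmerpenguins` are installed. The `na.rm = TRUE` argument and the column name `mean_body_mass_g` are additions for this example and are not part of the quiz answer: `body_mass_g` contains missing values, and `mean()` returns `NA` for any group that includes one.

```r
library(dplyr)
library(palmerpenguins)

mass_by_species <- penguins %>%
  group_by(species) %>%
  # na.rm = TRUE drops missing body masses before averaging.
  summarize(mean_body_mass_g = mean(body_mass_g, na.rm = TRUE))

print(mass_by_species)
```
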
&nbsp;

### Question 8

A data analyst is working with a data frame named salary_data. They want to create a new column named wages that includes data from the rate column multiplied by 40. What code chunk lets the analyst create the wages column?

* `mutate(salary_data, rate = wages * 40)`
* `mutate(wages = rate * 40)`
* **`mutate(salary_data, wages = rate * 40)`**
* `mutate(salary_data, wages = rate + 40)`

> The code chunk is `mutate(salary_data, wages = rate * 40)`. The analyst can use the `mutate()` function to create a new column called `wages` that includes data from the `rate` column multiplied by 40. The `mutate()` function can create a new column without affecting any existing columns.
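
A small sketch with invented values, assuming `dplyr` is installed. The contents of `salary_data` are hypothetical; only the column names come from the question.

```r
library(dplyr)

# Hypothetical hourly rates.
salary_data <- data.frame(employee = c("A", "B"), rate = c(20, 32.5))

# Adds wages and leaves employee and rate unchanged.
with_wages <- mutate(salary_data, wages = rate * 40)
print(with_wages)
```
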
&nbsp;

### Question 9

A data analyst is working with a data frame named customers. It has separate columns for area code (area_code) and phone number (phone_num). The analyst wants to combine the two columns into a single column called phone_number, with the area code and phone number separated by a hyphen. What code chunk lets the analyst create the phone_number column?

* `unite(customers, area_code, phone_num, sep="-")`
* `unite(customers, "phone_number", area_code, phone_num)`
* `unite(customers, "phone_number", area_code, sep="-")`
* **`unite(customers, "phone_number", area_code, phone_num, sep="-")`**

> The code chunk `unite(customers, "phone_number", area_code, phone_num, sep="-")` lets the analyst create the phone_number column. The `unite()` function lets the analyst combine the area code and phone number data into a single column. In the parentheses of the function, the analyst writes the name of the data frame, then the name of the new column in quotation marks, followed by the names of the two columns they want to combine. Finally, the argument `sep="-"` places a hyphen between the area code and phone number data in the phone_number column.
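
The call can be tried on a toy `customers` data frame, assuming the `tidyr` package is installed. The phone numbers are placeholders.

```r
library(tidyr)

# Placeholder values; only the column names come from the question.
customers <- data.frame(
  area_code = c("212", "415"),
  phone_num = c("5550100", "5550199"),
  stringsAsFactors = FALSE
)

combined <- unite(customers, "phone_number", area_code, phone_num, sep = "-")
print(combined)
```

By default `unite()` drops the two input columns, so `combined` contains only `phone_number`.
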
&nbsp;

### Question 10

A data analyst wants to summarize their data with the sd(), cor(), and mean() functions. What kind of measures are these?

* **Statistical**
* Numerical
* Summary
* Standard

> Standard deviation, correlation, mean, maximum, and minimum are statistical measures which can be used to summarize data.
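
These measures are all available in base R. The sketch below applies several of them to the `len` column of the built-in ToothGrowth dataset; the choice of column is arbitrary.

```r
len <- ToothGrowth$len

# Collect several statistical measures in one named vector.
stats <- c(mean = mean(len), sd = sd(len), min = min(len), max = max(len))
print(stats)
```
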
&nbsp;

### Question 11

In R, which statistical measure demonstrates how strong the relationship is between two variables?

* Standard deviation
* **Correlation**
* Average
* Maximum

> Correlation measures how strong the relationship between two variables is. In R, it is calculated with the `cor()` function.
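
As an illustration, `cor()` takes two numeric vectors of equal length and returns a value between -1 and 1. The first call below uses made-up vectors with a perfect linear relationship; the second uses two columns of the built-in ToothGrowth dataset.

```r
# A perfectly linear relationship gives a correlation of 1.
x <- c(1, 2, 3, 4, 5)
perfect <- cor(x, 2 * x)

# Correlation between tooth length and dose in ToothGrowth.
r <- cor(ToothGrowth$len, ToothGrowth$dose)

print(c(perfect = perfect, len_vs_dose = r))
```
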
&nbsp;

### Question 12

A data analyst is studying weather data. They write the following code chunk:

`bias(actual_temp, predicted_temp)`

What will this code chunk calculate?

* The minimum difference between the actual and predicted values
* The maximum difference between the actual and predicted values
* **The average difference between the actual and predicted values**
* The total average of the values

> The bias() function can be used to calculate the average amount a predicted outcome and actual outcome differ in order to determine if the data model is biased.
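
`bias()` is not part of base R, and the source does not say which package provides it. The sketch below therefore does not call it; it computes the quantity the explanation describes, the average of the element-wise differences, in base R. The temperature values are invented.

```r
# Hypothetical temperatures.
actual_temp    <- c(70, 72, 68)
predicted_temp <- c(69, 70, 68)

# Average difference between actual and predicted values.
# A result near 0 suggests the predictions are not systematically off.
avg_diff <- mean(actual_temp - predicted_temp)
print(avg_diff)
```
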
@@ -0,0 +1,162 @@
## Weekly challenge 4

Latest Submission Grade: 100%

&nbsp;

### Question 1

Which of the following are benefits of using ggplot2? Select all that apply.

* Automatically clean data before creating a plot
* **Easily add layers to your plot**
* **Combine data manipulation and visualization**
* **Customize the look and feel of your plot**

> The benefits of using ggplot2 include easily adding layers to your plot, customizing the look and feel of your plot, and combining data manipulation and visualization.
&nbsp;

### Question 2

In ggplot2, what symbol do you use to add layers to your plot?

* The equal sign (=)
* The ampersand symbol (&)
* The pipe operator (%>%)
* **The plus sign (+)**

> In ggplot2, you use the plus sign (+) to add layers to your plot.
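
A brief sketch of layering, assuming `ggplot2` and `palmerpenguins` are installed. Each `+` adds one more layer or component to the plot; the title text is made up.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(data = penguins) +                                       # data
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) + # points
  labs(title = "Flipper length vs. body mass")                        # title

print(p)
```

The `+` goes at the end of a line, not at the start of the next one, so R knows the expression continues.
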
&nbsp;

### Question 3

A data analyst creates a plot using the following code chunk:

```
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```

Which of the following represents a variable in the code chunk? Select all that apply.

* **`body_mass_g`**
* `x`
* **`flipper_length_mm`**
* `y`

> The two variables in the code are flipper_length_mm and body_mass_g. The two variables are part of the penguins dataset. The aesthetic x maps the variable flipper_length_mm to the x-axis of the plot. The aesthetic y maps the variable body_mass_g to the y-axis of the plot.
&nbsp;

### Question 4

A data analyst uses the aes() function to define the connection between their data and the plots in their visualization. What argument is used to refer to matching up a specific variable in your data set with a specific aesthetic?

* Faceting
* **Mapping**
* Jittering
* Annotating

> Mapping is an argument that matches up a specific variable in your data set with a specific aesthetic. You use the aes() function to define the mapping between your data and your plot.
&nbsp;

### Question 5

A data analyst is working with the penguins data. The analyst creates a scatterplot with the following code:

```
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, alpha = species))
```

What does the alpha aesthetic do to the appearance of the points on the plot?

* **Makes some points on the plot more transparent**
* Makes the points on the plot more colorful
* Makes the points on the plot smaller
* Makes the points on the plot larger

> The alpha aesthetic makes some points on a plot more transparent, or see-through, than others.
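
The sketch below contrasts mapping `alpha` to a variable inside `aes()` with setting one fixed transparency outside it. It assumes `ggplot2` and `palmerpenguins` are installed; the value 0.4 is arbitrary. ggplot2 may warn that alpha is not advised for a discrete variable such as `species`.

```r
library(ggplot2)
library(palmerpenguins)

# Inside aes(): transparency varies by species.
p_mapped <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           alpha = species))

# Outside aes(): every point gets the same transparency.
p_fixed <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g),
             alpha = 0.4)
```
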
&nbsp;

### Question 6

You are working with the penguins dataset. You create a scatterplot with the following code chunk:

```
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```

How do you change the second line of code to map the aesthetic size to the variable species?

* `geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, species = size)`
* **`geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, size = species))`**
* `geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, species + size)`
* `geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, size + species))`

> You change the second line of code to `geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, size = species))` to map the aesthetic size to the variable species. Inside the parentheses of the aes() function, add a comma after y = body_mass_g to add a new aesthetic attribute, then write size = species to map the aesthetic size to the variable species. The data points for each of the three penguin species will now appear in different sizes.
&nbsp;

### Question 7

Fill in the blank: The _____ creates a scatterplot and then adds a small amount of random noise to each point in the plot to make the points easier to find.

* geom_bar() function
* **geom_jitter() function**
* geom_smooth() function
* geom_point() function

> The `geom_jitter()` function creates a scatterplot and then adds a small amount of random noise to each point in the plot to make the points easier to find.
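
A minimal sketch, assuming `ggplot2` and `palmerpenguins` are installed. The `width` and `height` values, which control the amount of noise, are arbitrary choices for this example.

```r
library(ggplot2)
library(palmerpenguins)

# Same mapping as a geom_point() scatterplot, with small random offsets
# added so overlapping points are easier to tell apart.
p_jitter <- ggplot(data = penguins) +
  geom_jitter(mapping = aes(x = flipper_length_mm, y = body_mass_g),
              width = 0.5, height = 20)

print(p_jitter)
```
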
&nbsp;

### Question 8

You have created a plot based on data in the diamonds dataset. What code chunk can be added to your existing plot to create wrap around facets based on the variable *color*?

* **`facet_wrap(~color)`**
* `facet_wrap(color)`
* `facet_wrap(color~)`
* `facet(~color)`

> The code chunk is `facet_wrap(~color)`. Inside the parentheses of the facet_wrap() function, type a tilde symbol (~) followed by the name of the variable you want to facet.
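
For instance, with the `diamonds` dataset that ships with `ggplot2`. The base scatterplot of `carat` against `price` is an assumption made for this sketch; the question does not say what the existing plot shows.

```r
library(ggplot2)

# One panel per value of the color variable.
p_facets <- ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price)) +
  facet_wrap(~color)

print(p_facets)
```
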
&nbsp;

### Question 9

A data analyst uses the annotate() function to create a text label for a plot. Which attributes of the text can the analyst change by adding code to the argument of the annotate() function? Select all that apply.

* **Change the size of the text**
* **Change the font style of the text**
* **Change the color of the text**
* Change the text into a title for the plot

> By adding code to the argument of the annotate() function, the analyst can change the font style, color, and size of the text.
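
A sketch of such a label, assuming `ggplot2` and `palmerpenguins` are installed. The label text, position, and styling values are illustrative only.

```r
library(ggplot2)
library(palmerpenguins)

p_label <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  # color, fontface, and size change the look of the text label.
  annotate("text", x = 220, y = 3500, label = "Gentoos are the largest",
           color = "purple", fontface = "bold", size = 4.5)

print(p_label)
```
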

&nbsp;

### Question 10

You are working with the penguins dataset. You create a scatterplot with the following lines of code:

```
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
```

What code chunk do you add to the third line to save your plot as a jpeg file with "penguins" as the file name?

* `ggsave(penguins)`
* **`ggsave("penguins.jpeg")`**
* `ggsave(penguins.jpeg)`
* `ggsave("jpeg.penguins")`

> You add the code chunk `ggsave("penguins.jpeg")` to save your plot as a jpeg file with "penguins" as the file name. Inside the parentheses of the ggsave() function, type a quotation mark followed by the file name (penguins), then a period, then the type of file (jpeg), then a closing quotation mark.
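
The sketch below saves the plot with `ggsave()` called as its own statement, assuming `ggplot2` and `palmerpenguins` are installed. Chaining `ggsave()` onto a plot with `+`, as the question's code implies, reportedly no longer works in recent ggplot2 releases, so the separate call is the safer form. The temporary directory and the width and height are choices made for this example.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

# Written to a temporary directory so the example leaves no stray files.
out_file <- file.path(tempdir(), "penguins.jpeg")
ggsave(out_file, plot = p, width = 6, height = 4)

file.exists(out_file)
```

Without the `plot` argument, `ggsave()` saves the last plot that was displayed.
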