Typo + grammatical fixes + issue triage (hadley#1217)
* Fix ex wording + grammatical, closes hadley#1209

* Suppress warnings, closes hadley#1210

* Update screenshot, closes hadley#1211

* Grammatical

* Typos + grammatical

* Update workflow-basics.qmd

* Update workflow-basics.qmd

* Update workflow-basics.qmd

* Update workflow-help.qmd

* Update workflow-pipes.qmd
mine-cetinkaya-rundel authored Jan 5, 2023
1 parent e3b8211 commit b4bde71
Showing 8 changed files with 85 additions and 84 deletions.
20 changes: 10 additions & 10 deletions data-tidy.qmd
@@ -19,14 +19,14 @@ In this chapter, you will learn a consistent way to organize your data in R usin
 Getting your data into this format requires some work up front, but that work pays off in the long term.
 Once you have tidy data and the tidy tools provided by packages in the tidyverse, you will spend much less time munging data from one representation to another, allowing you to spend more time on the data questions you care about.
 
-In this chapter, you'll first learn the definition of tidy data and see it applied to simple toy dataset.
-Then we'll dive into the main tool you'll use for tidying data: pivoting.
-Pivoting allows you to change the form of your data, without changing any of the values.
-We'll finish up with a discussion of usefully untidy data, and how you can create it if needed.
+In this chapter, you'll first learn the definition of tidy data and see it applied to a simple toy dataset.
+Then we'll dive into the primary tool you'll use for tidying data: pivoting.
+Pivoting allows you to change the form of your data without changing any of the values.
+We'll finish with a discussion of usefully untidy data and how you can create it if needed.
 
 ### Prerequisites
 
-In this chapter we'll focus on tidyr, a package that provides a bunch of tools to help tidy up your messy datasets.
+In this chapter, we'll focus on tidyr, a package that provides a bunch of tools to help tidy up your messy datasets.
 tidyr is a member of the core tidyverse.
 
 ```{r}
@@ -41,7 +41,7 @@ From this chapter on, we'll suppress the loading message from `library(tidyverse
 ## Tidy data {#sec-tidy-data}
 
 You can represent the same underlying data in multiple ways.
-The example below shows the same data organised in four different ways.
+The example below shows the same data organized in four different ways.
 Each dataset shows the same values of four variables: *country*, *year*, *population*, and *cases* of TB (tuberculosis), but each dataset organizes the values in a different way.
 
 <!-- TODO redraw as tables -->
@@ -62,7 +62,7 @@ One of them, `table1`, will be much easier to work with inside the tidyverse bec
 There are three interrelated rules that make a dataset tidy:
 
 1. Each variable is a column; each column is a variable.
-2. Each observation is row; each row is an observation.
+2. Each observation is a row; each row is an observation.
 3. Each value is a cell; each cell is a single value.
 
 @fig-tidy-structure shows the rules visually.
@@ -88,17 +88,17 @@ There are two main advantages:
 1. There's a general advantage to picking one consistent way of storing data.
 If you have a consistent data structure, it's easier to learn the tools that work with it because they have an underlying uniformity.
 
-2. There's a specific advantage to placing variables in columns because it allows R's vectorised nature to shine.
+2. There's a specific advantage to placing variables in columns because it allows R's vectorized nature to shine.
 As you learned in @sec-mutate and @sec-summarize, most built-in R functions work with vectors of values.
 That makes transforming tidy data feel particularly natural.
 
 dplyr, ggplot2, and all the other packages in the tidyverse are designed to work with tidy data.
-Here are a couple of small examples showing how you might work with `table1`.
+Here are a few small examples showing how you might work with `table1`.
 
 ```{r}
 #| fig-width: 5
 #| fig-alt: >
-#| This figure shows the numbers of cases in 1999 and 2000 for
+#| This figure shows the number of cases in 1999 and 2000 for
 #| Afghanistan, Brazil, and China, with year on the x-axis and number
 #| of cases on the y-axis. Each point on the plot represents the number
 #| of cases in a given country in a given year. The points for each
13 changes: 7 additions & 6 deletions data-visualize.qmd
@@ -108,7 +108,7 @@ Our ultimate goal in this chapter is to recreate the following visualization dis
 #| fig-alt: >
 #| A scatterplot of body mass vs. flipper length of penguins, with a
 #| smooth curve displaying the relationship between these two variables
-#| overlaid. The plot displays a positive, fairly linear, relatively
+#| overlaid. The plot displays a positive, fairly linear, and relatively
 #| strong relationship between these two variables. Species (Adelie,
 #| Chinstrap, and Gentoo) are represented with different colors and
 #| shapes. The relationship between body mass and flipper length is
@@ -186,7 +186,7 @@ You'll learn a whole bunch of geoms throughout the book, particularly in @sec-la
 ```{r}
 #| fig-alt: >
 #| A scatterplot of body mass vs. flipper length of penguins. The plot
-#| displays a positive, linear, relatively strong relationship between
+#| displays a positive, linear, and relatively strong relationship between
 #| these two variables.
 ggplot(
@@ -232,7 +232,7 @@ Throughout the book you will make many more ggplots and have many more opportuni
 #| warning: false
 #| fig-alt: >
 #| A scatterplot of body mass vs. flipper length of penguins. The plot
-#| displays a positive, fairly linear, relatively strong relationship
+#| displays a positive, fairly linear, and relatively strong relationship
 #| between these two variables. Species (Adelie, Chinstrap, and Gentoo)
 #| are represented with different colors.
@@ -326,7 +326,7 @@ Other arguments match the aesthetic mappings, `x` is the x-axis label, `y` is th
 #| fig-alt: >
 #| A scatterplot of body mass vs. flipper length of penguins, with a
 #| smooth curve displaying the relationship between these two variables
-#| overlaid. The plot displays a positive, fairly linear, relatively
+#| overlaid. The plot displays a positive, fairly linear, and relatively
 #| strong relationship between these two variables. Species (Adelie,
 #| Chinstrap, and Gentoo) are represented with different colors and
 #| shapes. The relationship between body mass and flipper length is
@@ -771,7 +771,7 @@ You will learn about many other geoms for visualizing distributions of variables
 How can you see this information when you run `mpg`?
 
 2. Make a scatterplot of `hwy` vs. `displ` using the `mpg` data frame.
-Then, map a third, numerical variable to `color`, `size`, and `shape`.
+Next, map a third, numerical variable to `color`, then `size`, then both `color` and `size`, then `shape`.
 How do these aesthetics behave differently for categorical vs. numerical variables?
 
 3. In the scatterplot of `hwy` vs. `displ`, what happens if you map a third variable to `linewidth`?
@@ -781,7 +781,7 @@ You will learn about many other geoms for visualizing distributions of variables
 5. Make a scatterplot of `bill_depth_mm` vs. `bill_length_mm` and color the points by `species`.
 What does adding coloring by species reveal about the relationship between these two variables?
 
-6. Why does the following yield two separate legends.
+6. Why does the following yield two separate legends?
 How would you fix it to combine the two legends?
 
 ```{r}
@@ -810,6 +810,7 @@ That's the job of `ggsave()`, which will save the most recent plot to disk:
 ```{r}
 #| fig-show: hide
+#| warning: false
 ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
 geom_point()
Binary file modified screenshots/rstudio-env.png
40 changes: 20 additions & 20 deletions workflow-basics.qmd
@@ -10,14 +10,14 @@ status("polishing")
 
 You now have some experience running R code.
 We didn't give you many details, but you've obviously figured out the basics, or you would've thrown this book away in frustration!
-Frustration is natural when you start programming in R, because it is such a stickler for punctuation, and even one character out of place will cause it to complain.
-But while you should expect to be a little frustrated, take comfort in that this experience is both typical and temporary: it happens to everyone, and the only way to get over it is to keep trying.
+Frustration is natural when you start programming in R because it is such a stickler for punctuation, and even one character out of place will cause it to complain.
+But while you should expect to be a little frustrated, take comfort in that this experience is typical and temporary: it happens to everyone, and the only way to get over it is to keep trying.
 
-Before we go any further, let's make sure you've got a solid foundation in running R code, and that you know about some of the most helpful RStudio features.
+Before we go any further, let's ensure you've got a solid foundation in running R code and that you know some of the most helpful RStudio features.
 
 ## Coding basics
 
-Let's review some basics we've so far omitted in the interests of getting you plotting as quickly as possible.
+Let's review some basics we've omitted so far in the interest of getting you plotting as quickly as possible.
 You can use R as a calculator:
 
 ```{r}
@@ -55,18 +55,18 @@ object_name <- value
 
 When reading that code, say "object name gets value" in your head.
 
-You will make lots of assignments and `<-` is a pain to type.
+You will make lots of assignments, and `<-` is a pain to type.
 You can save time with RStudio's keyboard shortcut: Alt + - (the minus sign).
 Notice that RStudio automatically surrounds `<-` with spaces, which is a good code formatting practice.
 Code is miserable to read on a good day, so giveyoureyesabreak and use spaces.
 
 ## Comments
 
 R will ignore any text after `#`.
-This allows to you to write **comments**, text that is ignored by R but read by other humans.
+This allows you to write **comments**, text that is ignored by R but read by other humans.
 We'll sometimes include comments in examples explaining what's happening with the code.
 
-Comments can be helpful for briefly describing what the subsequent code does.
+Comments can be helpful for briefly describing what the following code does.
 
 ```{r}
 # define primes
@@ -76,26 +76,26 @@ primes <- c(2, 3, 5, 7, 11, 13)
 primes * 2
 ```
 
-With short pieces of code like this, it might not be necessary to leave a command for every single line of code.
-But as the code you're writing gets more complex, comments can save you (and your collaborators) a lot of time in figuring out what was done in the code.
+With short pieces of code like this, leaving a comment for every single line of code might not be necessary.
+But as the code you're writing gets more complex, comments can save you (and your collaborators) a lot of time figuring out what was done in the code.
 
 Use comments to explain the *why* of your code, not the *how* or the *what*.
-The *what* and *how* of code your is always possible to figure out, even if it might be tedious, by carefully reading the code.
-But if you describe the "what" in your comments and your code, you'll have to remember to carefully update the comment and code in tandem.
-If you change the code and forget to update the comment, they'll be inconsistent which will lead to confusion when you come back to your code in the future.
+The *what* and *how* of your code are always possible to figure out, even if it might be tedious, by carefully reading it.
+But if you describe the "what" in your comments and your code, you'll have to remember to update the comment and code in tandem carefully.
+If you change the code and forget to update the comment, they'll be inconsistent, leading to confusion when you return to your code in the future.
 
 Figuring out *why* something was done is much more difficult, if not impossible.
 For example, `geom_smooth()` has an argument called `span`, which controls the smoothness of the curve, with larger values yielding a smoother curve.
 Suppose you decide to change the value of `span` from its default of 0.75 to 0.3: it's easy for a future reader to understand *what* is happening, but unless you note your thinking in a comment, no one will understand *why* you changed the default.
 
-For data analysis code, use comments to explain your overall plan of attack and record important insight as you encounter them.
+For data analysis code, use comments to explain your overall plan of attack and record important insights as you encounter them.
 There's no way to re-capture this knowledge from the code itself.
 
 ## What's in a name? {#sec-whats-in-a-name}
 
-Object names must start with a letter, and can only contain letters, numbers, `_` and `.`.
+Object names must start with a letter and can only contain letters, numbers, `_`, and `.`.
 You want your object names to be descriptive, so you'll need to adopt a convention for multiple words.
-We recommend **snake_case** where you separate lowercase words with `_`.
+We recommend **snake_case**, where you separate lowercase words with `_`.
 
 ```{r}
 #| eval: false
@@ -106,7 +106,7 @@ some.people.use.periods
 And_aFew.People_RENOUNCEconvention
 ```
 
-We'll come back to names again when we talk more about code style in @sec-workflow-style.
+We'll return to names again when we discuss code style in @sec-workflow-style.
 
 You can inspect an object by typing its name:
 
@@ -148,8 +148,8 @@ R_rocks
 ```
 
 This illustrates the implied contract between you and R: R will do the tedious computations for you, but in exchange, you must be completely precise in your instructions.
-Typos matter; R can't read your mind and say "oh, they probably meant `r_rocks` when they typed `r_rock`".
-Case matters; similarly R can't read your mind and say "oh, they probably meant `r_rocks` when they typed `R_rocks`".
+Typos matter; R can't read your mind and say, "oh, they probably meant `r_rocks` when they typed `r_rock`".
+Case matters; similarly, R can't read your mind and say, "oh, they probably meant `r_rocks` when they typed `R_rocks`".
 
 ## Calling functions
 
@@ -161,10 +161,10 @@ R has a large collection of built-in functions that are called like this:
 function_name(arg1 = val1, arg2 = val2, ...)
 ```
 
-Let's try using `seq()`, which makes regular **seq**uences of numbers and, while we're at it, learn more helpful features of RStudio.
+Let's try using `seq()`, which makes regular **seq**uences of numbers, and while we're at it, learn more helpful features of RStudio.
 Type `se` and hit TAB.
 A popup shows you possible completions.
-Specify `seq()` by typing more (a `q`) to disambiguate, or by using ↑/↓ arrows to select.
+Specify `seq()` by typing more (a `q`) to disambiguate or by using ↑/↓ arrows to select.
 Notice the floating tooltip that pops up, reminding you of the function's arguments and purpose.
 If you want more help, press F1 to get all the details in the help tab in the lower right pane.
 