Use US spelling of summarize()

Fixes hadley#1125
p0bs · Nov 18, 2022 · 3045d05
1 parent d06d412

Showing 17 changed files with 87 additions and 88 deletions.
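
This commit only swaps spellings. dplyr exports `summarize()` as an alias of `summarise()`, so the change does not affect behavior. The minimal sketch below checks this. It assumes dplyr is installed and uses the built-in `mtcars` data set purely for illustration, since that data set is not used in the chapters touched here. It confirms that both names point to the same function and that they produce identical grouped summaries.

```r
library(dplyr)

# Both spellings are exported by dplyr and bound to the same function object
identical(summarise, summarize)

# Grouped summary using the US spelling (the style adopted by this commit)
us <- mtcars |>
  group_by(cyl) |>
  summarize(mpg = mean(mpg))

# The same summary using the British spelling (the style being replaced)
uk <- mtcars |>
  group_by(cyl) |>
  summarise(mpg = mean(mpg))

# Same function, same input, so the results should match exactly
identical(us, uk)
```

Because the two names are aliases, mixing them in one codebase works. The commit standardizes on one spelling for consistency in the prose and examples.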
diff --git a/base-R.qmd b/base-R.qmd
@@ -247,7 +247,7 @@ There are a number other base approaches to creating new columns including with
 Hadley collected a few examples at <https://gist.github.com/hadley/1986a273e384fb2d4d752c18ed71bedf>.
 
 Using `$` directly is convenient when performing quick summaries.
-For example, if you just want find the size of the biggest diamond or the possible values of `cut`, there's no need to use `summarise()`:
+For example, if you just want find the size of the biggest diamond or the possible values of `cut`, there's no need to use `summarize()`:
 
 ```{r}
 max(diamonds$carat)
@@ -423,7 +423,7 @@ Another important member of the apply family is `tapply()` which computes a sing
 ```{r}
 diamonds |> 
   group_by(cut) |> 
-  summarise(price = mean(price))
+  summarize(price = mean(price))
 
 tapply(diamonds$price, diamonds$cut, mean)
 ```

diff --git a/communicate-plots.qmd b/communicate-plots.qmd
@@ -187,7 +187,7 @@ It's not wonderful for this plot, but it isn't too bad.
 ```{r}
 class_avg <- mpg |>
   group_by(class) |>
-  summarise(
+  summarize(
     displ = median(displ),
     hwy = median(hwy)
   )
@@ -208,7 +208,7 @@ Often, you want the label in the corner of the plot, so it's convenient to creat
 
 ```{r}
 label_info <- mpg |>
-  summarise(
+  summarize(
     displ = max(displ),
     hwy = max(hwy),
     label = "Increasing engine size is \nrelated to decreasing fuel economy."

diff --git a/data-transform.qmd b/data-transform.qmd
@@ -423,11 +423,10 @@ This means subsequent operations will now work "by month".
 
 ### `summarize()` {#sec-summarize}
 
-The most important grouped operation is a summary.
-It collapses each group to a single row[^data-transform-3].
-Here we compute the average departure delay by month:
+The most important grouped operation is a summary, which collapses each group to a single row.
+In dplyr, this operation is performed by `summarize()`[^data-transform-3], as shown in the following example, which computes the average departure delay by month:
 
-[^data-transform-3]: This is a slightly simplification; later on you'll learn how to use `summarize()` to produce multiple summary rows for each group.
+[^data-transform-3]: Or `summarise()`, if you prefer British English.
 
 ```{r}
 flights |> 
@@ -673,7 +672,7 @@ You can find a good explanation of this problem and how to overcome it at <http:
 ## Summary
 
 In this chapter, you've learned the tools that dplyr provides for working with data frames.
-The tools are roughly grouped into three categories: those that manipulate the rows (like `filter()` and `arrange()`, those that manipulate the columns (like `select()` and `mutate()`), and those that manipulate groups (like `group_by()` and `summarise()`).
+The tools are roughly grouped into three categories: those that manipulate the rows (like `filter()` and `arrange()`, those that manipulate the columns (like `select()` and `mutate()`), and those that manipulate groups (like `group_by()` and `summarize()`).
 In this chapter, we've focused on these "whole data frame" tools, but you haven't yet learned much about what you can do with the individual variable.
 We'll come back to that in the Transform part of the book, where each chapter will give you tools for a specific type of variable.
 

diff --git a/databases.qmd b/databases.qmd
@@ -310,7 +310,7 @@ flights |>
 ```{r}
 flights |> 
   group_by(dest) |> 
-  summarise(dep_delay = mean(dep_delay, na.rm = TRUE)) |> 
+  summarize(dep_delay = mean(dep_delay, na.rm = TRUE)) |> 
   show_query()
 ```
 
@@ -393,14 +393,14 @@ You'll see more complex examples once we hit the join functions.
 
 ### GROUP BY
 
-`group_by()` is translated to the `GROUP BY`[^databases-6] clause and `summarise()` is translated to the `SELECT` clause:
+`group_by()` is translated to the `GROUP BY`[^databases-6] clause and `summarize()` is translated to the `SELECT` clause:
 
 [^databases-6]: This is no coincidence: the dplyr function name was inspired by the SQL clause.
 
 ```{r}
 diamonds_db |> 
   group_by(cut) |> 
-  summarise(
+  summarize(
     n = n(),
     avg_price = mean(price, na.rm = TRUE)
   ) |> 
@@ -445,7 +445,7 @@ dbplyr will remind you about this behavior the first time you hit it:
 ```{r}
 flights |> 
   group_by(dest) |> 
-  summarise(delay = mean(arr_delay))
+  summarize(delay = mean(arr_delay))
 ```
 
 If you want to learn more about how NULLs work, you might enjoy "[*Three valued logic*](https://modern-sql.com/concept/three-valued-logic)" by Markus Winand.
@@ -471,7 +471,7 @@ This is a one of the idiosyncracies of SQL created because `WHERE` is evaluated
 ```{r}
 diamonds_db |> 
   group_by(cut) |> 
-  summarise(n = n()) |> 
+  summarize(n = n()) |> 
   filter(n > 100) |> 
   show_query()
 ```
@@ -579,13 +579,13 @@ The easiest way to see the full set of what's currently available is to visit th
 So far we've focused on the big picture of how dplyr verbs are translated to the clauses of a query.
 Now we're going to zoom in a little and talk about the translation of the R functions that work with individual columns, e.g. what happens when you use `mean(x)` in a `summarize()`?
 
-To help see what's going on, we'll use a couple of little helper functions that run a `summarise()` or `mutate()` and show the generated SQL.
+To help see what's going on, we'll use a couple of little helper functions that run a `summarize()` or `mutate()` and show the generated SQL.
 That will make it a little easier to explore a few variations and see how summaries and transformations can differ.
 
 ```{r}
 summarize_query <- function(df, ...) {
   df |> 
-    summarise(...) |> 
+    summarize(...) |> 
     show_query()
 }
 mutate_query <- function(df, ...) {

diff --git a/datetimes.qmd b/datetimes.qmd
@@ -351,7 +351,7 @@ It looks like flights leaving in minutes 20-30 and 50-60 have much lower delays
 flights_dt |> 
   mutate(minute = minute(dep_time)) |> 
   group_by(minute) |> 
-  summarise(
+  summarize(
     avg_delay = mean(dep_delay, na.rm = TRUE),
     n = n()) |> 
   ggplot(aes(minute, avg_delay)) +
@@ -369,7 +369,7 @@ Interestingly, if we look at the *scheduled* departure time we don't see such a
 sched_dep <- flights_dt |> 
   mutate(minute = minute(sched_dep_time)) |> 
   group_by(minute) |> 
-  summarise(
+  summarize(
     avg_delay = mean(arr_delay, na.rm = TRUE),
     n = n())
 

diff --git a/factors.qmd b/factors.qmd
@@ -179,7 +179,7 @@ For example, imagine you want to explore the average number of hours spent watch
 #|   any sense of overall pattern.
 relig_summary <- gss_cat |>
   group_by(relig) |>
-  summarise(
+  summarize(
     age = mean(age, na.rm = TRUE),
     tvhours = mean(tvhours, na.rm = TRUE),
     n = n()
@@ -232,7 +232,7 @@ What if we create a similar plot looking at how average age varies across report
 #|   then $8000-9999.
 rincome_summary <- gss_cat |>
   group_by(rincome) |>
-  summarise(
+  summarize(
     age = mean(age, na.rm = TRUE),
     tvhours = mean(tvhours, na.rm = TRUE),
     n = n()

diff --git a/functions.qmd b/functions.qmd
@@ -441,7 +441,7 @@ So the key challenge in writing data frame functions is figuring out which argum
 Fortunately this is easy because you can look it up from the documentation 😄.
 There are two terms to look for in the docs which corresponding to the two most common sub-types of tidy evaluation:
 
--   **Data-masking**: this is used in functions like `arrange()`, `filter()`, and `summarise()` that compute with variables.
+-   **Data-masking**: this is used in functions like `arrange()`, `filter()`, and `summarize()` that compute with variables.
 
 -   **Tidy-selection**: this is used for for functions like `select()`, `relocate()`, and `rename()` that select variables.
 
@@ -455,7 +455,7 @@ If you commonly perform the same set of summaries when doing initial data explor
 
 ```{r}
 summary6 <- function(data, var) {
-  data |> summarise(
+  data |> summarize(
     min = min({{ var }}, na.rm = TRUE),
     mean = mean({{ var }}, na.rm = TRUE),
     median = median({{ var }}, na.rm = TRUE),
@@ -468,9 +468,9 @@ summary6 <- function(data, var) {
 diamonds |> summary6(carat)
 ```
 
-(Whenever you wrap `summarise()` in a helper, we think it's good practice to set `.groups = "drop"` to both avoid the message and leave the data in an ungrouped state.)
+(Whenever you wrap `summarize()` in a helper, we think it's good practice to set `.groups = "drop"` to both avoid the message and leave the data in an ungrouped state.)
 
-The nice thing about this function is because it wraps `summarise()` you can used it on grouped data:
+The nice thing about this function is because it wraps `summarize()` you can used it on grouped data:
 
 ```{r}
 diamonds |> 
@@ -489,7 +489,7 @@ diamonds |>
 
 To summarize multiple variables you'll need to wait until @sec-across, where you'll learn how to use `across()`.
 
-Another popular `summarise()` helper function is a version of `count()` that also computes proportions:
+Another popular `summarize()` helper function is a version of `count()` that also computes proportions:
 
 ```{r}
 # https://twitter.com/Diabb6/status/1571635146658402309
@@ -547,7 +547,7 @@ You might try writing something like:
 count_missing <- function(df, group_vars, x_var) {
   df |> 
     group_by({{ group_vars }}) |> 
-    summarise(n_miss = sum(is.na({{ x_var }})))
+    summarize(n_miss = sum(is.na({{ x_var }})))
 }
 flights |> 
   count_missing(c(year, month, day), dep_time)
@@ -560,7 +560,7 @@ We can work around that problem by using the handy `pick()` which allows you to
 count_missing <- function(df, group_vars, x_var) {
   df |> 
     group_by(pick({{ group_vars }})) |> 
-    summarise(n_miss = sum(is.na({{ x_var }})))
+    summarize(n_miss = sum(is.na({{ x_var }})))
 }
 flights |> 
   count_missing(c(year, month, day), dep_time)
@@ -602,7 +602,7 @@ While our examples have mostly focused on dplyr, tidy evaluation also underpins
 
         ```{r}
         #| eval: false
-        flights |> group_by(dest) |> summarise_severe()
+        flights |> group_by(dest) |> summarize_severe()
         ```
 
     3.  Finds all flights that were cancelled or delayed by more than a user supplied number of hours:
@@ -616,7 +616,7 @@ While our examples have mostly focused on dplyr, tidy evaluation also underpins
 
         ```{r}
         #| eval: false
-        weather |> summarise_weather(temp)
+        weather |> summarize_weather(temp)
         ```
 
     5.  Converts the user supplied variable that uses clock time (e.g. `dep_time`, `arr_time`, etc) into a decimal time (i.e. hours + minutes / 60).