More hacking of string chapter

ydebessu · Apr 22, 2021 · 2505136
1 parent 554890b
commit 2505136
Showing 3 changed files with 227 additions and 198 deletions.
diff --git a/prog-strings.Rmd b/prog-strings.Rmd
@@ -6,6 +6,49 @@ library(tidyr)
 library(tibble)
 ```
 
+### str_c
+
+`str_c()` silently drops `NULL` arguments.
+This is particularly useful in conjunction with `if`, because an `if` without an `else` returns `NULL` when its condition is `FALSE`:
+
+```{r}
+name <- "Hadley"
+time_of_day <- "morning"
+birthday <- FALSE
+
+str_c(
+  "Good ", time_of_day, " ", name,
+  if (birthday) " and HAPPY BIRTHDAY",
+  "."
+)
+```
+
+## Performance
+
+`fixed()` matches exactly the specified sequence of bytes.
+It ignores all special regular expression characters and operates at a very low level.
+This allows you to avoid complex escaping and can be much faster than regular expressions.
+In the following microbenchmark, it's about 3x faster for a simple example.
+
+```{r}
+microbenchmark::microbenchmark(
+  fixed = str_detect(sentences, fixed("the")),
+  regex = str_detect(sentences, "the"),
+  times = 20
+)
+```
+
+As you saw with `str_split()`, you can use `boundary()` to match boundaries.
+You can also use it with the other functions:
+
+```{r}
+x <- "This is a sentence."
+str_view_all(x, boundary("word"))
+str_extract_all(x, boundary("word"))
+```
+
+### 
+
 ### Extract
 
 ```{r}

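A note on the `fixed()` paragraph added to prog-strings.Rmd above: the "avoid complex escaping" point can be illustrated with a minimal sketch. The vector `inputs` and the result names are made up for illustration and are not part of the commit.

```r
library(stringr)

# Hypothetical inputs, chosen so that "." and "$" behave differently
# as regular expressions and as fixed strings.
inputs <- c("cost: $5.00", "cost: 5 dollars", "a.b", "axb")

# As a regular expression, "." matches any character, so "a.b" also matches "axb".
regex_dot <- str_detect(inputs, "a.b")

# With fixed(), "." is compared literally.
fixed_dot <- str_detect(inputs, fixed("a.b"))

# A literal "$" must be escaped in a regular expression, but not inside fixed().
regex_dollar <- str_detect(inputs, "\\$5")
fixed_dollar <- str_detect(inputs, fixed("$5"))
```

Here `regex_dot` is `TRUE` for both `"a.b"` and `"axb"`, while `fixed_dot` is `TRUE` only for `"a.b"`; the two `$` searches agree with each other.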
diff --git a/regexps.Rmd b/regexps.Rmd
@@ -296,6 +296,55 @@ There are two useful function in base R that also use regular expressions:
 
     (If you're more comfortable with "globs" like `*.Rmd`, you can convert them to regular expressions with `glob2rx()`):
 
+## Options
+
+When you use a pattern that's a string, it's automatically wrapped into a call to `regex()`:
+
+```{r, eval = FALSE}
+# The regular call:
+str_view(fruit, "nana")
+# Is shorthand for
+str_view(fruit, regex("nana"))
+```
+
+You can use the other arguments of `regex()` to control details of the match:
+
+-   `ignore_case = TRUE` allows characters to match either their uppercase or lowercase forms.
+    This always uses the current locale.
+
+    ```{r}
+    bananas <- c("banana", "Banana", "BANANA")
+    str_view(bananas, "banana")
+    str_view(bananas, regex("banana", ignore_case = TRUE))
+    ```
+
+-   `multiline = TRUE` allows `^` and `$` to match the start and end of each line rather than the start and end of the complete string.
+
+    ```{r}
+    x <- "Line 1\nLine 2\nLine 3"
+    str_extract_all(x, "^Line")[[1]]
+    str_extract_all(x, regex("^Line", multiline = TRUE))[[1]]
+    ```
+
+-   `comments = TRUE` allows you to use comments and white space to make complex regular expressions more understandable.
+    Spaces are ignored, as is everything after `#`.
+    To match a literal space, you'll need to escape it: `"\\ "`.
+
+    ```{r}
+    phone <- regex("
+      \\(?     # optional opening parens
+      (\\d{3}) # area code
+      [)\\ -]? # optional closing parens, space, or dash
+      (\\d{3}) # another three numbers
+      [\\ -]?  # optional space or dash
+      (\\d{4}) # four more numbers
+      ", comments = TRUE)
+
+    str_match("514-791-8141", phone)
+    ```
+
+-   `dotall = TRUE` allows `.` to match everything, including `\n`.
+
 ## A caution
 
 A word of caution before we continue: because regular expressions are so powerful, it's easy to try and solve every problem with a single regular expression.
@@ -394,4 +443,3 @@ See the Stack Overflow discussion at <http://stackoverflow.com/a/201378> for mor
 Don't forget that you're in a programming language and you have other tools at your disposal.
 Instead of creating one complex regular expression, it's often easier to write a series of simpler regexps.
 If you get stuck trying to create a single regexp that solves your problem, take a step back and think if you could break the problem down into smaller pieces, solving each challenge before moving onto the next one.
-
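
A note on the `dotall = TRUE` bullet added to regexps.Rmd above, which is the only `regex()` option in that list without an example. The following is a minimal sketch of its effect; the string `x` and the result names are chosen here for illustration and are not part of the commit.

```r
library(stringr)

x <- "Line 1\nLine 2"

# By default, "." does not match a newline, so the match stops at the end of the first line.
default_match <- str_extract(x, "Line.*")

# With dotall = TRUE, "." also matches "\n", so the match runs to the end of the string.
dotall_match <- str_extract(x, regex("Line.*", dotall = TRUE))
```

`default_match` is `"Line 1"`, whereas `dotall_match` is the whole string, newline included.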