Starting to write about expressions
hadley committed Sep 27, 2017
1 parent 7bad8f3 commit d331c9e
Showing 8 changed files with 122 additions and 66 deletions.
184 changes: 118 additions & 66 deletions Expressions.Rmd
@@ -10,11 +10,7 @@ library(rlang)
library(rlang)
```

The structure of code is a tree. The leaves of the tree are constants and names. The branches of the tree are calls. Define graphical conventions.

The AST is usually constructed from a string, by parsing code. The opposite operation is deparsing. But you can also modify the tree by hand, and insert any R object. This can be useful when you need to override the usual lookup rules.

## Structure of expressions {#structure-of-expressions}
## Introduction {#structure-of-expressions}

To compute on the language, we first need to understand the structure of the language. That will require some new vocabulary, some new tools, and some new ways of thinking about R code. The first thing you'll need to understand is the distinction between an operation and a result: \index{expressions}

@@ -33,73 +29,78 @@ z

`quote()` returns an __expression__: an object that represents an action that can be performed by R. (Unfortunately `expression()` does not return an expression in this sense. Instead, it returns something more like a list of expressions. See [parsing and deparsing](#parsing-and-deparsing) for more details.) \indexc{quote()}

An expression is also called an abstract syntax tree (AST) because it represents the hierarchical tree structure of the code. We'll use `pryr::ast()` to see this more clearly: \index{abstract syntax tree} \indexc{ast()}
## Tree

An expression is also called an abstract syntax tree (AST) because it represents the hierarchical tree structure of the code. \index{abstract syntax tree}

The following diagram illustrates the graphical conventions that we'll use, drawing a tree of `f(x, 1)`.

```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/expression-simple.png", dpi = 450)
```

* The leaves of the tree are names like `f` and `x` and constants like `1` or
`"a"`. Names have a purple border and rounded corners. Constants, which
are atomic vectors of length one, have black borders and square corners.

* Function calls make a tree-like structure. The call object is represented
by an orange circle. The first child of the call is the function, which
is typically a name. The second and subsequent children are the arguments.
Unlike in many tree diagrams, the order of the children is important:
`f(x, 1)` is not the same as `f(1, x)`.
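
These conventions can be checked directly in base R, because a call can be indexed like a list. A minimal sketch using only `quote()` and `[[`; the variable name `call_fx` is just for illustration:

```{r}
call_fx <- quote(f(x, 1))

call_fx[[1]]          # the function being called: the name `f`
call_fx[[2]]          # first argument: the name `x`
call_fx[[3]]          # second argument: the constant 1

is.name(call_fx[[1]])
is.numeric(call_fx[[3]])

# Child order matters: swapping the arguments gives a different tree
identical(call_fx, quote(f(1, x)))
```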

Every call in R can be written in this form. Take `y <- x * 10`, for example. It doesn't seem to have the same form as `f(x, 1)`. That's because it uses the __infix__ operators `<-` and `*`. These are called infix because the name of the function comes in between its arguments. Most functions in R are __prefix__ functions, where the name of the function comes first. (Some programming languages also use __postfix__ functions, where the name of the function comes last.) In R, any infix operator can be converted to prefix form as long as you escape the name with backticks. That means that these two lines of code are equivalent:

```{r}
ast(y <- x * 10)
y <- x * 10
`<-`(y, `*`(x, 10))
```

There are four possible components of an expression: constants, names, calls, and pairlists.
And yield the same AST:

* __constants__ include the length one atomic vectors, like `"a"` or `10`,
and `NULL`. `ast()` displays them as is. \index{constants}
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/expression-prefix.png", dpi = 450)
```
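
One way to confirm the equivalence of the infix and prefix forms without a diagram is to compare the two quoted expressions with `identical()`. A small sketch in base R; the names `infix` and `prefix` are illustrative:

```{r}
infix  <- quote(y <- x * 10)
prefix <- quote(`<-`(y, `*`(x, 10)))

identical(infix, prefix)   # compares the two call objects
infix[[1]]                 # the outermost function
infix[[3]][[1]]            # the function of the inner call
```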

```{r}
ast("a")
ast(1)
ast(1L)
ast(TRUE)
```
Drawing these diagrams by hand takes some time, and obviously you can't use them with your own code. So to supplement them we'll also use `lobstr::ast()`, which uses similar conventions.

Quoting a constant returns it unchanged:
### Ambiguity and precedence

```{r}
identical(1, quote(1))
identical("test", quote("test"))
```
These diagrams help resolve several sources of ambiguity.

* __names__, or symbols, represent the name of an object rather than its value.
`ast()` prefixes names with a backtick. \index{names} \index{symbols|see{names}}
First, what does `1 + 2 * 3` yield? Do you get 9 (i.e. `(1 + 2) * 3`), or 7 (i.e. `1 + (2 * 3)`)?

```{r}
ast(x)
ast(mean)
ast(`an unusual name`)
```
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/expression-ambig-order.png", dpi = 450)
```

* __calls__ represent the action of calling a function. Like lists, calls are
recursive: they can contain constants, names, pairlists, and other calls.
`ast()` prints `()` and then lists the children. The first child is the
function that is called, and the remaining children are the function's
arguments. \index{calls}
Infix functions introduce an ambiguity for the parser in a way that prefix functions do not. Programming languages resolve this using a set of conventions known as operator precedence.
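
To see how R's precedence rules resolve an expression such as `1 + 2 * 3`, you can inspect the quoted code: the outermost call is the operator that is applied last. A sketch in base R, with illustrative variable names `e` and `p`:

```{r}
e <- quote(1 + 2 * 3)

e[[1]]    # outermost function: the operator applied last
e[[3]]    # the right-hand argument is itself a call
eval(e)

# Parentheses change the tree: the outermost call is now a different operator
p <- quote((1 + 2) * 3)
p[[1]]
eval(p)
```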

```{r}
ast(f())
ast(f(1, 2))
ast(f(a, b))
ast(f(g(), h(1, a)))
```
What's the difference between these three things?

As mentioned in
[every operation is a function call](#all-calls),
even things that don't look like function calls still have this
hierarchical structure:
```{r}
x1 <- quote(1 + 2)
x2 <- quote(`1 + 2`)
x3 <- quote("1 + 2")
```

```{r}
ast(a + b)
ast(if (x > 1) x else 1/x)
```
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/expression-ambig-value.png", dpi = 450)
```
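
One way to tell the three objects apart without a diagram is to ask for their types. A sketch using base R predicates:

```{r}
x1 <- quote(1 + 2)      # a call to `+` with two constant arguments
x2 <- quote(`1 + 2`)    # a single name that happens to contain spaces
x3 <- quote("1 + 2")    # a string constant

is.call(x1)
is.name(x2)
is.character(x3)

length(x1)              # number of children of the call
as.character(x2)        # the text of the name
```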

* __pairlists__, short for dotted pair lists, are a legacy of R's past.
They are only used in one place: the formal arguments of a function.
`ast()` prints `[]` at the top-level of a pairlist. Like calls, pairlists
are also recursive and can contain constants, names, and calls.
\index{pairlists}
While the first component of the call is usually a symbol providing a function name, it can also be a call that returns a function (i.e. a call to a function factory).

```{r}
ast(function(x = 1, y) x)
ast(function(x = 1, y = x * 2) {x / y})
```
```{r, eval = FALSE}
f(a, 1)
f()(a, 1)
f(a, 1)()
```

```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/expression-ambig-nesting.png", dpi = 450)
```
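
The three forms can also be distinguished by looking at the first element of each call, which is the thing that gets called. A sketch in base R; `f` and `a` are never evaluated here, so they do not need to exist:

```{r}
c1 <- quote(f(a, 1))     # call f with two arguments
c2 <- quote(f()(a, 1))   # call f(), then call its result with two arguments
c3 <- quote(f(a, 1)())   # call f(a, 1), then call its result with no arguments

c1[[1]]   # a name
c2[[1]]   # a call with no arguments
c3[[1]]   # a call with two arguments

length(c2) - 1   # number of arguments to the outer call
length(c3) - 1
```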

### Base R naming conventions

Note that `str()` does not follow these naming conventions when describing objects. Instead, it describes names as symbols and calls as language objects:

@@ -108,15 +109,7 @@ str(quote(a))
str(quote(a + b))
```

Using low-level functions, it is possible to create call trees that contain objects other than constants, names, calls, and pairlists. The following example uses `substitute()` to insert a data frame into a call tree. This is a bad idea, however, because the object does not print correctly: the printed call looks like it should return "list" but when evaluated, it returns "data.frame". \indexc{substitute()}

```{r}
class_df <- substitute(class(df), list(df = data.frame(x = 10)))
class_df
eval(class_df)
```

Together these four components define the structure of all R code. They are explained in more detail in the following sections.
Beware when printing language objects: R can print different objects in the same way, so it's not always possible to uniquely convert a tree into text.

### Exercises

@@ -138,7 +131,34 @@ Together these four components define the structure of all R code. They are expl
Which one of the six types of atomic vector can't appear in an expression?
Why?

## Names {#names}

## Leaves: constants and symbols {#names}

* __constants__ include the length one atomic vectors, like `"a"` or `10`,
and `NULL`. `ast()` displays them as is. \index{constants}

```{r}
ast("a")
ast(1)
ast(1L)
ast(TRUE)
```
Quoting a constant returns it unchanged:
```{r}
identical(1, quote(1))
identical("test", quote("test"))
```
* __names__, or symbols, represent the name of an object rather than its value.
`ast()` prefixes names with a backtick. \index{names} \index{symbols|see{names}}
```{r}
ast(x)
ast(mean)
ast(`an unusual name`)
```
Typically, we use `quote()` to capture names. You can also convert a string to a name with `as.name()`. However, this is useful mostly when your function receives strings as input. Otherwise it involves more typing than using `quote()`. (You can use `is.name()` to test if an object is a name.) \index{names} \indexc{as.name()}
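
For instance, code that receives a variable name as a string can turn it into a name with `as.name()`. A minimal sketch in base R; the variable `var` and its value are illustrative:

```{r}
var <- "cyl"

nm <- as.name(var)
is.name(nm)
identical(nm, quote(cyl))

# Names that are not syntactically valid work too
as.name("an unusual name")
```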
@@ -201,6 +221,8 @@ quote(expr =)
A call is very similar to a list. It has `length`, `[[` and `[` methods, and is recursive because calls can contain other calls. The first element of the call is the function that gets called. It's usually the _name_ of a function: \index{calls}
### Subsetting
```{r}
x <- quote(read.csv("important.csv", row.names = FALSE))
x[[1]]
@@ -224,6 +246,12 @@ x$row.names
names(x)
```

You can use `[` too, but removing the first element is not usually useful:

```{r}
x[-1]
```
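
The result of `x[-1]` is still a call, so what used to be the first argument now sits in the function position. If the goal is to work with the arguments, converting to a list first is usually clearer. A sketch in base R:

```{r}
x <- quote(read.csv("important.csv", row.names = FALSE))

y <- x[-1]
is.call(y)     # subsetting a call with `[` returns another call
y[[1]]         # the string now sits where the function should be

# Convert to a list before dropping the function to get just the arguments
args <- as.list(x)[-1]
names(args)
```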

The length of a call minus 1 gives the number of arguments:

```{r}
@@ -233,6 +261,30 @@ length(x) - 1
There are many ways to supply the arguments to a function.
To work around this problem, pryr provides `standardise_call()`. It uses the base `match.call()` function to convert all positional arguments to named arguments: \indexc{standardise\_call()} \indexc{match.call()}
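
The base function mentioned here can also be called directly: `match.call()` accepts a function definition and a quoted call. A sketch, where `f` is a hypothetical function defined only for illustration:

```{r}
f <- function(x, y, z) NULL

# Arguments supplied by position, by name, and out of order
messy <- quote(f(1, z = 3, 2))

# Match each argument to a formal of f and name it
std <- match.call(f, messy)
std
names(std)
```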

### Constructing

```{r}
lang(`+`, 1, 2)
lang(quote(`+`), 1, 2)
lang("+", 1, 2)
args <- list(1 , 2)
lang("f", args, 3)
lang("f", quote(list(1, 2)), 3)
lang("f", splice(args), 3)
```
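
For comparison, base R offers `call()` and `as.call()` for the same job. The sketch below uses only base functions; it is not a description of how `lang()` is implemented, and `f` is a placeholder name that is never evaluated:

```{r}
# Build 1 + 2 from a function name given as a string
call("+", 1, 2)

# Or from a list whose first element is the function name
as.call(list(quote(`+`), 1, 2))

# Splice a list of arguments into a call by building the full list first
args <- list(1, 2)
spliced <- as.call(c(list(quote(f)), args, list(3)))
spliced

# Passing the list itself makes it a single argument instead
nested <- as.call(list(quote(f), args, 3))
length(nested) - 1
```

Building the argument list before calling `as.call()` keeps the distinction between "a list as one argument" and "each element as its own argument" explicit.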

### Inlining

Using low-level functions, it is possible to create call trees that contain objects other than constants, names, calls, and pairlists. The following example uses `substitute()` to insert a data frame into a call tree. This is a bad idea, however, because the object does not print correctly: the printed call looks like it should return "list" but when evaluated, it returns "data.frame". \indexc{substitute()}

```{r}
class_df <- substitute(class(df), list(df = data.frame(x = 10)))
class_df
eval(class_df)
```


### Exercises

1. The following two calls look the same, but are actually different:
Binary file added diagrams/expression-ambig-nesting.png
Binary file added diagrams/expression-ambig-order.png
Binary file added diagrams/expression-ambig-value.png
Binary file added diagrams/expression-prefix.png
Binary file added diagrams/expression-simple.png
Binary file modified diagrams/expressions.graffle
Binary file not shown.
4 changes: 4 additions & 0 deletions tidy-evaluation.Rmd
@@ -4,4 +4,8 @@
source("common.R")
```

## Quasiquotation

## Overscoping

## Quosures
