Merge branch 'master' of github.com:dpastoor/r4ds
dpastoor committed Jun 3, 2016
2 parents c12a384 + c66de8e commit 9e5560b
Showing 24 changed files with 1,080 additions and 381 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -11,6 +11,7 @@ Imports:
bookdown,
broom,
dplyr,
DSR,
ggplot2,
hexbin,
htmltools,
@@ -19,7 +20,6 @@ Imports:
jsonlite,
Lahman,
lubridate,
modelr,
knitr,
maps,
microbenchmark,
@@ -37,8 +37,8 @@ Remotes:
garrettgman/DSR,
hadley/modelr,
hadley/purrr,
hadley/tidyr,
hadley/stringr,
hadley/ggplot2,
hadley/nycflights13,
yihui/knitr,
rstudio/bookdown
7 changes: 4 additions & 3 deletions communicate-plots.Rmd
@@ -1,11 +1,12 @@
# Communication with plots

*Section 2* will show you how to prepare your plots for communication. You'll learn how to make your plots more legible with titles, labels, zooming, and default visual themes.

```{r include = FALSE}
```{r echo = FALSE, messages = FALSE, warning=FALSE}
library(ggplot2)
```


# Communication with plots

The previous sections showed you how to make plots that you can use as a tools for _exploration_. When you made these plots, you knew---even before you looked at them---which variables the plot would display and which data sets the variables would come from. You might have even known what to look for in the completed plots, assuming that you made each plot with a goal in mind. As a result, it was not very important to put a title or a useful set of labels on your plots.

The importance of titles and labels changes once you use your plots for _communication_. Your audience will not share your background knowledge. In fact, they may not know anything about your plots except what the plots themselves display. If you want your plots to communicate your findings effectively, you will need to make them as self-explanatory as possible.
2 changes: 1 addition & 1 deletion datetimes.Rmd
@@ -398,7 +398,7 @@ datetimes %>%

### Setting dates

You can also use each accessor function to set the components of a date or datetime.
You can also use each accessor funtion to set the components of a date or datetime.

```{r}
datetime
9 changes: 7 additions & 2 deletions dynamic-documents.Rmd
@@ -1,4 +1,9 @@
# Dynamic Documents with R Markdown

...

## Output formats
## Slide syntax
## Customizing output
## Tables
## Citations and bibliographies
## Interactive documents
## Templates
31 changes: 30 additions & 1 deletion explore.Rmd
@@ -2,4 +2,33 @@

# Introduction

Data poses a cognitive problem; Data comprehension is a skill.
```{r, include = FALSE}
library(ggplot2)
library(dplyr)
```

If you are like most humans, your brain is not designed to work with raw data. The working memory can only attend to a few values at a time, which makes it difficult to discover patterns in raw data. For example, can you spot the striking relationship between $X$ and $Y$ in the table below?

```{r data, echo=FALSE}
x <- rep(seq(0.2, 1.8, length = 5), 2) + runif(10, -0.15, 0.15)
X <- c(0.02, x, 1.94)
Y <- sqrt(1 - (X - 1)^2)
Y[1:6] <- -1 * Y[1:6]
Y <- Y - 1
order <- sample(1:10)
knitr::kable(round(data.frame(X = X[order], Y = Y[order]), 2))
```

While we may stumble over raw data, we can easily process visual information. Within your mind is a visual processing system that has been fine-tuned by thousands of years of evolution. As a result, the quickest way to understand your data is to visualize it. Once you plot your data, you can instantly see the relationships between values. Here, we see that the values above fall on a circle.

```{r echo=FALSE, dependson=data}
ggplot2::qplot(X, Y) + ggplot2::coord_fixed(ylim = c(-2.5, 2.5), xlim = c(-2.5, 2.5))
```

Visualization works because your brain processes visual information in a different (and much wider) channel than it processes symbolic information, like words and numbers. However, visualization is not the only way to comprehend data.

You can also comprehend data by transforming it. You can easily attend to a small set of summary values, which lets you absorb important information about the data. This is why it feels natural to work with things like averages, maximums, minimums, medians, and so on.

Another way to summarize your data is to replace it with a model, a function that describes the relationships between two or more variables. You can attend to the important parts of a model more easily than you can attend to the raw values in your data set.

The first problem in Data Science is a cognitive problem: how can you understand your own data? In this part of the book, you'll learn how to use R to discover and understand the information contained in your data.
Binary file added images/EDA-boxplot.pdf
Binary file added images/EDA-data-science-1.png
Binary file added images/EDA-data-science-2.png
Binary file added images/EDA-data-science-3.png
Binary file added images/EDA-data-science-4.png
Binary file added images/EDA-data-science-5.png
Binary file added images/EDA-hclust.pdf
Binary file added images/EDA-kmeans.pdf
Binary file added images/EDA-linkage.pdf
Binary file added images/EDA-plotly.png
16 changes: 8 additions & 8 deletions iteration.Rmd
@@ -17,11 +17,11 @@ In [functions], we talked about how important it is to reduce duplication in you
1. You're likely to have fewer bugs because each line of code is
used in more places.

One part of reducing duplication is writing functions. Functions allow you to identify repeated patterns of code and extract them out into independent pieces that you can reuse and easily update as code changes. Iteration helps you when you need to do the same thing to multiple inputs: repeating the same operation on different columns, or on different datasets. (Generally, you won't need to use explicit iteration to deal with different subsets of your data: in most cases the implicit iteration in dplyr will take care of that problem for you.)
One part of reducing duplication is writing functions. Functions allow you to identify repeated patterns of code and extract them out into indepdent pieces that you can reuse and easily update as code changes. Iteration helps you when you need to do the same thing to multiple inputs: repeating the same operation on different columns, or on different datasets. (Generally, you won't need to use explicit iteration to deal with different subsets of your data: in most cases the implicit iteration in dplyr will take care of that problem for you.)

In this chapter you'll learn about two important iteration paradigms: imperative programming and functional programming, and the machinary each provides. On the imperative side you have things like for loops and while loops, which are a great place to start because they make iteration very explicit, so it's obvious what's happening. However, for loops are quite verbose, and include quite a bit of book-keeping code, that is duplicated for every for loop. Functional programming (FP) offers tools to extract out this duplicated code, so each common for loop pattern gets its own function. Once you master the vocabulary of FP, you can solve many common iteration problems with less code, more ease, and fewer errors.
In this chapter you'll learn about two important iteration paradigms: imperative programming and functional programming, and the machinery each provides. On the imperative side you have things like for loops and while loops, which are a great place to start because they make iteration very explicit, so it's obvious what's happening. However, for loops are quite verbose, and include quite a bit of book-keeping code, that is duplicated for every for loop. Functional programming (FP) offers tools to extract out this duplicated code, so each common for loop pattern gets its own function. Once you master the vocabulary of FP, you can solve many common iteration problems with less code, more ease, and fewer errors.

Some people will tell you to avoid for loops because they are slow. They're wrong! (Well at least they're rather out of date, for loops haven't been slow for many years). The chief benefits of using FP functions like `lapply()` or `purrr::map()` is that they are more expressive and make code both easier to write and easier to read.
Some people will tell you to avoid for loops because they are slow. They're wrong! (Well at least they're rather out of date, as for loops haven't been slow for many years). The chief benefits of using FP functions like `lapply()` or `purrr::map()` is that they are more expressive and make code both easier to write and easier to read.

In later chapters you'll learn how to apply these iterating ideas when modelling. You can often use multiple simple models to help understand a complex dataset, or you might have multiple models because you're bootstrapping or cross-validating. The techniques you'll learn in this chapter will be invaluable.

@@ -116,7 +116,7 @@ That's all there is to the for loop! Now is a good time to practice creating som
1. Compute the mean of every column in the `mtcars`.
1. Determine the type of each column in `nycflights13::flights`.
1. Compute the number of unique values in each column of `iris`.
1. Generate 10 random normals for each of $\mu = -10$, $0$, $10$, and $100$.
1. Generate 10 random normals for each of $mu = -10$, $0$, $10$, and $100$.
Think about output, sequence, and body, __before__ you start writing
the loop.
@@ -248,7 +248,7 @@ for (i in seq_along(x)) {

### Unknown output length

Sometimes you might know now how long the output will be. For example, imagine you want to simulate some random vectors of random lengths. You might be tempted to solve this problem by progressively growing the vector:
Sometimes you might not know how long the output will be. For example, imagine you want to simulate some random vectors of random lengths. You might be tempted to solve this problem by progressively growing the vector:

```{r}
means <- c(0, 1, 2)
@@ -261,7 +261,7 @@ for (i in seq_along(means)) {
str(output)
```

But this type of is not very efficient because in each iteration, R has to copy all the data from the previous iterations. In technical terms you get "quadratic" ($O(n^2)$) behaviour which means that a loop with three times as many elements would take nine times ($3^2$) as long to run.
But this is not very efficient because in each iteration, R has to copy all the data from the previous iterations. In technical terms you get "quadratic" ($O(n^2)$) behaviour which means that a loop with three times as many elements would take nine times ($3^2$) as long to run.

A better solution to save the results in a list, and then combine into a single vector after the loop is done:

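A minimal sketch of the list-then-combine pattern described above. The chapter's own code for this step is not shown in this hunk, so the names here (`out`, `output`) are illustrative assumptions. Each randomly sized vector goes into its own pre-allocated list slot, and `unlist()` flattens everything once, after the loop:

```r
set.seed(1)  # only so the random lengths are reproducible

means <- c(0, 1, 2)

# Pre-allocate a list with one slot per iteration
out <- vector("list", length(means))
for (i in seq_along(means)) {
  n <- sample(100, 1)               # random length for this vector
  out[[i]] <- rnorm(n, means[[i]])  # store the whole vector in one slot
}

# Combine once at the end instead of copying on every iteration
output <- unlist(out)
str(output)
```

If each iteration produced a data frame rather than a vector, the same shape applies with `do.call(rbind, out)` as the final combining step.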
@@ -375,7 +375,7 @@ I mention while loops briefly, because I hardly ever use them. They're most ofte
}
```
## For loops vs functionals
## For loops vs. functionals
For loops are not as important in R as they are in other languages because R is a functional programming language. This means that it's possible to wrap up for loops in a function, and call that function instead of using the for loop directly.
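A rough sketch of what wrapping a for loop in a function can look like. `summarise_cols()` is a hypothetical helper written for illustration, not the chapter's `col_summary()` or a function from any package. The loop's book-keeping is written once, and the per-column summary is passed in as an argument; base R's `vapply()` packages the same pattern:

```r
# Hypothetical helper: loop book-keeping lives here, the summary is an argument
summarise_cols <- function(df, f) {
  out <- vector("double", length(df))  # pre-allocate one result per column
  for (i in seq_along(df)) {
    out[[i]] <- f(df[[i]])
  }
  names(out) <- names(df)
  out
}

summarise_cols(mtcars, mean)
summarise_cols(mtcars, median)

# The same iteration, expressed with a base R functional
vapply(mtcars, mean, numeric(1))
```

With purrr loaded, `purrr::map_dbl(mtcars, mean)` expresses the same call.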
@@ -529,7 +529,7 @@ There are a few differences between `map_*()` and `col_summary()`:
### Shortcuts
There are a few shortcuts that you can use with `.f` in order to save a little typing. Imagine you want to fit a linear model to each group in a dataset. The following toy example splits up the `mtcars` dataset into three pieces (one for each value of cylinder) and fits the same linear model to each piece:
There are a few shortcuts that you can use with `.f` in order to save a little typing. Imagine you want to fit a linear model to each group in a dataset. The following toy example splits the up the `mtcars` dataset in to three pieces (one for each value of cylinder) and fits the same linear model to each piece:
```{r}
models <- mtcars %>%
7 changes: 0 additions & 7 deletions model-assess.Rmd
@@ -85,10 +85,3 @@ ggplot(, aes(mse)) +
geom_histogram(binwidth = 0.25) +
geom_vline(xintercept = base_mse, colour = "red")
```


### Exercises

1. Given a list of formulas, use purr to fit them.

1. Given a list of model functions, and parameters, use purrr to fit them.