forked from hadley/r4ds
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathprogram.qmd
75 lines (60 loc) · 4.66 KB
/
program.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
# Program {#sec-program-intro .unnumbered}
```{r}
#| results: "asis"
#| echo: false
source("_common.R")
```
In this part of the book, you'll improve your programming skills.
Programming is a cross-cutting skill needed for all data science work: you must use a computer to do data science; you cannot do it in your head, or with pencil and paper.
```{r}
#| label: fig-ds-program
#| echo: false
#| out.width: ~
#| fig-cap: >
#| Programming is the water in which all other components of the data
#| science process swims.
#| fig-alt: >
#| Our model of the data science process with program (import, tidy,
#| transform, visualize, model, and communicate, i.e. everything)
#| highlighted in blue.
knitr::include_graphics("diagrams/data-science/program.png", dpi = 270)
```
Programming produces code, and code is a tool of communication.
Obviously code tells the computer what you want it to do.
But it also communicates meaning to other humans.
Thinking about code as a vehicle for communication is important because every project you do is fundamentally collaborative.
Even if you're not working with other people, you'll definitely be working with future-you!
Writing clear code is important so that others (like future-you) can understand why you tackled an analysis in the way you did.
That means getting better at programming also involves getting better at communicating.
Over time, you want your code to become not just easier to write, but easier for others to read.
Writing code is similar in many ways to writing prose.
One parallel which we find particularly useful is that in both cases rewriting is the key to clarity.
The first expression of your ideas is unlikely to be particularly clear, and you may need to rewrite multiple times.
After solving a data analysis challenge, it's often worth looking at your code and thinking about whether or not it's obvious what you've done.
If you spend a little time rewriting your code while the ideas are fresh, you can save a lot of time later trying to recreate what your code did.
But this doesn't mean you should rewrite every function: you need to balance what you need to achieve now with saving time in the long run.
(But the more you rewrite your functions the more likely your first attempt will be clear.)
In the following three chapters, you'll learn skills to improve your programming skills:
1. Copy-and-paste is a powerful tool, but you should avoid doing it more than twice.
Repeating yourself in code is dangerous because it can easily lead to errors and inconsistencies.
Instead, in @sec-functions, you'll learn how to write **functions** which let you extract out repeated code so that it can be easily reused.
2. Functions extract out repeated code, but you often need to repeat the same actions on different inputs.
You need tools for **iteration** that let you do similar things again and again.
These tools include for loops and functional programming, which you'll learn about in @sec-iteration.
3. As you read more code written by others, you'll see more code that doesn't use the tidyverse.
In @sec-base-r, you'll learn some of the most important base R functions that you'll see in the wild.
These functions tend to be designed to use individual vectors, rather than data frames, often making them a good fit for your programming needs.
## Learning more
The goal of these chapters is to teach you the minimum about programming that you need to practice data science.
Once you have mastered the material in this book, we strongly believe you should continue to invest in your programming skills.
Learning more about programming is a long-term investment: it won't pay off immediately, but in the long term it will allow you to solve new problems more quickly, and let you reuse your insights from previous problems in new scenarios.
To learn more you need to study R as a programming language, not just an interactive environment for data science.
We have written two books that will help you do so:
- [*Hands on Programming with R*](https://rstudio-education.github.io/hopr/), by Garrett Grolemund.
This is an introduction to R as a programming language and is a great place to start if R is your first programming language.
It covers similar material to these chapters, but with a different style and different motivation examples (based in the casino).
It's a useful complement if you find that these four chapters go by too quickly.
- [*Advanced R*](https://adv-r.hadley.nz/) by Hadley Wickham.
This dives into the details of R the programming language.
This is a great place to start if you have existing programming experience.
It's also a great next step once you've internalized the ideas in these chapters.