---
output: html_document
editor_options:
  chunk_output_type: console
---
# Base plotting environment
```{r echo=FALSE}
source("libs/Common.R")
options(width = 1000)
```
```{r echo = FALSE}
R_ver()
```
## Loading the data
The data files used in this tutorial can be downloaded from the course's website as follows:
```{r download_data}
load(url("https://mgimond.github.io/ES218/Data/dat1_2.RData"))
```
This should load several data frame objects into your R session (not all of them are used in this exercise). The `dat1l` data frame is a long-table version of the crop yield dataset.
```{r}
head(dat1l, 3)
```
The `dat1w` data frame is a wide-table version of the same dataset.
```{r}
head(dat1w, 3)
```
The `dat2` data frame is a wide-table representation of income by county and by various income and educational attainment levels. The first few lines and columns are shown here:
```{r}
dat2[1:3, 1:7]
```
## Base plotting functions
### Point and line plots
The most commonly used plot function in R is `plot()`, a generic function that produces different plot types depending on its input. For example, to plot male population median income (`dat2$B20004007`) vs female population median income (`dat2$B20004013`) for each county, type:
```{r fig.height=3, fig.width=4, echo=2}
OP <- par(mar=c(4,4,1,1))
plot(B20004007 ~ B20004013, dat = dat2)
par(OP)
```
The above `plot` command takes two arguments: `B20004007 ~ B20004013`, which is to be read as "*plot variable B20004007 as a function of B20004013*", and `dat = dat2`, which tells the plot function which data frame to extract the variables from (R matches the abbreviated `dat` to the function's `data` argument). Another way to call this command is to type:
```{r eval=FALSE}
plot(dat2$B20004007 ~ dat2$B20004013)
```
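In the formula interface, the variable on the left of `~` goes on the y-axis and the one on the right goes on the x-axis, whereas `plot(x, y)` takes them in the opposite order. The sketch below uses the built-in `mtcars` data set (not part of this tutorial's data) to show both forms. It also uses `model.frame()`, which evaluates a formula much as `plot()` does internally, to confirm that the left-hand side is treated as the response (y) variable.

```{r}
# Built-in mtcars data, used here only for illustration
# Formula interface: "plot mpg as a function of wt" (mpg on y, wt on x)
plot(mpg ~ wt, data = mtcars)

# Equivalent x, y interface: note that x comes first here
plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg")

# The first column of the model frame is the left-hand side (the y variable)
mf_demo <- model.frame(mpg ~ wt, data = mtcars)
names(mf_demo)
```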
The `plot` function can take many other arguments to help tweak the default plot options. For example, we may want to change the axis labels to something more descriptive than the table column names:
```{r fig.height=3, fig.width=4, echo=2}
OP <- par(mar=c(4,4,1,1))
plot(B20004007 ~ B20004013, dat = dat2,
xlab = "Female median income ($)",
ylab = "Male median income ($)")
par(OP)
```
There are over 3000 observations in this dataset, which makes it difficult to see what may be going on in the cloud of points. We can change the symbol type to a solid fill, `pch = 20`, and set its color to 90% transparent (or 10% opaque) using the expression `col = rgb(0, 0, 0, 0.10)`. The `rgb()` function defines the intensities of the display's three primary colors, red, green and blue, on a scale of 0 to 1. The fourth value is optional and sets the color's opacity, with a value of `1` being completely opaque.
```{r fig.height=3, fig.width=4, echo=2}
OP <- par(mar=c(4,4,1,1))
plot(B20004007 ~ B20004013, dat = dat2,
xlab = "Female median income ($)",
ylab = "Male median income ($)",
pch = 20, col = rgb(0, 0, 0, 0.10) )
par(OP)
```
The plot could use additional tweaking, but it may be best to build the plot from scratch as will be demonstrated a few sections down.
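To see why transparency helps with overplotting, the following sketch draws the same point cloud twice, once with opaque symbols and once with 10% opaque symbols. The data are simulated (hypothetical values standing in for the income columns); where many semi-transparent points overlap, the color darkens, so dense regions stand out.

```{r}
# Simulated, hypothetical data: 3000 correlated values
set.seed(42)
sim_x <- rnorm(3000, mean = 30000, sd = 8000)
sim_y <- sim_x + rnorm(3000, sd = 5000)

# Two panels side by side: opaque vs. 10% opaque points
op_demo <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
plot(sim_x, sim_y, pch = 20, col = "black", main = "Opaque")
plot(sim_x, sim_y, pch = 20, col = rgb(0, 0, 0, 0.10), main = "alpha = 0.10")
par(op_demo)  # restore the previous layout and margins
```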
By default, the `plot` command will plot points and not lines. To plot lines, add the `type="l"` parameter to the `plot` function. For example, to plot oats crop yield as a function of year from our `dat1w` dataset, type:
```{r fig.height=3, fig.width=4, echo=2}
OP <- par(mar=c(4,4,1,1))
plot(Oats ~ Year, dat = dat1w, type="l",
ylab = "Oats yield (Hg/Ha)" )
par(OP)
```
To plot both points and lines, set the `type` parameter to `"b"` (for *both*). We'll also set the point symbol to `20`.
```{r fig.height=3, fig.width=4, echo=2}
OP <- par(mar=c(4,4,1,1))
plot(Oats ~ Year, dat = dat1w, type = "b", pch = 20,
ylab = "Oats yield (Hg/Ha)" )
par(OP)
```
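The `type` argument accepts a few other codes as well: `"o"` draws points over the lines, `"s"` draws stair steps, `"h"` draws vertical, histogram-like lines, and `"n"` sets up the axes without drawing any data. The sketch below compares six of them on a short hypothetical series.

```{r}
# Hypothetical series, used only to compare plot types
yrs  <- 2001:2010
vals <- c(3, 5, 4, 6, 8, 7, 9, 8, 10, 12)

op_demo <- par(mfrow = c(2, 3), mar = c(2, 2, 2, 1))
for (tp in c("p", "l", "b", "o", "s", "h")) {
  plot(yrs, vals, type = tp, pch = 20, main = paste0('type = "', tp, '"'))
}
par(op_demo)  # back to a single panel
```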
The `plot` command can only graph one variable at a time. To add another variable, call the `lines` function after `plot`. We will assign a different line type to this second variable (`lty = 2`):
```{r fig.height=3, fig.width=4, echo=2:3,results='hold'}
OP <- par(mar=c(4,4,1,1))
plot(Oats ~ Year, dat = dat1w, type = "l",
ylab = "Oats yield (Hg/Ha)" )
lines(Barley ~ Year, dat = dat1w, lty = 2)
par(OP)
```
Note how the plot does not automatically re-scale to accommodate the new line. The plot is a static object, meaning that we need to define the axis limits before calling the original plot function. Both axis limits can be set using the `xlim` and `ylim` parameters. We don't need to set the x-axis range since both variables cover the same year range, so we will only focus on the y-axis limits. We can grab both the minimum and maximum values for the variables `Oats` and `Barley` using the `range` function, then pass the range to the `ylim` parameter in the call to `plot`.
```{r fig.height=3, fig.width=4, echo=2:4,results='hold'}
OP <- par(mar=c(4,4,1,1))
y.rng <- range( c(dat1w$Oats, dat1w$Barley) )
plot(Oats ~ Year, dat = dat1w, type = "l", ylim = y.rng,
ylab = "Oats yield (Hg/Ha)")
lines(Barley ~ Year, dat = dat1w, lty = 2)
par(OP)
```
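Base R also offers `matplot()`, which plots each column of a matrix as its own series and computes axis limits that fit all columns at once, so the manual `range()` step is not needed. The sketch below uses a small matrix of made-up yields (hypothetical numbers, not taken from `dat1w`) and checks that the resulting plot region covers both series.

```{r}
# Hypothetical yields for two crops (made-up numbers, not from dat1w)
yr <- 1961:1966
yields <- cbind(Oats   = c(15, 16, 14, 17, 18, 17),
                Barley = c(20, 22, 21, 25, 24, 26))

# One line per column; the y-axis is sized to fit every column
matplot(yr, yields, type = "l", lty = 1:2, col = "black",
        xlab = "Year", ylab = "Yield (Hg/Ha)")
legend("topleft", colnames(yields), lty = 1:2)

# Plot region extremes: c(x1, x2, y1, y2)
usr_mat <- par("usr")
```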
Point plots from different variables can also be combined into a single plot using the `points` function in lieu of the `lines` function. In the following example, male vs. female income for population having a high school degree (blue dots) and a Bachelor's degree (red dots) will be overlaid on the same plot. We'll also add a legend in the top-right corner.
```{r fig.height=3, fig.width=4, echo=2:12,results='hold'}
OP <- par(mar=c(4,4,1,1))
y.rng <- range( c(dat2$B20004009, dat2$B20004011) , na.rm = TRUE)
x.rng <- range( c(dat2$B20004015, dat2$B20004017) , na.rm = TRUE)
# Plot income for HS degree
plot(B20004009 ~ B20004015, dat = dat2, pch = 20, col = rgb(0, 0, 1, 0.10),
xlab = "Female median income ($)",
ylab = "Male median income ($)",
xlim = x.rng, ylim = y.rng)
# Add points for Bachelor's degree
points(B20004011 ~ B20004017, dat = dat2, pch = 20,
col = rgb(1,0,0,0.10))
# Add legend
legend("topright", c("HS Degree", "Bachelor's"), pch = 20,
col = c(rgb(0, 0, 1, 1), rgb(1, 0, 0, 1) ))
par(OP)
```
The `na.rm = TRUE` option is added as a parameter in the `range` function to prevent an `NA` value in the data from returning an `NA` value in the range.
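The effect of `na.rm` is easy to check on the built-in `airquality` data set, whose `Ozone` column contains missing values. This is a minimal sketch using that data set rather than `dat2`.

```{r}
# airquality$Ozone (built-in data set) contains missing values
sum(is.na(airquality$Ozone))

range(airquality$Ozone)                # a single NA makes both limits NA
range(airquality$Ozone, na.rm = TRUE)  # smallest and largest observed values
```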
### Customizing point symbols
The point size can be adjusted using the `cex` argument. The default value is `1`. Values less than `1` decrease the point size, while values greater than `1` increase it.
Point symbols are defined by a numeric code. The following figure shows the list of point symbols available in R along with their numeric designation as used with the `pch` argument. The symbol's color can be defined using the `col` parameter. For symbols 21 through 25, which have a two-color scheme, `col` applies to the outline color (blue in the following figure) and the `bg` parameter applies to the fill color (red in the following figure).
```{r echo=FALSE, fig.height=3, fig.width=4}
OP <- par(mar=c(0,0,0,2))
plot(rep(1:5,5), rep(1:5, each = 5), pch=1:25, col="blue", bg="red",
cex=1.5, xlab = NA, ylab = NA, axes = FALSE, xlim=c(0,6), ylim=c(0,6))
text(rep(1:5,5), rep(1:5, each = 5), as.character(1:25), pos=4)
par(OP)
```
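Both `pch` and `cex` accept vectors, with values applied to successive points. The sketch below draws symbols 21 to 25 with a blue outline (`col`), a red fill (`bg`) and increasing sizes (`cex`).

```{r}
# Symbols 21-25 take an outline colour (col) and a fill colour (bg);
# cex scales each symbol relative to the default size of 1
plot(1:5, rep(1, 5), pch = 21:25, col = "blue", bg = "red",
     cex = c(1, 1.5, 2, 2.5, 3),
     xlim = c(0.5, 5.5), axes = FALSE, xlab = NA, ylab = NA)
```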
### Defining colors
In R, colors can be specified in several ways, including built-in color names, RGB values, and hexadecimal codes.
#### Using built-in color names
R provides a set of named colors that can be retrieved using the `colors()` function. These can be used directly in plotting functions. For example:
```{r fig.height= 0.8, fig.width = 3.5, echo = 2}
OP <- par(mar=c(2.5,0,0,0), yaxt = "n")
barplot(rep(1, 5), col = c("red", "blue", "green", "orange", "purple"),
names.arg = c("Red", "Blue", "Green", "Orange", "Purple"))
par(OP)
```
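The named colors can be listed, searched and decoded from the console. The sketch below counts the available names, searches them with `grep()`, and uses `col2rgb()` to look up the 0 to 255 red, green and blue intensities behind a name.

```{r}
# All built-in colour names
length(colors())
head(colors())

# Search the names, e.g. for every shade with "steelblue" in its name
grep("steelblue", colors(), value = TRUE)

# Red, green and blue intensities (0-255) behind a name
col2rgb("orange")
```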
#### Using the `rgb()` function
Colors can also be defined using the `rgb(red, green, blue, alpha)` function where values for `red`, `green`, and `blue` are specified between 0 and 1. The argument `alpha` is optional and sets the color's transparency with `0` = fully transparent and `1` = fully opaque.
```{r fig.height= 0.8, fig.width = 3.5, echo = 2:4}
OP <- par(mar=c(2.5,0,0,0), yaxt = "n")
color_rgb <- c(rgb(1, 0, 0), # Red
rgb(0, 0, 1), # Blue
rgb(0, 1, 0), # Green
rgb(1, 0.6, 0), # Orange
rgb(0.6, 0.1, 0.9)) # Purple
barplot(rep(1, length(color_rgb)), col = color_rgb,
names.arg = c("Red", "Blue", "Green", "Orange", "Purple"))
par(OP)
```
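The `rgb()` function returns a hexadecimal color string, and its `maxColorValue` argument lets you work on a 0 to 255 scale instead of 0 to 1. Supplying `alpha` appends a fourth two-digit pair to the code. The sketch below also shows `adjustcolor()`, a grDevices function that applies a transparency factor to an existing color.

```{r}
# The same orange on a 0-1 scale and on a 0-255 scale
rgb(1, 0.6, 0)
rgb(255, 153, 0, maxColorValue = 255)

# The optional alpha value adds a fourth pair to the hex code
rgb(0, 0, 0, 0.10)

# Apply 10% opacity to an existing named colour
adjustcolor("black", alpha.f = 0.10)
```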
#### Using hexadecimal color codes
Hexadecimal color codes represent colors as six-character strings prefixed with `#`: the first two characters control red, the next two green, and the last two blue, with each pair ranging from `00` (0) to `FF` (255).
```{r fig.height= 0.8, fig.width = 3.5, echo = 2:4}
OP <- par(mar=c(2.5,0,0,0), yaxt = "n")
color_hex <- c("#FF0000", # Red
"#0000FF", # Blue
"#00FF00", # Green
"#FF9900", # Orange
"#991AE6") # Purple
barplot(rep(1, length(color_hex)), col = color_hex,
names.arg = c("Red", "Blue", "Green", "Orange", "Purple"))
par(OP)
```
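Hex codes and `rgb()` values are interchangeable. In the sketch below, `col2rgb()` decodes hex strings back into 0 to 255 intensities, dividing by 255 recovers the 0 to 1 scale used by `rgb()`, and an optional fourth pair (here `80`, or 128 out of 255) sets the transparency.

```{r}
# Decode hex codes into 0-255 intensities (rows: red, green, blue)
col2rgb(c("#FF9900", "#991AE6"))

# Dividing by 255 gives rgb()'s 0-1 scale
col2rgb("#991AE6")[, 1] / 255
rgb(0.6, 0.1, 0.9)   # the purple defined earlier with rgb()

# A fourth pair sets transparency: 80 (hex) = 128, i.e. about 50% opaque red
col2rgb("#FF000080", alpha = TRUE)
```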
### Customizing line symbols
The line width can be adjusted using the `lwd` argument. The default value is `1`. Values less than `1` make the line thinner, while values greater than `1` make it thicker.
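Line types can also be customized in the `plot` function using the `lty` argument. There are six predefined line types, identified by the numbers 1 through 6 (or by the names `"solid"`, `"dashed"`, `"dotted"`, `"dotdash"`, `"longdash"` and `"twodash"`); `lty = 0` draws no line: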
```{r echo=FALSE, fig.height=2, fig.width=2}
OP <- par(mar=c(0,3,0,0))
plot(0,pch='',xlab = NA, ylab = NA, axes = F, xlim=c(0,7), ylim=c(0,7))
abline(h=1:6, lty=1:6)
axis(2,at=1:6, as.character(1:6), col="white", las = 2)
par(OP)
```
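Line types can be given by number or by name, and `lty` also accepts a string of hexadecimal digits giving the lengths of alternating dashes and gaps. The sketch below combines these forms with different `lwd` widths, then shows that `par()` reports a numeric line type back by name.

```{r}
plot(0:10, 0:10, type = "n", xlab = NA, ylab = NA)   # empty plot region
abline(h = 2, lty = 2)                      # line type given by number
abline(h = 4, lty = "dashed", lwd = 2)      # same type given by name, drawn thicker
abline(h = 6, lty = "dotdash", lwd = 0.5)   # named type, drawn thinner
abline(h = 8, lty = "82")                   # custom pattern: 8 units on, 2 units off

op_demo <- par(lty = 2)   # set the default line type by number...
lty_now <- par("lty")     # ...and par() reports it back by name
lty_now
par(op_demo)
```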
### Boxplots
A boxplot is one of many graphical tools used to summarize the distribution of a data batch. The graphic consists of a "box" that depicts the range covered by the middle 50% of the data (aka the interquartile range, IQR), a horizontal line that displays the median, and "whiskers" that extend to the most extreme values lying within 1.5 times the IQR of the box. Values beyond the whiskers are plotted individually as outliers.
For example, we can summarize the income range for all individuals as follows:
```{r fig.height=2.5, fig.width=2, echo=2}
OP <- par(mar=c(0,2,0,0))
boxplot(dat2$B20004001, na.rm = TRUE)
par(OP)
```
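Note that when `boxplot` is given a vector rather than a formula, there is no `data` argument for specifying the data frame as there is with the `plot` function; we must therefore pass the data frame name and the variable as a single argument (i.e. `dat2$B20004001`). The formula interface, shown later in this section, does accept a data frame.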
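The numbers behind a boxplot can be inspected with `boxplot.stats()`, which `boxplot()` uses to compute each box. R's box edges are Tukey's hinges (from `fivenum()`), which are close to the quartiles. In the small made-up batch below, the hinges are 3.5 and 8.5, so the box spans 5 units and the upper limit is 8.5 + 1.5 × 5 = 16: the upper whisker stops at 10 and the value 50 is flagged as an outlier. `boxplot.stats()` also drops `NA` values on its own.

```{r}
# A small made-up batch with one extreme value
x_demo <- c(1:10, 50)

bs <- boxplot.stats(x_demo)
bs$stats  # lower whisker end, lower hinge, median, upper hinge, upper whisker end
bs$out    # values drawn as individual outlier points

boxplot(x_demo)
```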
Several variables can be summarized on the same plot.
```{r fig.height=2.5, fig.width=3, echo=2}
OP <- par(mar=c(2,2,1,2))
boxplot(dat2$B20004001, dat2$B20004007, dat2$B20004013,
names = c("All", "Male", "Female"), main = "Median income ($)")
par(OP)
```
The `names` argument labels each box along the x-axis, and the `main` argument sets the plot title.
The outliers can be removed from the plot, if desired, by setting the `outline` parameter to `FALSE`:
```{r fig.height=2.5, fig.width=3, echo=2}
OP <- par(mar=c(2,2,1,2))
boxplot(dat2$B20004001, dat2$B20004007, dat2$B20004013,
names = c("All", "Male", "Female"), main = "Median income ($)",
outline = FALSE)
par(OP)
```
The boxplot graph can also be plotted horizontally by setting the `horizontal` parameter to `TRUE`:
```{r fig.height=2.5, fig.width=3, echo=2}
OP <- par(mar=c(2,2,1,1))
boxplot(dat2$B20004001, dat2$B20004007, dat2$B20004013,
names = c("All", "Male", "Female"), main = "Median income ($)",
outline = FALSE, horizontal = TRUE)
par(OP)
```
The last two plots highlight one downside of storing data in a wide format: every column must be passed to the boxplot function individually. It's more practical to store such data in long form. To demonstrate this, let's switch back to the crop data. To plot all crop columns in `dat1w`, we would need to type:
```{r fig.height=2.5, fig.width=5, echo=2}
OP <- par(mar=c(2,2,1,1))
boxplot(dat1w$Barley, dat1w$Buckwheat, dat1w$Maize, dat1w$Oats, dat1w$Rye,
names=c("Barley", "Buckwheat", "Maize", "Oats", "Rye"))
par(OP)
```
If you use the long version of that table, the command looks like this:
```{r fig.height=2.5, fig.width=5, echo=2}
OP <- par(mar=c(2,2,1,1))
boxplot(Yield ~ Crop, dat1l)
par(OP)
```
where `~ Crop` tells the function to split the boxplots across unique `Crop` levels.
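If your data only exist in wide form, base R's `stack()` (from the utils package) is one way to convert selected columns into long form: it returns a `values` column and an `ind` factor holding the source column names. The sketch below uses a small made-up wide table, not `dat1w`.

```{r}
# A made-up wide table: one column per crop (hypothetical values)
wide_demo <- data.frame(Year   = 2001:2004,
                        Barley = c(21, 23, 22, 25),
                        Oats   = c(16, 15, 17, 18))

# Stack the crop columns into two columns: values and ind
long_demo <- stack(wide_demo[, c("Barley", "Oats")])
head(long_demo)

boxplot(values ~ ind, data = long_demo)
```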
One can order the boxplots based on the median values. By default, `boxplot` will order the boxplots following the factor's level order. In our example, the crop levels are ordered alphabetically. To reorder the levels following the median values of yields across each level, we can use the `reorder()` function:
```{r}
dat1l$Crop.ord <- reorder(dat1l$Crop, dat1l$Yield, median)
```
This creates a new variable called `Crop.ord` whose values mirror those in variable `Crop` but differ in the underlying level order:
```{r}
levels(dat1l$Crop.ord)
```
If we wanted the levels in descending order, we would negate the value argument, as in `reorder(dat1l$Crop, -dat1l$Yield, median)`.
The function `reorder` takes three arguments: the factor whose levels are to be reordered (`Crop`), the value whose quantity will determine the new order (`Yield`) and the statistic that will be used to summarize the values across each factor's level (`median`).
The modified boxplot expression now looks like:
```{r fig.height=2.5, fig.width=5, echo=2}
OP <- par(mar=c(2,2,1,1))
boxplot(Yield ~ Crop.ord, dat1l)
par(OP)
```
Alternatively, you could have embedded the `reorder` function directly in the plotting function.
```{r fig.height=2.5, fig.width=5, echo=2}
OP <- par(mar=c(2,2,1,1))
boxplot(Yield ~ reorder(Crop, Yield, median), dat1l)
par(OP)
```
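One way to verify the new ordering is to compute each group's median in the reordered level order with `tapply()`: ascending order yields non-decreasing medians and a negated value yields non-increasing ones. The sketch below uses the built-in `InsectSprays` data set rather than the crop data.

```{r}
# Built-in data: insect counts for six sprays (factor levels A-F)
spray_ord <- reorder(InsectSprays$spray, InsectSprays$count, median)
levels(spray_ord)

# Group medians listed in the new level order
med_demo <- tapply(InsectSprays$count, spray_ord, median)
med_demo

# Negating the values reverses the order (largest median first)
spray_desc <- reorder(InsectSprays$spray, -InsectSprays$count, median)
med_desc <- tapply(InsectSprays$count, spray_desc, median)

boxplot(InsectSprays$count ~ spray_ord, xlab = "Spray", ylab = "Count")
```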
### Histograms
The histogram is another way to visualize a data distribution. It partitions a batch of values into intervals of equal length, then tallies the number of values in each interval; the height of each bar represents these counts. For example, we can plot the histogram of maize yields using the `hist` function as follows:
```{r fig.height=2.5, fig.width=4, echo=2}
OP <- par(mar=c(4,4,1,1))
hist(dat1w$Maize, xlab = "Maize", main = NA)
par(OP)
```
The `main = NA` argument suppresses the plot title.
To control the number of bins, add the `breaks` argument. The `breaks` argument can be passed different types of values. The simplest value is the desired number of bins. Note, however, that you might not get the number of bins defined with the `breaks` argument. For example, assigning the value `10` to `breaks` generates a 14-bin histogram.
```{r fig.height=2.5, fig.width=4, echo=2}
OP <- par(mar=c(4,4,1,1))
hist(dat1w$Maize, xlab = "Maize", main = NA, breaks = 10)
par(OP)
```
The documentation states that the breaks value *"is a suggestion only as the breakpoints will be set to `pretty` values"*. `pretty` refers to a function that computes equally spaced "round" values that are 1, 2 or 5 times a power of 10.
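Calling `hist()` with `plot = FALSE` returns the computed breaks and counts without drawing anything, which makes the `pretty()` adjustment easy to inspect. In the sketch below (using the built-in `faithful` eruption durations rather than the maize yields), requesting 10 bins produces the same break points as calling `pretty()` directly on the data range.

```{r}
x_hist <- faithful$eruptions   # built-in data: eruption durations (minutes)

# With breaks = 10, hist() hands the request to pretty()
h10 <- hist(x_hist, breaks = 10, plot = FALSE)
h10$breaks
pretty(range(x_hist), n = 10, min.n = 1)

length(h10$breaks) - 1   # number of bins actually used
```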
If you want total control of the bin numbers, manually create the breaks as follows:
```{r fig.height=2.5, fig.width=4, echo=2:7}
OP <- par(mar=c(4,4,1,1))
n <- 10 # Define the number of bins
minx <- min(dat1w$Maize, na.rm = TRUE)
maxx <- max(dat1w$Maize, na.rm = TRUE)
bins <- seq(minx, maxx, length.out = n +1)
hist(dat1w$Maize, xlab = "Maize", main = NA, breaks = bins)
par(OP)
```
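The same `plot = FALSE` trick confirms that manually built breaks give exactly the requested number of bins. Because `hist()` includes the lowest value in the first bin by default (`include.lowest = TRUE`), every observation between the minimum and maximum is counted. The sketch below again uses the built-in `faithful` data.

```{r}
x_hist <- faithful$eruptions
n_bins <- 10

# Equal-width break points from the minimum to the maximum value
brk <- seq(min(x_hist), max(x_hist), length.out = n_bins + 1)
h_manual <- hist(x_hist, breaks = brk, plot = FALSE)

length(h_manual$counts)                 # number of bins
sum(h_manual$counts) == length(x_hist)  # every value falls in a bin
```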
### Density plot
Histograms have their pitfalls. For example, the number of bins can drastically affect the appearance of a distribution. One alternative is the density plot which, for a series of x-values, estimates the density of observations around each x-value using a smoothing kernel. This generates a "smoothed" distribution of values.
Unlike the plotting functions covered so far, `density` does not generate a plot; instead, it returns an object of class `density` (a list holding the x values, the estimated densities and the bandwidth used). Passing this object to the `plot` function generates the plot.
```{r fig.height=2.5, fig.width=4, echo=2:3}
OP <- par(mar=c(4,4,1,1))
dens <- density(dat1w$Maize)
plot(dens, main = "Density distribution of Maize yields")
par(OP)
```
You can control the bandwidth using the `bw` argument. For example:
```{r fig.height=2.5, fig.width=4, echo=2:3}
OP <- par(mar=c(4,4,1,1))
dens <- density(dat1w$Maize, bw = 4000)
plot(dens, main = "Density distribution of Maize yields")
par(OP)
```
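The bandwidth is expressed in the same units as the variable. With the default Gaussian kernel it is the standard deviation of the kernel, so larger values produce a smoother curve.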
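The object returned by `density()` can be examined before plotting. The sketch below (using the built-in `faithful` data for illustration) lists its components, confirms that the default bandwidth comes from `bw.nrd0()` and that an explicit `bw` overrides it, and approximates the area under the curve, which should be close to 1.

```{r}
x_dens <- faithful$eruptions   # built-in data, used here for illustration

d_demo <- density(x_dens)
class(d_demo)
names(d_demo)     # includes the x grid, the y estimates and the bandwidth bw

# Default bandwidth rule vs. an explicit bandwidth
all.equal(d_demo$bw, bw.nrd0(x_dens))
density(x_dens, bw = 0.1)$bw

# Riemann-sum approximation of the area under the estimated curve
sum(d_demo$y) * diff(d_demo$x[1:2])
```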
## Customizing plots
So far, you have learned how to customize point and line symbols, but this may not be enough. You might want to modify other graphic elements such as the axes layout and label formats for publication. Let's see how we can further customize a plot of median income for male and female population having attained a HS degree.
First, we plot the points but omit the axes and their labels with the parameters `axes = FALSE, xlab = NA, ylab = NA`. We want both axes to cover the same range of values, so we will use the `range` function to find the min and max values for both male and female incomes.
Next, we draw the x axis using the `axis` function. The first parameter to this function is a number that indicates which axis is to be drawn (i.e. 1=bottom x, 2=left y, 3=top x and 4=right y). We will then use the `mtext` function to place the axis label under the axis line.
```{r customplot, fig.height=3, fig.width=4, echo=c(-1,-12)}
OP <- par(mar=c(4,4,1,4))
# Plot the points without the axes
rng <- range(dat2$B20004009, dat2$B20004015, na.rm = TRUE)
plot(B20004009 ~ B20004015, dat = dat2, pch = 20, col = rgb(0,0,0,0.15),
xlim = rng, ylim = rng, axes = FALSE, xlab = NA, ylab = NA )
# Plot the x-axis
lab <- c("5,000", "25,000", "45,000", "$65,000")
axis(1, at = seq(5000, 65000, length.out = 4), label = lab)
# Plot x label
mtext("Female median income (HS degree)", side = 1, line = 2)
par(OP)
```
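Rather than typing tick labels by hand, they can be generated from the tick positions with `format()` and its `big.mark` argument. The sketch below uses simulated, hypothetical incomes in place of the `dat2` columns.

```{r}
# Simulated, hypothetical data standing in for the income columns
set.seed(1)
inc_f <- runif(200, 5000, 65000)
inc_m <- inc_f * runif(200, 0.9, 1.3)

ticks <- seq(5000, 65000, length.out = 4)
# Comma-separated labels built from the tick values (trim removes padding)
tick_lab <- format(ticks, big.mark = ",", trim = TRUE)
tick_lab

plot(inc_f, inc_m, pch = 20, col = rgb(0, 0, 0, 0.15),
     axes = FALSE, xlab = NA, ylab = NA)
axis(1, at = ticks, labels = tick_lab)
mtext("Female median income ($)", side = 1, line = 2)
```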
Next, we will tackle the y-axis. We will rotate both the tick labels and the axis label horizontally and place the axis label at the top of the axis. This will involve a different approach from the one used for the x-axis. First, we need to identify the coordinates of the plot region's corners using the `par` function. Second, we will use the `text` function instead of the `mtext` function to place the axis label.
First, let's plot the y-axis with the custom tick labels.
```{r, fig.height=3, fig.width=4, echo=c(-1:-9,-12)}
OP <- par(mar=c(4,4,1,4))
# Plot the points without the axes
rng <- range(dat2$B20004009, dat2$B20004015, na.rm=TRUE)
plot(B20004009 ~ B20004015, dat = dat2, pch = 20, col = rgb(0,0,0,0.10),
xlim = rng, ylim = rng, axes = FALSE, xlab = NA, ylab = NA )
# Plot the x-axis
lab <- c("5,000", "25,000", "45,000", "$65,000")
axis(1, at = seq(5000,65000, length.out = 4), label = lab)
# Plot x label
mtext("Female median income (HS degree)", side = 1, line = 2)
# Plot the y-axis
axis(2, las = 1, at = seq(5000,65000, length.out = 4), label = lab)
par(OP)
```
Now let's extract the plot's corner coordinate values.
```{r, fig.show='hide', echo=c(-1:-4, -7)}
OP <- par(mar=c(4,4,1,4))
# Plot the points without the axes
rng <- range(dat2$B20004009, dat2$B20004015, na.rm=TRUE)
plot(B20004009 ~ B20004015, dat=dat2, pch=20, col=rgb(0,0,0,0.10),
xlim=rng, ylim=rng, axes = FALSE, xlab = NA, ylab = NA )
loc <- par("usr")
loc
par(OP)
```
`par("usr")` returns the extremes of the plot region as `c(x1, x2, y1, y2)`, in the plot's x and y units. We want to place the label in the upper left-hand corner, whose coordinate values are `loc[1]=` `r loc[1]` and `loc[4]=` `r loc[4]`.
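The layout of `par("usr")` is easy to check on a plot whose data range is known. With the default axis style (`xaxs = "r"`, `yaxs = "r"`), R extends the data range by 4% at each end, so the plot region's corners sit slightly beyond the data; with style `"i"` they match the data range exactly. The sketch below uses the values 0 to 10.

```{r}
plot(0:10, 0:10)
usr_demo <- par("usr")   # c(x1, x2, y1, y2): left, right, bottom, top
usr_demo

# Upper-left corner of the plot region in data coordinates
c(x = usr_demo[1], y = usr_demo[4])

# With xaxs = "i" and yaxs = "i" the region matches the data range
plot(0:10, 0:10, xaxs = "i", yaxs = "i")
par("usr")
```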
```{r, fig.height=3.5, fig.width=4, echo=12}
OP <- par(mar=c(4,4,3,4))
# Plot the points without the axes
rng <- range(dat2$B20004009, dat2$B20004015, na.rm=TRUE)
plot(B20004009 ~ B20004015, dat=dat2, pch=20, col=rgb(0,0,0,0.10),
xlim=rng, ylim=rng, axes = FALSE, xlab = NA, ylab = NA )
# Plot the x-axis
lab <- c("5,000", "25,000", "45,000", "$65,000")
axis(1, at=seq(5000,65000, length.out=4), label=lab)
# Plot x label
mtext("Female median income (HS degree)", side=1, line=2)
# Plot the y-axis
axis(2, las=1, at=seq(5000,65000, length.out=4), label=lab)
text(loc[1], loc[4], "Male median\nincome", pos = 3, adj = 1, xpd = TRUE)
par(OP)
```
The string `\n` in the text `"Male median\nincome"` is interpreted in R as a newline, i.e., it forces the text that follows it onto the next line. The other parameters of interest are `pos`, which places the label relative to the given coordinate (`pos = 3` puts it above), and `adj`, which adjusts the text justification (type `?text` for more information; note that a `pos` value overrides any `adj` value). Finally, the parameter `xpd = TRUE` allows the `text` function to display text outside of the plot region.
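A possible alternative for a label that sits just above the top-left corner is `mtext()`, which positions text relative to the edges of the plot region, so no `par("usr")` lookup or `xpd` setting is needed. With `side = 3` (top margin) and no `at` value, `adj = 0` aligns the text with the left edge of the plot region. This is a sketch on simple made-up data; the label is kept on one line here.

```{r}
op_demo <- par(mar = c(4, 4, 3, 1))   # leave room in the top margin
plot(0:10, 0:10, pch = 20, las = 1, xlab = NA, ylab = NA)
# side = 3: top margin; adj = 0: left edge of the plot region;
# line = 1: distance (in text lines) from the plot box
mtext("Male median income", side = 3, adj = 0, line = 1)
par(op_demo)
```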
## R color palettes
R has built-in color palettes that can be used in your plotting functions. Note that the palettes are finite: if you have more unique categories than colors in a given palette, R will recycle the colors starting with the first color.
The built-in palettes are listed below:
```{r echo = FALSE, fig.width = 8, fig.height=5, message=FALSE}
library(ggplot2)
cols <- sapply(palette.pals(), palette.colors, n= NULL, recycle = FALSE)
cols.df <- data.frame(ID = rep(names(cols), sapply(cols, length)),
Obs = unlist(cols))
cols.df$ID <- factor(cols.df$ID,names(cols) )
ggplot(cols.df) + aes(x=ID) +
geom_dotplot(fill=cols.df$Obs,col=cols.df$Obs) + coord_flip() +
theme_classic() +
theme(axis.line = element_blank(),
axis.title = element_blank(),
axis.text.x = element_blank(),
axis.ticks = element_blank())
```
The default palette is `R4`.
For example, the following generates a point plot using the first 5 colors of `R4`.
```{r fig.height = 1, fig.width = 3, echo = 2 }
OP <- par(mar = c(1,1,0,0), fg = "grey")
plot(1:5, rep(1,5), type = "p", pch = 16, cex = 3, col = 1:5)
par(OP)
```
To change the palette, pass the palette name to the `palette()` function. For example, to color the points using the `Accent` palette, type:
```{r fig.height = 1, fig.width = 3, echo = 2:3 }
OP <- par(mar = c(1,1,0,0), fg = "grey")
palette("Accent")
plot(1:5, rep(1,5), type = "p", pch = 16, cex = 3, col = 1:5)
par(OP)
```
To revert to the default palette, run `palette("R4")` (or, equivalently, `palette("default")`).
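When the palette is set, `palette()` invisibly returns the palette that was previously active, which makes it easy to restore afterwards. The sketch below switches to `Accent`, plots more color indices than the palette holds (indices past the end wrap around to the start), then restores the earlier palette and finally resets to `R4`.

```{r}
old_pal <- palette("Accent")   # switch palettes; the old palette is returned invisibly
length(palette())              # number of colours in Accent

# Indices 9 and 10 wrap around to the first colours of the palette
plot(1:10, rep(1, 10), pch = 16, cex = 3, col = 1:10)

palette(old_pal)               # restore whatever palette was active before
palette("R4")                  # or reset explicitly to the default
palette()[1]
```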
You can also create a continuous color scheme using R's built-in color ramp tools.
A flexible color ramp function is `colorRampPalette`. It takes the desired sequence of colors and returns a new function which, when called with a number, returns that many color swatches interpolated along the sequence. For example, to generate 20 color swatches that range from blue to white to red, type:
```{r fig.height = 1, fig.width = 6, echo = 2:3 }
OP <- par(mar = c(1,1,0,0), fg = "grey")
col1 <- colorRampPalette(c("blue", "white", "red"))(20)
plot(1:20, rep(1,20), type = "p", pch = 16, cex = 3, col = col1)
par(OP)
```
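Because `colorRampPalette` returns a function, the ramp can be stored once and called with different swatch counts. In the sketch below, asking for 3 colors returns the three anchor colors themselves, and larger counts add interpolated shades in between.

```{r}
# colorRampPalette() returns a function; calling it with n gives n colours
blue_white_red <- colorRampPalette(c("blue", "white", "red"))
class(blue_white_red)

blue_white_red(3)   # the three anchor colours
blue_white_red(7)   # anchors plus interpolated shades
```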
You also have access to built-in color ramp functions. These include `terrain.colors`, `heat.colors`, `rainbow`, `cm.colors` and `topo.colors`. These functions take as arguments the number of colors to generate. For example, to generate 20 shades of `terrain.colors` type:
```{r fig.height = 1, fig.width = 6, echo = 2:3 }
OP <- par(mar = c(1,1,0,0), fg = "grey")
col2 <- terrain.colors(20)
plot(1:20, rep(1,20), type = "p", pch = 16, cex = 3, col = col2)
par(OP)
```
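In recent versions of R, the built-in ramps also accept `rev` (reverse the color order) and `alpha` (add transparency) arguments. The sketch below draws three variants of a 5-color `terrain.colors` ramp as strips.

```{r}
# Default order, reversed order, and 50% opaque versions of the same ramp
tc5       <- terrain.colors(5)
tc5_rev   <- terrain.colors(5, rev = TRUE)
tc5_alpha <- terrain.colors(5, alpha = 0.5)

op_demo <- par(mfrow = c(3, 1), mar = c(0.5, 0.5, 0.5, 0.5))
for (pal in list(tc5, tc5_rev, tc5_alpha)) {
  barplot(rep(1, 5), col = pal, axes = FALSE, space = 0)
}
par(op_demo)
```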
You can also leverage the built-in perceptually-based HCL color schemes. These schemes include `qualitative`, `sequential` and `diverging`. For example, to list the color palettes associated with a divergent color scheme, type:
```{r}
hcl.pals("diverging")
```
You can then pass any one of the listed color palettes to the `hcl.colors` function. The function takes two main arguments: the desired number of color swatches and the color palette name. For example, to create the `"Vik"` divergent color scheme (note the upper-case `V`), type:
```{r fig.height = 1, fig.width = 6, echo = 2:3 }
OP <- par(mar = c(1,1,0,0), fg = "grey")
col2 <- hcl.colors(20, "Vik")
plot(1:20, rep(1,20), type = "p", pch = 16, cex = 3, col = col2)
par(OP)
```
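A common use of a divergent scheme is to color a numeric variable around a midpoint. The sketch below (simulated, hypothetical values) builds break points that are symmetric around zero so the middle `Vik` swatch is centered on zero, then uses `findInterval()` to map each value to a class number that indexes the palette.

```{r}
set.seed(7)
vals_demo <- rnorm(50)   # hypothetical values centred on zero

pal_vik <- hcl.colors(5, "Vik")

# Symmetric break points: 6 breaks define 5 classes, centred on zero
lim_demo <- max(abs(vals_demo))
brks <- seq(-lim_demo, lim_demo, length.out = 6)

# Class number for each value; all.inside = TRUE keeps it within 1..5
cls <- findInterval(vals_demo, brks, all.inside = TRUE)
pt_col <- pal_vik[cls]

plot(vals_demo, pch = 21, bg = pt_col, col = "grey30", cex = 1.5)
abline(h = 0, lty = 2)
```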
## Exporting plots to image file formats
You might need to export your plots as standalone image files for publications. R can export to several raster image file formats, such as jpeg, png, bmp and tiff, and to vector file formats such as PostScript, svg and PDF. You can specify the image resolution (in dpi) and the image height and width, and, via `par()`, the size of the margins.
The following example saves the last plot as an uncompressed tiff file measuring 5 by 6 inches at a resolution of 300 dpi. This is accomplished by book-ending the plotting routines between the `tiff()` and `dev.off()` functions:
```{r, eval=FALSE}
tiff(filename = "fig1.tif", width = 5, height = 6, units = "in",
compression = "none", res = 300)
# Plot the points without the axes
rng <- range(dat2$B20004009, dat2$B20004015, na.rm = TRUE)
plot(B20004009 ~ B20004015, dat = dat2, pch = 20, col = rgb(0,0,0,0.10),
xlim = rng, ylim = rng, axes = FALSE, xlab = NA, ylab = NA )
# Plot the x-axis
lab <- c("5,000", "25,000", "45,000", "$65,000")
axis(1, at = seq(5000,65000, length.out = 4), label = lab)
# Plot x label
mtext("Female median income (HS degree)", side = 1, line = 2)
# Plot the y-axis
axis(2, las = 1, at = seq(5000,65000, length.out = 4), label = lab)
# Get the plot region's corner coordinates on this device, then add the y label
loc <- par("usr")
text(loc[1], loc[4], "Male median\nincome", pos = 3, adj = 1, xpd = TRUE)
dev.off()
```
To save the same plot in PDF format, replace `tiff()` with `pdf()` and adjust the parameters as needed (`pdf()` measures `width` and `height` in inches):
```{r, eval=FALSE}
pdf(file = "fig1.pdf", width = 5, height = 6)
# Plot the points without the axes
rng <- range(dat2$B20004009, dat2$B20004015, na.rm = TRUE)
plot(B20004009 ~ B20004015, dat = dat2, pch = 20, col = rgb(0,0,0,0.15),
xlim = rng, ylim = rng, axes = FALSE, xlab = NA, ylab = NA )
# Plot the x-axis
lab <- c("5,000", "25,000", "45,000", "$65,000")
axis(1, at = seq(5000,65000, length.out=4), label = lab)
# Plot x label
mtext("Female median income (HS degree)", side = 1, line = 2)
# Plot the y-axis
axis(2, las = 1, at = seq(5000,65000, length.out = 4), label = lab)
# Get the plot region's corner coordinates on this device, then add the y label
loc <- par("usr")
text(loc[1], loc[4], "Male median\nincome", pos = 3, adj = 1, xpd = TRUE)
dev.off()
```
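If a plotting call fails between the device function and `dev.off()`, the file device is left open. One way to guard against this is a small wrapper that registers `dev.off()` with `on.exit()`. The `save_plot()` helper below is hypothetical (it is not part of base R); it accepts any file device function such as `pdf`, `png` or `tiff` and passes extra arguments like `width` and `height` to it. The usage example writes a PDF of the built-in `faithful` data to a temporary folder.

```{r}
# Hypothetical helper (not part of base R): open a file device, run the
# plotting code, and close the device even if the plotting code fails
save_plot <- function(device, filename, plot_fun, ...) {
  device(filename, ...)   # open the device, e.g. pdf(), png() or tiff()
  on.exit(dev.off())      # always close it so the file is written out
  plot_fun()
  invisible(filename)
}

out_file <- file.path(tempdir(), "fig_demo.pdf")
save_plot(pdf, out_file, function() {
  plot(faithful$eruptions, faithful$waiting, pch = 20,
       col = rgb(0, 0, 0, 0.3), xlab = "Eruption (min)", ylab = "Waiting (min)")
}, width = 5, height = 6)

file.exists(out_file)
```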