forked from szcf-weiya/ESL-CN
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathlattice.Rmd
276 lines (217 loc) · 9.32 KB
/
lattice.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
---
title: "Reproduce Figures with Lattice"
author: "weiya"
date: "5/9/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
`lattice` 包可以很方便地作出 [trellis 图](https://esl.hohoweiya.xyz/10-Boosting-and-Additive-Trees/10.13-Interpretation/index.html)。
```{r}
library(lattice)
```
首先看一个简单的例子,考虑这个包自带的纽约合唱团歌手身高数据,
```{r}
str(singer)
```
可以用下列代码展示不同声部的歌手的身高分布,
```{r}
histogram(~height | voice.part, data = singer,
main = "Distribution of Heights by Voice Pitch",
xlab = "Height (inches)")
```
其中 `voice.part` 被称为 conditioning variable,而 `height` 为 dependent variable。
在 trellis 图中,conditioning variable 的每个 level 都会创建一个独立的 panel。如果有多个 conditioning variable,则 (factor) level 的每个组合都会有一个 panel。每个 panel 会有一个标签,被称之为 strip。用户可以控制
- 每个 panel 中的图象
- strip 的格式和位置
- panel 的排列方式
- 图例 legend 的位置及内容
- 其它图象特点
`lattice` 可以作出
- 单变量图 `~x | A`:dot plots, kernel density plots, histograms, bar charts, box plots
- 双变量图 `y~x | A`:scatter plots, strip plots, parallel box plots
- 多变量图 `z~x*y | A`:3D plots, scatter plot matrices
下面用 `mtcars` 数据集进行详细说明,
```{r}
str(mtcars)
attach(mtcars)
```
```{r}
gear = factor(gear, levels = c(3, 4, 5),
labels = c("3 gears", "4 gears", "5 gears"))
cyl = factor(cyl, levels = c(4, 6, 8),
labels = c("4 cylinders", "6 cylinders", "8 cylinders"))
densityplot(~mpg, main = "Density Plot", xlab = "Miles Per Gallon")
densityplot(~mpg | cyl, main = "Density Plot by Numbers of Cylinders", xlab = "Miles Per Gallon")
bwplot(cyl ~ mpg | gear, main = "Box Plots by Cylinders and Gears",
xlab = "Miles Per Gallon", ylab = "Cylinders")
xyplot(mpg ~ wt | cyl * gear, main = "Scatter Plots by Cylinders and Gears",
xlab = "Car Weight", ylab = "Miles Per Gallon")
cloud(mpg ~ wt * qsec | cyl, main = "3D Scatter Plots by Cylinders")
dotplot(cyl ~ mpg | gear,
main = "Dot Plots by Number of Gears and Cylinders", # gear x cyl!!
xlab = "Miles Per Gallon")
splom(mtcars[c(1, 3, 4, 5, 6)], main = "Scatter Plot Matrix for mtcars Data")
detach(mtcars)
```
与基本作图系统不同的是,`lattice` 可以将图保存为一个变量,而且可以后续更改,
```{r}
mygraph = densityplot(~height | voice.part, data = singer)
update(mygraph, col = "red", pch = 16, cex = 0.8, jitter = 0.05, lwd = 2)
```
## Conditioning Variables
一般地,conditioning variables 是 factors。对于连续变量,可以通过 `cut()` 函数将其转换为离散变量。不过 `lattice` 将连续变量转换成所谓的 `shingle` 结构。具体地,连续变量被划分成若干个(可能)重叠的区域。比如,
```{r}
displacement = equal.count(mtcars$disp, number = 3, overlap = 0)
```
会将连续变量 `mtcars$disp` 划分成 `3` 个区间,有 `proportion=0` 的重叠部分,每个区间观测值的个数相等,然后可以将其作为 conditioning variable 进行作图,
```{r}
xyplot(mpg ~ wt | displacement, data = mtcars,
main = "Miles Per Gallon vs. Weight by Engine Displacement",
xlab = "Weight", ylab = "Miles Per Gallon",
layout = c(3, 1), # 3 columns
aspect = 1.5 # height / width
)
```
其中 strip 的深色部分则表示连续变量作为 conditioning variable 的取值范围。
注意 proportion 是相对于每个区间的比例,而非整个长度的比例,所以确定每个区间观测值个数的公式为
$$
\frac{N}{n(1-\text{overlap}) + \text{overlap}}
$$
具体代码为
```{r}
co.intervals
```
## panel functions
对于每个 high-level 画图函数,`graph_function`,其默认的 panel 函数即为 `panel.graph_function`。比如 `xyplot` 的 panel 函数为 `panel.xyplot`。
我们可以定义自己的 panel 函数使得每个 panel 可以加入更多的东西,比如回归直线。
```{r}
# custom panel function
mypanel = function(x, y) {
panel.xyplot(x, y, pch = 19)
panel.rug(x, y)
panel.grid(h = -1, v = -1) # use negative numbers forces the grid to line up with the axis labels
panel.lmline(x, y, col = "red", lwd = 1, lty = 2)
}
xyplot(mpg ~ wt | displacement, data = mtcars,
main = "Miles Per Gallon vs. Weight by Engine Displacement",
xlab = "Weight", ylab = "Miles Per Gallon",
layout = c(3, 1), # 3 columns
aspect = 1.5, # height / width
panel = mypanel
)
```
考虑另一个例子,
```{r}
mtcars$transmission = factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
panel.smoother = function(x, y) {
panel.grid(h=-1, v=-1)
panel.xyplot(x, y)
panel.loess(x, y)
panel.abline(h = mean(y), lwd = 2, lty = 2, col = "green")
}
xyplot(mpg ~ disp | transmission, data = mtcars,
scales = list(cex = .8, col = "red"), # scale annotations, or separately list(x=list(), y=list())
panel = panel.smoother,
xlab = "Displacement",
ylab = "Miles per Gallon",
main = "MGP vs Displacement by Transmission Type",
sub = "Dotted lines are Group Means", aspect = 1)
```
## Grouping variables
如果不想分开展示数据,而是叠在一起,则可以用 group variable。
```{r}
densityplot(~mpg, data = mtcars,
group = transmission,
main = "MPG Distribution by Transimission Type",
xlab = "Miles per Gallon",
auto.key = T,
#auto.key = list(space = "right", columns = 1, title = "Transmission")
)
```
图例可以进一步自定义,
```{r}
colors = c("red", "blue")
lines = c(1, 2)
points = c(16, 17)
key.trans = list(title = "Transmission",
space = "bottom", columns = 2,
text = list(levels(mtcars$transmission)),
points = list(pch = points, col = colors),
lines = list(col = colors, lty = lines),
cex.title = 1, cex = .9)
densityplot(~mpg, data = mtcars,
group = transmission,
main = "MPG Distribution by Transimission Type",
xlab = "Miles per Gallon",
pch = points, lty = lines, col = colors,
lwd = 2, jitter = 0.005,
key = key.trans
)
```
下面考虑同时使用 conditioning variable 和 grouping variable。数据集 `CO2` 描述了 12 种植物在 7 种二氧化碳含量的环境 `conc` 中对二氧化碳的吸收率 `uptake`,其中 6 种植物来自 Quebec,6 种来自 Mississippi,即 `Type` 变量,而每个地区各有三种在不同的 `Treatment` (chilled/nonchilled) 进行研究,
```{r}
str(CO2)
```
```{r}
colors = "darkgreen"
symbols = c(1:12)
linetype = c(1:3)
key.species = list(title = "Plant",
space = "right",
text = list(levels(CO2$Plant)),
points = list(pch = symbols, col = colors))
xyplot(uptake ~ conc | Type * Treatment, data = CO2,
group = Plant,
type = "o",
pch = symbols, col = colors, lty = linetype,
main = "Carbon Dioxide Uptake \nin Grass Plants",
ylab = expression(paste("Uptake ",
bgroup("(", italic(frac("umol", "m"^2)), ")"))), # bgroup?
xlab = expression(paste("Concentration ",
bgroup("(", italic(frac(mL, L)), ")"))),
sub = "Grass Species: Echinochloa crus-galli",
key = key.species)
```
## Graphic parameters
类似基本作图的 `par()`,`lattice` 也有类似的设置 graphic parameters 的命令,
```{r}
show.settings()
mysettings = trellis.par.get()
mysettings$superpose.symbol
```
默认时 `pch` 均为 1,将其改成对应的序号,
```{r}
mysettings$superpose.symbol$pch = c(1:10)
trellis.par.set(mysettings)
show.settings()
```
## Page arrangement
同上,`par` 在 `lattice` 不起作用,也就无法设置 `mfrow` 或者 `mfcol`,但这可以通过 `split` 进行实现,
```r
split = c(placement row, placement column, total number of rows, total number of columns)
```
比如
```{r}
graph1 = histogram(~height|voice.part, data = singer,
main = "Heights of Choral Singers by Voice Part")
graph2 = densityplot(~height, data = singer, group = voice.part,
plot.points = F, auto.key = list(columns = 4))
plot(graph1, split=c(1, 1, 1, 2))
plot(graph2, split=c(1, 2, 1, 2), newpage=FALSE)
```
另外也可以通过指定 `position = c(xmin, ymin, xmax, ymax)`,取值为 0 到 1 直接的比例数,原点在左下角,如
```{r}
plot(graph1, position = c(0, 0.3, 1, 1))
plot(graph2, position = c(0, 0, 1, 0.3), newpage = FALSE)
```
一幅 lattice 图中 panel 的顺序可以通过 `index.cond = ` 进行指定,比如将“1”声部放在一行,而“2”声部放在另一行,
```{r}
levels(singer$voice.part)
update(graph1, index.cond = list(c(2, 4, 6, 8, 1, 3, 5, 7)))
```
## Reproduce Fig. 6.9
Tricks on including external R scripts: https://stackoverflow.com/questions/52397430/include-code-from-an-external-r-script-run-in-display-both-code-and-output, and also note that `imgs` and `rmds` are on the same folder level, so the `read.csv()` with relative path works well.
```{r, code = readLines("../imgs/fig.6.9.R")}
```