reproduce fig. 6.9 with lattice and add rmd notes for lattice
szcf-weiya committed May 9, 2021
1 parent 83df85d commit 68b5f7d
Showing 31 changed files with 1,031 additions and 6 deletions.
7 changes: 7 additions & 0 deletions data/Ozone/README.md
@@ -0,0 +1,7 @@
Data frame with components: ozone, radiation, temperature, and wind.
Measurements of daily ozone concentration (ppb), wind speed (mph),
daily maximum temperature (degrees F), and solar radiation
(langleys) on 111 days from May to September 1973 in New York. This
data frame is similar to air in S-PLUS (or library(data) in S), but
has a different definition for ozone (air contains cube-roots of
ozone).
112 changes: 112 additions & 0 deletions data/Ozone/ozone.csv
@@ -0,0 +1,112 @@
ozone radiation temperature wind
41 190 67 7.4
36 118 72 8
12 149 74 12.6
18 313 62 11.5
23 299 65 8.6
19 99 59 13.8
8 19 61 20.1
16 256 69 9.7
11 290 66 9.2
14 274 68 10.9
18 65 58 13.2
14 334 64 11.5
34 307 66 12
6 78 57 18.4
30 322 68 11.5
11 44 62 9.7
1 8 59 9.7
11 320 73 16.6
4 25 61 9.7
32 92 61 12
23 13 67 12
45 252 81 14.9
115 223 79 5.7
37 279 76 7.4
29 127 82 9.7
71 291 90 13.8
39 323 87 11.5
23 148 82 8
21 191 77 14.9
37 284 72 20.7
20 37 65 9.2
12 120 73 11.5
13 137 76 10.3
135 269 84 4
49 248 85 9.2
32 236 81 9.2
64 175 83 4.6
40 314 83 10.9
77 276 88 5.1
97 267 92 6.3
97 272 92 5.7
85 175 89 7.4
10 264 73 14.3
27 175 81 14.9
7 48 80 14.3
48 260 81 6.9
35 274 82 10.3
61 285 84 6.3
79 187 87 5.1
63 220 85 11.5
16 7 74 6.9
80 294 86 8.6
108 223 85 8
20 81 82 8.6
52 82 86 12
82 213 88 7.4
50 275 86 7.4
64 253 83 7.4
59 254 81 9.2
39 83 81 6.9
9 24 81 13.8
16 77 82 7.4
122 255 89 4
89 229 90 10.3
110 207 90 8
44 192 86 11.5
28 273 82 11.5
65 157 80 9.7
22 71 77 10.3
59 51 79 6.3
23 115 76 7.4
31 244 78 10.9
44 190 78 10.3
21 259 77 15.5
9 36 72 14.3
45 212 79 9.7
168 238 81 3.4
73 215 86 8
76 203 97 9.7
118 225 94 2.3
84 237 96 6.3
85 188 94 6.3
95.9999999999999 167 91 6.9
78 197 92 5.1
73 183 93 2.8
91 189 93 4.6
47 95 87 7.4
32 92 84 15.5
20 252 80 10.9
23 220 78 10.3
21 230 75 10.9
24 259 73 9.7
44 236 81 14.9
21 259 76 15.5
28 238 77 6.3
9 24 71 10.9
13 112 71 11.5
46 237 78 6.9
18 224 67 13.8
13 27 76 10.3
24 238 68 10.3
16 201 82 8
13 238 64 12.6
23 14 71 9.2
36 139 81 10.3
7 49 69 10.3
14 20 63 16.6
30 193 70 6.9
14 191 75 14.3
18 131 76 8
20 223 68 11.5
11 changes: 7 additions & 4 deletions docs/06-Kernel-Smoothing-Methods/6.3-Local-Regression-in-Rp.md
@@ -4,7 +4,7 @@
| ---- | ---------------------------------------- |
| Translator | szcf-weiya |
| Published | 2017-03-01 |
- | Updated | 2018-08-14|
+ | Updated | {{ git_revision_date }}|
| Status | Done|

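Kernel smoothing and local regression generalize very naturally to two or more dimensions. The Nadaraya–Watson kernel smoother fits a constant locally, with weights supplied by a $p$-dimensional kernel. Local linear regression fits a hyperplane locally in $X$ by weighted least squares, with weights again supplied by a $p$-dimensional kernel. It is simple to implement and is generally preferred to the local constant fit because it performs better at the boundaries.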
@@ -15,7 +15,7 @@ $$
$$
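to produce the fit $\hat f(x_0)=b(x_0)^T\hat \beta(x_0)$. Typically the kernel is a radial function, such as the radial Epanechnikov or tri-cube kernel
$$
- K_\lambda(x_0,x)=D(\frac{\Vert x-x_0\Vert}{\lambda}),\tag{6.13}
+ K_\lambda(x_0,x)=D\left(\frac{\Vert x-x_0\Vert}{\lambda}\right),\tag{6.13}
$$
where $\Vert \cdot\Vert$ is the Euclidean norm. Because the Euclidean norm depends on the units of each coordinate, it makes sense to standardize each predictor, for example to unit standard deviation, before smoothing.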
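
The fit described above can be sketched in a few lines of base R. This is a minimal illustration, not code from the commit: the function names `tricube` and `local_linear_fit` are made up here, $b(x)=(1,x_1,\ldots,x_p)$ is assumed (the local linear case), and the columns of `X` are assumed to be standardized already.

```r
# Tri-cube profile D(t) = (1 - |t|^3)^3 for |t| <= 1, and 0 otherwise.
tricube <- function(t) ifelse(abs(t) <= 1, (1 - abs(t)^3)^3, 0)

# Local linear fit at a single point x0 (hypothetical helper).
# X: n x p numeric matrix, y: length-n response, lambda: kernel radius.
local_linear_fit <- function(x0, X, y, lambda) {
  d <- sqrt(rowSums(sweep(X, 2, x0)^2))  # ||x_i - x0|| for every row
  w <- tricube(d / lambda)               # radial kernel weights, eq. (6.13)
  B <- cbind(1, X)                       # rows are b(x_i)^T = (1, x_i1, ..., x_ip)
  # weighted least squares: solve (B^T W B) beta = B^T W y
  beta <- solve(crossprod(B, w * B), crossprod(B, w * y))
  drop(c(1, x0) %*% beta)                # b(x0)^T beta-hat(x0)
}

set.seed(1)
X <- matrix(runif(200), ncol = 2)
y <- 1 + 2 * X[, 1] - 3 * X[, 2]         # exactly linear response
fit <- local_linear_fit(c(0.5, 0.5), X, y, lambda = 0.5)
```

Because the simulated response is exactly linear, `fit` should agree with `1 + 2 * 0.5 - 3 * 0.5` up to rounding. A nearest-neighbor window would replace the fixed `lambda` with the distance from `x0` to its k-th closest point.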

@@ -25,10 +25,13 @@ $$

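> Figure 6.8. The left panel shows three-dimensional data, where the response is the velocity measured on a galaxy and the two predictors record positions on the celestial sphere. The unusual star-shaped design reflects the way the measurements were made, and it results in an extremely irregular boundary. The right panel shows the result of a local linear fit in $\IR^2$, using a nearest-neighbor window containing $15\%$ of the data.
- Local regression becomes less useful in dimensions much higher than $2$ or $3$. We have already discussed the problems of dimensionality in detail, for example in Chapter $2$. As the dimension increases, it is impossible to maintain localness (low bias) and a sizable sample in the neighborhood at the same time unless the total sample size grows exponentially in $p$. Visualizing $\hat f(X)$ also becomes difficult in higher dimensions, and visualization is often one of the primary goals of smoothing. Although the scatter-cloud and wire-frame pictures in Figure 6.8 look attractive, the results are hard to interpret except at a gross level. From a data analysis perspective, conditional plots are far more useful.
+ Local regression becomes less useful in dimensions much higher than $2$ or $3$. We have already discussed the problems of dimensionality in detail, for example in Chapter $2$. As the dimension increases, it is impossible to maintain localness (low bias) and a sizable sample in the neighborhood at the same time unless the total sample size grows exponentially in $p$. Visualizing $\hat f(X)$ also becomes difficult in higher dimensions, and visualization is often one of the primary goals of smoothing. Although the **scatter-cloud** and **wire-frame** pictures in Figure 6.8 look attractive, the results are hard to interpret except at a gross level. From a data analysis perspective, conditional plots are far more useful.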
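
To make the dimensionality problem concrete, here is a small sketch under an assumption that is not part of the passage above: the inputs are uniformly distributed in a $p$-dimensional unit hypercube, the setting used in Chapter 2. A hypercubical neighborhood that captures a fraction $r$ of the data then needs edge length $r^{1/p}$. The name `edge_length` is made up for illustration.

```r
# Edge length of a sub-cube capturing a fraction r of uniform data in [0, 1]^p.
edge_length <- function(r, p) r^(1 / p)

# The fraction 0.15 mirrors the 15% nearest-neighbor window used for Figure 6.8.
p <- c(1, 2, 3, 10)
edges <- edge_length(0.15, p)
print(data.frame(p = p, edge = edges))
```

The edge length increases toward 1 as `p` grows, so a neighborhood holding a fixed share of the data stops being local in any single coordinate.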

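Figure 6.9 shows an analysis of some environmental data with three predictors. The trellis display shows ozone as a function of radiation, conditioned on the other two variables, temperature and wind speed. However, conditioning on the value of a variable really means being local to that value (as in local regression). Above each panel in Figure 6.9 is an indication of the range of values present in that panel for each of the conditioning variables. Each panel displays the corresponding subset of the data (response versus the remaining variable) together with a one-dimensional local linear regression fit to that subset. Although this is not quite the same as looking at slices of a fitted three-dimensional surface, it is probably more useful for understanding the joint behavior of the data.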

![](../img/06/fig6.9.png)

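- > Figure 6.9. Three-dimensional smoothing example. The response is the cube root of ozone concentration, and the three predictors are temperature, wind speed, and radiation. The trellis display shows ozone as a function of radiation, conditioned on intervals of temperature and wind speed (indicated by the darker green or orange shaded bars). Each panel contains about $40\%$ of the range of each conditioning variable. The curve in each panel is a univariate local linear regression fit to the data in that panel.
+ > Figure 6.9. Three-dimensional smoothing example. The response is the cube root of ozone concentration, and the three predictors are temperature, wind speed, and radiation. The trellis display shows ozone as a function of radiation, conditioned on intervals of temperature and wind speed (indicated by the darker green or orange shaded bars). Each panel contains about $40\%$ of the range of each conditioning variable. The curve in each panel is a univariate local linear regression fit to the data in that panel.
+ !!! info "weiya's note:"
+ Figure 6.9 has been reproduced; see [Reproduce Figures with Lattice](https://esl.hohoweiya.xyz/rmds/lattice.html) for details.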
@@ -4,7 +4,7 @@
| ---- | ---------------------------------------- |
| Translator | szcf-weiya |
| Published | 2016-09-30 |
- | Updated|2019-07-30 09:44:04|
+ | Updated|{{ git_revision_date }}|
| Status| Done|


@@ -14,7 +14,7 @@

!!! note "weiya's note: Recall"
$$
- K_\lambda(x_0,x)=D(\frac{\Vert x-x_0\Vert}{\lambda})\tag{6.13}\label{6.13}
+ K_\lambda(x_0,x)=D\left(\frac{\Vert x-x_0\Vert}{\lambda}\right)\tag{6.13}\label{6.13}
$$

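One approach is to modify the kernel. The default spherical kernel \eqref{6.13} gives equal weight to each coordinate, so a natural default strategy is to standardize each variable to unit standard deviation. A more general approach is to use a positive semidefinite matrix $\mathbf A$ to weight the different coordinates: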
18 changes: 18 additions & 0 deletions imgs/fig.6.9.R
@@ -0,0 +1,18 @@
library(lattice)  # equal.count, xyplot and the panel.* functions come from lattice
data = read.csv("../data/Ozone/ozone.csv", sep = "\t")
# 4 intervals, each covering 40% of the data: 4*0.4 - 3x = 1 => x = 0.2 (shared part)
# `overlap` is relative to one interval, not the whole range: prop = 0.2 / 0.4 = 0.5
Wind = equal.count(data$wind, number = 4, overlap = 0.5)
Temp = equal.count(data$temperature, number = 4, overlap = 0.5)
mypanel = function(x, y) {
  panel.xyplot(x, y)
  panel.grid()
  panel.loess(x, y)
}
xyplot(I(ozone^(1/3)) ~ radiation | Temp * Wind, data = data,
       panel = mypanel,
       xlab = "Solar Radiation (langleys)",
       ylab = "Cube Root Ozone (cube root ppb)")

coplot(I(ozone^(1/3)) ~ radiation | temperature * wind, data = data,
       number = 4, overlap = 0.5)
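
The arithmetic in the comments of `fig.6.9.R` can be checked with `co.intervals()` from base graphics, which takes the same `number` and `overlap` arguments as `coplot()`. This is a sketch on a made-up vector `x`, not on the ozone data, and `frac` and `shared` are illustrative names.

```r
# 100 distinct, evenly spaced values standing in for a conditioning variable.
x <- 1:100

# Four overlapping intervals; each row holds the lower and upper endpoint.
iv <- co.intervals(x, number = 4, overlap = 0.5)

# Fraction of the observations falling inside each interval.
frac <- apply(iv, 1, function(b) mean(x >= b[1] & x <= b[2]))

# Fraction of the observations shared by each pair of adjacent intervals.
shared <- sapply(seq_len(nrow(iv) - 1),
                 function(i) mean(x >= iv[i + 1, 1] & x <= iv[i, 2]))

print(frac)    # each entry should be close to 0.4
print(shared)  # each entry should be close to 0.2
```

With ties or a sample size that does not divide evenly, as in the ozone data, the fractions will only be approximately 0.4 and 0.2.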
1 change: 1 addition & 0 deletions rmds/_site/index.html
@@ -276,6 +276,7 @@ <h1 class="title toc-ignore">Rmd Gallery</h1>
<li><a href="tree-based-methods.html">Tree-Based Methods</a></li>
<li><a href="resampling-methods.html">Resampling Methods</a></li>
<li><a href="non-linear-modeling.html">Non-linear Modeling</a></li>
<li><a href="lattice.html">Reproduce Figures with Lattice</a></li>
</ul>

<p>Copyright &copy; 2016-2021 weiya</p>