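# The Lazy Series {#eda-lazyman}
The R community is full of talented people who have contributed many excellent tools that save us time and add a great deal of fun to our lives. I collect some of these as I browse GitHub, and I am sharing them here for lazy people like me, hence the name "The Lazy Series". Additions are welcome.
## Messy column names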
```{r lazyman01, message = FALSE, warning = FALSE}
library(tidyverse)
library(janitor)
## install.packages("janitor")
## https://github.com/sfirke/janitor
```
```{r lazyman-1}
fake_raw <- tibble::tribble(
~id, ~`count/num`, ~W.t, ~Case, ~`time--d`, ~`%percent`,
1L, "china", 3L, "w", 5L, 25L,
2L, "us", 4L, "f", 6L, 34L,
3L, "india", 5L, "q", 8L, 78L
)
fake_raw
```
```{r lazyman-2}
fake_raw %>% janitor::clean_names()
```
## Understands me better than count()
```{r lazyman-3}
mtcars %>%
dplyr::count(cyl, sort = TRUE) %>%
mutate(percent = 100 * n / sum(n))
```
```{r lazyman-4, eval=FALSE}
mtcars %>%
janitor::tabyl(cyl)
```
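The output of `tabyl()` can be passed on to janitor's `adorn_*()` helpers, for instance to append a totals row and show the proportions as percentages. The following is a minimal sketch on the same `mtcars` column; the choice of helpers, the object name `cyl_table`, and `digits = 1` are illustrative assumptions rather than part of the original example.

```{r, eval=FALSE}
library(dplyr)   # provides the %>% pipe
library(janitor)

cyl_table <- mtcars %>%
  tabyl(cyl) %>%                   # columns: cyl, n, percent (as a proportion)
  adorn_totals("row") %>%          # append a "Total" row
  adorn_pct_formatting(digits = 1) # display proportions as percentages

cyl_table
```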
## Knows me better than distinct()
```{r lazyman-5}
df <- tribble(
~id, ~date, ~store_id, ~sales,
1, "2020-03-01", 1, 100,
2, "2020-03-01", 2, 100,
3, "2020-03-01", 3, 150,
4, "2020-03-02", 1, 110,
5, "2020-03-02", 3, 101
)
df %>%
janitor::get_dupes(store_id)
df %>%
janitor::get_dupes(date)
```
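Where `distinct()` keeps one row per value, `get_dupes()` returns every row whose value occurs more than once. As a rough base R sketch of that idea (not janitor's actual implementation), the duplicated rows can be found as follows; `sales_df`, `is_dupe`, and `dupes` are names made up for this illustration, and the data mirror the example above.

```{r, eval=FALSE}
# Same toy data as above, built with base R only
sales_df <- data.frame(
  id = 1:5,
  date = c("2020-03-01", "2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"),
  store_id = c(1, 2, 3, 1, 3),
  sales = c(100, 100, 150, 110, 101)
)

# A row is a duplicate if its store_id appears earlier OR later in the data
is_dupe <- duplicated(sales_df$store_id) |
  duplicated(sales_df$store_id, fromLast = TRUE)

dupes <- sales_df[is_dupe, ]
dupes[order(dupes$store_id), ] # stores 1 and 3 appear twice; store 2 is dropped
```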
## My code is a mess; who will tidy it up for me?
```{r lazyman-6}
## install.packages("styler")
```
```{r lazyman-7, out.width = '100%', echo = FALSE}
knitr::include_graphics("images/styler.png")
```
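After installing it, click the two places shown in the screenshot and your code will look much tidier. Alternatively, run the following directly: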
```{r lazyman-8, eval=FALSE}
styler:::style_active_file()
```
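Outside RStudio, or when only a snippet needs tidying, styler can also format code supplied as a string. Here is a small sketch that assumes styler's default style; the messy input and the names `messy_code` and `tidy_code` are made up for illustration.

```{r, eval=FALSE}
# A deliberately cramped one-liner (hypothetical input)
messy_code <- "my_mean<-function(x){sum(x)/length(x)}"

# style_text() returns the restyled code without touching any file
tidy_code <- styler::style_text(messy_code)
tidy_code
```

For whole scripts, `styler::style_file()` takes a file path and rewrites that file in place, so it is worth having the file under version control first.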
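## Pasting small tables with datapasta
Sometimes you want to drop a small table from Excel or a web page into R for a quick test. Reading such a small data set with readxl or readr can feel like too much trouble, or like overkill. For example, the page <https://en.wikipedia.org/wiki/Table_(information)> contains a table. Is there a lazy way to get it into R?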
```{r, out.width = '100%', echo = FALSE}
knitr::include_graphics("images/datapaste1.png")
```
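One [approach](https://github.com/MilesMcBain/datapasta) I recommend:

1. Install the package with `install.packages("datapasta")`.
2. Select the table on the web page with the mouse and copy it.
3. In RStudio, open `Addins`, find `datapasta`, and click `paste as tribble`.
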
```{r, out.width = '100%', echo = FALSE}
knitr::include_graphics("images/datapaste2.png")
```
## Who will type out my model's equation?
```{r lazyman-9}
library(equatiomatic)
## https://github.com/datalorax/equatiomatic
```
```{r lazyman-10}
mod1 <- lm(mpg ~ cyl + disp, mtcars)
```
```{r lazyman-11, results="asis"}
extract_eq(mod1)
```
```{r lazyman-12, results="asis"}
extract_eq(mod1, use_coefs = TRUE)
```
## Have a model but don't know how to write it up?
```{r lazyman-13}
library(report)
## https://github.com/easystats/report
```
```{r lazyman-report, results="asis"}
model <- lm(Sepal.Length ~ Species, data = iris)
report(model)
```
## Model checking in one step
```{r lazyman-performance}
library(performance)
model <- lm(mpg ~ wt * cyl + gear, data = mtcars)
performance::check_model(model)
```
## No more worries about summary tables
```{r lazyman-gtsummary1, results="asis", eval=FALSE}
library(gtsummary)
## https://github.com/ddsjoberg/gtsummary
gtsummary::trial %>%
dplyr::select(trt, age, grade, response) %>%
gtsummary::tbl_summary(
by = trt,
missing = "no"
) %>%
gtsummary::add_p() %>%
gtsummary::add_overall() %>%
gtsummary::add_n() %>%
gtsummary::bold_labels()
```
The result can be copied straight into your paper.
```{r lazyman-gtsummary2, results="asis", eval=FALSE}
t1 <-
glm(response ~ trt + age + grade, trial, family = binomial) %>%
gtsummary::tbl_regression(exponentiate = TRUE)
t2 <-
survival::coxph(survival::Surv(ttdeath, death) ~ trt + grade + age, trial) %>%
gtsummary::tbl_regression(exponentiate = TRUE)
gtsummary::tbl_merge(
tbls = list(t1, t2),
tab_spanner = c("**Tumor Response**", "**Time to Death**")
)
```
## Put the statistical results on the plot
```{r lazyman-statsExpressions, out.width='99%', eval=FALSE, include=TRUE}
library(ggplot2)
library(statsExpressions)
# https://github.com/IndrajeetPatil/statsExpressions
ggplot(mtcars, aes(x = mpg, y = wt)) +
geom_point() +
geom_smooth(method = "lm") +
labs(
title = "Spearman's rank correlation coefficient",
subtitle = expr_corr_test(mtcars, mpg, wt, type = "nonparametric")
)
```
## Regular expressions are too hard
```{r lazyman-14}
library(inferregex)
## remotes::install_github("daranzolin/inferregex")
```
```{r lazyman-15}
s <- "abcd-9999-ab9"
infer_regex(s)$regex
```
With it, Mom never has to worry about my regular expressions again.
## How should the looks-obsessed pick a color scheme?
```{r lazyman-16}
library(ggthemr) ## devtools::install_github('cttobin/ggthemr')
ggthemr("dust")
```
```{r lazyman-17}
mtcars %>%
mutate(cyl = factor(cyl)) %>%
ggplot(aes(x = mpg, fill = cyl, colour = cyl)) +
geom_density(alpha = 0.75) +
labs(fill = "Cylinders", colour = "Cylinders", x = "MPG", y = "Density") +
legend_top()
```
Don't forget to reset the theme when you are done:
```{r lazyman-18}
ggthemr_reset()
```
## Do the plot colors look good?
scales is also the work of a master, and it has many features.
```{r lazyman-19}
## https://github.com/r-lib/scales
library(scales)
show_col(viridis_pal()(10))
```
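Beyond palettes, scales also offers label formatters and rescaling helpers. This short sketch shows a few of them; the input numbers are arbitrary illustrations.

```{r, eval=FALSE}
library(scales)

comma(1234567)                       # "1,234,567"
label_percent(accuracy = 0.1)(0.256) # "25.6%"
rescale(c(10, 15, 20))               # 0.0 0.5 1.0 (maps the range onto [0, 1])
```

Formatters such as `label_percent()` return a function, so they can be passed directly to the `labels` argument of a ggplot2 scale.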
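I don't recommend designing your own palette, because we are not professionals. Use a professional color site instead, such as
[colorbrewer](https://colorbrewer2.org/).
Look at the colors first, then choose.
## Too many packages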
```{r lazyman-20, eval=FALSE}
library(pacman)
## p_load(lattice, foreign, boot, rpart)
```
Sigh, you even want to be lazy about `library()`. I give up.
## Half-hidden behind a veil
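To see what `p_load()` saves, here is a rough base R sketch of the "install if missing, then attach" pattern. `load_or_install()` is a hypothetical helper written for this illustration, not pacman's actual implementation.

```{r, eval=FALSE}
# Hypothetical helper: install any missing package, then attach it
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# lattice and boot normally ship with R, so nothing should need installing
load_or_install(c("lattice", "boot"))
```

With pacman, the whole function collapses to a single call such as `pacman::p_load(lattice, boot)`.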
```{r lazyman-gganonymize}
## https://github.com/EmilHvitfeldt/gganonymize
library(ggplot2)
library(gganonymize)
ggg <-
ggplot(mtcars, aes(as.factor(cyl))) +
geom_bar() +
labs(
title = "Test title",
subtitle = "Test subtitle, this one have a lot lot lot lot lot more text then the rest",
caption = "Test caption",
tag = 1
) +
facet_wrap(~vs)
gganonomize(ggg)
```
You can look at my plot, but I'd rather not tell you what it means, because I have anonymized it.
## Tidying up R Markdown
```{r lazyman-21}
# remotes::install_github("tjmahr/WrapRmd")
# remotes::install_github("fkeck/quickview")
# remotes::install_github("mwip/beautifyR")
```
## How to ask questions effectively
See the official website; no example is given here.
```{r lazyman-22}
## install.packages("reprex")
## https://reprex.tidyverse.org/
```
## Remind me when the program finishes
```{r lazyman-23}
## beepr::beep(sound = "mario")
```
Did you hear the sound?
## Arranging multiple plots
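In practice the call goes at the end of a long-running script. The following sketch uses a made-up stand-in for slow work (`slow_square()` is hypothetical) and assumes that beepr is installed and the machine can play audio.

```{r, eval=FALSE}
# Hypothetical stand-in for an expensive computation
slow_square <- function(i) {
  Sys.sleep(0.5) # pretend to work
  i^2
}

results <- vapply(1:4, slow_square, numeric(1))

# Play a sound once everything above has finished
beepr::beep(sound = "mario")

results
```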
```{r lazyman-patchwork}
library(patchwork)
p1 <- ggplot(mtcars) +
geom_point(aes(mpg, disp))
p2 <- ggplot(mtcars) +
geom_boxplot(aes(gear, disp, group = gear))
p3 <- ggplot(mtcars) +
geom_smooth(aes(disp, qsec))
p1 + p2 + p3
```
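Besides `+`, patchwork provides `|` for side-by-side placement and `/` for stacking, along with `plot_annotation()` for an overall title and panel tags. This is a small sketch of one possible layout; the title text and the name `combined` are illustrative assumptions.

```{r, eval=FALSE}
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars) + geom_point(aes(mpg, disp))
p2 <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))
p3 <- ggplot(mtcars) + geom_smooth(aes(disp, qsec))

# p1 and p2 share the top row; p3 spans the bottom row
combined <- (p1 | p2) / p3 +
  plot_annotation(title = "Three views of mtcars", tag_levels = "A")

combined
```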
## Handling missing values
```{r lazyman-naniar}
library(naniar)
## https://github.com/njtierney/naniar
airquality %>%
group_by(Month) %>%
naniar::miss_var_summary()
```
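As a quick cross-check of the naniar summary, the same missing-value counts can be computed in base R. This is only a sketch for comparison; `na_counts` and `na_by_month` are names made up here.

```{r, eval=FALSE}
# Missing values per column: Ozone has 37, Solar.R has 7, the rest have none
na_counts <- colSums(is.na(airquality))
na_counts

# Missing Ozone readings broken down by month
na_by_month <- tapply(is.na(airquality$Ozone), airquality$Month, sum)
na_by_month
```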
## A quick look at the data
```{r lazyman-24}
library(visdat)
vis_dat(airquality)
```
## Too lazy even for pipes
If you don't even want to write the pipe, is there any beauty left in your code?
```{r lazyman-25}
## library(nakedpipe)
```
## All kinds of add-ins, take your pick
```{r lazyman-26}
## https://github.com/daattali/addinslist
```
```{r lazyman-27, echo = FALSE}
# remove the objects
# rm(list=ls())
rm(df, fake_raw, ggg, mod1, model, p1, p2, p3, s)
```
```{r lazyman-28, echo = FALSE, message = FALSE, warning = FALSE, results = "hide"}
pacman::p_unload(pacman::p_loaded(), character.only = TRUE)
```