Merge branch 'master' of github.com:erikaduan/r_tips
erikaduan committed May 28, 2022
2 parents 96f7a74 + 7d857b6 commit e512754
Showing 7 changed files with 74 additions and 31 deletions.
30 changes: 6 additions & 24 deletions README.md
@@ -50,6 +50,12 @@ Many kudos to [Dr Chuanxin Liu](https://github.com/codetrainee), my former PhD s
+ [Introduction to hypergeometric, geometric, negative binomial and multinomial distributions](https://github.com/erikaduan/R_tips/blob/master/tutorials/2020-09-22_hypergeometric-and-other-discrete-distributions/2020-09-22_hypergeometric-and-other-discrete-distributions.md)


# Other resources
These resources also cover a comprehensive range of practical R usage tutorials.

+ [Statistical Computing](https://36-750.github.io/) by Alex Reinhart and Christopher Genovese
+ [Data Science Toolkit](https://benkeser.github.io/info550/lectures/) by David Benkeser

# Tutorial style guide

A painful form of technical debt is inconsistent code style. This repository now contains the following file naming and code style rules.
@@ -81,31 +87,7 @@ A painful form of technical debt is inconsistent code style. This repository now
version 1.4.0.
https://CRAN.R-project.org/package=stringr

+ Max Kuhn. (2019). `caret`: Classification and Regression
Training. R package version 6.0-84. https://CRAN.R-project.org/package=caret
+ Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony
Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem,
Luca Scrucca, Yuan Tang, Can Candan and Tyler Hunt.

+ Jacob Kaplan (2020). `fastDummies`: Fast Creation of Dummy (Binary) Columns and Rows from Categorical
Variables. R package version 1.6.1. https://CRAN.R-project.org/package=fastDummies

+ Kirill Müller (2017). `here`: A Simpler Way to Find Your Files. R package version 0.1.
https://CRAN.R-project.org/package=here

+ Paul Murrell (2015). `compare`: Comparing Objects for Differences. R package version 0.2-6.
https://CRAN.R-project.org/package=compare

+ A. Liaw and M. Wiener (2002). Classification and Regression by `randomForest`. R News 2(3), 18--22.

+ Tianqi Chen, Tong He, Michael Benesty, Vadim Khotilovich, Yuan Tang, Hyunsu Cho, Kailong Chen, Rory
Mitchell, Ignacio Cano, Tianyi Zhou, Mu Li, Junyuan Xie, Min Lin, Yifeng Geng and Yutian Li (2020).
`xgboost`: Extreme Gradient Boosting. R package version 1.0.0.2. https://CRAN.R-project.org/package=xgboost

+ Alexandros Karatzoglou, Alex Smola, Kurt Hornik, Achim Zeileis (2004). `kernlab` - An S4 Package for Kernel
Methods in R. Journal of Statistical Software 11(9), 1-20. URL http://www.jstatsoft.org/v11/i09/

+ Microsoft Corporation and Steve Weston (2019). `doParallel`: Foreach Parallel Adaptor for the `parallel`
Package. R package version 1.0.15. https://CRAN.R-project.org/package=doParallel

+ Richard Iannone (2020). `DiagrammeR`: Graph/Network Visualization. R package version 1.0.6.1. https://CRAN.R-project.org/package=DiagrammeR
22 changes: 15 additions & 7 deletions tutorials/dc-data_table_vs_dplyr/dc-data_table_vs_dplyr.Rmd
@@ -7,25 +7,27 @@ output:
toc: true
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, results = 'hide', message = FALSE)
```{r setup, include=FALSE}
# Set up global environment ----------------------------------------------------
knitr::opts_chunk$set(echo=TRUE, results="hide", message=FALSE)
```

```{r, message = FALSE}
#-----load required packages-----
```{r, message=FALSE}
# Load required packages -------------------------------------------------------
if (!require("pacman")) install.packages("pacman")
pacman::p_load(here,
ids, # for generating random ids
ids, # Generate random ids
tidyverse,
data.table,
compare, # compare between data frames
compare, # Compare between data frames
microbenchmark)
```


# Introduction

One of the great benefits of following Rstats conversations on Twitter is its access to user insights. I became curious about `data.table` after reading conversations about its superior performance yet decreased visibility compared to `tidyverse`.
I became curious about `data.table` after reading Twitter conversations about its superior performance yet decreased visibility compared to `tidyverse`. Because


Fast forward a few years and the [data processing efficiency](https://h2oai.github.io/db-benchmark/) of `data.table` has become extremely handy:

@@ -960,6 +962,8 @@ In contrast, `data.table` is efficient because it contains a very fast ordering

# Other resources

+ https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html

+ The definitive [Stack Overflow discussion](https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly/27840349#27840349) about the best use cases for `data.table` versus `dplyr` (from `tidyverse`).

+ A great side by side comparison of data.table versus dplyr operations by [Atrebas](https://atrebas.github.io/post/2019-03-03-datatable-dplyr/).
@@ -974,3 +978,7 @@ In contrast, `data.table` is efficient because it contains a very fast ordering
Robin Lovelace](https://csgillespie.github.io/efficientR/data-processing-with-data-table.html).

+ A more detailed explanation of the usage of binary search based subset in `data.table` by [Arun Srinivasan](https://gist.github.com/arunsrinivasan/dacb9d1cac301de8d9ff).

+ https://bookdown.org/rdpeng/rprogdatascience/parallel-computation.html

+ http://www.john-ros.com/Rcourse/parallel.html
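
The binary search based subset mentioned in the resources above can be illustrated with a minimal sketch. This is not code from the tutorial: the toy table `dt` and its columns `id` and `value` are hypothetical, and only the standard `data.table` functions `setkey()`, `key()` and `.()` are assumed.

```r
library(data.table)

# Hypothetical toy data; column names are illustrative only
dt <- data.table(id = c("b", "a", "c", "a"), value = 1:4)

# Ordinary subset: filters rows where id equals "a"
# (recent data.table versions may optimise this with an automatic index)
scan_result <- dt[id == "a"]

# setkey() sorts the table by `id` by reference and marks it as keyed
setkey(dt, id)

# Keyed subset: rows matching "a" are located by binary search on the sorted key
key_result <- dt[.("a")]

print(key(dt))
print(key_result)
```

Both subsets return the same rows. The keyed form is usually only worth setting up when the same column is queried repeatedly on a large table, because `setkey()` reorders the whole table once up front.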
Empty file.
13 changes: 13 additions & 0 deletions tutorials/dc-using_arrow/dc-using_arrow.Rmd
@@ -0,0 +1,13 @@
---
title: "Using arrow with tidyverse and data.table"
author: Erika Duan
date: "`r Sys.Date()`"
output:
github_document:
toc: true
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
38 changes: 38 additions & 0 deletions tutorials/dc-using_arrow/dc-using_arrow.md
@@ -0,0 +1,38 @@
Using arrow with tidyverse and data.table
================
Erika Duan
2022-03-05

- [R Markdown](#r-markdown)
- [Including Plots](#including-plots)

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax
for authoring HTML, PDF, and MS Word documents. For more details on
using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that
includes both content as well as the output of any embedded R code
chunks within the document. You can embed an R code chunk like this:

``` r
summary(cars)
```

## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00

## Including Plots

You can also embed plots, for example:

![](dc-using_arrow_files/figure-gfm/pressure-1.png)<!-- -->

Note that the `echo = FALSE` parameter was added to the code chunk to
prevent printing of the R code that generated the plot.
@@ -320,3 +320,5 @@ jobs:
+ A [YouTube tutorial](https://www.youtube.com/watch?v=NwUijrm2U2w) by DVC on using GitHub Actions with R to automate data visualisation tasks.
+ A useful [online resource](https://explainshell.com/) for explaining shell commands required to create components of the GitHub Actions YAML workflow.
+ https://amitlevinson.com/blog/automated-plot-with-github-actions/
+ https://rstats.wtf/index.html
+ https://goodresearch.dev/pipelines.html
