---
output: github_document
title: "estimatr: Fast Estimators for Design-Based Inference"
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, echo = FALSE}
set.seed(42)
knitr::opts_chunk$set(
  collapse = TRUE,
  message = FALSE,
  comment = "#>",
  fig.path = "README-"
)
options(digits = 2)
```

[![CRAN status](https://www.r-pkg.org/badges/version/estimatr)](https://cran.r-project.org/package=estimatr)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/grand-total/estimatr?color=green)](https://r-pkg.org/pkg/estimatr)
[![Build status](https://github.com/DeclareDesign/estimatr/workflows/R-CMD-check/badge.svg)](https://github.com/DeclareDesign/estimatr/actions)
[![Code coverage](https://codecov.io/gh/DeclareDesign/estimatr/branch/master/graph/badge.svg?token=x9MpkuKobc)](https://codecov.io/gh/DeclareDesign/estimatr)

**estimatr** is an `R` package providing a range of commonly used linear estimators, designed for speed and ease of use. Users can easily recover robust, cluster-robust, and other design-appropriate estimates. We include two functions that implement means estimators, `difference_in_means()` and `horvitz_thompson()`, and three linear regression estimators, `lm_robust()`, `lm_lin()`, and `iv_robust()`. In each case, users can choose an estimator that reflects a cluster-randomized, block-randomized, or block-and-cluster-randomized design. The [Getting Started Guide](https://declaredesign.org/r/estimatr/articles/getting-started.html) describes each estimator provided by **estimatr** and how it can be used in your analysis.

There are also several ways to [get regression tables out of estimatr](https://declaredesign.org/r/estimatr/articles/regression-tables.html) using commonly used `R` packages such as `texreg` and `stargazer`. Fast estimators also enable fast simulation of research designs to learn about their properties (see [DeclareDesign](https://declaredesign.org)).

## Installing estimatr

To install the latest stable release of **estimatr**, please ensure that you are running version 3.5 or later of R and run the following code:

```{r, eval = FALSE}
install.packages("estimatr")
```
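
As a minimal sketch of the version requirement above, the check below compares the running R version against 3.5 before installing. The variable names `min_r_version` and `r_is_recent_enough` are illustrative only and not part of **estimatr**.

```{r}
# Minimum R version stated in the installation note above
min_r_version <- "3.5.0"

# getRversion() returns the version of the running R session
r_is_recent_enough <- getRversion() >= min_r_version
if (!r_is_recent_enough) {
  warning(
    "estimatr needs R >= ", min_r_version,
    "; this session runs R ", as.character(getRversion())
  )
}
r_is_recent_enough
```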

## Easy to use

Once the package is installed, getting appropriate estimates and standard errors is both fast and easy.

```{r, eval = TRUE, echo = -1}
set.seed(42)
library(estimatr)

# sample data from cluster-randomized experiment
library(fabricatr)
library(randomizr)
dat <- fabricate(
  N = 100,
  y = rnorm(N),
  clusterID = sample(letters[1:10], size = N, replace = TRUE),
  z = cluster_ra(clusterID)
)

# robust standard errors
res_rob <- lm_robust(y ~ z, data = dat)
# tidy dataframes on command!
tidy(res_rob)

# cluster robust standard errors
res_cl <- lm_robust(y ~ z, data = dat, clusters = clusterID)
# standard summary view also available
summary(res_cl)

# matched-pair design learned from blocks argument
data(sleep)
res_dim <- difference_in_means(extra ~ group, data = sleep, blocks = ID)
```
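
The matched-pair fit above is stored but not displayed. As a short sketch, the result can be inspected like the regression fits, using the `design` element of the returned object and `tidy()`:

```{r}
library(estimatr)

# Paired sleep data shipped with R: 10 subjects (ID), each measured under two drugs (group)
data(sleep)
res_dim <- difference_in_means(extra ~ group, data = sleep, blocks = ID)

# The design inferred from the arguments is stored on the fitted object
res_dim$design

# One-row data frame with the estimate, standard error, and confidence interval
tidy(res_dim)
```

Because every block contains exactly two units, the point estimate is the average of the within-pair differences in `extra`.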

The [Getting Started Guide](https://declaredesign.org/r/estimatr/articles/getting-started.html) has more examples and uses, as do the reference pages. The [Mathematical Notes](https://declaredesign.org/r/estimatr/articles/mathematical-notes.html) provide more information about what each estimator is doing under the hood.
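
As one further illustration, here is a minimal sketch of `lm_lin()`, which adjusts for pre-treatment covariates by centering them and interacting them with the treatment. The simulated data frame `sim` and its effect sizes are made up for this example.

```{r}
library(estimatr)

# Simulated data (hypothetical): binary treatment z and one pre-treatment covariate x
set.seed(42)
n <- 200
sim <- data.frame(z = rep(0:1, times = n / 2), x = rnorm(n))
sim$y <- 0.5 * sim$z + sim$x + rnorm(n)

# Covariates go in a separate one-sided formula; lm_lin() centers them
# and adds their interaction with the treatment
res_lin <- lm_lin(y ~ z, covariates = ~ x, data = sim)
tidy(res_lin)
```

The row for `z` is the covariate-adjusted estimate of the treatment effect; the remaining rows are the intercept, the centered covariate, and its interaction with treatment.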

## Fast to use

Getting estimates and robust standard errors is also faster than it used to be. Compare our package to using `lm()` and the `sandwich` package to get HC2 standard errors. More speed comparisons are available in the [benchmarking article](https://declaredesign.org/r/estimatr/articles/benchmarking-estimatr.html). Furthermore, with many blocks (or fixed effects), users can use the `fixed_effects` argument of `lm_robust()` with HC1 standard errors to greatly improve estimation speed. See the article on [absorbing fixed effects](https://declaredesign.org/r/estimatr/articles/absorbing-fixed-effects.html) for more.

```{r, echo = -1}
set.seed(1)
dat <- data.frame(X = matrix(rnorm(2000*50), 2000), y = rnorm(2000))

library(microbenchmark)
library(lmtest)
library(sandwich)
mb <- microbenchmark(
  `estimatr` = lm_robust(y ~ ., data = dat),
  `lm + sandwich` = {
    lo <- lm(y ~ ., data = dat)
    coeftest(lo, vcov = vcovHC(lo, type = 'HC2'))
  }
)
```

```{r, echo = FALSE}
# Request milliseconds explicitly so the column label matches the reported unit
d <- summary(mb, unit = "ms")[, c("expr", "median")]
names(d) <- c("method", "median run-time (ms)")
knitr::kable(d)
```
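
The `fixed_effects` argument mentioned above can be sketched as follows. The simulated data frame `fe_dat`, with 100 blocks, is hypothetical; the example only checks that absorbing the block effects gives the same treatment coefficient as estimating one dummy per block, and makes no claim about timing.

```{r}
library(estimatr)

# Simulated data (hypothetical): 2000 units in 100 blocks with a binary treatment z
set.seed(1)
n <- 2000
fe_dat <- data.frame(
  block = sample(1:100, n, replace = TRUE),
  z = rbinom(n, 1, 0.5)
)
fe_dat$y <- 0.3 * fe_dat$z + fe_dat$block / 50 + rnorm(n)

# One dummy variable per block, estimated explicitly
fit_dummies <- lm_robust(y ~ z + factor(block), data = fe_dat, se_type = "HC1")

# Block effects absorbed through the fixed_effects argument
fit_absorbed <- lm_robust(y ~ z, fixed_effects = ~ block, data = fe_dat, se_type = "HC1")

# The two approaches should agree on the coefficient for z
c(
  dummies = fit_dummies$coefficients[["z"]],
  absorbed = fit_absorbed$coefficients[["z"]]
)
```

With absorbed fixed effects, only the coefficient on `z` is reported, which keeps the output short when there are many blocks.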

---

This project is generously supported by a grant from the [Laura and John Arnold Foundation](http://www.arnoldfoundation.org) and seed funding from [Evidence in Governance and Politics (EGAP)](http://egap.org).