CausalStump

This package fits a Gaussian process for treatment effect estimation. It is part of my MSc project in Computational Statistics and Machine Learning at UCL in London 2016/17.

This package can be installed using following command in R:

install.packages("https://github.com/mazphilip/CausalStump/raw/master/tar/CausalStump_0.2.0.tar.gz", repos = NULL, type = "source")

This package requires the R-packages Rcpp, RcppArmadillo, and mvnfast (for the sampling of the student-t posterior).

Replicate thesis results

To replicate the results in my thesis, you can find the simulation files in the test folder. You need the sim.data file of Hill (2011) that is available in the supplement material here (stable link). On an AWS c4.8xlarge instance with an Rstudio AMI (set export OMP_NUM_THREADS=2) each file takes between 4-6h. Use following package version to run the simulations:

install.packages("https://github.com/mazphilip/CausalStump/raw/master/tar/CausalStump_0.2.0.tar.gz", repos = NULL, type = "source")

Example

As there is only a very limited and incomplete documentation so far, here is an example of how to use the package to estimate treatment effects

library(CausalStump)

#simualtion as in Hill (2011) and Hill & Su (2013)
set.seed(1231)
n=120
Z = rbinom(n, 1, 0.3)
X1 = rnorm(sum(Z), mean = 40, sd = 10)
X0 = rnorm(n-sum(Z), mean = 20, sd = 10)
X0[X0<0] = 0.01
X = data.frame(matrix(NaN,n,1))
X[Z==1,] = X1; X[Z==0,] = X0
mysort = sort(X[,1],index.return=TRUE)

y0_true = as.matrix(72 + 3 * sqrt(X))
y1_true = as.matrix(90 + exp(0.06 * X))
y0_true[is.nan(y0_true)] = 60
y0_true[y0_true<60] = 60; y1_true[y1_true<60] = 60
y0_true[y0_true>120] = 120; y1_true[y1_true>120] = 120

#add observation noise (this is different to Hill's example but preserves the additive normal noise assumption)
Y0 = rnorm(n, mean = y0_true, sd = 1)
Y1 = rnorm(n, mean = y1_true, sd = 1)
Y = Y0*(1-Z) + Y1*Z 

#parameter learning of Gaussian process (prior = FALSE), student-t process (prior = TRUE)
mycs = CausalStump(Y,X,Z,prior=FALSE)

#predict response surfaces
mypred0 = predict(mycs,z=0)
mypred1 = predict(mycs,z=1)

#plot surface fit
par(mfrow=c(1,1))
plot(X[,1],mypred1$map,ylim = c(70,120),ylab="Y",xlab="X")
points(X[,1],mypred0$map)
points(X[,1],Y,col=2,pch=20)
lines(mysort$x,y1_true[mysort$ix]); lines(mysort$x,y0_true[mysort$ix])
legend("topleft",c("true","estimates","osbervations"),pch=c(NA,1,20),lty=c(1,NA,NA),col=c(1,1,2) )

#predict treatment effect of training sample (GP: exact, TP: sampling)
mytreat = treatment(mycs)

# the PEHE metric is used as in Hill (2011):
PEHE = sqrt(mean( (y1_true-y0_true - mytreat$map)^2)); PEHE

Name		Name	Last commit message	Last commit date
Latest commit History 89 Commits
R		R
man		man
src		src
tar		tar
test		test
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
CausalStump.Rproj		CausalStump.Rproj
DESCRIPTION		DESCRIPTION
H11_lim_weights_100_18_august.RData		H11_lim_weights_100_18_august.RData
H11_lim_weights_100_18_august2.RData		H11_lim_weights_100_18_august2.RData
LICENSE		LICENSE
NAMESPACE		NAMESPACE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CausalStump

Replicate thesis results

Example

About

Releases

Packages

Languages

License

mazphilip/CausalStump

Folders and files

Latest commit

History

Repository files navigation

CausalStump

Replicate thesis results

Example

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages