% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mrsguide.R
\name{MrSFit}
\alias{MrSFit}
\title{Subgroup Identification with Multiple Responses}
\usage{
MrSFit(
dataframe,
role,
bestK = 1,
bootNum = 0L,
alpha = 0.05,
maxDepth = 5,
minTrt = 5,
minData = max(c(minTrt * maxDepth, NROW(Y)/20)),
batchNum = 1L,
CVFolds = 10L,
CVSE = 0,
faster = FALSE,
display = FALSE,
treeName = paste0("tree_", format(Sys.time(), "\%m\%d"), ".yaml"),
nodeName = paste0("node_", format(Sys.time(), "\%m\%d"), ".txt"),
bootName = paste0("boot_", format(Sys.time(), "\%m\%d"), ".txt"),
impName = paste0("imp_", format(Sys.time(), "\%m\%d"), ".txt"),
writeTo = FALSE,
remove = TRUE
)
}
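\section{Default minimum node size}{
The default for \code{minData} is \code{max(c(minTrt * maxDepth, NROW(Y)/20))}. The short computation here is a sketch that evaluates this default for the other default settings (\code{minTrt = 5}, \code{maxDepth = 5}) and several sample sizes, including the 200 observations used in the examples. It assumes that \code{NROW(Y)} equals the number of observations; \code{default_min_data} is a hypothetical helper introduced only for this illustration, not a MrSGUIDE function.
\preformatted{
```r
# Hypothetical helper (not part of MrSGUIDE): evaluates the default
# minData formula from the usage section. Assumption: n plays the
# role of NROW(Y), i.e. the number of observations.
default_min_data <- function(n, minTrt = 5, maxDepth = 5)
  max(c(minTrt * maxDepth, n / 20))

# Default for the 200 observations used in the examples.
minData <- default_min_data(200)
print(minData)

# The n / 20 term only dominates once n exceeds 500.
sizes <- c(200, 500, 1000)
print(sapply(sizes, default_min_data))
```
}
With the default settings the bound \code{minTrt * maxDepth = 25} dominates; the \code{NROW(Y)/20} term takes over only for samples larger than 500 observations.
}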
\arguments{
\item{dataframe}{A \code{\link[base]{data.frame}} used for subgroup identification.
It should contain the covariates, the treatment assignment and the outcomes; the order of the columns does not matter.}
\item{role}{A character \code{\link[base]{vector}} of 'GUIDE' role codes with one element per column of \code{dataframe};
each element specifies how the corresponding column is used.
The following roles are currently available:
\itemize{
\item{\strong{Covariate roles}}
\itemize{
\item \strong{c} \strong{C}ategorical variable used for splitting only.
\item
\strong{f} Numerical variable used only for \strong{f}itting the
regression models in the tree nodes; it is not used for splitting
the nodes.
\item
\strong{h} Numerical variable always \strong{h}eld (included) in the
regression models fitted in the tree nodes.
\item
\strong{n} \strong{N}umerical variable used both for splitting the
nodes and fitting the node regression model.
\item
\strong{s} Numerical variable used only for \strong{s}plitting the
nodes; it is not used for fitting the regression models.
\item
\strong{x} E\strong{x}cluded variable; it is not used in tree building.
}
\item{\strong{Outcome role}}
\itemize{
\item
\strong{d} \strong{D}ependent (outcome) variable. If there is only one
\strong{d} variable, the function performs single-response
subgroup identification.}
\item{\strong{Treatment role}}
\itemize{
\item \strong{r} Categorical t\strong{R}eatment variable used only for
fitting the linear models in the tree nodes; it is not used for
splitting the nodes.}
}}
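\item{bestK}{Number of covariates in the regression model fitted in each node.}
\item{bootNum}{Number of bootstrap replicates used to calibrate the confidence intervals.}
\item{alpha}{Desired significance level for the confidence intervals of the treatment effect parameters.}
\item{maxDepth}{Maximum tree depth.}
\item{minTrt}{Minimum number of treatment and of placebo observations in each node.}
\item{minData}{Minimum number of observations in each node. By default, the larger of \code{minTrt * maxDepth} and one twentieth of the sample size.}
\item{batchNum}{Tuning parameter for the exhaustive search over split points of numerical split variables.}
\item{CVFolds}{Number of cross-validation folds.}
\item{CVSE}{Number of standard errors used in the cross-validation SE rule.}
\item{faster}{Logical; option controlling the tree split search.}
\item{display}{Logical; whether to display the tree at the end.}
\item{treeName}{Name of the YAML file in which the tree is saved.}
\item{nodeName}{Name of the file in which the node information is saved.}
\item{bootName}{Name of the file in which the bootstrap-calibrated alpha is saved.}
\item{impName}{Name of the file in which the variable importance scores are saved.}
\item{writeTo}{Debugging option reserved for the package author.}
\item{remove}{Logical; whether to remove the intermediate files.}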
}
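\section{Checking the role vector}{
The role codes must line up one-to-one with the columns of \code{dataframe}. A minimal pre-flight check can be sketched as follows. \code{check_role} is a hypothetical helper, not a MrSGUIDE function, and its requirements that at least one \strong{d} column and exactly one \strong{r} column be present are assumptions drawn from the role descriptions, not checks documented for \code{MrSFit}.
\preformatted{
```r
# Hypothetical helper (not part of MrSGUIDE): sanity-checks a role vector
# before it is passed to MrSFit(). The rules on the number of 'd' and 'r'
# columns are assumptions based on the role descriptions.
check_role <- function(dataframe, role) {
  allowed <- c("c", "f", "h", "n", "s", "x", "d", "r")
  # One role code per column of the data frame.
  if (length(role) != ncol(dataframe))
    stop("role must have one entry per column of dataframe")
  # Only the documented role codes are accepted.
  bad <- setdiff(unique(role), allowed)
  if (length(bad) > 0)
    stop("unknown role code(s): ", paste(bad, collapse = ", "))
  # At least one outcome column and a single treatment column (assumed).
  if (!any(role == "d"))
    stop("at least one 'd' (outcome) column is required")
  if (sum(role == "r") != 1)
    stop("exactly one 'r' (treatment) column is assumed")
  invisible(TRUE)
}

# Usage with a small data frame laid out like the one in the examples.
train <- data.frame(X1 = rnorm(6), gender = rep(c("Male", "Female"), 3),
                    z = rep(0:1, each = 3), y1 = rnorm(6), y2 = rnorm(6))
role <- c("n", "c", "r", "d", "d")
check_role(train, role)
```
}
Failing early with a descriptive message makes a misaligned role vector, for example one that is missing an entry, easier to diagnose than an error raised deep inside the tree-fitting code.
}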
\value{
An object of class \code{"guide"} with the following components:
\item{treeRes}{Tree structure result.}
\item{node}{Predicted node of each observation.}
\item{imp}{Raw variable importance scores; \code{\link[MrSGUIDE]{MrSImp}} gives more accurate importance estimates.}
\item{cLevels}{Categorical features level mapping.}
\item{tLevels}{Treatment assignment level mapping.}
\item{yp}{Number of outcomes.}
\item{tp}{Number of treatment assignment levels.}
\item{role}{Roles of the data frame columns.}
\item{varName}{Variable names.}
\item{numName}{Numerical variable names.}
\item{catName}{Categorical variable names.}
\item{trtName}{Treatment assignment variable name.}
\item{nodeMap}{A map from node id to node information.}
\item{TrtL}{Treatment level mapping.}
\item{Settings}{Settings used to fit the tree.}
\item{trtNode}{Treatment effect summary.}
}
\description{
Subgroup identification with multiple responses, using the 'GUIDE' 'Gi' option for tree building.
}
\details{
This function builds the tree with the 'GUIDE' Gi option. It produces a subgroup identification tree
and can also provide confidence intervals for the treatment effects based on bootstrap calibration.
The 'Gi' option tests for an interaction between each covariate \eqn{x_i} and the treatment assignment \eqn{z}.
Within each tree node \eqn{t}, if \eqn{x_i} is a continuous variable, the function discretizes it into
four groups at its sample quartiles, giving a categorical variable \eqn{h_i}. If \eqn{x_i} is a categorical variable,
the function sets \eqn{h_i = x_i}.
If \eqn{x_i} contains missing values, the function adds missing as an extra level of \eqn{h_i}.
Let \eqn{K} denote the number of levels of \eqn{h_i} and \eqn{G} the number of treatment levels.
The function then tests the main-effects model \eqn{H_0} against the full model \eqn{H_A}, which adds the interaction terms:
\deqn{H_0: \mathrm{E}(y) = \beta_0 + \sum_{k=2}^{K}\beta_k I(h_i = k) + \sum_{j=2}^{G}\gamma_j I(z = j)}
\deqn{H_A: \mathrm{E}(y) = \beta_0 + \sum_{k=2}^{K}\beta_k I(h_i = k) + \sum_{j=2}^{G}\gamma_j I(z = j) + \sum_{k=2}^{K}\sum_{j=2}^{G}\delta_{kj} I(h_i = k, z = j)}
The covariate \eqn{x_i} with the most significant test is selected to split node \eqn{t}.
Details of the algorithm are given in Loh and Zhou (2020); the bootstrap-calibrated confidence intervals
for the treatment effects are described in Loh, Man and Wang (2019).
}
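\section{Sketch of the Gi interaction test}{
The single-response test described in the details can be sketched in a few lines of base R. This is a minimal sketch, not the MrSGUIDE implementation: \code{gi_test} is a hypothetical helper, the test is assumed to be the ordinary least-squares F test of \eqn{H_0} against \eqn{H_A} computed by \code{anova}, and the way 'GUIDE' converts and combines the statistics across several responses is not reproduced.
\preformatted{
```r
# Hypothetical helper (not part of MrSGUIDE): p-value of the interaction
# between covariate x and treatment z for a single response y, assuming an
# OLS F test of the main-effects model against the interaction model.
gi_test <- function(y, x, z) {
  if (is.numeric(x)) {
    # Discretize a numeric covariate at its sample quartiles (4 groups).
    qs <- unique(quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE))
    h <- cut(x, breaks = c(-Inf, qs, Inf))
  } else {
    # A categorical covariate is used as-is.
    h <- factor(x)
  }
  # Missing values become their own level.
  h <- addNA(h, ifany = TRUE)
  z <- factor(z)
  main <- lm(y ~ h + z)          # H_0: main effects only
  full <- lm(y ~ h * z)          # H_A: main effects plus interactions
  anova(main, full)[2, "Pr(>F)"] # p-value of the interaction test
}

# Usage: simulate data where only g modifies the treatment effect.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  g = sample(c("a", "b"), n, replace = TRUE))
z <- sample(0:1, n, replace = TRUE)
y <- dat$x1 + 2 * z * (dat$g == "b") + rnorm(n)
pvals <- sapply(dat, function(x) gi_test(y, x, z))
print(pvals)
```
}
Each column of \code{dat} is tested against the treatment, and the covariate with the smallest p-value, here the simulated effect modifier \code{g}, would be chosen as the split variable for the node.
}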
\examples{
library(MrSGUIDE)
set.seed(1234)
N <- 200
np <- 3
numX <- matrix(rnorm(N * np), N, np) ## numerical features
gender <- sample(c('Male', 'Female'), N, replace = TRUE)
country <- sample(c('US', 'UK', 'China', 'Japan'), N, replace = TRUE)
z <- sample(c(0, 1), N, replace = TRUE) # Binary treatment assignment
y1 <- numX[, 1] + 1 * z * (gender == 'Female') + rnorm(N)
y2 <- numX[, 2] + 2 * z * (gender == 'Female') + rnorm(N)
train <- data.frame(numX, gender, country, z, y1, y2)
role <- c(rep('n', 3), 'c', 'c', 'r', 'd', 'd')
mrsobj <- MrSFit(dataframe = train, role = role)
printTree(mrsobj)
}
\references{
Loh, W.-Y. and Zhou, P. (2020). The GUIDE approach to subgroup identification.
In Design and Analysis of Subgroups with Biopharmaceutical Applications, N. Ting, J. C. Cappelleri, S. Ho, and D.-G. Chen (Eds.) Springer, in press.
Loh, W.-Y., Man, M. and Wang, S. (2019). Subgroups from regression trees with adjustment for prognostic effects and post-selection inference.
Statistics in Medicine, vol. 38, 545-557. doi:10.1002/sim.7677 \url{http://pages.stat.wisc.edu/~loh/treeprogs/guide/sm19.pdf}
}