modify vignette & data

epijim · Sep 27, 2021 · b19c66e
1 parent 014d4c2
Showing 5 changed files with 40 additions and 29 deletions.
diff --git a/R/data.R b/R/data.R
@@ -1,17 +1,23 @@
 #' Antidepressant trial data.
 #'
-#' A dataset containing the data from a public available antidepressant clinical trial of an active drug versus placebo.
+#' A dataset containing data from a publicly available antidepressant clinical trial of an active drug versus placebo.
+#' The dataset is available [here](https://www.lshtm.ac.uk/research/centres-projects-groups/missing-data#dia-missing-data).
 #' The relevant endpoint is the Hamilton 17-item rating scale for depression (HAMD17) which was assessed at baseline and weeks 1, 2, 4, and 6.
-#' Study drug discontinuation occurred in 24% (20/84) for the active drug and 26% (23/88) for placebo.
+#' Study drug discontinuation occurred in 24% of subjects in the active drug arm and 26% of subjects in the placebo arm.
 #' All data after study drug discontinuation are missing and there is a single additional intermittent missing observation.
 #'
 #' @format A data frame with 608 rows and 11 variables:
 #'   - `PATIENT`: patient IDs.
+#'   - `HAMATOTL`: Hamilton Anxiety Rating Scale total score.
+#'   - `PGIIMP`: Patient's Global Impression of Improvement rating scale.
+#'   - `RELDAYS`: number of days between visit and baseline.
 #'   - `VISIT`: post-baseline visit. Has levels 4,5,6,7.
 #'   - `THERAPY`: the treatment group variable. It is equal to `PLACEBO` for observations
 #'   from the placebo arm, or `DRUG` for observations from the active arm.
-#'   - `basval`: baseline outcome value.
+#'   - `GENDER`: patient's sex.
+#'   - `POOLINV`: pooled investigator.
+#'   - `BASVAL`: baseline outcome value.
 #'   - `HAMDTL17`: Hamilton 17-item rating scale value.
-#'   - `change`: change from baseline in the Hamilton 17-item rating scale.
-#'   - ...
+#'   - `CHANGE`: change from baseline in the Hamilton 17-item rating scale.
+#'
 "antidepressant_data"
diff --git a/data/antidepressant_data.rda b/data/antidepressant_data.rda
diff --git a/man/antidepressant_data.Rd b/man/antidepressant_data.Rd
diff --git a/man/method.Rd b/man/method.Rd
diff --git a/vignettes/quickstart.Rmd b/vignettes/quickstart.Rmd
@@ -30,7 +30,7 @@ In particular the core functions are:
 
 ## The Data
 
-In order to demonstrate the package we will use a publicly available example data set from an antidepressant clinical trial of an active drug versus placebo. The relevant endpoint is the Hamilton 17-item rating scale for depression (HAMD17) which was assessed at baseline and weeks 1, 2, 4, and 6. Study drug discontinuation occurred in 24% (20/84) for the active drug and 26% (23/88) for placebo. All data after study drug discontinuation are missing and there is a single additional intermittent missing observation.
+To demonstrate the package we will use a publicly available example data set from an antidepressant clinical trial of an active drug versus placebo. The relevant endpoint is the Hamilton 17-item rating scale for depression (HAMD17), which was assessed at baseline and weeks 1, 2, 4, and 6. Study drug discontinuation occurred in 24% of subjects in the active drug arm and 26% of subjects in the placebo arm. All data after study drug discontinuation are missing and there is a single additional intermittent missing observation.
 
 ```{r}
 library(rbmi)
@@ -40,10 +40,10 @@ data("antidepressant_data")
 dat <- antidepressant_data
 ```
 
-We consider an imputation model with the mean change from baseline in the HAMD17 score as the outcome (variable `change` in the dataset), included the treatment group (`THERAPY`), the (categorical) visit (`VISIT`), treatment-by-visit interactions, the baseline HAMD17 score (`basval`), and baseline HAMD17-by-visit interactions as covariates, and assumed a common unstructured covariance matrix in both groups. The chosen analysis model is ANCOVA which adjusts for the baseline HAMD17 value.
+We consider an imputation model with the mean change from baseline in the HAMD17 score as the outcome (variable `CHANGE` in the dataset). The model includes the treatment group (`THERAPY`), the (categorical) visit (`VISIT`), treatment-by-visit interactions, the baseline HAMD17 score (`BASVAL`), and baseline HAMD17-by-visit interactions as covariates, and assumes a common unstructured covariance matrix in both groups. The chosen analysis model is an ANCOVA which adjusts for the baseline HAMD17 value.
 
 `rbmi` expects its input dataset to be complete; that is that there must be 1 row
-per patient per visit. Missing outcome values should be coded as `NA`, while missing values in the covariates are not allowed. If your dataset is incomplete then the `expand_locf()` helper function can be used to add in any missing rows, using LOCF imputation to impute the covariate values. In our dataset the rows corresponding to missing outcomes are missing, so we use the `expand_locf()` function
+per patient per visit. Missing outcome values should be coded as `NA`, while missing values in the covariates are not allowed. If your dataset is incomplete then the `expand_locf()` helper function can be used to add in any missing rows, using LOCF imputation to impute the covariate values. In our dataset the rows corresponding to missing outcomes are not present, so we use the `expand_locf()` function
 as follows:
 
 ```{r}
@@ -53,7 +53,7 @@ dat <- expand_locf(
     dat,
     PATIENT = levels(dat$PATIENT), # expand by PATIENT and VISIT 
     VISIT = levels(dat$VISIT),
-    vars = c("basval", "THERAPY"), # fill with LOCF basval and THERAPY
+    vars = c("BASVAL", "THERAPY"), # fill with LOCF BASVAL and THERAPY
     group = c("PATIENT"),
     order = c("PATIENT", "VISIT")
 )
@@ -68,7 +68,7 @@ function include:
 - `data` the primary longitudinal data.frame containing the outcome variable and all covariates
 - `data_ice` a data.frame specifying which visit (if any) the patient's intercurrent 
 event (ICE) occurred on, or more precisely the first visit in which the outcome has been affected by the ICE. If the patient had multiple ICEs this should 
-specify the first visit affected by the ICE addressed with a non-MAR imputation strategy. It also 
+specify the first visit affected by the ICE that is to be imputed using a non-MAR strategy. It also 
 specifies which reference based imputation strategy we want to use.
 - `method` specifies what method we want to use to fit our imputation models as well as what
 method we want to use to generate our imputed values. 
@@ -77,22 +77,22 @@ In our example the patients ICE visit is the
 first visit in which a missing value has occurred. We assume that all patients will be
 imputed under the Jump To Reference (JR) strategy. We will create 150 imputation models using Bayesian
 methods to sample the model coefficients from their posterior distributions for a model 
-of `change ~ 1 + basval * VISIT + THERAPY * VISIT`.
+of `CHANGE ~ 1 + BASVAL * VISIT + THERAPY * VISIT`.
 
 ```{r}
 # create data_ice setting the imputation method to JR for
 # each patient with at least one missing value
 dat_ice <- dat %>% 
     arrange(PATIENT, VISIT) %>% 
-    filter(is.na(change)) %>% 
+    filter(is.na(CHANGE)) %>% 
     group_by(PATIENT) %>% 
     slice(1) %>%
     ungroup() %>% 
     select(PATIENT, VISIT) %>% 
     mutate(strategy = "JR")
 
-# The patient with id 3618 is the unique one that has an intermittent missing values.
-# Actually he does not stop the treatment -> remove from data_ice
+# The patient with id 3618 is the only one with an intermittent missing value ->
+# remove them from data_ice since they did not experience any ICE.
 # (it will be automatically imputed under MAR assumption)
 dat_ice <- dat_ice[-which(dat_ice$PATIENT == 3618),]
 
@@ -101,18 +101,18 @@ dat_ice
 # Define the names of key variables in our dataset using `set_vars()`
 # Note that covariates argument can contain interactions
 vars <- set_vars(
-    outcome = "change",
+    outcome = "CHANGE",
     visit = "VISIT",
     subjid = "PATIENT",
     group = "THERAPY",
-    covariates = c("basval*VISIT", "THERAPY*VISIT")
+    covariates = c("BASVAL*VISIT", "THERAPY*VISIT")
 )
 
 # Define what method we want to use e.g. here we specify we 
-# want to use Baysian methods to create 150 samples
+# want to use Bayesian methods to create 150 samples
 method <- method_bayes(
     burn_in = 200,
-    burn_between = 10,
+    burn_between = 5,
     n_samples = 150,
     verbose = FALSE
 )
@@ -167,7 +167,7 @@ imputeObj
 ```
 
 In this instance we are specifying that group `PLACEBO` should use itself as its reference group and that group `DRUG` should
-use group `PLACEBO` as its reference group (as standard for imputation using reference-based methods). 
+use the group `PLACEBO` as its reference group (as standard for imputation using reference-based methods). 
 
 Generally speaking, there is no need to see or directly interact with any of the imputed
 datasets. However if you do wish to inspect them they can be extracted from the imputation
@@ -200,10 +200,10 @@ anaObj <- analyse(
     ancova,
     vars = set_vars(
         subjid = "PATIENT",
-        outcome = "change",
+        outcome = "CHANGE",
         visit = "VISIT",
         group = "THERAPY",
-        covariates = c("basval")
+        covariates = c("BASVAL")
     )
 )
 anaObj
@@ -244,10 +244,10 @@ anaObj_delta <- analyse(
     delta = delta_df,
     vars = set_vars(
         subjid = "PATIENT",
-        outcome = "change",
+        outcome = "CHANGE",
         visit = "VISIT",
         group = "THERAPY",
-        covariates = c("basval")
+        covariates = c("BASVAL")
     )
 )
 ```