MoCo is an R package designed to remove motion artifacts in brain phenotype analysis. Please note that MoCo is still under development.
Ensure you have the following R packages installed:
- SuperLearner
- haldensify
- MASS
- devtools
You can install them by running the following code:
if(!require(c("SuperLearner","haldensify", "MASS", "devtools"))){
install.packages(c("SuperLearner","haldensify", "MASS", "devtools"))
}
Then, you can install MoCo from GitHub using the following code:
library(devtools)
install_github("thebrisklab/MoCo")
library(MoCo)
The moco()
function serves as the main function of MoCo. The input and output of the function are illustrated in the figure below.
moco(
X, Z, A, M, Y,
Delta_M,
thresh = NULL,
Delta_Y,
SL_library = c("SL.earth","SL.glmnet","SL.gam","SL.glm", "SL.glm.interaction", "SL.step","SL.step.interaction","SL.xgboost","SL.ranger","SL.mean"),
SL_library_customize = list(
gA = NULL,
gDM = NULL,
gDY_AX = NULL,
gDY_AXZ = NULL,
mu_AMXZ = NULL,
eta_AXZ = NULL,
eta_AXM = NULL,
xi_AX = NULL
),
glm_formula = list(gA = NULL,
gDM = NULL,
gDY_AX = NULL,
gDY_AXZ = NULL,
mu_AMXZ = NULL,
eta_AXZ = NULL,
eta_AXM = NULL,
xi_AX = NULL,
pMX = NULL,
pMXZ = NULL),
HAL_pMX = TRUE,
HAL_pMXZ = TRUE,
HAL_options = list(max_degree = 3, lambda_seq = exp(seq(-1, -10, length = 100)), num_knots = c(1000, 500, 250)),
cross_fit = TRUE,
cv_folds = 5,
test = TRUE,
fwer = 0.05,
seed_rgn = 1,
...
)
-
X
: A dataframe or matrix containing demographic confounders that would ideally be balanced in a randomized controlled trial. -
Z
: A dataframe or matrix of covariates representing brain phenotypes. -
A
: A binary vector of length n (number of participants), serving as a group indicator, such as diagnosis group or control group. -
M
: A numeric vector of length n representing continuous motion values for each participant. -
Y
: A matrix of dimension n$\times$ p, where n is the number of participants, and p is the number of regions of interest.- If it represents seed-based association measures: Each (i, j) element denotes participant i's association measure between the seed region and region j. The column representing the association measure of the seed region with itself should be filled with NA values to indicate its position.
- If it represents other types of association measures: Each (i, j) element denotes participant i's association measure between two brain regions of interest, such as the upper diagonal part of the functional connectivity matrix. No NA values are allowed in Y in this case.
-
Delta_M
: A binary vector of length n indicating whether motion is available and meets inclusion criteria. If motion meets inclusion criteria for analysis, set Delta_M = 1; otherwise, set Delta_M = 0. -
thresh
: A numeric value used to threshold M to produce Delta_M. One can specify either Delta_M or thresh. -
Delta_Y
: A binary vector indicating the non-missingness and whether the brain image dataY
passes quality control after preprocessing. SetDelta_Y = 1
ifY
is usable; otherwise, setDelta_Y = 0
. -
SL_library
: SuperLearner library for estimating nuisance regressions. Defaults to c("SL.earth","SL.glmnet","SL.gam","SL.glm", "SL.glm.interaction", "SL.step","SL.step.interaction","SL.xgboost","SL.ranger","SL.mean") if not specified. -
SL_library_customize
: Customize SuperLearner library for estimating each nuisance regression.-
gA
: SuperLearner library for estimating the propensity score. -
gDM
: SuperLearner library for estimating the probability P(Delta_M = 1 | A, X). -
gDY_AX
: SuperLearner library for estimating the probability P(Delta_Y = 1 | A, X). -
gDY_AXZ
: SuperLearner library for estimating the probability P(Delta_Y = 1 | A, X, Z). -
mu_AMXZ
: SuperLearner library for estimating the outcome regression E(Y | Delta_Y = 1, A, M, X, Z). -
eta_AXZ
: SuperLearner library for estimating E(mu_AMXZ pMXD / pMXZD | A, X, Z, Delta_M = 1). -
eta_AXM
: SuperLearner library for estimating E(mu_AMXZ pMX/pMXZ gDY_AX/gDY_AXZ | A, M, X, Delta_Y = 1). -
xi_AX
: SuperLearner library for estimating E(eta_AXZ | A, X).
-
-
glm_formula
: All glm formulas default to NULL, indicating SuperLearner will be used for nuisance regressions.-
gA
: GLM formula for estimating the propensity score. -
gDM
: GLM formula for estimating the probability P(Delta_M = 1 | A, X). -
gDY_AX
: GLM formula for estimating the probability P(Delta_Y = 1 | A, X). -
gDY_AXZ
: GLM formula for estimating the probability P(Delta_Y = 1 | A, X, Z). -
mu_AMXZ
: GLM formula for estimating the outcome regression E(Y | Delta_Y = 1, A, M, X, Z). -
eta_AXZ
: GLM formula for estimating E(mu_AMXZ pMXD / pMXZD | A, X, Z, Delta_M = 1). -
eta_AXM
: GLM formula for estimating E(mu_AMXZ pMX/pMXZ gDY_AX/gDY_AXZ | A, M, X, Delta_Y = 1). -
xi_AX
: GLM formula for estimating E(eta_AXZ | A, X). -
pMX
: GLM formula for estimating p(m | a, x, Delta_Y = 1) and p(m | a, x, Delta_M = 1), assuming M follows a log normal distribution. -
pMXZ
: GLM formula for estimating p(m | a, x, z, Delta_Y = 1) and p(m | a, x, z, Delta_M = 1), assuming M follows a log normal distribution.
-
-
HAL_pMX
: Specifies whether to estimate p(m | a, x, Delta_Y = 1) and p(m | a, x, Delta_M=1) using the highly adaptive lasso conditional density estimation method. Defaults to TRUE. If set to FALSE, please specify the pMX option in glm_formula, such as pMX = ".". -
HAL_pMXZ
: Specifies whether to estimate p(m | a, x, z, Delta_Y = 1) and p(m | a, x, z, Delta_M=1) using the highly adaptive lasso conditional density estimation method. Defaults to TRUE. If set to FALSE, please specify the pMXZ option in glm_formula, such as pMXZ = ".". -
HAL_options
: Additional options for the highly adaptive lasso (HAL) method. These will be passed to the haldensify function in the haldensify package.-
max_degree
: The highest order of interaction terms for generating basis functions. -
lambda_seq
: A numeric sequence of values for the regularization parameter of Lasso regression. -
num_knots
: The maximum number of knot points (i.e., bins) for any covariate for generating basis functions.
-
-
cross_fit
: Logical indicating whether to develop the estimator based on cross-fitting. Defaults to TRUE. -
test
: Logical indicating whether to conduct hypothesis testing based on simultaneous confidence band. Defaults to TRUE. -
fwer
: A vector of family-wise error rates (FWER) to control for multiple hypothesis testing. Defaults to c(0.05). Set to NULL iftest
is FALSE. -
seed_rgn
: Specifies the value of seed(s) for nuisance regression calculation using super learner. Can be a vector. Defaults to value 1.
In this tutorial, we demonstrate the application of the MoCo package with a straightforward example analysis. We generate a simulated dataset based on the Autism Brain Imaging Data Exchange (ABIDE). Our seed region of interest is the default mode network (DMN). We are interested in studying the functional connectivity between DMN and the other six regions defined using the Yeo 7 parcellation. We use MoCo to compute the motion-controlled mean functional connectivity and associations.
# library
library(MoCo)
# load data
data(data)
# inspect the data
str(data)
The dataset includes a total of
Demographic confounders X and behavioral phenotypes Z are represented as data frames, each containing multiple variables. Data frame X is of size
The functional connectivity matrix Y has dimensions
Then, we apply the moco()
function is utilized to compute motion-controlled functional connectivity and associations. For illustrative purposes, we choose a simple setting. Here we select a basic SL_Library, employ cross-fitting with cv_folds = 5, and utilize glm for motion density estimation. In the end, we provide a suggested setting for conducting a more comprehensive analysis based on more accurate density estimation using highly adaptive lasso density estimation.
# computing motion-controlled functional connectivity and associations
result = moco(
X = data$X,
Z = data$Z,
A = data$A,
M = data$M,
Y = data$Y,
Delta_M = data$Delta_M,
Delta_Y = data$Delta_Y,
SL_library = c("SL.mean", "SL.glm","SL.glm.interaction"),
glm_formula = list(pMX = ".",
pMXZ = "."),
HAL_pMX = FALSE,
HAL_pMXZ = FALSE,
cross_fit = TRUE,
cv_folds = 5,
seed_rgn = 1,
test = TRUE,
fwer = 0.05
)
The result will be a list of 4 elements. The motion-controlled mean functional connectivity is stored in the est
element. The first row corresponds to the adjusted mean functional connectivity for the non-ASD group (
# motion-controlled mean functional connectivity
round(result$est, 4)
# est_A0 -0.2180 -0.1632 -0.1823 0.0535 0.0388 0.0828 NA
# est_A1 -0.2194 -0.1654 -0.1813 0.0513 -0.0084 0.0141 NA
The motion-controlled association is stored in the adj_association
vector of length seven, where the first six elements represent the adjusted association for the corresponding region. MoCo achieves satisfactory accuracy when comparing the estimated motion-controlled association with the ground truth.
# motion-controlled association
round(result$adj_association, 4)
# -0.0014 -0.0023 0.0010 -0.0022 -0.0472 -0.0687 NA
In addition to the motion-controlled functional connectivity and association, it contains 2 other elements: z-scores and a binary indicator indicating significance for each of the regions.
# z-scores
round(result$z_score, 4)
# -0.0586 -0.0853 0.0413 -0.0933 -1.9196 -3.3033 NA
# significant regions
result$significant_regions
# FALSE FALSE FALSE FALSE FALSE TRUE NA
Below is the code for a comprehensive setting recommended for running MoCo. This setting uses highly adaptive lasso for conditional density motion estimation, offering more flexible modeling of the conditional motion distribution. It utilizes all default parameters provided by the function, which uses the default Super Learner library for flexible nuisance regression estimation. Users only need to specify the definition of each variable to obtain results.
# computing motion-controlled functional connectivity and associations (recommended)
result = moco(
X = data$X,
Z = data$Z,
A = data$A,
M = data$M,
Y = data$Y,
Delta_M = data$Delta_M,
Delta_Y = data$Delta_Y
)