Add method description

mancusolab · Oct 25, 2022 · 7fafe7c
1 parent e055222
commit 7fafe7c
Showing 1 changed file with 16 additions and 0 deletions.
diff --git a/README.rst b/README.rst
@@ -42,6 +42,22 @@ uncertainty of contributing features for each latent component through posterior
 implement the model with the `JAX <https://github.com/google/jax>`_ library developed by Google which enable the fast
 training on CPU, GPU or TPU.
 
+Model Description
+=================
+We extend the sum of single effects model (`SuSiE <https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12388>`_) to principal component analysis. Let $X_{N \\times P}$ be the observed data, $Z_{N \\times K}$ the latent factors, and $W_{K \\times P}$ the factor loading matrix. The SuSiE PCA model is then given by:
+
+$$X | Z,W \\sim \\mathcal{MN}_{N,P}(ZW,I_N,I_P)  $$
+
+where $\\mathcal{MN}_{N,P}$ is the matrix normal distribution with dimension $N \\times P$,
+mean $ZW$, row-covariance $I_N$, and column-covariance $I_P$. Each column vector of $Z$ follows a standard normal distribution. This model setting is the same as in `Probabilistic PCA <https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/1467-9868.00196?casa_token=eP85q23j9GoAAAAA:D1pr7dLuzTcs1wDglsDMu14QEmoJttwjdPzrlYI5_QWISOKvcyeeQVW3k4aEuOim5uXXcnt-na_QkGM>`_. The key difference is that we apply the SuSiE setting to each row vector $\\mathbf{w}_k$ of the factor loading matrix $W$, so that each $\\mathbf{w}_k$ contains at most $L$ non-zero effects. That is,
+$$\\mathbf{w}_k = \\sum_{l=1}^L \\mathbf{w}_{kl} $$
+$$\\mathbf{w}_{kl} = w_{kl} \\gamma_{kl}$$
+$$w_{kl} \\sim \\mathcal{N}(0,\\sigma^2_{0kl})$$
+$$\\gamma_{kl} | \\pi \\sim \\text{Multi}(1,\\pi) $$
+
+Notice that each row vector $\\mathbf{w}_k$ is a sum of $L$ single-effect vectors $\\mathbf{w}_{kl}$, each of which is a length-$P$ vector that contains only one non-zero effect $w_{kl}$ and zeros elsewhere. The coordinate of the non-zero effect is determined by $\\gamma_{kl}$, which follows a multinomial distribution with parameter $\\pi$. By construction, each factor inferred by SuSiE PCA has at most $L$ associated features from the original data. Moreover, we can quantify the uncertainty of each association through posterior inclusion probabilities (PIPs). Suppose the posterior distribution of $\\gamma_{kl}$ is $\\text{Multi}(1,\\mathbf{\\alpha_{kl}})$; then the probability that feature $i$ contributes to the factor $\\mathbf{w}_k$ is given by:
+$$\\text{PIP}_{ki} = 1-\\prod_{l=1}^L (1 - \\alpha_{kli})$$
+where $\\alpha_{kli}$ is the $i$-th entry of $\\mathbf{\\alpha_{kl}}$.
 
 Install SuSiE PCA
 =================
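
The prior on a loading row and the PIP computation described in the added Model Description section can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's implementation: the function names ``sample_loading_row`` and ``pip`` are hypothetical, the prior variances are passed in as standard deviations, and the PIP is computed in the standard SuSiE form, one minus the product over ``l`` of ``(1 - alpha_kli)``.

```python
import math
import random


def sample_loading_row(P, L, pi, sigma0, rng):
    """Draw one row w_k of W as a sum of L single-effect vectors.

    Hypothetical helper for illustration; not part of the susiepca package.
    pi is a length-P list of prior weights, sigma0 a length-L list of
    prior standard deviations (one per single effect).
    """
    w_k = [0.0] * P
    for l in range(L):
        # gamma_kl ~ Multi(1, pi): choose the coordinate of the single effect
        idx = rng.choices(range(P), weights=pi, k=1)[0]
        # w_kl ~ N(0, sigma0[l]^2): scalar size of that effect
        effect = rng.gauss(0.0, sigma0[l])
        # adding the single-effect vector only changes one coordinate
        w_k[idx] += effect
    return w_k


def pip(alpha_k):
    """Posterior inclusion probabilities for one factor.

    alpha_k is an L x P nested list whose rows are the posterior
    multinomial weights alpha_kl (each row sums to 1).
    Returns a length-P list with PIP_ki = 1 - prod_l (1 - alpha_kli).
    """
    P = len(alpha_k[0])
    return [1.0 - math.prod(1.0 - row[i] for row in alpha_k) for i in range(P)]


# Usage: one sparse loading row over 10 features with L = 3 single effects
rng = random.Random(0)
row = sample_loading_row(P=10, L=3, pi=[0.1] * 10, sigma0=[1.0] * 3, rng=rng)
print(row)

# Usage: PIPs from two single effects over three features
print(pip([[0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]))
```

Because each single effect touches one coordinate, the sampled row has at most ``L`` non-zero entries; two effects may land on the same feature, which is why the bound is "at most" rather than "exactly".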