Commit

Add method description
Dong555 authored Oct 25, 2022
1 parent e055222 commit 7fafe7c
Showing 1 changed file with 16 additions and 0 deletions.
16 changes: 16 additions & 0 deletions README.rst
@@ -42,6 +42,22 @@ uncertainty of contributing features for each latent component through posterior
implement the model with the `JAX <https://github.com/google/jax>`_ library developed by Google, which enables fast
training on CPU, GPU, or TPU.

Model Description
=================
We extend the sum of single effects model (`SuSiE <https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12388>`_) to principal component analysis. Let $X_{N \\times P}$ be the observed data, $Z_{N \\times K}$ the latent factors, and $W_{K \\times P}$ the factor loading matrix. The SuSiE PCA model is then given by:

$$X | Z,W \\sim \\mathcal{MN}_{N,P}(ZW,I_N,I_P) $$

where $\\mathcal{MN}_{N,P}$ is the matrix normal distribution with dimension $N \\times P$,
mean $ZW$, row-covariance $I_N$, and column-covariance $I_P$. Each column vector of $Z$ follows a standard normal distribution. This setup is the same as in `Probabilistic PCA <https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/1467-9868.00196?casa_token=eP85q23j9GoAAAAA:D1pr7dLuzTcs1wDglsDMu14QEmoJttwjdPzrlYI5_QWISOKvcyeeQVW3k4aEuOim5uXXcnt-na_QkGM>`_. The key difference is that we apply the SuSiE setting to each row vector $\\mathbf{w}_k$ of the factor loading matrix $W$, so that each $\\mathbf{w}_k$ contains at most $L$ non-zero effects. That is,
$$\\mathbf{w}_k = \\sum_{l=1}^L \\mathbf{w}_{kl} $$
$$\\mathbf{w}_{kl} = w_{kl} \\gamma_{kl}$$
$$w_{kl} \\sim \\mathcal{N}(0,\\sigma^2_{0kl})$$
$$\\gamma_{kl} | \\pi \\sim \\text{Multi}(1,\\pi) $$
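
The prior on a single loading row can be simulated directly from the four equations above. The following is a minimal sketch in plain Python, not the package's JAX implementation. The function name ``sample_loading_row``, the uniform default for $\\pi$, and the single shared prior standard deviation ``sigma0`` (in place of a separate $\\sigma_{0kl}$ per effect) are assumptions made for illustration.

```python
import random


def sample_loading_row(P, L, sigma0=1.0, pi=None, rng=None):
    """Draw one row w_k of W as a sum of L single-effect vectors.

    Hypothetical helper for illustration; sigma0 is shared across
    all l here, whereas the model allows a separate sigma_{0kl}.
    """
    rng = rng or random.Random()
    # Assumption: uniform prior over feature positions unless pi is given.
    pi = pi if pi is not None else [1.0 / P] * P
    w_k = [0.0] * P
    for _ in range(L):
        # gamma_kl ~ Multi(1, pi): pick the one non-zero coordinate.
        j = rng.choices(range(P), weights=pi, k=1)[0]
        # w_kl ~ N(0, sigma0^2): scalar effect placed at coordinate j.
        w_k[j] += rng.gauss(0.0, sigma0)
    return w_k


w_k = sample_loading_row(P=10, L=3, rng=random.Random(0))
nonzero = sum(1 for v in w_k if v != 0.0)
```

Because two single effects may select the same coordinate, ``nonzero`` can be smaller than $L$ but never larger.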

Notice that each row vector $\\mathbf{w}_k$ is a sum of $L$ single-effect vectors $\\mathbf{w}_{kl}$, each of which is a length-$P$ vector with one non-zero entry $w_{kl}$ and zeros elsewhere. The coordinate of the non-zero effect is determined by $\\gamma_{kl}$, which follows a multinomial distribution with parameter $\\pi$. By construction, each factor inferred by SuSiE PCA has at most $L$ associated features from the original data. Moreover, we can quantify the uncertainty of each association through posterior inclusion probabilities (PIPs). Suppose the posterior distribution of $\\gamma_{kl}$ is $\\text{Multi}(1,\\mathbf{\\alpha_{kl}})$; then the probability that feature $i$ contributes to factor $\\mathbf{w}_k$ is given by:
$$\\text{PIP}_{ki} = 1-\\prod_{l=1}^L (1 - \\alpha_{kli})$$
where $\\alpha_{kli}$ is the $i$-th entry of $\\mathbf{\\alpha_{kl}}$. Treating the $L$ single effects as independent under the posterior, this is the probability that at least one of them selects feature $i$.
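
As a small worked example, the PIPs for one factor can be computed from its posterior weights using the standard SuSiE definition, $\\text{PIP}_{ki} = 1-\\prod_{l=1}^L (1 - \\alpha_{kli})$, the probability that at least one single effect selects feature $i$. This is a sketch in plain Python; the function name ``pip`` and the nested-list layout ``alpha_k[l][i]`` are assumptions for illustration and do not reflect the package's API.

```python
import math


def pip(alpha_k):
    """Posterior inclusion probabilities for one factor k.

    alpha_k[l][i] is the posterior probability that single effect l
    selects feature i (each inner list sums to 1).
    """
    P = len(alpha_k[0])
    # 1 - prod_l (1 - alpha_kli): chance that some effect picks feature i.
    return [1.0 - math.prod(1.0 - a_l[i] for a_l in alpha_k) for i in range(P)]


# Two single effects (L = 2) over three features (P = 3); made-up values.
alpha_k = [
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
]
pips = pip(alpha_k)
# Feature 0: 1 - (0.1 * 0.5) = 0.95; feature 1: 1 - (0.9 * 0.5) = 0.55.
```

A feature that no single effect places weight on, such as the third one here, has a PIP of 0.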

Install SuSiE PCA
=================