Skip to content

Vassago1911/randomised_pca

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 

Repository files navigation

Principal component analysis.

Constructs a nearly optimal rank-k approximation U diag(s) Va to A,
centering the columns of A first when raw is False, using n_iter
normalized power iterations, with block size l, started with a
min(m,n) x l random matrix, when A is m x n; the reference PCA_
below explains "nearly optimal." k must be a positive integer <=
the smaller dimension of A, n_iter must be a nonnegative integer,
and l must be a positive integer >= k.

The rank-k approximation U diag(s) Va comes in the form of a
singular value decomposition (SVD) -- the columns of U are
orthonormal, as are the rows of Va, and the entries of s are all
nonnegative and nonincreasing. U is m x k, Va is k x n, and
len(s)=k, when A is m x n.

Increasing n_iter or l improves the accuracy of the approximation
U diag(s) Va; the reference PCA_ below describes how the accuracy
depends on n_iter and l. Please note that even n_iter=1 guarantees
superb accuracy, whether or not there is any gap in the singular
values of the matrix A being approximated, at least when measuring
accuracy as the spectral norm || A - U diag(s) Va || of the matrix
A - U diag(s) Va (relative to the spectral norm ||A|| of A, and
accounting for centering when raw is False).

Notes
-----
To obtain repeatable results, reset the seed for the pseudorandom
number generator.

The user may ascertain the accuracy of the approximation
U diag(s) Va to A by invoking diffsnorm(A, U, s, Va), when raw is
True. The user may ascertain the accuracy of the approximation
U diag(s) Va to C(A), where C(A) refers to A after centering its
columns, by invoking diffsnormc(A, U, s, Va), when raw is False.

Parameters
----------
A : array_like, shape (m, n)
    matrix being approximated
k : int, optional
    rank of the approximation being constructed;
    k must be a positive integer <= the smaller dimension of A,
    and defaults to 6
raw : bool, optional
    centers A when raw is False but does not when raw is True;
    raw must be a Boolean and defaults to False
n_iter : int, optional
    number of normalized power iterations to conduct;
    n_iter must be a nonnegative integer, and defaults to 2
l : int, optional
    block size of the normalized power iterations;
    l must be a positive integer >= k, and defaults to k+2

Returns
-------
U : ndarray, shape (m, k)
    m x k matrix in the rank-k approximation U diag(s) Va to A or
    C(A), where A is m x n, and C(A) refers to A after centering
    its columns; the columns of U are orthonormal
s : ndarray, shape (k,)
    vector of length k in the rank-k approximation U diag(s) Va to
    A or C(A), where A is m x n, and C(A) refers to A after
    centering its columns; the entries of s are all nonnegative and
    nonincreasing
Va : ndarray, shape (k, n)
    k x n matrix in the rank-k approximation U diag(s) Va to A or
    C(A), where A is m x n, and C(A) refers to A after centering
    its columns; the rows of Va are orthonormal

Examples
--------
>>> from numpy.random import uniform
>>> A = uniform(low=-1.0, high=1.0, size=(100, 2))
>>> A = A.dot(uniform(low=-1.0, high=1.0, size=(2, 100)))

Then we can either do:
>>> from scipy.linalg import svd
>>> (U, s, Va) = svd(A, full_matrices=False)
>>> A = A / s[0]

Or, as provided by this file:
>>> from randomised_pca pca
>>> (U, s, Va) = pca(A, 2, True)

This example produces a rank-2 approximation U diag(s) Va to A such
that the columns of U are orthonormal, as are the rows of Va, and
the entries of s are all nonnegative and are nonincreasing.

References
----------
.. [PCA] Nathan Halko, Per-Gunnar Martinsson, and Joel Tropp,
         Finding structure with randomness: probabilistic
         algorithms for constructing approximate matrix
         decompositions, arXiv:0909.4061 [math.NA; math.PR], 2009
         (available at `arXiv <http://arxiv.org/abs/0909.4061>`_).

About

Fast Randomized PCA/SVD

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%