Use PCA to create eigenfaces.
Principal Component Analysis (PCA) is a statistical method that transforms an n-dimensional data set (via multiplication by an orthogonal matrix) into its "closest" k-dimensional approximation, in the least-squares sense.
Within the context of image classification (here, specifically face recognition), the data is a set of images. The features (i.e., the dimensions) of this dataset are the faces themselves, each represented by a vector of pixel intensities; stacking these p vectors as columns yields a feature matrix X.
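As a minimal sketch of this setup, the following builds a feature matrix X from a small batch of synthetic stand-in "images" (random 8x8 arrays, used here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: p = 10 grayscale "face" images of 8x8 pixels.
images = rng.random((10, 8, 8))

# Flatten each image into a vector of pixel intensities, then stack the
# p face vectors as the columns of the feature matrix X.
X = np.stack([img.ravel() for img in images], axis=1)
print(X.shape)  # (64, 10): 64 pixel features per face, 10 face vectors
```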
Then, given a sufficiently large set of face vectors f_1, f_2, ..., f_p, any face can be approximately reconstructed as a linear combination of these f_i's. For example, a new face g might be approximated as g ≈ a_1 f_1 + a_2 f_2 + ... + a_p f_p for some scalar coefficients a_i.
Of course, the more face vectors used, the better the approximation.
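This approximation step can be sketched with a least-squares solve, again using random arrays as hypothetical stand-ins for real face data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((64, 10))       # columns are the face vectors f_1, ..., f_10
new_face = rng.random(64)      # a face we want to approximate

# Find coefficients a minimizing ||X a - new_face||_2; the approximation
# is then the linear combination a_1 f_1 + ... + a_10 f_10.
a, *_ = np.linalg.lstsq(X, new_face, rcond=None)
approx = X @ a
rel_err = np.linalg.norm(new_face - approx) / np.linalg.norm(new_face)
```

With more face vectors (a larger p), the column space of X grows and the relative error of this reconstruction can only shrink.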
Now, PCA can be applied to the feature matrix X, transforming each face vector into an eigenface vector. There are still p of these vectors, each eigenface vector still contains pixel intensity values, and a facial image can still be reconstructed as a linear combination of the eigenfaces. However, an eigenface does not correspond to a specific face from the original data set. Instead, it is an amalgamation of all the faces, now representing the presence and prevalence of a given image feature.
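One common way to compute the eigenfaces is via the singular value decomposition of the centered feature matrix; the following is a sketch of that approach on synthetic data (the left singular vectors play the role of the eigenfaces):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((64, 10))       # feature matrix: 10 face vectors, 64 pixels each

# Center the data, then take the SVD. The columns of U are the eigenfaces:
# orthonormal directions in pixel space, each an amalgamation of all faces.
mean_face = X.mean(axis=1, keepdims=True)
U, S, Vt = np.linalg.svd(X - mean_face, full_matrices=False)
eigenfaces = U                 # shape (64, 10): still p vectors of pixel values
```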
The advantage of using eigenface vectors (rather than the original face vectors) to approximate facial images is that the set of eigenface vectors is orthogonal, and hence linearly independent. This means that if we drop an eigenface vector from the set, we lose only the information contained in that single vector (this is not the case with the original face vectors, where dropping a single vector reduces the information available for many features). Thus, we can easily reduce the dimensionality of the set of faces by retaining only the k most important eigenfaces, transforming a data set of dimension p into a set of dimension k.
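The truncation step can be sketched as follows, continuing with synthetic stand-in data: keep only the k leading eigenfaces, express each face by its k coefficients, and reconstruct approximately in pixel space.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((64, 10))            # 10 face vectors, 64 pixels each
mean_face = X.mean(axis=1, keepdims=True)
Xc = X - mean_face
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 4
U_k = U[:, :k]                      # keep only the k most important eigenfaces
codes = U_k.T @ Xc                  # each face is now a k-dimensional vector
recon = mean_face + U_k @ codes     # approximate reconstruction in pixel space
```

Each face is now described by k numbers instead of p, and because the dropped eigenfaces carry the least variance, the reconstruction loses as little information as any k-dimensional projection can.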