Selecting the number of components in PCA using cross-validation approximations

Abstract: Cross-validation is a tried and tested approach to selecting the number of components in principal component analysis (PCA); however, its main drawback is its computational cost. In a regression (or nonparametric regression) setting, criteria such as generalized cross-validation (GCV) provide convenient approximations to leave-one-out cross-validation. They are based on the relation between the prediction error and the residual sum of squares weighted by elements of a projection matrix (or a smoothing matrix). Such a relation is then established in PCA using an original presentation of PCA with a unique projection matrix. This enables the definition of two cross-validation approximation criteria: the smoothing approximation of the cross-validation criterion (SACV) and the GCV criterion. The method is assessed with simulations and gives promising results.
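
The regression-setting relation mentioned in the abstract can be illustrated with ordinary least squares. This is a minimal sketch, not the paper's PCA criteria. For a linear fit with hat (projection) matrix H, the leave-one-out residual equals the ordinary residual divided by 1 - H_ii. GCV then replaces each H_ii by the average tr(H)/n. The paper builds an analogous projection matrix for PCA, which is not reproduced here. The function names and the simulated data below are illustrative assumptions.

```python
import numpy as np

def hat_matrix(X):
    """Projection (hat) matrix H = X (X^T X)^{-1} X^T of ordinary least squares."""
    return X @ np.linalg.solve(X.T @ X, X.T)

def loo_cv_brute_force(X, y):
    """Mean squared leave-one-out prediction error, refitting the model n times."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i                      # drop observation i
        beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        errors[i] = y[i] - X[i] @ beta                # predict the held-out point
    return np.mean(errors ** 2)

def loo_cv_shortcut(X, y):
    """Same quantity from a single fit: residuals reweighted by 1 - H_ii."""
    H = hat_matrix(X)
    residuals = y - H @ y
    return np.mean((residuals / (1.0 - np.diag(H))) ** 2)

def gcv(X, y):
    """GCV: each H_ii is replaced by the average leverage tr(H)/n."""
    n = len(y)
    H = hat_matrix(X)
    residuals = y - H @ y
    return np.mean(residuals ** 2) / (1.0 - np.trace(H) / n) ** 2

# Hypothetical simulated data: intercept plus two predictors.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

print(loo_cv_brute_force(X, y), loo_cv_shortcut(X, y), gcv(X, y))
```

The brute-force and shortcut values agree up to floating-point error, but the shortcut needs only one fit instead of n fits. That saving is the motivation the abstract gives for approximating cross-validation in PCA.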

https://hal-agrocampus-ouest.archives-ouvertes.fr/hal-00729614
Contributor: Céline Martel
Submitted on: Friday, September 7, 2012 - 3:47:39 PM
Last modification on: Thursday, November 15, 2018 - 11:56:25 AM

Citation

Julie Josse, François Husson. Selecting the number of components in PCA using cross-validation approximations. Computational Statistics and Data Analysis, 2012, 56 (6), pp.1869-1879. ⟨10.1016/j.csda.2011.11.012⟩. ⟨hal-00729614⟩