Publication: IEEE TNNLS
Noise Level Estimation for Model Selection in Kernel PCA Denoising
Abstract
One of the main challenges in unsupervised learning is to find suitable values for the model parameters. In kernel principal component analysis (kPCA), for example, these are the number of components, the kernel, and its parameters. This paper presents a model selection criterion based on distance distributions (MDD). The criterion can be used to find the number of components and the σ² parameter of radial basis function kernels by means of a spectral comparison between information and noise. The noise content is estimated from the statistical moments of the distribution of distances in the original dataset. This allows for a type of randomization of the dataset without actually having to permute the data points or generate artificial datasets. After comparing the eigenvalues computed from the estimated noise with those from the input dataset, the information retained by a given set of model parameters is maximized. In addition to the model selection criterion, this paper proposes a modification to the fixed-size method and uses the incomplete Cholesky factorization, both of which are used to solve kPCA in large-scale applications. These two approaches, together with the MDD criterion, were tested on toy examples and real-life applications, and are shown to outperform other known algorithms.
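To make the spectral comparison described in the abstract concrete, below is a minimal NumPy/SciPy sketch. It builds an RBF kernel matrix from the data's pairwise squared distances, estimates a "noise" spectrum from distances drawn from a moment-matched surrogate distribution, and keeps the (σ², number of components) pair that retains the most information above the noise level. The gamma moment matching, the function names (surrogate_sqdist, select_model), and the information score are illustrative assumptions for this sketch, not the paper's exact MDD construction.

    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    def rbf_kernel_from_sqdist(D2, sigma2):
        """RBF kernel K = exp(-D2 / (2*sigma2)) from squared distances."""
        return np.exp(-D2 / (2.0 * sigma2))

    def centered_spectrum(K):
        """Descending eigenvalues of the double-centered kernel matrix."""
        n = K.shape[0]
        H = np.eye(n) - 1.0 / n                 # centering matrix I - (1/n) 11^T
        return np.sort(np.linalg.eigvalsh(H @ K @ H))[::-1]

    def surrogate_sqdist(D2, rng):
        """Moment-matched 'noise' squared distances.

        A gamma fit to the observed distance distribution is an assumption
        made for this sketch, not the paper's exact noise model.
        """
        d = squareform(D2, checks=False)        # condensed upper triangle
        mean, var = d.mean(), d.var()
        shape, scale = mean**2 / var, var / mean  # gamma moment matching
        d_noise = rng.gamma(shape, scale, size=d.shape)
        return squareform(d_noise)              # symmetric, zero diagonal

    def select_model(X, sigma2_grid, seed=0):
        """Pick (sigma2, n_components) by comparing the data spectrum
        against the spectrum of the moment-matched noise surrogate."""
        rng = np.random.default_rng(seed)
        D2 = squareform(pdist(X, "sqeuclidean"))
        best = None
        for s2 in sigma2_grid:
            lam_data = centered_spectrum(rbf_kernel_from_sqdist(D2, s2))
            lam_noise = centered_spectrum(
                rbf_kernel_from_sqdist(surrogate_sqdist(D2, rng), s2))
            # eigenvalues above the noise level are taken to carry information
            k = int(np.sum(lam_data > lam_noise))
            info = lam_data[:k].sum() - lam_noise[:k].sum()
            if best is None or info > best[0]:
                best = (info, s2, k)
        return best[1], best[2]                 # chosen sigma2, n_components

Note how sampling surrogate distances from the fitted distribution plays the role the abstract assigns to "randomization without permuting the data points": no artificial dataset is ever materialized, only its distance statistics.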