Whitening

Data whitening is the process of converting a random vector \(X\) whose components are linearly (first-order) correlated into a new random vector \(Z\) whose covariance matrix is the identity matrix. Data whitening usually has two steps: a decorrelation step and a standardization step.

To do this, we first apply an orthogonal diagonalization to \(X\)'s covariance matrix \(\Sigma_X\): \[ \Sigma_X \Phi = \Phi \Lambda \] where \(\Phi\) contains the normalized eigenvectors as columns (so \(\Phi^{-1} = \Phi^T\)) and \(\Lambda\) is diagonal, containing the eigenvalues. Now let \(Y = \Phi^T X\). We can verify that \[ \begin{aligned} \Sigma_Y &= \mathbb{E} \{ \Phi^T (X - \mu_X) [\Phi^T (X - \mu_X)]^T \} \\ &= \mathbb{E} [\Phi^T (X - \mu_X) (X - \mu_X)^T \Phi] \\ &= \Phi^T \mathbb{E} [(X - \mu_X) (X - \mu_X)^T] \Phi \\ &= \Phi^T \Sigma_X \Phi = \Phi^T \Phi \Lambda = \Lambda \end{aligned} \] \(\Sigma_Y\) is diagonal, which completes the decorrelation step. To further turn it into an identity matrix (the standardization step), we apply \(Z = \Lambda^{-1/2} Y = \Lambda^{-1/2} \Phi^T X\), which gives \[ \begin{aligned} \Sigma_Z &= \mathbb{E} \{\Lambda^{-1/2} \Phi^T (X - \mu_X) [\Lambda^{-1/2} \Phi^T (X - \mu_X)]^T \} \\ &= \mathbb{E} [\Lambda^{-1/2} \Phi^T (X - \mu_X) (X - \mu_X)^T \Phi \Lambda^{-1/2}] \\ &= \Lambda^{-1/2} \Phi^T \Sigma_X \Phi \Lambda^{-1/2} = \Lambda^{-1/2} \Lambda \Lambda^{-1/2} = I \end{aligned} \]

The inverse of data whitening can be used to derive the density function of linearly correlated random variables, e.g. in the Gaussian case. Data whitening looks a lot like PCA: both compute the eigenpairs of the covariance matrix; both project the original data onto the basis formed by the eigenvectors; both can be solved with SVD. But unlike PCA, data whitening uses all the eigenvectors as the basis instead of the \(K\) most prominent ones. Therefore, data whitening does not reduce the data's dimensionality as PCA does.
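The two steps above can be sketched with NumPy; this is a minimal illustration on synthetic data (the sample sizes and mixing matrix are arbitrary choices for the demo), estimating \(\Sigma_X\) from samples rather than assuming it is known:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated data: 1000 samples of a 3-dimensional random
# vector, one sample per row, made correlated by a random mixing matrix A.
A = rng.normal(size=(3, 3))
X = rng.normal(size=(1000, 3)) @ A.T

# Sample covariance of X (features in columns).
Sigma_X = np.cov(X, rowvar=False)

# Orthogonal diagonalization: Sigma_X Phi = Phi Lambda.
# eigh is for symmetric matrices and returns orthonormal eigenvectors.
eigvals, Phi = np.linalg.eigh(Sigma_X)

# Decorrelation step: Y = Phi^T X, applied row-wise to the centered data.
Xc = X - X.mean(axis=0)
Y = Xc @ Phi            # each row is Phi^T (x - mu_X)

# Standardization step: Z = Lambda^{-1/2} Y.
Z = Y / np.sqrt(eigvals)

# The covariance of Y is (approximately) Lambda, and that of Z the identity.
print(np.allclose(np.cov(Y, rowvar=False), np.diag(eigvals)))  # True
print(np.allclose(np.cov(Z, rowvar=False), np.eye(3)))         # True
```

Because the same sample covariance is used for both the eigendecomposition and the checks, \(\Sigma_Z = I\) holds here up to floating-point error, not just in expectation.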
