assumptions of PCA
this makes sense only when:
- the data is linear
- large variations have important structure (i.e. not noise)

motivation
Consider a dataset \left\{x^{(i)}\right\}_{i=1}^{n} such that x^{(i)} \in \mathbb{R}^{d}, where d < n. Our goal: we want to automatically identify correlations between the axes of the data. This boils down to two goals:
- automatically detect and remove redundancies
- find features that explain the variation

preprocessing before PCA
normalize each coordinate to zero mean and unit variance: x_{j}^{(i)} = \frac{x_{j}^{(i)} - \mu_{j}}{\sigma_{j}}

derivation
Find a hyperplane u such that if the data is projected onto u, the variance of the projected data is maximized. Namely, for a unit vector u representing the hyperplane, we want to find the u that maximizes:

\begin{align} \frac{1}{n} \sum_{i=1}^{n} \left(x^{(i)T} u\right)^{2} &= \frac{1}{n} \sum_{i=1}^{n} u^{T} x^{(i)} x^{(i)T} u \\ &= u^{T} \left(\frac{1}{n} \sum_{i=1}^{n} x^{(i)} x^{(i)T}\right) u \end{align}

Notice the maximizing choice of u should be the principal eigenvector of \frac{1}{n} \sum_{i=1}^{n} x^{(i)} x^{(i)T}, i.e. the eigenvector with the largest eigenvalue (an eigenvector is a vector the matrix does not rotate, only stretches). To show this for yourself, use the method of Lagrange multipliers; a sketch of the argument appears after the projection equation below. Once you do this, you can then project:

\begin{equation} x \approx \left(x^{T}u \right)u \end{equation}
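As a sketch of the Lagrange-multiplier argument referenced above: write \Sigma = \frac{1}{n} \sum_{i=1}^{n} x^{(i)} x^{(i)T} and maximize u^{T} \Sigma u subject to the unit-norm constraint u^{T} u = 1:

\begin{align} \mathcal{L}\left(u, \lambda\right) &= u^{T} \Sigma u - \lambda \left(u^{T} u - 1\right) \\ \nabla_{u} \mathcal{L} = 2 \Sigma u - 2 \lambda u = 0 &\implies \Sigma u = \lambda u \end{align}

So any critical point u is an eigenvector of \Sigma, and at such a point the objective is u^{T} \Sigma u = \lambda u^{T} u = \lambda, which is largest for the principal eigenvector.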

you can keep more than one dimension by choosing the top k eigenvectors u_{1}, \dots, u_{k}, ordered by decreasing eigenvalue:

\begin{equation} x \approx \sum_{i=1}^{k} \left(x^{T}u_{i}\right) u_{i} \end{equation}
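A minimal numerical sketch of the whole pipeline so far (normalization, the scatter matrix, the top-k eigenvectors, and the projection); this assumes numpy, and the function and variable names are illustrative:

```python
import numpy as np

def pca_top_k(X, k):
    """X: (n, d) array with one sample x^{(i)} per row.
    Returns the top-k principal directions, the projected coordinates,
    and the rank-k reconstruction of each (normalized) sample."""
    # preprocessing: normalize each coordinate to zero mean, unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # the matrix (1/n) sum_i x^{(i)} x^{(i)T}
    Sigma = Z.T @ Z / Z.shape[0]
    # eigh handles symmetric matrices; eigenvalues come back in ascending order
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    U = eigvecs[:, ::-1][:, :k]   # u_1, ..., u_k by decreasing eigenvalue
    coords = Z @ U                # (x^T u_i) for each sample and each i
    Z_approx = coords @ U.T       # sum_i (x^T u_i) u_i
    return U, coords, Z_approx

rng = np.random.default_rng(0)
U, coords, Z_approx = pca_top_k(rng.normal(size=(200, 5)), k=2)
```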

connecting SVD and PCA
Recall that in PCA we obtain the matrix:

\begin{equation} \left(\frac{1}{n} \sum_{i=1}^{n} x^{(i)} x^{(i)T}\right) \end{equation}

for which we need the eigenvectors. Let X be the d \times n matrix whose columns are the x^{(i)}, so that \sum_{i=1}^{n} x^{(i)} x^{(i)T} = X X^{T}. Notice that via the SVD we can write:

\begin{equation} X = U S V^{T} \end{equation}

Now, since S is diagonal (so S^{T} = S), the transpose is:

\begin{equation} X^{T} = V S U^{T} \end{equation}

and since V has orthonormal columns (V^{T} V = I), this gives:

\begin{align} X X^{T} &=U S V^{T} V S U^{T} \\ &= U S S U^{T} \\ &= \left(US\right) \left(US\right)^{T} \end{align}

This shows that the columns of U are the eigenvectors of X X^{T} (with eigenvalues equal to the squared singular values), which is nice because we can then just use the SVD to give us the PCA vectors.

error
We can consider the error of a rank-k PCA approximation as the residual:

\begin{equation} x^{(i)} - \sum_{j=1}^{k} \left(x^{(i)T}u_{j}\right) u_{j} \end{equation}
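A quick numerical check of both of these claims (a sketch assuming numpy, continuing the convention that the columns of X are the samples x^{(i)}):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))      # d = 5 axes, n = 200 samples as columns

# compare the eigendecomposition of X X^T with the left singular vectors of X
eigvals, eigvecs = np.linalg.eigh(X @ X.T)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# the eigenvalues of X X^T are the squared singular values of X
assert np.allclose(np.sort(s**2), np.sort(eigvals))
# the columns of U match the eigenvectors, up to sign
# (assuming distinct eigenvalues, which holds for generic data)
assert np.allclose(np.abs(U), np.abs(eigvecs[:, ::-1]))

# rank-k PCA error: x^(i) - sum_j (x^(i)T u_j) u_j for every column of X
k = 2
Uk = U[:, :k]
residual = X - Uk @ (Uk.T @ X)
print(np.linalg.norm(residual))
```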

applications
- visualization
- preprocessing to reduce dimensions
- reducing noise in the data (which amounts to getting rid of extra axes)

some recent work
Candes, Li, Ma, Wright, "Robust PCA": classical PCA minimizes \left\| X - L\right\| such that \text{rank}\left(L\right) \leq k, which is brittle to gross corruption. Robust PCA instead splits X = L + S into a low-rank part L and a sparse part S; we find that the right objective to minimize is \left\| L\right\|_{*} + \lambda \left\| S\right\|_{1}, where \left\| L\right\|_{*} is the nuclear norm (the sum of the singular values of L) and \left\| S\right\|_{1} is the entrywise \ell_{1} norm of S.
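A minimal sketch of this decomposition via alternating shrinkage steps (an augmented-Lagrangian scheme in the spirit of the paper; assumes numpy, the function name is illustrative, the \lambda default follows the paper's recommendation, and the \mu default is a common untuned heuristic):

```python
import numpy as np

def robust_pca(X, lam=None, mu=None, n_iters=500, tol=1e-7):
    """Split X into a low-rank L and a sparse S by minimizing
    ||L||_* + lam * ||S||_1 subject to L + S = X."""
    m, n = X.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))       # lambda suggested in the paper
    if mu is None:
        mu = 0.25 * m * n / np.abs(X).sum()  # common step-size heuristic
    S = np.zeros_like(X)
    Y = np.zeros_like(X)                     # dual variable for L + S = X
    for _ in range(n_iters):
        # L-update: singular value thresholding (shrinks the nuclear norm)
        U, sig, Vt = np.linalg.svd(X - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-update: entrywise soft thresholding (shrinks the l1 norm)
        T = X - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # dual ascent on the constraint residual
        R = X - L - S
        Y += mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(X):
            break
    return L, S
```

Here L plays the role of the low-rank structure that classical PCA would recover, while S absorbs gross, sparse corruptions.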
