5 - NIPALS
Last time, we talked about Principal Component Analysis. PCA requires evaluating the covariance matrix of the data, and when the data is large or high-dimensional, estimating that covariance matrix becomes difficult. An alternative way of computing principal components is Nonlinear Iterative Partial Least Squares (NIPALS). In this post, we will talk about NIPALS.
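For contrast, here is a minimal sketch of covariance-based PCA, the computation NIPALS lets us avoid (the function name pca_via_covariance is my own, not from a library):

import numpy as np

# covariance-based PCA: requires forming the d x d covariance matrix
def pca_via_covariance(X, n_comp=2):
    Xc = X - X.mean(axis=0)                   # mean-center the data
    cov = np.cov(Xc, rowvar=False)            # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # sort eigenvalues descending
    return eigvecs[:, order[:n_comp]].T       # top n_comp components as rows

When the dimension is high, forming and decomposing this covariance matrix is exactly the expensive step that motivates NIPALS.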
For simplicity, we assume the data \(X\) has zero mean.
\[X = \left(\begin{array}{c} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{array}\right)\]where each row vector represents one data vector. Now, suppose we want to find the first principal component, \(u\). That is, we approximate \(x_i\) by \(w_i u\). We can stack the weights into a vector \(w = (w_1, w_2, \dots, w_N)^T\), and then the matrix \(wu^T\) is a good approximation of the data \(X\).
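For instance, a minimal numpy sketch of this rank-1 approximation on hypothetical toy data (numpy imported as np above):

# hypothetical toy data: N = 5 samples, 3 features, mean-centered
X = np.random.randn(5, 3)
X = X - X.mean(axis=0)

# weights w (one per sample) and a direction u (one per feature)
w = np.random.randn(5)
u = np.random.randn(3)

# rank-1 approximation: row i of X is approximated by w_i * u
approx = np.outer(w, u)
print(approx.shape)   # (5, 3), same shape as X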
The Frobenius norm is a matrix norm obtained by summing the squared entries of the matrix. Using the Frobenius norm, we can define the cost
\[C(w, u) = \| X - wu^T \|_F^2\] \[= \sum_{ij} (x_{ij} - w_i u_j)^2\]We need to find a proper pair \(w\) and \(u\). However, there is no unique choice, since for any \(s \neq 0\) the pair \((sw, \frac{1}{s}u)\) gives the same cost as the pair \((w, u)\). So we will require \(u\) to be a unit vector, with \(\|u\| = 1\).
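As a quick check, this cost is easy to evaluate with numpy (continuing the toy X, w, u above):

# Frobenius-norm cost: sum of squared entries of the residual X - w u^T
cost = np.sum((X - np.outer(w, u)) ** 2)
# the same value via numpy's built-in Frobenius norm
cost_fro = np.linalg.norm(X - np.outer(w, u), 'fro') ** 2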
Taking partial derivatives of the cost function gives us the solutions for the parameters.
\[\frac{\partial C} {\partial w_k} = -2 \sum_j (x_{kj} - w_k u_j) u_j\] \[\nabla_w C = -2(X - wu^T)u\]For \(u\),
\[\frac{\partial C} { \partial u_l} = -2 \sum_i (x_{il} - w_i u_l) w_i\] \[\nabla_u C = -2(X^T - uw^T)w\]Setting these partial derivatives to zero, we can update \(w\) and \(u\) iteratively:
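These matrix forms are straightforward to verify with numpy (a sketch reusing the toy X, w, u above, keeping the shared factor of -2 explicit):

residual = X - np.outer(w, u)
grad_w = -2 * residual @ u      # gradient with respect to w, shape (N,)
grad_u = -2 * residual.T @ w    # gradient with respect to u, shape (d,)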
\[\hat w = \frac{X u^{(n)}} {(u^{(n)})^T u^{(n)} }\] \[\hat u = \frac {X^T \hat w}{\hat w^T \hat w}\]By rescaling, we can ensure that \(u\) is a unit vector. Let \(s = \sqrt{\hat u^T \hat u}\); then
\[u^{(n+1)} = \frac {\hat u}{s}\] \[w^{(n+1)} = s \hat w\]We can check for convergence by checking whether the following quantity is small:
\[\| u^{(n+1)} - u^{(n)} \|\]Note that NIPALS is helpful for finding the first few principal components. However, if many components are computed, numerical errors accumulate and the results become unreliable.
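Putting the updates together, here is a minimal sketch of the iteration for the first component, using the \(\|u^{(n+1)} - u^{(n)}\|\) criterion above (the function name first_component is my own; numpy is assumed imported as np):

def first_component(X, max_iter=1000, tol=1e-6):
    # start from a random unit vector u
    u = np.random.rand(X.shape[1])
    u = u / np.linalg.norm(u)
    for _ in range(max_iter):
        # w-step: w_hat = X u / (u^T u)
        w = X @ u / (u @ u)
        # u-step: u_hat = X^T w_hat / (w_hat^T w_hat)
        u_new = X.T @ w / (w @ w)
        # rescale so that u stays a unit vector
        s = np.linalg.norm(u_new)
        u_new = u_new / s
        w = s * w
        # stop once u has stopped changing
        if np.linalg.norm(u_new - u) < tol:
            u = u_new
            break
        u = u_new
    return w, u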
Code
Now let's use NIPALS to get the principal components of the iris data. Recall the steps:
- Preprocess the data to have zero mean
- For each of the desired principal components:
  - Randomly initialize \(w\)
  - Repeat until \(\big| \|w^{(n)}\| - \|w^{(n-1)}\| \big| \leq \text{threshold}\):
    - Set \(u = \frac{X^T w^{(n-1)}}{(w^{(n-1)})^T w^{(n-1)}}\)
    - Normalize \(u^* = \frac{u}{\|u\|}\)
    - Set \(w^{(n)} = \frac{X u}{\|u\|}\)
  - Deflate the data: set \(X^* = X - wu^T\)
Here the threshold is usually 0.00001.
import numpy as np

def NIPALS(data, n_comp=2):
    # mean center each feature (column)
    mean = np.mean(data, axis=0)
    x = data - mean
    # return values
    u_list = []
    w_list = []
    # iterate over number of principal components
    for comp in range(n_comp):
        steps = 1000
        threshold = 1e-6
        # random initialization of w, one weight per data point
        w = np.random.rand(x.shape[0])
        prev_w = 0
        # repeat until convergence
        for step in range(steps):
            # set u
            u = np.dot(w.T, x) / np.dot(w.T, w)
            # normalize u
            u = u / np.sqrt(np.dot(u, u.T))
            # get w
            w = np.dot(x, u) / np.sqrt(np.dot(u, u.T))
            # convergence check every 10 iterations
            if step != 0 and step % 10 == 0:
                if abs(np.linalg.norm(w) - prev_w) > threshold:
                    prev_w = np.linalg.norm(w)
                else:
                    break
        u_list.append(u)
        w_list.append(w)
        # deflate: remove the component just found from the data
        wu = np.dot(w.reshape(-1, 1), u.reshape(1, -1))
        x = x - wu
    return w_list, u_list
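For the call below, data is the iris feature matrix. One way to load it is via scikit-learn (an assumption on my part; any 150 x 4 iris array works):

from sklearn.datasets import load_iris
data = load_iris().data   # shape (150, 4): 150 samples, 4 features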
w, u = NIPALS(data)
u
[array([ 0.36138659, -0.08452251, 0.85667061, 0.3582892 ]),
array([ 0.65659012, 0.73016005, -0.17337284, -0.07548229])]
Let's compare this result with the components found by sklearn's PCA.
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
components = pca.fit_transform(data)
pca.components_
array([[ 0.36138659, -0.08452251, 0.85667061, 0.3582892 ],
[ 0.65658877, 0.73016143, -0.17337266, -0.07548102]])
We can see that the first component from NIPALS and the first component from PCA are the same, while the second components differ slightly. This is because NIPALS only approximates the components: the more components it estimates, the more numerical error accumulates.
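Note also that principal components are only determined up to sign, so a sign-insensitive comparison is safer (a small sketch using the variables above):

# compare NIPALS components with sklearn's, ignoring the arbitrary sign
for ours, theirs in zip(u, pca.components_):
    print(np.allclose(np.abs(ours), np.abs(theirs), atol=1e-3))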