5 - NIPALS

3 minute read

Last time, we talked about Principal Component Analysis. PCA requires evaluating the covariance matrix of the data, and the problem is that when the data gets big or has high dimension, it is difficult to estimate the covariance matrix. An alternative way of getting principal components is Nonlinear Iterative Partial Least Squares (NIPALS). In this post, we will talk about NIPALS.

For simplicity, we assume the data \(X\) has zero mean.

\[X = \left(\begin{array}{c} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{array}\right)\]

where each row vector represents one data vector. Now, suppose we want to find the first principal component, \(u\). That is, we approximate \(x_i\) by \(w_i u\). We can stack the coefficients into a vector \(w = [w_1, w_2, ..., w_N]^T\), and the matrix \(wu^T\) is then a good approximation of the data \(X\).

We use the Frobenius norm, a matrix norm obtained by summing the squared entries of the matrix. Using the Frobenius norm, we can define the cost

\[C(w, u ) = \| X - wu^T \|_F^2\] \[= \sum_{ij} (x_{ij} - w_i u_j)^2\]

We need to find a proper pair of \(w\) and \(u\). However, there is no unique choice, since the pair \((sw, \frac{1}{s}u)\) and the pair \((w, u)\) give the same cost. So, we will set \(u\) to be a unit vector, with \(\|u\| = 1\).
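As a quick sanity check, here is a minimal sketch (with a made-up toy matrix and variable names of my choosing) showing that the Frobenius cost equals the sum of squared entries of the residual, and that the rescaled pair \((sw, \frac{1}{s}u)\) gives exactly the same cost:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))          # toy data matrix: 6 samples, 4 features
w = rng.normal(size=6)               # one coefficient per sample
u = rng.normal(size=4)               # loading vector

# Frobenius cost of the rank-1 approximation w u^T
residual = X - np.outer(w, u)
cost_fro = np.linalg.norm(residual, 'fro') ** 2
cost_sum = np.sum(residual ** 2)     # same value, summed entry by entry
print(np.isclose(cost_fro, cost_sum))        # True

# scale ambiguity: (s w, u / s) gives the same cost
s = 3.7
cost_scaled = np.sum((X - np.outer(s * w, u / s)) ** 2)
print(np.isclose(cost_fro, cost_scaled))     # True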

Taking partial derivatives of the cost function gives us the solutions for the parameters. For \(w\),

\[\frac{\partial C} {\partial w_k} = -2\sum_j (x_{kj} - w_k u_j) u_j\] \[\nabla_w C = -2(X - wu^T)u\]

For \(u\),

\[\frac{\partial C} { \partial u_l} = -2\sum_i (x_{il} - w_i u_l) w_i\] \[\nabla_u C = -2(X^T - uw^T)w\]

Setting these partial derivatives to zero (for example, \(\nabla_w C = 0\) gives \(Xu = w\,u^Tu\), i.e. \(w = \frac{Xu}{u^Tu}\)), we can update \(w\) and \(u\) iteratively:

\[\hat w = \frac{Xu^{(n)}} {(u^{(n)})^T u^{(n)} }\] \[\hat u = \frac {X^T \hat w}{\hat w^T \hat w}\]

By rescaling, we can ensure \(u\) is a unit vector. Let \(s = \sqrt{(\hat u)^T \hat u}\), then

\[u^{(n+1)} = \frac {\hat u}{s}\] \[w^{(n+1)} = s \hat w\]
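The update and rescaling above translate almost line for line into NumPy. Below is a minimal sketch of one iteration, assuming `x` is a mean-centered data array and `u` is the current unit-norm estimate (the function name and variables here are illustrative, not taken from the code later in this post).

import numpy as np

def nipals_step(x, u):
    # \hat w = X u / (u^T u)
    w_hat = x @ u / (u @ u)
    # \hat u = X^T \hat w / (\hat w^T \hat w)
    u_hat = x.T @ w_hat / (w_hat @ w_hat)
    # rescale so the new u has unit length
    s = np.sqrt(u_hat @ u_hat)
    return s * w_hat, u_hat / s

Iterating nipals_step until \(u\) stops changing gives the first component; the full implementation later in the post also deflates \(X\) to extract further components.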

We can check for convergence by checking whether the value below is small:

\[\| u^{(n+1)} - u^{(n)} \|\]

Note that NIPALS is helpful for finding the first few principal components. However, if many components are computed, numerical errors accumulate and eventually overwhelm the estimates, giving incorrect results.

Code

Now let's use NIPALS to get the principal components of the iris data. Recall the steps:

  • Process the data to have zero mean
  • Randomly initialize \(w\)
  • For i = 1 to the number of principal components
    • Repeat until \(\big| \|w^{(n)}\| - \|w^{(n-1)}\| \big| \leq \text{threshold}\)
      • Set \(u = \frac{X^T w^{(n-1)}}{(w^{(n-1)})^T w^{(n-1)}}\)
      • Normalize \(u^* = \frac{u}{\|u\|}\)
      • Set \(w^{(n)} = X u^*\)
    • Deflate: set \(X \leftarrow X - w^{(n)} (u^*)^T\)

Here the threshold is usually 0.00001.

import numpy as np

def NIPALS(data, n_comp=2):

    # mean center each feature (column)
    mean = np.mean(data, axis=0)
    x = data - mean

    # return values
    u_list = []
    w_list = []
    # iterate over the number of principal components
    for _ in range(n_comp):
        steps = 1000
        threshold = 1e-6
        # random initial scores, one entry per data point
        w = np.random.rand(x.shape[0])
        prev_norm = np.linalg.norm(w)

        # repeat until convergence (or the step limit)
        for _ in range(steps):
            # set u = X^T w / (w^T w)
            u = np.dot(x.T, w) / np.dot(w, w)
            # normalize u to unit length
            u = u / np.sqrt(np.dot(u, u))
            # set w = X u (u already has unit length)
            w = np.dot(x, u)

            # convergence check: stop once ||w|| stops changing
            norm = np.linalg.norm(w)
            if abs(norm - prev_norm) <= threshold:
                break
            prev_norm = norm

        u_list.append(u)
        w_list.append(w)
        # deflate: remove the part of x explained by this component
        wu = np.outer(w, u)
        x = x - wu

    return w_list, u_list
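
The calls below assume `data` already holds the iris measurements as a NumPy array. The post does not show how the data was loaded; one way to set it up (assuming scikit-learn, which the comparison below uses anyway) is:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# load_iris is an assumed data source: 150 samples x 4 features
data = load_iris().data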

w, u = NIPALS(data)
u

[array([ 0.36138659, -0.08452251,  0.85667061,  0.3582892 ]),
 array([ 0.65659012,  0.73016005, -0.17337284, -0.07548229])]

Compare this result with sklearn's PCA.

pca = PCA(n_components=2)
components = pca.fit_transform(data)
pca.components_

array([[ 0.36138659, -0.08452251,  0.85667061,  0.3582892 ],
       [ 0.65658877,  0.73016143, -0.17337266, -0.07548102]])

We can see the first component from NIPALS and that from PCA are the same, while the second components differ slightly. That is because NIPALS only approximates the components: the more components it estimates, the more the errors accumulate.
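Since the sign of a principal component is arbitrary, the two methods could also return components pointing in opposite directions. A small sketch of a comparison, assuming `u` and `pca` from the snippets above, that flips signs before measuring the difference:

import numpy as np

for k, u_k in enumerate(u):
    ref = pca.components_[k]
    # flip the NIPALS component if it points the opposite way from sklearn's
    if np.dot(u_k, ref) < 0:
        u_k = -u_k
    # largest element-wise difference between the two estimates
    print(k, np.max(np.abs(u_k - ref)))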

