y = b_0 + b_1 x_1 + ... + b_{p-1} x_{p-1} + error
In matrix form this is

y = Xb + error

Letting X^T represent the transpose of the matrix X, the normal equations are formed in this way:
X^T y = X^T X b

Notice that there are now exactly p linear equations with p unknowns. If the matrix X is of full rank (rank p, assuming p < n), then X^T X will be invertible and the solution to the normal equations is
(X^T X)^{-1} X^T y = b

where b is the estimate of the parameters that minimizes the residual sum of squares. (A residual is the difference between the actual value of y and the value of y predicted by the model.)
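As a concrete illustration, here is a minimal sketch in Python with NumPy (the language choice and the small made-up data set are assumptions, not part of the original notes) that applies the formula literally:

# Sketch: applying b = (X^T X)^{-1} X^T y literally to made-up data.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # single predictor
y  = np.array([2.1, 3.9, 6.2, 8.1, 9.8])      # response

# Design matrix with a leading column of ones for the intercept b_0.
X = np.column_stack([np.ones_like(x1), x1])

b = np.linalg.inv(X.T @ X) @ X.T @ y          # follows the formula literally
print(b)                                      # estimated b_0 and b_1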
On the surface, it appears that this requires the explicit inversion of a matrix, which takes substantial computation. A better algorithm for regression is found by using the QR decomposition.
Beginning with the normal equations, we can see how the QR decomposition simplifies them:
X^T X b = X^T y
(QR)^T (QR) b = (QR)^T y
R^T (Q^T Q) R b = R^T Q^T y
R^T R b = R^T Q^T y
(R^T)^{-1} R^T R b = (R^T)^{-1} R^T Q^T y
R b = Q^T y

(The third line simplifies to the fourth because the columns of Q are orthonormal, so Q^T Q = I; the last two lines multiply both sides by (R^T)^{-1}, which exists when X has full rank.)

If we let z = Q^T y,
R b = z

This is simply an upper triangular system of equations which may be quickly solved by back substitution.
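A minimal sketch of this step, again in Python with NumPy and an assumed small data set, uses a library QR routine for the decomposition (a hand-written decomposition is given further below) and a short back substitution helper for the triangular solve:

# Sketch: regression via QR followed by back substitution on R b = z.
import numpy as np

def back_substitution(R, z):
    # Solve the upper triangular system R b = z from the bottom row up.
    p = R.shape[0]
    b = np.zeros(p)
    for j in range(p - 1, -1, -1):
        b[j] = (z[j] - R[j, j+1:] @ b[j+1:]) / R[j, j]
    return b

X = np.column_stack([np.ones(5), np.arange(1.0, 6.0)])   # made-up design matrix
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])                  # made-up response

Q, R = np.linalg.qr(X)          # reduced QR: Q is n x p, R is p x p
z = Q.T @ y                     # z = Q^T y
b = back_substitution(R, z)     # solve R b = z
print(b)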
This approach will be efficient if the QR decomposition itself is fast. The algorithm below creates the matrix Q by overwriting X and builds a new matrix R.
for j = 1 to p {
    define r[j,j] = sqrt( sum_{i=1}^n x[i,j]^2 )   # r[j,j] is the norm of the jth column of X
    for i = 1 to n {
        x[i,j] = x[i,j] / r[j,j]                   # normalize the jth column
    }
    for k = j+1 to p {
        r[j,k] = sum_{i=1}^n x[i,j] x[i,k]         # inner product of column j with column k
        for i = 1 to n {
            x[i,k] = x[i,k] - x[i,j] r[j,k]        # remove the column j component from column k
        }
    }
}
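A fairly direct transcription of this pseudocode into Python with NumPy (an assumed choice of language; the name gram_schmidt_qr and the test matrix are illustrative only) could look like this:

import numpy as np

def gram_schmidt_qr(X):
    # Follow the pseudocode above: overwrite a copy of X with Q and build R.
    X = np.array(X, dtype=float)                  # work on a copy
    n, p = X.shape
    R = np.zeros((p, p))
    for j in range(p):
        R[j, j] = np.sqrt(np.sum(X[:, j] ** 2))   # norm of the jth column
        X[:, j] = X[:, j] / R[j, j]               # normalize column j
        for k in range(j + 1, p):
            R[j, k] = np.sum(X[:, j] * X[:, k])   # inner product of columns j and k
            X[:, k] = X[:, k] - X[:, j] * R[j, k] # remove the column j component
    return X, R                                   # X now holds Q

# Quick check on a small made-up matrix: Q R should reproduce the original.
A = np.column_stack([np.ones(5), np.arange(1.0, 6.0)])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A))   # True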
Bret Larget, larget@mathcs.duq.edu