linear algebra - How to approximate a solution to a matrix equation?

Monday, 1 April 2013

linear algebra - How to approximate a solution to a matrix equation?

I want to correct something in both Anton's ans las3rjock's answers. I hope that I am merely correcting bad notation.

The way to find x' (using the notation in the question) is, as correctly stated in each of the answers above, to derive the associated problem A^T Ax' = A^T b and solve that via Gaussian Elimination (also called Gauss-Jordan Elimination, the difference is technical and not important here). This is guaranteed to have a solution.

The fact that this is the correct solution to the problem relies on the properties of the "closest point" of a point to a subspace. We want the closest point to b on the subspace Im A. Call this b'. The properties of the "closest point" imply that the difference, b - b', is orthogonal to everything in Im A. It's simple to draw a picture to convince yourself that if c is in Im A and b - c is not orthogonal to everything in Im A then it is possible to "nudge" c a little, either away or towards the origin, to c' so that b - c' is shorter than b - c.

So b - b' is orthogonal to everything in Im A. Since being orthogonal to something is a linear condition, it is sufficient to check that b - b' is orthogonal to a spanning set for Im A, for which we can take the columns of A. As we are using the standard inner product, this means that for each column, say a of A, a^T(b - b') = 0. Putting these together, we obtain the relation A^T(b - b') = 0. As b' is in the image of A, there is some x' such that b' = A x', whence we see that x' satisfies A^Tb - A^TA x' = 0, and get the desired formula on rearranging. This also guarantees the existence of a solution to this equation.

Using the fact that the closest point is the unique point b' in Im A such that b - b' is orthogonal to everything in Im A, we can run this argument backwards to see that if x' is a solution of A^T A x' = A^T b then Ax' is the closest point to b in Im A.

Where the above answers go wrong is to then talk about the matrix (A^T A)^-1 A^T. The problem with this is that A^TA may not be invertible (take A to be the 2 by 3 zero matrix). There is a matrix which when A^TA is invertible is (A^TA)^-1A^T and this is called the pseudo-inverse. Essentially, A^T misses the kernel of A^TA meaning that the composition is always well-defined but it might not be decomposable as the notation suggests.

This notation may be standard, of that I don't know, but if it is then it is bad notation because it suggests a property that may not hold. At the least, it should always carry a rider to make clear that the notation is merely suggestive and not to be taken literally.

Answer Desk

Monday, 1 April 2013

linear algebra - How to approximate a solution to a matrix equation?

No comments:

Post a Comment