I want to correct something in both Anton's ans las3rjock's answers. I hope that I am merely correcting bad notation.
The way to find x' (using the notation in the question) is, as correctly stated in each of the answers above, to derive the associated problem AT Ax' = AT b and solve that via Gaussian Elimination (also called Gauss-Jordan Elimination, the difference is technical and not important here). This is guaranteed to have a solution.
The fact that this is the correct solution to the problem relies on the properties of the "closest point" of a point to a subspace. We want the closest point to b on the subspace Im A. Call this b'. The properties of the "closest point" imply that the difference, b - b', is orthogonal to everything in Im A. It's simple to draw a picture to convince yourself that if c is in Im A and b - c is not orthogonal to everything in Im A then it is possible to "nudge" c a little, either away or towards the origin, to c' so that b - c' is shorter than b - c.
So b - b' is orthogonal to everything in Im A. Since being orthogonal to something is a linear condition, it is sufficient to check that b - b' is orthogonal to a spanning set for Im A, for which we can take the columns of A. As we are using the standard inner product, this means that for each column, say a of A, aT(b - b') = 0. Putting these together, we obtain the relation AT(b - b') = 0. As b' is in the image of A, there is some x' such that b' = A x', whence we see that x' satisfies ATb - ATA x' = 0, and get the desired formula on rearranging. This also guarantees the existence of a solution to this equation.
Using the fact that the closest point is the unique point b' in Im A such that b - b' is orthogonal to everything in Im A, we can run this argument backwards to see that if x' is a solution of AT A x' = AT b then Ax' is the closest point to b in Im A.
Where the above answers go wrong is to then talk about the matrix (AT A)-1 AT. The problem with this is that ATA may not be invertible (take A to be the 2 by 3 zero matrix). There is a matrix which when ATA is invertible is (ATA)-1AT and this is called the pseudo-inverse. Essentially, AT misses the kernel of ATA meaning that the composition is always well-defined but it might not be decomposable as the notation suggests.
This notation may be standard, of that I don't know, but if it is then it is bad notation because it suggests a property that may not hold. At the least, it should always carry a rider to make clear that the notation is merely suggestive and not to be taken literally.
No comments:
Post a Comment