About A†
These notes summarize what transpired in class during the discussion of how, given a k×n matrix A , to concoct a "best n×k attempt" A† at an inverse for A . The means used for concocting A† (to be described below, in a moment) include the techniques both of elementary row-reduction and of elementary column-reduction, and result in a matrix A† for which the following two matrix equations hold:

(1) A A† A = A ,
(2) A† A A† = A† .
Q.: How close can such a matrix A† actually be to an inverse for A ?
A. #1: When A has rank n , it will follow from equation (1) that

(3) A† A = In×n .

Indeed, when A has rank n , all the variables in any equation Ax = b are bound, and so any such equation will never have more than one solution. Another way to say this is that from Ax = Ay it always follows that x = y . Now equation (1) assures us at least that, for each column vector x in Rn , we must have

A (A† A x) = A x ,

and so it will follow further that

A† A x = x , for all x in Rn ,

from which equation (3) follows.
A. #2: When A has rank k , it will follow from equation (1) that

(4) A A† = Ik×k .

Indeed, when A has rank k , all k rows of A are linearly independent, so that the only way two linear combinations xA and yA can be equal, xA = yA (with x and y row vectors in Rk ), is for x and y to have been equal: x = y . From (1), of course, it follows that

(x A A†) A = x A ,

and so x A A† = x , for all x in Rk , which establishes equation (4).
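A quick numerical illustration of answers #1 and #2, sketched in Python with NumPy. Here numpy's Moore-Penrose pseudoinverse stands in for one particular A† satisfying equations (1) and (2); the test matrices are ours, chosen only for illustration:

```python
import numpy as np

# Full column rank (rank = n): equation (3), A† A = I.
A_tall = np.array([[1., 2.],
                   [3., 4.],
                   [5., 6.]])          # 3x2, rank n = 2
Ad = np.linalg.pinv(A_tall)            # one A† satisfying (1) and (2)
print(np.allclose(Ad @ A_tall, np.eye(2)))   # True

# Full row rank (rank = k): equation (4), A A† = I.
A_wide = A_tall.T                      # 2x3, rank k = 2
print(np.allclose(A_wide @ np.linalg.pinv(A_wide), np.eye(2)))   # True
```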
So how do we concoct our n×k matrix A† , given an arbitrary k×n matrix A ? We begin by row-reducing A to its fully reduced row-echelon form frref(A) , keeping track of the k×k invertible matrix E that "records" that row-reduction, i.e., that achieves E A = frref(A) . Now frref(A) has a certain number of non-zero rows -- r of them, say (and of course rank(A) is exactly that r ) -- whose "leading 1s" appear in columns j1 , j2 , ..., ji , ..., jr , say. Let us now elementarily column-reduce frref(A) by first exchanging columns # 1 and # j1 , then exchanging columns # 2 and # j2 , then exchanging columns # i and # ji , ..., and finally exchanging columns # r and # jr . The result is a k×n matrix whose upper left r×r block is Ir×r , whose bottom k−r rows are entirely filled with zeroes, and whose upper right r×(n−r) block is some matrix B or other. And there is an n×n invertible matrix F1 that "records" this column-reduction:
    frref(A) F1 = E A F1 = [ Ir×r        Br×(n−r)       ]
                           [ O(k−r)×r    O(k−r)×(n−r)   ] .
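A small worked example of the row-reduction step, sketched in NumPy (the matrix A and the particular elementary matrices are ours, not from the notes): multiplying the elementary matrices together, in the order the row operations were performed, yields the recording matrix E with E A = frref(A).

```python
import numpy as np

# A hypothetical 2x3 example: row-reduce A, recording each elementary
# row operation as an elementary matrix.
A = np.array([[2., 4., 6.],
              [1., 2., 4.]])
E1 = np.array([[0.5, 0.], [0., 1.]])   # scale row 1 by 1/2
E2 = np.array([[1., 0.], [-1., 1.]])   # row 2 -= row 1
E3 = np.array([[1., -3.], [0., 1.]])   # row 1 -= 3 * row 2
E = E3 @ E2 @ E1                       # the recording matrix
print(E @ A)                           # frref(A) = [[1, 2, 0], [0, 0, 1]]
```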
But we're not quite done: further elementary column operations can eliminate the block B . Indeed, if we let F2 be the n×n matrix given by
    F2 = [ Ir×r        −Br×(n−r)      ]
         [ O(n−r)×r    I(n−r)×(n−r)   ] ,
then F2 is invertible (being upper triangular with 1s down its diagonal, its determinant is easily seen to be just 1; as an aside, can you figure out its inverse?), and the fully column-reduced "column-echelon" form of frref(A) comes out, if we write F = F1 F2 , as
    frref(A) F1 F2 = E A F = [ Ir×r        Or×(n−r)       ]
                             [ O(k−r)×r    O(k−r)×(n−r)   ] .
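The column-reduction steps can likewise be sketched in NumPy. We continue with a hypothetical 2x3 example whose frref has its pivots in columns #1 and #3 (our numbers, for illustration): F1 swaps the pivot columns to the left, and F2 clears the leftover block B:

```python
import numpy as np

R = np.array([[1., 2., 0.],
              [0., 0., 1.]])          # frref(A), pivots in columns #1, #3
F1 = np.eye(3)
F1[:, [1, 2]] = F1[:, [2, 1]]         # exchange columns #2 and #3
B = (R @ F1)[:2, 2:]                  # the upper-right r x (n-r) block
F2 = np.eye(3)
F2[:2, 2:] = -B                       # F2 = [[I, -B], [O, I]]
print(R @ F1 @ F2)                    # [[1, 0, 0], [0, 1, 0]]: [I | O] form
```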
We now divulge our recipe for A† : as A† take the matrix product of the matrix Lr(F) consisting of the first (left-most) r columns of F with the matrix Tr(E) consisting of the first (top-most) r rows of E :

A† = Lr(F) Tr(E) .

As Lr(F) is an n×r matrix and Tr(E) is an r×k matrix, their product A† is indeed n×k , as desired, and it only remains to verify equations (1) and (2).
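The whole recipe can be sketched in Python with NumPy. The helper names frref_with_record and pseudo are ours, and the Gauss-Jordan loop is one standard way to record E (row-reducing the augmented block [A | I]), not necessarily how it was done in class:

```python
import numpy as np

def frref_with_record(A, tol=1e-12):
    """Row-reduce A, returning (R, E, pivots) with E @ A = R = frref(A)."""
    A = np.asarray(A, dtype=float)
    k, n = A.shape
    # Row-reduce the augmented block [A | I_k]; it ends up as [frref(A) | E].
    M = np.hstack([A, np.eye(k)])
    pivots, row = [], 0
    for col in range(n):
        if row == k:
            break
        p = row + np.argmax(np.abs(M[row:, col]))   # partial pivoting
        if abs(M[p, col]) < tol:
            continue                                 # no pivot in this column
        M[[row, p]] = M[[p, row]]
        M[row] /= M[row, col]                        # make the leading 1
        for i in range(k):
            if i != row:
                M[i] -= M[i, col] * M[row]           # clear the column
        pivots.append(col)
        row += 1
    return M[:, :n], M[:, n:], pivots

def pseudo(A):
    """Concoct A† = Lr(F) Tr(E) following the notes' recipe."""
    A = np.asarray(A, dtype=float)
    k, n = A.shape
    R, E, pivots = frref_with_record(A)
    r = len(pivots)
    # F1: exchange column #i with pivot column #j_i, for i = 1, ..., r.
    F1 = np.eye(n)
    for i, j in enumerate(pivots):
        F1[:, [i, j]] = F1[:, [j, i]]
    # After the swaps, the upper-right r x (n-r) block is B; F2 clears it.
    B = (R @ F1)[:r, r:]
    F2 = np.eye(n)
    F2[:r, r:] = -B
    F = F1 @ F2
    return F[:, :r] @ E[:r, :]          # Lr(F) @ Tr(E)

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])            # a rank-1 example of ours
Ad = pseudo(A)
print(np.allclose(A @ Ad @ A, A))       # equation (1): True
print(np.allclose(Ad @ A @ Ad, Ad))     # equation (2): True
```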
To this end, a little preliminary spade-work is needed. This can be summarized in the observations that, for products of two appropriately sized matrices M and N , we have

Tr(M N) = Tr(M) N    and    Lr(M N) = M Lr(N) ,

and that, so far as the product EAF itself is concerned (and you should check this for yourself!),

Lr(EAF) Tr(EAF) = EAF .
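The Tr and Lr identities are just facts about block multiplication; in NumPy terms, with Tr as "first r rows" and Lr as "first r columns", they are slicing identities (random matrices of our choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 5))
N = rng.standard_normal((5, 3))
r = 2
print(np.allclose((M @ N)[:r, :], M[:r, :] @ N))   # Tr(MN) = Tr(M) N
print(np.allclose((M @ N)[:, :r], M @ N[:, :r]))   # Lr(MN) = M Lr(N)
```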
As for computing A A† A , let's take on the seemingly harder task of evaluating E A A† A F :

E A A† A F = E A Lr(F) Tr(E) A F = Lr(EAF) Tr(EAF) = EAF .

Once we remember that E and F are invertible, with inverses E−1 and F−1 , respectively, we can finally conclude that

A A† A = E−1 (E A A† A F) F−1 = E−1 (EAF) F−1 = A ,

thus confirming equation (1).
Equation (2) is easier: the only preliminary spade-work needed is the observation that

Lr(Tr(EAF)) = Ir×r = Tr(Lr(EAF)) .

With this, we simply calculate:

A† A A† = (Lr(F) Tr(E)) A (Lr(F) Tr(E))
        = Lr(F) (Tr(E) A Lr(F)) Tr(E)
        = Lr(F) Lr(Tr(E) A F) Tr(E)
        = Lr(F) Lr(Tr(E A F)) Tr(E)
        = Lr(F) Ir×r Tr(E)
        = Lr(F) Tr(E) = A† ,

as needed to be shown.
Amusingly enough, A and A† have the same rank. This follows easily from equations (1) and (2) once one knows, in general, for appropriately sized matrices M and N , that both rank(M N) ≤ rank(M) and rank(M N) ≤ rank(N) . For from (1) you can thereby deduce that rank(A) ≤ rank(A†) , while from (2) you can deduce that rank(A†) ≤ rank(A) .
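A quick numerical check of the rank equality, again sketched in NumPy with the built-in pseudoinverse standing in for a particular A† satisfying (1) and (2) (the matrix is ours):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])           # rank 2: second row is twice the first
Ad = np.linalg.pinv(A)
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(Ad))   # 2 2
```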
Moreover, Aᵀ and A† have not only the same rank, but the same nullity as well (take that as an exercise :-) ).
Last revised: 26 April 2004