About A†

      These notes summarize what transpired in class during the discussion of how, given a k×n matrix A , to concoct a "best n×k attempt" A† at finding an inverse for A . The means used for concocting A† (to be described in a moment) include the techniques both of elementary row-reduction and of elementary column-reduction, and they result in a matrix A† for which the following two matrix equations hold:

(1)      A A† A = A ,
(2)      A† A A† = A† .
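
      (For the numerically inclined: numpy's np.linalg.pinv computes the Moore–Penrose pseudoinverse, which happens to satisfy (1) and (2) among its defining conditions, so it can stand in for A† in a quick sanity check. A minimal sketch, assuming numpy is available:)

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 5))            # a k×n matrix with k = 3, n = 5
    Adag = np.linalg.pinv(A)                   # an n×k stand-in for A†

    print(np.allclose(A @ Adag @ A, A))        # equation (1): A A† A = A    -> True
    print(np.allclose(Adag @ A @ Adag, Adag))  # equation (2): A† A A† = A†  -> True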

      Q.: How close can such a matrix A† actually be to an inverse for A ?

      A. #1: When A has rank n , it will follow from equation (1) that

(3)      A† A = I_{n×n} .

Indeed, when A has rank n , all the variables in any equation Ax = b are bound, and so any such equation will never have more than one solution. Another way to say this is that from Ax = Ay it always follows that x = y . Now equation (1) assures us at least that, for each column vector x in Rⁿ , we must have
    A (A† A x) = A x ,
and so it will follow further that
    A† A x = x , for all x in Rⁿ ,
from which equation (3) follows.
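
      (A quick numpy illustration of A. #1, again with the pseudoinverse standing in for A† : a 4×2 matrix with independent Gaussian entries has rank n = 2 almost surely, and then equation (3) checks out numerically:)

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 2))          # k = 4, n = 2; rank 2 almost surely
    Adag = np.linalg.pinv(A)
    print(np.allclose(Adag @ A, np.eye(2)))  # equation (3): A† A = I_{2×2} -> True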

      A. #2: When A has rank k , it will follow from equation (1) that

(4)      A A† = I_{k×k} .

Indeed, when A has rank k , all k rows of A are linearly independent, so that the only way two linear combinations xA and yA can be equal, xA = yA (with x and y row vectors in Rᵏ ), is for x and y to have been equal: x = y . From (1), of course, it follows that (x A A†) A = x A , and so x A A† = x , for all x in Rᵏ , which establishes equation (4).
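
      (And the mirror-image check of A. #2, under the same assumptions:)

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((2, 4))          # k = 2, n = 4; rank 2 almost surely
    Adag = np.linalg.pinv(A)
    print(np.allclose(A @ Adag, np.eye(2)))  # equation (4): A A† = I_{2×2} -> True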

      So how do we concoct our n×k matrix A† , given an arbitrary k×n matrix A ? We begin by row-reducing A to its fully reduced row-echelon form frref(A) , keeping track of the k×k invertible matrix E that "records" that row-reduction, i.e., that achieves E A = frref(A) .
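
      (Here is one way that bookkeeping might look in numpy. The trick is to apply every row operation to the augmented matrix [A | I] , so that the right-hand block accumulates the product of the elementary matrices; the function name frref_with_record and the tolerance handling are ours, a sketch rather than the class's official algorithm:)

    import numpy as np

    def frref_with_record(A, tol=1e-12):
        """Row-reduce A to its fully reduced row-echelon form R, returning
        (R, E, pivots) with E invertible (k×k) and E @ A == R."""
        A = np.asarray(A, dtype=float)
        k, n = A.shape
        M = np.hstack([A, np.eye(k)])                  # the augmented matrix [A | I]
        pivots, row = [], 0
        for col in range(n):
            if row == k:
                break
            p = row + np.argmax(np.abs(M[row:, col]))  # partial pivoting
            if abs(M[p, col]) < tol:
                continue                               # no pivot in this column
            M[[row, p]] = M[[p, row]]                  # swap rows
            M[row] /= M[row, col]                      # scale to get a leading 1
            for i in range(k):                         # clear the rest of the column
                if i != row:
                    M[i] -= M[i, col] * M[row]
            pivots.append(col)
            row += 1
        return M[:, :n], M[:, n:], pivots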

      Now frref(A) has a certain number of non-zero rows -- r of them, say (and of course rank(A) is exactly that r ) -- whose "leading 1s" appear in columns # j1 , # j2 , ..., # ji , ..., # jr , say. Let us now elementarily column-reduce frref(A) by first exchanging columns # 1 and # j1 , then exchanging columns # 2 and # j2 , then exchanging ... columns # i and # ji , ..., and finally exchanging columns # r and # jr .

      The result is a k×n matrix whose left upper r×r block is I_{r×r} , whose bottom k−r rows are entirely filled with zeroes, and whose right upper r×(n−r) block is some matrix B or other. And there is an n×n invertible matrix F1 that "records" this column-reduction:

    frref(A) F1 = E A F1 = [ I_{r×r}       B_{r×(n−r)}     ]
                           [ O_{(k−r)×r}   O_{(k−r)×(n−r)} ] .
But we're not quite done: further elementary column operations can eliminate the block B . Indeed, if we let F2 be the n×n matrix given by

    F2 = [ I_{r×r}       −B_{r×(n−r)}     ]
         [ O_{(n−r)×r}   I_{(n−r)×(n−r)} ] ,
then F2 is invertible (being upper triangular with 1s down its diagonal, its determinant is easily seen to be just 1; as an aside, can you figure out its inverse?), and the fully column-reduced "column-echelon" form of frref(A) comes out, if we write F = F1 F2 , as

    frref(A) F1 F2 = E A F = [ I_{r×r}       O_{r×(n−r)}     ]
                             [ O_{(k−r)×r}   O_{(k−r)×(n−r)} ] .
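
      (Continuing the numpy sketch, the column bookkeeping works the same way, with every column operation applied to a copy of the identity; column_record is again our own made-up helper name:)

    import numpy as np

    def column_record(R, pivots):
        """Given R = frref(A) and its pivot columns, return an invertible
        n×n matrix F = F1 F2 with R @ F = [[I, 0], [0, 0]]."""
        k, n = R.shape
        r = len(pivots)
        Rp, F1 = R.copy(), np.eye(n)
        for i, j in enumerate(pivots):     # exchange column # i with column # j_i
            Rp[:, [i, j]] = Rp[:, [j, i]]
            F1[:, [i, j]] = F1[:, [j, i]]
        B = Rp[:r, r:]                     # the leftover upper-right block
        F2 = np.eye(n)
        F2[:r, r:] = -B                    # F2 = [[I, -B], [0, I]]
        return F1 @ F2
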
      We now divulge our recipe for A† : as A† take the matrix product of the matrix Lr(F) consisting of the first (left-most) r columns of F with the matrix Tr(E) consisting of the first (top-most) r rows of E :

    A† = Lr(F) Tr(E) .
As Lr(F) is an n×r matrix and Tr(E) is an r×k matrix, their product A† is indeed n×k , as desired, and it only remains to verify equations (1) and (2).
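
      (In the numpy sketch the recipe is one line, since Lr and Tr are just slices; np.allclose is evidence rather than proof, so the algebraic verification still follows:)

    import numpy as np

    # Continuing with frref_with_record and column_record from above:
    rng = np.random.default_rng(4)
    A = rng.standard_normal((3, 5))            # k = 3, n = 5
    R, E, pivots = frref_with_record(A)
    F = column_record(R, pivots)
    r = len(pivots)
    Adag = F[:, :r] @ E[:r, :]                 # A† = Lr(F) Tr(E)
    print(Adag.shape)                          # (5, 3): n×k, as desired
    print(np.allclose(A @ Adag @ A, A))        # equation (1) -> True
    print(np.allclose(Adag @ A @ Adag, Adag))  # equation (2) -> True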

      To this end, a little preliminary spade-work is needed. This can be summarized in the observations that, for products of two appropriately sized matrices M and N , we have

    Tr(M N) = Tr(M) N   and   Lr(M N) = M Lr(N) ,
and that, so far as the product EAF itself is concerned (and you should check this for yourself!),

    Lr(EAF) Tr(EAF) = EAF .
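
      (Both observations are easy to believe once written as slices; a quick numpy check, with the shapes chosen arbitrarily:)

    import numpy as np

    rng = np.random.default_rng(5)
    M = rng.standard_normal((4, 5))
    N = rng.standard_normal((5, 6))
    r = 3
    # Tr(M N) = Tr(M) N : the top r rows of a product use only the
    # top r rows of the left factor.
    print(np.allclose((M @ N)[:r, :], M[:r, :] @ N))   # -> True
    # Lr(M N) = M Lr(N) : the left r columns of a product use only the
    # left r columns of the right factor.
    print(np.allclose((M @ N)[:, :r], M @ N[:, :r]))   # -> True
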
      As for computing A A† A , let's take on the seemingly harder task of evaluating E A A† A F :

    E A A† A F = E A Lr(F) Tr(E) A F = Lr(EAF) Tr(EAF) = EAF ,

since E A Lr(F) = Lr(EAF) and Tr(E) A F = Tr(EAF) by the spade-work above.
Once we remember that E and F are invertible, with inverses E⁻¹ and F⁻¹ , respectively, we can finally conclude that

    A A† A = E⁻¹ (E A A† A F) F⁻¹ = E⁻¹ (EAF) F⁻¹ = A ,
thus confirming equation (1).

      Equation (2) is easier: the only preliminary spade-work needed is the observation that

    Lr(Tr(EAF)) = I_{r×r} = Tr(Lr(EAF)) .
With this, we simply calculate:
    A† A A† = (Lr(F) Tr(E)) A (Lr(F) Tr(E))
            = Lr(F) (Tr(E) A Lr(F)) Tr(E) = Lr(F) Lr(Tr(E) A F) Tr(E)
            = Lr(F) Lr(Tr(E A F)) Tr(E) = Lr(F) I_{r×r} Tr(E) = Lr(F) Tr(E) = A† ,
as needed to be shown.

      Amusingly enough, A and A† have the same rank. This follows easily from equations (1) and (2) once one knows, in general, for appropriately sized matrices M and N , that both rank(M N) ≤ rank(M) and rank(M N) ≤ rank(N) . For from (1) you can thereby deduce that rank(A) ≤ rank(A†) , while from (2) you can deduce that rank(A†) ≤ rank(A) . Moreover, Aᵀ and A† have not only the same rank, but the same nullity as well (take that as an exercise :-) ).
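
      (One last numpy footnote, with A and Adag as in the sketch above; checking the nullity claim numerically makes a nice companion to the exercise:)

    import numpy as np

    print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(Adag))  # -> True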

Last revised: 26 April 2004