About A†
These notes summarize what transpired in class during the discussion of how, given a k×n matrix A , to concoct a "best n×k attempt" A† at an inverse for A . The means used for concocting A† (to be described below, in a moment) include the techniques both of elementary row-reduction and of elementary column-reduction, and result in a matrix A† for which the following two matrix equations hold:

(1) A A† A = A ,
(2) A† A A† = A† .
Q.: How close can such a matrix A† actually be to an inverse for A ?
A. #1: When A has rank n , it will follow from equation (1) that

(3) A† A = In×n .

Indeed, when A has rank n , all the variables in any equation Ax = b are bound, and so any such equation will never have more than one solution. Another way to say this is that from Ax = Ay it always follows that x = y . Now equation (1) assures us at least that, for each column vector x in Rn , we must have

A (A† A x) = A x ,

and so it will follow further that

A† A x = x , for all x in Rn ,

from which equation (3) follows.
A. #2: When A has rank k , it will follow from equation (1) that

(4) A A† = Ik×k .

Indeed, when A has rank k , all k rows of A are linearly independent, so that the only way two linear combinations xA and yA can be equal, xA = yA (with x and y row vectors in Rk ), is for x and y to have been equal: x = y . From (1), of course, it follows that

(x A A†) A = x A ,

and so x A A† = x , for all x in Rk , which establishes equation (4).
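A quick numerical illustration of answers #1 and #2, sketched in Python with NumPy. Here numpy's Moore-Penrose pseudoinverse stands in for one particular A† satisfying equations (1) and (2); the test matrices are ours, chosen only for illustration:

```python
import numpy as np

# Full column rank (rank = n): equation (3), A† A = I.
A_tall = np.array([[1., 2.],
                   [3., 4.],
                   [5., 6.]])          # 3x2, rank n = 2
Ad = np.linalg.pinv(A_tall)            # one A† satisfying (1) and (2)
print(np.allclose(Ad @ A_tall, np.eye(2)))   # True

# Full row rank (rank = k): equation (4), A A† = I.
A_wide = A_tall.T                      # 2x3, rank k = 2
print(np.allclose(A_wide @ np.linalg.pinv(A_wide), np.eye(2)))   # True
```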
So how do we concoct our n×k matrix A† , given an arbitrary k×n matrix A ? We begin by row-reducing A to its fully reduced row-echelon form frref(A) , keeping track of the k×k invertible matrix E that "records" that row-reduction, i.e., that achieves E A = frref(A) . Now frref(A) has a certain number of non-zero rows -- r of them, say (and of course rank(A) is exactly that r ) -- whose "leading 1s" appear in columns j1 , j2 , ..., ji , ..., jr , say. Let us now elementarily column-reduce frref(A) by first exchanging columns # 1 and # j1 , then exchanging columns # 2 and # j2 , then exchanging columns # i and # ji , ..., and finally exchanging columns # r and # jr . The result is a k×n matrix whose upper left r×r block is Ir×r , whose bottom k−r rows are entirely filled with zeroes, and whose upper right r×(n−r) block is some matrix B or other. And there is an n×n invertible matrix F1 that "records" this column-reduction:
    frref(A) F1 = E A F1 = [ Ir×r        Br×(n−r)       ]
                           [ O(k−r)×r    O(k−r)×(n−r)   ] .
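A small worked example of the row-reduction step, sketched in NumPy (the matrix A and the particular elementary matrices are ours, not from the notes): multiplying the elementary matrices together, in the order the row operations were performed, yields the recording matrix E with E A = frref(A).

```python
import numpy as np

# A hypothetical 2x3 example: row-reduce A, recording each elementary
# row operation as an elementary matrix.
A = np.array([[2., 4., 6.],
              [1., 2., 4.]])
E1 = np.array([[0.5, 0.], [0., 1.]])   # scale row 1 by 1/2
E2 = np.array([[1., 0.], [-1., 1.]])   # row 2 -= row 1
E3 = np.array([[1., -3.], [0., 1.]])   # row 1 -= 3 * row 2
E = E3 @ E2 @ E1                       # the recording matrix
print(E @ A)                           # frref(A) = [[1, 2, 0], [0, 0, 1]]
```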
But we're not quite done: further elementary column operations can eliminate the block B . Indeed, if we let F2 be the n×n matrix given by
    F2 = [ Ir×r        −Br×(n−r)      ]
         [ O(n−r)×r    I(n−r)×(n−r)   ] ,
then F2 is invertible (being upper triangular with 1s down its diagonal, its determinant is easily seen to be just 1; as an aside, can you figure out its inverse?), and the fully column-reduced "column-echelon" form of frref(A) comes out, if we write F = F1 F2 , as
    frref(A) F1 F2 = E A F = [ Ir×r        Or×(n−r)       ]
                             [ O(k−r)×r    O(k−r)×(n−r)   ] .
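The column-reduction steps can likewise be sketched in NumPy. We continue with a hypothetical 2x3 example whose frref has its pivots in columns #1 and #3 (our numbers, for illustration): F1 swaps the pivot columns to the left, and F2 clears the leftover block B:

```python
import numpy as np

R = np.array([[1., 2., 0.],
              [0., 0., 1.]])          # frref(A), pivots in columns #1, #3
F1 = np.eye(3)
F1[:, [1, 2]] = F1[:, [2, 1]]         # exchange columns #2 and #3
B = (R @ F1)[:2, 2:]                  # the upper-right r x (n-r) block
F2 = np.eye(3)
F2[:2, 2:] = -B                       # F2 = [[I, -B], [O, I]]
print(R @ F1 @ F2)                    # [[1, 0, 0], [0, 1, 0]]: [I | O] form
```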
We now divulge our recipe for A† : as A† take the matrix product of the matrix Lr(F) consisting of the first (left-most) r columns of F with the matrix Tr(E) consisting of the first (top-most) r rows of E :

A† = Lr(F) Tr(E) .

As Lr(F) is an n×r matrix and Tr(E) is an r×k matrix, their product A† is indeed n×k , as desired, and it only remains to verify equations (1) and (2).
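The whole recipe can be sketched in Python with NumPy. The helper names frref_with_record and pseudo are ours, and the Gauss-Jordan loop is one standard way to record E (row-reducing the augmented block [A | I]), not necessarily how it was done in class:

```python
import numpy as np

def frref_with_record(A, tol=1e-12):
    """Row-reduce A, returning (R, E, pivots) with E @ A = R = frref(A)."""
    A = np.asarray(A, dtype=float)
    k, n = A.shape
    # Row-reduce the augmented block [A | I_k]; it ends up as [frref(A) | E].
    M = np.hstack([A, np.eye(k)])
    pivots, row = [], 0
    for col in range(n):
        if row == k:
            break
        p = row + np.argmax(np.abs(M[row:, col]))   # partial pivoting
        if abs(M[p, col]) < tol:
            continue                                 # no pivot in this column
        M[[row, p]] = M[[p, row]]
        M[row] /= M[row, col]                        # make the leading 1
        for i in range(k):
            if i != row:
                M[i] -= M[i, col] * M[row]           # clear the column
        pivots.append(col)
        row += 1
    return M[:, :n], M[:, n:], pivots

def pseudo(A):
    """Concoct A† = Lr(F) Tr(E) following the notes' recipe."""
    A = np.asarray(A, dtype=float)
    k, n = A.shape
    R, E, pivots = frref_with_record(A)
    r = len(pivots)
    # F1: exchange column #i with pivot column #j_i, for i = 1, ..., r.
    F1 = np.eye(n)
    for i, j in enumerate(pivots):
        F1[:, [i, j]] = F1[:, [j, i]]
    # After the swaps, the upper-right r x (n-r) block is B; F2 clears it.
    B = (R @ F1)[:r, r:]
    F2 = np.eye(n)
    F2[:r, r:] = -B
    F = F1 @ F2
    return F[:, :r] @ E[:r, :]          # Lr(F) @ Tr(E)

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])            # a rank-1 example of ours
Ad = pseudo(A)
print(np.allclose(A @ Ad @ A, A))       # equation (1): True
print(np.allclose(Ad @ A @ Ad, Ad))     # equation (2): True
```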
To this end, a little preliminary spade-work is needed. This can be summarized in the observations that, for products of two appropriately sized matrices M and N , we have

Tr(M N) = Tr(M) N    and    Lr(M N) = M Lr(N) ,

and that, so far as the product EAF itself is concerned (and you should check this for yourself!),

Lr(EAF) Tr(EAF) = EAF .
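The Tr and Lr identities are just facts about block multiplication; in NumPy terms, with Tr as "first r rows" and Lr as "first r columns", they are slicing identities (random matrices of our choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 5))
N = rng.standard_normal((5, 3))
r = 2
print(np.allclose((M @ N)[:r, :], M[:r, :] @ N))   # Tr(MN) = Tr(M) N
print(np.allclose((M @ N)[:, :r], M @ N[:, :r]))   # Lr(MN) = M Lr(N)
```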
As for computing A A† A , let's take on the seemingly harder task of evaluating E A A† A F :

E A A† A F = E A Lr(F) Tr(E) A F = Lr(EAF) Tr(EAF) = EAF .

Once we remember that E and F are invertible, with inverses E−1 and F−1 , respectively, we can finally conclude that

A A† A = E−1 (E A A† A F) F−1 = E−1 (EAF) F−1 = A ,

thus confirming equation (1).
Equation (2) is easier: the only preliminary spade-work needed is the observation that

Lr(Tr(EAF)) = Ir×r = Tr(Lr(EAF)) .

With this, we simply calculate:

A† A A† = (Lr(F) Tr(E)) A (Lr(F) Tr(E))
        = Lr(F) (Tr(E) A Lr(F)) Tr(E)
        = Lr(F) Lr(Tr(E) A F) Tr(E)
        = Lr(F) Lr(Tr(E A F)) Tr(E)
        = Lr(F) Ir×r Tr(E)
        = Lr(F) Tr(E) = A† ,

as needed to be shown.
Amusingly enough, A and A† have the same rank. This follows easily from equations (1) and (2) once one knows, in general, for appropriately sized matrices M and N , that both rank(M N) ≤ rank(M) and rank(M N) ≤ rank(N) . For from (1) you can thereby deduce that rank(A) ≤ rank(A†) , while from (2) you can deduce that rank(A†) ≤ rank(A) .
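A quick numerical check of the rank equality, again sketched in NumPy with the built-in pseudoinverse standing in for a particular A† satisfying (1) and (2) (the matrix is ours):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])           # rank 2: second row is twice the first
Ad = np.linalg.pinv(A)
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(Ad))   # 2 2
```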
Moreover, Aᵀ and A† have not only the same rank, but the same nullity as well (take that as an exercise :-) ).
Last revised: 26 April 2004