9.6. Orthogonality#

import numpy as np
from matplotlib import pyplot as plt

9.6.1. Inner Products and Orthogonality#

In this section, we explore inner products and orthogonality. Inner products allow us to define intuitive geometric concepts such as length, distance, and perpendicularity in the $n$-dimensional space $\mathbb{R}^n$.

Inner product#

Let $\mathbf{u}$ and $\mathbf{v}$ be vectors in $\mathbb{R}^n$. We can view them as $n\times 1$ matrices. The transpose of $\mathbf{u}$, denoted $\mathbf{u}^T$, is a $1\times n$ matrix. The matrix product of $\mathbf{u}^T$ and $\mathbf{v}$ yields a $1\times 1$ matrix, which can be written as a real number (a scalar) without brackets. This scalar is known as the inner product or dot product of $\mathbf{u}$ and $\mathbf{v}$. To compute it, we multiply the corresponding entries of $\mathbf{u}$ and $\mathbf{v}$ and sum the results. More precisely, if

$$\mathbf{u}=\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{and} \quad \mathbf{v}=\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix},$$

then the dot product $\mathbf{u}\cdot \mathbf{v}$ is given by:

$$\mathbf{u}\cdot \mathbf{v}=\mathbf{u}^T\mathbf{v}=\begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}=u_1v_1+u_2v_2+\cdots+u_nv_n$$

Example 1

Compute $\mathbf{u}\cdot \mathbf{v}$ for $\mathbf{u}=\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$ and $\mathbf{v}=\begin{bmatrix} -2 \\ 3 \\ 4 \end{bmatrix}$.

Solution

We use numpy.dot() to compute the dot product in Python:

import numpy as np

u = np.array([1, -1, 2])
v = np.array([-2, 3, 4])

uv = np.dot(u,v)
uv
3

Theorem 1 (Properties of the dot product)

Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be vectors in $\mathbb{R}^n$, and let $c\in\mathbb{R}$ be a scalar. Then

  1. $\mathbf{u}\cdot \mathbf{v}=\mathbf{v}\cdot \mathbf{u}$.

  2. $(\mathbf{u}+\mathbf{v})\cdot \mathbf{w}=\mathbf{u}\cdot \mathbf{w}+\mathbf{v}\cdot \mathbf{w}$.

  3. $c(\mathbf{u}\cdot \mathbf{v})=(c\mathbf{u})\cdot \mathbf{v}=\mathbf{u}\cdot (c\mathbf{v})$.

  4. $\mathbf{u}\cdot \mathbf{u}\geq 0$, and $\mathbf{u}\cdot \mathbf{u}=0$ if and only if $\mathbf{u}=\mathbf{0}$.

Combining (2) and (3) and using induction, we can show:

$$(c_1\mathbf{u}_1+c_2\mathbf{u}_2+\cdots+c_p\mathbf{u}_p)\cdot \mathbf{w}=c_1(\mathbf{u}_1\cdot \mathbf{w})+c_2(\mathbf{u}_2\cdot \mathbf{w})+\cdots+c_p(\mathbf{u}_p\cdot \mathbf{w})$$
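These properties are easy to check numerically. Below is a small sketch (the particular vectors and the scalar are our own choice, not part of the original text) verifying each property with numpy.dot:

u = np.array([1, -1, 2])
v = np.array([-2, 3, 4])
w = np.array([0, 1, 5])
c = 3

print(np.dot(u, v) == np.dot(v, u))                      # property 1
print(np.dot(u + v, w) == np.dot(u, w) + np.dot(v, w))   # property 2
print(c*np.dot(u, v) == np.dot(c*u, v))                  # property 3
print(np.dot(u, u) >= 0)                                 # property 4

Each comparison prints True, since all four quantities are computed exactly with integer arithmetic.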

Length of vectors

The dot product can be used to define the length of vectors: let $\mathbf{u}\in\mathbb{R}^n$; then the magnitude or length of $\mathbf{u}$, denoted by $\|\mathbf{u}\|$, is defined by

$$\|\mathbf{u}\|=\sqrt{\mathbf{u}\cdot \mathbf{u}}.$$

  • Note that this definition of length coincides with the standard notion of length in $\mathbb{R}^2$ and $\mathbb{R}^3$.

  • A vector with a length of 1 is called a unit vector. For any nonzero vector $\mathbf{u}$, there exists a unit vector in the direction of $\mathbf{u}$. To obtain this vector, we first calculate the length of $\mathbf{u}$ and then divide $\mathbf{u}$ by its length $\|\mathbf{u}\|$. The resulting vector is referred to as the unit vector of $\mathbf{u}$, and is commonly denoted by $\mathbf{e}_{\mathbf{u}}$:

$$\mathbf{e}_{\mathbf{u}}=\frac{\mathbf{u}}{\|\mathbf{u}\|}$$

This process is often called normalizing a vector, as it transforms $\mathbf{u}$ into a unit vector by scaling it to have length 1.

Example 2

Let $\mathbf{u}=\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$. Compute the following:

  1. $\|\mathbf{u}\|$

  2. $\|-2\mathbf{u}\|$

  3. $\mathbf{e}_{\mathbf{u}}$

Solution:

# (1)

u = np.array([1,2,3])

uu = np.dot(u,u)

length_u = np.sqrt(uu)

length_u
3.7416573867739413
# (2)

v = -2*u

vv = np.dot(v,v)

length_v = np.sqrt(vv)

length_v
7.483314773547883

We could also use the properties of the dot product to compute $\|-2\mathbf{u}\|$:

$$\|-2\mathbf{u}\|=\sqrt{(-2\mathbf{u})\cdot(-2\mathbf{u})}=\sqrt{4(\mathbf{u}\cdot \mathbf{u})}=2\|\mathbf{u}\|$$
#(3) computing the unit vector of u

e_u = u/length_u

e_u
array([0.26726124, 0.53452248, 0.80178373])

Let’s check that the length of $\mathbf{e}_{\mathbf{u}}$ is exactly 1:

length_e = np.sqrt(np.dot(e_u,e_u))

length_e
1.0

Example 3

Given vectors

$$\mathbf{u}=\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} \quad \text{and} \quad \mathbf{v}=\begin{bmatrix} -2 \\ 3 \\ 4 \end{bmatrix},$$

and the subspace $W$ spanned by $\mathbf{u}$ and $\mathbf{v}$, find unit vectors $\mathbf{w}_1$ and $\mathbf{w}_2$ that form a basis for $W$.

Solution:

Since u and v are linearly independent (they are not scalar multiples of each other), we can proceed to normalize them to get unit vectors.

# unit vector of u
u = np.array([1,-1,2])
e_u = u / np.sqrt(np.dot(u,u))


# unit vector of v
v = np.array([-2, 3, 4])
e_v = v / np.sqrt(np.dot(v,v))


print("e_u = ", e_u)
print("e_v = ", e_v)
e_u =  [ 0.40824829 -0.40824829  0.81649658]
e_v =  [-0.37139068  0.55708601  0.74278135]

Distance in $\mathbb{R}^n$

For $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$, the distance between $\mathbf{u}$ and $\mathbf{v}$, denoted by $\operatorname{dist}(\mathbf{u},\mathbf{v})$, is the length of their difference vector $\mathbf{u}-\mathbf{v}$. That is,

$$\operatorname{dist}(\mathbf{u},\mathbf{v})=\|\mathbf{u}-\mathbf{v}\|.$$

Note that this definition coincides with the standard definition of distance in $\mathbb{R}^2$ and $\mathbb{R}^3$.

Example 4

Compute the distance between the vectors $\mathbf{u}$ and $\mathbf{v}=-2\mathbf{u}$ from Example 2.

Solution:

w = u - v

dist = np.sqrt(np.dot(w,w))

print("The distance between ", u, " and ", v, " is ", dist)
The distance between  [1 2 3]  and  [-2 -4 -6]  is  11.224972160321824

Angles and Orthogonality#

Theorem 2 (Cauchy-Schwarz inequality)

For $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ we have

$$|\mathbf{u}\cdot \mathbf{v}|\leq \|\mathbf{u}\|\,\|\mathbf{v}\|.$$
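As a quick numerical illustration of the inequality, here is a small sketch using randomly generated vectors (the variable names, seed, and dimension are our own choices, not part of the original text):

rng = np.random.default_rng(0)

u_rand = rng.standard_normal(5)
v_rand = rng.standard_normal(5)

lhs = np.abs(np.dot(u_rand, v_rand))
rhs = np.sqrt(np.dot(u_rand, u_rand)) * np.sqrt(np.dot(v_rand, v_rand))

print(lhs <= rhs)   # True, regardless of which vectors were generated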

The Cauchy-Schwarz inequality gives us a way to define the notion of an angle between two $n$-dimensional vectors $\mathbf{u}$ and $\mathbf{v}$. This notion also coincides with our intuition in $\mathbb{R}^2$ and $\mathbb{R}^3$:

Suppose $\mathbf{u},\mathbf{v}\in\mathbb{R}^n$ are nonzero vectors. Note that

$$-1\leq \frac{\mathbf{u}\cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\leq 1.$$

Therefore, there is a unique $\theta\in[0,\pi]$ such that

$$\cos(\theta)=\frac{\mathbf{u}\cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}.$$

$\theta$ is referred to as the angle between $\mathbf{u}$ and $\mathbf{v}$ and is equal to

$$\theta=\arccos\left(\frac{\mathbf{u}\cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right)$$

Example 5

Find the angle between the vectors $\mathbf{u}$ and $\mathbf{v}=-2\mathbf{u}$ from Example 2.

Solution:

We first compute the lengths of $\mathbf{u}$ and $\mathbf{v}$ and then substitute them into the formula above:

# re-define the vectors of Example 2 so this cell runs on its own
u = np.array([1, 2, 3])
v = -2*u

u_len = np.sqrt(np.dot(u,u))
v_len = np.sqrt(np.dot(v,v))

z = u_len * v_len

t = np.arccos(np.dot(u,v)/z)       
print("The angle between u and v is ", t ," rad")

Since $\mathbf{v}=-2\mathbf{u}$ points in the direction opposite to $\mathbf{u}$, the angle between them is $\pi\approx 3.1416$ radians.

Two vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal, denoted by $\mathbf{u}\perp \mathbf{v}$, if the angle $\theta$ between them is $\frac{\pi}{2}$. Alternatively, this condition is satisfied if and only if $\mathbf{u}\cdot \mathbf{v}=0$. It is worth noting that the zero vector $\mathbf{0}$ is orthogonal to every vector because $\mathbf{0}^T\mathbf{u}=0$ for any vector $\mathbf{u}$.

Observe that for any two vectors $\mathbf{u}$ and $\mathbf{v}$ we have

$$\|\mathbf{u}+\mathbf{v}\|^2=(\mathbf{u}+\mathbf{v})\cdot(\mathbf{u}+\mathbf{v})=\|\mathbf{u}\|^2+\|\mathbf{v}\|^2+2\,\mathbf{u}\cdot \mathbf{v}.$$

The above calculation implies the famous Pythagorean Theorem:

Theorem 3 (The Pythagorean Theorem)

Two vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if and only if

$$\|\mathbf{u}+\mathbf{v}\|^2=\|\mathbf{u}\|^2+\|\mathbf{v}\|^2$$
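For instance, the pair $\mathbf{u}=(3,1,1)$ and $\mathbf{v}=(-1,2,1)$ (chosen here only for illustration) is orthogonal, and a quick computation confirms the theorem:

u = np.array([3, 1, 1])
v = np.array([-1, 2, 1])

# u and v are orthogonal: their dot product is zero
print(np.dot(u, v))

# so the Pythagorean identity holds
print(np.dot(u + v, u + v) == np.dot(u, u) + np.dot(v, v))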

The orthogonal complement of a subspace

Let $W\subseteq\mathbb{R}^n$ be a subspace. If a vector $\mathbf{u}$ is orthogonal to every vector in the subspace $W$, then $\mathbf{u}$ is said to be orthogonal to $W$. The set of all vectors $\mathbf{u}$ that are orthogonal to $W$ forms a subspace, which is commonly called the orthogonal complement of $W$ and is denoted by $W^{\perp}$ (read as "$W$ perp"):

$$W^{\perp}=\{\mathbf{u}\in\mathbb{R}^n:\ \mathbf{u}\cdot \mathbf{w}=0 \text{ for all } \mathbf{w}\in W\}$$

Example 6

Let $W$ be the $xy$-plane in $\mathbb{R}^3$, and let $L$ be the $z$-axis. Our intuition says $L$ is orthogonal to $W$. In fact, if $\mathbf{z}$ and $\mathbf{w}$ are nonzero vectors such that $\mathbf{z}\in L$ and $\mathbf{w}\in W$, then $\mathbf{z}$ is orthogonal to $\mathbf{w}$. Since $\mathbf{w}$ and $\mathbf{z}$ were chosen arbitrarily, every vector on $L$ is orthogonal to every $\mathbf{w}$ in $W$. In fact, $L$ consists of all vectors that are orthogonal to the $\mathbf{w}$’s in $W$, and $W$ consists of all vectors $\mathbf{w}$ orthogonal to the $\mathbf{z}$’s in $L$. Mathematically,

$$L=W^{\perp} \quad\text{and}\quad W=L^{\perp}$$

Example 7

If $W=\operatorname{span}\left\{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\right\}$, find a basis for $W^{\perp}$.

Solution:

Note that $W$ is simply the $x$-axis in $\mathbb{R}^3$. Intuitively, $W^{\perp}$ should be the $yz$-plane, and a basis for it is

$$B=\left\{\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},\ \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right\}.$$

Alternatively, we can find the solutions to the linear system:

$$\begin{bmatrix} 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix}=0.$$

We have:

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix}=\begin{bmatrix} 0 \\ y \\ z \end{bmatrix}=y\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}+z\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

We conclude this section with a theorem summarizing the properties of the orthogonal complement of a subspace.

Theorem 4

Let $W$ be a subspace of $\mathbb{R}^n$.

  1. The orthogonal complement $W^{\perp}$ is also a subspace of $\mathbb{R}^n$.

  2. The sum of the dimensions of $W$ and $W^{\perp}$ is equal to the dimension of the ambient space $\mathbb{R}^n$, i.e., $\dim(W)+\dim(W^{\perp})=n$.

  3. If $W$ is the column space of some matrix $A$, i.e., $W=\operatorname{col}(A)$, then $W^{\perp}$ is precisely the null space of the transpose of $A$, i.e., $W^{\perp}=\operatorname{null}(A^T)$.
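Property 3 can be checked numerically. The sketch below (the helper function null_space_basis and its tolerance are our own, not part of the original text) uses the SVD to compute a basis of $\operatorname{null}(A^T)$ for the matrix $A$ whose columns are $\mathbf{u}$ and $\mathbf{v}$ from Example 3, and verifies that its columns are orthogonal to $\operatorname{col}(A)$:

def null_space_basis(M, tol=1e-10):
    # columns of the returned matrix form a basis of null(M)
    U, s, Vt = np.linalg.svd(M)
    # pad the singular values with zeros so each row of Vt gets one
    s = np.concatenate([s, np.zeros(Vt.shape[0] - len(s))])
    return Vt[s < tol].T

# W = col(A), where the columns of A are u and v from Example 3
A = np.array([[1, -2], [-1, 3], [2, 4]])

# basis of W-perp = null(A^T)
N = null_space_basis(A.T)
print(N)

# each basis vector of W-perp is orthogonal to every column of A,
# and dim(W) + dim(W-perp) = 2 + 1 = 3
print(np.allclose(A.T @ N, 0))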

Exercises#

Exercises

  1. Let $\mathbf{a}=\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}$.

    a. Find a vector $\mathbf{w}$ that is in the opposite direction of $\mathbf{a}$ and has a magnitude of 2.

    b. Find two non-parallel vectors $\mathbf{u}$ and $\mathbf{v}$ which are both orthogonal to $\mathbf{a}$.

  2. Find two non-parallel vectors $\mathbf{u}$ and $\mathbf{v}$ which are both orthogonal to $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$.

  3. Let

$$\mathbf{v}=\begin{bmatrix} 1 \\ 2 \\ 7 \end{bmatrix} \quad \text{and} \quad \mathbf{w}=\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$

    a. Find $\mathbf{z}=\mathbf{v}-\left(\dfrac{\mathbf{v}\cdot \mathbf{w}}{\mathbf{w}\cdot \mathbf{w}}\right)\mathbf{w}$.

    b. Check that $\mathbf{z}$ is orthogonal to $\mathbf{w}$.

  4. If

$$W=\operatorname{span}\left\{\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}\right\},$$

find a basis for $W^{\perp}$.

9.6.2. Orthogonal Projection#

In this section, we explore the construction of orthogonal projections, which is a key step in many calculations involving orthogonality, and properties of projection maps.

Orthogonal Set#

A set of vectors $\{\mathbf{u}_1,\mathbf{u}_2,\dots,\mathbf{u}_p\}$ in $\mathbb{R}^n$ is called an orthogonal set if each pair of distinct vectors from the set is orthogonal, meaning that $\mathbf{u}_i\cdot \mathbf{u}_j=0$ whenever $i\neq j$. It is important to note that an orthogonal set consisting of nonzero vectors is always a linearly independent set. For that reason, if a basis for a subspace $W$ of $\mathbb{R}^n$ happens to be an orthogonal set, it is called an orthogonal basis.

In general, an orthogonal basis makes computations easier. The following theorem demonstrates how:

Theorem 5

Let

$$B=\{\mathbf{u}_1,\mathbf{u}_2,\dots,\mathbf{u}_p\}$$

be an orthogonal basis for a subspace $W\subseteq\mathbb{R}^n$. Moreover, suppose $\mathbf{y}\in W$ can be expressed as a linear combination of the basis elements:

$$\mathbf{y}=c_1\mathbf{u}_1+c_2\mathbf{u}_2+\cdots+c_p\mathbf{u}_p.$$

Then the weights $c_i$ are given by

$$c_i=\frac{\mathbf{u}_i\cdot \mathbf{y}}{\mathbf{u}_i\cdot \mathbf{u}_i}\qquad \forall i\in\{1,2,\dots,p\}.$$

Example 1

Write the vector

$$\mathbf{y}=\begin{bmatrix} 6 \\ 1 \\ -8 \end{bmatrix}$$

as a linear combination of the vectors in

$$S=\left\{\ \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix},\ \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix},\ \begin{bmatrix} -1 \\ -4 \\ 7 \end{bmatrix}\ \right\}.$$

Solution:

Since $S$ is an orthogonal set with three nonzero elements, it forms a basis for $\mathbb{R}^3$. Therefore, we can express $\mathbf{y}$ in terms of the vectors in $S$:

$$\mathbf{y}=c_1\begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}+c_2\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}+c_3\begin{bmatrix} -1 \\ -4 \\ 7 \end{bmatrix}.$$

We could solve this vector equation for the $c_i$’s using techniques we learned in Section 1.2. However, Theorem 5 suggests an alternative way of computing the coefficients:

$$c_i=\frac{\mathbf{u}_i\cdot \mathbf{y}}{\mathbf{u}_i\cdot \mathbf{u}_i}\qquad \text{for all } i\in\{1,2,3\}.$$

By substituting the appropriate values of $\mathbf{u}_i$ and $\mathbf{y}$ into this formula, we can compute the coefficients $c_i$.

y = np.array([6, 1, -8])

u1 = np.array([3, 1, 1])

u2 = np.array([-1, 2, 1])

u3 = np.array([-1,-4, 7])

#computing the weights:

c1 = np.dot(y,u1)/ np.dot(u1,u1)

c2 = np.dot(y,u2)/ np.dot(u2,u2)

c3 = np.dot(y,u3)/ np.dot(u3,u3)

print("c1 = ", c1, ", c2 = ", c1, ", c3 = ", c3)
c1 =  1.0 , c2 =  1.0 , c3 =  -1.0

Orthogonal Projections onto 1-dimensional spaces#

Given a nonzero vector $\mathbf{y}$ in $\mathbb{R}^n$, decomposing $\mathbf{y}$ into a sum of two vectors is a straightforward task. For example, for any choice of $\alpha,\beta\in\mathbb{R}$ such that $\alpha+\beta=1$, we have:

$$\mathbf{y}=\alpha\mathbf{y}+\beta\mathbf{y}.$$

A more challenging question is when we want to express $\mathbf{y}$ as the sum of two vectors, where one is a scalar multiple of a given vector $\mathbf{u}$ and the other is orthogonal to $\mathbf{u}$. More specifically, we seek to write:

$$\mathbf{y}=\mathbf{p}+\mathbf{z},$$

where $\mathbf{p}=\alpha\mathbf{u}$ for some scalar $\alpha$, and $\mathbf{z}\perp \mathbf{u}$. In other words, we are looking for a scalar $\alpha$ such that $\mathbf{z}:=\mathbf{y}-\alpha\mathbf{u}$ is orthogonal to $\mathbf{u}$, which can be expressed as:

$$(\mathbf{y}-\alpha\mathbf{u})\cdot \mathbf{u}=0.$$

Simplifying the equation above, we find:

$$\alpha=\frac{\mathbf{y}\cdot \mathbf{u}}{\mathbf{u}\cdot \mathbf{u}},\quad\text{and thus}\quad \mathbf{p}=\frac{\mathbf{y}\cdot \mathbf{u}}{\mathbf{u}\cdot \mathbf{u}}\,\mathbf{u},\quad\text{and}\quad \mathbf{z}=\mathbf{y}-\mathbf{p}.$$

The vector $\mathbf{p}$, sometimes denoted as $\operatorname{proj}_{\mathbf{u}}(\mathbf{y})$, is called the orthogonal projection of $\mathbf{y}$ onto $\mathbf{u}$, and $\mathbf{z}$ is referred to as the component of $\mathbf{y}$ orthogonal to $\mathbf{u}$.

Example 2

Suppose

$$\mathbf{y}=\begin{bmatrix} 7 \\ 6 \end{bmatrix} \quad\text{and}\quad \mathbf{u}=\begin{bmatrix} 4 \\ 2 \end{bmatrix}.$$

Compute $\operatorname{proj}_{\mathbf{u}}(\mathbf{y})$ and the component of $\mathbf{y}$ orthogonal to $\mathbf{u}$.

Solution:

# setup vectors

y = np.array([7,6])

u = np.array([4,2])

# projection of y onto u

proj_y_u = (np.dot(y,u)/ np.dot(u,u))* u
print( "the projection of y onto u is ", proj_y_u) 

z = y - proj_y_u

print("the component of y orthogonal to u is", z )
the projection of y onto u is  [8. 4.]
the component of y orthogonal to u is [-1.  2.]

Let’s plot $\operatorname{proj}_{\mathbf{u}}(\mathbf{y})$ and the component of $\mathbf{y}$ orthogonal to $\mathbf{u}$ in the $xy$-plane:

# Plot the vectors in a single figure
fig, ax = plt.subplots(figsize=(20, 10))

# vector y
ax.quiver(0, 0, 7, 6, angles='xy', scale_units='xy', scale=1, color='blue', label ='y')
ax.text(7.1,6.1,'$y$')

# vector u
ax.quiver(0, 0, 4, 2, angles='xy', scale_units='xy', scale=1, color='black')
ax.text(4.1,1.6,'$u$')

# orthogonal projection p
ax.quiver(0, 0, 8, 4, angles='xy', scale_units='xy', scale=1, color='red')
ax.text(8.1,4.1,'$p$')

# vector z
ax.quiver(0, 0, -1, 2, angles='xy', scale_units='xy', scale=1, color='green')
ax.text(-1.1,2.2,'$z = y - p$')

# A copy of vector z starting at the end of vector p
ax.quiver(8, 4, -1, 2, angles='xy', scale_units='xy', scale=1, color='green')



ax.set_xlim([-4, 10])
ax.set_ylim([-4, 10])
ax.set_aspect('equal')
ax.axhline(y=0, color='black', linestyle='--', linewidth=0.5)
ax.axvline(x=0, color='black', linestyle='--', linewidth=0.5)

ax.set_title('y = p + z')


plt.show()

Orthogonal Projections onto a general subspace#

Let’s take another look at the figure from Example 2. The projection of $\mathbf{y}$ onto any vector in the direction of $\mathbf{u}$ results in the same vector $\operatorname{proj}_{\mathbf{u}}(\mathbf{y})$. In fact, what matters to us is the line $L$ that contains $\mathbf{u}$, that is, $L=\operatorname{span}(\{\mathbf{u}\})$. With this in mind, we can interpret $\operatorname{proj}_{\mathbf{u}}(\mathbf{y})$ as the projection of $\mathbf{y}$ onto a subspace of $\mathbb{R}^n$ generated by $\mathbf{u}$, and we can write it as $\operatorname{proj}_{L}(\mathbf{y})$. In fact, the notion of projection can be generalized to any subspace of $\mathbb{R}^n$.

Theorem 6

Let $W$ be a subspace of $\mathbb{R}^n$. Then each $\mathbf{y}\in\mathbb{R}^n$ can be written uniquely as

$$\mathbf{y}=\mathbf{p}+\mathbf{z}$$

where $\mathbf{p}$ is in $W$ and $\mathbf{z}$ is in $W^{\perp}$. Moreover, if $B=\{\mathbf{u}_1,\mathbf{u}_2,\dots,\mathbf{u}_p\}$ is any orthogonal basis of $W$, then

$$\mathbf{p}=\frac{\mathbf{y}\cdot \mathbf{u}_1}{\mathbf{u}_1\cdot \mathbf{u}_1}\mathbf{u}_1+\frac{\mathbf{y}\cdot \mathbf{u}_2}{\mathbf{u}_2\cdot \mathbf{u}_2}\mathbf{u}_2+\cdots+\frac{\mathbf{y}\cdot \mathbf{u}_p}{\mathbf{u}_p\cdot \mathbf{u}_p}\mathbf{u}_p \qquad (*)$$

and $\mathbf{z}=\mathbf{y}-\mathbf{p}$.

The vector $\mathbf{p}$, sometimes denoted by $\operatorname{proj}_W(\mathbf{y})$, is called the orthogonal projection of $\mathbf{y}$ onto $W$, and $\mathbf{z}$ is called the component of $\mathbf{y}$ orthogonal to $W$.

Example 3

Suppose

$$\mathbf{u}_1=\begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix},\qquad \mathbf{u}_2=\begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix},$$

$W=\operatorname{span}(\mathbf{u}_1,\mathbf{u}_2)$, and

$$\mathbf{y}=\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.$$

Compute $\operatorname{proj}_W(\mathbf{y})$ and the component of $\mathbf{y}$ orthogonal to $W$.

Solution:

$\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal and form a basis for $W$; thus, we can use Theorem 6 to compute the projection of $\mathbf{y}$ onto $W$:

# setup vectors

u1 = np.array([2,5,-1])

u2 = np.array([-2,1,1])

y = np.array([1,2,3])

# projection of y onto W

proj_y_W = (np.dot(y,u1)/ np.dot(u1,u1))* u1  +  (np.dot(y,u2)/ np.dot(u2,u2))* u2

print( "The projection of y onto W is ", proj_y_W, '\n') 

# component of y orthogonal to W
z = y - proj_y_W

print("The component of y orthogonal to W is", z )
The projection of y onto W is  [-0.4  2.   0.2] 

The component of y orthogonal to W is [1.4 0.  2.8]

Note that if $\mathbf{y}$ happens to be in $W$, then $\operatorname{proj}_W(\mathbf{y})=\mathbf{y}$. We can verify this fact for $\mathbf{u}_1$, which we know belongs to $W$:

proj_u_W = (np.dot(u1,u1)/ np.dot(u1,u1))* u1  +  (np.dot(u1,u2)/ np.dot(u2,u2))* u2
print(proj_u_W)
[ 2.  5. -1.]

The Best Approximation Theorem#

Projections can be viewed as linear transformations:

Consider a subspace $W$ of $\mathbb{R}^n$, and let $T_W:\mathbb{R}^n\to W$ be the mapping that sends a vector $\mathbf{y}$ to its orthogonal projection $\operatorname{proj}_W(\mathbf{y})$. It is known that $T_W$ is a linear transformation. If $P$ represents the matrix of $T_W$, then $P^2=P$. More precisely,

$$P\mathbf{w}=\mathbf{w} \qquad \forall \mathbf{w}\in W$$
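A quick numerical check of these two facts, reusing the orthogonal basis $\{\mathbf{u}_1,\mathbf{u}_2\}$ from Example 3 (building $P$ as a sum of rank-one projections is a sketch of ours, not part of the original text):

# projection matrix onto W = span{u1, u2}, built from the orthogonal basis
u1 = np.array([2, 5, -1])
u2 = np.array([-2, 1, 1])

P = np.outer(u1, u1)/np.dot(u1, u1) + np.outer(u2, u2)/np.dot(u2, u2)

# P does not move elements of W ...
print(np.allclose(P @ u1, u1), np.allclose(P @ u2, u2))

# ... and P is idempotent: P^2 = P
print(np.allclose(P @ P, P))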

Informally, we say $P$ does not move elements of $W$. The following theorem states that if $\mathbf{y}$ is not in $W$, then $P$ sends it to the closest point in $W$ to $\mathbf{y}$.

Theorem 7 (The Best Approximation Theorem)

Let $W$ be a subspace of $\mathbb{R}^n$, and let $\mathbf{y}\in\mathbb{R}^n$. The projection of $\mathbf{y}$ onto $W$, denoted as $\mathbf{p}=\operatorname{proj}_W(\mathbf{y})$, is the closest point in $W$ to $\mathbf{y}$ in the following sense:

$$\|\mathbf{y}-\mathbf{p}\|\leq \|\mathbf{y}-\mathbf{w}\| \qquad \forall \mathbf{w}\in W$$

In particular, if $\mathbf{y}\in W$, then the closest point is exactly $\mathbf{y}$. We refer to $\mathbf{p}$ as the best approximation to $\mathbf{y}$ by elements of $W$. The real number $\|\mathbf{y}-\mathbf{p}\|$ is interpreted as the distance between the vector $\mathbf{y}$ and the subspace $W$.

Example 4

Suppose

$$\mathbf{u}_1=\begin{bmatrix} 5 \\ -2 \\ 1 \end{bmatrix},\qquad \mathbf{u}_2=\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix},$$

$W=\operatorname{span}(\mathbf{u}_1,\mathbf{u}_2)$, and

$$\mathbf{y}=\begin{bmatrix} -1 \\ -5 \\ 10 \end{bmatrix}.$$

  1. What is the closest point to $\mathbf{y}$ in $W$?

  2. Find the distance between $\mathbf{y}$ and $W$.

Solution

  1. The closest point is the projection of y onto W:

# setup vectors

y = np.array([-1, -5, 10])

u1 = np.array([5, -2, 1])

u2 = np.array([1 , 2, -1])

# projection of y onto W

p = (np.dot(y,u1)/ np.dot(u1,u1))* u1  +  (np.dot(y,u2)/ np.dot(u2,u2))* u2

print( "The projection of y onto W is ", p, '\n') 


# component of y orthogonal to W
z = y - p

print("The component of y orthogonal to W is", z, '\n' )

# distance between y and W
r = np.sqrt(np.dot(z,z))
print("The distance between y and W is", r )
The closest point in $W$ to $\mathbf{y}$ is $\mathbf{p}=\operatorname{proj}_W(\mathbf{y})=(-1,-8,4)$, the component of $\mathbf{y}$ orthogonal to $W$ is $\mathbf{z}=(0,3,6)$, and the distance between $\mathbf{y}$ and $W$ is $\|\mathbf{z}\|=\sqrt{45}\approx 6.71$.

It is possible to simplify the $(*)$ formula in Theorem 6, which computes the projection. First, we need the following definition:

A basis $B$ for a subspace $W$ is called an orthonormal basis if $B$ is an orthogonal basis and every element of $B$ is a unit vector. Any orthogonal basis can be turned into an orthonormal basis by normalizing every basis element.

Theorem 8

Let $W$ be a subspace of $\mathbb{R}^n$ and let $B=\{\mathbf{u}_1,\mathbf{u}_2,\dots,\mathbf{u}_p\}$ be an orthonormal basis of $W$. Then, for every $\mathbf{y}\in\mathbb{R}^n$, the projection of $\mathbf{y}$ onto $W$ is

$$\operatorname{proj}_W(\mathbf{y})=(\mathbf{y}\cdot \mathbf{u}_1)\,\mathbf{u}_1+(\mathbf{y}\cdot \mathbf{u}_2)\,\mathbf{u}_2+\cdots+(\mathbf{y}\cdot \mathbf{u}_p)\,\mathbf{u}_p$$

We can also represent this formula in an even more compact form:

if $U=[\mathbf{u}_1,\mathbf{u}_2,\dots,\mathbf{u}_p]$ is the matrix whose columns are the basis vectors, then

$$\operatorname{proj}_W(\mathbf{y})=UU^T\mathbf{y} \qquad (**)$$

A matrix whose columns are orthonormal, such as $U$, is called an orthogonal matrix. For an $n\times n$ orthogonal matrix $U$, besides $(**)$, we have

$$U^TU=I_n$$
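As a quick check of $(**)$, we can normalize the orthogonal basis from Example 3, stack the unit vectors as the columns of $U$, and compare $UU^T\mathbf{y}$ with the projection computed earlier (a sketch; the variable names are our own):

# orthonormal basis for W = span{u1, u2} from Example 3
u1 = np.array([2, 5, -1])
u2 = np.array([-2, 1, 1])
y  = np.array([1, 2, 3])

e1 = u1 / np.sqrt(np.dot(u1, u1))
e2 = u2 / np.sqrt(np.dot(u2, u2))

# U has orthonormal columns
U = np.column_stack([e1, e2])

# projection of y onto W via the compact formula (**);
# compare with proj_y_W computed in Example 3
print(U @ U.T @ y)

# for a matrix with orthonormal columns, U^T U is the identity
print(U.T @ U)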

Exercises#

Exercises

  1. Let

$$B=\left\{\begin{bmatrix} 1 \\ 1 \end{bmatrix},\ \begin{bmatrix} -1 \\ 1 \end{bmatrix}\right\}$$

be a basis of $\mathbb{R}^2$.

    a. Check that this is an orthogonal basis.

    b. Find the coordinates of the vector

$$\begin{bmatrix} 3 \\ 5 \end{bmatrix}$$

with respect to the basis $B$.

  2. Suppose

$$\mathbf{u}_1=\begin{bmatrix} 7 \\ 1 \\ 4 \end{bmatrix},\qquad \mathbf{u}_2=\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix},$$

$W=\operatorname{span}(\mathbf{u}_1,\mathbf{u}_2)$, and

$$\mathbf{y}=\begin{bmatrix} 9 \\ 1 \\ 6 \end{bmatrix}.$$

    a. Show that $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal.

    b. What is the closest point to $\mathbf{y}$ in $W$?

    c. Find the distance between $W$ and $\mathbf{y}$.

    d. Convert $\{\mathbf{u}_1,\mathbf{u}_2\}$ into an orthonormal basis for $W$.

    e. Compute the projection of $\mathbf{y}$ onto $W$ using the $(**)$ formula given in Theorem 8.

    f. Write $\mathbf{y}$ as a sum of two vectors, one in $W$ and the other in $W^{\perp}$.

9.6.3. The Gram–Schmidt Process#

In the previous sections, we saw the importance of working with orthogonal and orthonormal bases. In this section, we explore the Gram–Schmidt process, a simple algorithm for constructing an orthogonal or orthonormal basis for any nonzero subspace of $\mathbb{R}^n$.

Idea: construction of an orthogonal basis from a basis with two elements#

To see the idea behind the Gram–Schmidt process, let’s review Example 2 from Section 9.6.2:

Suppose $\mathbf{y}=\begin{bmatrix} 7 \\ 6 \end{bmatrix}$ and $\mathbf{u}=\begin{bmatrix} 4 \\ 2 \end{bmatrix}$. $\mathbf{y}$ and $\mathbf{u}$ are linearly independent and form a basis for $\mathbb{R}^2$. Clearly, as shown in the figure below, $\mathbf{y}$ and $\mathbf{u}$ are not orthogonal; however, $\mathbf{u}$ and $\mathbf{z}$ (the component of $\mathbf{y}$ orthogonal to $\mathbf{u}$) form an orthogonal basis for $\mathbb{R}^2$.

# setup vectors

y = np.array([7,6])

u = np.array([4,2])

# projection of y onto u

proj_y_u = (np.dot(y,u)/ np.dot(u,u))* u
print( "The projection of y onto u is ", proj_y_u) 

z = y - proj_y_u

print("The component of y orthogonal to u is", z )


# Plot the vectors in a single figure

fig, ax = plt.subplots(figsize=(10, 5))

# vector y
ax.quiver(0, 0, 7, 6, angles='xy', scale_units='xy', scale=1, color='blue', label ='y')
ax.text(7.1,6.1,'$y$')

# vector u
ax.quiver(0, 0, 4, 2, angles='xy', scale_units='xy', scale=1, color='black')
ax.text(4.1,1.6,'$u$')

# orthogonal projection p
ax.quiver(0, 0, 8, 4, angles='xy', scale_units='xy', scale=1, color='red')
ax.text(8.1,4.1,'$p$')

# vector z
ax.quiver(0, 0, -1, 2, angles='xy', scale_units='xy', scale=1, color='green')
ax.text(-1.1,2.2,'$z = y - p$')

# A copy of vector z starting at the end of vector p
ax.quiver(8, 4, -1, 2, angles='xy', scale_units='xy', scale=1, color='green')



ax.set_xlim([-4, 10])
ax.set_ylim([-4, 10])
ax.set_aspect('equal')
ax.axhline(y=0, color='black', linestyle='--', linewidth=0.5)
ax.axvline(x=0, color='black', linestyle='--', linewidth=0.5)

ax.set_title('y = p + z')


#plt.tight_layout()
plt.show()
The projection of y onto u is  [8. 4.]
The component of y orthogonal to u is [-1.  2.]

Therefore, we have converted a basis for $\mathbb{R}^2$ into an orthogonal basis for $\mathbb{R}^2$.

Construction of an orthogonal basis from a basis with three elements#

Example 1

Let

$$\mathbf{x}_1=\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix},\qquad \mathbf{x}_2=\begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix},\qquad \mathbf{x}_3=\begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix},$$

and

$$W=\operatorname{span}(\{\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3\}).$$

Construct an orthogonal basis for $W$.

Solution:

The set $\{\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3\}$ is linearly independent and forms a basis for $W$. We will construct an orthogonal set of vectors $\{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\}$ such that $W=\operatorname{span}(\{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\})$.

Step 1:

Define $\mathbf{v}_1:=\mathbf{x}_1$ and $W_1:=\operatorname{span}(\mathbf{v}_1)$.

Step 2: We seek a vector $\mathbf{v}_2$ orthogonal to $\mathbf{v}_1$. Project $\mathbf{x}_2$ onto $W_1$; the component of $\mathbf{x}_2$ orthogonal to the subspace $W_1$ has the desired property:

$$\mathbf{v}_2=\mathbf{x}_2-\operatorname{proj}_{W_1}(\mathbf{x}_2), \qquad \mathbf{v}_2\perp \mathbf{v}_1$$

Now, let $W_2=\operatorname{span}(\mathbf{v}_1,\mathbf{v}_2)$.

# setup vectors

x1 = np.array([1,1,1,1])

x2 = np.array([0,1,1,1])

x3 = np.array([0,0,1,1])

v1 = x1
# projection of x2 onto x1

proj_x2_v1 = (np.dot(x2,v1)/ np.dot(v1,v1))* v1
print( "The projection of x2 onto x1 is ", proj_x2_v1, '\n') 

v2 = x2 - proj_x2_v1

print("The component of x1 orthogonal to W1 is", v2 )
The projection of x2 onto x1 is  [0.75 0.75 0.75 0.75] 

The component of x1 orthogonal to W1 is [-0.75  0.25  0.25  0.25]

Step 3:

We can use a similar approach to construct $\mathbf{v}_3$. We find the orthogonal projection of $\mathbf{x}_3$ onto $W_2$ and define $\mathbf{v}_3$ as the component of $\mathbf{x}_3$ that is orthogonal to the subspace $W_2$:

$$\mathbf{v}_3=\mathbf{x}_3-\operatorname{proj}_{W_2}(\mathbf{x}_3)$$

Since both $\mathbf{x}_3$ and $\operatorname{proj}_{W_2}(\mathbf{x}_3)$ are elements of $W$, $\mathbf{v}_3$ is also in $W$. Moreover, by definition, $\mathbf{v}_3$ is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$. Thus, $\{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\}$ forms an orthogonal basis for $W$.

# projection of x3 onto W2

proj_x3_W2 = (np.dot(x3,v1)/ np.dot(v1,v1))* v1  +  (np.dot(x3,v2)/ np.dot(v2,v2))* v2

print( "the projection of x3 onto W2 is ", proj_x3_W2, '\n') 

# component of y orthogonal to p
v3 = x3 - proj_x3_W2

print("the component of x3 orthogonal to W2 is", v3 )
the projection of x3 onto W2 is  [0.         0.66666667 0.66666667 0.66666667] 

the component of x3 orthogonal to W2 is [ 0.         -0.66666667  0.33333333  0.33333333]

Step 4 (Optional)

Normalizing the basis elements $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, we get unit vectors $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$, respectively. $\{\mathbf{e}_1,\mathbf{e}_2,\mathbf{e}_3\}$ is an orthonormal basis for $W$.

# The unit vector of v1
e1 = v1 / np.sqrt(np.dot(v1,v1))

# The unit vector of v2
e2 = v2 / np.sqrt(np.dot(v2,v2))

# The unit vector of v3
e3 = v3 / np.sqrt(np.dot(v3,v3))

print("e1 = ", e1)
print("e2 = ", e2)
print("e3 = ", e3)
e1 =  [0.5 0.5 0.5 0.5]
e2 =  [-0.8660254   0.28867513  0.28867513  0.28867513]
e3 =  [ 0.         -0.81649658  0.40824829  0.40824829]

The Gram–Schmidt Process#

Theorem 9 (the Gram–Schmidt algorithm)

Let $W\subseteq\mathbb{R}^n$ be a nonzero subspace, and let $\{\mathbf{x}_1,\mathbf{x}_2,\dots,\mathbf{x}_p\}$ be a basis for $W$. Define:

$$\begin{aligned}
\mathbf{v}_1&=\mathbf{x}_1 &&\text{and} & W_1&=\operatorname{span}(\mathbf{v}_1),\\
\mathbf{v}_2&=\mathbf{x}_2-\operatorname{proj}_{W_1}(\mathbf{x}_2) &&\text{and} & W_2&=\operatorname{span}(\{\mathbf{v}_1,\mathbf{v}_2\}),\\
&\;\;\vdots\\
\mathbf{v}_p&=\mathbf{x}_p-\operatorname{proj}_{W_{p-1}}(\mathbf{x}_p) &&\text{and} & W_p&=\operatorname{span}(\{\mathbf{v}_1,\mathbf{v}_2,\dots,\mathbf{v}_p\}).
\end{aligned} \tag{9.2}$$

Then $W=W_p$, and $\{\mathbf{v}_1,\mathbf{v}_2,\dots,\mathbf{v}_p\}$ is an orthogonal basis for $W$. Moreover, if $\mathbf{e}_1,\mathbf{e}_2,\dots,\mathbf{e}_p$ are the unit vectors of $\mathbf{v}_1,\mathbf{v}_2,\dots,\mathbf{v}_p$ respectively, then $\{\mathbf{e}_1,\mathbf{e}_2,\dots,\mathbf{e}_p\}$ is an orthonormal basis for $W$.
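The process translates directly into code. Here is a small sketch of the algorithm as a Python function (the function gram_schmidt and its details are our own, not part of the original text); applied to the vectors of Example 1 it reproduces the orthonormal basis found above:

def gram_schmidt(X):
    # X is an n x p array whose columns are linearly independent;
    # returns an n x p array whose columns are orthonormal and span col(X)
    n, p = X.shape
    E = np.zeros((n, p))
    for j in range(p):
        v = X[:, j].astype(float)
        # subtract the projection of x_j onto the previously built e_i's
        for i in range(j):
            v = v - np.dot(X[:, j], E[:, i]) * E[:, i]
        # normalize the orthogonal component
        E[:, j] = v / np.sqrt(np.dot(v, v))
    return E

X = np.transpose([x1, x2, x3])   # columns are x1, x2, x3 from Example 1
print(gram_schmidt(X))

Collecting the orthonormal vectors $\mathbf{e}_1,\dots,\mathbf{e}_p$ as the columns of a matrix $Q$ and setting $R=Q^TA$, where $A$ has columns $\mathbf{x}_1,\dots,\mathbf{x}_p$, yields the QR factorization $A=QR$; this is what the next cell computes.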

# setup A
A = np.transpose([x1, x2, x3])

# setup Q
Q = np.transpose([e1, e2, e3])

# Compute R= Q^T*A
R = np.transpose(Q) @ A
R
array([[2.00000000e+00, 1.50000000e+00, 1.00000000e+00],
       [1.11022302e-16, 8.66025404e-01, 5.77350269e-01],
       [1.11022302e-16, 1.11022302e-16, 8.16496581e-01]])

Note that the elements below the diagonal in R are very small and close to zero. In fact, R should be an upper triangular matrix. Python does not display the exact zeros we expect because floating-point arithmetic introduces tiny round-off errors: most decimal fractions cannot be represented exactly as binary fractions. To address this issue, we can set the very small elements of R to zero.

eps = 0.000001 
R[np.abs(R) < eps] = 0
R
array([[2.        , 1.5       , 1.        ],
       [0.        , 0.8660254 , 0.57735027],
       [0.        , 0.        , 0.81649658]])

Numerical Note

When the Gram–Schmidt process is run on a computer, round-off error can build up as the vectors $\mathbf{v}_k$ are calculated one by one. Specifically, for large and unequal values of $k$ and $j$, the dot products $\mathbf{v}_k\cdot \mathbf{v}_j$ may not be sufficiently close to zero, leading to a loss of orthogonality. This loss of orthogonality can be reduced substantially by rearranging the order of the calculations.

We can also use numpy.linalg.qr in Python to compute the QR factorization of a matrix. This function provides an efficient and numerically stable factorization (it is based on Householder reflections rather than the classical Gram–Schmidt process), avoiding unnecessary loss of orthogonality.

Q, R = np.linalg.qr(A)

print('Q = \n', Q, '\n\n')

print('R = \n', R)
Q = 
 [[-0.5         0.8660254   0.        ]
 [-0.5        -0.28867513  0.81649658]
 [-0.5        -0.28867513 -0.40824829]
 [-0.5        -0.28867513 -0.40824829]] 


R = 
 [[-2.         -1.5        -1.        ]
 [ 0.         -0.8660254  -0.57735027]
 [ 0.          0.         -0.81649658]]

Exercises#

Exercises

  1. Let

$$\mathbf{x}_1=\begin{bmatrix} 3 \\ 1 \\ 1 \\ 3 \end{bmatrix},\qquad \mathbf{x}_2=\begin{bmatrix} 5 \\ 1 \\ 5 \\ 7 \end{bmatrix},\qquad \mathbf{x}_3=\begin{bmatrix} 1 \\ 1 \\ 2 \\ 8 \end{bmatrix},$$

and

$$W=\operatorname{span}(\{\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3\}).$$

Construct an orthonormal basis for $W$.

  2. Let $A$ be the matrix whose columns are $\mathbf{x}_1$, $\mathbf{x}_2$, $\mathbf{x}_3$ from Exercise 1:

$$A=\begin{bmatrix} 3 & 5 & 1 \\ 1 & 1 & 1 \\ 1 & 5 & 2 \\ 3 & 7 & 8 \end{bmatrix}.$$

Find a QR decomposition of $A$.

  3. Let $A=QR$, where $Q$ is an $m\times n$ matrix with orthogonal columns, and $R$ is an $n\times n$ matrix. Show that if the columns of $A$ are linearly dependent, then $R$ cannot be invertible.

9.6.4. Least-Squares Problems#

Linear systems arising in applications are often inconsistent. In such situations, the best one can do is to find a vector $\mathbf{x}$ that makes $A\mathbf{x}$ as close as possible to $\mathbf{b}$. We think of $A\mathbf{x}$ as an approximation of $\mathbf{b}$. The smaller $\|\mathbf{b}-A\mathbf{x}\|$, the better the approximation. Therefore, we are looking for a vector $\hat{\mathbf{x}}$ such that $\|\mathbf{b}-A\hat{\mathbf{x}}\|$ is as small as possible. Such an $\hat{\mathbf{x}}$ is called a least-squares solution of $A\mathbf{x}=\mathbf{b}$. The name is motivated by the fact that $\|\mathbf{b}-A\hat{\mathbf{x}}\|$ is the square root of a sum of squares. In this section, we explore this idea further.

Suppose $A\mathbf{x}=\mathbf{b}$ is inconsistent, which implies $\mathbf{b}\notin\operatorname{col}(A)$. Note that no matter what $\mathbf{x}$ is, $A\mathbf{x}$ lies in $\operatorname{col}(A)$. From Section 9.6.2, we know that the closest point to $\mathbf{b}$ in $\operatorname{col}(A)$ is the projection of $\mathbf{b}$ onto $\operatorname{col}(A)$ (the best approximation problem). Let $\hat{\mathbf{b}}=\operatorname{proj}_{\operatorname{col}(A)}(\mathbf{b})$. Since $A\mathbf{x}=\hat{\mathbf{b}}$ is consistent, there is an $\hat{\mathbf{x}}$ such that $A\hat{\mathbf{x}}=\hat{\mathbf{b}}$; this $\hat{\mathbf{x}}$ is a least-squares solution of $A\mathbf{x}=\mathbf{b}$. Recall that $\mathbf{b}-\hat{\mathbf{b}}$ is orthogonal to $\operatorname{col}(A)$, and thus so is $\mathbf{b}-A\hat{\mathbf{x}}$. In other words, $\mathbf{b}-A\hat{\mathbf{x}}$ is orthogonal to each column of $A$, and we have:

$$A^T(\mathbf{b}-A\hat{\mathbf{x}})=\mathbf{0} \quad\text{or}\quad A^TA\mathbf{x}=A^T\mathbf{b}.$$

The equation $A^TA\mathbf{x}=A^T\mathbf{b}$ is called the normal equation for $A\mathbf{x}=\mathbf{b}$.

Theorem 10

The set of least-squares solutions of $A\mathbf{x}=\mathbf{b}$ coincides with the nonempty set of solutions of the normal equation $A^TA\mathbf{x}=A^T\mathbf{b}$.

Example 1

Find a least-squares solution of the inconsistent system $A\mathbf{x}=\mathbf{b}$ for

$$A=\begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad \mathbf{b}=\begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix}.$$

Solution: First, let’s find $A^TA$ and $A^T\mathbf{b}$:

A = np.array([[4,0], [0,2], [1,1]])

b = np.array([[2], [0], [11]])

# compute A^TA
ATA = A.transpose() @ A 

print("A^TA = \n", ATA)

# compute A^Tb
ATb = A.transpose() @ b 

print("\n A^Tb = \n", ATb)
A^TA = 
 [[17  1]
 [ 1  5]]

 A^Tb = 
 [[19]
 [11]]

Thus, the normal equation for $A\mathbf{x}=\mathbf{b}$ is given by:

$$\begin{bmatrix} 17 & 1 \\ 1 & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}=\begin{bmatrix} 19 \\ 11 \end{bmatrix}.$$

To solve this equation, we can use row operations; alternatively, since $A^TA$ is invertible and $2\times 2$, we can simply invert it. In many calculations $A^TA$ is invertible, but this is not always the case. In Theorem 11, we will see when this is true. The least-squares solution $\hat{\mathbf{x}}$ is given by:

$$\hat{\mathbf{x}}=(A^TA)^{-1}A^T\mathbf{b}.$$
# computing x_hat

x_hat = np.linalg.inv(ATA) @ ATb
x_hat
array([[1.],
       [2.]])

Example 2

Find a least-squares solution of the inconsistent system

$$A\mathbf{x}=\mathbf{b}$$

for

$$A=\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad \mathbf{b}=\begin{bmatrix} -3 \\ -1 \\ 0 \\ 2 \\ 5 \\ 1 \end{bmatrix}.$$

Solution

Let’s set up the normal equation for $A\mathbf{x}=\mathbf{b}$ and find a solution for it.

A = np.array([[1,1,0,0], [1,1,0,0], [1,0,1,0],[1,0,1,0], [1,0,0,1],[1,0,0,1]])

b = np.array([[-3,-1,0,2,5,1]])


# compute A^TA
ATA = A.T @ A
print('A^TA = \n', ATA)

# compute A^Tb
ATb = A.T @ b.T
print('\n A^Tb = \n', ATb)
A^TA = 
 [[6 2 2 2]
 [2 2 0 0]
 [2 0 2 0]
 [2 0 0 2]]

 A^Tb = 
 [[ 4]
 [-4]
 [ 2]
 [ 6]]

Thus, the normal equation is

$$\begin{bmatrix} 6 & 2 & 2 & 2 \\ 2 & 2 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 2 & 0 & 0 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}=\begin{bmatrix} 4 \\ -4 \\ 2 \\ 6 \end{bmatrix}$$

To solve this equation we form its augmented matrix:

# augmented matrix [A^TA | A^Tb]
M = np.concatenate((ATA, ATb), axis=1)
M
array([[ 6,  2,  2,  2,  4],
       [ 2,  2,  0,  0, -4],
       [ 2,  0,  2,  0,  2],
       [ 2,  0,  0,  2,  6]])

Recall the elementary row operation functions:

# Swap two rows

def swap(matrix, row1, row2):
    
    copy_matrix=np.copy(matrix).astype('float64') 
  
    copy_matrix[row1,:] = matrix[row2,:]
    copy_matrix[row2,:] = matrix[row1,:]
    
    return copy_matrix


# Multiply all entries in a row by a nonzero number


def scale(matrix, row, scalar):
    copy_matrix=np.copy(matrix).astype('float64') 
    copy_matrix[row,:] = scalar*matrix[row,:]  
    return copy_matrix

# Replace row1 by the sum of itself and a multiple of row2

def replace(matrix, row1, row2, scalar):
    copy_matrix=np.copy(matrix).astype('float64')
    copy_matrix[row1] = matrix[row1]+ scalar * matrix[row2] 
    return copy_matrix
M1 = swap(M, 0, 2)
M1
array([[ 2.,  0.,  2.,  0.,  2.],
       [ 2.,  2.,  0.,  0., -4.],
       [ 6.,  2.,  2.,  2.,  4.],
       [ 2.,  0.,  0.,  2.,  6.]])
M2 = scale(M1, 0, 1/2)
M2
array([[ 1.,  0.,  1.,  0.,  1.],
       [ 2.,  2.,  0.,  0., -4.],
       [ 6.,  2.,  2.,  2.,  4.],
       [ 2.,  0.,  0.,  2.,  6.]])
M3 = replace(M2, 1, 0, -2)
M3
array([[ 1.,  0.,  1.,  0.,  1.],
       [ 0.,  2., -2.,  0., -6.],
       [ 6.,  2.,  2.,  2.,  4.],
       [ 2.,  0.,  0.,  2.,  6.]])
M4 = replace(M3, 2, 0, -6)
M4
array([[ 1.,  0.,  1.,  0.,  1.],
       [ 0.,  2., -2.,  0., -6.],
       [ 0.,  2., -4.,  2., -2.],
       [ 2.,  0.,  0.,  2.,  6.]])
M5 = replace(M4, 3, 0, -2)
M5
array([[ 1.,  0.,  1.,  0.,  1.],
       [ 0.,  2., -2.,  0., -6.],
       [ 0.,  2., -4.,  2., -2.],
       [ 0.,  0., -2.,  2.,  4.]])
M6 = scale(M5, 1, 1/2)
M6
array([[ 1.,  0.,  1.,  0.,  1.],
       [ 0.,  1., -1.,  0., -3.],
       [ 0.,  2., -4.,  2., -2.],
       [ 0.,  0., -2.,  2.,  4.]])
M7 = replace(M6, 2, 1, -2)
M7
array([[ 1.,  0.,  1.,  0.,  1.],
       [ 0.,  1., -1.,  0., -3.],
       [ 0.,  0., -2.,  2.,  4.],
       [ 0.,  0., -2.,  2.,  4.]])
M8 = scale(M7, 2, -1/2)
M8
array([[ 1.,  0.,  1.,  0.,  1.],
       [ 0.,  1., -1.,  0., -3.],
       [-0., -0.,  1., -1., -2.],
       [ 0.,  0., -2.,  2.,  4.]])
M9 = replace(M8, 3, 2, 2)
M9
array([[ 1.,  0.,  1.,  0.,  1.],
       [ 0.,  1., -1.,  0., -3.],
       [-0., -0.,  1., -1., -2.],
       [ 0.,  0.,  0.,  0.,  0.]])
M10 = replace(M9, 0, 2, -1)
M10
array([[ 1.,  0.,  0.,  1.,  3.],
       [ 0.,  1., -1.,  0., -3.],
       [-0., -0.,  1., -1., -2.],
       [ 0.,  0.,  0.,  0.,  0.]])
M11 = replace(M10, 1, 2, 1)
M11
array([[ 1.,  0.,  0.,  1.,  3.],
       [ 0.,  1.,  0., -1., -5.],
       [-0., -0.,  1., -1., -2.],
       [ 0.,  0.,  0.,  0.,  0.]])

The general solution is $x_1=3-x_4$, $x_2=-5+x_4$, $x_3=-2+x_4$, with $x_4$ a free parameter. So the general least-squares solution of $A\mathbf{x}=\mathbf{b}$ has the form:

$$\hat{\mathbf{x}}=\begin{bmatrix} 3 \\ -5 \\ -2 \\ 0 \end{bmatrix}+x_4\begin{bmatrix} -1 \\ 1 \\ 1 \\ 1 \end{bmatrix}.$$

Any linear system $A\mathbf{x}=\mathbf{b}$ admits at least one least-squares solution, because the orthogonal projection $\hat{\mathbf{b}}=\operatorname{proj}_{\operatorname{col}(A)}(\mathbf{b})$ always exists. The least-squares solution of $A\mathbf{x}=\mathbf{b}$ in Example 1 was unique, while the linear system in Example 2 has infinitely many least-squares solutions.
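NumPy also provides a built-in least-squares solver, numpy.linalg.lstsq, which returns the least-squares solution of minimum norm. Below is a quick sketch applying it to the system of Example 2 (A and b are redefined so the cell is self-contained); the result matches the general form found above for a particular value of $x_4$:

# minimum-norm least-squares solution of the system in Example 2
A = np.array([[1,1,0,0], [1,1,0,0], [1,0,1,0], [1,0,1,0], [1,0,0,1], [1,0,0,1]])
b = np.array([-3, -1, 0, 2, 5, 1])

x_min, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x_min)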

The next theorem gives useful criteria for determining when there is only one least-squares solution.

Theorem 11

Let $A$ be an $m\times n$ matrix. The following statements are equivalent:

  1. The equation $A\mathbf{x}=\mathbf{b}$ has a unique least-squares solution for each $\mathbf{b}\in\mathbb{R}^m$.

  2. The columns of $A$ are linearly independent.

  3. The matrix $A^TA$ is invertible.

In any of these cases, the least-squares solution $\hat{\mathbf{x}}$ is given by:

$$\hat{\mathbf{x}}=(A^TA)^{-1}A^T\mathbf{b}.$$

Moreover, if $A=QR$ is a QR-factorization of $A$, then the least-squares solution $\hat{\mathbf{x}}$ is given by:

$$\hat{\mathbf{x}}=R^{-1}Q^T\mathbf{b}. \qquad (*)$$

Example 3

Let $A=\begin{bmatrix} 1 & 3 & 5 \\ 1 & 1 & 0 \\ 1 & 1 & 2 \\ 1 & 3 & 3 \end{bmatrix}$ and $\mathbf{b}=\begin{bmatrix} 3 \\ 5 \\ 7 \\ -3 \end{bmatrix}$. Find a least-squares solution of $A\mathbf{x}=\mathbf{b}$.

Solution:

A QR-factorization of $A$ can be obtained as in Section 9.6.3, using numpy.linalg.qr():

A = np.array([[1,3,5], [1,1,0], [1,1,2],[1,3,3]])

# QR factorization

Q, R = np.linalg.qr(A)

print('Q = \n', Q, '\n')

print('R = \n', R)
Q = 
 [[-0.5  0.5 -0.5]
 [-0.5 -0.5  0.5]
 [-0.5 -0.5 -0.5]
 [-0.5  0.5  0.5]] 

R = 
 [[-2. -4. -5.]
 [ 0.  2.  3.]
 [ 0.  0. -2.]]

Now we compute $\hat{\mathbf{x}}=R^{-1}Q^T\mathbf{b}$:

# setup
b = np.array([[3, 5, 7, -3]]).T

#computing x_hat

x_hat = np.linalg.inv(R) @ Q.T @ b
x_hat
array([[10.],
       [-6.],
       [ 2.]])

Numerical Note#

Since $R$ in $(*)$ is an upper triangular matrix, we can alternatively compute $\hat{\mathbf{x}}$ by finding the exact solutions of:

$$R\hat{\mathbf{x}}=Q^T\mathbf{b}. \qquad (**)$$

For large matrices, solving $(**)$ by back-substitution or row operations is faster than computing $R^{-1}$ and using $(*)$.
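For instance, assuming SciPy is available and Q, R, and b from Example 3 are still defined, the triangular system $(**)$ can be solved directly with scipy.linalg.solve_triangular (a sketch, not part of the original text):

from scipy.linalg import solve_triangular

# solve R x_hat = Q^T b by back-substitution instead of inverting R
x_hat = solve_triangular(R, Q.T @ b)
x_hat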

Exercises#

Exercises

  1. Let

A=[133151112172]

and

b=[535].

a. Find a least-squares solution of $A\mathbf{x}=\mathbf{b}$.

b. Compute the associated least-squares error $\|\mathbf{b}-A\hat{\mathbf{x}}\|$.

  2. Describe all least-squares solutions of the system

$$\begin{aligned} x+y&=2\\ x+y&=4 \end{aligned}$$