9.6. Orthogonality#
import numpy as np
from matplotlib import pyplot as plt
9.6.1. Inner Products and Orthogonality#
In this section, we explore inner products and orthogonality. Inner products allow us to define intuitive geometric concepts such as length, distance, and perpendicularity in $\mathbb{R}^n$.
Inner product#
Let $\mathbf{u} = (u_1, u_2, \dots, u_n)$ and $\mathbf{v} = (v_1, v_2, \dots, v_n)$ be vectors in $\mathbb{R}^n$; then the dot product (inner product) of $\mathbf{u}$ and $\mathbf{v}$ is the scalar

$$\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \dots + u_n v_n.$$
Example 1
Compute $\mathbf{u} \cdot \mathbf{v}$ for $\mathbf{u} = (1, -1, 2)$ and $\mathbf{v} = (-2, 3, 4)$.
Solution
We use numpy.dot()
to compute the dot product in Python:
import numpy as np
u = np.array([1, -1, 2])
v = np.array([-2, 3, 4])
uv = np.dot(u,v)
uv
3
Theorem 1 (Properties of the dot product)
Let $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{R}^n$ and let $c$ be a scalar. Then

1. $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$
2. $(\mathbf{u} + \mathbf{v}) \cdot \mathbf{w} = \mathbf{u} \cdot \mathbf{w} + \mathbf{v} \cdot \mathbf{w}$
3. $(c\mathbf{u}) \cdot \mathbf{v} = c\,(\mathbf{u} \cdot \mathbf{v}) = \mathbf{u} \cdot (c\mathbf{v})$
4. $\mathbf{u} \cdot \mathbf{u} \geq 0$, and $\mathbf{u} \cdot \mathbf{u} = 0$ if and only if $\mathbf{u} = \mathbf{0}$

Combining (2) and (3), and using induction, we can show:

$$(c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \dots + c_p\mathbf{u}_p) \cdot \mathbf{w} = c_1(\mathbf{u}_1 \cdot \mathbf{w}) + c_2(\mathbf{u}_2 \cdot \mathbf{w}) + \dots + c_p(\mathbf{u}_p \cdot \mathbf{w}).$$
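As a quick sanity check, we can verify these properties numerically; the sketch below reuses the vectors from Example 1 together with an arbitrary third vector and scalar of our own choosing.

# numerical check of the dot product properties (u, v from Example 1; w and c are arbitrary)
u = np.array([1, -1, 2])
v = np.array([-2, 3, 4])
w = np.array([0, 5, -3])
c = 4
print(np.dot(u, v) == np.dot(v, u))                      # property (1)
print(np.dot(u + v, w) == np.dot(u, w) + np.dot(v, w))   # property (2)
print(np.dot(c*u, v) == c*np.dot(u, v))                  # property (3)
print(np.dot(u, u) >= 0)                                 # property (4)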
Length of vectors
The dot product can be used to define the length of vectors: let $\mathbf{v} \in \mathbb{R}^n$; the length (or norm) of $\mathbf{v}$ is the nonnegative scalar

$$\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2}.$$

Note that this definition of length coincides with the standard notion of length in $\mathbb{R}^2$ and $\mathbb{R}^3$. A vector with a length of $1$ is called a unit vector. For any nonzero vector $\mathbf{v}$, there exists a unit vector in the direction of $\mathbf{v}$. To obtain this vector, we first calculate the length of $\mathbf{v}$ and then divide $\mathbf{v}$ by its length $\|\mathbf{v}\|$. The resulting vector is referred to as the unit vector of $\mathbf{v}$, and is commonly denoted $\mathbf{e}_{\mathbf{v}}$:

$$\mathbf{e}_{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}.$$

This process is often called normalizing a vector, as it transforms $\mathbf{v}$ into a unit vector pointing in the same direction.
Example 2
Let $\mathbf{u} = (1, 2, 3)$ and $\mathbf{v} = -2\mathbf{u}$. (1) Compute $\|\mathbf{u}\|$. (2) Compute $\|\mathbf{v}\|$. (3) Find the unit vector of $\mathbf{u}$.
Solution:
# (1)
u = np.array([1,2,3])
uu = np.dot(u,u)
length_u = np.sqrt(uu)
length_u
3.7416573867739413
# (2)
v = -2*u
vv = np.dot(v,v)
length_v = np.sqrt(vv)
length_v
7.483314773547883
We could also use the properties of the dot product to compute $\|\mathbf{v}\|$: since $\mathbf{v} = -2\mathbf{u}$, we have $\|\mathbf{v}\| = \|-2\mathbf{u}\| = |-2|\,\|\mathbf{u}\| = 2\|\mathbf{u}\|$.
#(3) computing the unit vector of u
e_u = u/length_u
e_u
array([0.26726124, 0.53452248, 0.80178373])
Let’s check that the length of $\mathbf{e}_u$ is indeed $1$:
length_e = np.sqrt(np.dot(e_u,e_u))
length_e
1.0
Example 3
Given vectors $\mathbf{u} = (1, -1, 2)$ and $\mathbf{v} = (-2, 3, 4)$ and the subspace $W = \operatorname{span}\{\mathbf{u}, \mathbf{v}\}$, find unit vectors in the directions of $\mathbf{u}$ and $\mathbf{v}$.
Solution:
Since normalizing a vector preserves its direction, we obtain the desired unit vectors by dividing $\mathbf{u}$ and $\mathbf{v}$ by their lengths:
# unit vector of u
u = np.array([1,-1,2])
e_u = u / np.sqrt(np.dot(u,u))
# unit vector of v
v = np.array([-2, 3, 4])
e_v = v / np.sqrt(np.dot(v,v))
print("e_u = ", e_u)
print("e_v = ", e_v)
e_u = [ 0.40824829 -0.40824829 0.81649658]
e_v = [-0.37139068 0.55708601 0.74278135]
Distance in $\mathbb{R}^n$

For $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$, the distance between $\mathbf{u}$ and $\mathbf{v}$ is defined as

$$\operatorname{dist}(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|.$$

Note that this definition coincides with the standard definition of distance in $\mathbb{R}^2$ and $\mathbb{R}^3$.
Example 4
Compute the distance between vectors in Example 2.
Solution:
w = u - v
dist = np.sqrt(np.dot(w,w))
print("The distance between ", u, " and ", v, " is ", dist)
The distance between [1 2 3] and [-2 -4 -6] is 11.224972160321824
Angles and Orthogonality#
Theorem 2 (Cauchy-Schwarz inequality)
For all $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$,

$$|\mathbf{u} \cdot \mathbf{v}| \leq \|\mathbf{u}\|\,\|\mathbf{v}\|.$$

The Cauchy–Schwarz inequality gives us a way to define the notion of an angle between two nonzero vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$. Suppose $\mathbf{u}$ and $\mathbf{v}$ are nonzero; dividing both sides of the inequality by $\|\mathbf{u}\|\,\|\mathbf{v}\|$ gives

$$-1 \leq \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \leq 1.$$

Therefore, there is a unique angle $\theta \in [0, \pi]$ with $\cos\theta = \dfrac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}$; we call $\theta$ the angle between $\mathbf{u}$ and $\mathbf{v}$.
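As a quick numerical illustration (using the vectors from Example 1; any pair of vectors would do), we can verify the Cauchy–Schwarz inequality before computing an angle:

# check |u.v| <= ||u|| ||v|| for the vectors of Example 1
u = np.array([1, -1, 2])
v = np.array([-2, 3, 4])
lhs = np.abs(np.dot(u, v))
rhs = np.sqrt(np.dot(u, u)) * np.sqrt(np.dot(v, v))
print(lhs, "<=", rhs, ":", lhs <= rhs)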
Example 5
Find the angle between vectors in Example 2.
Solution:
We first compute the lengths of $\mathbf{u}$ and $\mathbf{v}$ and then apply the formula $\theta = \arccos\!\left(\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right)$. Since $\mathbf{v} = -2\mathbf{u}$ points in the direction opposite to $\mathbf{u}$, we expect $\theta = \pi$ radians.
# vectors from Example 2
u = np.array([1, 2, 3])
v = -2*u
u_len = np.sqrt(np.dot(u,u))
v_len = np.sqrt(np.dot(v,v))
z = u_len * v_len
t = np.arccos(np.dot(u,v)/z)
print("The angle between u and v is ", t ," rad")
Two vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ are called orthogonal (perpendicular) if $\mathbf{u} \cdot \mathbf{v} = 0$.

Observe that for any two vectors $\mathbf{u}$ and $\mathbf{v}$,

$$\|\mathbf{u} + \mathbf{v}\|^2 = (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) = \|\mathbf{u}\|^2 + 2\,\mathbf{u} \cdot \mathbf{v} + \|\mathbf{v}\|^2.$$

The above calculation implies the famous Pythagorean Theorem:
Theorem 3 (The Pythagorean Theorem)
Two vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if and only if $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$.
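Here is a small numerical illustration of the theorem, using a pair of orthogonal vectors of our own choosing:

# u and v below are orthogonal (u.v = 0), so ||u+v||^2 should equal ||u||^2 + ||v||^2
u = np.array([3, 0, 4])
v = np.array([0, 2, 0])
print(np.dot(u, v))                  # 0: u and v are orthogonal
print(np.dot(u + v, u + v))          # ||u + v||^2
print(np.dot(u, u) + np.dot(v, v))   # ||u||^2 + ||v||^2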
The orthogonal complement of a subspace
Let $W$ be a subspace of $\mathbb{R}^n$. A vector $\mathbf{z} \in \mathbb{R}^n$ is said to be orthogonal to $W$ if $\mathbf{z}$ is orthogonal to every vector in $W$. The set of all vectors orthogonal to $W$ is called the orthogonal complement of $W$ and is denoted $W^{\perp}$.
Example 6
Let
Example 7
If
Solution:
Note that
Alternatively, we can find the solutions to the linear system:
We have:
We conclude this section with a theorem summarizing the properties of the orthogonal complement of a subspace.
Theorem 4
Let $W$ be a subspace of $\mathbb{R}^n$. Then:

1. The orthogonal complement $W^{\perp}$ is also a subspace of $\mathbb{R}^n$.
2. The sum of the dimensions of $W$ and $W^{\perp}$ is equal to the dimension of the ambient space $\mathbb{R}^n$, i.e., $\dim(W) + \dim(W^{\perp}) = n$.
3. If $W$ is the column space of some matrix $A$, i.e., $W = \operatorname{col}(A)$, then $W^{\perp}$ is precisely the null space of the transpose of $A$, i.e., $W^{\perp} = \operatorname{null}(A^T)$ (see the sketch below).
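The third property gives a practical way to compute an orthogonal complement. The following sketch uses an illustrative matrix of our own choosing: it extracts a basis of $\operatorname{null}(A^T)$ from the SVD of $A^T$ and checks that the basis vectors are orthogonal to the columns of $A$.

# W = col(A) for a sample 3x2 matrix A
A = np.array([[1, 0],
              [1, 1],
              [0, 2]])
# rows of Vt beyond the rank of A^T span null(A^T) = W-perp
U, s, Vt = np.linalg.svd(A.T)
rank = np.sum(s > 1e-10)
N = Vt[rank:].T
print("basis of the orthogonal complement:\n", N)
print("A^T @ N (should be ~0):\n", A.T @ N)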
Exercises#
Exercises
1. Let $\mathbf{v}$ be the given vector.

   a. Find a vector that is in the opposite direction of $\mathbf{v}$ and has a magnitude of 2.

   b. Find two non-parallel vectors which are both orthogonal to $\mathbf{v}$.

2. Find two non-parallel vectors which are both orthogonal to the given vectors $\mathbf{u}$ and $\mathbf{v}$.

3. Let $\mathbf{u}$ and $\mathbf{v}$ be the given vectors.

   a. Find …

   b. Check that …

4. If $W$ is the given subspace, find a basis for $W^{\perp}$.
9.6.2. Orthogonal Projection#
In this section, we explore the construction of orthogonal projections, which is a key step in many calculations involving orthogonality, and properties of projection maps.
Orthogonal Set#
A set of vectors $\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_p\}$ in $\mathbb{R}^n$ is called an orthogonal set if each pair of distinct vectors in the set is orthogonal, i.e., $\mathbf{u}_i \cdot \mathbf{u}_j = 0$ whenever $i \neq j$. An orthogonal basis for a subspace $W$ is a basis of $W$ that is also an orthogonal set.
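As a minimal check (using the same three vectors that reappear in Example 1 below), we can confirm that a set is orthogonal by verifying that all pairwise dot products vanish:

# pairwise dot products of an orthogonal set should all be zero
u1 = np.array([3, 1, 1])
u2 = np.array([-1, 2, 1])
u3 = np.array([-1, -4, 7])
print(np.dot(u1, u2), np.dot(u1, u3), np.dot(u2, u3))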
In general, an orthogonal basis makes computations easier. The following theorem demonstrates how:
Theorem 5
Let $\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_p\}$ be an orthogonal basis for a subspace $W$ of $\mathbb{R}^n$, and let $\mathbf{y} \in W$ be written as the linear combination

$$\mathbf{y} = c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \dots + c_p\mathbf{u}_p.$$

Then the weights are given by

$$c_i = \frac{\mathbf{y} \cdot \mathbf{u}_i}{\mathbf{u}_i \cdot \mathbf{u}_i}, \qquad i = 1, 2, \dots, p.$$
Example 1
Write the vector $\mathbf{y} = (6, 1, -8)$ as a linear combination of the vectors in $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$, where $\mathbf{u}_1 = (3, 1, 1)$, $\mathbf{u}_2 = (-1, 2, 1)$, and $\mathbf{u}_3 = (-1, -4, 7)$.

Solution:

Since the vectors in $S$ are mutually orthogonal and nonzero, $S$ is an orthogonal basis for $\mathbb{R}^3$, and we can write

$$\mathbf{y} = c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + c_3\mathbf{u}_3.$$

We can solve this vector equation for $c_1, c_2, c_3$ using Theorem 5: $c_i = \frac{\mathbf{y} \cdot \mathbf{u}_i}{\mathbf{u}_i \cdot \mathbf{u}_i}$. By substituting the appropriate values of $\mathbf{y}$ and $\mathbf{u}_i$, we compute the weights in Python:
y = np.array([6, 1, -8])
u1 = np.array([3, 1, 1])
u2 = np.array([-1, 2, 1])
u3 = np.array([-1,-4, 7])
#computing the weights:
c1 = np.dot(y,u1)/ np.dot(u1,u1)
c2 = np.dot(y,u2)/ np.dot(u2,u2)
c3 = np.dot(y,u3)/ np.dot(u3,u3)
print("c1 = ", c1, ", c2 = ", c1, ", c3 = ", c3)
c1 = 1.0 , c2 = 1.0 , c3 = -1.0
Orthogonal Projections onto $1$-dimensional spaces#
Given a nonzero vector $\mathbf{u} \in \mathbb{R}^n$, consider the one-dimensional subspace $W = \operatorname{span}\{\mathbf{u}\}$. A more challenging question is when we want to express a vector $\mathbf{y} \in \mathbb{R}^n$ as a sum

$$\mathbf{y} = \hat{\mathbf{y}} + \mathbf{z},$$

where $\hat{\mathbf{y}} = \alpha\mathbf{u}$ lies in $W$ and $\mathbf{z}$ is orthogonal to $\mathbf{u}$. Since $\mathbf{z} = \mathbf{y} - \alpha\mathbf{u}$ must satisfy $(\mathbf{y} - \alpha\mathbf{u}) \cdot \mathbf{u} = 0$, simplifying the equation above, we find:

$$\alpha = \frac{\mathbf{y} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}, \qquad \hat{\mathbf{y}} = \frac{\mathbf{y} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}\,\mathbf{u}.$$

The vector $\hat{\mathbf{y}}$ is called the orthogonal projection of $\mathbf{y}$ onto $\mathbf{u}$ (equivalently, onto $W$), and $\mathbf{z} = \mathbf{y} - \hat{\mathbf{y}}$ is called the component of $\mathbf{y}$ orthogonal to $\mathbf{u}$.
Example 2
Suppose $\mathbf{y} = (7, 6)$ and $\mathbf{u} = (4, 2)$. Compute the orthogonal projection of $\mathbf{y}$ onto $\mathbf{u}$ and the component of $\mathbf{y}$ orthogonal to $\mathbf{u}$.
Solution:
# setup vectors
y = np.array([7,6])
u = np.array([4,2])
# projection of y onto u
proj_y_u = (np.dot(y,u)/ np.dot(u,u))* u
print( "the projection of y onto u is ", proj_y_u)
z = y - proj_y_u
print("the component of y orthogonal to u is", z )
the projection of y onto u is [8. 4.]
the component of y orthogonal to u is [-1. 2.]
Let’s plot $\mathbf{y}$, $\mathbf{u}$, the projection $\mathbf{p}$, and the orthogonal component $\mathbf{z} = \mathbf{y} - \mathbf{p}$:
# Plot the coordinates in two separate plots
fig, ax = plt.subplots(figsize=(20, 10))
# vector y
ax.quiver(0, 0, 7, 6, angles='xy', scale_units='xy', scale=1, color='blue', label ='y')
ax.text(7.1,6.1,'$y$')
# vector u
ax.quiver(0, 0, 4, 2, angles='xy', scale_units='xy', scale=1, color='black')
ax.text(4.1,1.6,'$u$')
# orthogonal projection p
ax.quiver(0, 0, 8, 4, angles='xy', scale_units='xy', scale=1, color='red')
ax.text(8.1,4.1,'$p$')
# vector z
ax.quiver(0, 0, -1, 2, angles='xy', scale_units='xy', scale=1, color='green')
ax.text(-1.1,2.2,'$z = y - p$')
# A copy of vector z starting at the end of vector p
ax.quiver(8, 4, -1, 2, angles='xy', scale_units='xy', scale=1, color='green')
ax.set_xlim([-4, 10])
ax.set_ylim([-4, 10])
ax.set_aspect('equal')
ax.axhline(y=0, color='black', linestyle='--', linewidth=0.5)
ax.axvline(x=0, color='black', linestyle='--', linewidth=0.5)
ax.set_title('y = p + z')
plt.show()

Orthogonal Projections onto a general subspace#
Let’s take another look at the figure above from Example 2. The projection of $\mathbf{y}$ onto $\mathbf{u}$ is the vector $\mathbf{p}$ in $W = \operatorname{span}\{\mathbf{u}\}$ for which the difference $\mathbf{z} = \mathbf{y} - \mathbf{p}$ is orthogonal to $W$. The same idea works for a general subspace $W$ of $\mathbb{R}^n$, provided we have an orthogonal basis for it.
Theorem 6
Let $W$ be a subspace of $\mathbb{R}^n$ with an orthogonal basis $\{\mathbf{u}_1, \dots, \mathbf{u}_p\}$. Then each $\mathbf{y} \in \mathbb{R}^n$ can be written uniquely in the form

$$\mathbf{y} = \hat{\mathbf{y}} + \mathbf{z},$$

where $\hat{\mathbf{y}} \in W$ and $\mathbf{z} \in W^{\perp}$. In fact,

$$\hat{\mathbf{y}} = \frac{\mathbf{y} \cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1}\mathbf{u}_1 + \dots + \frac{\mathbf{y} \cdot \mathbf{u}_p}{\mathbf{u}_p \cdot \mathbf{u}_p}\mathbf{u}_p \quad\text{and}\quad \mathbf{z} = \mathbf{y} - \hat{\mathbf{y}}.$$

The vector $\hat{\mathbf{y}}$ is called the orthogonal projection of $\mathbf{y}$ onto $W$ and is denoted $\operatorname{proj}_W \mathbf{y}$.
Example 3
Suppose $\{\mathbf{u}_1, \mathbf{u}_2\}$ is an orthogonal basis for a subspace $W$ of $\mathbb{R}^3$, where $\mathbf{u}_1 = (2, 5, -1)$ and $\mathbf{u}_2 = (-2, 1, 1)$, and let $\mathbf{y} = (1, 2, 3)$. Compute the orthogonal projection of $\mathbf{y}$ onto $W$ and the component of $\mathbf{y}$ orthogonal to $W$.
Solution:
# setup vectors
u1 = np.array([2,5,-1])
u2 = np.array([-2,1,1])
y = np.array([1,2,3])
# projection of y onto W
proj_y_W = (np.dot(y,u1)/ np.dot(u1,u1))* u1 + (np.dot(y,u2)/ np.dot(u2,u2))* u2
print( "The projection of y onto W is ", proj_y_W, '\n')
# component of y orthogonal to W
z = y - proj_y_W
print("The component of y orthogonal to W is", z )
The projection of y onto W is [-0.4 2. 0.2]
The component of y orthogonal to W is [1.4 0. 2.8]
Note that if $\mathbf{y}$ is already in $W$, then $\operatorname{proj}_W \mathbf{y} = \mathbf{y}$ itself. For example, projecting $\mathbf{u}_1 \in W$ onto $W$ gives back $\mathbf{u}_1$:
proj_u_W = (np.dot(u1,u1)/ np.dot(u1,u1))* u1 + (np.dot(u1,u2)/ np.dot(u2,u2))* u2
print(proj_u_W)
[ 2. 5. -1.]
The Best Approximation Theorem#
Projections can be viewed as linear transformations: consider a subspace $W$ of $\mathbb{R}^n$; the map $T: \mathbb{R}^n \to \mathbb{R}^n$ defined by $T(\mathbf{y}) = \operatorname{proj}_W \mathbf{y}$ is a linear transformation. Informally, we say $\operatorname{proj}_W \mathbf{y}$ is the best approximation to $\mathbf{y}$ by elements of $W$.
Theorem 7 (The Best Approximation Theorem)
Let $W$ be a subspace of $\mathbb{R}^n$, let $\mathbf{y} \in \mathbb{R}^n$, and let $\hat{\mathbf{y}} = \operatorname{proj}_W \mathbf{y}$. Then $\hat{\mathbf{y}}$ is the closest point in $W$ to $\mathbf{y}$, in the sense that

$$\|\mathbf{y} - \hat{\mathbf{y}}\| < \|\mathbf{y} - \mathbf{v}\| \quad \text{for all } \mathbf{v} \in W,\ \mathbf{v} \neq \hat{\mathbf{y}}.$$

In particular, if $\mathbf{y}$ is already in $W$, then the closest point is $\mathbf{y}$ itself. The quantity $\|\mathbf{y} - \hat{\mathbf{y}}\|$ is called the distance from $\mathbf{y}$ to $W$.
Example 4
Suppose $W = \operatorname{span}\{\mathbf{u}_1, \mathbf{u}_2\}$ with $\mathbf{u}_1 = (5, -2, 1)$ and $\mathbf{u}_2 = (1, 2, -1)$, and let $\mathbf{y} = (-1, -5, 10)$.

1. What is the closest point to $\mathbf{y}$ in $W$?
2. Find the distance between $\mathbf{y}$ and $W$.
Solution
The closest point is the projection of $\mathbf{y}$ onto $W$:
# setup vectors
y = np.array([-1, -5, 10])
u1 = np.array([5, -2, 1])
u2 = np.array([1 , 2, -1])
# projection of y onto W
p = (np.dot(y,u1)/ np.dot(u1,u1))* u1 + (np.dot(y,u2)/ np.dot(u2,u2))* u2
print( "The projection of y onto W is ", p, '\n')
# component of y orthogonal to W
z = y - p
print("The component of y orthogonal to W is", z, '\n' )
# distance between y and W
r = np.sqrt(np.dot(z,z))
print("The distance between y and W is", r )
The projection of y onto W is [-1. -8.  4.]
The component of y orthogonal to W is [0. 3. 6.]
The distance between y and W is 6.708203932499369
It is possible to simplify the projection formula further when the basis of $W$ is orthonormal. A basis $\{\mathbf{u}_1, \dots, \mathbf{u}_p\}$ of $W$ is called an orthonormal basis if it is an orthogonal basis consisting of unit vectors.
Theorem 8
Let $\{\mathbf{u}_1, \dots, \mathbf{u}_p\}$ be an orthonormal basis for a subspace $W$ of $\mathbb{R}^n$. Then, for every $\mathbf{y} \in \mathbb{R}^n$,

$$\operatorname{proj}_W \mathbf{y} = (\mathbf{y} \cdot \mathbf{u}_1)\,\mathbf{u}_1 + (\mathbf{y} \cdot \mathbf{u}_2)\,\mathbf{u}_2 + \dots + (\mathbf{y} \cdot \mathbf{u}_p)\,\mathbf{u}_p.$$

We can also represent this formula in an even more compact form: if $U = [\mathbf{u}_1\ \mathbf{u}_2\ \dots\ \mathbf{u}_p]$ is the matrix whose columns are the orthonormal basis vectors, then

$$\operatorname{proj}_W \mathbf{y} = UU^T\mathbf{y}.$$

A matrix whose columns are orthonormal, such as $U$, satisfies $U^TU = I_p$.
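As a quick illustration of the matrix form (a sketch reusing the data of Example 3), we normalize the orthogonal basis vectors, collect them as the columns of $U$, and compare $UU^T\mathbf{y}$ with the projection computed earlier:

# projection via U U^T y, using the orthonormalized basis from Example 3
u1 = np.array([2, 5, -1])
u2 = np.array([-2, 1, 1])
y = np.array([1, 2, 3])
U = np.column_stack([u1 / np.sqrt(np.dot(u1, u1)),
                     u2 / np.sqrt(np.dot(u2, u2))])
print("U^T U = \n", U.T @ U)        # approximately the 2x2 identity
print("U U^T y = ", U @ U.T @ y)    # should match proj_y_W = [-0.4, 2, 0.2]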
Exercises#
Exercises
Let
be a basis of
a. Check that this is an orthogonal basis.
b. Find the coordinates of the vector
with respect to the basis
Suppose
a. Show that
b. What are the closest points to
c. Find the distance between
d. Convert
e. Compute the projection of
f. Write
9.6.3. The Gram–Schmidt Process#
In the previous sections, we saw the importance of working with orthogonal and orthonormal bases. In this section, we explore the Gram–Schmidt process, a simple algorithm for constructing an orthogonal or orthonormal basis for any nonzero subspace of $\mathbb{R}^n$.
Idea: construction of an orthogonal basis from a basis with two elements#
To see the idea behind the Gram–Schmidt process, let’s revisit Example 2 from the section on orthogonal projections:
Suppose $\mathbf{y} = (7, 6)$ and $\mathbf{u} = (4, 2)$. We decompose $\mathbf{y}$ as the sum of its projection onto $\mathbf{u}$ and the component orthogonal to $\mathbf{u}$:
# setup vectors
y = np.array([7,6])
u = np.array([4,2])
# projection of y onto u
proj_y_u = (np.dot(y,u)/ np.dot(u,u))* u
print( "The projection of y onto u is ", proj_y_u)
z = y - proj_y_u
print("The component of y orthogonal to u is", z )
# Plot the coordinates in two separate plots
fig, ax = plt.subplots(figsize=(10, 5))
# vector y
ax.quiver(0, 0, 7, 6, angles='xy', scale_units='xy', scale=1, color='blue', label ='y')
ax.text(7.1,6.1,'$y$')
# vector u
ax.quiver(0, 0, 4, 2, angles='xy', scale_units='xy', scale=1, color='black')
ax.text(4.1,1.6,'$u$')
# orthogonal projection p
ax.quiver(0, 0, 8, 4, angles='xy', scale_units='xy', scale=1, color='red')
ax.text(8.1,4.1,'$p$')
# vector z
ax.quiver(0, 0, -1, 2, angles='xy', scale_units='xy', scale=1, color='green')
ax.text(-1.1,2.2,'$z = y - p$')
# A copy of vector z starting at the end of vector p
ax.quiver(8, 4, -1, 2, angles='xy', scale_units='xy', scale=1, color='green')
ax.set_xlim([-4, 10])
ax.set_ylim([-4, 10])
ax.set_aspect('equal')
ax.axhline(y=0, color='black', linestyle='--', linewidth=0.5)
ax.axvline(x=0, color='black', linestyle='--', linewidth=0.5)
ax.set_title('y = p + z')
#plt.tight_layout()
plt.show()
The projection of y onto u is [8. 4.]
The component of y orthogonal to u is [-1. 2.]

Therefore, we have converted a basis $\{\mathbf{y}, \mathbf{u}\}$ for $\mathbb{R}^2$ into an orthogonal basis $\{\mathbf{u}, \mathbf{z}\}$, where $\mathbf{z} = \mathbf{y} - \operatorname{proj}_{\mathbf{u}}\mathbf{y}$.
Construction of an orthogonal basis from a basis with three elements#
Example 1
Let $\mathbf{x}_1 = (1, 1, 1, 1)$, $\mathbf{x}_2 = (0, 1, 1, 1)$, and $\mathbf{x}_3 = (0, 0, 1, 1)$, and let $W = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$. Construct an orthogonal basis for $W$.
Solution:
The set $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ is linearly independent and is therefore a basis for $W$, but it is not orthogonal. We construct an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $W$ step by step.

Step 1:

Define $\mathbf{v}_1 = \mathbf{x}_1$ and $W_1 = \operatorname{span}\{\mathbf{v}_1\}$.

Step 2: We seek a vector $\mathbf{v}_2$ orthogonal to $\mathbf{v}_1$ with $\operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2\} = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2\}$. Now, let $\mathbf{v}_2$ be the component of $\mathbf{x}_2$ orthogonal to $W_1$:

$$\mathbf{v}_2 = \mathbf{x}_2 - \operatorname{proj}_{W_1}\mathbf{x}_2 = \mathbf{x}_2 - \frac{\mathbf{x}_2 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1.$$
# setup vectors
x1 = np.array([1,1,1,1])
x2 = np.array([0,1,1,1])
x3 = np.array([0,0,1,1])
v1 = x1
# projection of x2 onto x1
proj_x2_v1 = (np.dot(x2,v1)/ np.dot(v1,v1))* v1
print( "The projection of x2 onto x1 is ", proj_x2_v1, '\n')
v2 = x2 - proj_x2_v1
print("The component of x1 orthogonal to W1 is", v2 )
The projection of x2 onto x1 is [0.75 0.75 0.75 0.75]
The component of x2 orthogonal to W1 is [-0.75 0.25 0.25 0.25]
Step 3:

We can use a similar approach to construct $\mathbf{v}_3$: take the component of $\mathbf{x}_3$ orthogonal to $W_2 = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2\}$. Since $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal to each other, the projection of $\mathbf{x}_3$ onto $W_2$ can be computed term by term:

$$\mathbf{v}_3 = \mathbf{x}_3 - \operatorname{proj}_{W_2}\mathbf{x}_3 = \mathbf{x}_3 - \frac{\mathbf{x}_3 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1 - \frac{\mathbf{x}_3 \cdot \mathbf{v}_2}{\mathbf{v}_2 \cdot \mathbf{v}_2}\mathbf{v}_2.$$
# projection of x3 onto W2
proj_x3_W2 = (np.dot(x3,v1)/ np.dot(v1,v1))* v1 + (np.dot(x3,v2)/ np.dot(v2,v2))* v2
print( "the projection of x3 onto W2 is ", proj_x3_W2, '\n')
# component of y orthogonal to p
v3 = x3 - proj_x3_W2
print("the component of x3 orthogonal to W2 is", v3 )
the projection of x3 onto W2 is [0. 0.66666667 0.66666667 0.66666667]
the component of x3 orthogonal to W2 is [ 0. -0.66666667 0.33333333 0.33333333]
Step 4 (Optional)

Normalizing the basis elements $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ yields an orthonormal basis $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ for $W$:
# The unit vector of v1
e1 = v1 / np.sqrt(np.dot(v1,v1))
# The unit vector of v2
e2 = v2 / np.sqrt(np.dot(v2,v2))
# The unit vector of v3
e3 = v3 / np.sqrt(np.dot(v3,v3))
print("e1 = ", e1)
print("e2 = ", e2)
print("e3 = ", e3)
e1 = [0.5 0.5 0.5 0.5]
e2 = [-0.8660254 0.28867513 0.28867513 0.28867513]
e3 = [ 0. -0.81649658 0.40824829 0.40824829]
The Gram–Schmidt Process#
Theorem 8 (the Gram-Schmidt algorithm)
Let $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_p\}$ be a basis for a nonzero subspace $W$ of $\mathbb{R}^n$, and define

$$\begin{aligned}
\mathbf{v}_1 &= \mathbf{x}_1,\\
\mathbf{v}_2 &= \mathbf{x}_2 - \frac{\mathbf{x}_2 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1,\\
&\ \,\vdots\\
\mathbf{v}_p &= \mathbf{x}_p - \frac{\mathbf{x}_p \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1 - \dots - \frac{\mathbf{x}_p \cdot \mathbf{v}_{p-1}}{\mathbf{v}_{p-1} \cdot \mathbf{v}_{p-1}}\mathbf{v}_{p-1}.
\end{aligned}$$

Then $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is an orthogonal basis for $W$, and $\operatorname{span}\{\mathbf{v}_1, \dots, \mathbf{v}_k\} = \operatorname{span}\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$ for every $1 \leq k \leq p$. Normalizing each $\mathbf{v}_i$ produces an orthonormal basis $\{\mathbf{e}_1, \dots, \mathbf{e}_p\}$ for $W$.
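The theorem translates directly into code. Below is a minimal sketch of the Gram–Schmidt process as a reusable function; the name gram_schmidt and its interface are our own, and it assumes the columns of the input matrix are linearly independent.

def gram_schmidt(X):
    """Apply the Gram-Schmidt process to the columns of X.
    Returns (V, E): the orthogonal vectors and the orthonormal vectors, as columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    V = np.zeros((n, p))
    for j in range(p):
        v = X[:, j].copy()
        # subtract the projections onto the previously constructed vectors
        for k in range(j):
            v -= (np.dot(X[:, j], V[:, k]) / np.dot(V[:, k], V[:, k])) * V[:, k]
        V[:, j] = v
    E = V / np.sqrt((V**2).sum(axis=0))   # normalize each column
    return V, E

# reproduce the orthonormal basis of Example 1
V, E = gram_schmidt(np.transpose([x1, x2, x3]))
print(E)

If the vectors $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ are collected as the columns of a matrix $A$ and the orthonormal vectors $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ as the columns of a matrix $Q$, then $R = Q^TA$ is upper triangular and $A = QR$; this is a QR factorization of $A$, which the next cell computes.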
# setup A
A = np.transpose([x1, x2, x3])
# setup Q
Q = np.transpose([e1, e2, e3])
# Compute R= Q^T*A
R = np.transpose(Q) @ A
R
array([[2.00000000e+00, 1.50000000e+00, 1.00000000e+00],
[1.11022302e-16, 8.66025404e-01, 5.77350269e-01],
[1.11022302e-16, 1.11022302e-16, 8.16496581e-01]])
Note that the elements below the diagonal in $R$ are not exactly zero: they are tiny numbers (on the order of $10^{-16}$) produced by floating-point round-off. We can replace these negligible entries with zeros:
eps = 0.000001
R[np.abs(R) < eps] = 0
R
array([[2. , 1.5 , 1. ],
[0. , 0.8660254 , 0.57735027],
[0. , 0. , 0.81649658]])
Numerical Note
When the Gram–Schmidt process is run on a computer, round-off error can build up as the vectors $\mathbf{v}_k$ are computed, so the computed vectors may gradually lose orthogonality. For this reason, practical QR implementations rely on more stable algorithms.
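One simple way to quantify the loss of orthogonality is to measure how far $Q^TQ$ is from the identity matrix; the diagnostic below (our own check) applies this to the $Q$ we assembled by hand above:

# departure from orthonormality: ||Q^T Q - I|| should be close to machine precision
orth_error = np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]))
print("||Q^T Q - I|| =", orth_error)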
We can also utilize numpy.linalg.qr in Python to compute the QR factorization of a matrix. This function relies on LAPACK's Householder-based routines rather than the Gram–Schmidt process, which makes it efficient and avoids unnecessary loss of orthogonality. Note that the $Q$ and $R$ it returns may differ from the hand-computed factors by sign changes in some columns of $Q$ (and the corresponding rows of $R$); the product $QR$ still equals $A$.
Q, R = np.linalg.qr(A)
print('Q = \n', Q, '\n\n')
print('R = \n', R)
Q =
[[-0.5 0.8660254 0. ]
[-0.5 -0.28867513 0.81649658]
[-0.5 -0.28867513 -0.40824829]
[-0.5 -0.28867513 -0.40824829]]
R =
[[-2. -1.5 -1. ]
[ 0. -0.8660254 -0.57735027]
[ 0. 0. -0.81649658]]
Exercises#
Exercises
1. Let $\mathbf{x}_1, \mathbf{x}_2, \dots$ be the given vectors. Construct an orthonormal basis for $W = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots\}$.

2. Let $A$ be the matrix whose columns are the vectors from Exercise 1. Find a QR decomposition for $A$.

3. Let $A = QR$, where $Q$ is an $m \times n$ matrix with orthogonal columns and $R$ is an $n \times n$ matrix. Show that if the columns of $A$ are linearly dependent, then $R$ cannot be invertible.
9.6.4. Least-Squares Problems#
Linear systems arising in applications are often inconsistent. In such situations, the best one can do is to find a vector $\hat{\mathbf{x}}$ that makes $A\mathbf{x}$ as close as possible to $\mathbf{b}$.

Let $A$ be an $m \times n$ matrix and $\mathbf{b} \in \mathbb{R}^m$. A least-squares solution of $A\mathbf{x} = \mathbf{b}$ is a vector $\hat{\mathbf{x}} \in \mathbb{R}^n$ such that $\|\mathbf{b} - A\hat{\mathbf{x}}\| \leq \|\mathbf{b} - A\mathbf{x}\|$ for all $\mathbf{x} \in \mathbb{R}^n$.

The equation $A^TA\mathbf{x} = A^T\mathbf{b}$ is called the normal equation for $A\mathbf{x} = \mathbf{b}$.
Theorem 9
The set of least-squares solutions of $A\mathbf{x} = \mathbf{b}$ coincides with the (nonempty) set of solutions of the normal equation $A^TA\mathbf{x} = A^T\mathbf{b}$.
Example 1
Find a least-squares solution of the inconsistent system $A\mathbf{x} = \mathbf{b}$ for

$$A = \begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix}.$$

Solution: First, let's find $A^TA$ and $A^T\mathbf{b}$:
A = np.array([[4,0], [0,2], [1,1]])
b = np.array([[2], [0], [11]])
# compute A^TA
ATA = A.transpose() @ A
print("A^TA = \n", ATA)
# compute A^Tb
ATb = A.transpose() @ b
print("\n A^Tb = \n", ATb)
A^TA =
[[17 1]
[ 1 5]]
A^Tb =
[[19]
[11]]
Thus, the normal equation for $A\mathbf{x} = \mathbf{b}$ is

$$\begin{bmatrix} 17 & 1 \\ 1 & 5 \end{bmatrix}\hat{\mathbf{x}} = \begin{bmatrix} 19 \\ 11 \end{bmatrix}.$$

To solve this equation, we can use row operations; alternatively, since $A^TA$ is invertible, we can solve it directly by computing $\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b}$:
# computing x_hat
x_hat = np.linalg.inv(ATA) @ ATb
x_hat
array([[1.],
[2.]])
Example 2
Find a least-squares solution of the inconsistent system $A\mathbf{x} = \mathbf{b}$ for

$$A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} -3 \\ -1 \\ 0 \\ 2 \\ 5 \\ 1 \end{bmatrix}.$$

Solution

Let's set up the normal equation for $A\mathbf{x} = \mathbf{b}$ by computing $A^TA$ and $A^T\mathbf{b}$:
A = np.array([[1,1,0,0], [1,1,0,0], [1,0,1,0],[1,0,1,0], [1,0,0,1],[1,0,0,1]])
b = np.array([[-3,-1,0,2,5,1]])
# compute A^TA
ATA = A.T @ A
print('A^TA = \n', ATA)
# compute A^Tb
ATb = A.T @ b.T
print('\n A^Tb = \n', ATb)
A^TA =
[[6 2 2 2]
[2 2 0 0]
[2 0 2 0]
[2 0 0 2]]
A^Tb =
[[ 4]
[-4]
[ 2]
[ 6]]
Thus, the normal equation is

$$\begin{bmatrix} 6 & 2 & 2 & 2 \\ 2 & 2 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 2 & 0 & 0 & 2 \end{bmatrix}\hat{\mathbf{x}} = \begin{bmatrix} 4 \\ -4 \\ 2 \\ 6 \end{bmatrix}.$$
To solve this equation we form its augmented matrix:
# augmented matrix of the normal equation
M = np.concatenate((ATA, ATb), axis=1)
M
array([[ 6, 2, 2, 2, 4],
[ 2, 2, 0, 0, -4],
[ 2, 0, 2, 0, 2],
[ 2, 0, 0, 2, 6]])
Recall the elementary row operation functions:
# Swap two rows
def swap(matrix, row1, row2):
copy_matrix=np.copy(matrix).astype('float64')
copy_matrix[row1,:] = matrix[row2,:]
copy_matrix[row2,:] = matrix[row1,:]
return copy_matrix
# Multiply all entries in a row by a nonzero number
def scale(matrix, row, scalar):
copy_matrix=np.copy(matrix).astype('float64')
copy_matrix[row,:] = scalar*matrix[row,:]
return copy_matrix
# Replace row1 by the sum of itself and a multiple of row2
def replace(matrix, row1, row2, scalar):
copy_matrix=np.copy(matrix).astype('float64')
copy_matrix[row1] = matrix[row1]+ scalar * matrix[row2]
return copy_matrix
M1 = swap(M, 0, 2)
M1
array([[ 2., 0., 2., 0., 2.],
[ 2., 2., 0., 0., -4.],
[ 6., 2., 2., 2., 4.],
[ 2., 0., 0., 2., 6.]])
M2 = scale(M1, 0, 1/2)
M2
array([[ 1., 0., 1., 0., 1.],
[ 2., 2., 0., 0., -4.],
[ 6., 2., 2., 2., 4.],
[ 2., 0., 0., 2., 6.]])
M3 = replace(M2, 1, 0, -2)
M3
array([[ 1., 0., 1., 0., 1.],
[ 0., 2., -2., 0., -6.],
[ 6., 2., 2., 2., 4.],
[ 2., 0., 0., 2., 6.]])
M4 = replace(M3, 2, 0, -6)
M4
array([[ 1., 0., 1., 0., 1.],
[ 0., 2., -2., 0., -6.],
[ 0., 2., -4., 2., -2.],
[ 2., 0., 0., 2., 6.]])
M5 = replace(M4, 3, 0, -2)
M5
array([[ 1., 0., 1., 0., 1.],
[ 0., 2., -2., 0., -6.],
[ 0., 2., -4., 2., -2.],
[ 0., 0., -2., 2., 4.]])
M6 = scale(M5, 1, 1/2)
M6
array([[ 1., 0., 1., 0., 1.],
[ 0., 1., -1., 0., -3.],
[ 0., 2., -4., 2., -2.],
[ 0., 0., -2., 2., 4.]])
M7 = replace(M6, 2, 1, -2)
M7
array([[ 1., 0., 1., 0., 1.],
[ 0., 1., -1., 0., -3.],
[ 0., 0., -2., 2., 4.],
[ 0., 0., -2., 2., 4.]])
M8 = scale(M7, 2, -1/2)
M8
array([[ 1., 0., 1., 0., 1.],
[ 0., 1., -1., 0., -3.],
[-0., -0., 1., -1., -2.],
[ 0., 0., -2., 2., 4.]])
M9 = replace(M8, 3, 2, 2)
M9
array([[ 1., 0., 1., 0., 1.],
[ 0., 1., -1., 0., -3.],
[-0., -0., 1., -1., -2.],
[ 0., 0., 0., 0., 0.]])
M10 = replace(M9, 0, 2, -1)
M10
array([[ 1., 0., 0., 1., 3.],
[ 0., 1., -1., 0., -3.],
[-0., -0., 1., -1., -2.],
[ 0., 0., 0., 0., 0.]])
M11 = replace(M10, 1, 2, 1)
M11
array([[ 1., 0., 0., 1., 3.],
[ 0., 1., 0., -1., -5.],
[-0., -0., 1., -1., -2.],
[ 0., 0., 0., 0., 0.]])
The general solution is

$$\hat{\mathbf{x}} = \begin{bmatrix} 3 \\ -5 \\ -2 \\ 0 \end{bmatrix} + x_4\begin{bmatrix} -1 \\ 1 \\ 1 \\ 1 \end{bmatrix},$$

where $x_4$ is a free variable. So this system has infinitely many least-squares solutions.
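We can cross-check this result with numpy.linalg.lstsq, which solves the least-squares problem via the SVD and, for a rank-deficient system like this one, returns the minimum-norm solution. Any member of the family above produces the same residual; the check below compares the particular solution with $x_4 = 0$ against numpy's answer.

# cross-check with numpy's built-in least-squares solver
x_ls, _, rank, _ = np.linalg.lstsq(A, b.flatten(), rcond=None)
print("rank of A:", rank)
print("a least-squares solution:", x_ls)
# the hand-computed solution with x4 = 0 gives the same residual norm
x_hand = np.array([3., -5., -2., 0.])
print(np.linalg.norm(A @ x_hand - b.flatten()), np.linalg.norm(A @ x_ls - b.flatten()))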
Any linear system $A\mathbf{x} = \mathbf{b}$ has at least one least-squares solution, since the normal equations are always consistent; as this example shows, however, the least-squares solution need not be unique.
The next theorem gives useful criteria for determining when there is only one least-squares solution.
Theorem 10
Let $A$ be an $m \times n$ matrix. The following statements are logically equivalent:

1. The equation $A\mathbf{x} = \mathbf{b}$ has a unique least-squares solution for each $\mathbf{b} \in \mathbb{R}^m$.
2. The columns of $A$ are linearly independent.
3. The matrix $A^TA$ is invertible.

In any of these cases, the least-squares solution is given by

$$\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b}.$$

Moreover, if $A = QR$ is a QR factorization of $A$, then

$$\hat{\mathbf{x}} = R^{-1}Q^T\mathbf{b}.$$
Example 3
Let

$$A = \begin{bmatrix} 1 & 3 & 5 \\ 1 & 1 & 0 \\ 1 & 1 & 2 \\ 1 & 3 & 3 \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} 3 \\ 5 \\ 7 \\ -3 \end{bmatrix}.$$

Use a QR factorization of $A$ to find a least-squares solution of $A\mathbf{x} = \mathbf{b}$.

Solution:

A QR factorization of $A$ can be computed with numpy.linalg.qr():
A = np.array([[1,3,5], [1,1,0], [1,1,2],[1,3,3]])
# QR factorization
Q, R = np.linalg.qr(A)
print('Q = \n', Q, '\n')
print('R = \n', R)
Q =
[[-0.5 0.5 -0.5]
[-0.5 -0.5 0.5]
[-0.5 -0.5 -0.5]
[-0.5 0.5 0.5]]
R =
[[-2. -4. -5.]
[ 0. 2. 3.]
[ 0. 0. -2.]]
Now we compute $\hat{\mathbf{x}} = R^{-1}Q^T\mathbf{b}$:
# setup
b = np.array([[3, 5, 7, -3]]).T
#computing x_hat
x_hat = np.linalg.inv(R) @ Q.T @ b
x_hat
array([[10.],
[-6.],
[ 2.]])
Numerical Note#
Since $R$ is upper triangular, the equation $R\hat{\mathbf{x}} = Q^T\mathbf{b}$ can be solved by back-substitution, which is faster and more reliable than forming $R^{-1}$ explicitly.

For large matrices, solving the least-squares problem via a QR factorization is usually preferable to solving the normal equations $A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$ directly, because forming $A^TA$ can amplify the effect of round-off errors.
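As a sketch of the back-substitution approach (using SciPy's triangular solver, an extra dependency beyond numpy), we can redo the computation of Example 3 without forming an explicit inverse:

from scipy.linalg import solve_triangular

# solve R x = Q^T b by back-substitution instead of computing R^{-1}
x_hat_bs = solve_triangular(R, Q.T @ b, lower=False)
print(x_hat_bs)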
Exercises#
Exercises
1. Let $A$ and $\mathbf{b}$ be the given matrix and vector.

   a. Find a least-squares solution of $A\mathbf{x} = \mathbf{b}$.

   b. Compute the associated least-squares error $\|\mathbf{b} - A\hat{\mathbf{x}}\|$.

2. Describe all least-squares solutions of the system $A\mathbf{x} = \mathbf{b}$ (as given).