10.2. OLS Linear Regression#
We begin with the familiar topic of ordinary least squares (OLS) linear regression. The table below shows an excerpt of Chicago Public School data for 2011–2012 from the Chicago Data Portal. One expects a higher average ACT score to be associated with a higher percentage of college eligibility.
| School Zipcode | Average ACT Score | College Eligibility (%) |
|---|---|---|
| 60605 | 25.1 | 80.7 |
| 60607 | 27 | 91.6 |
| … | … | … |
| 60660 | 16.5 | 14.2 |
Source: Chicago Data Portal.
The figure below shows a scatterplot of Average ACT Score versus College Eligibility for the schools in the dataset.

10.2.1. Least-Squares Solutions#
Consider a collection of $n$ data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

If the $n$ points were to lie exactly on a line $y = \beta_0 + \beta_1 x$, the coefficients $\beta_0$ and $\beta_1$ would satisfy the system of equations $\beta_0 + \beta_1 x_i = y_i$ for $i = 1, \ldots, n$.
In matrix form, this system of equations becomes $X\boldsymbol{\beta} = \mathbf{y}$,
where

$$X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \qquad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}.$$

In general, this linear system with $n$ equations and only two unknowns is overdetermined and has no exact solution, so we instead seek coefficients that make the error as small as possible.
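The setup above can be sketched numerically. The data values below are hypothetical, chosen only for illustration; `numpy.linalg.lstsq` returns the least-squares solution of an overdetermined system.

```python
import numpy as np

# Hypothetical data points (x_i, y_i), for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Design matrix X: a column of ones (for the intercept) next to the x-values.
X = np.column_stack([np.ones_like(x), x])

# X @ beta = y has 4 equations and 2 unknowns (overdetermined),
# so we ask for the least-squares solution instead of an exact one.
beta, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # [intercept, slope] ≈ [0.15, 1.94]
```

Because no line passes through all four points, `residuals` is nonzero: the solver returns the best achievable fit rather than an exact solution.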
10.2.2. OLS Linear Regression Optimization Problem#
OLS Linear Regression Optimization Problem
Find the coefficients $\beta_0$ and $\beta_1$ of the line $y = \beta_0 + \beta_1 x$ that minimize the loss function

$$L(\beta_0, \beta_1) = \frac{1}{2} \sum_{i=1}^{n} \left( \beta_0 + \beta_1 x_i - y_i \right)^2.$$
(The factor of 1/2 multiplying the sum is introduced to simplify the theoretical analysis of the loss function.)
The loss function $L$ is a differentiable function of $\beta_0$ and $\beta_1$, so its minimum occurs where both partial derivatives vanish.
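The loss function is straightforward to evaluate directly. The sketch below uses hypothetical data and candidate coefficients, chosen only to illustrate the formula:

```python
import numpy as np

def ols_loss(beta0, beta1, x, y):
    """Half the sum of squared residuals for the line y = beta0 + beta1 * x."""
    residuals = beta0 + beta1 * x - y
    return 0.5 * np.sum(residuals ** 2)

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.1, 5.9])

# Evaluate the loss at the candidate line y = 2x.
print(ols_loss(0.0, 2.0, x, y))  # ≈ 0.01
```

Different choices of $(\beta_0, \beta_1)$ give different loss values; OLS picks the pair with the smallest one.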
10.2.3. Minimizing the OLS Loss Function via Normal Equations#
Let $W = \operatorname{Col}(X)$ denote the column space of the design matrix $X$.
By choosing $\hat{\mathbf{y}} = \operatorname{proj}_W \mathbf{y}$, the orthogonal projection of $\mathbf{y}$ onto $W$, we obtain the vector in $W$ closest to $\mathbf{y}$.
Minimizing the loss function is therefore equivalent to solving $X\hat{\boldsymbol{\beta}} = \hat{\mathbf{y}}$, and the residual $\mathbf{y} - X\hat{\boldsymbol{\beta}}$ is orthogonal to every column of $X$:

$$X^T \left( \mathbf{y} - X\hat{\boldsymbol{\beta}} \right) = \mathbf{0}.$$

Hence, $X^T X \hat{\boldsymbol{\beta}} = X^T \mathbf{y}$.
The last equation is called the normal equation for the system $X\boldsymbol{\beta} = \mathbf{y}$.
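The normal equation gives a direct recipe: form $X^T X$ and $X^T \mathbf{y}$ and solve the resulting $2 \times 2$ system. A minimal sketch, using hypothetical data chosen only for illustration:

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
X = np.column_stack([np.ones_like(x), x])

# Normal equation: (X^T X) beta_hat = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # [intercept, slope] ≈ [0.15, 1.94]

# The residual y - X beta_hat is orthogonal to the columns of X.
print(X.T @ (y - X @ beta_hat))  # ≈ [0, 0]
```

The second print verifies the projection picture: the residual vector is (numerically) orthogonal to the column space of $X$.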
Example 2.1.#
Consider the data
Solution.
The system is
or, in matrix form,
The normal equations are
or
The least-squares solution is therefore
10.2.4. Equivalence of Gradient-Based Optimization#
Note that the normal equations are equivalent to gradient-based minimization of the OLS linear regression loss function $L(\beta_0, \beta_1)$.
The normal equations are equivalent to setting both partial derivatives of $L$ equal to zero:

$$\frac{\partial L}{\partial \beta_0} = \sum_{i=1}^{n} \left( \beta_0 + \beta_1 x_i - y_i \right) = 0, \qquad \frac{\partial L}{\partial \beta_1} = \sum_{i=1}^{n} \left( \beta_0 + \beta_1 x_i - y_i \right) x_i = 0.$$
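The equivalence can be illustrated by running plain gradient descent on $L$ and checking that it converges to the normal-equation solution. The data, learning rate, and iteration count below are assumptions chosen for the sketch, not values from the text:

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

beta0, beta1 = 0.0, 0.0
lr = 0.01  # learning rate (assumed value)
for _ in range(20000):
    r = beta0 + beta1 * x - y   # residuals
    grad0 = np.sum(r)           # dL/dbeta0
    grad1 = np.sum(r * x)       # dL/dbeta1
    beta0 -= lr * grad0
    beta1 -= lr * grad1

print(beta0, beta1)  # ≈ 0.15, 1.94 -- matches the normal-equation solution
```

Because $L$ is convex, gradient descent with a small enough step size approaches the same critical point that the normal equations identify exactly.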
Example 2.2.#
For the data points in Example 2.1, show how gradient-based optimization of the loss function $L$ leads to the same least-squares solution.
Solution.
The loss function
To minimize $L$, set both partial derivatives equal to zero:
Solving this system, we obtain
The system used to find these critical values is equivalent to:
Gradient-based optimization is therefore equivalent to solving the normal equations.
To say that solving the normal equations is mathematically equivalent to gradient-based optimization of the OLS loss function does not imply that the normal equations offer the best numerical method for minimizing the loss function [Epperly 2022]. Beyond the scope of this Module is an assessment of different numerical approaches, such as gradient descent and its variants, including the Adam method [Sun et al. 2019], and matrix factorizations such as the QR and singular value decompositions.
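As a brief illustration of one such alternative, the sketch below solves the same least-squares problem via a QR factorization instead of the normal equations; the data are hypothetical, chosen only for comparison:

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
X = np.column_stack([np.ones_like(x), x])

# Normal-equation route: solve (X^T X) beta = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# QR route: X = QR with Q having orthonormal columns, so R beta = Q^T y.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

print(beta_normal, beta_qr)  # the two routes agree
```

The two routes agree here, but the QR route avoids forming $X^T X$, whose condition number is the square of that of $X$, which is why factorization-based solvers are generally preferred numerically.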
In addition to facilitating numerical analysis, mean-centering simplifies the linear algebra approach. Mean-centered data is obtained from a collection of data points by replacing the original dataset $\{(x_i, y_i)\}_{i=1}^{n}$ with $\{(x_i - \bar{x},\, y_i - \bar{y})\}_{i=1}^{n}$, where $\bar{x}$ and $\bar{y}$ denote the means of the $x$- and $y$-values, respectively.
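For centered data, the fitted line passes through the origin, so the slope reduces to a single ratio and the intercept is recovered from the means. A minimal sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Mean-centered data: subtract the mean from each coordinate.
xc = x - x.mean()
yc = y - y.mean()

# Slope from centered data; intercept recovered from the means.
slope = np.sum(xc * yc) / np.sum(xc ** 2)
intercept = y.mean() - slope * x.mean()
print(slope, intercept)  # ≈ 1.94, 0.15
```

These values match the normal-equation solution for the same data, since centering is just a change of coordinates that decouples the two unknowns.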

The variation equation states that the total variation in the $y$-values is the sum of the explained variation (variation in the corresponding fitted values $\hat{y}_i$) and the unexplained variation (variation of the residuals $y_i - \hat{y}_i$):

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$
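The decomposition can be checked numerically. The sketch below fits a line to hypothetical data and verifies that total variation equals explained plus unexplained variation:

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
X = np.column_stack([np.ones_like(x), x])

# Fit via the normal equations and compute fitted values.
beta = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta

total = np.sum((y - y.mean()) ** 2)          # total variation
explained = np.sum((y_hat - y.mean()) ** 2)  # explained variation
unexplained = np.sum((y - y_hat) ** 2)       # unexplained (residual) variation

print(np.isclose(total, explained + unexplained))  # True
```

The identity holds because the model includes an intercept, which makes the residuals orthogonal to both the fitted values and the constant vector.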
10.2.5. Exercises#
1. Consider the data , , . Use the normal equations to find the least-squares line that best fits the data.
2. Consider the data , , , . Use the normal equations to find the least-squares parabola that best fits the data.