
    Least Squares

    (Application of projection)

Recall our previous example: there we had a system of equations \(\mathbb{A}\mathbb{X}=\mathbb{Y}\).
Let's take an example. Suppose we have \(3\) points (of the form \((a_1,a_2)\)): \((1,1),(2,2),(3,2)\). Our objective is to find the best possible linear function for \(a_2\); call that function \(f\). Our function might not give the exact \(a_2\) that corresponds to each \(a_1\), but it will give us the best possible approximation for \(a_2\).
The simplest linear function is \(a_2 = f(a_1) = x_1 + a_1 x_2\), where \(x_1, x_2\) are our (unknown) parameters.
Our observations give
\(f=x_1 + 1 x_2 =1\), \(f=x_1 + 2 x_2 =2\), \(f=x_1 + 3 x_2 =2\).
We can also write this as
\( \underbrace{\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ \end{bmatrix}}_{\mathbb{A}} \underbrace{\begin{bmatrix} x_1\\ x_2\\ \end{bmatrix}}_{\mathbb{X}} = \underbrace{\begin{bmatrix} 1\\ 2\\ 2\\ \end{bmatrix}}_{\mathbb{Y}} \),
i.e. \(\mathbb{A}\mathbb{X}=\mathbb{Y}\).
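The setup above can be written down directly in code. This is a minimal sketch using Python's standard-library `fractions` module so arithmetic stays exact; the names `points`, `A`, and `Y` are my own labels matching the symbols in the text.

```python
from fractions import Fraction as F

# Data points (a1, a2); the model is a2 ≈ x1 + a1*x2
points = [(1, 1), (2, 2), (3, 2)]

# Each row of A is [1, a1]; Y collects the observed a2 values
A = [[F(1), F(a1)] for a1, _ in points]
Y = [F(a2) for _, a2 in points]
```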

So we want to find the linear combination of the column vectors of \(\mathbb{A}\) that gives us \(\mathbb{Y}\), but \(\mathbb{Y}\) does not lie in the column space of \(\mathbb{A}\).
So we will find a vector \(\widehat{\mathbb{Y}}\) in the column space of \(\mathbb{A}\) that is closest to \(\mathbb{Y}\), where closeness is measured by the Euclidean distance between \(\mathbb{Y}\) and \(\widehat{\mathbb{Y}}\).
So instead we will solve
\(\mathbb{A}\widehat{\mathbb{X}}=\widehat{\mathbb{Y}}\)

(and \(\widehat{\mathbb{X}}\) is just a way to indicate that our solution is an estimate of the exact solution).
\(\widehat{\mathbb{Y}}\) lives in the column space of \(\mathbb{A}\), and \(\mathbb{Y}\) lies outside that column space. Since \(\widehat{\mathbb{Y}}\) is the point of the column space closest to \(\mathbb{Y}\), the vector \(\mathbb{Y} - \widehat{\mathbb{Y}}\) is perpendicular to the column space of \(\mathbb{A}\).
\(\Rightarrow \mathbb{Y} - \widehat{\mathbb{Y}}\) is in the null space of \(\mathbb{A}^T\).
\(\Rightarrow \mathbb{A}^T(\mathbb{Y} - \widehat{\mathbb{Y}})=0\), and we know that \(\widehat{\mathbb{Y}}=\mathbb{A}\widehat{\mathbb{X}}\).
\(\Rightarrow \mathbb{A}^T(\mathbb{Y} - \mathbb{A}\widehat{\mathbb{X}})=0\)
\(\Rightarrow \mathbb{A}^T\mathbb{A}\widehat{\mathbb{X}}=\mathbb{A}^T \mathbb{Y}\)

\(\mathbb{A}^T\mathbb{A}= \begin{bmatrix} 3 & 6 \\ 6 & 14 \\ \end{bmatrix},\quad\) \(\mathbb{A}^T\mathbb{Y}= \begin{bmatrix} 5\\ 11\\ \end{bmatrix}\)
Now we have to solve
\( \begin{bmatrix} 3 & 6 \\ 6 & 14 \\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \end{bmatrix} = \begin{bmatrix} 5\\ 11\\ \end{bmatrix} \),
which we can write as
\(3x_1 + 6x_2 = 5\),
\(6x_1 + 14x_2 = 11\).
Solving, we get \(x_1=2/3\) and \(x_2=1/2\).
So our function \(f(a_1) = x_1 + a_1 x_2\) becomes
\[f(a_1) = \frac{2}{3} + \frac{1}{2}a_1\]
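As a sanity check, the normal equations \(\mathbb{A}^T\mathbb{A}\widehat{\mathbb{X}}=\mathbb{A}^T\mathbb{Y}\) can be solved exactly in Python with the standard-library `fractions` module. This is a sketch: the \(2\times 2\) system is solved by Cramer's rule, and the variable names mirror the symbols in the text.

```python
from fractions import Fraction as F

A = [[F(1), F(1)], [F(1), F(2)], [F(1), F(3)]]
Y = [F(1), F(2), F(2)]

# A^T A (2x2 matrix) and A^T Y (2-vector)
AtA = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
AtY = [sum(A[k][i] * Y[k] for k in range(3)) for i in range(2)]

# Solve the 2x2 system AtA @ [x1, x2] = AtY by Cramer's rule
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x1 = (AtY[0] * AtA[1][1] - AtA[0][1] * AtY[1]) / det
x2 = (AtA[0][0] * AtY[1] - AtY[0] * AtA[1][0]) / det
```

Running this reproduces the values above: `x1` is \(2/3\) and `x2` is \(1/2\).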
Let's take a look at our estimates for the \(3\) data points and their errors (where each error is \(a_2 - \hat{a_2}\)).
  • For \((a_1,a_2)=(1,1)\): \(\hat{a_2} = \frac{2}{3} + \frac{1}{2}(1)=\frac{7}{6}\), so \(e_1= 1 - \frac{7}{6} = -\frac{1}{6}\)
  • For \((a_1,a_2)=(2,2)\): \(\hat{a_2} = \frac{2}{3} + \frac{1}{2}(2)=\frac{5}{3}\), so \(e_2= 2 - \frac{5}{3}= \frac{1}{3}\)
  • For \((a_1,a_2)=(3,2)\): \(\hat{a_2} = \frac{2}{3} + \frac{1}{2}(3)=\frac{13}{6}\), so \(e_3= 2 - \frac{13}{6} = -\frac{1}{6}\)
Now let's represent our estimates and errors as vectors:
\( \widehat{\mathbb{Y}} = \begin{bmatrix} \frac{7}{6}\\ \frac{5}{3} \\ \frac{13}{6} \\ \end{bmatrix} ,\quad\) \( e=\mathbb{Y}-\widehat{\mathbb{Y}} = \begin{bmatrix} -\frac{1}{6}\\ \frac{1}{3} \\ -\frac{1}{6} \\ \end{bmatrix} \)
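These two vectors can be recomputed from the fitted parameters. A short sketch, again with exact fractions; `x_hat` holds the solution \((2/3,\,1/2)\) obtained above, and the other names follow the text's symbols.

```python
from fractions import Fraction as F

A = [[F(1), F(1)], [F(1), F(2)], [F(1), F(3)]]
Y = [F(1), F(2), F(2)]
x_hat = [F(2, 3), F(1, 2)]  # solution of the normal equations

# Y_hat = A @ x_hat is the projection of Y onto the column space of A
Y_hat = [row[0] * x_hat[0] + row[1] * x_hat[1] for row in A]
# e = Y - Y_hat is the error vector
e = [y - yh for y, yh in zip(Y, Y_hat)]
```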

As we discussed above, \( \widehat{\mathbb{Y}} \) is in the column space of \(\mathbb{A}\) and \(\mathbb{Y}-\widehat{\mathbb{Y}}\) is perpendicular to that column space. We can now see this in our example.
First notice that the dot product of \(\widehat{\mathbb{Y}}\) and \(\mathbb{Y}-\widehat{\mathbb{Y}}\) is \(0\):
\(\widehat{\mathbb{Y}}\cdot(\mathbb{Y}-\widehat{\mathbb{Y}})=\widehat{\mathbb{Y}}^T(\mathbb{Y}-\widehat{\mathbb{Y}})=0\)

And as we said, \(\mathbb{Y}-\widehat{\mathbb{Y}}\) is perpendicular to the whole column space: take any linear combination of the columns of \(\mathbb{A}\), and it will be perpendicular to \(\mathbb{Y}-\widehat{\mathbb{Y}}\).
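This perpendicularity claim is easy to verify numerically: it is enough to check that \(e\) is orthogonal to each column of \(\mathbb{A}\), since every vector in the column space is a linear combination of those columns. A sketch using the vectors computed earlier:

```python
from fractions import Fraction as F

A = [[F(1), F(1)], [F(1), F(2)], [F(1), F(3)]]
e = [F(-1, 6), F(1, 3), F(-1, 6)]        # Y - Y_hat
Y_hat = [F(7, 6), F(5, 3), F(13, 6)]     # projection of Y

# Dot product of e with each column of A: both should be 0
col_dots = [sum(A[k][j] * e[k] for k in range(3)) for j in range(2)]
# e is then perpendicular to any combination of columns, e.g. Y_hat itself
proj_dot = sum(yh * ek for yh, ek in zip(Y_hat, e))
```

Both `col_dots` entries and `proj_dot` come out to \(0\), confirming \(\mathbb{A}^T(\mathbb{Y}-\widehat{\mathbb{Y}})=0\).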