Least Squares
(Application of projection)
Recall our previous example, where we had a system of equations \(\mathbb{A}\mathbb{X}=\mathbb{Y}\). Let's take an example: suppose we have \(3\) points (in the form \((a_1,a_2)\)): \((1,1),(2,2),(3,2)\). Our objective is to find the best possible linear function for \(a_2\); call that function \(f\). Our function might not give the exact \(a_2\) that corresponds to each \(a_1\), but it will give us the best possible approximation of \(a_2\). The simplest linear function is \(a_2 = f(a_1) = x_1 + a_1 x_2\), where \(x_1, x_2\) are our unknown parameters. Our observations say,

\(f=x_1 + 1 x_2 =1\)

\(f=x_1 + 2 x_2 =2\)

\(f=x_1 + 3 x_2 =2\)

We can also write this as

\( \underbrace{\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ \end{bmatrix}}_{\mathbb{A}} \underbrace{\begin{bmatrix} x_1\\ x_2\\ \end{bmatrix}}_{\mathbb{X}} = \underbrace{\begin{bmatrix} 1\\ 2\\ 2\\ \end{bmatrix}}_{\mathbb{Y}} \)

that is, \(\mathbb{A}\mathbb{X}=\mathbb{Y}\).
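To keep the example concrete, here is a minimal NumPy sketch that builds this system from the three data points (the variable names `A` and `Y` are our own choice, mirroring \(\mathbb{A}\) and \(\mathbb{Y}\)):

```python
import numpy as np

# Data points (a1, a2): (1, 1), (2, 2), (3, 2)
a1 = np.array([1.0, 2.0, 3.0])
a2 = np.array([1.0, 2.0, 2.0])

# Each row of A is [1, a1], so that A @ [x1, x2] = x1 + a1 * x2.
A = np.column_stack([np.ones_like(a1), a1])
Y = a2

print(A)  # [[1. 1.]
          #  [1. 2.]
          #  [1. 3.]]
print(Y)  # [1. 2. 2.]
```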
So we want to find the linear combination of the column vectors of \(\mathbb{A}\) that gives us \(\mathbb{Y}\), but \(\mathbb{Y}\) does not live in the column space of \(\mathbb{A}\). So instead we will find the vector \(\widehat{\mathbb{Y}}\) in the column space of \(\mathbb{A}\) that is closest to \(\mathbb{Y}\), where closeness is measured by the Euclidean distance between \(\mathbb{Y}\) and \(\widehat{\mathbb{Y}}\). That is, we will solve

\(\mathbb{A}\widehat{\mathbb{X}}=\widehat{\mathbb{Y}}\)

(the hat on \(\widehat{\mathbb{X}}\) is just a way of saying that our solution is an estimate of the exact solution). Since \(\widehat{\mathbb{Y}}\) is the closest point to \(\mathbb{Y}\) in the column space of \(\mathbb{A}\), it is the projection of \(\mathbb{Y}\) onto that column space, so the vector \(\mathbb{Y} - \widehat{\mathbb{Y}}\) is perpendicular to the column space of \(\mathbb{A}\).

\(\Rightarrow \mathbb{Y} - \widehat{\mathbb{Y}}\) is in the null space of \(\mathbb{A}^T\).

\(\Rightarrow \mathbb{A}^T(\mathbb{Y} - \widehat{\mathbb{Y}})=0\), and we know that \(\widehat{\mathbb{Y}}=\mathbb{A}\widehat{\mathbb{X}}\).

\(\Rightarrow \mathbb{A}^T(\mathbb{Y} - \mathbb{A}\widehat{\mathbb{X}})=0\)

\(\Rightarrow \mathbb{A}^T\mathbb{A}\widehat{\mathbb{X}}=\mathbb{A}^T \mathbb{Y}\)
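One step worth making explicit: when the columns of \(\mathbb{A}\) are linearly independent (as they are here), \(\mathbb{A}^T\mathbb{A}\) is invertible, so the normal equations give the estimate and the projection in closed form:

\[\widehat{\mathbb{X}} = (\mathbb{A}^T\mathbb{A})^{-1}\mathbb{A}^T\mathbb{Y}, \qquad \widehat{\mathbb{Y}} = \mathbb{A}\widehat{\mathbb{X}} = \mathbb{A}(\mathbb{A}^T\mathbb{A})^{-1}\mathbb{A}^T\mathbb{Y}\]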
For our example,

\(\mathbb{A}^T\mathbb{A}= \begin{bmatrix} 3 & 6 \\ 6 & 14 \\ \end{bmatrix},\quad \mathbb{A}^T\mathbb{Y}= \begin{bmatrix} 5\\ 11\\ \end{bmatrix}\)

Now we have to solve

\( \begin{bmatrix} 3 & 6 \\ 6 & 14 \\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \end{bmatrix} = \begin{bmatrix} 5\\ 11\\ \end{bmatrix} \)

We can write this as

\(3x_1 + 6x_2 = 5\)

\(6x_1 + 14x_2 = 11\)

Solving, we get \(x_1=2/3\) and \(x_2=1/2\).
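As a quick check on the hand computation, here is a small sketch (reusing the arrays assumed above) that forms the normal equations with NumPy and cross-checks the answer against NumPy's built-in least-squares routine, `np.linalg.lstsq`:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([1.0, 2.0, 2.0])

# Normal equations: (A^T A) x_hat = A^T Y
AtA = A.T @ A                       # [[3., 6.], [6., 14.]]
AtY = A.T @ Y                       # [5., 11.]
x_hat = np.linalg.solve(AtA, AtY)
print(x_hat)                        # [0.66666667 0.5], i.e. x1 = 2/3, x2 = 1/2

# Cross-check with NumPy's least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(np.allclose(x_hat, x_lstsq))  # True
```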
So our function \(f(a_1) = x_1 + a_1 x_2\) becomes \[f(a_1) = \frac{2}{3} + \frac{1}{2}a_1\]
Let's take a look at our estimate for each of our \(3\) data points and its error (which is \(a_2 - \hat{a_2}\)).

\((a_1,a_2)=(1,1)\): \(\hat{a_2} = \frac{2}{3} + \frac{1}{2}(1)=\frac{7}{6}\), so \(e_1= 1 - \frac{7}{6} = -\frac{1}{6}\)

\((a_1,a_2)=(2,2)\): \(\hat{a_2} = \frac{2}{3} + \frac{1}{2}(2)=\frac{5}{3}\), so \(e_2= 2 - \frac{5}{3}= \frac{1}{3}\)

\((a_1,a_2)=(3,2)\): \(\hat{a_2} = \frac{2}{3} + \frac{1}{2}(3)=\frac{13}{6}\), so \(e_3= 2 - \frac{13}{6} = -\frac{1}{6}\)

\( \widehat{\mathbb{Y}} = \begin{bmatrix} \frac{7}{6}\\ \frac{5}{3} \\ \frac{13}{6} \\ \end{bmatrix} ,\quad e=\mathbb{Y}-\widehat{\mathbb{Y}} = \begin{bmatrix} -\frac{1}{6}\\ \frac{1}{3} \\ -\frac{1}{6} \\ \end{bmatrix} \)
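The same numbers fall out of \(\widehat{\mathbb{Y}}=\mathbb{A}\widehat{\mathbb{X}}\) and \(e=\mathbb{Y}-\widehat{\mathbb{Y}}\) directly; a small sketch, again reusing the arrays assumed above:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([1.0, 2.0, 2.0])
x_hat = np.array([2/3, 1/2])

Y_hat = A @ x_hat   # fitted values, A x_hat
e = Y - Y_hat       # errors, Y - Y_hat
print(Y_hat)        # [1.16666667 1.66666667 2.16666667],  i.e. [7/6, 5/3, 13/6]
print(e)            # [-0.16666667  0.33333333 -0.16666667], i.e. [-1/6, 1/3, -1/6]
```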
As we discussed above, \(\widehat{\mathbb{Y}}\) is in the column space of \(\mathbb{A}\) and \(\mathbb{Y}-\widehat{\mathbb{Y}}\) is perpendicular to that column space. We can now see it in this example. First notice that the dot product of \(\widehat{\mathbb{Y}}\) and \(\mathbb{Y}-\widehat{\mathbb{Y}}\) is \(0\):

\(\widehat{\mathbb{Y}}\cdot(\mathbb{Y}-\widehat{\mathbb{Y}})=\widehat{\mathbb{Y}}^T(\mathbb{Y}-\widehat{\mathbb{Y}})=\frac{7}{6}\left(-\frac{1}{6}\right) + \frac{5}{3}\cdot\frac{1}{3} + \frac{13}{6}\left(-\frac{1}{6}\right) = 0\)

And as we said, \(\mathbb{Y}-\widehat{\mathbb{Y}}\) is perpendicular to the whole column space: you can take any linear combination of the columns of \(\mathbb{A}\), and it will be perpendicular to \(\mathbb{Y}-\widehat{\mathbb{Y}}\).
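Numerically, this perpendicularity is easy to confirm: \(\mathbb{A}^T(\mathbb{Y}-\widehat{\mathbb{Y}})=0\) says the error is orthogonal to each column of \(\mathbb{A}\), and therefore to every linear combination of them. A final sketch (the coefficients in `c` are an arbitrary choice):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([1.0, 2.0, 2.0])
Y_hat = A @ np.array([2/3, 1/2])
e = Y - Y_hat

# Orthogonal to both columns of A (zero up to floating-point rounding).
print(A.T @ e)                     # [0. 0.]

# Hence orthogonal to Y_hat and to any combination A @ c of the columns.
print(np.isclose(Y_hat @ e, 0))    # True
c = np.array([-2.0, 5.0])          # arbitrary coefficients
print(np.isclose((A @ c) @ e, 0))  # True
```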