
    Least Squares

    (Application of projection)

Recall our previous example: there we had a system of equations \(\mathbb{A}\mathbb{X}=\mathbb{Y}\).
Let's take an example. Suppose we have \(3\) points (of the form \((a_1,a_2)\)): \((1,1),(2,2),(3,2)\). Our objective is to find the best possible linear function for \(a_2\); call that function \(f\). Our function might not give the exact \(a_2\) that corresponds to each \(a_1\), but it will give us the best possible approximation for \(a_2\).
The simplest linear function is \(a_2 = f(a_1) = x_1 + a_1 x_2\), where \(x_1, x_2\) are our (unknown) parameters.
Our observations give
\(f=x_1 + 1 x_2 =1\), \(f=x_1 + 2 x_2 =2\), \(f=x_1 + 3 x_2 =2\).
We can also write this as
\( \underbrace{\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ \end{bmatrix}}_{\mathbb{A}} \underbrace{\begin{bmatrix} x_1\\ x_2\\ \end{bmatrix}}_{\mathbb{X}} = \underbrace{\begin{bmatrix} 1\\ 2\\ 2\\ \end{bmatrix}}_{\mathbb{Y}} \),
i.e. \(\mathbb{A}\mathbb{X}=\mathbb{Y}\).
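The setup above can be written down directly in code. This is a minimal sketch using Python's standard-library `fractions` module so arithmetic stays exact; the names `points`, `A`, and `Y` are my own labels matching the symbols in the text.

```python
from fractions import Fraction as F

# Data points (a1, a2); the model is a2 ≈ x1 + a1*x2
points = [(1, 1), (2, 2), (3, 2)]

# Each row of A is [1, a1]; Y collects the observed a2 values
A = [[F(1), F(a1)] for a1, _ in points]
Y = [F(a2) for _, a2 in points]
```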

So we want to find the linear combination of the column vectors of \(\mathbb{A}\) that gives us \(\mathbb{Y}\), but \(\mathbb{Y}\) does not lie in the column space of \(\mathbb{A}\).
So we will find a vector \(\widehat{\mathbb{Y}}\) in the column space of \(\mathbb{A}\) that is closest to \(\mathbb{Y}\), where closeness is measured by the Euclidean distance between \(\mathbb{Y}\) and \(\widehat{\mathbb{Y}}\).
So instead we will solve
\(\mathbb{A}\widehat{\mathbb{X}}=\widehat{\mathbb{Y}}\)

(and \(\widehat{\mathbb{X}}\) is just a way to indicate that our solution is an estimate of the exact solution).
\(\widehat{\mathbb{Y}}\) lives in the column space of \(\mathbb{A}\), and \(\mathbb{Y}\) lies outside that column space. Since \(\widehat{\mathbb{Y}}\) is the point of the column space closest to \(\mathbb{Y}\), the vector \(\mathbb{Y} - \widehat{\mathbb{Y}}\) is perpendicular to the column space of \(\mathbb{A}\).
\(\Rightarrow \mathbb{Y} - \widehat{\mathbb{Y}}\) is in the null space of \(\mathbb{A}^T\).
\(\Rightarrow \mathbb{A}^T(\mathbb{Y} - \widehat{\mathbb{Y}})=0\), and we know that \(\widehat{\mathbb{Y}}=\mathbb{A}\widehat{\mathbb{X}}\).
\(\Rightarrow \mathbb{A}^T(\mathbb{Y} - \mathbb{A}\widehat{\mathbb{X}})=0\)
\(\Rightarrow \mathbb{A}^T\mathbb{A}\widehat{\mathbb{X}}=\mathbb{A}^T \mathbb{Y}\)

\(\mathbb{A}^T\mathbb{A}= \begin{bmatrix} 3 & 6 \\ 6 & 14 \\ \end{bmatrix},\quad\) \(\mathbb{A}^T\mathbb{Y}= \begin{bmatrix} 5\\ 11\\ \end{bmatrix}\)
Now we have to solve
\( \begin{bmatrix} 3 & 6 \\ 6 & 14 \\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \end{bmatrix} = \begin{bmatrix} 5\\ 11\\ \end{bmatrix} \),
which we can write as
\(3x_1 + 6x_2 = 5\),
\(6x_1 + 14x_2 = 11\).
Solving, we get \(x_1=2/3\) and \(x_2=1/2\).
So our function \(f(a_1) = x_1 + a_1 x_2\) becomes
\[f(a_1) = \frac{2}{3} + \frac{1}{2}a_1\]
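As a sanity check, the normal equations \(\mathbb{A}^T\mathbb{A}\widehat{\mathbb{X}}=\mathbb{A}^T\mathbb{Y}\) can be solved exactly in Python with the standard-library `fractions` module. This is a sketch: the \(2\times 2\) system is solved by Cramer's rule, and the variable names mirror the symbols in the text.

```python
from fractions import Fraction as F

A = [[F(1), F(1)], [F(1), F(2)], [F(1), F(3)]]
Y = [F(1), F(2), F(2)]

# A^T A (2x2 matrix) and A^T Y (2-vector)
AtA = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
AtY = [sum(A[k][i] * Y[k] for k in range(3)) for i in range(2)]

# Solve the 2x2 system AtA @ [x1, x2] = AtY by Cramer's rule
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x1 = (AtY[0] * AtA[1][1] - AtA[0][1] * AtY[1]) / det
x2 = (AtA[0][0] * AtY[1] - AtY[0] * AtA[1][0]) / det
```

Running this reproduces the values above: `x1` is \(2/3\) and `x2` is \(1/2\).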
Let's take a look at our estimates for the \(3\) data points and their errors (where each error is \(a_2 - \hat{a_2}\)).
  • For \((a_1,a_2)=(1,1)\): \(\hat{a_2} = \frac{2}{3} + \frac{1}{2}(1)=\frac{7}{6}\), so \(e_1= 1 - \frac{7}{6} = -\frac{1}{6}\)
  • For \((a_1,a_2)=(2,2)\): \(\hat{a_2} = \frac{2}{3} + \frac{1}{2}(2)=\frac{5}{3}\), so \(e_2= 2 - \frac{5}{3}= \frac{1}{3}\)
  • For \((a_1,a_2)=(3,2)\): \(\hat{a_2} = \frac{2}{3} + \frac{1}{2}(3)=\frac{13}{6}\), so \(e_3= 2 - \frac{13}{6} = -\frac{1}{6}\)
Now let's represent our estimates and errors as vectors:
\( \widehat{\mathbb{Y}} = \begin{bmatrix} \frac{7}{6}\\ \frac{5}{3} \\ \frac{13}{6} \\ \end{bmatrix} ,\quad\) \( e=\mathbb{Y}-\widehat{\mathbb{Y}} = \begin{bmatrix} -\frac{1}{6}\\ \frac{1}{3} \\ -\frac{1}{6} \\ \end{bmatrix} \)
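These two vectors can be recomputed from the fitted parameters. A short sketch, again with exact fractions; `x_hat` holds the solution \((2/3,\,1/2)\) obtained above, and the other names follow the text's symbols.

```python
from fractions import Fraction as F

A = [[F(1), F(1)], [F(1), F(2)], [F(1), F(3)]]
Y = [F(1), F(2), F(2)]
x_hat = [F(2, 3), F(1, 2)]  # solution of the normal equations

# Y_hat = A @ x_hat is the projection of Y onto the column space of A
Y_hat = [row[0] * x_hat[0] + row[1] * x_hat[1] for row in A]
# e = Y - Y_hat is the error vector
e = [y - yh for y, yh in zip(Y, Y_hat)]
```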

As we discussed above, \( \widehat{\mathbb{Y}} \) is in the column space of \(\mathbb{A}\) and \(\mathbb{Y}-\widehat{\mathbb{Y}}\) is perpendicular to that column space. We can now see this in our example.
First notice that the dot product of \(\widehat{\mathbb{Y}}\) and \(\mathbb{Y}-\widehat{\mathbb{Y}}\) is \(0\):
\(\widehat{\mathbb{Y}}\cdot(\mathbb{Y}-\widehat{\mathbb{Y}})=\widehat{\mathbb{Y}}^T(\mathbb{Y}-\widehat{\mathbb{Y}})=0\)

And as we said, \(\mathbb{Y}-\widehat{\mathbb{Y}}\) is perpendicular to the whole column space: take any linear combination of the columns of \(\mathbb{A}\), and it will be perpendicular to \(\mathbb{Y}-\widehat{\mathbb{Y}}\).
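This perpendicularity claim is easy to verify numerically: it is enough to check that \(e\) is orthogonal to each column of \(\mathbb{A}\), since every vector in the column space is a linear combination of those columns. A sketch using the vectors computed earlier:

```python
from fractions import Fraction as F

A = [[F(1), F(1)], [F(1), F(2)], [F(1), F(3)]]
e = [F(-1, 6), F(1, 3), F(-1, 6)]        # Y - Y_hat
Y_hat = [F(7, 6), F(5, 3), F(13, 6)]     # projection of Y

# Dot product of e with each column of A: both should be 0
col_dots = [sum(A[k][j] * e[k] for k in range(3)) for j in range(2)]
# e is then perpendicular to any combination of columns, e.g. Y_hat itself
proj_dot = sum(yh * ek for yh, ek in zip(Y_hat, e))
```

Both `col_dots` entries and `proj_dot` come out to \(0\), confirming \(\mathbb{A}^T(\mathbb{Y}-\widehat{\mathbb{Y}})=0\).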