Orthonormal Vectors
Say we have an \(m\times n\) matrix \(Q\), and the \(i\)-th column of \(Q\) is denoted \(q_i\).\[ Q=\begin{bmatrix} \vdots & \vdots & & \vdots \\ q_1 & q_2 & \cdots & q_n \\ \vdots & \vdots & & \vdots \\ \end{bmatrix}_{m\times n} \]
Suppose all column vectors are perpendicular to each other, and all column vectors are of unit length:\[ q_i^Tq_j=\left\{\begin{matrix} 0& \text{if }i\neq j\\ 1& \text{if }i=j \\ \end{matrix}\right. \]
Equivalently we can say that \(Q^TQ=\mathcal{I}_{n\times n}\),\[ Q^TQ= \begin{bmatrix} \cdots q_1^T \cdots\\ \cdots q_2^T \cdots\\ \vdots\\ \cdots q_n^T \cdots\\ \end{bmatrix} \begin{bmatrix} \vdots & \vdots & & \vdots \\ q_1 & q_2 & \cdots & q_n \\ \vdots & \vdots & & \vdots \end{bmatrix} \]
\[ \Rightarrow Q^TQ= \begin{bmatrix} q_1^Tq_1 & q_1^Tq_2 & \cdots & q_1^Tq_n \\ q_2^Tq_1 & q_2^Tq_2 & \cdots & q_2^Tq_n \\ \vdots & \vdots & \ddots & \vdots \\ q_n^Tq_1 & q_n^Tq_2 & \cdots & q_n^Tq_n \\ \end{bmatrix} \]
\[ \Rightarrow Q^TQ= \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ \end{bmatrix} \]
If \(Q\) is a square matrix with orthonormal columns, then \(Q\) is called an Orthogonal matrix. For an Orthogonal matrix, \(Q^TQ=\mathcal{I}\) tells us that \(Q^T = Q^{-1}\). So, if \(Q\) is an Orthogonal matrix then\[ \begin{matrix} \displaystyle Q^T = Q^{-1} \end{matrix} \]
\( Q=\begin{bmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0\\ \end{bmatrix} \): here all column vectors are orthogonal and of unit length, and it is a \(3\times 3\) square matrix, so it is an Orthogonal matrix. \( Q=\frac{1}{3}\begin{bmatrix} 1 & -2 & 2\\ 2 & -1 & -2\\ 2 & 2 & 1\\ \end{bmatrix} \): here too all column vectors are orthogonal and of unit length (the factor \(\frac{1}{3}\) normalizes each column), and it is a \(3\times 3\) square matrix, so it is an Orthogonal matrix.
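As a quick numerical check, here is a minimal NumPy sketch (an illustration, not part of the original notes) verifying \(Q^TQ=\mathcal{I}\) and \(Q^T=Q^{-1}\) for the two example matrices above:

```python
import numpy as np

# Permutation-matrix example: each column is a standard basis vector.
Q1 = np.array([[0, 0, 1],
               [1, 0, 0],
               [0, 1, 0]], dtype=float)

# Second example: the 1/3 factor makes every column unit length.
Q2 = (1 / 3) * np.array([[1, -2,  2],
                         [2, -1, -2],
                         [2,  2,  1]], dtype=float)

for Q in (Q1, Q2):
    print(np.allclose(Q.T @ Q, np.eye(3)))      # Q^T Q = I
    print(np.allclose(Q.T, np.linalg.inv(Q)))   # Q^T = Q^-1 (square case)
```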
What is the benefit of orthonormal column vectors?
Say we have an \(m\times n\) matrix \(Q\) with orthonormal column vectors. Now say we want the projection of a vector (say \(\vec{v}\)) onto the column space of \(Q\). Our projection matrix \((P)\) is \(P=Q(Q^TQ)^{-1}Q^T\), and the projection of vector \(\vec{v}\) onto the column space of \(Q\) is \(P\vec{v}\).
Because \(Q\) has orthonormal column vectors, \(Q^TQ=\mathcal{I}\), so our projection matrix \((P)\) becomes \(P=Q\mathcal{I}Q^T\), i.e.\[ \begin{matrix} \displaystyle P=QQ^T \end{matrix} \]
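Here is a minimal NumPy sketch of this projection (the \(3\times 2\) matrix and the vector are made-up examples): since the columns of \(Q\) are orthonormal, \(P\vec{v}=QQ^T\vec{v}\).

```python
import numpy as np

# Q is 3x2 with orthonormal columns (not square, so not an "orthogonal matrix",
# but Q^T Q is still the 2x2 identity).
Q = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
v = np.array([3.0, 4.0, 5.0])

P = Q @ Q.T                               # projection matrix onto C(Q)
print(P @ v)                              # [0. 4. 5.]
print(np.allclose(P @ v, Q @ (Q.T @ v)))  # same result without forming P
```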
We can see that the direct benefit of having a matrix with orthonormal column vectors is in least squares.
In least squares we have equations of the form \(A^TA\widehat{\mathbb{X}}=A^T\vec{v}\), and if \(A\) has orthonormal column vectors, then \(A^TA=\mathcal{I}\), so our equation becomes \(\widehat{\mathbb{X}}=A^T\vec{v}\). There is no need to compute \((A^TA)^{-1}\).
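A small sketch of this shortcut, reusing the same made-up \(3\times 2\) matrix with orthonormal columns: the least-squares solution returned by `np.linalg.lstsq` matches \(A^T\vec{v}\) computed directly.

```python
import numpy as np

# A has orthonormal columns, so A^T A = I and no inverse is needed.
A = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
v = np.array([3.0, 4.0, 5.0])

x_hat = np.linalg.lstsq(A, v, rcond=None)[0]  # solves A^T A x = A^T v
print(np.allclose(x_hat, A.T @ v))            # True: x_hat is just A^T v
```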
Now let's test the properties of a projection matrix:
\(P=P^T\)
- \(P=QQ^T\), so \(P^T=(QQ^T)^T = (Q^T)^TQ^T = QQ^T=P\)
\(P=P^2\)
- \(P=QQ^T\), so \(P^2=Q(Q^TQ)Q^T=Q\mathcal{I}Q^T=QQ^T=P\)
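And a quick numerical check of both properties with the same made-up \(Q\):

```python
import numpy as np

Q = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
P = Q @ Q.T

print(np.allclose(P, P.T))    # P = P^T (symmetric)
print(np.allclose(P, P @ P))  # P = P^2 (projecting twice changes nothing)
```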
Gram Schmidt
OK, now we know that matrices with orthonormal column vectors are important. But if our matrix (with independent column vectors) does not have orthonormal column vectors, how do we make its columns orthonormal?
This is where Gram Schmidt comes into the picture.
First let's look at the smaller picture,
Say that we have \(2\) vectors \(\vec{a}\in\mathbb{R}^n\) and \(\vec{b}\in\mathbb{R}^n\). We have just \(2\) (non-parallel) vectors, and the span of \(2\) (non-parallel) vectors is just a \(2\)-dimensional plane. What we want is two orthonormal vectors (say \(\vec{q}_a\) and \(\vec{q}_b\)) in this \(2\)-dimensional plane. Let's take \(\vec{a}\) as our first vector; it's easy because it's only one vector, we just normalize it. So\[\vec{q}_a=\frac{\vec{a}}{\|\vec{a}\|}\]
Now how do we find the second orthonormal vector? The IDEA is to take \(\vec{b}\) and remove its direction along \(\vec{q}_a\).
So first take the projection of \(\vec{b}\) onto the vector space of \(\vec{q}_a\); call this projection of \(\vec{b}\) onto \(\vec{q}_a\) \(\vec{b}_p\). Now the vector joining \(\vec{b}\) and \(\vec{b}_p\) is orthogonal to \(\vec{q}_a\); call this vector \(\vec{b}_o\). And \(\vec{b}_p + \vec{b}_o = \vec{b}\), so \(\vec{b}_o = \vec{b}-\vec{b}_p\).
Recall our projection matrix \((P)\) onto the line through a vector \(\vec{v}\),\[\displaystyle P=\frac{\vec{v} \vec{v}^T}{\vec{v}^T\vec{v}}\]So the projection of \(\vec{b}\) onto the vector space of \(\vec{q}_a\) is \(\vec{b}_p=P\vec{b}\), i.e. \(\displaystyle \vec{b}_p=\frac{\vec{q}_a \vec{q}_a^T}{\vec{q}_a^T\vec{q}_a}\vec{b}\). And \(\vec{q}_a^T\vec{q}_a=1\), \(\displaystyle \Rightarrow \vec{b}_p =\vec{q}_a \vec{q}_a^T\vec{b} \). Since \(\vec{q}_a^T\vec{b}\) is just the scalar \(\vec{q}_a\cdot\vec{b}\), this is \(\vec{b}_p=(\vec{q}_a\cdot\vec{b})\vec{q}_a\), so\[\vec{b}_o = \vec{b}-(\vec{q}_a\cdot\vec{b})\vec{q}_a \]And\[\vec{q}_b=\frac{\vec{b}_o}{\|\vec{b}_o\|}\]
And we know that \(\vec{b}_o\) is perpendicular to \(\vec{q}_a\), and if we think of \(\vec{q}_a\) as a matrix with one column, then \(\vec{b}_o\) is perpendicular to the column space of \(\vec{q}_a\). So \(\vec{b}_o\) is in the Null space of \(\vec{q}_a ^T\), so \(\vec{q}_a^T \vec{b}_o=0\). Let's verify it:\[\vec{q}_a^T \left( \vec{b}-\vec{q}_a \vec{q}_a^T\vec{b} \right) = 0\]\[\Rightarrow \vec{q}_a^T \vec{b}- (\vec{q}_a^T \vec{q}_a) \vec{q}_a^T\vec{b} = 0\]\[\Rightarrow \vec{q}_a^T \vec{b}- \vec{q}_a^T \vec{b} = 0\quad \color{green}{✓}\]
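Here is a minimal NumPy sketch of this two-vector step (the vectors \(\vec{a}\) and \(\vec{b}\) are made-up examples, not from the text):

```python
import numpy as np

a = np.array([3.0, 0.0, 4.0])
b = np.array([1.0, 2.0, 3.0])

q_a = a / np.linalg.norm(a)         # first orthonormal vector
b_o = b - np.dot(q_a, b) * q_a      # remove b's direction along q_a
q_b = b_o / np.linalg.norm(b_o)     # second orthonormal vector

print(np.isclose(np.dot(q_a, b_o), 0.0))  # q_a^T b_o = 0
print(np.isclose(np.dot(q_a, q_b), 0.0))  # q_a and q_b are orthogonal
```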
Now say we have a third (independent) vector \(\vec{c}\). We got our two orthonormal vectors \(\vec{q}_a\) and \(\vec{q}_b\), and this vector \(\vec{c}\) is not in the vector space of \(\vec{q}_a\) and \(\vec{q}_b\); in other words, \(\vec{c}\) is out of the plane spanned by vectors \(\vec{q}_a\) and \(\vec{q}_b\). So \(\vec{c}\) gives us access to the \(3\)rd dimension; using this \(\vec{c}\) we need to find a vector orthogonal to the vector space of \(\vec{q}_a\) and \(\vec{q}_b\). IDEA: first from \(\vec{c}\) remove its direction along \(\vec{q}_a\), and then remove its direction along \(\vec{q}_b\).
First, from \(\vec{c}\), remove its direction along \(\vec{q}_a\); call the result \(\vec{c}_{o_{a}}\): \(\vec{c}_{o_{a}} = \vec{c} - \frac{\vec{q}_a \vec{q}_a^T}{\vec{q}_a^T\vec{q}_a}\vec{c}\). And \(\vec{q}_a^T\vec{q}_a=1\), \(\Rightarrow \vec{c}_{o_{a}} = \vec{c} - \vec{q}_a \vec{q}_a^T \vec{c}\) \(\Rightarrow \vec{c}_{o_{a}} = \vec{c} - (\vec{q}_a\cdot\vec{c})\vec{q}_a\).
Then remove its direction along \(\vec{q}_b\); call the result \(\vec{c}_{o}\): \(\displaystyle \vec{c}_{o} = \left(\vec{c} - (\vec{q}_a\cdot\vec{c})\vec{q}_a\right) - \frac{\vec{q}_b \vec{q}_b^T}{\vec{q}_b^T\vec{q}_b}\vec{c}\). (Projecting \(\vec{c}\) or \(\vec{c}_{o_{a}}\) onto \(\vec{q}_b\) gives the same thing, since \(\vec{q}_b\perp\vec{q}_a\).) And \(\vec{q}_b^T\vec{q}_b=1\), \(\displaystyle \Rightarrow \vec{c}_{o} = \left( \vec{c} - (\vec{q}_a\cdot\vec{c})\vec{q}_a\right) - (\vec{q}_b\cdot\vec{c})\vec{q}_b\). Finally, \(\vec{q}_c=\frac{\vec{c}_o}{\|\vec{c}_o\|}\).
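Extending the previous sketch with a made-up third vector \(\vec{c}\):

```python
import numpy as np

a = np.array([3.0, 0.0, 4.0])
b = np.array([1.0, 2.0, 3.0])
c = np.array([2.0, -1.0, 0.0])

q_a = a / np.linalg.norm(a)
b_o = b - np.dot(q_a, b) * q_a
q_b = b_o / np.linalg.norm(b_o)

# Strip c's directions along q_a and q_b, then normalize.
c_o = c - np.dot(q_a, c) * q_a - np.dot(q_b, c) * q_b
q_c = c_o / np.linalg.norm(c_o)

Q = np.column_stack([q_a, q_b, q_c])
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: q_a, q_b, q_c are orthonormal
```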
Now we can see a pattern here.
Say that we have \(n\) independent vectors \(( a_1, a_2, \cdots, a_n )\) and we have to find \(n\) orthonormal vectors \(( q_1, q_2, \cdots, q_n )\) using these \(n\) independent vectors. From the steps above we can deduce the pattern,
\[\vec{a_i}_o=\vec{a_i}-\sum_{k=1}^{i-1} (\vec{q}_k\cdot\vec{a}_i)\vec{q}_k \]
\[\vec{q}_i=\frac{\vec{a_i}_o}{\|\vec{a_i}_o\|}\]
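A minimal NumPy sketch of the whole process (the function name `gram_schmidt` and the example vectors are illustrative assumptions, not from the text):

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn a list of independent vectors a_1..a_n into orthonormal q_1..q_n."""
    qs = []
    for a in vectors:
        a_o = np.array(a, dtype=float)
        for q in qs:                      # remove directions along earlier q_k
            a_o = a_o - np.dot(q, a) * q
        qs.append(a_o / np.linalg.norm(a_o))
    return np.column_stack(qs)

# Example: three independent vectors in R^3.
vectors = [np.array([3.0, 0.0, 4.0]),
           np.array([1.0, 2.0, 3.0]),
           np.array([2.0, -1.0, 0.0])]

Q = gram_schmidt(vectors)
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: columns of Q are orthonormal
```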
So now we can find orthonormal vectors for any set of independent vectors.
This is The Gram Schmidt Process.