Orthonormal Vectors
Say we have an \(m\times n\) matrix \(Q\), and the \(i\)-th column of \(Q\) is denoted \(q_i\).\[ Q=\begin{bmatrix} \vdots & \vdots & & \vdots \\ q_1 & q_2 & \cdots & q_n \\ \vdots & \vdots & & \vdots \\ \end{bmatrix}_{m\times n} \]
Suppose all column vectors are perpendicular to each other, and all column vectors are of unit length:\[ q_i^Tq_j=\left\{\begin{matrix} 0& \text{if }i\neq j\\ 1& \text{if }i=j \\ \end{matrix}\right. \]
Equivalently we can say that \(Q^TQ=\mathcal{I}_{n\times n}\),\[ Q^TQ= \begin{bmatrix} \cdots q_1^T \cdots\\ \cdots q_2^T \cdots\\ \vdots\\ \cdots q_n^T \cdots\\ \end{bmatrix} \begin{bmatrix} \vdots & \vdots & & \vdots \\ q_1 & q_2 & \cdots & q_n \\ \vdots & \vdots & & \vdots \end{bmatrix} \]
\[ \Rightarrow Q^TQ= \begin{bmatrix} q_1^Tq_1 & q_1^Tq_2 & \cdots & q_1^Tq_n \\ q_2^Tq_1 & q_2^Tq_2 & \cdots & q_2^Tq_n \\ \vdots & \vdots & \ddots & \vdots \\ q_n^Tq_1 & q_n^Tq_2 & \cdots & q_n^Tq_n \\ \end{bmatrix} \]
\[ \Rightarrow Q^TQ= \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ \end{bmatrix} \]
If \(Q\) is a square matrix with orthonormal columns, then \(Q\) is called an Orthogonal matrix. For an Orthogonal matrix, \(Q^TQ=\mathcal{I}\) tells us that \(Q^T = Q^{-1}\). So, if \(Q\) is an Orthogonal matrix then\[ \begin{matrix} \displaystyle Q^T = Q^{-1} \end{matrix} \]
\( Q=\begin{bmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0\\ \end{bmatrix} \): here all column vectors are orthogonal and of unit length, and it is a \(3\times 3\) square matrix, so it is an Orthogonal matrix. \( Q=\frac{1}{3}\begin{bmatrix} 1 & -2 & 2\\ 2 & -1 & -2\\ 2 & 2 & 1\\ \end{bmatrix} \): here too all column vectors are orthogonal and of unit length (the factor \(\frac{1}{3}\) normalizes each column), and it is a \(3\times 3\) square matrix, so it is an Orthogonal matrix.
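As a quick numerical check, here is a minimal NumPy sketch (an illustration, not part of the original notes) verifying \(Q^TQ=\mathcal{I}\) and \(Q^T=Q^{-1}\) for the two example matrices above:

```python
import numpy as np

# Permutation-matrix example: each column is a standard basis vector.
Q1 = np.array([[0, 0, 1],
               [1, 0, 0],
               [0, 1, 0]], dtype=float)

# Second example: the 1/3 factor makes every column unit length.
Q2 = (1 / 3) * np.array([[1, -2,  2],
                         [2, -1, -2],
                         [2,  2,  1]], dtype=float)

for Q in (Q1, Q2):
    print(np.allclose(Q.T @ Q, np.eye(3)))      # Q^T Q = I
    print(np.allclose(Q.T, np.linalg.inv(Q)))   # Q^T = Q^-1 (square case)
```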
What is the benefit of orthonormal column vectors?
Say we have an \(m\times n\) matrix \(Q\) with orthonormal column vectors. Now say we want the projection of a vector (say \(\vec{v}\)) onto the column space of \(Q\). Our projection matrix \((P)\) is \(P=Q(Q^TQ)^{-1}Q^T\), and the projection of vector \(\vec{v}\) onto the column space of \(Q\) is \(P\vec{v}\).
Because \(Q\) has orthonormal column vectors, \(Q^TQ=\mathcal{I}\), so our projection matrix \((P)\) becomes \(P=Q\mathcal{I}Q^T\), i.e.\[ \begin{matrix} \displaystyle P=QQ^T \end{matrix} \]
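Here is a minimal NumPy sketch of this projection (the \(3\times 2\) matrix and the vector are made-up examples): since the columns of \(Q\) are orthonormal, \(P\vec{v}=QQ^T\vec{v}\).

```python
import numpy as np

# Q is 3x2 with orthonormal columns (not square, so not an "orthogonal matrix",
# but Q^T Q is still the 2x2 identity).
Q = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
v = np.array([3.0, 4.0, 5.0])

P = Q @ Q.T                               # projection matrix onto C(Q)
print(P @ v)                              # [0. 4. 5.]
print(np.allclose(P @ v, Q @ (Q.T @ v)))  # same result without forming P
```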
We can see that the direct benefit of having a matrix with orthonormal column vectors is in least squares.
In least squares we have equations of the form \(A^TA\widehat{\mathbb{X}}=A^T\vec{v}\), and if \(A\) has orthonormal column vectors, then \(A^TA=\mathcal{I}\), so our equation becomes \(\widehat{\mathbb{X}}=A^T\vec{v}\). There is no need to compute \((A^TA)^{-1}\).
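A small sketch of this shortcut, reusing the same made-up \(3\times 2\) matrix with orthonormal columns: the least-squares solution returned by `np.linalg.lstsq` matches \(A^T\vec{v}\) computed directly.

```python
import numpy as np

# A has orthonormal columns, so A^T A = I and no inverse is needed.
A = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
v = np.array([3.0, 4.0, 5.0])

x_hat = np.linalg.lstsq(A, v, rcond=None)[0]  # solves A^T A x = A^T v
print(np.allclose(x_hat, A.T @ v))            # True: x_hat is just A^T v
```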
Now let's test the properties of a projection matrix:
\(P=P^T\)
- \(P=QQ^T\), so \(P^T=(QQ^T)^T = (Q^T)^TQ^T = QQ^T=P\)
\(P=P^2\)
- \(P=QQ^T\), so \(P^2=Q(Q^TQ)Q^T=Q\mathcal{I}Q^T=QQ^T=P\)
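And a quick numerical check of both properties with the same made-up \(Q\):

```python
import numpy as np

Q = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
P = Q @ Q.T

print(np.allclose(P, P.T))    # P = P^T (symmetric)
print(np.allclose(P, P @ P))  # P = P^2 (projecting twice changes nothing)
```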
Gram Schmidt
OK, now we know that matrices with orthonormal column vectors are important. But if our matrix (with independent column vectors) does not have orthonormal column vectors, how do we make its columns orthonormal?
This is where Gram Schmidt comes into the picture.
First let's look at the smaller picture,
Say that we have \(2\) vectors \(\vec{a}\in\mathbb{R}^n\) and \(\vec{b}\in\mathbb{R}^n\). We have just \(2\) (non-parallel) vectors, and the span of \(2\) (non-parallel) vectors is just a \(2\)-dimensional plane. What we want is two orthonormal vectors (say \(\vec{q}_a\) and \(\vec{q}_b\)) in this \(2\)-dimensional plane. Let's take \(\vec{a}\) as our first vector; it's easy because it's only one vector, we just normalize it. So\[\vec{q}_a=\frac{\vec{a}}{\|\vec{a}\|}\]
Now how do we find the second orthonormal vector? The IDEA is to take \(\vec{b}\) and remove its direction along \(\vec{q}_a\).
So first take the projection of \(\vec{b}\) onto the vector space of \(\vec{q}_a\); call this projection of \(\vec{b}\) onto \(\vec{q}_a\) \(\vec{b}_p\). Now the vector joining \(\vec{b}\) and \(\vec{b}_p\) is orthogonal to \(\vec{q}_a\); call this vector \(\vec{b}_o\). And \(\vec{b}_p + \vec{b}_o = \vec{b}\), so \(\vec{b}_o = \vec{b}-\vec{b}_p\).
Recall our projection matrix \((P)\) onto the line through a vector \(\vec{v}\),\[\displaystyle P=\frac{\vec{v} \vec{v}^T}{\vec{v}^T\vec{v}}\]So the projection of \(\vec{b}\) onto the vector space of \(\vec{q}_a\) is \(\vec{b}_p=P\vec{b}\), i.e. \(\displaystyle \vec{b}_p=\frac{\vec{q}_a \vec{q}_a^T}{\vec{q}_a^T\vec{q}_a}\vec{b}\). And \(\vec{q}_a^T\vec{q}_a=1\), \(\displaystyle \Rightarrow \vec{b}_p =\vec{q}_a \vec{q}_a^T\vec{b} \). Since \(\vec{q}_a^T\vec{b}\) is just the scalar \(\vec{q}_a\cdot\vec{b}\), this is \(\vec{b}_p=(\vec{q}_a\cdot\vec{b})\vec{q}_a\), so\[\vec{b}_o = \vec{b}-(\vec{q}_a\cdot\vec{b})\vec{q}_a \]And\[\vec{q}_b=\frac{\vec{b}_o}{\|\vec{b}_o\|}\]
And we know that \(\vec{b}_o\) is perpendicular to \(\vec{q}_a\), and if we think of \(\vec{q}_a\) as a matrix with one column, then \(\vec{b}_o\) is perpendicular to the column space of \(\vec{q}_a\). So \(\vec{b}_o\) is in the Null space of \(\vec{q}_a ^T\), so \(\vec{q}_a^T \vec{b}_o=0\). Let's verify it:\[\vec{q}_a^T \left( \vec{b}-\vec{q}_a \vec{q}_a^T\vec{b} \right) = 0\]\[\Rightarrow \vec{q}_a^T \vec{b}- (\vec{q}_a^T \vec{q}_a) \vec{q}_a^T\vec{b} = 0\]\[\Rightarrow \vec{q}_a^T \vec{b}- \vec{q}_a^T \vec{b} = 0\quad \color{green}{✓}\]
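Here is a minimal NumPy sketch of this two-vector step (the vectors \(\vec{a}\) and \(\vec{b}\) are made-up examples, not from the text):

```python
import numpy as np

a = np.array([3.0, 0.0, 4.0])
b = np.array([1.0, 2.0, 3.0])

q_a = a / np.linalg.norm(a)         # first orthonormal vector
b_o = b - np.dot(q_a, b) * q_a      # remove b's direction along q_a
q_b = b_o / np.linalg.norm(b_o)     # second orthonormal vector

print(np.isclose(np.dot(q_a, b_o), 0.0))  # q_a^T b_o = 0
print(np.isclose(np.dot(q_a, q_b), 0.0))  # q_a and q_b are orthogonal
```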
Now say we have a third (independent) vector \(\vec{c}\). We got our two orthonormal vectors \(\vec{q}_a\) and \(\vec{q}_b\), and this vector \(\vec{c}\) is not in the vector space of \(\vec{q}_a\) and \(\vec{q}_b\); in other words, \(\vec{c}\) is out of the plane spanned by vectors \(\vec{q}_a\) and \(\vec{q}_b\). So \(\vec{c}\) gives us access to the \(3\)rd dimension; using this \(\vec{c}\) we need to find a vector orthogonal to the vector space of \(\vec{q}_a\) and \(\vec{q}_b\). IDEA: first from \(\vec{c}\) remove its direction along \(\vec{q}_a\), and then remove its direction along \(\vec{q}_b\).
First, from \(\vec{c}\), remove its direction along \(\vec{q}_a\); call the result \(\vec{c}_{o_{a}}\): \(\vec{c}_{o_{a}} = \vec{c} - \frac{\vec{q}_a \vec{q}_a^T}{\vec{q}_a^T\vec{q}_a}\vec{c}\). And \(\vec{q}_a^T\vec{q}_a=1\), \(\Rightarrow \vec{c}_{o_{a}} = \vec{c} - \vec{q}_a \vec{q}_a^T \vec{c}\) \(\Rightarrow \vec{c}_{o_{a}} = \vec{c} - (\vec{q}_a\cdot\vec{c})\vec{q}_a\).
Then remove its direction along \(\vec{q}_b\); call the result \(\vec{c}_{o}\): \(\displaystyle \vec{c}_{o} = \left(\vec{c} - (\vec{q}_a\cdot\vec{c})\vec{q}_a\right) - \frac{\vec{q}_b \vec{q}_b^T}{\vec{q}_b^T\vec{q}_b}\vec{c}\). (Projecting \(\vec{c}\) or \(\vec{c}_{o_{a}}\) onto \(\vec{q}_b\) gives the same thing, since \(\vec{q}_b\perp\vec{q}_a\).) And \(\vec{q}_b^T\vec{q}_b=1\), \(\displaystyle \Rightarrow \vec{c}_{o} = \left( \vec{c} - (\vec{q}_a\cdot\vec{c})\vec{q}_a\right) - (\vec{q}_b\cdot\vec{c})\vec{q}_b\). Finally, \(\vec{q}_c=\frac{\vec{c}_o}{\|\vec{c}_o\|}\).
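Extending the previous sketch with a made-up third vector \(\vec{c}\):

```python
import numpy as np

a = np.array([3.0, 0.0, 4.0])
b = np.array([1.0, 2.0, 3.0])
c = np.array([2.0, -1.0, 0.0])

q_a = a / np.linalg.norm(a)
b_o = b - np.dot(q_a, b) * q_a
q_b = b_o / np.linalg.norm(b_o)

# Strip c's directions along q_a and q_b, then normalize.
c_o = c - np.dot(q_a, c) * q_a - np.dot(q_b, c) * q_b
q_c = c_o / np.linalg.norm(c_o)

Q = np.column_stack([q_a, q_b, q_c])
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: q_a, q_b, q_c are orthonormal
```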
Now we can see a pattern here.
Say that we have \(n\) independent vectors \(( a_1, a_2, \cdots, a_n )\) and we have to find \(n\) orthonormal vectors \(( q_1, q_2, \cdots, q_n )\) using these \(n\) independent vectors. From the steps above we can deduce the pattern,
\[\vec{a_i}_o=\vec{a_i}-\sum_{k=1}^{i-1} (\vec{q}_k\cdot\vec{a}_i)\vec{q}_k \]
\[\vec{q}_i=\frac{\vec{a_i}_o}{\|\vec{a_i}_o\|}\]
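A minimal NumPy sketch of the whole process (the function name `gram_schmidt` and the example vectors are illustrative assumptions, not from the text):

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn a list of independent vectors a_1..a_n into orthonormal q_1..q_n."""
    qs = []
    for a in vectors:
        a_o = np.array(a, dtype=float)
        for q in qs:                      # remove directions along earlier q_k
            a_o = a_o - np.dot(q, a) * q
        qs.append(a_o / np.linalg.norm(a_o))
    return np.column_stack(qs)

# Example: three independent vectors in R^3.
vectors = [np.array([3.0, 0.0, 4.0]),
           np.array([1.0, 2.0, 3.0]),
           np.array([2.0, -1.0, 0.0])]

Q = gram_schmidt(vectors)
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: columns of Q are orthonormal
```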
So now we can find orthonormal vectors for any set of independent vectors.
This is The Gram Schmidt Process.