Gaussian Distribution

The Gaussian distribution (a.k.a. normal distribution or bell curve) is perhaps the most important probability distribution in statistics. It approximately describes numerous natural phenomena, such as heights, weights, ages, measurement errors, and IQ scores.
We have already witnessed its importance in the Central Limit Theorem.

Gaussian Distribution

Notation
\[\mathcal{N}(\mu,\sigma^2)\]
Here \(\mu\) (the mean) and \(\sigma^2\) (the variance) are the parameters, where
\(-\infty < \mu < +\infty\)
and
\(\sigma^2 \gt 0\)
PDF
\[f(x)=\frac{1}{\sigma \sqrt{2\pi }} \exp \left(-\frac{(x-\mu )^2}{2 \sigma ^2}\right)\]
\(-\infty \lt x \lt \infty\)
CDF
\[F(x)=\frac{1}{\sigma \sqrt{2\pi }} \int_{-\infty}^{x}\exp \left(-\frac{(t-\mu )^2}{2 \sigma ^2}\right)dt\]
The CDF of a normal distribution has no closed-form expression in terms of elementary functions, so it is evaluated numerically.
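Although the integral has no elementary antiderivative, it can be expressed through the error function, \(F(x)=\frac{1}{2}\left[1+\operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]\), which numerical libraries implement. The following is a minimal sketch that assumes only Python's standard `math` module; the function name `normal_cdf` is chosen here for illustration.

```python
import math

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """CDF of N(mu, sigma^2) at x, computed via the error function."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma                      # standardize first
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(normal_cdf(0.0))   # 0.5, by symmetry around the mean
print(normal_cdf(1.0))   # approximately 0.8413
```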
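Python code to plot this distribution:
import numpy as np
import matplotlib.pyplot as plt

def gaussian_plot(mu: float,
                  sigma: float,
                  partitions: int = 1000):
    """Plot the PDF of N(mu, sigma^2) over the range mu ± 3*sigma."""
    # Evenly spaced x values covering three standard deviations on each side of the mean
    xs = np.linspace(mu - 3 * sigma, mu + 3 * sigma, partitions)
    # PDF of the Gaussian, evaluated pointwise
    pdf = (1/(sigma*np.sqrt(2*np.pi))) * np.exp(-((xs-mu)**2)/(2*sigma**2))
    plt.plot(xs, pdf, label = f"N(μ = {mu}, σ = {sigma})")
    plt.xlabel("x")
    plt.ylabel("f(x)")
    plt.legend()
    plt.show()

gaussian_plot(0,1)   # standard normal
gaussian_plot(10,4)  # mean 10, standard deviation 4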

Properties of Gaussian

The Empirical Rule

The Empirical Rule (a.k.a. the \(68\)-\(95\)-\(99.7\) rule) says:
  • About \(68\%\) of the population lies within \(1\) standard deviation \((\sigma)\) of the mean \((\mu)\).
  • About \(95\%\) of the population lies within \(2\) standard deviations \((\sigma)\) of the mean \((\mu)\).
  • About \(99.7\%\) of the population lies within \(3\) standard deviations \((\sigma)\) of the mean \((\mu)\).
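These percentages can be checked numerically. The sketch below assumes Python 3.8+ and its standard `statistics.NormalDist` class; the helper name `prob_within` is illustrative. Because the probabilities depend only on the number of standard deviations, it is enough to check them for \(\mathcal{N}(0,1)\).

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal

def prob_within(k: float) -> float:
    """P(|Z| <= k): probability of lying within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```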

Linear functions of Normal random variable

Gaussians are closed under affine transformations: if we apply an affine map \(x \mapsto ax+b\) (with \(a \neq 0\)) to a Gaussian random variable, the result is again a Gaussian random variable.
That is, if \(X\sim\mathcal{N}(\mu,\sigma^2)\) and \(Y=aX+b\), then
\(Y\sim\mathcal{N}(a\mu+b,a^2\sigma^2)\)
Explanation
\(\mathbb{E}[Y] =\mathbb{E}[aX+b]\)

\(\mathbb{E}[Y] =\mathbb{E}[aX]+b \quad; \text{by linearity of expectation}\)

\(\mathbb{E}[Y] =a\mathbb{E}[X] +b\)

\(\mathbb{E}[Y] = a\mu+b\)

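\(Var(Y) = Var(aX+b)\)

\(Var(Y) = Var(aX) \quad; \text{adding a constant does not change the variance}\)

\(Var(Y) =a^2Var(X) \quad; \text{scaling by } a \text{ multiplies the variance by } a^2\)

\(Var(Y) =a^2\sigma^2\)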
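A quick simulation can illustrate the two parameter formulas. This is only a sketch: the values of `mu`, `sigma`, `a`, `b`, the sample size, and the seed are arbitrary choices made here, and sample statistics will match the theoretical values only approximately.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed so the run is repeatable

mu, sigma = 2.0, 3.0                  # X ~ N(2, 9), chosen arbitrarily
a, b = -1.5, 4.0                      # affine map Y = aX + b

x = rng.normal(loc=mu, scale=sigma, size=200_000)
y = a * x + b

# Theory predicts E[Y] = a*mu + b and Var(Y) = a^2 * sigma^2
print("sample mean of Y:    ", y.mean(), " theory:", a * mu + b)
print("sample variance of Y:", y.var(), " theory:", a**2 * sigma**2)
```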

Standardization (a.k.a. Normalization/Z-score)

Since the CDF of a Gaussian distribution has no closed form, we cannot evaluate it by hand and need a computer or a table to do it for us. The problem is that there are infinitely many Gaussian r.v.s, one for each choice of \(\mu\) and \(\sigma^2\).
The solution is standardization, which converts any Gaussian r.v. into a standard Gaussian r.v. \(\mathcal{N}(0,1)\).
If \(X\sim\mathcal{N}(\mu,\sigma^2)\) then:
\[X-\mu\sim\mathcal{N}(0,\sigma^2) \]
\[\frac{X-\mu}{\sigma}\sim\mathcal{N}(0,1) \]
\[\Rightarrow Z=\frac{X-\mu}{\sigma}\sim\mathcal{N}(0,1)\]
Now we can compute probabilities for any Gaussian distribution:
\[\mathbb{P}\left(u\lt X\lt v\right)=\mathbb{P}\left(\frac{u-\mu}{\sigma} \lt Z \lt \frac{v-\mu}{\sigma}\right)\]
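As a sketch of how this identity is used, the snippet below computes an interval probability by standardizing the endpoints and evaluating only the standard normal CDF. It assumes Python's standard `statistics.NormalDist`; the function name `prob_between` and the example numbers are illustrative.

```python
from statistics import NormalDist

def prob_between(u: float, v: float, mu: float, sigma: float) -> float:
    """P(u < X < v) for X ~ N(mu, sigma^2), using only the standard normal CDF."""
    std_normal = NormalDist()          # N(0, 1)
    z_u = (u - mu) / sigma             # standardized lower endpoint
    z_v = (v - mu) / sigma             # standardized upper endpoint
    return std_normal.cdf(z_v) - std_normal.cdf(z_u)

# Example: X ~ N(70, 36), so sigma = 6; interval is mean ± one standard deviation
p_standardized = prob_between(64, 76, mu=70, sigma=6)

# Cross-check against the CDF of N(70, 36) evaluated directly
X = NormalDist(mu=70, sigma=6)
p_direct = X.cdf(76) - X.cdf(64)

print(p_standardized, p_direct)        # both approximately 0.6827
```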

Symmetry

Any Gaussian r.v. with mean \(\mu=0\) is symmetric about \(0\).
If \(X\sim \mathcal{N}(0,\sigma^2)\) then \(-X\) has the same distribution as \(X\):

\(-X \sim \mathcal{N}(0,\sigma^2)\)

In terms of the CDF, this means \(\mathbb{P}(X \leq -x)=\mathbb{P}(X \geq x)\) for every \(x\).
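The symmetry can be observed numerically by comparing the left tail \(\mathbb{P}(X \leq -x)\) with the right tail \(\mathbb{P}(X \geq x)\). A small sketch, assuming Python's standard `statistics.NormalDist`; the value of `sigma` and the test points are arbitrary.

```python
from statistics import NormalDist

X = NormalDist(mu=0.0, sigma=2.5)      # zero-mean Gaussian; sigma chosen arbitrarily

points = (0.07, 1.0, 3.2)
for x in points:
    left_tail = X.cdf(-x)              # P(X <= -x)
    right_tail = 1.0 - X.cdf(x)        # P(X >= x)
    print(f"x = {x}: left tail = {left_tail:.6f}, right tail = {right_tail:.6f}")
```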


Gaussian Probability Table

Since the CDF of a Gaussian has no closed form, we use tables to look up its values.
The table below gives the CDF of a standard Gaussian r.v. \(\mathcal{N}(0,1)\), which is denoted \(\Phi(x)\):

\[\Phi(x)=\mathbb{P}(\mathcal{N}(0,1) \leq x)=\frac{1}{\sqrt{2\pi }} \int_{-\infty}^{x}\exp \left(-\frac{t^2}{2}\right)dt\]
\(\text{Table for }\Phi(x)\)

z    +0.00    +0.01    +0.02    +0.03    +0.04    +0.05    +0.06    +0.07    +0.08    +0.09
0.0  0.50000  0.50399  0.50798  0.51197  0.51595  0.51994  0.52392  0.52790  0.53188  0.53586
0.1  0.53983  0.54380  0.54776  0.55172  0.55567  0.55966  0.56360  0.56749  0.57142  0.57535
0.2  0.57926  0.58317  0.58706  0.59095  0.59483  0.59871  0.60257  0.60642  0.61026  0.61409
0.3  0.61791  0.62172  0.62552  0.62930  0.63307  0.63683  0.64058  0.64431  0.64803  0.65173
0.4  0.65542  0.65910  0.66276  0.66640  0.67003  0.67364  0.67724  0.68082  0.68439  0.68793
0.5  0.69146  0.69497  0.69847  0.70194  0.70540  0.70884  0.71226  0.71566  0.71904  0.72240
0.6  0.72575  0.72907  0.73237  0.73565  0.73891  0.74215  0.74537  0.74857  0.75175  0.75490
0.7  0.75804  0.76115  0.76424  0.76730  0.77035  0.77337  0.77637  0.77935  0.78230  0.78524
0.8  0.78814  0.79103  0.79389  0.79673  0.79955  0.80234  0.80511  0.80785  0.81057  0.81327
0.9  0.81594  0.81859  0.82121  0.82381  0.82639  0.82894  0.83147  0.83398  0.83646  0.83891
1.0  0.84134  0.84375  0.84614  0.84849  0.85083  0.85314  0.85543  0.85769  0.85993  0.86214
1.1  0.86433  0.86650  0.86864  0.87076  0.87286  0.87493  0.87698  0.87900  0.88100  0.88298
1.2  0.88493  0.88686  0.88877  0.89065  0.89251  0.89435  0.89617  0.89796  0.89973  0.90147
1.3  0.90320  0.90490  0.90658  0.90824  0.90988  0.91149  0.91308  0.91466  0.91621  0.91774
1.4  0.91924  0.92073  0.92220  0.92364  0.92507  0.92647  0.92785  0.92922  0.93056  0.93189
1.5  0.93319  0.93448  0.93574  0.93699  0.93822  0.93943  0.94062  0.94179  0.94295  0.94408
1.6  0.94520  0.94630  0.94738  0.94845  0.94950  0.95053  0.95154  0.95254  0.95352  0.95449
1.7  0.95543  0.95637  0.95728  0.95818  0.95907  0.95994  0.96080  0.96164  0.96246  0.96327
1.8  0.96407  0.96485  0.96562  0.96638  0.96712  0.96784  0.96856  0.96926  0.96995  0.97062
1.9  0.97128  0.97193  0.97257  0.97320  0.97381  0.97441  0.97500  0.97558  0.97615  0.97670
2.0  0.97725  0.97778  0.97831  0.97882  0.97932  0.97982  0.98030  0.98077  0.98124  0.98169
2.1  0.98214  0.98257  0.98300  0.98341  0.98382  0.98422  0.98461  0.98500  0.98537  0.98574
2.2  0.98610  0.98645  0.98679  0.98713  0.98745  0.98778  0.98809  0.98840  0.98870  0.98899
2.3  0.98928  0.98956  0.98983  0.99010  0.99036  0.99061  0.99086  0.99111  0.99134  0.99158
2.4  0.99180  0.99202  0.99224  0.99245  0.99266  0.99286  0.99305  0.99324  0.99343  0.99361
2.5  0.99379  0.99396  0.99413  0.99430  0.99446  0.99461  0.99477  0.99492  0.99506  0.99520
2.6  0.99534  0.99547  0.99560  0.99573  0.99585  0.99598  0.99609  0.99621  0.99632  0.99643
2.7  0.99653  0.99664  0.99674  0.99683  0.99693  0.99702  0.99711  0.99720  0.99728  0.99736
2.8  0.99744  0.99752  0.99760  0.99767  0.99774  0.99781  0.99788  0.99795  0.99801  0.99807
2.9  0.99813  0.99819  0.99825  0.99831  0.99836  0.99841  0.99846  0.99851  0.99856  0.99861
3.0  0.99865  0.99869  0.99874  0.99878  0.99882  0.99886  0.99889  0.99893  0.99896  0.99900
3.1  0.99903  0.99906  0.99910  0.99913  0.99916  0.99918  0.99921  0.99924  0.99926  0.99929
3.2  0.99931  0.99934  0.99936  0.99938  0.99940  0.99942  0.99944  0.99946  0.99948  0.99950
3.3  0.99952  0.99953  0.99955  0.99957  0.99958  0.99960  0.99961  0.99962  0.99964  0.99965
3.4  0.99966  0.99968  0.99969  0.99970  0.99971  0.99972  0.99973  0.99974  0.99975  0.99976
3.5  0.99977  0.99978  0.99978  0.99979  0.99980  0.99981  0.99981  0.99982  0.99983  0.99983
3.6  0.99984  0.99985  0.99985  0.99986  0.99986  0.99987  0.99987  0.99988  0.99988  0.99989
3.7  0.99989  0.99990  0.99990  0.99990  0.99991  0.99991  0.99992  0.99992  0.99992  0.99992
3.8  0.99993  0.99993  0.99993  0.99994  0.99994  0.99994  0.99994  0.99995  0.99995  0.99995
3.9  0.99995  0.99995  0.99996  0.99996  0.99996  0.99996  0.99996  0.99996  0.99997  0.99997
4.0  0.99997  0.99997  0.99997  0.99997  0.99997  0.99997  0.99998  0.99998  0.99998  0.99998
Source: https://en.wikipedia.org/wiki/Standard_normal_table#Cumulative

How to read the table:
The first 2 digits of \(z\) (the integer part and the first decimal place) select the row, and the 3rd digit (the second decimal place) selects the column.
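The lookup rule can be sketched in code. The helper names `table_position` and `table_value` are hypothetical, and the sketch assumes Python 3.9+ with the standard `statistics.NormalDist`; it recomputes \(\Phi(z)\) rather than storing the table.

```python
from statistics import NormalDist

def table_position(z: float) -> tuple[int, int]:
    """1-based (row, column) of z in a table with rows 0.0, 0.1, ... and columns +0.00 ... +0.09."""
    hundredths = round(z * 100)        # e.g. 1.26 -> 126
    row = hundredths // 10 + 1         # first two digits pick the row
    col = hundredths % 10 + 1          # third digit picks the column
    return row, col

def table_value(z: float) -> float:
    """Phi(z) rounded to five decimals, as printed in such a table."""
    return round(NormalDist().cdf(z), 5)

print(table_position(0.07), table_value(0.07))   # (1, 8) 0.5279
print(table_position(1.26), table_value(1.26))   # (13, 7) 0.89617
```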

Examples

  • Say that we want to calculate \(\Phi(0.07)\). The first 2 digits "\(0.0\)" of 0.07 give us row number 1, and the 3rd digit "\(7\)" gives us column number 8, so
    \(\Phi(0.07) = 0.52790\)

    z    +0.00    +0.01    +0.02    +0.03    +0.04    +0.05    +0.06    +0.07    +0.08    +0.09
    0.0  0.50000  0.50399  0.50798  0.51197  0.51595  0.51994  0.52392  0.52790  0.53188  0.53586

  • Now say that we want to calculate \(\Phi(1.26)\). The first 2 digits "\(1.2\)" of 1.26 give us row number 13, and the 3rd digit "\(6\)" gives us column number 7, so
    \(\Phi(1.26) = 0.89617\)

    z    +0.00    +0.01    +0.02    +0.03    +0.04    +0.05    +0.06    +0.07    +0.08    +0.09
    1.2  0.88493  0.88686  0.88877  0.89065  0.89251  0.89435  0.89617  0.89796  0.89973  0.90147

  • Now we can also find \(\mathbb{P}( 0.07 \leq \mathcal{N}(0,1) \leq 1.26)\):
    Solution:
    \(\mathbb{P}( 0.07 \leq \mathcal{N}(0,1) \leq 1.26)=\mathbb{P}(\mathcal{N}(0,1) \leq 1.26)-\mathbb{P}(\mathcal{N}(0,1) \leq 0.07)\)
    \(=\Phi(1.26)-\Phi(0.07)=0.89617 - 0.52790=0.36827\)

  • Now let's calculate \(\Phi(-0.07) = \mathbb{P}(\mathcal{N}(0,1) \leq -0.07)\). The table only lists \(z \geq 0\), so we use symmetry.

    Solution:
    Let \(Z\sim\mathcal{N}(0,1)\).

    \(\mathbb{P}(Z \leq -0.07)=\mathbb{P}(-Z \geq 0.07)\)

    \(\mathbb{P}(-Z \geq 0.07)=\mathbb{P}(Z \geq 0.07)\) ; by symmetry

    \(\mathbb{P}(Z \geq 0.07)= 1- \mathbb{P}(Z \lt 0.07)= 1- \Phi(0.07)\)

    \(\mathbb{P}(Z \geq 0.07)= 1- 0.52790=0.47210\)

    \(\Rightarrow \Phi(-0.07)=0.47210\)

  • Now let's calculate \(\mathbb{P}(|\mathcal{N}(0,1)| \gt 0.07)\)

    Solution:
    Let \(Z\sim\mathcal{N}(0,1)\). The events \(\{Z \gt 0.07\}\) and \(\{Z \lt -0.07\}\) are disjoint, so their probabilities add:

    \(\mathbb{P}(|Z| \gt 0.07)=\mathbb{P}(\{Z \gt 0.07\} \cup \{Z \lt -0.07\})=\mathbb{P}(Z \gt 0.07)+\mathbb{P}(Z\lt -0.07)\)

    where
    • \(\mathbb{P}(Z \gt 0.07) =1-\mathbb{P}(Z \leq 0.07)=1-\Phi(0.07)=1 - 0.52790=0.47210\)
    • \(\mathbb{P}(Z\lt -0.07)=\Phi(-0.07)=1-\Phi(0.07)=0.47210\) ; by symmetry
    So:
    \(\mathbb{P}(|Z| \gt 0.07)=0.47210+0.47210=0.94420\)

    In general, for \(x \geq 0\):
    \(\mathbb{P}(|Z| \gt x)= 2(1-\Phi(x)) \)

  • Say \( X\sim \mathcal{N}(70,36) \), so \(\mu=70\) and \(\sigma=\sqrt{36}=6\). Now calculate \(\mathbb{P}(X>80)\)

    Solution:
    Let \(Z\sim\mathcal{N}(0,1)\).

    \( \mathbb{P}(X\gt 80) =\mathbb{P}(X-70 \gt 80-70)\)

    \(\Rightarrow \mathbb{P}(X\gt 80) =\mathbb{P}\left(\frac{X-70}{6} \gt \frac{80-70}{6}\right)\)

    \(\Rightarrow \mathbb{P}(X\gt 80) \approx\mathbb{P}( Z \gt 1.66)\) ; truncating \(\frac{10}{6}=1.666\ldots\) to two decimals for the table

    • \( \mathbb{P}(Z \gt 1.66)=1 - \mathbb{P}(Z \leq 1.66) = 1-\Phi(1.66) \)

      \(\Rightarrow \mathbb{P}(Z \gt 1.66) = 1-0.95154 = 0.04846 \)
    \(\Rightarrow \mathbb{P}(X\gt 80) \approx 0.04846 \)

  • Say \(X\sim \mathcal{N}(70,36)\). Now we have to find \(x\) such that \(\mathbb{P}(X\leq x)=80\%\)

    For this we have to read the table backward.
    Solution:
    Let \(Z\sim\mathcal{N}(0,1)\).

    \( \mathbb{P}(X\leq x) =0.8\)

    \(\Rightarrow \mathbb{P}(X-70\leq x-70) =0.8\)

    \(\Rightarrow \mathbb{P}\left(\frac{X-70}{6}\leq \frac{x-70}{6}\right) =0.8\)

    \(\Rightarrow \mathbb{P}\left(Z \leq \frac{x-70}{6}\right) =0.8\)

    \(\Rightarrow \Phi\left( \frac{x-70}{6}\right) =0.8\)

    z    +0.00    +0.01    +0.02    +0.03    +0.04    +0.05    +0.06    +0.07    +0.08    +0.09
    0.8  0.78814  0.79103  0.79389  0.79673  0.79955  0.80234  0.80511  0.80785  0.81057  0.81327
    • The table entry closest to \(0.80\) is \(\Phi(0.84)=0.79955\) (the next entry, \(\Phi(0.85)=0.80234\), is farther away).
    \(\Rightarrow \frac{x-70}{6}\approx 0.84\)

    \(\Rightarrow x\approx 70+6\times 0.84=75.04\)
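The worked examples above can be cross-checked without a table. The sketch below assumes Python's standard `statistics.NormalDist` (its `cdf` and `inv_cdf` methods); the variable names are illustrative. The last two results differ slightly from table-based answers because a table forces \(z\) to two decimal places.

```python
from statistics import NormalDist

Z = NormalDist()                   # standard normal N(0, 1)
X = NormalDist(mu=70, sigma=6)     # N(70, 36): variance 36 means sigma = 6
phi = Z.cdf                        # Phi, the standard normal CDF

p_interval = phi(1.26) - phi(0.07)     # P(0.07 <= Z <= 1.26)
p_negative = phi(-0.07)                # Phi(-0.07)
p_two_sided = 2 * (1 - phi(0.07))      # P(|Z| > 0.07)
p_upper = 1 - X.cdf(80)                # P(X > 80), no rounding of z
x_80 = X.inv_cdf(0.80)                 # x with P(X <= x) = 0.80

print(f"P(0.07 <= Z <= 1.26) = {p_interval:.5f}")
print(f"Phi(-0.07)           = {p_negative:.5f}")
print(f"P(|Z| > 0.07)        = {p_two_sided:.5f}")
print(f"P(X > 80)            = {p_upper:.5f}")
print(f"80% point of X       = {x_80:.3f}")
```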

Quantile

Here we have to find a number \(q_\alpha\), called the quantile of order \(1- \alpha\) of a r.v. \(X\), such that the CDF of \(X\) at \(q_\alpha\) is:
\(F(q_\alpha)=\mathbb{P}(X \leq q_\alpha)=1-\alpha\)

This is the same as reading the table backward, so we are computing
\(q_\alpha=F^{-1}(1-\alpha)\)


Some important quantiles of \(Z\sim \mathcal{N}(0,1)\):

\(\alpha\)    \(2.5\%\)  \(5\%\)  \(10\%\)
\(q_\alpha\)  1.96     1.65   1.28

For example, each tail beyond \(\pm 1.96\) has probability \(2.5\%\), so
\(\mathbb{P}(|Z| \gt 1.96)=5\%\)
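These quantiles can be reproduced with an inverse CDF. A minimal sketch, assuming Python's standard `statistics.NormalDist.inv_cdf`; the dictionary name `quantiles` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal N(0, 1)

# q_alpha is the quantile of order 1 - alpha, i.e. the inverse CDF at 1 - alpha
quantiles = {alpha: Z.inv_cdf(1 - alpha) for alpha in (0.025, 0.05, 0.10)}

for alpha, q in quantiles.items():
    print(f"alpha = {alpha:.3f}  q_alpha = {q:.3f}")
# alpha = 0.025  q_alpha = 1.960
# alpha = 0.050  q_alpha = 1.645
# alpha = 0.100  q_alpha = 1.282

# Two-sided check: P(|Z| > 1.96) is approximately 5%
two_sided = 2 * (1 - Z.cdf(1.96))
print(f"P(|Z| > 1.96) = {two_sided:.4f}")
```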