
Central Limit Theorem

Continuing the previous example, we discussed how the Weak Law of Large Numbers helps us estimate the average height of students. Remember that the average height we calculated, \(\overline{X}_n\), is a random variable, so it must have an underlying probability distribution, and we don't know what that distribution is. The Central Limit Theorem helps us approximate that unknown distribution, and once we know it we can answer a lot of questions.

Introduction

Say we have a large population, we draw a sample from it, and we want to study the distribution of the average of some specific property of that sample.
For example, say we want to study the average height of students in our class.
The Central Limit Theorem (CLT) helps us here. The CLT says:
No matter what distribution (with finite mean and variance) our population follows, as we increase the sample size, the sampling distribution of the mean converges to a Normal distribution.

Sampling Distribution of the Mean

Say you collect multiple samples (each of the same size) from the population, compute the average of each sample, and plot a histogram of those sample averages. This histogram is what we call the sampling distribution of the mean.
Fortunately, we don't need multiple samples: the CLT lets us approximate the sampling distribution with just one sample.
We only have to make sure that our sample size is sufficiently large.
Now the question arises: how large should our sample size \(n\) be?
Rule of thumb: if our distribution is symmetric around its mean, then \(n\geq 30\) is sufficient to apply the Central Limit Theorem.

But remember, it's just a rule of thumb!

The more the true distribution deviates from the Normal distribution, the larger the sample size required.
If the distribution is not symmetric around its mean, then \(n\geq 30\) might be nowhere near sufficient!
What you can do is plot the empirical CDF of your data, superimpose the CDF of the corresponding Normal distribution, and check whether they line up nicely. (We have covered this in our Python / Julia simulation.)
The definition of "nicely" is up to you; it depends on how much error you are willing to accept.
You can also use a statistical test, such as the Kolmogorov–Smirnov test, to check whether our sampling distribution matches the Normal distribution. We will cover these tests in this guide.
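As a rough sketch of that check, here is one way it could look in Python, assuming a skewed Exponential population and arbitrarily chosen sample sizes (all names and parameters here are illustrative, not from the original):

```python
# Sketch: test whether standardized sample means look standard normal,
# using the Kolmogorov-Smirnov test. Population and sizes are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n = 50            # size of each sample
n_samples = 2000  # number of sample means to draw

# Population: Exponential(scale=2), so mu = 2 and sigma = 2
means = rng.exponential(scale=2.0, size=(n_samples, n)).mean(axis=1)

# Standardize the sample means using the population parameters
z = (means - 2.0) / (2.0 / np.sqrt(n))

# Kolmogorov-Smirnov test against the standard normal CDF
stat, p_value = stats.kstest(z, "norm")
print(f"KS statistic = {stat:.4f}, p-value = {p_value:.4f}")
```

A small KS statistic (and a p-value that is not tiny) suggests the normal approximation is reasonable at this sample size.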

Convergence of Sampling Distribution

Now let's say we have a population and we draw \(n\) random i.i.d. observations \(X_1,X_2,\cdots,X_n\) from it.
These \(n\) i.i.d. observations are the result of some random process with an unknown distribution that has a finite mean \((\mu)\) and a finite variance \((\sigma^2)\):
\(\mathbb{E}[X]=\mu\) and \(\text{Var}(X)=\sigma^2\)

The estimator we used for \(\mu\) is \(\overline{X}_n=\frac{1}{n}\left(X_1+X_2+\cdots+X_n\right)\), and according to the Weak Law of Large Numbers
\(\overline{X}_n \xrightarrow [n\to \infty ] {\mathbb{P}} \mu\)
Let's compute the variance of \(\overline{X}_n\), using the independence of the \(X_i\):
\(\displaystyle\text{Var}(\overline{X}_n)=\text{Var}\left(\frac{1}{n}\left(X_1+X_2+\cdots+X_n\right)\right)\)
\(\displaystyle\text{Var}(\overline{X}_n) = \frac{1}{n^2}(\text{Var}(X_1) + \text{Var}(X_2) + \cdots + \text{Var}(X_n))\)
\(\displaystyle\text{Var}(\overline{X}_n) = \frac{1}{n^2}(\underbrace{\sigma^2 + \cdots + \sigma^2}_{n\text{ times}} ) = \frac{1}{n^2}(n\sigma^2)\)

\(\displaystyle\text{Var}(\overline{X}_n)=\frac{\sigma^2}{n}\)
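The identity \(\text{Var}(\overline{X}_n)=\sigma^2/n\) is easy to verify by simulation. A minimal sketch, assuming a Uniform(0, 1) population and an arbitrary sample size (both are illustrative choices, not from the original):

```python
# Monte Carlo check of Var(X_bar_n) = sigma^2 / n for a Uniform(0, 1)
# population, whose variance is 1/12. Sizes are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(42)

n = 25
sigma2 = 1.0 / 12.0          # variance of Uniform(0, 1)

# Draw many samples of size n and compute each sample mean
sample_means = rng.uniform(0.0, 1.0, size=(100_000, n)).mean(axis=1)

empirical_var = sample_means.var()
theoretical_var = sigma2 / n

print(f"empirical   Var(X_bar_n) = {empirical_var:.6f}")
print(f"theoretical sigma^2 / n  = {theoretical_var:.6f}")
```

The two numbers should agree to several decimal places with this many replications.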
Here we can see that as \(n\to \infty\), \(\text{Var}(\overline{X}_n)\to 0\); as a consequence, the probability distribution of \(\overline{X}_n\) becomes concentrated in an arbitrarily small interval around the mean \(\mu\). In the limit this distribution doesn't help us in any way, as it is entirely concentrated around the single number \(\mu\).

  • Now let's look at the distribution of \(\sqrt{n}\ \overline{X}_n\).
  • \(\displaystyle\text{Var}(\sqrt{n}\,\overline{X}_n)=\sigma^2\): for the distribution of \(\sqrt{n}\ \overline{X}_n\), the variance no longer depends on \(n\). However, \(\displaystyle\mathbb{E}[\sqrt{n}\,\overline{X}_n]=\sqrt{n}\,\mu\), and \(\mathbb{E}[\sqrt{n}\,\overline{X}_n] \xrightarrow [n\to \infty ]{} \infty\), so let's center the distribution around \(0\).

  • Now let's look at the distribution of \(\sqrt{n}\ (\overline{X}_n - \mu)\).
  • \(\displaystyle\text{Var}\left(\sqrt{n}\ (\overline{X}_n - \mu)\right)=\sigma^2\) and \(\displaystyle\mathbb{E}\left[\sqrt{n}\ (\overline{X}_n - \mu)\right]=0\).

    Now \(n\) affects neither the variance nor the expectation of the distribution of \(\sqrt{n}\ (\overline{X}_n - \mu)\).
  • Now let's look at the distribution of \(\displaystyle\sqrt{n}\ \left( \frac{\overline{X}_n - \mu}{\sigma} \right)\).
  • \(\displaystyle\text{Var}\left(\sqrt{n}\ \left( \frac{\overline{X}_n - \mu}{\sigma} \right)\right)=1\) and \(\displaystyle\mathbb{E}\left[\sqrt{n}\ \left( \frac{\overline{X}_n - \mu}{\sigma} \right)\right]=0\).

    Now we have standardized our random variable \(\overline{X}_n\): its expectation and variance are \(0\) and \(1\) for every mean \((\mu)\), variance \((\sigma^2)\), and sample size \((n)\).

    Let \(Z_n := \displaystyle\sqrt{n}\ \left( \frac{\overline{X}_n - \mu}{\sigma} \right)\), and let \(Z\sim\mathcal{N}(0,1)\) (\(Z\) is a standard normal random variable with mean \(0\) and variance \(1\)).
    Now the Central Limit Theorem states that for every \(z\):
    \[\lim_{n\to\infty}\mathbb{P}(Z_n \lt z) = \mathbb{P}(Z \lt z)\]
    \[\sqrt{n}\, \frac{\overline{X}_n-\mu }{\sigma } \xrightarrow [n\to \infty ]{(d)} \mathcal{N}(0,1) \]

    So the Central Limit Theorem says that as \(n\to \infty\), the CDF (cumulative distribution function) of the random variable \(Z_n\) converges to the CDF of the standard normal random variable \(Z\).
    The Central Limit Theorem is therefore a statement about the convergence of CDFs; it is not a statement about the convergence of PDFs or PMFs.
    Rule of thumb to apply the CLT: when \(n\geq 30\), then \(\sqrt{n}\, \frac{\overline{X}_n-\mu }{\sigma }\) is approximately \(\mathcal{N}(0,1)\).
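This CDF convergence can be observed directly by simulation. A minimal sketch, assuming a Bernoulli(0.3) population and hypothetical sizes (illustrative choices, not from the original): we compare the empirical CDF of \(Z_n\) at a few points with the standard normal CDF \(\Phi\).

```python
# Sketch: empirical CDF of Z_n = sqrt(n)*(X_bar_n - mu)/sigma for a
# Bernoulli(0.3) population, compared with the standard normal CDF.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

p = 0.3
mu, sigma = p, np.sqrt(p * (1 - p))
n = 200                       # comfortably above the n >= 30 rule of thumb

# Many replications of the sample mean of n Bernoulli draws
x_bar = rng.binomial(1, p, size=(50_000, n)).mean(axis=1)
z_n = np.sqrt(n) * (x_bar - mu) / sigma

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    empirical = (z_n < z).mean()     # empirical P(Z_n < z)
    exact = stats.norm.cdf(z)        # Phi(z)
    print(f"z={z:+.1f}  P(Z_n < z)={empirical:.4f}  Phi(z)={exact:.4f}")
```

The two columns should be close at every \(z\); note that \(Z_n\) here is discrete (a scaled Binomial), so the agreement is in the CDFs, exactly as the theorem states.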

    Rate of Convergence

    Now we can finally answer our earlier question from the Weak Law of Large Numbers: how fast (at what rate) does \(\overline{X}_n\) approach \(\mu\)?
    If we draw a standard Gaussian \(Z\sim\mathcal{N}(0,1)\), then with probability \(0.9974\), \(Z\in [-3,3]\):

    \(P(-3\leq Z\leq3)=0.9974\)
    So \(Z\) is almost always between \(-3\) and \(3\).
    And we know that:
    \[\sqrt{n}\, \frac{\overline{X}_n-\mu }{\sigma } \xrightarrow [n\to \infty ]{(d)} \mathcal{N}(0,1) \]

    So for large \(n\), with probability \(\approx 0.9974\):
    \[ -3\leq \sqrt{n}\, \frac{\overline{X}_n-\mu }{\sigma } \leq3 \\ \Rightarrow \left|\sqrt{n}\, \frac{\overline{X}_n-\mu }{\sigma } \right| \leq 3 \\ \Rightarrow \left| \overline{X}_n-\mu \right| \leq \frac{3\sigma}{\sqrt{n}} \]

    So according to the CLT, the error \(\left| \overline{X}_n-\mu \right|\) shrinks at the rate \(1/\sqrt{n}\); in the notation of the Weak Law of Large Numbers section, \(f(n)=\sqrt{n}\).
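The three-sigma bound above can be checked empirically. A minimal sketch, assuming an Exponential(1) population and an arbitrary sample size (illustrative assumptions): the event \(|\overline{X}_n-\mu| \leq 3\sigma/\sqrt{n}\) should occur roughly 99.7% of the time.

```python
# Sketch: estimate P(|X_bar_n - mu| <= 3*sigma/sqrt(n)) for an
# Exponential(1) population, where mu = sigma = 1. Per the CLT this
# probability should be close to 0.9974 for large n.
import numpy as np

rng = np.random.default_rng(1)

n = 100
mu, sigma = 1.0, 1.0          # Exponential(scale=1): mean = std = 1

# Many replications of the sample mean of n draws
x_bar = rng.exponential(1.0, size=(100_000, n)).mean(axis=1)

bound = 3.0 * sigma / np.sqrt(n)
coverage = (np.abs(x_bar - mu) <= bound).mean()
print(f"P(|X_bar_n - mu| <= 3*sigma/sqrt(n)) ~ {coverage:.4f}")
```

The estimated coverage will not be exactly 0.9974, since the Exponential is skewed and \(n\) is finite, but it should be very close, and it improves as \(n\) grows.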


    Now let's see some simulations; choose your language of choice:
    Python  ,  Julia



    Recommended Watching

    Central Limit Theorem (by Prof. John Tsitsiklis)
    Central Limit Theorem (by Khan Academy)
    Central Limit Theorem (by Sir Josh Starmer)
    Central Limit Theorem (by Sir Jeremy Balka)
    Real-world application of the CLT (by 365 Data Science)