Weak Law of Large Numbers
Suppose your classroom consists of \(300\) students and you want to know the average height of those \(300\) students. Now say you measure the heights of \(50\) students, and suppose that the average height of those \(50\) students is somewhat near the average height of all \(300\) students. Here all \(300\) students are the population and those \(50\) students are the sample, so the population size is \(300\) and the sample size \((n)\) is \(50\).
So now we have \(50\) observations \(X_1,X_2,\cdots,X_{50}\). These observations are random, the result of some unknown random process, so we call them random variables. These random variables come from a common random process, therefore they are identically distributed, and all of them are independent of each other, so we call them I.I.D. (Independent and Identically Distributed) random variables.
So the average height of those \(50\) students (the sample mean) is \(\displaystyle\overline{X}_{50}=\frac{X_1+X_2+\cdots+X_{50}}{50}\). What we actually strive for is the average over the total population (the average height of all \(300\) students): the true mean. Let's say that the true mean is \(\mu\left(=\mathbb{E}[X_i]\right)\).
Note
- The true mean \((\mu)\) is over the entire population. The true mean \((\mu)\) is not random, it's a number.
- The sample mean \((\overline{X}_n)\) is over the values observed during an experiment. The sample mean \((\overline{X}_n)\) is a random variable because \(X_1,\cdots,X_n\) are random.
So according to the Weak Law of Large Numbers, as we increase our sample size \((n)\), our sample mean goes toward the true mean in probability (this is what we referred to as Truth in our central dogma).
\[\overline{X}_n:=\frac{1}{n}\sum _{i=1}^ n X_ i \xrightarrow [n\to \infty ] {\mathbb{P}} \mu\]
- \(:=\quad\) this symbol means "by definition"
- \(\mathbb{P}\quad\) this symbol means "in probability"
Explanation
Let \(X_1,X_2,\dots,X_n\) be I.I.D. random variables with finite mean \(\mu\) and variance \(\sigma^2\), and let the sample mean be \(\displaystyle\overline{X}_n=\frac{X_1+\cdots+X_n}{n}\).
First, \(\mathbb{E}[\overline{X}_n]=\mu\):
\[\mathbb{E}[\overline{X}_n]=\mathbb{E}\left[\frac{X_1+\cdots+X_n}{n}\right]=\frac{\mathbb{E}[X_1]+\cdots+\mathbb{E}[X_n]}{n}=\frac{n\mu}{n}=\mu\]
Next, \(\displaystyle\text{Var}[\overline{X}_n]=\frac{\sigma^2}{n}\):
\[\text{Var}[\overline{X}_n]=\text{Var}\left[\frac{X_1+\cdots+X_n}{n}\right]=\frac{\text{Var}[X_1]+\cdots+\text{Var}[X_n]}{n^2}=\frac{n\sigma^2}{n^2}=\frac{\sigma^2}{n}\]
(the variance of the sum splits into a sum of variances because the \(X_i\) are independent).
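As a quick numeric sanity check, here is a minimal simulation sketch. All concrete values are assumptions chosen just for illustration: normally distributed heights with \(\mu=170\) cm and \(\sigma=10\) cm, and \(n=50\) as in our classroom example:

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma, n = 170.0, 10.0, 50   # hypothetical height distribution (cm)
trials = 100_000                 # repeat the n-student experiment many times

# Each row is one experiment: n i.i.d. observations X_1, ..., X_n
X = rng.normal(mu, sigma, size=(trials, n))
sample_means = X.mean(axis=1)    # one sample mean per experiment

print(sample_means.mean())   # ≈ 170.0          (E[X̄n] = μ)
print(sample_means.var())    # ≈ 100/50 = 2.0   (Var[X̄n] = σ²/n)
```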
We want to show that
\[\mathbb{P}\left(|\overline{X}_n - \mu| \geq \epsilon\right) \xrightarrow [n\to \infty ] {} 0\quad\forall\,\epsilon\gt 0\]
By Chebyshev's inequality,
\[\mathbb{P}\left(|\overline{X}_n - \mu| \geq \epsilon\right) \leq \frac{\text{Var}(\overline{X}_n)}{\epsilon^2}=\frac{\sigma^2}{n\epsilon^2}\xrightarrow [n\to \infty ] {} 0\quad\forall\,\epsilon\gt 0\]
So for any \(\epsilon\gt0\),
\[\mathbb{P}\left(|\overline{X}_n - \mu|\geq \epsilon\right) \xrightarrow [n\to \infty ]{} 0\]
This is convergence in probability. Take a very small number, like \(0.00001\). Convergence in probability says that if \(n\) is large enough, then it's highly unlikely for \(\overline{X}_n\) to be more than \(0.00001\) units away from \(\mu\). Or, to say it the other way: if \(n\) is large, then it's extremely likely that \(\overline{X}_n\) is extremely close to \(\mu\).
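Here is a small sketch comparing the empirical probability \(\mathbb{P}(|\overline{X}_n-\mu|\geq\epsilon)\) with the Chebyshev bound \(\sigma^2/(n\epsilon^2)\) as \(n\) grows. It reuses the same hypothetical height distribution as above, with an arbitrarily chosen \(\epsilon=1\) cm:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, eps = 170.0, 10.0, 1.0   # hypothetical heights, arbitrary epsilon
trials = 5_000                      # experiments per sample size

for n in (50, 500, 5000):
    # X̄n for `trials` independent experiments of size n
    sample_means = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
    empirical = np.mean(np.abs(sample_means - mu) >= eps)
    bound = sigma**2 / (n * eps**2)   # Chebyshev upper bound σ²/(nε²)
    print(f"n={n:5d}  empirical ≈ {empirical:.4f}  Chebyshev bound = {bound:.4f}")
```

Both columns shrink toward \(0\). The bound is loose (for \(n=50\) it exceeds \(1\), so it says nothing there), but it is enough to prove the limit.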
Interpretation
For any \(\epsilon\gt0\) (a constant), the probability that the sample mean \((\overline{X}_n)\) falls away from the true mean \((\mu)\) by more than \(\epsilon\) goes to \(0\) as our sample size \((n)\to\infty\).
In our example above we have a population of \(300\) students; among those \(300\) students we randomly select \(50\) students and measure their heights \(X_1,\cdots,X_{50}\). If the true mean of all \(300\) students is \(\mu(=\mathbb{E}[X_i])\), then we can say that:
- The height of the \(i^{th}\) student is \(X_i = \mu + W_i\), where \(W_i\) is the measurement noise for the \(i^{th}\) student, and the Weak Law of Large Numbers tells us that as \(n\to\infty\), the average noise goes to \(0\) in probability (see the sketch below).

So our sample mean \((\overline{X}_n)\) is unlikely to be far from the true mean \((\mu)\).
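A minimal sketch of this noise-averaging view, assuming (purely for illustration) Gaussian measurement noise with standard deviation \(10\) cm around a hypothetical true mean of \(170\) cm:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 170.0   # hypothetical true mean height (cm)

for n in (50, 5_000, 500_000):
    W = rng.normal(0.0, 10.0, size=n)   # measurement noise W_i
    X = mu + W                          # X_i = μ + W_i
    print(f"n={n:7d}  average noise = {W.mean():+.4f}  X̄n = {X.mean():.4f}")
```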
So according to the Weak Law of Large Numbers, if we increase the number of students in our sample from \(n=50\) to something like \(n=100\), then we should get a better estimate of the true mean. There is also a Strong Law of Large Numbers.
Strong Law of Large Numbers:
\[\overline{X}_n:=\frac{1}{n}\sum _{i=1}^ n X_ i \xrightarrow [n\to \infty ] {\text{a.s.}} \mu\]
- \(:=\quad\) this symbol means "by definition"
- \(\text{a.s.}\quad\) it means "almost surely" (with probability \(1\))

Note: \(\text{a.s.}\) convergence implies convergence in \(\mathbb{P}\), so the Strong Law implies the Weak Law.
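Almost-sure convergence is a statement about a single realized sequence of observations, so a natural sketch is to follow the running mean along one long run of fair-coin flips (Heads coded as \(1\); the sequence length is an arbitrary stand-in for \(\infty\)):

```python
import numpy as np

rng = np.random.default_rng(0)

# One single long run of fair-coin flips (1 = Heads, 0 = Tails)
flips = rng.integers(0, 2, size=1_000_000)

# Running sample mean X̄n along this one realized path
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)

for n in (10, 100, 10_000, 1_000_000):
    print(f"n={n:8d}  X̄n = {running_mean[n - 1]:.5f}")   # settles at 0.5
```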
OK, now we know that the Law of Large Numbers says: if we have a large enough sample size, then our estimator \(\overline{X}_n\) and the real parameter \(\mu\) are close, \(\overline{X}_n \xrightarrow [n\to \infty] {} \mu\). But how close? We don't know! We don't know how fast (at what rate) \(\overline{X}_n\) approaches \(\mu\). We can think of it as:
\[ \left|\overline{X}_n -\mu \right| \propto \frac{1}{f(n)} \]
where \(f(n)\) is an increasing function w.r.t. \(n\). As \(f(n)\) increases, \(\left|\overline{X}_n -\mu\right|\) decreases, so we want a function \(f(n)\) that increases rapidly w.r.t. \(n\). For example, \(\log(\log(n))\) increases very slowly, so functions like this are not useful. So what is the rate at which \(\overline{X}_n\) approaches \(\mu\)? The answer is hidden in the Central Limit Theorem.
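As a preview, here is a rough simulated (not derived) sketch of that rate for a fair coin: the typical error \(\mathbb{E}|\overline{X}_n-\mu|\) shrinks, while \(\sqrt{n}\cdot\mathbb{E}|\overline{X}_n-\mu|\) hovers around a constant, suggesting \(f(n)\approx\sqrt{n}\), which is exactly what the Central Limit Theorem makes precise:

```python
import numpy as np

rng = np.random.default_rng(0)
trials, p = 100_000, 0.5   # fair coin, so μ = 0.5

for n in (100, 10_000, 1_000_000):
    # total Heads in n flips ~ Binomial(n, p); dividing by n gives X̄n
    sample_means = rng.binomial(n, p, size=trials) / n
    err = np.abs(sample_means - p).mean()
    # err shrinks with n, but sqrt(n) * err stabilizes => rate ≈ 1/sqrt(n)
    print(f"n={n:9d}  E|X̄n−μ| ≈ {err:.6f}  √n·E|X̄n−μ| ≈ {np.sqrt(n) * err:.4f}")
```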
Gambler's Fallacy
The Gambler's Fallacy, also known as the Monte Carlo Fallacy, is a rather popular mistaken belief that:

"If an independent event is occurring more frequently (than it normally does), then it's less likely to occur in the future."

Note that this statement is not true; it's a mistaken belief.

Example:
Say you start flipping a fair coin \((p=0.5)\), and you observe that the first \(20\) tosses are \(\text{Heads}\). Then some might say:

"According to the Law of Large Numbers the average proportion of \(\text{Heads}\) shall be \(50\%\), and we got \(20\) \(\text{Heads}\) in a row, so there are high chances for our next toss to be \(\text{Tails}\)."

But the above statement is incorrect.
Even if you got \(1000\) \(\text{Heads}\) in a row, the probability of the next toss being \(\text{Tails}\) is still \(50\%\). But why exactly is the above statement false?
Because the Law of Large Numbers says that as \(n\to\infty\), our sample mean \(\to\) true mean. So even if we got \(1000\) \(\text{Heads}\) in a row, there are still infinitely many tosses left to bring our sample mean to the true mean; the first \(1000\) tosses have vanishing weight in the limit. Now let's see a small simulation.
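A minimal simulation sketch in Python (the setup is an assumption for illustration: \(1{,}000{,}000\) short sequences of fair-coin flips; we keep only those that begin with a streak of \(5\) \(\text{Heads}\) and check the very next toss):

```python
import numpy as np

rng = np.random.default_rng(0)
trials, streak = 1_000_000, 5

# Many independent sequences of streak+1 fair-coin flips (1 = Heads)
flips = rng.integers(0, 2, size=(trials, streak + 1))

# Keep only the sequences whose first 5 tosses were all Heads...
lucky = flips[flips[:, :streak].all(axis=1)]

# ...and look at the 6th toss: the streak changes nothing
print(f"{len(lucky)} streaks of {streak} Heads found")
print(f"P(Heads on the next toss) ≈ {lucky[:, streak].mean():.4f}")   # ≈ 0.5
```

The conditional frequency stays at about \(0.5\): a streak carries no information about the next independent toss.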
Recommended Watching
Chebyshev's Inequality? (by Prof. John Tsitsiklis)
Chebyshev's Inequality? (by Sir Ben Lambert)
The Weak Law of Large Numbers (by Prof. John Tsitsiklis)
Law of Large Numbers (by Sir Jeremy Jones)
The Gambler's Fallacy (by Sir Kevin deLaplante)
Also check out his Probability Fallacies playlist