Wednesday 28 January 2015

$68\%, 95\%, 99.7\%$

When we learn about standard deviations, we meet the figures 68%, 95% and 99.7%: the percentages of values that lie within one, two and three standard deviations of the mean for a normal distribution. Have you ever wondered where these numbers come from?

They come from the cumulative distribution function of the standard normal distribution:
[Proof: pending] $\large \Phi(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{1}{2}t^2}dt$

Note that $\phi(x) \ne \Phi(x)$. $\phi(x)$ is the density, that is, the height of the curve at $x$, whereas $\Phi(x)$ is the area under the curve to the left of $x$.


For example, $\large \Phi(2)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{2}e^{-\frac{1}{2}t^2}dt$. This integral cannot be expressed in closed form, so we resort to numerical integration, which gives $\Phi(2) \approx 0.9772$, or $P(x\leq\mu+2\sigma) \approx 0.9772$. The probability that an observation is within two standard deviations of the mean, that is, the area shaded in light blue, is then:

$\begin{align}P(\mu-2\sigma \leq x \leq \mu+2\sigma)&=\Phi(2)-\Phi(-2)\\&=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{2}e^{-\frac{1}{2}t^2}dt-\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{-2}e^{-\frac{1}{2}t^2}dt\\&\approx 0.9772-(1-0.9772)\\&\approx 0.9545\end{align}$

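Remark: $\Phi(x)=1-\Phi(-x)$, because the standard normal curve is symmetric about zero, so the area to the left of $-x$ equals the area to the right of $x$.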
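
To see the numbers come out, here is a minimal Python sketch of the numerical integration. The post does not say which integration method was used, so the composite Simpson's rule here is an assumption, as are the function names `phi` and `Phi` and the step count `n`. The sketch starts from $\Phi(0)=0.5$ and integrates the density from $0$ to $x$, which avoids the infinite lower limit.

```python
import math


def phi(t):
    """Standard normal density: the height of the curve at t."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi(x, n=2000):
    """Approximate the standard normal CDF with composite Simpson's rule.

    Uses Phi(0) = 0.5 (by symmetry) and integrates phi from 0 to x.
    n is the number of subintervals and must be even (an assumed default).
    """
    h = x / n  # negative when x < 0, so the integral gets the right sign
    total = phi(0.0) + phi(x)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # Simpson weights: 4 on odd nodes, 2 on even
        total += weight * phi(i * h)
    return 0.5 + total * h / 3.0


for k in (1, 2, 3):
    within = Phi(k) - Phi(-k)
    # Cross-check with the error function: P(|z| <= k) = erf(k / sqrt(2))
    check = math.erf(k / math.sqrt(2.0))
    print(f"within {k} standard deviation(s): {within:.4f} (erf check: {check:.4f})")
```

Rounded to four decimal places, the three probabilities are 0.6827, 0.9545 and 0.9973, which is where 68%, 95% and 99.7% come from.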
