Wednesday, 28 January 2015

68%, 95%, 99.7%

When we learn about standard deviations, we learn the figures 68%, 95%, 99.7%: the percentages of values that lie within one, two and three standard deviations of the mean for a normal distribution. Have you ever wondered where these numbers come from?

They come from the cumulative distribution function of the standard normal distribution:
\large \Phi(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{1}{2}t^2}dt

Note that \phi(x) \ne \Phi(x): \phi(x) is the density, that is, the height of the bell curve at x, whereas \Phi(x) is the area under the curve to the left of x.


For example, \large \Phi(2)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{2}e^{-\frac{1}{2}t^2}dt. This integral cannot be expressed in closed form in terms of elementary functions, so we resort to numerical integration, which gives \Phi(2) \approx 0.9772, or P(x\leq\mu+2\sigma) \approx 0.9772. The probability that an observation is within two standard deviations of the mean, that is, the area shaded in light blue, is then:

\begin{align}P(\mu-2\sigma \leq x \leq \mu+2\sigma)&=\Phi(2)-\Phi(-2)\\&=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{2}e^{-\frac{1}{2}t^2}dt-\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{-2}e^{-\frac{1}{2}t^2}dt\\&\approx 0.9772-(1-0.9772)\\&\approx 0.9545\end{align}
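To make the numerical integration step concrete, here is a minimal Python sketch. It is only one possible approach, not necessarily how the figures above were obtained: it assumes composite Simpson's rule, and the function names `phi` and `Phi_simpson` are made up for this illustration. To avoid the infinite lower limit it uses the symmetry of the density, writing \Phi(x) as 1/2 plus the integral of \phi from 0 to x.

```python
import math

def phi(t):
    """Standard normal density: the height of the bell curve at t."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi_simpson(x, n=1000):
    """Approximate Phi(x) with composite Simpson's rule (illustrative helper).

    Uses Phi(x) = 1/2 + (integral of phi from 0 to x), so no infinite
    limit is needed. A negative x gives a negative step h, which makes
    the integral negative, as it should be.
    """
    if n % 2:          # Simpson's rule needs an even number of subintervals
        n += 1
    h = x / n
    total = phi(0.0) + phi(x)
    for i in range(1, n):
        # Interior points alternate weights 4, 2, 4, ...
        total += (4 if i % 2 else 2) * phi(i * h)
    return 0.5 + total * h / 3.0

print(round(Phi_simpson(2.0), 4))                       # 0.9772
print(round(Phi_simpson(2.0) - Phi_simpson(-2.0), 4))   # 0.9545
```

The two printed values match \Phi(2) and the two-standard-deviation probability computed above.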

Remark: \Phi(x)=1-\Phi(-x).
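The remark also gives a shortcut: P(\mu-k\sigma \leq x \leq \mu+k\sigma)=\Phi(k)-\Phi(-k)=2\Phi(k)-1. The sketch below uses it to recover all three percentages. It relies on the error function `math.erf` from the Python standard library and the identity \Phi(x)=\frac{1}{2}\left(1+\mathrm{erf}(x/\sqrt{2})\right); the helper names `Phi` and `within` are hypothetical, chosen only for this example.

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def within(k):
    """Probability of lying within k standard deviations of the mean.

    Uses Phi(k) - Phi(-k) = 2*Phi(k) - 1, which follows from the remark
    Phi(x) = 1 - Phi(-x).
    """
    return 2.0 * Phi(k) - 1.0

for k in (1, 2, 3):
    print(k, f"{100 * within(k):.2f}%")
# 1 68.27%
# 2 95.45%
# 3 99.73%
```

Rounded, these are the familiar 68%, 95% and 99.7%.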
