Monday 26 January 2015

Binomial and Poisson distributions

Suppose that in a town with a population of 60000, a rare disease affects 1 in 40000 people each year. The expected number of cases in a year is then 60000 $\large \cdot \frac{1}{40000}$, or 1.5.

This situation can be modelled by the binomial distribution, treating each of the 60000 people as an independent trial.
P(get disease) = $\large \frac{1}{40000}$ and P(not get disease) = $\large \frac{39999}{40000}$.

Probability of 5 cases among 60000 people (and thus 59995 people not getting the disease):
$\large C_5^{60000} (\frac{39999}{40000})^{59995}(\frac{1}{40000})^5 \approx 0.0141$
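
The figure above can be checked numerically. Below is a minimal Python sketch, assuming Python 3.8 or later for `math.comb`; the helper name `binomial_pmf` is made up for illustration and is not part of the original post.

```python
from math import comb

n = 60000          # population of the town
p = 1 / 40000      # chance that one person gets the disease in a year

def binomial_pmf(r, n, p):
    """P(exactly r cases) among n people, each affected with probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Probability of exactly 5 cases, rounded to 4 decimal places
print(round(binomial_pmf(5, n, p), 4))
```

This should agree with the value of about 0.0141 quoted above.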

What we are probably interested in, however, is not the probability of exactly 5 cases but that of 5 or more cases.

P(5 or more cases)
$= 1 - [P(0) + P(1) + P(2) + P(3) + P(4)]\\
= \large 1-(\frac{39999}{40000})^{60000}-C_1^{60000} (\frac{39999}{40000})^{59999}\frac{1}{40000}\\ \large-C_2^{60000} (\frac{39999}{40000})^{59998}(\frac{1}{40000})^2-C_3^{60000} (\frac{39999}{40000})^{59997}(\frac{1}{40000})^3\\ \large-C_4^{60000} (\frac{39999}{40000})^{59996}(\frac{1}{40000})^4\\
\approx \large 1-0.2231-0.3347-0.2510-0.1255-0.0471\\
\approx \large 0.019$
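
The same subtraction can be done in a few lines of Python. This is only a sketch under the binomial model described above; the variable names `head` and `tail` are illustrative.

```python
from math import comb

n, p = 60000, 1 / 40000

# P(0), P(1), ..., P(4) from the binomial formula
head = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(5)]

# P(5 or more cases) is whatever probability is left over
tail = 1 - sum(head)

print([round(x, 4) for x in head])
print(round(tail, 4))
```

The five terms and the remainder should match the hand calculation to the number of decimal places shown there.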

As you can see, the calculation for this binomial distribution is tedious. We can, however, approximate the binomial terms as follows, which leads to a different distribution: the Poisson distribution. We assume the event is rare but there are many opportunities for it to occur, that is, p is small and n is large.

Let $\large (\frac{39999}{40000})^{60000}=k$, a constant.

Then P(1)
$\Large =C_1^{60000} (\frac{39999}{40000})^{59999}\frac{1}{40000}=\frac{60000\cdot(\frac{39999}{40000})^{60000}\cdot\frac{40000}{39999}}{40000}\\
\Large =k\cdot \frac{60000}{39999}\approx k\cdot\frac{60000}{40000}=k\cdot 1.5$

In the same vein, we find P(2) to be approximately $k\cdot \frac{(1.5)^2}{2}$.
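
For readers who want the intermediate steps, the working for P(2) can be written out in the same way as for P(1), using the same constant k:

$\Large P(2)=C_2^{60000} (\frac{39999}{40000})^{59998}(\frac{1}{40000})^2=\frac{60000\cdot 59999}{2}\cdot\frac{(\frac{39999}{40000})^{60000}\cdot(\frac{40000}{39999})^2}{40000^2}\\
\Large =k\cdot \frac{60000\cdot 59999}{2\cdot 39999^2}\approx k\cdot\frac{1}{2}\cdot(\frac{60000}{40000})^2=k\cdot \frac{(1.5)^2}{2}$

The approximation replaces 59999 by 60000 and 39999 by 40000, which changes the value very little because n is large and p is small.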

Now do you notice something?

$\begin{array}{l|cccccc}
\text{Number of cases} & 0 & 1 & 2 & 3 & 4 & \ldots\\
\text{Probability} & k & k \cdot 1.5 & \frac{k \cdot (1.5)^2}{2!} & \frac{k \cdot (1.5)^3}{3!} & \frac{k \cdot (1.5)^4}{4!} & \ldots
\end{array}$
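
To see how close this pattern is to the exact binomial terms, here is a short Python sketch. It takes k to be the binomial P(0) defined earlier; the names `exact`, `approx` and `lam` are illustrative only.

```python
from math import comb, factorial

n, p = 60000, 1 / 40000
k = (1 - p)**n                 # P(0), the constant k in the text
lam = n * p                    # 1.5, the expected number of cases

for r in range(5):
    exact = comb(n, r) * p**r * (1 - p)**(n - r)   # binomial term
    approx = k * lam**r / factorial(r)             # pattern k * 1.5^r / r!
    print(r, round(exact, 5), round(approx, 5))
```

For these values of n and p the two columns should differ only in the fifth or sixth decimal place.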

The probabilities must sum to 1:
$\begin{align} k + 1.5k + \frac{(1.5)^2}{2!}\cdot k + \frac{(1.5)^3}{3!}\cdot k + \cdots &=1\\
k (1+1.5+\frac{(1.5)^2}{2!}+\frac{(1.5)^3}{3!}+\cdots) &=1\\
k \cdot e^{1.5} &=1\\
k&=e^{-1.5}\end{align}$
The third line uses the series $e^x = 1+x+\frac{x^2}{2!}+\frac{x^3}{3!}+\cdots$ with x = 1.5. Hence
$\large P(X=r)=\frac{e^{-1.5}(1.5)^r}{r!}$, where the discrete random variable X denotes the number of cases of the disease.

This can be generalised to the Poisson distribution with mean λ (here λ = np = 1.5), for which $\large P(X=r)=e^{-\lambda}\frac{\lambda^r}{r!}$.
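
As a rough illustration, the general formula can be turned into a small Python function and applied to the disease example. The function name `poisson_pmf` is hypothetical, and the mean is taken as 60000/40000 from the example above.

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """P(X = r) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**r / factorial(r)

lam = 60000 * (1 / 40000)   # mean number of cases, 1.5

# P(5 or more cases) under the Poisson model
tail = 1 - sum(poisson_pmf(r, lam) for r in range(5))
print(round(tail, 4))

# Summed over a long range of r, the probabilities add up to almost exactly 1
total = sum(poisson_pmf(r, lam) for r in range(50))
print(total)
```

The Poisson tail probability should come out very close to the binomial answer of about 0.019, with far less arithmetic.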

More to explore:
http://www.math.uah.edu/stat/expect/Properties.html
http://web.mit.edu/jorloff/www/18.05/pdf/class6-prep-a.pdf
http://math.arizona.edu/~jwatkins/h-expectedvalue.pdf
P40-52, 134-137
