
Wednesday, 17 June 2015

Convergence of random variables

Prerequisites

Convergence of a sequence of numbers
We say that a sequence a_1,a_2,a_3,\cdots converges to a limit L if a_n approaches L as n goes to infinity. For example, the sequence \frac{1}{2},\frac{2}{3},\frac{3}{4},\cdots,\frac{n}{n+1},\cdots, defined by a_n=\frac{n}{n+1} for n=1,2,3,\cdots, converges to 1.

Formal definition:
A sequence a_1,a_2,a_3,\cdots converges to a limit L if \lim\limits_{n \to \infty}a_n=L. Namely, for any \epsilon>0, there exists an N \in \mathbb{N} such that |a_n-L|<\epsilon for all n>N.
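As a quick numerical illustration, here is a minimal Python sketch (not part of the original note) checking that a_n=\frac{n}{n+1} stays within \epsilon=10^{-3} of 1 once n>N=999:

# |a_n - 1| = 1/(n+1) < eps holds exactly when n > 1/eps - 1, so N = 999 works for eps = 1e-3
eps = 1e-3
N = 999
print(all(abs(n / (n + 1) - 1) < eps for n in range(N + 1, N + 100_000)))  # True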

Sequence of random variables
In any probability model, we have a sample space S and a probability measure P. Suppose that our sample space consists of a finite number of elements: S=\{s_1,s_2,\cdots,s_k\}. Then, a random variable X is a mapping that assigns a real number to any of the possible outcomes s_i, i=1,2,\cdots,k. In other words, X(s_i)=x_i, for i=1,2,\cdots,k.

Let's now consider a sequence of random variables X_1,X_2,X_3,\cdots. Each X_n is a function from S to real numbers. We can then write X_n(s_i)=x_{ni}, for i=1,2,\cdots,k.

In short, a sequence of random variables is simply a sequence of functions X_n:S\to \mathbb{R}.

Markov's inequality
For a non-negative random variable X and any t>0: P(X \geq t) \leq \frac{E(X)}{t}

Chebyshev's inequality
For a random variable X with mean \mu and finite variance, and any t>0: P(|X-\mu| \geq t) \leq \frac{Var(X)}{t^2}
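Both bounds are easy to sanity-check by simulation. A minimal Python sketch (NumPy assumed; the Exponential(1) distribution is only an illustrative choice, not part of this note):

import numpy as np
rng = np.random.default_rng(0)
x = rng.exponential(1.0, size=1_000_000)            # E(X) = 1, Var(X) = 1, mu = 1
t = 3.0
print((x >= t).mean(), 1.0 / t)                     # P(X >= t) vs Markov bound E(X)/t
print((np.abs(x - 1.0) >= t).mean(), 1.0 / t**2)    # P(|X - mu| >= t) vs Chebyshev bound Var(X)/t^2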



Relations between different types of convergence
The stronger types of convergence imply the weaker ones: almost sure convergence and convergence in the rth mean each imply convergence in probability, which in turn implies convergence in distribution. For example, if a sequence of random variables converges in probability to a random variable X, then the sequence converges in distribution to X as well.

Convergence in distribution [weakest]

Formal definition:
A sequence of random variables X_1,X_2,X_3,\cdots converges in distribution to a random variable X, denoted by X_n \stackrel{d}{\to} X, if \lim\limits_{n \to \infty} F_{X_n}(x)=F_X(x), for all x at which F_X(x) is continuous.
[Note that although we talk of a sequence of random variables converging in distribution, it is really the cumulative distribution functions that converge, not the random variables.]

When working with integer-valued random variables, the following theorem is often useful.

Consider the sequence X_1,X_2,X_3,\cdots and the random variable X. Assume that X and X_n (for all n) are non-negative and integer-valued. Then X_n \stackrel{d}{\to} X \iff \lim\limits_{n \to \infty} P_{X_n}(k)=P_X(k), for k=0,1,2,\cdots [Proof: later]

Example:
Let X_1,X_2,X_3,\cdots be a sequence of random variables such that X_n \sim \text{Binomial}(n,\frac{\lambda}{n}), for n \in \mathbb{N}, n>\lambda, where \lambda>0 is a constant. Show that X_n \stackrel{d}{\to} \text{Poisson}(\lambda).
[One can thus approximate a Binomial(n,p) probability with a Poisson(np) probability when n is large and p is small.]

Proof:
By the previous theorem, we just need to prove \lim\limits_{n \to \infty} P_{X_n}(k)=P_X(k), for k=0,1,2,\cdots.

We have \begin{align} \lim\limits_{n \to \infty} P_{X_n}(k)&=\lim\limits_{n \to \infty}\binom{n}{k}(\frac{\lambda}{n})^k(1-\frac{\lambda}{n})^{n-k}\\ &=\lambda^k \lim\limits_{n \to \infty}\frac{n!}{(n-k)!k!}(\frac{1}{n^k})(1-\frac{\lambda}{n})^{n-k}\\ &=\frac{\lambda^k}{k!} \lim\limits_{n \to \infty} \Big( [\frac{n(n-1)(n-2)\cdots(n-k+1)}{n^k}][(1-\frac{\lambda}{n})^n][(1-\frac{\lambda}{n})^{-k}] \Big)\\ &=\frac{\lambda^k}{k!} (1 \cdot e^{-\lambda} \cdot 1) \end{align} We conclude \lim\limits_{n \to \infty} P_{X_n}(k)=\frac{e^{-\lambda} \lambda^k}{k!}.\:\Box
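A quick numerical check of this limit, as a minimal Python sketch (SciPy assumed, not part of the original note):

from scipy.stats import binom, poisson
lam, k = 3.0, 2
for n in [10, 100, 1000, 10000]:
    print(n, binom.pmf(k, n, lam / n), poisson.pmf(k, lam))
# the Binomial(n, lam/n) pmf at k approaches the Poisson(lam) value e^{-3} 3^2 / 2! ≈ 0.224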

The most famous example of convergence in distribution is the central limit theorem (CLT). The CLT states that if X_1,X_2,X_3,\cdots are independent and identically distributed random variables with mean \mu and finite variance \sigma^2>0, then the normalized average \frac{\bar{X}_n-\mu}{\sigma/\sqrt{n}} converges in distribution to a standard normal random variable.
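An empirical illustration, as a minimal Python sketch (NumPy/SciPy assumed; the Exponential(1) distribution, with \mu=\sigma=1, and the sample sizes are illustrative choices):

import numpy as np
from scipy.stats import norm
rng = np.random.default_rng(0)
n, reps = 500, 10_000
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = (xbar - 1.0) / (1.0 / np.sqrt(n))               # normalized averages
for c in [-1.0, 0.0, 1.0]:
    print(c, (z <= c).mean(), norm.cdf(c))          # empirical CDF vs standard normal CDF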

Convergence in probability

Formal definition:
A sequence of random variables X_1,X_2,X_3,\cdots converges in probability to a random variable X, denoted by X_n \stackrel{p}{\to} X, if \lim\limits_{n \to \infty} P(|X_n-X|\geq \epsilon)=0, for all \epsilon >0.
[Intuitively: for every fixed \epsilon>0, the probability that X_n and X differ by more than \epsilon becomes negligible as n grows.]

Example:
Let X be a random variable and X_n=X+Y_n, where E(Y_n)=\frac{1}{n} and Var(Y_n)=\frac{\sigma^2}{n} for some constant \sigma>0. Show that X_n \stackrel{p}{\to} X.

Proof:
Recall triangle inequality |a+b| \leq |a|+|b| \: \forall \: a,b \in \mathbb{R}.
Choosing a=Y_n-E(Y_n) and b=E(Y_n), we have |Y_n| \leq |Y_n-E(Y_n)|+\frac{1}{n}.

Now, for any \epsilon >0 and any n large enough that \epsilon-\frac{1}{n}>0, we have \begin{align} P(|X_n-X| \geq \epsilon)&=P(|Y_n| \geq \epsilon)\\&\leq P(|Y_n-E(Y_n)|+\frac{1}{n} \geq \epsilon)\\&=P(|Y_n-E(Y_n)|\geq \epsilon-\frac{1}{n})\\& \leq \frac{Var(Y_n)}{(\epsilon-\frac{1}{n})^2}\;\;\text{by Chebyshev's inequality}\\&=\frac{\sigma^2}{n(\epsilon-\frac{1}{n})^2} \to 0\:\:\text{as}\:n\to \infty \end{align} We conclude that X_n \stackrel{p}{\to} X.\:\Box
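This can also be seen empirically. A minimal Python sketch (NumPy assumed; the example only fixes E(Y_n) and Var(Y_n), so taking Y_n \sim N(\frac{1}{n},\frac{\sigma^2}{n}) with \sigma=1 is an assumption made purely for illustration):

import numpy as np
rng = np.random.default_rng(0)
eps, sigma, reps = 0.1, 1.0, 200_000
for n in [10, 100, 1000, 10000]:
    y = rng.normal(1.0 / n, sigma / np.sqrt(n), size=reps)   # Y_n = X_n - X
    print(n, (np.abs(y) >= eps).mean())                      # estimates P(|X_n - X| >= eps) -> 0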

The most famous example of convergence in probability is the weak law of large numbers (WLLN). The WLLN states that if X_1,X_2,X_3,\cdots are independent and identically distributed random variables with mean E(X_i)=\mu<\infty, then the sequence of averages \bar{X}_n=\frac{X_1+X_2+\cdots+X_n}{n} converges in probability to \mu. It is called the weak law because it refers to convergence in probability, while the strong law of large numbers (SLLN) refers to almost sure convergence (a stronger type of convergence).
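A minimal Python sketch of the WLLN (NumPy assumed; the Uniform(0,1) distribution, with \mu=0.5, is an illustrative choice):

import numpy as np
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in [10, 100, 1000, 10_000, 100_000]:
    print(n, running_mean[n - 1])                    # the running average settles near mu = 0.5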

Convergence in mean

Formal definition:
Let r\geq 1 be a fixed number. A sequence of random variables X_1,X_2,X_3,\cdots converges in the rth mean or in the L^r norm to a random variable X, denoted by X_n \stackrel{L^r}{\to} X, if \lim\limits_{n \to \infty} E(|X_n-X|^r)=0.
If r=2, it is called mean-square convergence, written X_n \stackrel{m.s.}{\to} X.
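For a simple illustration, take X_n=X+Z_n, where Z_n \sim N(0,\frac{1}{n}) is independent of X. Then E(|X_n-X|^2)=E(Z_n^2)=\frac{1}{n}\to 0, so X_n \stackrel{m.s.}{\to} X.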

Theorem:
If X_n \stackrel{L^r}{\to} X for some r \geq 1, then X_n \stackrel{p}{\to} X.

Proof:
For any \epsilon>0, we have P(|X_n-X| \geq \epsilon)=P(|X_n-X|^r \geq \epsilon^r) \leq \frac{E(|X_n-X|^r)}{\epsilon^r} \;\; \text{by Markov's inequality}

Since \lim\limits_{n \to \infty} E(|X_n-X|^r)=0, we conclude \lim\limits_{n \to \infty} P(|X_n-X| \geq \epsilon)=0 for all \epsilon>0.\:\Box

Theorem:
Let 1\leq r\leq s. If X_n \stackrel{L^s}{\to} X, then X_n \stackrel{L^r}{\to} X.

Proof:
Recall Hölder's inequality E|UV| \leq (E|U|^p)^{\frac{1}{p}}(E|V|^q)^{\frac{1}{q}}, where 1<p,q<\infty and \frac{1}{p}+\frac{1}{q}=1. If r=s there is nothing to prove, so assume r<s and apply Hölder's inequality with U=|X_n-X|^r, V=1, and p=\frac{s}{r}>1.
Then E|X_n-X|^r \leq (E|X_n-X|^s)^{\frac{1}{p}}. Since \lim\limits_{n \to \infty} E(|X_n-X|^s)=0, we conclude \lim\limits_{n \to \infty} E(|X_n-X|^r) \leq \lim\limits_{n \to \infty} \big(E(|X_n-X|^s)\big)^{\frac{1}{p}}=0. Therefore, X_n \stackrel{L^r}{\to} X.\:\Box

Almost sure convergence (convergence with probability 1)

Formal definition:
A sequence of random variables X_1,X_2,X_3,\cdots converges almost surely to a random variable X, denoted by X_n \stackrel{a.s.}{\to} X, if P(\{s\in S: \lim\limits_{n \to \infty} X_n(s)=X(s)\})=1.

In some problems, proving almost sure convergence directly can be difficult. Here is a result that is sometimes useful when we would like to prove almost sure convergence.

Consider the sequence X_1,X_2,X_3,\cdots. If for all \epsilon>0, we have \sum\limits_{n=1}^\infty P(|X_n-X|>\epsilon)<\infty, then X_n \stackrel{a.s.}{\to} X.
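For example, let X_n \sim \text{Bernoulli}(\frac{1}{n^2}) and X=0. For any 0<\epsilon<1, P(|X_n-X|>\epsilon)=P(X_n=1)=\frac{1}{n^2} and \sum\limits_{n=1}^\infty \frac{1}{n^2}<\infty (for \epsilon \geq 1 the probability is 0), so X_n \stackrel{a.s.}{\to} 0.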

Strong law of large numbers
Let X_1,X_2,X_3,\cdots be independent and identically distributed random variables with expected value E(X_i)=\mu <\infty, and let M_n=\frac{X_1+X_2+\cdots+X_n}{n}. Then M_n \stackrel{a.s.}{\to} \mu.

