
Wednesday, 17 June 2015

Convergence of random variables

Prerequisites

Convergence of a sequence of numbers
We say that a sequence a_1,a_2,a_3,\cdots converges to a limit L if a_n approaches L as n goes to infinity. For example, the sequence \frac{1}{2},\frac{2}{3},\frac{3}{4},\cdots,\frac{n}{n+1},\cdots, defined by a_n=\frac{n}{n+1} for n=1,2,3,\cdots, converges to 1.

Formal definition:
A sequence a_1,a_2,a_3,\cdots converges to a limit L if \lim\limits_{n \to \infty}a_n=L. Namely, for any \epsilon>0, there exists an N \in \mathbb{N} such that |a_n-L|<\epsilon for all n>N.
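As a quick numerical illustration, here is a minimal Python sketch (not part of the original note) checking that a_n=\frac{n}{n+1} stays within \epsilon=10^{-3} of 1 once n>N=999:

# |a_n - 1| = 1/(n+1) < eps holds exactly when n > 1/eps - 1, so N = 999 works for eps = 1e-3
eps = 1e-3
N = 999
print(all(abs(n / (n + 1) - 1) < eps for n in range(N + 1, N + 100_000)))  # True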

Sequence of random variables
In any probability model, we have a sample space S and a probability measure P. Suppose that our sample space consists of a finite number of elements: S=\{s_1,s_2,\cdots,s_k\}. Then, a random variable X is a mapping that assigns a real number to any of the possible outcomes s_i, i=1,2,\cdots,k. In other words, X(s_i)=x_i, for i=1,2,\cdots,k.

Let's now consider a sequence of random variables X_1,X_2,X_3,\cdots. Each X_n is a function from S to real numbers. We can then write X_n(s_i)=x_{ni}, for i=1,2,\cdots,k.

In short, a sequence of random variables is simply a sequence of functions X_n:S\to \mathbb{R}.

Markov's inequality
For a non-negative random variable X and any t>0: P(X \geq t) \leq \frac{E(X)}{t}

Chebyshev's inequality
For a random variable X with mean \mu and finite variance, and any t>0: P(|X-\mu| \geq t) \leq \frac{Var(X)}{t^2}
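Both bounds are easy to sanity-check by simulation. A minimal Python sketch (NumPy assumed; the Exponential(1) distribution is only an illustrative choice, not part of this note):

import numpy as np
rng = np.random.default_rng(0)
x = rng.exponential(1.0, size=1_000_000)            # E(X) = 1, Var(X) = 1, mu = 1
t = 3.0
print((x >= t).mean(), 1.0 / t)                     # P(X >= t) vs Markov bound E(X)/t
print((np.abs(x - 1.0) >= t).mean(), 1.0 / t**2)    # P(|X - mu| >= t) vs Chebyshev bound Var(X)/t^2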



Relations between different types of convergence
The stronger types of convergence imply the weaker ones: almost sure convergence and convergence in the rth mean each imply convergence in probability, which in turn implies convergence in distribution. For example, if a sequence of random variables converges in probability to a random variable X, then the sequence converges in distribution to X as well.

Convergence in distribution [weakest]

Formal definition:
A sequence of random variables X_1,X_2,X_3,\cdots converges in distribution to a random variable X, denoted by X_n \stackrel{d}{\to} X, if \lim\limits_{n \to \infty} F_{X_n}(x)=F_X(x), for all x at which F_X(x) is continuous.
[Note that although we talk of a sequence of random variables converging in distribution, it is really the cumulative distribution functions that converge, not the random variables.]

When working with integer-valued random variables, the following theorem is often useful.

Consider the sequence X_1,X_2,X_3,\cdots and the random variable X. Assume that X and X_n (for all n) are non-negative and integer-valued. Then X_n \stackrel{d}{\to} X \iff \lim\limits_{n \to \infty} P_{X_n}(k)=P_X(k), for k=0,1,2,\cdots [Proof: later]

Example:
Let X_1,X_2,X_3,\cdots be a sequence of random variables such that X_n \sim \text{Binomial}(n,\frac{\lambda}{n}), for n \in \mathbb{N}, n>\lambda, where \lambda>0 is a constant. Show that X_n \stackrel{d}{\to} \text{Poisson}(\lambda).
[One can thus approximate a Binomial(n,p) probability with a Poisson(np) probability when n is large and p is small.]

Proof:
By the previous theorem, we just need to prove \lim\limits_{n \to \infty} P_{X_n}(k)=P_X(k), for k=0,1,2,\cdots.

We have \begin{align} \lim\limits_{n \to \infty} P_{X_n}(k)&=\lim\limits_{n \to \infty}\binom{n}{k}(\frac{\lambda}{n})^k(1-\frac{\lambda}{n})^{n-k}\\ &=\lambda^k \lim\limits_{n \to \infty}\frac{n!}{(n-k)!k!}(\frac{1}{n^k})(1-\frac{\lambda}{n})^{n-k}\\ &=\frac{\lambda^k}{k!} \lim\limits_{n \to \infty} \Big( [\frac{n(n-1)(n-2)\cdots(n-k+1)}{n^k}][(1-\frac{\lambda}{n})^n][(1-\frac{\lambda}{n})^{-k}] \Big)\\ &=\frac{\lambda^k}{k!} (1 \cdot e^{-\lambda} \cdot 1) \end{align} We conclude \lim\limits_{n \to \infty} P_{X_n}(k)=\frac{e^{-\lambda} \lambda^k}{k!}.\:\Box
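A quick numerical check of this limit, as a minimal Python sketch (SciPy assumed, not part of the original note):

from scipy.stats import binom, poisson
lam, k = 3.0, 2
for n in [10, 100, 1000, 10000]:
    print(n, binom.pmf(k, n, lam / n), poisson.pmf(k, lam))
# the Binomial(n, lam/n) pmf at k approaches the Poisson(lam) value e^{-3} 3^2 / 2! ≈ 0.224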

The most famous example of convergence in distribution is the central limit theorem (CLT). The CLT states that if X_1,X_2,X_3,\cdots are independent and identically distributed random variables with mean \mu and finite variance \sigma^2>0, then the normalized average \frac{\bar{X}_n-\mu}{\sigma/\sqrt{n}} converges in distribution to a standard normal random variable.
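An empirical illustration, as a minimal Python sketch (NumPy/SciPy assumed; the Exponential(1) distribution, with \mu=\sigma=1, and the sample sizes are illustrative choices):

import numpy as np
from scipy.stats import norm
rng = np.random.default_rng(0)
n, reps = 500, 10_000
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = (xbar - 1.0) / (1.0 / np.sqrt(n))               # normalized averages
for c in [-1.0, 0.0, 1.0]:
    print(c, (z <= c).mean(), norm.cdf(c))          # empirical CDF vs standard normal CDF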

Convergence in probability

Formal definition:
A sequence of random variables X_1,X_2,X_3,\cdots converges in probability to a random variable X, denoted by X_n \stackrel{p}{\to} X, if \lim\limits_{n \to \infty} P(|X_n-X|\geq \epsilon)=0, for all \epsilon >0.
[Intuitively: for every fixed \epsilon>0, the probability that X_n and X differ by more than \epsilon becomes negligible as n grows.]

Example:
Let X be a random variable and X_n=X+Y_n, where E(Y_n)=\frac{1}{n} and Var(Y_n)=\frac{\sigma^2}{n} for some constant \sigma>0. Show that X_n \stackrel{p}{\to} X.

Proof:
Recall triangle inequality |a+b| \leq |a|+|b| \: \forall \: a,b \in \mathbb{R}.
Choosing a=Y_n-E(Y_n) and b=E(Y_n), we have |Y_n| \leq |Y_n-E(Y_n)|+\frac{1}{n}.

Now, for any \epsilon >0 and any n large enough that \epsilon-\frac{1}{n}>0, we have \begin{align} P(|X_n-X| \geq \epsilon)&=P(|Y_n| \geq \epsilon)\\&\leq P(|Y_n-E(Y_n)|+\frac{1}{n} \geq \epsilon)\\&=P(|Y_n-E(Y_n)|\geq \epsilon-\frac{1}{n})\\& \leq \frac{Var(Y_n)}{(\epsilon-\frac{1}{n})^2}\;\;\text{by Chebyshev's inequality}\\&=\frac{\sigma^2}{n(\epsilon-\frac{1}{n})^2} \to 0\:\:\text{as}\:n\to \infty \end{align} We conclude that X_n \stackrel{p}{\to} X.\:\Box
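This can also be seen empirically. A minimal Python sketch (NumPy assumed; the example only fixes E(Y_n) and Var(Y_n), so taking Y_n \sim N(\frac{1}{n},\frac{\sigma^2}{n}) with \sigma=1 is an assumption made purely for illustration):

import numpy as np
rng = np.random.default_rng(0)
eps, sigma, reps = 0.1, 1.0, 200_000
for n in [10, 100, 1000, 10000]:
    y = rng.normal(1.0 / n, sigma / np.sqrt(n), size=reps)   # Y_n = X_n - X
    print(n, (np.abs(y) >= eps).mean())                      # estimates P(|X_n - X| >= eps) -> 0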

The most famous example of convergence in probability is the weak law of large numbers (WLLN). The WLLN states that if X_1,X_2,X_3,\cdots are independent and identically distributed random variables with mean E(X_i)=\mu<\infty, then the sequence of averages \bar{X}_n=\frac{X_1+X_2+\cdots+X_n}{n} converges in probability to \mu. It is called the weak law because it refers to convergence in probability, while the strong law of large numbers (SLLN) refers to almost sure convergence (a stronger type of convergence).
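A minimal Python sketch of the WLLN (NumPy assumed; the Uniform(0,1) distribution, with \mu=0.5, is an illustrative choice):

import numpy as np
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in [10, 100, 1000, 10_000, 100_000]:
    print(n, running_mean[n - 1])                    # the running average settles near mu = 0.5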

Convergence in mean

Formal definition:
Let r\geq 1 be a fixed number. A sequence of random variables X_1,X_2,X_3,\cdots converges in the rth mean or in the L^r norm to a random variable X, denoted by X_n \stackrel{L^r}{\to} X, if \lim\limits_{n \to \infty} E(|X_n-X|^r)=0.
If r=2, it is called mean-square convergence, written X_n \stackrel{m.s.}{\to} X.
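For a simple illustration, take X_n=X+Z_n, where Z_n \sim N(0,\frac{1}{n}) is independent of X. Then E(|X_n-X|^2)=E(Z_n^2)=\frac{1}{n}\to 0, so X_n \stackrel{m.s.}{\to} X.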

Theorem:
If X_n \stackrel{L^r}{\to} X for some r \geq 1, then X_n \stackrel{p}{\to} X.

Proof:
For any \epsilon>0, we have P(|X_n-X| \geq \epsilon)=P(|X_n-X|^r \geq \epsilon^r) \leq \frac{E(|X_n-X|^r)}{\epsilon^r} \;\; \text{by Markov's inequality}

Since \lim\limits_{n \to \infty} E(|X_n-X|^r)=0, we conclude \lim\limits_{n \to \infty} P(|X_n-X| \geq \epsilon)=0 for all \epsilon>0.\:\Box

Theorem:
Let 1\leq r\leq s. If X_n \stackrel{L^s}{\to} X, then X_n \stackrel{L^r}{\to} X.

Proof:
Recall Hölder's inequality E|UV| \leq (E|U|^p)^{\frac{1}{p}}(E|V|^q)^{\frac{1}{q}}, where 1<p,q<\infty and \frac{1}{p}+\frac{1}{q}=1. If r=s there is nothing to prove, so assume r<s and apply Hölder's inequality with U=|X_n-X|^r, V=1, and p=\frac{s}{r}>1.
Then E|X_n-X|^r \leq (E|X_n-X|^s)^{\frac{1}{p}}. Since \lim\limits_{n \to \infty} E(|X_n-X|^s)=0, we conclude \lim\limits_{n \to \infty} E(|X_n-X|^r) \leq \lim\limits_{n \to \infty} \big(E(|X_n-X|^s)\big)^{\frac{1}{p}}=0. Therefore, X_n \stackrel{L^r}{\to} X.\:\Box

Almost sure convergence (convergence with probability 1)

Formal definition:
A sequence of random variables X_1,X_2,X_3,\cdots converges almost surely to a random variable X, denoted by X_n \stackrel{a.s.}{\to} X, if P(\{s\in S: \lim\limits_{n \to \infty} X_n(s)=X(s)\})=1.

In some problems, proving almost sure convergence directly can be difficult. Here is a result that is sometimes useful when we would like to prove almost sure convergence.

Consider the sequence X_1,X_2,X_3,\cdots. If for all \epsilon>0, we have \sum\limits_{n=1}^\infty P(|X_n-X|>\epsilon)<\infty, then X_n \stackrel{a.s.}{\to} X.
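For example, let X_n \sim \text{Bernoulli}(\frac{1}{n^2}) and X=0. For any 0<\epsilon<1, P(|X_n-X|>\epsilon)=P(X_n=1)=\frac{1}{n^2} and \sum\limits_{n=1}^\infty \frac{1}{n^2}<\infty (for \epsilon \geq 1 the probability is 0), so X_n \stackrel{a.s.}{\to} 0.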

Strong law of large numbers
Let X_1,X_2,X_3,\cdots be independent and identically distributed random variables with expected value E(X_i)=\mu <\infty, and let M_n=\frac{X_1+X_2+\cdots+X_n}{n}. Then M_n \stackrel{a.s.}{\to} \mu.

