Friday, 19 June 2015

Inverse trigonometric functions

A fast way to find inverse functions (only for simple functions)
Note that a function and its inverse undo each other. Say $f(x)=2x+1$. The function $f$ takes $x$, multiplies it by $2$ and then adds $1$. To undo that, we go in reverse order: first subtract $1$ from $x$ and then divide the whole thing by $2$, giving us $f^{-1}(x)=\frac{x-1}{2}$. Here's one more example. Say $f(x)=\sqrt{x+2}$. $f$ takes $x$, adds $2$ and then takes the square root. To undo the square root, we square $x$; and then instead of adding $2$, we subtract $2$. So $f^{-1}(x)=x^2-2$. Last example: $f(x)=\frac{1}{x+4}-2$. $f$ takes $x$, adds $4$, takes the reciprocal and then subtracts $2$. $f^{-1}$ adds $2$, takes the reciprocal and lastly subtracts $4$: $f^{-1}(x)=\frac{1}{x+2}-4$
[Here we are using the term "function" loosely. Formally, when one talks about a function, the domain and codomain should be specified. For simplicity, we only provide the expression of the function.]
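As a quick sanity check, here is a minimal Python sketch (the sample points are arbitrary choices inside each function's domain) verifying that each inverse above undoes its function:

```python
# A minimal sanity check: each inverse undoes its function on sample points
# chosen inside the appropriate domains (the points themselves are arbitrary).
import math

pairs = [
    (lambda x: 2 * x + 1,        lambda x: (x - 1) / 2),
    (lambda x: math.sqrt(x + 2), lambda x: x ** 2 - 2),   # needs x >= -2
    (lambda x: 1 / (x + 4) - 2,  lambda x: 1 / (x + 2) - 4),
]

for f, f_inv in pairs:
    for x in (0.0, 0.5, 3.0):
        assert math.isclose(f_inv(f(x)), x, abs_tol=1e-12)
print("each f_inv undoes its f on the sample points")
```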

Differentiating inverse functions
Consider a linear function $l(x)=mx+b$ with $m\neq 0$. Its inverse is $l^{-1}(y)=\frac{1}{m}y-\frac{b}{m}$. We then have $l'(x)=m$ and $(l^{-1})'(y)=\frac{1}{m}$, the reciprocal. We can express this by $\frac{dx}{dy}=\frac{1}{\frac{dy}{dx}}$.

Flipping the graphs preserves tangency.

Since the line $y=mx+b$ is tangent to the curve $y=f(x)$ at $(x_0,y_0)$ and flipping the drawing across the line $y=x$ preserves this tangency, the line $x=\frac{1}{m}y-\frac{b}{m}$ is the tangent line to $x=f^{-1}(y)$ at $(y_0,x_0)$ and its slope $\frac{1}{m}$ should be the derivative $(f^{-1})'(y_0)$. Since $m=f'(x_0)$, we conclude that $(f^{-1})'(y_0)=\frac{1}{f'(x_0)}=\frac{1}{f'(f^{-1}(y_0))}$, or equivalently $f'(x_0)=\frac{1}{(f^{-1})'(f(x_0))}$.

Examples:
Find the derivative of the inverse of this real function $f(x)=2x+\cos x$.
$f'(x)=2-\sin x$.
$(f^{-1})'(x)=\frac{1}{f'(f^{-1}(x))}=\frac{1}{2-\sin f^{-1}(x)}$.

Let $y=f(x)=x^7+4x-5$. Find $(f^{-1})'(-5)$.
Note that $y=-5$ corresponds to $x=0$. Also, $f'(x)=7x^6+4$, so $(f^{-1})'(-5)=\frac{1}{f'(0)}=\frac{1}{4}$.
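This example can be cross-checked numerically. The sketch below inverts $f$ by bisection (valid since $f'(x)=7x^6+4>0$ makes $f$ strictly increasing) and differentiates the inverse with a central difference; the bracket $[-10,10]$ and step size are arbitrary choices.

```python
# Numeric check of (f^{-1})'(-5) = 1/4 for f(x) = x^7 + 4x - 5.

def f(x):
    return x ** 7 + 4 * x - 5

def f_inv(y, lo=-10.0, hi=10.0):
    for _ in range(200):          # bisection: f is increasing on [lo, hi]
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

h = 1e-6
approx = (f_inv(-5 + h) - f_inv(-5 - h)) / (2 * h)
print(approx)   # ~0.25 = 1/f'(0) = 1/(7*0^6 + 4)
```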

Inverse trigonometric functions and their derivatives

Sine and arcsine
The sine function $f: \mathbb{R} \to [-1,1] \; f(x)=\sin x$ is not one-to-one, but if we restrict the domain to be $[-\frac{\pi}{2},\frac{\pi}{2}]$, $f$ becomes one-to-one.

We have $\sin^{-1}y=x \iff \sin x=y \quad \text{and} \quad -\frac{\pi}{2}\leq x \leq \frac{\pi}{2}$.
Then $f^{-1}: [-1,1] \to [-\frac{\pi}{2},\frac{\pi}{2}] \quad f^{-1}(x)=\sin^{-1}x$.

The cancellation equations for inverse functions become
$\sin^{-1}(\sin x)=x \quad -\frac{\pi}{2}\leq x \leq\frac{\pi}{2}$
$\sin(\sin^{-1} x)=x \quad -1 \leq x \leq 1$.

                  
[Graphs: $y=\sin x$, $-\frac{\pi}{2}\leq x \leq\frac{\pi}{2}$ (left); $y=\sin^{-1} x$, $-1\leq x \leq 1$ (right)]

Let $y=\sin^{-1}x$. Then $y'=\frac{1}{\cos y}$ from the formula we discussed earlier. Alternatively, one can differentiate $\sin y=x$ implicitly with respect to x to yield the same result. Now $\cos y \geq 0$ since $-\frac{\pi}{2} \leq y \leq \frac{\pi}{2}$. We have $\cos y=\sqrt{1-\sin^2{y}}=\sqrt{1-x^2}$, thus $\frac{d}{dx}(\sin^{-1}x)=\frac{1}{\sqrt{1-x^2}} \quad -1<x<1$.
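A finite-difference check of this formula (a sketch; the step size and sample points are arbitrary choices):

```python
# Check that d/dx arcsin(x) = 1/sqrt(1 - x^2) at a few interior points.
import math

h = 1e-6
for x in (-0.9, -0.3, 0.0, 0.5, 0.8):
    numeric = (math.asin(x + h) - math.asin(x - h)) / (2 * h)
    formula = 1 / math.sqrt(1 - x * x)
    assert math.isclose(numeric, formula, rel_tol=1e-6)
print("arcsin derivative matches 1/sqrt(1-x^2)")
```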

Cosine and arccosine
The inverse cosine function is handled similarly. The restricted cosine function $f(x)=\cos x, 0\leq x\leq \pi$, is one-to-one and so has an inverse function.

                  
[Graphs: $y=\cos x$, $0 \leq x \leq \pi$ (left); $y=\cos^{-1} x$, $-1\leq x \leq 1$ (right)]

We have $\cos^{-1}y=x \iff \cos x=y \quad \text{and} \quad 0\leq x \leq \pi$
and $f^{-1}: [-1,1] \to [0,\pi] \quad f^{-1}(x)=\cos^{-1}x$. Its derivative is given by $\frac{d}{dx}(\cos^{-1}x)=-\frac{1}{\sqrt{1-x^2}} \quad -1<x<1$.

Tangent and arctangent
The tangent function can be made one-to-one by restricting its domain to $(-\frac{\pi}{2},\frac{\pi}{2})$.

We have $\tan^{-1}y=x \iff \tan x=y \quad \text{and} \quad -\frac{\pi}{2}< x < \frac{\pi}{2}$
and $f^{-1}: \mathbb{R} \to (-\frac{\pi}{2},\frac{\pi}{2}) \quad f^{-1}(x)=\tan^{-1}x$.
We know that $\lim\limits_{x \to \frac{\pi}{2}^-} \tan x=\infty$ and $\lim\limits_{x \to -\frac{\pi}{2}^+} \tan x=-\infty$; namely, $x=\pm \frac{\pi}{2}$ are vertical asymptotes of the graph of $\tan x$. Since the graph of $\tan^{-1}$ is obtained by reflecting the graph of the restricted tangent function about the line $y=x$, it follows that $y=\pm \frac{\pi}{2}$ are horizontal asymptotes of the graph of $\tan^{-1}$. We thus have $\lim\limits_{x \to \infty} \tan^{-1} x=\frac{\pi}{2}$ and $\lim\limits_{x \to -\infty} \tan^{-1} x=-\frac{\pi}{2}$.

                  
[Graphs: $y=\tan x$, $-\frac{\pi}{2}< x <\frac{\pi}{2}$ (left); $y=\tan^{-1} x$, $x \in \mathbb{R}$ (right)]

The derivative of the arctangent function is $\frac{d}{dx}(\tan^{-1}x)=\frac{1}{1+x^2}$ for all $x \in \mathbb{R}$.

Inverse Hyperbolic Functions and Their Derivatives

Inverse function integration
Suppose $f$ is a continuous one-to-one function and $F$ is an antiderivative of $f$. Then $\int f^{-1}(x)\,dx=xf^{-1}(x)-\int f(u)\, du=xf^{-1}(x)-F(f^{-1}(x))+C$, where $u=f^{-1}(x)$.

Definite integral version
$\int_a^b f^{-1}(x)\,dx=[xf^{-1}(x)]_a^b-\int_{f^{-1}(a)}^{f^{-1}(b)} f(u)\, du=[xf^{-1}(x)]_a^b-F(f^{-1}(b))+F(f^{-1}(a))$

Examples:
$\begin{align} \int \arctan x \:dx &= x\arctan x-\int \tan u \:du \quad [u=\arctan x]\\
&=x\arctan x+\ln|\cos u|+C\\ &=x\arctan x-\frac{1}{2}\ln(1+x^2)+C \end{align}$
since $\ln|\cos u|=\frac{1}{2}\ln(\cos^2 u)=-\frac{1}{2}\ln(\sec^2 u)=-\frac{1}{2}\ln(1+x^2)$.

$\int \ln x \:dx=x\ln x-x+C$
$\int \arccos x \:dx=x\arccos x-\sin(\arccos x)+C=x\arccos x-\sqrt{1-x^2}+C$
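As a sanity check of the last formula, here is a midpoint-rule sketch comparing a Riemann sum of $\arccos$ against the antiderivative $x\arccos x-\sqrt{1-x^2}$; the interval $[0,\frac{1}{2}]$ and the step count are arbitrary choices.

```python
# Midpoint-rule check that the integral of arccos(x) over [0, 1/2] equals
# [x*arccos(x) - sqrt(1 - x^2)] evaluated between the endpoints.
import math

a, b, n = 0.0, 0.5, 100_000
dx = (b - a) / n
integral = sum(math.acos(a + (i + 0.5) * dx) for i in range(n)) * dx

def F(x):                       # antiderivative from the formula above
    return x * math.acos(x) - math.sqrt(1 - x * x)

assert math.isclose(integral, F(b) - F(a), rel_tol=1e-8)
print(integral, F(b) - F(a))
```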

Here's an interesting theorem.
If $f$ is continuous and increasing with $f(a)=c$ and $f(b)=d$, we have $\int_c^d f^{-1}(y)\,dy\:\text{[blue region]}+\int_a^b f(x)\,dx\:\text{[grey region]}=bd-ac$.
[Figure: proof without words]

Reference:
Differentiating inverse functions
Inverse trigonometric functions and their derivatives

Wednesday, 17 June 2015

Convergence of random variables

Prerequisites

Convergence of a sequence of numbers
The sequence $\frac{1}{2},\frac{2}{3},\frac{3}{4},\cdots,\frac{n}{n+1},\cdots$ is defined as $a_n=\frac{n}{n+1}$ for $n=1,2,3,\cdots$ The sequence converges to $1$.
We say that a sequence $a_1,a_2,a_3,\cdots$ converges to a limit $L$ if $a_n$ approaches $L$ as $n$ goes to infinity.

Formal definition:
A sequence $a_1,a_2,a_3,\cdots$ converges to a limit $L$ if $\lim\limits_{n \to \infty}a_n=L$. Namely, for any $\epsilon>0$, there exists an $N \in \mathbb{N}$ such that $|a_n-L|<\epsilon$ for all $n>N$.

Sequence of random variables
In any probability model, we have a sample space $S$ and a probability measure $P$. Suppose that our sample space consists of a finite number of elements: $S=\{s_1,s_2,\cdots,s_k\}$. Then, a random variable $X$ is a mapping that assigns a real number to any of the possible outcomes $s_i$, $i=1,2,\cdots,k$. In other words, $X(s_i)=x_i$, for $i=1,2,\cdots,k$.

Let's now consider a sequence of random variables $X_1,X_2,X_3,\cdots$. Each $X_n$ is a function from $S$ to real numbers. We can then write $X_n(s_i)=x_{ni}$, for $i=1,2,\cdots,k$.

In short, a sequence of random variables is simply a sequence of functions $X_n:S\to \mathbb{R}$.

Markov's inequality
For a non-negative random variable $X$ and any $t>0$:
$P(X \geq t) \leq \frac{E(X)}{t}$

Chebyshev's inequality
For a random variable $X$ with mean $\mu=E(X)$ and any $t>0$:
$P(|X-\mu| \geq t) \leq \frac{Var(X)}{t^2}$
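A small simulation illustrating Chebyshev's bound (a sketch; Exponential(1), with $\mu=1$ and $Var(X)=1$, is just one convenient choice of distribution):

```python
# Compare the empirical tail P(|X - 1| >= t) for X ~ Exponential(1)
# against the Chebyshev bound Var(X)/t^2 = 1/t^2.
import random

random.seed(0)
N = 200_000
samples = [random.expovariate(1.0) for _ in range(N)]
for t in (1.5, 2.0, 3.0):
    empirical = sum(abs(x - 1.0) >= t for x in samples) / N
    print(f"t={t}: empirical={empirical:.4f} <= bound={1 / t**2:.4f}")
```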



Relations between different types of convergence
The stronger types of convergence imply the weaker ones: almost sure convergence and convergence in the $r$th mean each imply convergence in probability, which in turn implies convergence in distribution. For example, if a sequence of random variables converges in probability to a random variable $X$, then the sequence converges in distribution to $X$ as well.

Convergence in distribution [weakest]

Formal definition:
A sequence of random variables $X_1,X_2,X_3,\cdots$ converges in distribution to a random variable $X$, denoted by $X_n \stackrel{d}{\to} X$, if $\lim\limits_{n \to \infty} F_{X_n}(x)=F_X(x)$, for all $x$ at which $F_X(x)$ is continuous.
[Note that although we talk of a sequence of random variables converging in distribution, it is really the cumulative distribution functions that converge, not the random variables.]

When working with integer-valued random variables, the following theorem is often useful.

Consider the sequence $X_1,X_2,X_3,\cdots$ and the random variable $X$. Assume that $X$ and $X_n$ (for all $n$) are non-negative and integer-valued. Then $X_n \stackrel{d}{\to} X \iff \lim\limits_{n \to \infty} P_{X_n}(k)=P_X(k)$, for $k=0,1,2,\cdots$ [Proof: later]

Example:
Let $X_1,X_2,X_3,\cdots$ be a sequence of random variables such that $X_n \sim \text{Binomial}(n,\frac{\lambda}{n})$, for $n \in \mathbb{N}, n>\lambda$, where $\lambda>0$ is a constant. Show that $X_n \stackrel{d}{\to} \text{Poisson}(\lambda).$
[One can thus approximate a Binomial probability with a Poisson.]

Proof:
By the previous theorem, we just need to prove $\lim\limits_{n \to \infty} P_{X_n}(k)=P_X(k)$, for $k=0,1,2,\cdots$.

We have $$\begin{align} \lim\limits_{n \to \infty} P_{X_n}(k)&=\lim\limits_{n \to \infty}\binom{n}{k}(\frac{\lambda}{n})^k(1-\frac{\lambda}{n})^{n-k}\\ &=\lambda^k \lim\limits_{n \to \infty}\frac{n!}{(n-k)!k!}(\frac{1}{n^k})(1-\frac{\lambda}{n})^{n-k}\\ &=\frac{\lambda^k}{k!} \lim\limits_{n \to \infty} \Big( [\frac{n(n-1)(n-2)\cdots(n-k+1)}{n^k}][(1-\frac{\lambda}{n})^n][(1-\frac{\lambda}{n})^{-k}] \Big)\\ &=\frac{\lambda^k}{k!} (1 \cdot e^{-\lambda} \cdot 1) \end{align}$$ We conclude $\lim\limits_{n \to \infty} P_{X_n}(k)=\frac{e^{-\lambda} \lambda^k}{k!}.\:\Box$
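This convergence can be watched numerically. The sketch below (with the arbitrary choice $\lambda=3$) compares the $\text{Binomial}(n,\frac{\lambda}{n})$ pmf with the $\text{Poisson}(\lambda)$ pmf as $n$ grows:

```python
# Pointwise pmf convergence Binomial(n, lambda/n) -> Poisson(lambda),
# checked numerically for lambda = 3 and growing n.
from math import comb, exp, factorial

lam = 3.0
for n in (10, 100, 1000, 10_000):
    p = lam / n
    max_gap = max(
        abs(comb(n, k) * p**k * (1 - p) ** (n - k) - exp(-lam) * lam**k / factorial(k))
        for k in range(30)
    )
    print(f"n={n}: max pmf gap over k < 30 is {max_gap:.2e}")
```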

The most famous example of convergence in distribution is the central limit theorem (CLT). The CLT states that the normalized average of independent and identically distributed random variables $X_1,X_2,X_3,\cdots$ converges in distribution to a standard normal random variable.

Convergence in probability

Formal definition:
A sequence of random variables $X_1,X_2,X_3,\cdots$ converges in probability to a random variable $X$, denoted by $X_n \stackrel{p}{\to} X$, if $\lim\limits_{n \to \infty} P(|X_n-X|\geq \epsilon)=0$, for all $\epsilon >0$.
[Intuitively, this says that the probability of $X_n$ and $X$ differing by at least $\epsilon$ becomes arbitrarily small as $n$ grows, for every fixed $\epsilon>0$.]

Example:
Let $X$ be a random variable and $X_n=X+Y_n$, where $E(Y_n)=\frac{1}{n}$ and $Var(Y_n)=\frac{\sigma^2}{n}$ for a constant $\sigma>0$. Show that $X_n \stackrel{p}{\to} X$.

Proof:
Recall the triangle inequality: $|a+b| \leq |a|+|b|$ for all $a,b \in \mathbb{R}$.
Choosing $a=Y_n-E(Y_n)$ and $b=E(Y_n)=\frac{1}{n}$, we have $|Y_n| \leq |Y_n-E(Y_n)|+\frac{1}{n}$.

Now, for any $\epsilon >0$ and any $n>\frac{1}{\epsilon}$ (so that $\epsilon-\frac{1}{n}>0$ and Chebyshev's inequality applies), we have $$\begin{align} P(|X_n-X| \geq \epsilon)&=P(|Y_n| \geq \epsilon)\\&\leq P(|Y_n-E(Y_n)|+\frac{1}{n} \geq \epsilon)\\&=P(|Y_n-E(Y_n)|\geq \epsilon-\frac{1}{n})\\& \leq \frac{Var(Y_n)}{(\epsilon-\frac{1}{n})^2}\;\;\text{by Chebyshev's inequality}\\&=\frac{\sigma^2}{n(\epsilon-\frac{1}{n})^2} \to 0\:\:\text{as}\:n\to \infty \end{align}$$ We conclude that $X_n \stackrel{p}{\to} X.\:\Box$
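A simulation of this example (a sketch: taking $Y_n \sim \text{Normal}(\frac{1}{n},\frac{\sigma^2}{n})$ is one convenient choice satisfying the stated moment conditions):

```python
# Estimate P(|X_n - X| >= eps) = P(|Y_n| >= eps) for growing n; it shrinks
# toward 0, illustrating convergence in probability.
import math
import random

random.seed(1)
sigma, eps, N = 2.0, 0.5, 100_000
for n in (10, 100, 1000):
    sd = math.sqrt(sigma**2 / n)
    hits = sum(abs(random.gauss(1 / n, sd)) >= eps for _ in range(N))
    print(f"n={n}: estimated P(|Y_n| >= {eps}) = {hits / N:.4f}")
```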

The most famous example of convergence in probability is the weak law of large numbers (WLLN). The WLLN states that if $X_1,X_2,X_3,\cdots$ are independent and identically distributed random variables with mean $E(X_i)=\mu<\infty$, then the average sequence defined by $\large \bar{X_n}=\frac{X_1+X_2+\cdots+X_n}{n}$ converges in probability to $\mu$. It is called the weak law because it refers to convergence in probability, while the strong law of large numbers (SLLN) is related to almost sure convergence (a stronger type of convergence).

Convergence in mean

Formal definition:
Let $r\geq 1$ be a fixed number. A sequence of random variables $X_1,X_2,X_3,\cdots$ converges in the rth mean or in the $L^r$ norm to a random variable $X$, denoted by $X_n \stackrel{L^r}{\to} X$, if $\lim\limits_{n \to \infty} E(|X_n-X|^r)=0$.
If $r=2$, it is called mean-square convergence, denoted $X_n \stackrel{m.s.}{\to} X$.

Theorem:
If $X_n \stackrel{L^r}{\to} X$ for some $r \geq 1$, then $X_n \stackrel{p}{\to} X$.

Proof:
For any $\epsilon>0$, we have $P(|X_n-X| \geq \epsilon)=P(|X_n-X|^r \geq \epsilon^r) \leq \frac{E(|X_n-X|^r)}{\epsilon^r} \;\; \text{by Markov's inequality}$

Since $\lim\limits_{n \to \infty} E(|X_n-X|^r)=0$, we conclude $\lim\limits_{n \to \infty} P(|X_n-X| \geq \epsilon)=0$ for all $\epsilon>0.\:\Box$

Theorem:
Let $1\leq r\leq s$. If $X_n \stackrel{L^s}{\to} X$, then $X_n \stackrel{L^r}{\to} X$.

Proof:
Recall Hölder's inequality: $E|UV| \leq (E|U|^p)^{\frac{1}{p}}(E|V|^q)^{\frac{1}{q}}$, where $1<p,q<\infty$ and $\frac{1}{p}+\frac{1}{q}=1$. Choose $U=|X_n-X|^r$, $V=1$ and $p=\frac{s}{r}$ (if $r=s$ the claim is trivial, so assume $p>1$).
Then $E|X_n-X|^r \leq (E|X_n-X|^s)^{\frac{1}{p}}$. Since $\lim\limits_{n \to \infty} E(|X_n-X|^s)=0$, we conclude $\lim\limits_{n \to \infty} E(|X_n-X|^r) \leq \lim\limits_{n \to \infty} \big(E|X_n-X|^s\big)^{\frac{1}{p}}=0$. Therefore, $X_n \stackrel{L^r}{\to} X.\:\Box$

Almost sure convergence (convergence with probability 1)

Formal definition:
A sequence of random variables $X_1,X_2,X_3,\cdots$ converges almost surely to a random variable $X$, denoted by $X_n \stackrel{a.s.}{\to} X$, if $P(\{s\in S: \lim\limits_{n \to \infty} X_n(s)=X(s)\})=1$.

In some problems, proving almost sure convergence directly can be difficult. Here is a result that is sometimes useful when we would like to prove almost sure convergence.

Consider the sequence $X_1,X_2,X_3,\cdots$. If for all $\epsilon>0$, we have $\sum\limits_{n=1}^\infty P(|X_n-X|>\epsilon)<\infty$, then $X_n \stackrel{a.s.}{\to} X$.

Strong law of large numbers
Let $X_1,X_2,X_3,\cdots$ be independent and identically distributed random variables with expected value $E(X_i)=\mu <\infty$. Let also $M_n=\frac{X_1+X_2+\cdots+X_n}{n}$. Then $M_n \stackrel{a.s.}{\to} \mu$.
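A quick illustration (a sketch with Uniform(0,1) draws, so $\mu=\frac{1}{2}$): the running average along a single sample path settles down near $\mu$.

```python
# Running averages M_n of i.i.d. Uniform(0,1) draws: one sample path of M_n
# approaches mu = 0.5, illustrating M_n -> mu almost surely.
import random

random.seed(2)
total = 0.0
for n in range(1, 100_001):
    total += random.random()
    if n in (10, 100, 1000, 10_000, 100_000):
        print(f"M_{n} = {total / n:.5f}")
```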

Useful:
Convergence concepts
Convergence in Distribution
Asymptotic theory
Measure

Tuesday, 16 June 2015

Complexification of vector spaces

Motivation to complexify real vector spaces:
One reason is to solve equations. If we want to prove theorems about real solutions to a system of real linear equations or a system of real linear differential equations, it can be convenient to examine the complex solution space, which gives us information about the real solution space.

Difference between direct sum and Cartesian product of two vector spaces
For the direct sum, the result is a new object of the same type: it carries an algebraic structure. The Cartesian product, by contrast, is just a set. For example, $\mathbb{R}\oplus \mathbb{R}$ is the set $\mathbb{R}\times \mathbb{R}$ equipped with the operation $(a,b)+(c,d)=(a+c,b+d)$.

Complexifying with direct sums
Definition: The complexification of a real vector space $W$ is defined to be $W_\mathbb{C}=W\oplus W$, the set of all ordered pairs $(w_1,w_2)$, where $w_1,w_2 \in W$, made into a complex vector space by formally extending scalar multiplication to include multiplication by complex numbers, with vector addition and multiplication by complex scalars defined as:

$(w_1+iv_1)+(w_2+iv_2)=(w_1+w_2)+i(v_1+v_2)$ for $w_1,v_1,w_2,v_2 \in W$
$(a+bi)(w_1,w_2)=(aw_1-bw_2,bw_1+aw_2)$ for $a,b \in \mathbb{R}, w_1,w_2 \in W$

This rule of multiplication is reasonable if you think about a pair $(w_1,w_2)\in W\oplus W$ as a formal sum $w_1+iw_2$: $(a+bi)(w_1+iw_2)=aw_1+aiw_2+biw_1-bw_2=(aw_1-bw_2)+i(bw_1+aw_2)$.

In particular, $i(w_1,w_2)=(-w_2,w_1)$.
[$(a+bi)(w_1,w_2)"="a(w_1,w_2)+bi(w_1,w_2)$ For this expression to equal $(aw_1-bw_2)+i(bw_1+aw_2)$, we must have $i(w_1,w_2)=(-w_2,w_1)$.]

Using this multiplication rule, one can check that $W_\mathbb{C}$ is a complex vector space. Since $i(w,0)=(0,w)$, we have $(w_1,w_2)=(w_1,0)+(0,w_2)=(w_1,0)+i(w_2,0)$. The $\mathbb{R}$-linear function $w \mapsto (w,0)$ is called the standard embedding of $W$ into $W_\mathbb{C}$. With this, we can regard $W_\mathbb{C}$ as $W+iW$.
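Here is a minimal Python sketch of this construction for $W=\mathbb{R}^n$ (the class name and the representation of $W$-vectors as tuples are illustrative choices):

```python
# W_C = W (+) W for W = R^n, with complex scalar multiplication
# (a+bi)(w1, w2) = (a*w1 - b*w2, b*w1 + a*w2).
class Complexified:
    def __init__(self, w1, w2):       # the pair (w1, w2), i.e. w1 + i*w2
        self.w1, self.w2 = tuple(w1), tuple(w2)

    def __add__(self, other):         # coordinatewise vector addition
        return Complexified(
            tuple(a + b for a, b in zip(self.w1, other.w1)),
            tuple(a + b for a, b in zip(self.w2, other.w2)),
        )

    def scale(self, z):               # multiplication by a complex scalar z
        a, b = z.real, z.imag
        return Complexified(
            tuple(a * x - b * y for x, y in zip(self.w1, self.w2)),
            tuple(b * x + a * y for x, y in zip(self.w1, self.w2)),
        )

    def __repr__(self):
        return f"{self.w1} + i{self.w2}"

w = Complexified((1.0, 0.0), (0.0, 0.0))   # the embedded real vector (1, 0)
print(w.scale(1j))                         # i(w, 0) = (0, w)
```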

Examples:
The complexifications of $\mathbb{R}^n,M_n(\mathbb{R}),\mathbb{R}[X]$ are isomorphic to $\mathbb{C}^n,M_n(\mathbb{C}),\mathbb{C}[X]$ by sending an ordered pair $(w_1,w_2)$ in $\mathbb{R}^n \oplus \mathbb{R}^n,M_n(\mathbb{R})\oplus M_n(\mathbb{R}),\mathbb{R}[X]\oplus \mathbb{R}[X]$ to $w_1+iw_2$ in $\mathbb{C}^n,M_n(\mathbb{C}),\mathbb{C}[X]$.
[$\mathbb{R}[X]$ is the set of all polynomials with real coefficients in the variable $x$.]

Namely, we can identify the complexification $(\mathbb{R}^2)_\mathbb{C}$ with $\mathbb{C}^2$ by $(w_1,w_2)\mapsto w_1+iw_2 \in \mathbb{R}^2+i\mathbb{R}^2=\mathbb{C}^2$, and this sends the basis vectors $\tiny\begin{pmatrix} 1\\0 \end{pmatrix}$ and $\tiny\begin{pmatrix} 0\\1 \end{pmatrix}$ of $\mathbb{R}^2 \subset (\mathbb{R}^2)_\mathbb{C}$ to $\tiny\begin{pmatrix} 1\\0 \end{pmatrix}$ and $\tiny\begin{pmatrix} 0\\1 \end{pmatrix}$ in $\mathbb{C}^2$, which are the standard basis vectors of $\mathbb{C}^2$ as a complex vector space.

Similarly, the identifications of $(\mathbb{R}^n)_\mathbb{C},M_n(\mathbb{R})_\mathbb{C},\mathbb{R}[X]_\mathbb{C}$ with $\mathbb{C}^n,M_n(\mathbb{C}),\mathbb{C}[X]$ turn every real basis of $\mathbb{R}^n,M_n(\mathbb{R}),\mathbb{R}[X]$ into a complex basis of $\mathbb{C}^n,M_n(\mathbb{C}),\mathbb{C}[X]$.

Denote by $M_n(\mathbb{C})$ the space of complex $n\times n$ matrices, by $u(n)$ the space of anti-Hermitian matrices ($A^*=-A$), and by $iu(n)$ the space of Hermitian matrices ($A^*=A$). Then $M_n(\mathbb{C})=u(n)\oplus iu(n)$ as real vector spaces. Thus $u(n)$ is a real form in $M_n(\mathbb{C})$ and $M_n(\mathbb{C})$ is a complexification of $u(n)$.

Definitions:
A map $T$ between complex vector spaces is said to be $\mathbb{C}$-linear if it is additive and $T(\lambda w)=\lambda T(w)$ for all $w$ in the domain of $T$ and for all $\lambda \in \mathbb{C}$.
A map $T$ between complex vector spaces is said to be $\mathbb{C}$-antilinear or conjugate linear if it is additive and $T(\lambda w)=\overline{\lambda}T(w)$ for all $w$ in the domain of $T$ and for all $\lambda \in \mathbb{C}$.

Theorem: If $W=0$, then $W_\mathbb{C}=0$. If $W\neq 0$ and $\{e_1,\cdots,e_n\}$ is an $\mathbb{R}$-basis of $W$, then $\{(e_1,0),\cdots,(e_n,0)\}$ is a $\mathbb{C}$-basis of $W_\mathbb{C}$. In particular, $\text{dim}_\mathbb{C}(W_\mathbb{C})=\text{dim}_\mathbb{R}(W)$ for all $W$.

Proof: We want to prove that $\{(e_1,0),\cdots,(e_n,0)\}$ is a basis for $W_\mathbb{C}$: a spanning set that is linearly independent.
Suppose $W\neq 0$ and $\{e_1,\cdots,e_n\}$ is an $\mathbb{R}$-basis of $W$.
Spanning:
If $\{e_1,\cdots,e_n\}$ is an $\mathbb{R}$-basis of $W$, then $\{(e_1,0),\cdots,(e_n,0),(0,e_1),\cdots,(0,e_n)\}$ is a basis for the real vector space $W\oplus W$.
Let $w\in W_\mathbb{C}$. Using the basis for $W\oplus W$, there exist $a_1,\cdots,a_n,b_1,\cdots,b_n \in \mathbb{R}$ such that
$\begin{align}w&=a_1(e_1,0)+\cdots+a_n(e_n,0)+b_1(0,e_1)+\cdots+b_n(0,e_n) & \qquad [*]\\
&=a_1(e_1,0)+\cdots+a_n(e_n,0)+ib_1(e_1,0)+\cdots+ib_n(e_n,0) &[ib(e_j,0)=b(0,e_j)]\\
&=(a_1+ib_1)(e_1,0)+\cdots+(a_n+ib_n)(e_n,0) & \qquad [**]\end{align}$
$[*]: \mathbb{R}$-linear combination of $(e_j,0)$ and $(0,e_j)$
$[**]: \mathbb{C}$-linear combination of $(e_j,0)$
Therefore, $\{(e_1,0),\cdots,(e_n,0)\}$ is a $\mathbb{C}$-linear spanning set of $W_\mathbb{C}$.
Linear independence:
Suppose we can write $(0,0)$ as a finite $\mathbb{C}$-linear combination of $(e_j,0)$.
$(a_1+ib_1)(e_1,0)+\cdots+(a_n+ib_n)(e_n,0)=(0,0)$ for some real $a_j,b_j$, $j=1,\cdots,n$
Equivalently, $(a_1e_1+\cdots+a_ne_n,b_1e_1+\cdots+b_ne_n)=(0,0).\qquad [bi(e_j,0)=(0,be_j)]$
Therefore, $\sum\limits_{i=1}^n a_ie_i=0$ and $\sum\limits_{i=1}^n b_ie_i=0$ in $W$.
Since $e_j$'s are linearly independent over $\mathbb{R}$, all the coefficients $a_j$ and $b_j$ are 0, so $a_j+ib_j=0$ for all $j=1,\cdots,n\:\Box$.

Universal property of complexification
Theorem: For any $\mathbb{R}$-linear map $W\xrightarrow{f}V$ from $W$ into a complex vector space $V$, there is a unique $\mathbb{C}$-linear map $W_\mathbb{C}\xrightarrow{\tilde{f}}V$ making the diagram

$\;\; W \xrightarrow{\quad \quad} W_\mathbb{C}\\ \quad f \searrow \quad \nearrow \tilde{f}\\ \qquad\quad V$

commute, where the map $W\to W_\mathbb{C}$ is the standard embedding.

This theorem says $W\to W_\mathbb{C}$ is an initial object that admits a unique morphism to all other objects.

Complexification of linear transformation
Theorem: Every $\mathbb{R}$-linear transformation $\varphi:W\to W'$ of real vector spaces extends in a unique way to a $\mathbb{C}$-linear transformation of the complexifications: there is a unique $\mathbb{C}$-linear map $\varphi_\mathbb{C}:W_\mathbb{C}\to W'_\mathbb{C}$ making the diagram

$\qquad \qquad \;W\xrightarrow{\quad\;\varphi\;\quad}W'\\ w\mapsto (w,0) \Bigg\downarrow \qquad\qquad \Bigg\downarrow \varphi(w)\mapsto (\varphi(w),0) \\ \qquad \qquad W_\mathbb{C} \xrightarrow{\quad \varphi_\mathbb{C} \quad} W'_\mathbb{C}$

commute, where the vertical maps are the standard embeddings of real vector spaces into their complexifications.

Proof: If such a $\mathbb{C}$-linear map $\varphi_\mathbb{C}$ exists, then the commutativity of the diagram says $\varphi_\mathbb{C}(w,0)=(\varphi(w),0)$ for all $w\in W$. Therefore, for $(w_1,w_2)\in W_\mathbb{C}$, we have

$\begin{align} \varphi_\mathbb{C}(w_1,w_2)&=\varphi_\mathbb{C}(w_1,0)+\varphi_\mathbb{C}(0,w_2)\\
&=\varphi_\mathbb{C}(w_1,0)+i\varphi_\mathbb{C}(w_2,0)\\
&=(\varphi(w_1),0)+i(\varphi(w_2),0)\\
&=(\varphi(w_1),0)+(0,\varphi(w_2))\\
&=(\varphi(w_1),\varphi(w_2)) \end{align}$

This tells us what $\varphi_\mathbb{C}$ must be. So define $\varphi_\mathbb{C}:W_\mathbb{C}\to W'_\mathbb{C}$ by $\varphi_\mathbb{C}(w_1,w_2)=(\varphi(w_1),\varphi(w_2))$. We need to check that $\varphi_\mathbb{C}$ is $\mathbb{C}$-linear. Additivity follows from the additivity of $\varphi$, and for multiplication by $i$ we have $\varphi_\mathbb{C}(i(w_1,w_2))=\varphi_\mathbb{C}(-w_2,w_1)=(-\varphi(w_2),\varphi(w_1))=i(\varphi(w_1),\varphi(w_2))=i\varphi_\mathbb{C}(w_1,w_2).\:\Box$

In other words, to an operator $\varphi:W\to W'$, there corresponds an operator $\varphi_\mathbb{C}:W_\mathbb{C}\to W'_\mathbb{C}$ given by the formula $\varphi_\mathbb{C}(a+ib)=\varphi(a)+i\varphi(b)$. The operator $\varphi_\mathbb{C}$ is called the complexification of $\varphi$.

If $\{w_1,\cdots,w_n\}$ is a basis for $W$, $\{w'_1,\cdots,w'_m\}$ is a basis for $W'$, and $A$ is the matrix representation of $\varphi$ with respect to these bases, then $A$, regarded as a complex matrix, is also the representation of $\varphi_\mathbb{C}$ with respect to the corresponding bases in $W_\mathbb{C}$ and $W'_\mathbb{C}$.
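A small numeric illustration of this fact (a sketch; the matrix $A$ and the vector are arbitrary choices): applying the real matrix $A$ to $u+iv$ as a complex matrix computes $\varphi_\mathbb{C}(u+iv)=\varphi(u)+i\varphi(v)$.

```python
# The real matrix A of phi, applied as a complex matrix, computes
# phi_C(u + iv) = phi(u) + i*phi(v).
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # matrix of phi in some bases
u = np.array([1.0, -1.0])           # real part of a vector in W_C
v = np.array([0.5, 2.0])            # imaginary part

lhs = A @ (u + 1j * v)              # A acting as a complex matrix
rhs = A @ u + 1j * (A @ v)          # phi(u) + i*phi(v)
assert np.allclose(lhs, rhs)
print(lhs)
```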

So the complexification process is a formal, coordinate-free way of saying: take the matrix $A$ of $\varphi$, with its real entries, but operate on it as a complex matrix. The advantage of making this abstract definition is that we are not required to fix a choice of coordinates and use matrix representations. Say we want to make arguments about the complex eigenvalues and eigenvectors of a transformation $T:W\to W$, while non-real eigenvalues and eigenvectors, by definition, cannot exist for a transformation of a real vector space. What we really mean are the eigenvalues and eigenvectors of $T_\mathbb{C}$. Also, the complexification process generalizes without change to infinite-dimensional spaces.

Complexifying with tensor products [pending]

Conjugations
In the complexification $W_\mathbb{C}$ of a real vector space $W$, we can define complex conjugation by $\overline{w}=\overline{u+iv}=u-iv$ for $u,v \in W$.

In an arbitrary complex vector space, however, there is no natural, basis-independent way of defining complex conjugation of vectors. For example, if we set the complex conjugate of a vector $u=(a+bi)e_j$ to be $\overline{u}=(a-bi)e_j$, this definition will give a different answer in the basis $\{ie_j\}$ since $u=[-i(a+bi)](ie_j) \Rightarrow \overline{u}=[i(a-bi)](ie_j)=-(a-bi)e_j$.

Thus the concept of complex conjugation of vectors requires prior knowledge of the 'real part' of a complex vector space. The complexification of a real space has precisely the required extra structure needed to define complex conjugation of vectors.

Later, we will use conjugation to describe the real subspaces which have the given complex vector space as its complexification.

Realification/decomplexification
One way of creating a real vector space $W_\mathbb{R}$ from a complex vector space $W$ is to forget altogether about the possibility of multiplying vectors by complex numbers and only allow scalar multiplication with real numbers. In abstract algebra, we call this the restriction of scalars from complex numbers to real numbers. In this process a pair of vectors $u$ and $iu$ must be regarded as linearly independent vectors in $W_\mathbb{R}$ for any non-zero vector $u\in W$. Thus if $W$ is finite-dimensional and $\text{dim}W=n$, then $W_\mathbb{R}$ is $2n$-dimensional, namely $\text{dim}_\mathbb{R} W=2\text{dim}_\mathbb{C}W$, for if $\{e_1,\cdots,e_n\}$ is a basis of $W$ then $\{e_1,\cdots,e_n,ie_1,\cdots,ie_n\}$ is a linearly independent set of vectors spanning $W_\mathbb{R}$.

We can verify that if $A=B+iC$ is the matrix of a linear map $A:V\to W$ with respect to bases $\{e_1,\cdots,e_n\}, \{\epsilon_1,\cdots,\epsilon_m\}$ and the matrices $B$ and $C$ are real, then the matrix of the linear map $A_\mathbb{R}$ with respect to the bases $\{e_1,\cdots,e_n\},\{ie_1,\cdots,ie_n\}$ and $\{\epsilon_1,\cdots,\epsilon_m\},\{i\epsilon_1,\cdots,i\epsilon_m\}$ is of the form $\begin{pmatrix} B & -C \\ C & B \end{pmatrix}$.
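This block form can be verified numerically (a sketch with random $2\times 2$ matrices; coordinates are ordered as real parts followed by imaginary parts, matching the bases above):

```python
# For A = B + iC acting on C^n, the realified map on R^{2n} has matrix
# [[B, -C], [C, B]] in coordinates (real parts, then imaginary parts).
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((2, 2))
C = rng.standard_normal((2, 2))
A = B + 1j * C
z = rng.standard_normal(2) + 1j * rng.standard_normal(2)

w = A @ z                                        # the complex action
M = np.block([[B, -C], [C, B]])                  # realified matrix
realified = M @ np.concatenate([z.real, z.imag])
assert np.allclose(realified, np.concatenate([w.real, w.imag]))
print("block form [[B, -C], [C, B]] matches the complex action")
```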

Example:
Say $n=2$: if $\{v_1,v_2\}=\{a+ib,\,c+id\}$ is a basis for a complex vector space $W$ (identified with $\mathbb{C}^2$, the first basis vector supported in the first coordinate and the second in the second), then $\{(a,b,0,0),(-b,a,0,0),(0,0,c,d),(0,0,-d,c)\}$, the images of $v_1,iv_1,v_2,iv_2$, is the associated basis for $W_\mathbb{R}=\mathbb{R}^4$.

More to explore:
Complexification of Lie algebras – a tool useful for representation theory
Complex spectral theorem – deducing a real classification for orthogonal matrices
Splitting characteristic polynomials
Extension of scalars & restriction of scalars

References:
Complexification
More to explore
Difference between Cartesian product and tensor product of two vector spaces
[?] Change of basis matrix
Complexification and realification [p3,5]
A Course in Modern Mathematical Physics: Groups, Hilbert Space and Differential Geometry by Peter Szekeres [p154-156]
Ordinary Differential Equations by Vladimir I. Arnol'd [p177-189]
Problems and Theorems in Linear Algebra by Viktor Vasil'evich Prasolov [p55-57]

Sunday, 14 June 2015

Morphisms

General maps
one-to-one → injective
onto → surjective
one-to-one and onto → bijective

Linear maps
any map → homomorphism
one-to-one → monomorphism
onto → epimorphism
one-to-one and onto → isomorphism

Linear self-maps
any map → endomorphism
one-to-one and onto → automorphism

Examples of isomorphisms, together with maps that realize each isomorphism:
  • $M_{1,3}(\Bbb F) \cong M_{3,1}(\Bbb F)\\ (x_1,x_2,x_3)\mapsto \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}$

  • $M_{2,2}(\Bbb F) \cong \Bbb F^4\\ \begin{pmatrix}a&b\\ c&d \end{pmatrix}\mapsto (a,b,c,d)$

  • $M_{2,3}(\Bbb F) \cong M_{3,2}(\Bbb F)\\ A\mapsto A^T$

  • The plane $z=0$ in $\Bbb R^3$ is isomorphic to $\Bbb R^2$.
    $(x,y,0)\mapsto (x,y)$

  • $P_n\cong \Bbb R^{n+1}\\ a_0+a_1x+\cdots+a_nx^n\mapsto (a_0,a_1,\cdots,a_n)$

  • $P$ is isomorphic to $\Bbb R_0^\infty$.
    $a_0+a_1x+\cdots+a_nx^n\mapsto (a_0,a_1,\cdots,a_n,0,0,\cdots)$

  • $M_{m,n}(\Bbb F) \cong L(\Bbb F^n, \Bbb F^m)$
    $A\mapsto L_A$, where $L_A(x)=Ax$.

  • Any vector space $V$ of dimension $n$ is isomorphic to $\Bbb F^n$.
    $v\mapsto [v]_\alpha$, where $\alpha$ is a basis for $V$.

Category theory
A morphism $f:A\to B$ in a category $\mathcal{C}$ is an isomorphism if there exists a morphism $g:B \to A$ such that both ways to compose $f$ and $g$ give the identity morphisms on the respective objects, namely $$gf=1_A, \quad fg=1_B.$$
A morphism $f:A\to B$ in a category $\mathcal{C}$ is a monomorphism if for every object $C$ and every pair of morphisms $g,h:C\to A$, the condition $fg=fh$ implies $g=h$, as in the diagram

$C \overset{g}{\underset{h}{\rightrightarrows}} A \xrightarrow{\;f\;} B$

When we say a diagram commutes, it means that all ways to compose morphisms along paths with the same start and end give the same morphism. Therefore, whenever $f$ is a monomorphism and this diagram commutes, we can conclude $g=h$. The key idea is that monomorphisms allow us to “cancel” $f$ from the left side of a composition.

The corresponding property for cancelling on the right is defined similarly.

A morphism $f:A\to B$ in a category $\mathcal{C}$ is an epimorphism if for every object $C$ and every pair of morphisms $g,h:B\to C$, the condition $gf=hf$ implies $g=h$, as in the diagram

$A \xrightarrow{\;f\;} B \overset{g}{\underset{h}{\rightrightarrows}} C$

Whenever $f$ is an epimorphism and this diagram commutes, we can conclude $g=h$.

Reference:
Examples of isomorphisms
Properties of morphisms
An awesome video on morphisms

Thursday, 11 June 2015

Informal combinatorial proofs

Pascal's Triangle


A term in the middle is the sum of two terms above it.
$\binom{n}{k}+\binom{n}{k+1}=\binom{n+1}{k+1}$



The sum of all elements in each row is $2^n$.
$\sum\limits_{k=0}^n \binom{n}{k}=2^n$

Binomial theorem
$\sum\limits_{k=0}^n \binom{n}{k} x^k=(1+x)^n$
Put $x=1$, then $\sum\limits_{k=0}^n \binom{n}{k}=2^n$.
Put $x=2$, then $\sum\limits_{k=0}^n 2^k \binom{n}{k}=3^n$.
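These identities, along with Pascal's rule above, are easy to spot-check numerically (a sketch for small $n$):

```python
# Spot-check the row-sum identities and Pascal's rule for small n.
from math import comb

for n in range(8):
    assert sum(comb(n, k) for k in range(n + 1)) == 2**n
    assert sum(2**k * comb(n, k) for k in range(n + 1)) == 3**n
    for k in range(n):
        assert comb(n, k) + comb(n, k + 1) == comb(n + 1, k + 1)
print("identities hold for n < 8")
```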

Bijections
Given a set $X$ and a non-negative integer $r$, an $r$-subset of $X$ is a subset $A \subset X$ of cardinality $r$. We denote the set of $r$-subsets of $X$ by $P_r(X)$, namely, $P_r(X)=\{A \subset X|\: |A|=r\}$. Define the binomial coefficient $\binom{n}{r}$ to be the cardinality of the set $P_r(X)$ when $|X|=n$.
There is a bijection $P_r(X) \to P_{n-r}(X)$ given by mapping each subset to its complement $A \mapsto A^c$. Hence $\binom{n}{r}=\binom{n}{n-r}$.
[...]

Combinatorial arguments
Key idea: count the same quantity in two different ways

$n^2=(n-1)^2+2(n-1)+1$
We count the number of pairs $(i,j)$ where $1 \leq i,j \leq n$, which is $n^2$. We subdivide the pairs according to how many 1s they contain. There are $(n-1)^2$ pairs with no 1s, $2(n-1)$ pairs with exactly one 1, and one pair with two 1s. Hence the result. In the same vein, we have $n^3=(n-1)^3+3(n-1)^2+3(n-1)+1$.

Remark: We can use the same situation to prove another identity, $1+3+5+\cdots+(2n-1)=n^2$. We classify the pairs according to the value of $\text{max}(i,j)$. There is one pair $(1,1)$ with $\text{max}(i,j)=1$ and three pairs $(1,2), (2,1), (2,2)$ with $\text{max}(i,j)=2$. Similarly, there are five pairs when $\text{max}(i,j)=3$; seven pairs with max value $4$; $(2n-1)$ pairs with max value $n$.

$\binom{m+n}{2}=\binom{m}{2}+\binom{n}{2}+mn$
We count the number of ways of selecting two numbers from a set $S$ with $m+n$ elements. We partition $S$ into two subsets $A$ and $B$ with $m$ and $n$ elements respectively. Then either both chosen elements come from $A$, both come from $B$, or there is one from each. Hence the stated identity.

Stars and bars
The number of solutions to the equation $a_1+a_2+\cdots+a_n=k, a_i \geq 0, a_i \in \mathbb{N}$ is $\binom{k+n-1}{k}$.
Put differently, an n-set has $\binom{k+n-1}{k}$ multisets of size $k$.

The problem is the same as finding the number of ways of placing $k$ indistinguishable objects (denoted by $*$) in $n$ numbered boxes, where $a_i$ represents the number of objects in box $i$. We now put the $k$ objects in a row.
$******$
We use vertical bars $|$ to mark the separations between two consecutive boxes; hence we need $n-1$ bars. For example, the arrangement $**|***|*$ has 3 boxes, of which the first holds 2 objects, the second 3 objects and the last 1 object.
In total there will be $k+n−1$ positions with stars or bars. Therefore, the number of solutions is $\binom{k+n-1}{k}$. [...]
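A brute-force check of the count (a sketch with the arbitrary choice $n=3$, $k=6$):

```python
# Count solutions of a_1 + ... + a_n = k with a_i >= 0 by enumeration,
# and compare against the stars-and-bars formula C(k + n - 1, k).
from itertools import product
from math import comb

n, k = 3, 6
count = sum(1 for a in product(range(k + 1), repeat=n) if sum(a) == k)
assert count == comb(k + n - 1, k)    # 28 for n = 3, k = 6
print(count)
```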

Useful:
http://www.maths.qmul.ac.uk/~pjc/notes/comb.pdf
http://www.millersville.edu/~rumble/Math.310/pascal3.pdf
http://www.erp.sophia.ac.jp/Projects/ese/notes/rmm/lecture4.pdf
https://math.dartmouth.edu/~m68f13/lectec.pdf