Friday, 19 June 2015

Inverse trigonometric functions

A fast way to find inverse functions (only for simple functions)
Note that a function and its inverse undo each other. Say $f(x)=2x+1$. The function $f$ takes $x$, multiplies it by $2$ and then adds $1$. To undo that, we go in reverse order: we first subtract $1$ and then divide by $2$, giving us $f^{-1}(x)=\frac{x-1}{2}$. Here's one more example. Say $f(x)=\sqrt{x+2}$. $f$ takes $x$, adds $2$ and then takes the square root. To undo the square root, we square $x$; and then instead of adding $2$, we subtract $2$. So $f^{-1}(x)=x^2-2$. Last example: $f(x)=\frac{1}{x+4}-2$. $f$ takes $x$, adds $4$, takes the reciprocal and then subtracts $2$. $f^{-1}$ adds $2$, takes the reciprocal and lastly subtracts $4$: $f^{-1}(x)=\frac{1}{x+2}-4$.
[Here we are using the term "function" loosely. Formally, when one talks about a function, the domain and codomain should be specified. For simplicity, we only provide the expression of the function.]
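As a quick sanity check (not part of the original note), the sketch below verifies numerically that each $f^{-1}$ found above really undoes $f$ at a few sample points inside the relevant domains; the function names are ours.

```python
# Minimal numerical check of the quick-inverse recipe above (illustrative only).
def f1(x): return 2 * x + 1
def f1_inv(x): return (x - 1) / 2        # undo: subtract 1, then divide by 2

def f2(x): return (x + 2) ** 0.5         # defined for x >= -2
def f2_inv(x): return x ** 2 - 2         # undo: square, then subtract 2

def f3(x): return 1 / (x + 4) - 2        # defined for x != -4
def f3_inv(x): return 1 / (x + 2) - 4    # undo: add 2, take reciprocal, subtract 4

for f, f_inv, xs in [(f1, f1_inv, [-3, 0, 2.5]),
                     (f2, f2_inv, [-1, 0, 7]),
                     (f3, f3_inv, [-3, 0, 5])]:
    assert all(abs(f_inv(f(x)) - x) < 1e-9 for x in xs)
print("all inverse checks passed")
```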

Differentiating inverse functions
Consider a linear function $l(x)=mx+b$. Its inverse will be $l^{-1}(y)=\frac{1}{m}y-\frac{b}{m}$. We then have $l'(x)=m$ and $(l^{-1})'(y)=\frac{1}{m}$, the reciprocal. We can express this by $\frac{dx}{dy}=\frac{1}{\frac{dy}{dx}}$.

Flipping the graphs preserves tangency.

Since the line $y=mx+b$ is tangent to the curve on the left and flipping the drawing should preserve this tangency, the line $x=\frac{1}{m}y-\frac{b}{m}$ is the tangent line to $x=f^{-1}(y)$ at $(y_0,x_0)$ and its slope $\frac{1}{m}$ should be the derivative $(f^{-1})'(y_0)$. Since $m=f'(x_0)$, we conclude that $(f^{-1})'(y_0)=\frac{1}{f'(x_0)}=\frac{1}{f'(f^{-1}(y_0))}$, or equivalently $f'(x_0)=\frac{1}{(f^{-1})'(f(x_0))}$.

Examples:
Find the derivative of the inverse of this real function $f(x)=2x+\cos x$.
$f'(x)=2-\sin x$.
$(f^{-1})'(x)=\frac{1}{f'(f^{-1}(x))}=\frac{1}{2-\sin f^{-1}(x)}$.

Let $y=f(x)=x^7+4x-5$. Find $(f^{-1})'(-5)$.
Note that $y=-5$ corresponds to $x=0$. Also, $f'(x)=7x^6+4$, so $(f^{-1})'(-5)=\frac{1}{f'(0)}=\frac{1}{4}$.
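Here is a small numerical check of this example (our own sketch, not from the post): since $f'(x)=7x^6+4>0$, $f$ is strictly increasing and can be inverted by bisection, and $(f^{-1})'(-5)$ can then be estimated by a central difference.

```python
def f(x):
    return x ** 7 + 4 * x - 5

def f_inv(y, lo=-10.0, hi=10.0):
    # f is strictly increasing, so invert it by bisection on [lo, hi]
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

h = 1e-6
print((f_inv(-5 + h) - f_inv(-5 - h)) / (2 * h))   # ~0.25 = 1/f'(0)
```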

Inverse trigonometric functions and their derivatives

Sine and arcsine
The sine function $f: \mathbb{R} \to [-1,1] \; f(x)=\sin x$ is not one-to-one, but if we restrict the domain to be $[-\frac{\pi}{2},\frac{\pi}{2}]$, $f$ becomes one-to-one.

We have $\sin^{-1}y=x \iff \sin x=y \quad \text{and} \quad -\frac{\pi}{2}\leq x \leq \frac{\pi}{2}$.
Then $f^{-1}: [-1,1] \to [-\frac{\pi}{2},\frac{\pi}{2}] \quad f^{-1}(x)=\sin^{-1}x$.

The cancellation equations for inverse functions become
$\sin^{-1}(\sin x)=x \quad -\frac{\pi}{2}\leq x \leq\frac{\pi}{2}$
$\sin(\sin^{-1} x)=x \quad -1 \leq x \leq 1$.

                  
[Graphs: $y=\sin x,\ -\frac{\pi}{2}\leq x \leq\frac{\pi}{2}$ (left) and $y=\sin^{-1} x,\ -1\leq x \leq 1$ (right)]

Let $y=\sin^{-1}x$. Then $y'=\frac{1}{\cos y}$ from the formula we discussed earlier. Alternatively, one can differentiate $\sin y=x$ implicitly with respect to x to yield the same result. Now $\cos y \geq 0$ since $-\frac{\pi}{2} \leq y \leq \frac{\pi}{2}$. We have $\cos y=\sqrt{1-\sin^2{y}}=\sqrt{1-x^2}$, thus $\frac{d}{dx}(\sin^{-1}x)=\frac{1}{\sqrt{1-x^2}} \quad -1<x<1$.

Cosine and arccosine
The inverse cosine function is handled similarly. The restricted cosine function $f(x)=\cos x, 0\leq x\leq \pi$, is one-to-one and so has an inverse function.

                  
[Graphs: $y=\cos x,\ 0 \leq x \leq \pi$ (left) and $y=\cos^{-1} x,\ -1\leq x \leq 1$ (right)]

We have $\cos^{-1}y=x \iff \cos x=y \quad \text{and} \quad 0\leq x \leq \pi$
and $f^{-1}: [-1,1] \to [0,\pi] \quad f^{-1}(x)=\cos^{-1}x$. Its derivative is given by $\frac{d}{dx}(\cos^{-1}x)=-\frac{1}{\sqrt{1-x^2}} \quad -1<x<1$.

Tangent and arctangent
The tangent function can be made one-to-one by restricting its domain to $(-\frac{\pi}{2},\frac{\pi}{2})$.

We have $\tan^{-1}y=x \iff \tan x=y \quad \text{and} \quad -\frac{\pi}{2}< x < \frac{\pi}{2}$
and $f^{-1}: \mathbb{R} \to (-\frac{\pi}{2},\frac{\pi}{2}) \quad f^{-1}(x)=\tan^{-1}x$.
We know that $\lim\limits_{x \to \frac{\pi}{2}^-} \tan x=\infty$ and $\lim\limits_{x \to -\frac{\pi}{2}^+} \tan x=-\infty$, namely, $x=\pm \frac{\pi}{2}$ are vertical asymptotes of the graph of $\tan x$. Since the graph of $\tan^{-1}$ is obtained by reflecting the graph of the restricted tangent function about the line $y=x$, it follows that $y=\pm \frac{\pi}{2}$ are horizontal asymptotes of the graph of $\tan^{-1}$. We thus have $\lim\limits_{x \to \infty} \tan^{-1} x=\frac{\pi}{2}$ and $\lim\limits_{x \to -\infty} \tan^{-1} x=-\frac{\pi}{2}$.

                  
[Graphs: $y=\tan x,\ -\frac{\pi}{2}< x <\frac{\pi}{2}$ (left) and $y=\tan^{-1} x,\ x \in \mathbb{R}$ (right)]

The derivative of the arctangent function is $\frac{d}{dx}(\tan^{-1}x)=\frac{1}{1+x^2}$ for all $x \in \mathbb{R}$.
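The three derivative formulas can be spot-checked with finite differences; the snippet below (ours, purely illustrative) compares them at a few points of $(-1,1)$.

```python
import math

h = 1e-6
for x in [-0.5, 0.0, 0.3, 0.9]:
    d_asin = (math.asin(x + h) - math.asin(x - h)) / (2 * h)
    d_acos = (math.acos(x + h) - math.acos(x - h)) / (2 * h)
    d_atan = (math.atan(x + h) - math.atan(x - h)) / (2 * h)
    assert abs(d_asin - 1 / math.sqrt(1 - x * x)) < 1e-4
    assert abs(d_acos + 1 / math.sqrt(1 - x * x)) < 1e-4
    assert abs(d_atan - 1 / (1 + x * x)) < 1e-4
print("derivative formulas check out")
```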

Inverse hyperbolic functions and their derivatives [pending]

Inverse function integration
Suppose $f$ is a continuous one-to-one function and $F$ is an antiderivative of $f$ (so $F'=f$). Then, with the substitution $u=f^{-1}(x)$, $\int f^{-1}(x)\,dx=xf^{-1}(x)-\int f(u)\,du=xf^{-1}(x)-F(f^{-1}(x))+C.$

Definite integral version
$\int_a^b f^{-1}(x)\,dx=[xf^{-1}(x)]_a^b-\int_{f^{-1}(a)}^{f^{-1}(b)} f(u)\, du=[xf^{-1}(x)]_a^b-F(f^{-1}(b))+F(f^{-1}(a))$

Examples:
$\begin{align} \int \arctan x \:dx &= x\arctan x-\int \tan u \:du \quad [u=\arctan x]\\
&=x\arctan x+\ln|\cos u|+C\\ &=x\arctan x-\frac{1}{2}\ln(1+x^2)+C \end{align}$
since $\ln|\cos u|=\frac{1}{2}\ln(\cos^2 u)=-\frac{1}{2}\ln(\sec^2 u)=-\frac{1}{2}\ln(1+x^2)$.

$\int \ln x \:dx=x\ln x-x+C$
$\int \arccos x \:dx=x\arccos x-\sin(\arccos x)+C$

Here's an interesting theorem.
If $f$ is continuous and increasing on $[a,b]$ with $f(a)=c$ and $f(b)=d$, we have $\int_c^d f^{-1}(y)\,dy\:\text{[blue region]}+\int_a^b f(x)\,dx\:\text{[grey region]}=bd-ac$.
Proof without words
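As a concrete instance (our own illustration), take $f(x)=x^2$ on $[1,2]$, so $a=1,b=2,c=1,d=4$: $\int_1^2 x^2\,dx=\frac{7}{3}$, $\int_1^4 \sqrt{y}\,dy=\frac{14}{3}$, and their sum is $7=bd-ac$. The sketch below checks this with a simple Simpson rule.

```python
def simpson(g, lo, hi, n=1000):        # composite Simpson rule, n even
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(lo + i * h)
    return s * h / 3

grey = simpson(lambda x: x ** 2, 1, 2)      # integral of f over [a, b]
blue = simpson(lambda y: y ** 0.5, 1, 4)    # integral of f^{-1} over [c, d]
print(grey + blue)                          # approximately 7.0 = b*d - a*c
```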

Reference:
Differentiating inverse functions
Inverse trigonometric functions and their derivatives

Wednesday, 17 June 2015

Convergence of random variables

Prerequisites

Convergence of a sequence of numbers
The sequence $\frac{1}{2},\frac{2}{3},\frac{3}{4},\cdots,\frac{n}{n+1},\cdots$ is defined as $a_n=\frac{n}{n+1}$ for $n=1,2,3,\cdots$ The sequence converges to $1$.
We say that a sequence $a_1,a_2,a_3,\cdots$ converges to a limit $L$ if $a_n$ approaches $L$ as $n$ goes to infinity.

Formal definition:
A sequence $a_1,a_2,a_3,\cdots$ converges to a limit $L$ if $\lim\limits_{n \to \infty}a_n=L$. Namely, for any $\epsilon>0$, there exists an $N \in \mathbb{N}$ such that $|a_n-L|<\epsilon$ for all $n>N$.

Sequence of random variables
In any probability model, we have a sample space $S$ and a probability measure $P$. Suppose that our sample space consists of a finite number of elements: $S=\{s_1,s_2,\cdots,s_k\}$. Then, a random variable $X$ is a mapping that assigns a real number to any of the possible outcomes $s_i$, $i=1,2,\cdots,k$. In other words, $X(s_i)=x_i$, for $i=1,2,\cdots,k$.

Let's now consider a sequence of random variables $X_1,X_2,X_3,\cdots$. Each $X_n$ is a function from $S$ to real numbers. We can then write $X_n(s_i)=x_{ni}$, for $i=1,2,\cdots,k$.

In short, a sequence of random variables is simply a sequence of functions $X_n:S\to \mathbb{R}$.

Markov's inequality
For a non-negative random variable $X$ and any $t>0$: $P(X \geq t) \leq \frac{E(X)}{t}$

Chebyshev's inequality
For a random variable $X$ with mean $\mu$ and finite variance, and any $t>0$: $P(|X-\mu| \geq t) \leq \frac{Var(X)}{t^2}$
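A quick Monte Carlo illustration of both inequalities (our own sketch, using an Exponential(1) variable, so $E(X)=Var(X)=1$):

```python
import random

random.seed(0)
samples = [random.expovariate(1.0) for _ in range(100_000)]
t = 3.0
markov_lhs = sum(x >= t for x in samples) / len(samples)            # P(X >= t)
cheby_lhs = sum(abs(x - 1.0) >= t for x in samples) / len(samples)  # P(|X - mu| >= t)
print(markov_lhs, "<=", 1.0 / t)       # Markov bound E(X)/t
print(cheby_lhs, "<=", 1.0 / t ** 2)   # Chebyshev bound Var(X)/t^2
```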



Relations between different types of convergence
In this figure, the stronger types of convergence are on top. The ones on top imply those at the bottom. Namely, if a sequence of random variables converges in probability to a random variable $X$, then the sequence converges in distribution to $X$ as well.

Convergence in distribution [weakest]

Formal definition:
A sequence of random variables $X_1,X_2,X_3,\cdots$ converges in distribution to a random variable $X$, denoted by $X_n \stackrel{d}{\to} X$, if $\lim\limits_{n \to \infty} F_{X_n}(x)=F_X(x)$, for all $x$ at which $F_X(x)$ is continuous.
[Note that although we talk of a sequence of random variables converging in distribution, it is really the cumulative distribution functions that converge, not the random variables.]

When working with integer-valued random variables, the following theorem is often useful.

Consider the sequence $X_1,X_2,X_3,\cdots$ and the random variable $X$. Assume that $X$ and $X_n$ (for all $n$) are non-negative and integer-valued. Then $X_n \stackrel{d}{\to} X \iff \lim\limits_{n \to \infty} P_{X_n}(k)=P_X(k)$, for $k=0,1,2,\cdots$ [Proof: later]

Example:
Let $X_1,X_2,X_3,\cdots$ be a sequence of random variables such that $X_n \sim \text{Binomial}(n,\frac{\lambda}{n})$, for $n \in \mathbb{N}, n>\lambda$, where $\lambda>0$ is a constant. Show that $X_n \stackrel{d}{\to} \text{Poisson}(\lambda).$
[One can thus approximate a Binomial probability with a Poisson.]

Proof:
By the previous theorem, we just need to prove $\lim\limits_{n \to \infty} P_{X_n}(k)=P_X(k)$, for $k=0,1,2,\cdots$.

We have $$\begin{align} \lim\limits_{n \to \infty} P_{X_n}(k)&=\lim\limits_{n \to \infty}\binom{n}{k}(\frac{\lambda}{n})^k(1-\frac{\lambda}{n})^{n-k}\\ &=\lambda^k \lim\limits_{n \to \infty}\frac{n!}{(n-k)!k!}(\frac{1}{n^k})(1-\frac{\lambda}{n})^{n-k}\\ &=\frac{\lambda^k}{k!} \lim\limits_{n \to \infty} \Big( [\frac{n(n-1)(n-2)\cdots(n-k+1)}{n^k}][(1-\frac{\lambda}{n})^n][(1-\frac{\lambda}{n})^{-k}] \Big)\\ &=\frac{\lambda^k}{k!} (1 \cdot e^{-\lambda} \cdot 1) \end{align}$$ We conclude $\lim\limits_{n \to \infty} P_{X_n}(k)=\frac{e^{-\lambda} \lambda^k}{k!}.\:\Box$
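The limit can also be seen numerically; the snippet below (ours) compares the Binomial$(n,\frac{\lambda}{n})$ pmf with the Poisson$(\lambda)$ pmf for a large $n$.

```python
import math

lam, n = 2.0, 10_000
p = lam / n
for k in range(6):
    binom_pmf = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
    poisson_pmf = math.exp(-lam) * lam ** k / math.factorial(k)
    print(k, round(binom_pmf, 6), round(poisson_pmf, 6))   # the two columns agree closely
```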

The most famous example of convergence in distribution is the central limit theorem (CLT). The CLT states that the normalized average of independent and identically distributed random variables $X_1,X_2,X_3,\cdots$ converges in distribution to a standard normal random variable.

Convergence in probability

Formal definition:
A sequence of random variables $X_1,X_2,X_3,\cdots$ converges in probability to a random variable $X$, denoted by $X_n \stackrel{p}{\to} X$, if $\lim\limits_{n \to \infty} P(|X_n-X|\geq \epsilon)=0$, for all $\epsilon >0$.
[Intuitively, convergence in probability means that the probability that $X_n$ and $X$ differ by more than any fixed $\epsilon$ becomes arbitrarily small as $n$ grows.]

Example:
Let $X$ be a random variable and $X_n=X+Y_n$, where $E(Y_n)=\frac{1}{n}$ and $Var(Y_n)=\frac{\sigma^2}{n}$ for some constant $\sigma>0$. Show that $X_n \stackrel{p}{\to} X$.

Proof:
Recall triangle inequality $|a+b| \leq |a|+|b| \: \forall \: a,b \in \mathbb{R}$.
Choosing $a=Y_n-E(Y_n)$ and $b=E(Y_n)$, we have $|Y_n| \leq |Y_n-E(Y_n)|+\frac{1}{n}$.

Now, for any $\epsilon >0$ and any $n>\frac{1}{\epsilon}$ (so that $\epsilon-\frac{1}{n}>0$), we have $$\begin{align} P(|X_n-X| \geq \epsilon)&=P(|Y_n| \geq \epsilon)\\&\leq P(|Y_n-E(Y_n)|+\frac{1}{n} \geq \epsilon)\\&=P(|Y_n-E(Y_n)|\geq \epsilon-\frac{1}{n})\\& \leq \frac{Var(Y_n)}{(\epsilon-\frac{1}{n})^2}\;\;\text{by Chebyshev's inequality}\\&=\frac{\sigma^2}{n(\epsilon-\frac{1}{n})^2} \to 0\:\:\text{as}\:n\to \infty \end{align}$$ We conclude that $X_n \stackrel{p}{\to} X.\:\Box$
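To see the convergence concretely, the simulation below (our own sketch) assumes, for illustration only, that $Y_n \sim \text{Normal}(\frac{1}{n}, \frac{\sigma^2}{n})$; the proof above needs only the mean and variance.

```python
import random

random.seed(1)
sigma, eps, trials = 2.0, 0.5, 50_000
for n in [10, 100, 1000, 10_000]:
    # |X_n - X| = |Y_n|; random.gauss takes the standard deviation sigma/sqrt(n)
    hits = sum(abs(random.gauss(1 / n, sigma / n ** 0.5)) >= eps for _ in range(trials))
    print(n, hits / trials)   # estimated P(|X_n - X| >= eps), shrinking toward 0
```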

The most famous example of convergence in probability is the weak law of large numbers (WLLN). The WLLN states that if $X_1,X_2,X_3,\cdots$ are independent and identically distributed random variables with mean $E(X_i)=\mu<\infty$, then the average sequence defined by $\large \bar{X_n}=\frac{X_1+X_2+\cdots+X_n}{n}$ converges in probability to $\mu$. It is called the weak law because it refers to convergence in probability while the strong law of large numbers (SLLN) is related to almost sure convergence (a stronger convergence).

Convergence in mean

Formal definition:
Let $r\geq 1$ be a fixed number. A sequence of random variables $X_1,X_2,X_3,\cdots$ converges in the rth mean or in the $L^r$ norm to a random variable $X$, denoted by $X_n \stackrel{L^r}{\to} X$, if $\lim\limits_{n \to \infty} E(|X_n-X|^r)=0$.
If $r=2$, it is called the mean-square convergence $X_n \stackrel{m.s.}{\to} X$.

Theorem:
If $X_n \stackrel{L^r}{\to} X$ for some $r \geq 1$, then $X_n \stackrel{p}{\to} X$.

Proof:
For any $\epsilon>0$, we have $P(|X_n-X| \geq \epsilon)=P(|X_n-X|^r \geq \epsilon^r) \leq \frac{E(|X_n-X|^r)}{\epsilon^r} \;\; \text{by Markov's inequality}$

Since $\lim\limits_{n \to \infty} E(|X_n-X|^r)=0$, we conclude $\lim\limits_{n \to \infty} P(|X_n-X| \geq \epsilon)=0$ for all $\epsilon>0.\:\Box$

Theorem:
Let $1\leq r\leq s$. If $X_n \stackrel{L^s}{\to} X$, then $X_n \stackrel{L^r}{\to} X$.

Proof:
Recall Hölder's inequality $E|XY| \leq (E|X|^p)^{\frac{1}{p}}(E|Y|^q)^{\frac{1}{q}}$ where $1<p,q<\infty$ and $\frac{1}{p}+\frac{1}{q}=1$. Choose $X=|X_n-X|^r, Y=1, p=\frac{s}{r}>1$.
Then $E|X_n-X|^r \leq (E|X_n-X|^s)^{\frac{1}{p}}$. Since $\lim\limits_{n \to \infty} E(|X_n-X|^s)=0$, we conclude $\lim\limits_{n \to \infty} E(|X_n-X|^r) \leq \lim\limits_{n \to \infty} \big(E(|X_n-X|^s)\big)^{\frac{1}{p}}=0$. Therefore, $X_n \stackrel{L^r}{\to} X.\:\Box$

Almost sure convergence (convergence with probability 1)

Formal definition:
A sequence of random variables $X_1,X_2,X_3,\cdots$ converges almost surely to a random variable $X$, denoted by $X_n \stackrel{a.s.}{\to} X$, if $P(\{s\in S: \lim\limits_{n \to \infty} X_n(s)=X(s)\})=1$.

In some problems, proving almost sure convergence directly can be difficult. Here is a result that is sometimes useful when we would like to prove almost sure convergence.

Consider the sequence $X_1,X_2,X_3,\cdots$. If for all $\epsilon>0$, we have $\sum\limits_{n=1}^\infty P(|X_n-X|>\epsilon)<\infty$, then $X_n \stackrel{a.s.}{\to} X$.

Strong law of large numbers
Let $X_1,X_2,X_3,\cdots$ be independent and identically distributed random variables with expected value $E(X_i)=\mu <\infty$. Let also $M_n=\frac{X_1+X_2+\cdots+X_n}{n}$. Then $M_n \stackrel{a.s.}{\to} \mu$.

Useful:
Convergence concepts
Convergence in Distribution
Asymptotic theory
Measure

Tuesday, 16 June 2015

Complexification of vector spaces

Motivation to complexify real vector spaces:
One reason is to solve equations. If we want to prove theorems about real solutions to a system of real linear equations or a system of real linear differential equations, it can be convenient to examine the complex solution space, which gives us information about the real solution space.

Difference between direct sum and Cartesian product of two vector spaces
For the direct sum, the result is a new object of the same type: it has an algebraic structure. But the Cartesian product is just a set. For example, $\mathbb{R}\oplus \mathbb{R}$ is the set $\mathbb{R}\times \mathbb{R}$ equipped with the operation $(a,b)+(c,d)=(a+c,b+d)$.

Complexifying with direct sums
Definition: The complexification of a real vector space $W$ is the complex vector space $W_\mathbb{C}=W\oplus W$, the set of all ordered pairs $(w_1,w_2)$ with $w_1,w_2 \in W$, obtained by formally extending scalar multiplication to include multiplication by complex numbers. Vector addition and multiplication by complex scalars are defined as follows:

$(w_1,v_1)+(w_2,v_2)=(w_1+w_2,v_1+v_2)$ for $w_1,v_1,w_2,v_2 \in W$
$(a+bi)(w_1,w_2)=(aw_1-bw_2,bw_1+aw_2)$ for $a,b \in \mathbb{R}, w_1,w_2 \in W$

This rule of multiplication is reasonable if you think about a pair $(w_1,w_2)\in W\oplus W$ as a formal sum of $w_1+iw_2$: $(a+bi)(w_1+iw_2)=aw_1+aiw_2+biw_1-bw_2=(aw_1-bw_2)+i(bw_1+aw_2)$.

In particular, $i(w_1,w_2)=(-w_2,w_1)$.
[$(a+bi)(w_1,w_2)"="a(w_1,w_2)+bi(w_1,w_2)$ For this expression to equal $(aw_1-bw_2)+i(bw_1+aw_2)$, we must have $i(w_1,w_2)=(-w_2,w_1)$.]

One can check from this multiplication rule that $W_\mathbb{C}$ is a complex vector space. Since $i(w,0)=(0,w)$, we have $(w_1,w_2)=(w_1,0)+(0,w_2)=(w_1,0)+i(w_2,0)$. The $\mathbb{R}$-linear map $w \mapsto (w,0)$ is called the standard embedding of $W$ into $W_\mathbb{C}$. With this, we can regard $W_\mathbb{C}$ as $W+iW$.
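The pair picture can be coded directly. The toy class below (ours, with $W=\mathbb{R}^n$ represented as a list of floats) implements exactly the multiplication rule $(a+bi)(w_1,w_2)=(aw_1-bw_2,\,bw_1+aw_2)$ and the embedding $w\mapsto(w,0)$.

```python
class Complexified:
    """An element (w1, w2) of W_C = W (+) W, thought of as w1 + i*w2, for W = R^n."""
    def __init__(self, w1, w2):
        self.w1, self.w2 = list(w1), list(w2)

    def __add__(self, other):
        return Complexified([a + b for a, b in zip(self.w1, other.w1)],
                            [a + b for a, b in zip(self.w2, other.w2)])

    def scale(self, z):  # multiplication by a complex scalar z = a + bi
        a, b = z.real, z.imag
        return Complexified([a * x - b * y for x, y in zip(self.w1, self.w2)],
                            [b * x + a * y for x, y in zip(self.w1, self.w2)])

    def __repr__(self):
        return f"({self.w1}) + i({self.w2})"

v = Complexified([1.0, 0.0], [0.0, 0.0])   # the embedded real vector (1, 0)
print(v.scale(1j))                          # i*(w,0) = (0,w): ([0.0, 0.0]) + i([1.0, 0.0])
```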

Examples:
The complexifications of $\mathbb{R}^n,M_n(\mathbb{R}),\mathbb{R}[X]$ are isomorphic to $\mathbb{C}^n,M_n(\mathbb{C}),\mathbb{C}[X]$ by sending an ordered pair $(w_1,w_2)$ in $\mathbb{R}^n \oplus \mathbb{R}^n,M_n(\mathbb{R})\oplus M_n(\mathbb{R}),\mathbb{R}[X]\oplus \mathbb{R}[X]$ to $w_1+iw_2$ in $\mathbb{C}^n,M_n(\mathbb{C}),\mathbb{C}[X]$.
[$\mathbb{R}[X]$ is the set of all polynomials with real coefficients in the variable $x$.]

Namely, we can identify the complexification $(\mathbb{R}^2)_\mathbb{C}$ with $\mathbb{C}^2$ by $(w_1,w_2)\mapsto w_1+iw_2 \in \mathbb{R}^2+i\mathbb{R}^2=\mathbb{C}^2$ and this sends the basis vectors $\tiny\begin{pmatrix} 1\\0 \end{pmatrix}$ and $\tiny\begin{pmatrix} 0\\1 \end{pmatrix}$ of $\mathbb{R}^2 \subset (\mathbb{R}^2)_\mathbb{C}$ to $\tiny\begin{pmatrix} 1\\0 \end{pmatrix}$ and $\tiny\begin{pmatrix} 0\\1 \end{pmatrix}$ in $\mathbb{C}^2$, which are the standard basis vectors of $\mathbb{C}^2$ as a complex vector space.

Similarly, the identifications of $(\mathbb{R}^n)_\mathbb{C},M_n(\mathbb{R})_\mathbb{C},\mathbb{R}[X]_\mathbb{C}$ with $\mathbb{C}^n,M_n(\mathbb{C}),\mathbb{C}[X]$ turn every real basis of $\mathbb{R}^n,M_n(\mathbb{R}),\mathbb{R}[X]$ into a complex basis of $\mathbb{C}^n,M_n(\mathbb{C}),\mathbb{C}[X]$.

Denote the space of complex $n\times n$ matrices by $M_n(\mathbb{C})$, the space of anti-Hermitian $A^*=-A$ matrices by $u(n)$, the space of Hermitian matrices $A^*=A$ by $iu(n)$. Then $M_n(\mathbb{C})=u(n)\oplus iu(n)$. Thus $u(n)$ is a real form in $M_n(\mathbb{C})$ and $M_n(\mathbb{C})$ is a complexification of $u(n)$.

Definitions:
A map $T$ between complex vector spaces is said to be $\mathbb{C}$-linear if it is additive and $T(\lambda w)=\lambda T(w)$ for all $w$ in the domain of $T$ and for all $\lambda \in \mathbb{C}$.
A map $T$ between complex vector spaces is said to be $\mathbb{C}$-antilinear or conjugate linear if it is additive and $T(\lambda w)=\overline{\lambda}T(w)$ for all $w$ in the domain of $T$ and for all $\lambda \in \mathbb{C}$.

Theorem: If $W=0$, then $W_\mathbb{C}=0$. If $W\neq 0$ and $\{e_1,\cdots,e_n\}$ is an $\mathbb{R}$-basis of $W$, then $\{(e_1,0),\cdots,(e_n,0)\}$ is a $\mathbb{C}$-basis of $W_\mathbb{C}$. In particular, $\text{dim}_\mathbb{C}(W_\mathbb{C})=\text{dim}_\mathbb{R}(W)$ for all $W$.

Proof: We want to prove $\{(e_1,0),\cdots,(e_n,0)\}$ is a basis for $W_\mathbb{C}$: spanning & linear independence.
Suppose $W\neq 0$ and $\{e_1,\cdots,e_n\}$ is an $\mathbb{R}$-basis of $W$.
Spanning:
If $\{e_1,\cdots,e_n\}$ is an $\mathbb{R}$-basis of $W$, then $\{(e_1,0),\cdots,(e_n,0),(0,e_1),\cdots,(0,e_n)\}$ is a basis for the real vector space $W\oplus W$.
Let $w\in W_\mathbb{C}$. Using the basis for $W\oplus W$, there exist $a_1,\cdots,a_n,b_1,\cdots,b_n \in \mathbb{R}$ such that
$\begin{align}w&=a_1(e_1,0)+\cdots+a_n(e_n,0)+b_1(0,e_1)+\cdots+b_n(0,e_n) & \qquad [*]\\
&=a_1(e_1,0)+\cdots+a_n(e_n,0)+ib_1(e_1,0)+\cdots+ib_n(e_n,0) &[ib(e_j,0)=b(0,e_j)]\\
&=(a_1+ib_1)(e_1,0)+\cdots+(a_n+ib_n)(e_n,0) & \qquad [**]\end{align}$
$[*]: \mathbb{R}$-linear combination of $(e_j,0)$ and $(0,e_j)$
$[**]: \mathbb{C}$-linear combination of $(e_j,0)$
Therefore, $\{(e_1,0),\cdots,(e_n,0)\}$ is a $\mathbb{C}$-linear spanning set of $W_\mathbb{C}$.
Linear independence:
Suppose we can write $(0,0)$ as a finite $\mathbb{C}$-linear combination of $(e_j,0)$.
$(a_1+ib_1)(e_1,0)+\cdots+(a_n+ib_n)(e_n,0)=(0,0)$ for some real $a_i,b_i\;\;i=1,\cdots,n$
Equivalently, $(a_1e_1+\cdots+a_ne_n,b_1e_1+\cdots+b_ne_n)=(0,0).\qquad [bi(e_j,0)=(0,be_j)]$
Therefore, $\sum\limits_{i=1}^n a_ie_i=0$ and $\sum\limits_{i=1}^n b_ie_i=0$ in $W$.
Since $e_j$'s are linearly independent over $\mathbb{R}$, all the coefficients $a_j$ and $b_j$ are 0, so $a_j+ib_j=0$ for all $j=1,\cdots,n\:\Box$.

Universal property of complexification
Theorem: For any $\mathbb{R}$-linear map $W\xrightarrow{f}V$ from $W$ into a complex vector space $V$, there is a unique $\mathbb{C}$-linear map $W_\mathbb{C}\xrightarrow{\tilde{f}}V$ making the diagram

$\;\; W \xrightarrow{\quad \quad} W_\mathbb{C}\\ \quad f \searrow \quad \nearrow \tilde{f}\\ \qquad\quad V$

commute, where the map $W\to W_\mathbb{C}$ is the standard embedding.

This theorem says $W\to W_\mathbb{C}$ is an initial object that admits a unique morphism to all other objects.

Complexification of linear transformation
Theorem: Every $\mathbb{R}$-linear transformation $\varphi:W\to W'$ of real vector spaces extends in a unique way to a $\mathbb{C}$-linear transformation of the complexifications: there is a unique $\mathbb{C}$-linear map $\varphi_\mathbb{C}:W_\mathbb{C}\to W'_\mathbb{C}$ making the diagram

$\qquad \qquad \;W\xrightarrow{\quad\;\varphi\;\quad}W'\\ w\mapsto (w,0) \Bigg\downarrow \qquad\qquad \Bigg\downarrow \varphi(w)\mapsto (\varphi(w),0) \\ \qquad \qquad W_\mathbb{C} \xrightarrow{\quad \varphi_\mathbb{C} \quad} W'_\mathbb{C}$

commute, where the vertical maps are the standard embeddings of real vector spaces into their complexifications.

Proof: If such a $\mathbb{C}$-linear map $\varphi_\mathbb{C}$ exists, then the commutativity of the diagram says $\varphi_\mathbb{C}(w,0)=(\varphi(w),0)$ for all $w\in W$. Therefore, for $(w_1,w_2)\in W_\mathbb{C}$, we have

$\begin{align} \varphi_\mathbb{C}(w_1,w_2)&=\varphi_\mathbb{C}(w_1,0)+\varphi_\mathbb{C}(0,w_2)\\
&=(\varphi(w_1),0)+i\varphi_\mathbb{C}(w_2,0)\\
&=(\varphi(w_1),0)+i(\varphi(w_2),0)\\
&=(\varphi(w_1),0)+(0,\varphi(w_2))\\
&=(\varphi(w_1),\varphi(w_2)) \end{align}$

This tells us what $\varphi_\mathbb{C}$ must be. So define $\varphi_\mathbb{C}:W_\mathbb{C}\to W'_\mathbb{C}$ by $\varphi_\mathbb{C}(w_1,w_2)=(\varphi(w_1),\varphi(w_2))$. We need to check that $\varphi_\mathbb{C}$ is $\mathbb{C}$-linear. Additivity is clear from the formula and the $\mathbb{R}$-linearity of $\varphi$, and compatibility with multiplication by $i$ follows from $\varphi_\mathbb{C}(i(w_1,w_2))=\varphi_\mathbb{C}(-w_2,w_1)=(-\varphi(w_2),\varphi(w_1))=i(\varphi(w_1),\varphi(w_2))=i\varphi_\mathbb{C}(w_1,w_2)$. In the formal-sum notation, $\varphi_\mathbb{C}(w_1+iw_2)=\varphi(w_1)+i\varphi(w_2).\:\Box$

In other words, to an operator $\varphi:W\to W'$, there corresponds an operator $\varphi_\mathbb{C}:W_\mathbb{C}\to W'_\mathbb{C}$ given by the formula $\varphi_\mathbb{C}(a+ib)=\varphi(a)+i\varphi(b)$. The operator $\varphi_\mathbb{C}$ is called the complexification of $\varphi$.

If $\{w_1,\cdots,w_n\}$ is a basis for $W$, $\{w'_1,\cdots,w'_m\}$ is a basis for $W'$, and $A$ is the matrix representation of $\varphi$ with respect to these bases, then $A$, regarded as a complex matrix, is also the matrix representation of $\varphi_\mathbb{C}$ with respect to the corresponding bases in $W_\mathbb{C}$ and $W'_\mathbb{C}$.

So the complexification process is a formal, coordinate-free way of saying: take the matrix $A$ of $\varphi$, with its real entries, but operate on it as a complex matrix. The advantage of making this abstract definition is that we are not required to fix a choice of coordinates and use matrix representations. Say we want to make arguments about the complex eigenvalues and eigenvectors of a transformation $\varphi:W\to W$, while non-real eigenvalues and eigenvectors, by definition, cannot exist for a transformation of a real vector space. What we really mean are the eigenvalues and eigenvectors of $\varphi_\mathbb{C}$. Also, the complexification process generalizes without change to infinite-dimensional spaces.

Complexifying with tensor products [pending]

Conjugations
In the complexification $W_\mathbb{C}$ of a real vector space $W$, we can define complex conjugation by $\overline{w}=\overline{u+iv}=u-iv$.

In an arbitrary complex vector space, however, there is no natural, basis-independent way of defining complex conjugation of vectors. For example, if we set the complex conjugate of a vector $u=(a+bi)e_j$ to be $\overline{u}=(a-bi)e_j$, this definition will give a different answer in the basis $\{ie_j\}$ since $u=[-i(a+bi)](ie_j) \Rightarrow \overline{u}=[i(a-bi)](ie_j)=-(a-bi)e_j$.

Thus the concept of complex conjugation of vectors requires prior knowledge of the 'real part' of a complex vector space. The complexification of a real space has precisely the required extra structure needed to define complex conjugation of vectors.

Later, we will use conjugation to describe the real subspaces which have the given complex vector space as its complexification.

Realification/decomplexification
One way of creating a real vector space $W_\mathbb{R}$ from a complex vector space $W$ is to forget altogether about the possibility of multiplying vectors by complex numbers and only allow scalar multiplication with real numbers. In abstract algebra, we call this the restriction of scalars from complex numbers to real numbers. In this process a pair of vectors $u$ and $iu$ must be regarded as linearly independent vectors in $W_\mathbb{R}$ for any non-zero vector $u\in W$. Thus if $W$ is finite-dimensional and $\text{dim}W=n$, then $W_\mathbb{R}$ is $2n$-dimensional, namely $\text{dim}_\mathbb{R} W_\mathbb{R}=2\,\text{dim}_\mathbb{C}W$, for if $\{e_1,\cdots,e_n\}$ is a basis of $W$ then $\{e_1,\cdots,e_n,ie_1,\cdots,ie_n\}$ is a linearly independent set of vectors spanning $W_\mathbb{R}$.

We can verify that if $A=B+iC$ is the matrix of a linear map $A:V\to W$ with respect to bases $\{e_1,\cdots,e_n\}$ and $\{\epsilon_1,\cdots,\epsilon_m\}$, where the matrices $B$ and $C$ are real, then the matrix of the linear map $A_\mathbb{R}$ with respect to the bases $\{e_1,\cdots,e_n,ie_1,\cdots,ie_n\}$ and $\{\epsilon_1,\cdots,\epsilon_m,i\epsilon_1,\cdots,i\epsilon_m\}$ is of the form $\begin{pmatrix} B & -C \\ C & B \end{pmatrix}$.

Example:
Say $n=2$: if $\{e_1,e_2\}$ with $e_1=(a+ib,0)$ and $e_2=(0,c+id)$ is a basis for the complex vector space $W=\mathbb{C}^2$, then $\{e_1,ie_1,e_2,ie_2\}$, written in real coordinates as $\{(a,b,0,0),(-b,a,0,0),(0,0,c,d),(0,0,-d,c)\}$, is the associated basis for $W_\mathbb{R}=\mathbb{R}^4$.
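The block form can be checked numerically; the sketch below (ours) builds $\begin{pmatrix} B & -C \\ C & B \end{pmatrix}$ from a random complex matrix $A=B+iC$ and confirms it reproduces $A$ acting on $z=x+iy$ in the real coordinates $(x;y)$.

```python
import numpy as np

rng = np.random.default_rng(0)
B, C = rng.standard_normal((3, 2)), rng.standard_normal((3, 2))
A = B + 1j * C                        # a complex linear map C^2 -> C^3
A_real = np.block([[B, -C], [C, B]])  # its realification, a map R^4 -> R^6

z = rng.standard_normal(2) + 1j * rng.standard_normal(2)
w = A @ z
w_real = A_real @ np.concatenate([z.real, z.imag])
print(np.allclose(w_real, np.concatenate([w.real, w.imag])))   # True
```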

More to explore:
Complexification of Lie algebras – a tool useful for representation theory
Complex spectral theorem – deducing a real classification for orthogonal matrices
Splitting characteristic polynomials
Extension of scalars & restriction of scalars

References:
Complexification
More to explore
Difference between Cartesian product and tensor product of two vector spaces
[?] Change of basis matrix
Complexification and realification [p3,5]
A Course in Modern Mathematical Physics Groups, Hilbert Space and Differential Geometry by Peter Szekeres [p154-156]
Ordinary Differential Equations by Vladimir I. Arnol'd [p177-189]
Problems and Theorems in Linear Algebra by Viktor Vasil' evich Prasolov [p55-57]

Sunday, 14 June 2015

Morphisms

General maps
one-to-one: injective
onto: surjective
one-to-one and onto: bijective

Linear maps
any map: homomorphism
one-to-one: monomorphism
onto: epimorphism
one-to-one and onto: isomorphism

Linear self-maps
any map: endomorphism
one-to-one and onto: automorphism

Examples of isomorphisms and the maps that realize them:
  • $M_{1,3}(\Bbb F) \cong M_{3,1}(\Bbb F)\\ (x_1,x_2,x_3)\mapsto \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}$

  • $M_{2,2}(\Bbb F) \cong \Bbb F^4\\ \begin{pmatrix}a&b\\ c&d \end{pmatrix}\mapsto (a,b,c,d)$

  • $M_{2,3}(\Bbb F) \cong M_{3,2}(\Bbb F)\\ A\mapsto A^T$

  • The plane $z=0$ in $\Bbb R^3$ is isomorphic to $\Bbb R^2$.
    $(x,y,0)\mapsto (x,y)$

  • $P_n\cong \Bbb R^{n+1}\\ a_0+a_1x+\cdots+a_nx^n\mapsto (a_0,a_1,\cdots,a_n)$

  • $P$ is isomorphic to $\Bbb R_0^\infty$.
    $a_0+a_1x+\cdots+a_nx^n\mapsto (a_0,a_1,\cdots,a_n,0,0,\cdots)$

  • $M_{m,n}(\Bbb F) \cong L(\Bbb F^n, \Bbb F^m)$
    $A\mapsto L_A$, where $L_A(x)=Ax$.

  • Any vector space $V$ of dimension $n$ is isomorphic to $\Bbb F^n$.
    $v\mapsto [v]_\alpha$, where $\alpha$ is a basis for $V$.

Category theory
A morphism $f:A\to B$ in a category $C$ is an isomorphism if there exists a morphism $g:B \to A$ such that both ways to compose $f$ and $g$ give the identity morphisms on the respective objects, namely $$gf=1_A, \quad fg=1_B.$$
A morphism $f:A\to B$ in a category $C$ is a monomorphism if for every object $Z$ and every pair of morphisms $g,h:Z\to A$, the condition $fg=fh$ implies $g=h$. Namely, the following diagram commutes:


When we say a diagram commutes, it means all ways to compose morphisms result in identical morphisms. Therefore, whenever $f$ is a monomorphism and this diagram commutes, we can conclude $g=h$. The key idea is that monomorphisms allow us to “cancel” $f$ from the left side of a composition.

The corresponding property for cancelling on the right is defined similarly.

A morphism $f:A\to B$ in a category $C$ is an epimorphism if for every object $Z$ and every pair of morphisms $g,h:B\to Z$, the condition $gf=hf$ implies $g=h$.


Whenever $f$ is an epimorphism and this diagram commutes, we can conclude $g=h$.

Reference:
Examples of isomorphisms
Properties of morphisms
An awesome video on morphisms

Thursday, 11 June 2015

Informal combinatorial proofs

Pascal's Triangle


A term in the middle is the sum of two terms above it.
$\binom{n}{k}+\binom{n}{k+1}=\binom{n+1}{k+1}$



The sum of all elements in each row is $2^n$.
$\sum\limits_{k=0}^n \binom{n}{k}=2^n$

Binomial theorem
$\sum\limits_{k=0}^n \binom{n}{k} x^k=(1+x)^n$
Put $x=1$, then $\sum\limits_{k=0}^n \binom{n}{k}=2^n$.
Put $x=2$, then $\sum\limits_{k=0}^n 2^k \binom{n}{k}=3^n$.
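Both identities are easy to confirm for small $n$ (our own check):

```python
import math

for n in range(8):
    assert sum(math.comb(n, k) for k in range(n + 1)) == 2 ** n
    assert sum(2 ** k * math.comb(n, k) for k in range(n + 1)) == 3 ** n
print("binomial identities verified for n = 0..7")
```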

Bijections
Given a set $X$ and a non-negative integer $r$, an $r$-subset of $X$ is a subset $A \subset X$ of cardinality $r$. We denote the set of $r$-subsets of $X$ by $P_r(X)$, namely, $P_r(X)=\{A \subset X\:|\: |A|=r\}$. Define the binomial coefficient $\binom{n}{r}$ to be the cardinality of the set $P_r(X)$ when $|X|=n$.
There is a bijection $P_r(X) \to P_{n-r}(X)$ given by mapping each subset to its complement $A \mapsto A^c$. Hence $\binom{n}{r}=\binom{n}{n-r}$.
[...]

Combinatorial arguments
Key idea: count the same quantity in two different ways

$n^2=(n-1)^2+2(n-1)+1$
We count the number of pairs $(i,j)$ where $1 \leq i,j \leq n$, which is $n^2$. We subdivide the pairs according to how many coordinates equal $1$. There are $(n-1)^2$ pairs with no coordinate equal to $1$; $2(n-1)$ pairs with exactly one coordinate equal to $1$; $1$ pair with both coordinates equal to $1$. Hence the result. In the same vein, we have $n^3=(n-1)^3+3(n-1)^2+3(n-1)+1$.

Remark: We can use the same situation to prove another identity, $1+3+5+\cdots+(2n-1)=n^2$. We classify the pairs according to the value of $\text{max}(i,j)$. There is one pair $(1,1)$ with $\text{max}(i,j)=1$ and three pairs $(1,2), (2,1), (2,2)$ with $\text{max}(i,j)=2$. Similarly, there are five pairs when $\text{max}(i,j)=3$; seven pairs with max value $4$; $(2n-1)$ pairs with max value $n$.

$\binom{m+n}{2}=\binom{m}{2}+\binom{n}{2}+mn$
We count the number of ways of selecting two numbers from a set $S$ with $m+n$ elements. We partition $S$ into two subsets $A$ and $B$ with $m$ and $n$ elements respectively. Then either both elements come from $A$, both come from $B$, or there is one from each of $A$ and $B$. Hence the stated identity.

Stars and bars
The number of solutions to the equation $a_1+a_2+\cdots+a_n=k, a_i \geq 0, a_i \in \mathbb{N}$ is $\binom{k+n-1}{k}$.
Put differently, an n-set has $\binom{k+n-1}{k}$ multisets of size $k$.

The problem is the same as finding the number of ways of placing $k$ indistinguishable objects (denoted by $*$) in $n$ numbered boxes, where $a_i$ represents the number of objects in box $i$. We now put $k$ objects in a row.
$******$
We use vertical bars $|$ to represent the boxes, meaning the separation between two consecutive boxes. Hence we need $n-1$ bars. For example, in the arrangement $**|***|*$, we have 3 boxes of which the first has 2 objects, the second has 3 objects and the last has 1 object.
In total there will be $k+n-1$ positions holding stars or bars, and an arrangement is determined by choosing which $k$ of them are stars. Therefore, the number of solutions is $\binom{k+n-1}{k}$. [...]
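A brute-force check of the count for small $n$ and $k$ (our own sketch):

```python
import math
from itertools import product

def count_solutions(n, k):
    # number of tuples (a_1, ..., a_n) of non-negative integers summing to k
    return sum(1 for a in product(range(k + 1), repeat=n) if sum(a) == k)

for n, k in [(2, 5), (3, 4), (4, 3)]:
    assert count_solutions(n, k) == math.comb(k + n - 1, k)
print("stars-and-bars counts match")
```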

Useful:
http://www.maths.qmul.ac.uk/~pjc/notes/comb.pdf
http://www.millersville.edu/~rumble/Math.310/pascal3.pdf
http://www.erp.sophia.ac.jp/Projects/ese/notes/rmm/lecture4.pdf
https://math.dartmouth.edu/~m68f13/lectec.pdf

Friday, 5 June 2015

Four fundamental subspaces

Prerequisites

Orthogonal vectors & subspaces
Two vectors $v_1$ and $v_2$ are orthogonal if $v_1^T v_2=\left\lVert v_1 \right\rVert \left\lVert v_2 \right\rVert \cos\theta=0$.
Two subspaces $M_1$ and $M_2$ are orthogonal if $v_1 \bot v_2$ for all $v_1 \in M_1$ and $v_2 \in M_2$.

Orthogonal complement
The orthogonal complement to a subspace $M \subset R^n$ is defined by $M^\bot=\{w \in R^n: w^T v=0 \: \forall v \in M \}$.
$M^\bot$ is the largest (unique) subspace orthogonal to $M$.
The following holds:
$M^\bot$ is a subspace.
$R^n=M \oplus M^\bot$
If $\text{dim}\:M=k$, then $\text{dim}\:M^\bot=n-k$.
$(M^\bot)^\bot=M$



The four fundamental subspaces

Definitions
$A=\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \cdots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}=\begin{pmatrix} \vec{a_1} & \cdots & \vec{a_n} \end{pmatrix}=\begin{pmatrix} \overline{a}_1^T \\  \overline{a}_2^T \\  \vdots \\ \overline{a}_m^T \end{pmatrix}$

Column space
$C(A)=\{Ax\:|\:x \in R^n \}=\{ \sum_{j=1}^n \vec{a_j}x_j\:|\:x_j \in R, j=1,2,\cdots,n\}$

Nullspace
$N(A)=\{x \in R^n\:|\:Ax=0\}=\{x \in R^n\:|\:\overline{a}_i^T x=0, i=1,2,\cdots,m\}$

Row space
$C(A^T)=\{A^Tu\:|\:u \in R^m \}=\{ \sum_{i=1}^m \overline{a}_i u_i\:|\:u_i \in R, i=1,2,\cdots,m\}$

Left nullspace
$N(A^T)=\{u \in R^m\:|\:A^T u=0\}=\{u \in R^m\:|\:\vec{a_j}^T u=0, j=1,2,\cdots,n \}=\{u \in R^m\:|\:u^T A=0\}$

The left nullspace is the space of all vectors y such that $A^Ty=0$, or equivalently $y^TA=0$. Thus the term left nullspace.

Geometric picture [later]

Relation to RREF [later]

Summary

Let $A$ be an $m\times n$ matrix, and $r$ be its rank.

$\small \begin{array}{c|c|c|c} \text{subspace} & \text{notation} & \text{subspace of} & \text{dimension} \\ \hline \text{row space} & C(A^T) & R^n & r \\ \text{column space} & C(A) & R^m & r \\ \text{nullspace} & N(A) & R^n & n-r \\ \text{left nullspace} & N(A^T) & R^m & m-r \end{array}$

$\text{dim}\:C(A^T)+\text{dim}\:N(A)=\text{dim}\:R^n\\
\text{dim}\:C(A)+\text{dim}\:N(A^T)=\text{dim}\:R^m$



Fundamental Theorem of Linear Algebra

Let A be an $m \times n$ matrix.
1. $N(A)$ is the orthogonal complement of $C(A^T)$ in $R^n$.
2. $N(A^T)$ is the orthogonal complement of $C(A)$ in $R^m$.
3. $N(A) \oplus C(A^T)=R^n$
4. $N(A^T) \oplus C(A)=R^m$

The following orthogonality relations hold:
$C(A)^\bot=N(A^T)\\
N(A^T)^\bot=C(A)\\
C(A^T)^\bot=N(A)\\
N(A)^\bot=C(A^T)$
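These dimension and orthogonality statements are easy to check numerically; the sketch below (ours) uses the SVD to get the two nullspaces of a random rank-$3$ matrix with $m=4$, $n=6$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 6))   # rank 3 with probability 1
m, n = A.shape
r = np.linalg.matrix_rank(A)

def null_space(M, tol=1e-10):
    # orthonormal basis of N(M): right singular vectors with (numerically) zero singular value
    _, s, Vt = np.linalg.svd(M)
    return Vt[int(np.sum(s > tol)):].T

NA, NAT = null_space(A), null_space(A.T)   # bases of N(A) and N(A^T)
print(r, NA.shape[1], NAT.shape[1])        # r = 3, n - r = 3, m - r = 1
print(np.allclose(A @ NA, 0),              # rows of A are orthogonal to N(A): C(A^T) ⟂ N(A)
      np.allclose(A.T @ NAT, 0))           # columns of A are orthogonal to N(A^T): C(A) ⟂ N(A^T)
```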

Explanations:

Orthogonal subspaces
We will examine an $m \times n$ matrix $A$. Both the row space and the nullspace are subspaces of $R^n$. [$A^T$ is $n \times m$ and $u$ is $m \times 1$, so $A^Tu$ is $n \times 1$; hence vectors in the row space lie in $R^n$.] Suppose $x$ is a vector in the nullspace of $A$. Then $Ax=0$. From the definition of matrix multiplication, we have $$\begin{pmatrix} \text{row 1} \\ \text{row 2} \\ \vdots \\ \text{row m} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}=0.$$ As every vector in the row space is a linear combination of the rows, the dot product of $x$ with any vector in the row space must be $0$. So if $v\in C(A^T)$ and $w\in N(A)$, we must have $v^T w=0$. Therefore, the row space and the nullspace of $A$ are orthogonal.

Similarly, every vector in the left nullspace of $A$, $N(A^T)$, is perpendicular to every vector in the column space of $A$, $C(A)$. Hence, the column space of A and the left nullspace of A are orthogonal.

Orthogonal complements
We have established that the row space and the nullspace are orthogonal. It turns out the nullspace is in fact the orthogonal complement of the row space. The dimension of the nullspace is $n-r$, where $r$ is the rank (dimension of the row space) of $A$. If there were a vector orthogonal to the row space but not in the nullspace, then the dimension of $C(A^T)^\bot$ would be at least $n-r+1$. But this would be too large for both $C(A^T)$ and $C(A^T)^\bot$ to fit in $R^n$. Therefore, the nullspace is the largest subspace perpendicular to the row space, and we have $C(A^T)^\bot=N(A)$. Using similar arguments, we have $N(A)^\bot=C(A^T)$.

What is so useful about complements is that we can use them to break down vectors into components. Namely, for an $m \times n$ matrix $A$, any vector $x \in R^n$ can be written as the sum of a component $x_r$ in the row space and a component $x_n$ in the nullspace: $x=x_r+x_n$.

Now, let's consider $Ax=b$. We claim that every output vector $b$ comes from one and only one vector in the row space. Write $x=x_r+x_n$. Then $Ax=Ax_r+Ax_n=Ax_r+0=Ax_r$. Thus, we know there is a vector $x_r$ in the row space such that $Ax_r=b$. Suppose $x_r'$ is another vector in the row space such that $Ax_r'=b$. Note that $x_r-x_r'$ is in row space since it is the difference of two vectors in the row space. Now $A(x_r-x_r')=Ax_r-Ax_r'=b-b=0$. Then $x_r-x_r'$ is in the nullspace. As the row space and nullspace are orthogonal complements, the only vector in both of them is $0$. So, $x_r-x_r'=0$ and $x_r=x_r'.\:\Box$

Relation to domain and codomain
Consider an $m \times n$ matrix $A$. The matrix $A$ maps from $R^n$ to $R^m$; we say that its domain is $R^n$ and its codomain is $R^m$. A vector in the domain either maps to zero or to a non-zero vector in the range, and the domain splits into the nullspace of $A$ and the row space, the part that $A$ maps one-to-one onto its range. Likewise, the codomain of $A$ consists of the range of $A$ and the part we cannot reach with $A$. Now, the matrix $A^T$ maps from $R^m$ to $R^n$: the range of $A^T$ is precisely the row space of $A$, and the nullspace of $A^T$ is precisely the part of the codomain we cannot reach with $A$.

To sum up, we have the four fundamental subspaces as
• an orthogonal decomposition of the domain of $A$:
nullspace of $A$ $\oplus$ range of $A^T$
• an orthogonal decomposition of the codomain of $A$:
range of $A$ $\oplus$ nullspace of $A^T$

Useful:
http://xoax.net/math/crs/linear_algebra_mit/lessons/Lecture10/

Reference:
https://rip94550.wordpress.com/2009/09/07/linear-algebra-the-four-fundamental-subspaces/

Thursday, 4 June 2015

Spanning and linearly independence

A basis of a vector space $V$ is a subset $\{v_1,\cdots,v_n\} \subset V$ satisfying the two properties:
(i) $\{v_1,\cdots,v_n\}$ spans V;
(ii) $\{v_1,\cdots,v_n\}$ is linearly independent.

(i) To show spanning: Take an arbitrary vector $v \in V$, and show that $v$ can be written as a linear combination of $v_1, \cdots,v_n$, namely, $v=a_1v_1+\cdots+a_nv_n$.

(ii) To show linear independence: Prove that $a_1v_1+\cdots+a_nv_n=0$ implies $a_1=a_2=\cdots=a_n=0$.

For two differentiable functions $u_1$ and $u_2$, there is a quick test. Let $au_1(x)+bu_2(x)=0$ for all $x$. Differentiating $\Rightarrow au_1'(x)+bu_2'(x)=0$.
If $W(u_1,u_2)=\begin{vmatrix} u_1 & u_2 \\ u_1' & u_2' \end{vmatrix}=u_1u_2'-u_2u_1' \neq 0$ at some point, then $a=b=0$ and $u_1$ and $u_2$ are linearly independent.
Note: $W(u_1, u_2)$ is known as the Wronskian determinant of functions $u_1$ and $u_2$.

Examples:
Spanning

$1$ and $i$ are linearly independent in $\mathbb{C}$ regarded as a vector space over $\mathbb{R}$ but they are linearly dependent if $\mathbb{C}$ is regarded as a complex vector space.



$u_1(x)=\sin x$ and $u_2(x)=\cos x$ are linearly independent.

Proof: Let $a\sin x +b\cos x=0$.
Differentiating $\Rightarrow a\cos x-b\sin x=0$.
So $a=b \sin x (\cos x)^{-1} \Rightarrow b(\sin^2 x (\cos x)^{-1}+\cos x)=0$
$\Rightarrow b (\cos x)^{-1}=0 \Rightarrow b=0$ Thus $a=b=0$.

Alternatively, $W(u_1,u_2)=u_1u_2'-u_2u_1'=-\sin^2 x-\cos^2 x=-1 \Rightarrow$ linear independence.



Show that the functions $x, xe^x, e^{-x}$ are linearly independent in the vector space $C^\infty(\mathbb{R})$.

Proof: Suppose $ax+bxe^x+ce^{-x}=0$ for all $x \in \mathbb{R}$, where $a,b,c$ are constants.

Method I – Differentiation:
$\begin{align}\: & ax+bxe^x+ce^{-x}=0 \; &(1)\\
\text{Differentiate both sides:}\: & a+be^x+bxe^x-ce^{-x}=0 \; &(2)\\
\text{Differentiate again:}\: & 2be^x+bxe^x+ce^{-x}=0 \; &(3)\\
\text{Differentiate yet again:}\: & 3be^x+bxe^x-ce^{-x}=0 \; &(4)\\
\text{And again:}\: & 4be^x+bxe^x+ce^{-x}=0 \; &(5) \end{align}$

$(5)-(3) \Rightarrow 2be^x=0 \Rightarrow b=0$
Substitute $b=0$ in $(3) \Rightarrow c=0$.
Substitute $b=c=0$ in $(2) \Rightarrow a=0$.

Method II – Limit:
For any $x \neq 0$ divide both sides by $xe^x$. Then, $ae^{-x}+b+cx^{-1}e^{-2x}=0$
L.H.S approaches $b$ as $x \to +\infty \Rightarrow b=0$.
Now $ax+ce^{-x}=0$ for all $x \in \mathbb{R}$.
For any $x \neq 0$ divide both sides by x. Then $a+cx^{-1}e^{-x}=0$.
L.H.S. approaches $a$ as $x \to +\infty \Rightarrow a=0$.
Now $ce^{-x}=0 \Rightarrow c=0$.

Linear Independence in function spaces



Applications in ordinary differential equations [later]

Reference:
Linear independence

Tuesday, 2 June 2015

Direct sum

The sum of two subspaces is the subspace $U+V=\text{span}(U \cup V)=\{u+v \in W\:|\:u \in U \wedge v \in V\}$
If $U \cap V=\{\boldsymbol{0}\}$, then the sum is called the direct sum and is denoted by $U \oplus V$.

Put differently, $W=U\oplus V$ iff for every $w \in W$ there exist unique vectors $u \in U$ and $v \in V$ such that $w=u+v$.

If $U_i \leq W$ for $1 \leq i \leq k$, then $W=U_1\oplus \cdots \oplus U_k$ iff $W=U_1+U_2+\cdots+U_k$ and $U_r \cap \sum\limits_{i \neq r} U_i=\{0\}$ for $1 \leq r \leq k$.
Note: It is not sufficient that $U_i \cap U_j=\{0\}$ whenever $i \neq j$.

Examples:
The complex numbers are the direct sum of the real and purely imaginary numbers. $\mathbb{C}=\mathbb{R}\oplus i\mathbb{R}$

$\langle e_1 \rangle+\langle e_2,e_3 \rangle=\langle e_1 \rangle \oplus \langle e_2,e_3 \rangle$ is a direct sum but $\langle e_1,e_2 \rangle+\langle e_2,e_3 \rangle$ is not a direct sum since $\langle e_1,e_2 \rangle \cap \langle e_2,e_3 \rangle= \langle e_2 \rangle \neq \{\boldsymbol{0} \}$.

Any function is the sum of an even and an odd function.
$E=\{f \in F(\mathbb{R})|f(t)=f(-t)\},O=\{f \in F(\mathbb{R})|f(t)=-f(-t)\}$
$E\oplus O=F(\mathbb{R})$
$f(t)=\frac{1}{2}[f(t)+f(-t)-f(-t)+f(t)]=\frac{1}{2}[f(t)+f(-t)]+\frac{1}{2}[f(t)-f(-t)]$

Real square matrices are the direct sum of symmetric and skew-symmetric matrices. Denote the vector space of real square matrices by $M_{n \times n}(\mathbb{R})$.
Denote by $M_{n \times n}(\mathbb{R})_+, M_{n \times n}(\mathbb{R})_-$ the vector subspaces of symmetric and skew-symmetric matrices.
$M_{n \times n}(\mathbb{R})=M_{n \times n}(\mathbb{R})_+\oplus M_{n \times n}(\mathbb{R})_-$
Let $X \in M_{n \times n}(\mathbb{R})$. Then $X_+=\frac{1}{2}(X+X^T)$ and $X_-=\frac{1}{2}(X-X^T)$.
$X=\frac{1}{2}(X+X^T)+\frac{1}{2}(X-X^T)$
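A quick numerical illustration of this decomposition (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
X_plus = (X + X.T) / 2     # symmetric part
X_minus = (X - X.T) / 2    # skew-symmetric part
print(np.allclose(X, X_plus + X_minus))   # True: X = X_+ + X_-
print(np.allclose(X_plus, X_plus.T))      # True: X_+ is symmetric
print(np.allclose(X_minus, -X_minus.T))   # True: X_- is skew-symmetric
```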

The space of complex square matrices is the direct sum of the subspaces of Hermitian and skew-Hermitian matrices.
Denote the vector space of complex square matrices by $M_{n \times n}(\mathbb{C})$.
Denote by $M_{n \times n}(\mathbb{C})_+, M_{n \times n}(\mathbb{C})_-$ the vector subspaces of Hermitian and skew-Hermitian matrices.
$M_{n \times n}(\mathbb{C})=M_{n \times n}(\mathbb{C})_+\oplus M_{n \times n}(\mathbb{C})_-$
Let $A \in M_{n \times n}(\mathbb{C})$. Then $A_+=\frac{1}{2}(A+A^*)$ and $A_-=\frac{1}{2}(A-A^*)$.
$A=\frac{1}{2}(A+A^*)+\frac{1}{2}(A-A^*)$

Important idea:
If $U \cap V=\{ \boldsymbol{0} \}$ and $U, V$ are subspaces of a vector space with bases $\{u_1, \cdots, u_k\}$ and $\{v_1, \cdots, v_l\}$ respectively, then $\{u_1, \cdots, u_k, v_1, \cdots, v_l \}$ is a basis for $U \oplus V$. Therefore, knowing that $X \oplus Y=X+Y$ allows us to find a basis of $X+Y$ easily: we just find the union of bases of $X$ and $Y$. Namely, if $V=U_1 \oplus U_2 \oplus \cdots \oplus U_k$ and $B_i$ is a basis of $U_i$ then $B_1 \cup B_2 \cup \cdots \cup B_k$ is a basis of V. In particular, $\text{dim}\:V=\sum\limits_{i=1}^k \text{dim}\: U_i$.

Direct sum of vectors
Any vector $\vec{v}\in V$ can be written as a linear combination of the basis vectors, $\vec{v}=v_1e_1+\cdots+v_ne_n$. We normally express this fact in the form of a column vector, $\vec{v}=(v_1,v_2,\cdots,v_n)^T$. Similarly, for any vector $\vec{w}\in W$, we can write $\vec{w}=(w_1,w_2,\cdots,w_m)^T$. Regarded as elements of $V\oplus W$, these become $\vec{v}=(v_1,v_2,\cdots,v_n,0,0,\cdots,0)^T$ and $\vec{w}=(0,0,\cdots,0,w_1,w_2,\cdots,w_m)^T$. The direct sum of the two vectors is then $\vec{v}\oplus \vec{w}=(v_1,v_2,\cdots,v_n,w_1,w_2,\cdots,w_m)^T$.

Block-diagonal form
A matrix is actually a linear map from a vector space to another vector space. Let's consider the endomorphism $A:V\to V, \vec{v}\mapsto A\vec{v}$ or $\begin{pmatrix} v_1\\v_2\\ \vdots\\ v_n\end{pmatrix}\mapsto \begin{pmatrix} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn}\end{pmatrix}\begin{pmatrix} v_1\\v_2\\ \vdots\\ v_n\end{pmatrix}.$ Similarly, $B:W\to W, \vec{w}\mapsto B\vec{w}$. On the direct sum space $V\oplus W$, the matrix $A\oplus B$ still acts on the vectors such that $\vec{v}\mapsto A\vec{v}$ and $\vec{w}\mapsto B\vec{w}$. This is achieved by lining up all matrix elements in a block-diagonal form, $$A\oplus B=\begin{pmatrix}A&0_{n\times m}\\ 0_{m\times n}&B\end{pmatrix}.$$
For example, if $n=2,m=3$, and $A=\begin{pmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{pmatrix},B=\begin{pmatrix}b_{11}&b_{12}&b_{13}\\b_{21}&b_{22}&b_{23}\\b_{31}&b_{32}&b_{33}\end{pmatrix}$, then $$A\oplus B=\begin{pmatrix}a_{11}&a_{12}&0&0&0\\a_{21}&a_{22}&0&0&0\\0&0&b_{11}&b_{12}&b_{13}\\0&0&b_{21}&b_{22}&b_{23}\\0&0&b_{31}&b_{32}&b_{33}\end{pmatrix}.$$
When you act it on $\vec{v}\oplus \vec{w}$, $$(A\oplus B)(\vec{v}\oplus \vec{w})=\begin{pmatrix}A&0_{n\times m}\\ 0_{m\times n}&B\end{pmatrix}\begin{pmatrix}\vec{v}\\ \vec{w}\end{pmatrix}=\begin{pmatrix}A\vec{v}\\B\vec{w}\end{pmatrix}=(A\vec{v})\oplus(B\vec{w}).$$ If you have two matrices, their multiplications are done on each vector space separately, $$(A_1 \oplus B_1)(A_2\oplus B_2)=(A_1A_2)\oplus (B_1B_2).$$
Note that not every matrix on $V\oplus W$ can be written as a direct sum of a matrix on $V$ and one on $W$. The space of matrices on $V\oplus W$ has dimension $(n+m)^2$, while the direct sums account for only $n^2+m^2$ of those dimensions; the remaining $(n+m)^2-n^2-m^2=2mn$ dimensions correspond to matrices that cannot be written as a direct sum.

Other useful formulae are $\text{det}(A\oplus B)=\text{det}(A)\text{det}(B)$ and $\text{Tr}(A\oplus B)=\text{Tr}(A)+\text{Tr}(B)$.

More to learn:
geometric picture of sum of two subspaces and examples
http://algebra.math.ust.hk/vector_space/13_direct_sum/lecture2.shtml

more on direct sum
https://math.dartmouth.edu/archive/m22s02/public_html/VectorSpaces.pdf

examples
http://mathwiki.ucdavis.edu/Algebra/Linear_algebra/04._Vector_spaces/4.4_Sums_and_direct_sum

projection operators
https://people.maths.ox.ac.uk/flynn/genus2/alg0506/LALect02.pdf

problems on direct sum
http://en.wikibooks.org/wiki/Linear_Algebra/Combining_Subspaces

http://math.stackexchange.com/questions/1151335/what-is-direct-sum-decomposition
https://web.math.princeton.edu/~jonfick/2012SpMAT204/Direct_Sums.pdf
http://www.math.nyu.edu/faculty/hausner/quotientspaces.pdf
http://www.math.ttu.edu/~drager/Classes/05Summer/m5399/struct.pdf
http://www.mathsman.co.uk/LinearAlgebra-Ver1.4[1].pdf
http://planetmath.org/directsumofmatrices

Monday, 1 June 2015

Fields

A vector space is a non-empty set consisting of elements called vectors which can be added and multiplied by scalars.

The scalars that operate on vectors in a vector space form a field $\mathbb{F}$, where addition $+$ and multiplication $\cdot$ over $\mathbb{F}$ are performed between scalars.



Prerequisites
Groups (another algebraic structure)

A group, denoted by $(G, \cdot)$, is a non-empty set (finite or infinite) $G$ with a binary operator $\cdot$ such that the following four properties are satisfied:

Closure: if $a$ and $b$ belong to $G$, then $a\cdot b$ also belongs to $G$;
Associative: $a\cdot (b\cdot c)=(a\cdot b)\cdot c$ for all $a,b,c \in G$;
Identity element: there is an element $e \in G$ such that $a\cdot e=e\cdot a=a$ for every element $a \in G$;
Inverse element: for every element $a$ in $G$, there is an element $a'$ such that $a\cdot a'=a'\cdot a=e$, where $e$ is the identity element.

A group is not necessarily commutative, that is, it need not satisfy $a\cdot b=b\cdot a$ for all $a,b \in G$. When it does, the group is called an abelian group. If $G$ has finitely many elements, $G$ is said to be a finite group. The order of $G$ is the number of elements in $G$; it is denoted by $|G|$ or $\#G$.

Examples
Let's consider the set of integers, $\mathbb{Z}$ with addition and multiplication.

$\small \begin{array}{c|c|c} & \text{Addition} & \text{Multiplication} \\ \hline \text{Closure} & \text{a+b is an integer} & \text{a*b is an integer} \\ \text{Associativity} & a+(b+c)=(a+b)+c & a*(b*c)=(a*b)*c \\ \text{Existence of an identity element} & a+0=a & a*1=a \\ \text{Existence of inverse elements} & a+(-a)=0 & \color{blue}{\text{only for 1 and -1}: 1*1=1,\: -1*(-1)=1} \\ \text{Commutativity} & a+b=b+a & a*b=b*a \end{array}$

$(\mathbb{Z}, +)$ is an infinite abelian group but $(\mathbb{Z}, *)$ is not a group.
$(\mathbb{Z}, *)$ is not a group because most of the elements do not have inverses.


Now consider the set of remainders modulo a positive integer $n$, $\mathbb{Z}_n$ or $\mathbb{Z}/n\mathbb{Z}$, namely $\{0,1,2,\cdots,n-1\}$, with addition modulo $n$ and multiplication modulo $n$.

$\small \begin{array}{c|c|c} & \text{Addition mod n} & \text{Multiplication mod n} \\ \hline \text{Closure} & a+b \equiv c\:\text{mod n},\: 0\leq c\leq n-1 & a*b \equiv c\:\text{mod n},\: 0\leq c\leq n-1 \\ \text{Associativity} & a+(b+c)=(a+b)+c\:\text{mod n} & a*(b*c)=(a*b)*c\:\text{mod n} \\ \text{Existence of an identity element} & a+0=a\:\text{mod n} & a*1=a\:\text{mod n} \\ \text{Existence of inverse elements} & a+(n-a)=0\:\text{mod n} & \color{blue}{\text{only when a is coprime to n}} \\ \text{Commutativity} & a+b=b+a\:\text{mod n} & a*b=b*a\:\text{mod n}\end{array}$

$(\mathbb{Z}_n, +)$ is a finite abelian group of order $n$, but $(\mathbb{Z}_n, *)$ is not a group: $0$ is an element of $\mathbb{Z}_n$, and $0$ is not coprime to $n$, so $0$ has no multiplicative inverse.


Lastly, consider the set of remainders coprime to the modulus $n$. For example, when $n=10$, the set is $\{1,3,7,9\}$. In particular, when $n$ is a prime number, the set is $\{1,2,\cdots,n-1\}$. Let's call this set Coprime-n with addition modulo $n$ and multiplication modulo $n$.

$\small \begin{array}{c|c|c} & \text{Addition mod n} & \text{Multiplication mod n} \\ \hline \text{Closure} & \color{blue}{a+b\: \text{may not be in Coprime-n.}} & a*b \equiv c\:\text{mod n},\: \text{c is in Coprime-n} \\ \text{Associativity} & a+(b+c)=(a+b)+c\:\text{mod n} & a*(b*c)=(a*b)*c\:\text{mod n} \\ \text{Existence of an identity element} & \color{blue}{\text{null}} & a*1=a\:\text{mod n} \\ \text{Existence of inverse elements} & \color{blue}{\text{null}} & \text{exists for every a in Coprime-n} \\ \text{Commutativity} & a+b=b+a\:\text{mod n} & a*b=b*a\:\text{mod n}\end{array}$

Now (Coprime-n, +) is not a group and (Coprime-n, *) is a group. (Coprime-n, *) is abelian and finite. When $n$ is a prime number, the order of (Coprime-n, *) is $n-1$. Note that this only holds when $n$ is a prime number. For example, when $n=10$, the order of (Coprime-10, *) is $4$, not $9$.



A field is a non-empty set $F$ with two binary operators which are usually denoted by $+$ and $*$ (or omitted), that satisfy the field axioms:

(1) Closure: If $a,b \in \mathbb{F}$, then $a+b \in \mathbb{F}$ and $ab \in \mathbb{F}$.
(2) Commutativity: If $a,b \in \mathbb{F}$, there hold $a+b=b+a$ and $ab=ba$.
(3) Associativity: If $a,b,c \in \mathbb{F}$, there hold $(a+b)+c=a+(b+c)$ and $a(bc)=(ab)c$.
(4) Distributivity: For $a,b,c \in \mathbb{F}$, there holds $a(b+c)=ab+ac$.
(5) Existence of zero: There is a scalar, called zero, denoted by 0, such that $a+0=a$ for any $a \in \mathbb{F}$.
(6) Existence of unity: There is a scalar different from zero, called one, denoted by $1$, such that $1a=a$ for any $a \in \mathbb{F}$.
(7) Existence of additive inverse: For any $a \in \mathbb{F}$, there is a scalar, denoted by $-a$, such that $a+(-a)=0$.
(8) Existence of multiplicative inverse: For any $a \in \mathbb{F}\setminus \{0 \}$, there is a scalar, denoted by $a^{-1}$, such that $aa^{-1}=1$.

If the set $F$ is finite, then the field is said to be a finite field. The order of a finite field is the number of elements in the finite field. Finite fields have practical applications in coding theory, cryptography, algebraic geometry and number theory.

Examples of fields: $\mathbb{Q}, \mathbb{R}, \mathbb{C}$

Nonexample of fields: $(\mathbb{Z}, +, *)$ does not form a field because there are no multiplicative inverses for its non-unit elements. (For any $a \in \mathbb{Z}$ other than $\pm 1$, $a^{-1}$ is not in $\mathbb{Z}$.)

$(\mathbb{Z}_n, +, *)$ in general is not a finite field. For example, $\mathbb{Z}_8 \setminus \{0\}=\{1,2,3,4,5,6,7\}$ along with modulo $8$ multiplication does not form a group. However, when $n$ is a prime number, say $5$, $\mathbb{Z}_5 \setminus \{0\}=\{1,2,3,4\}$ along with modulo $5$ multiplication forms the abelian group $\mathbb{Z}_5^*$. Therefore, $(\mathbb{Z}_5, +, *)$ is a finite field. [...]
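The contrast between $\mathbb{Z}_5$ and $\mathbb{Z}_8$ can be checked by brute force (our own sketch): every nonzero element of $\mathbb{Z}_5$ has a multiplicative inverse mod $5$, while in $\mathbb{Z}_8$ only the elements coprime to $8$ do.

```python
def units(n):
    # elements of {1, ..., n-1} that have a multiplicative inverse mod n
    return {a for a in range(1, n) if any(a * b % n == 1 for b in range(1, n))}

print(units(5))   # {1, 2, 3, 4}  -> Z_5 \ {0} is a group, so Z_5 is a field
print(units(8))   # {1, 3, 5, 7}  -> 2, 4, 6 have no inverses, so Z_8 is not a field
```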



A set $U$ is a vector space over a field $\mathbb{F}$ if $U$ is non-empty and there is addition between the elements of $U$, called vectors, and scalar multiplication between elements in $\mathbb{F}$, called scalars, and vectors, such that the following axioms hold.
(1) (Closure) For $u,v \in U$, we have $u+v \in U$. For $u \in U$ and $a \in \mathbb{F}$, we have $au \in U$.
(2) (Commutativity) For $u,v \in U$, we have $u+v=v+u$.
(3) (Associativity) For $u,v,w \in U$, we have $u+(v+w)=(u+v)+w$.
(4) (Existence of zero vector) There is a vector, called zero and denoted by $0$ such that $u+0=u$ for any $u \in U$.
(5) (Existence of additive inverse) For any $u \in U$, there is a vector, denoted by $(-u)$, such that $u+(-u)=0$.
(6) (Associativity of scalar multiplication) For any $a,b \in \mathbb{F}$ and $u \in U$, we have $a(bu)=(ab)u$.
(7) (Property of unit scalar) For any $u \in U$, we have $1u=u$.
(8) (Distributivity) For any $a,b \in \mathbb{F}$ and $u,v \in U$, we have $(a+b)u=au+bu$ and $a(u+v)=au+av$.

Reference:
http://www.doc.ic.ac.uk/~mrh/330tutor/index.html

More to explore:
http://www.math.uconn.edu/~kconrad/blurbs/galoistheory/finitefields.pdf
http://aix1.uottawa.ca/~jkhoury/coding.htm

Application of finite fields: coding theory
http://mathworld.wolfram.com/Error-CorrectingCode.html