
Friday, 29 April 2016

Dimension of Vector Space

Determine whether each of the following is a vector space and find the dimension and a basis for each that is a vector space:
i) $\mathbb{C}$ over $\mathbb{C}$.
ii) $\mathbb{C}$ over $\mathbb{R}$.
iii) $\mathbb{R}$ over $\mathbb{C}$.
iv) $\mathbb{R}$ over $\mathbb{Q}$.
v) $\mathbb{Q}$ over $\mathbb{R}$.
vi) $\mathbb{Q}$ over $\mathbb{Z}$.
vii) $\mathbb{S}=\{a+b\sqrt{2}+c\sqrt{5}\:|\:a,b,c \in \mathbb{Q}\}$ over $\mathbb{Q}$, $\mathbb{R}$, or $\mathbb{C}$.

Answers:
i) Yes. $\{1\}$ is a basis. Dimension is $1$.
[Say $1+2i,1+i\in \Bbb C$. $(1+2i)\cdot (1+i)=-1+3i=1\cdot (-1+3i)$, where $1$ is the basis element and $-1+3i$ is in the field $\Bbb C$.]

ii) Yes. $\{1,i\}$ is a basis. Dimension is $2$.
[Say $1+i\in \Bbb C$ and $2\in \Bbb R$. $2\cdot (1+i)=2+2i=2\cdot 1+2\cdot i$, where $1,i$ are elements in the basis and $2$ is in the field $\Bbb R$.]

iii) No. $i\in \Bbb C$ and $1\in \Bbb R$, but $i \cdot 1=i\notin \Bbb R$.
iv) Yes. Since $\pi$ is transcendental, $\{1,\pi,\pi^2,\cdots\}$ are linearly independent over $\Bbb Q$. Dimension is infinite.
v) No. $\sqrt{2}\in \Bbb R$ and $1\in \Bbb Q$, but $\sqrt{2} \cdot 1=\sqrt{2}\notin \Bbb Q$.
vi) No. $\Bbb Z$ is not a field.
vii) Yes only over $\Bbb Q$. $\{1,\sqrt{2},\sqrt{5}\}$ is a basis. Dimension is $3$.

From iii and v, we see that for a field $K$ to be a vector space over a field $F$ under the usual operations, $F$ must be contained in $K$; the scalar field can never be 'bigger' than the vector space. In fact, this is related to the concept of field extension (to be discussed in our next post).
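For (iv) and (vii), the linear independence claims can be probed numerically with an integer-relation search. This is only a sanity check, not a proof; the sketch below is my own and assumes the mpmath library, whose PSLQ routine returns None when it finds no small integer relation among the inputs at the working precision.

```python
from mpmath import mp, mpf, sqrt, pi, pslq

mp.dps = 60  # work with 60 significant digits

# Look for integer relations c0*1 + c1*sqrt(2) + c2*sqrt(5) = 0 (and similarly with pi).
print(pslq([mpf(1), sqrt(2), sqrt(5)]))  # expected: None, consistent with independence over Q
print(pslq([mpf(1), pi, pi**2]))         # expected: None
print(pslq([mpf(1), sqrt(2), sqrt(8)]))  # sqrt(8) = 2*sqrt(2): a relation such as [0, 2, -1]
```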



Let $V=\{(x,y)\:|\:x,y\in \Bbb C\}$. Under the standard addition and scalar multiplication for ordered pairs of complex numbers, is $V$ a vector space over $\Bbb C$? Over $\Bbb R$? Over $\Bbb Q$? If so, find the dimension of $V$.

Answers:
From the previous question, we know $\{(1,0),(0,1)\}$ form a basis for $V$ over $\Bbb C$. Dimension is $2$. $\{(1,0),(i,0),(0,1),(0,i)\}$ form a basis for $V$ over $\Bbb R$. Dimension is $4$. Lastly, the dimension of $V$ over $\Bbb Q$ is infinite.

Saturday, 23 April 2016

A problem on diagonalisation

Give examples of the following types of operators defined by a $2\times 2$ matrix or explain why such an operator can't exist:
i) diagonalisable, invertible,
ii) diagonalisable, not invertible,
iii) not diagonalisable, invertible,
iv) not diagonalisable, not invertible.



Let's just consider the case in $\Bbb R$.

i) $$\boxed{\begin{pmatrix}a&0\\0&b\end{pmatrix},\quad \text{a and b are not necessarily different},\quad a,b\neq 0}$$ Any diagonal matrix $D$ is diagonalisable: $$I_n DI_n=D.\\ \boxed{\begin{pmatrix}a&b\\b&a\end{pmatrix},\quad a\neq b,\quad a\neq -b,\quad\text{a can be zero.}}$$ Calculations: $(a-\lambda)^2-b^2=0\Rightarrow \lambda=a-b,a+b$. Assuming $b\neq 0$ (if $b=0$ the matrix is already diagonal), $$\ker\begin{pmatrix}a-(a+b)&b\\b&a-(a+b)\end{pmatrix}=\text{span}\bigg\{\begin{pmatrix}1\\1\end{pmatrix}\bigg\}\\ \ker\begin{pmatrix}a-(a-b)&b\\b&a-(a-b)\end{pmatrix}=\text{span}\bigg\{\begin{pmatrix}1\\-1\end{pmatrix}\bigg\}$$ The two eigenvectors form an eigenbasis. Alternatively, one can use the result: if $A\in M_{n\times n}(\Bbb F)$ has $n$ distinct eigenvalues in $\Bbb F$, then $A$ is diagonalisable over $\Bbb F$. $$\boxed{\begin{pmatrix}a&b\\b&c\end{pmatrix},\quad a\neq c,\quad b\neq 0,\quad ac\neq b^2}$$ The calculation is similar, but involves the quadratic formula. Since $a\neq c$ and $b\neq 0$, the discriminant $(a-c)^2+4b^2$ is positive, so there are two distinct eigenvalues and the matrix is diagonalisable (in fact, every real symmetric matrix is diagonalisable by the spectral theorem); the extra condition $ac\neq b^2$ makes the determinant non-zero. $$\boxed{\begin{pmatrix}a&b\\0&c\end{pmatrix},\quad a\neq c,\quad b\neq 0,\quad a,c\neq 0}$$ Upper triangular matrices with distinct diagonal entries are diagonalisable for the same reason: they have distinct eigenvalues. Requiring $a,c\neq 0$ makes the determinant $ac$ non-zero. The above matrices are invertible since their determinants are non-zero.

ii) $$\boxed{\begin{pmatrix}a&0\\0&0\end{pmatrix},\quad a\neq 0}$$ Again, any diagonal matrix is diagonalisable. $$\boxed{\begin{pmatrix}a&a\\a&a\end{pmatrix},\quad a\neq 0}$$ Calculation: $(a-\lambda)^2-a^2=0\Rightarrow \lambda=0,2a$. When $a=1$, the matrix is called matrix of ones. $$\boxed{\begin{pmatrix}a&a\\b&b\end{pmatrix}, \begin{pmatrix}a&b\\a&b\end{pmatrix},\quad a+b\neq 0,\;\;\text{a,b not both zero.}}$$Calculation: $(a-\lambda)(b-\lambda)-ab=0\Rightarrow \lambda=0,a+b$.

iii) $$\boxed{\begin{pmatrix}a&b\\0&a\end{pmatrix},\begin{pmatrix}a&0\\b&a\end{pmatrix},\quad \text{a and b are not necessarily different},\quad a,b\neq 0}$$ These are known as the shear transformations, which are not diagonalisable. Proof by contradiction: Suppose $\begin{pmatrix}a&b\\0&a\end{pmatrix}$ is diagonalisable. Then there exist a diagonal matrix $D$ and an invertible matrix $V$ s.t. $$\begin{pmatrix}a&b\\0&a\end{pmatrix}=V^{-1}DV.$$ We know just by looking at the matrix that $a$ is an eigenvalue with multiplicity $2$. So $D=\begin{pmatrix}a&0\\0&a\end{pmatrix}=aI_2$. We then have $\begin{pmatrix}a&b\\0&a\end{pmatrix}=aV^{-1}I_2V=aI_2=\begin{pmatrix}a&0\\0&a\end{pmatrix}$ which is a contradiction.

iv) $$\boxed{\begin{pmatrix}0&a\\0&0\end{pmatrix},\quad a\neq 0}$$ Its only eigenvalue is $0$ of multiplicity $2$; if it were diagonalisable it would be similar to the zero matrix and hence equal to it, a contradiction. Its determinant is $0$, so it is not invertible.
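These four cases can be spot-checked computationally. A quick sketch using sympy; the particular numerical entries are my own choices, one per case, not part of the original problem:

```python
from sympy import Matrix

# One sample matrix for each of the four cases above.
examples = {
    "i)   diagonalisable, invertible":        Matrix([[2, 3], [3, 2]]),  # symmetric, det = -5
    "ii)  diagonalisable, not invertible":    Matrix([[1, 1], [2, 2]]),  # eigenvalues 0 and 3
    "iii) not diagonalisable, invertible":    Matrix([[2, 1], [0, 2]]),  # shear-type
    "iv)  not diagonalisable, not invertible": Matrix([[0, 1], [0, 0]]), # nilpotent
}

for label, M in examples.items():
    print(label, "| diagonalisable:", M.is_diagonalizable(), "| invertible:", M.det() != 0)
```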

From this problem, we see that invertibility is neither necessary nor sufficient for diagonalisability! Another important fact that one can observe from (ii) and (iv) is that if a square matrix $A$ is not invertible, then $\lambda=0$ is an eigenvalue of $A$. The converse also holds: if $\lambda=0$ is an eigenvalue of $A$, then $A$ is not invertible.

Claim: $A\in M_{n\times n}(\Bbb F)$ is singular iff $\lambda=0$ is an eigenvalue of $A$.
Proof: $\lambda=0$ is a solution of the characteristic equation $\lambda^n+c_1\lambda^{n-1}+\cdots+c_n=0$ iff $c_n=0$. By definition, $\det(\lambda I-A)=\lambda^n+c_1\lambda^{n-1}+\cdots+c_n$. When $\lambda=0$, we have $\det(\lambda I-A)=\det(-A)=(-1)^n\det(A)=c_n$. We thus have $\det(A)=0$ iff $c_n=0$ iff $\lambda=0$ is an eigenvalue of $A.\;\Box$



Here's a summary:

i) diagonalisable, invertible: invertible symmetric matrices, upper triangular matrices with distinct non-zero diagonal entries, invertible diagonal matrices
ii) diagonalisable, not invertible: singular matrices with distinct eigenvalues, scalar multiples of the matrix of ones
iii) not diagonalisable, invertible: shear
iv) not diagonalisable, not invertible: non-zero nilpotent matrices.

I tried to enumerate all possibilities but failed. I can't imagine the amount of work it takes to characterise diagonalisation for $3\times 3$ matrices...

Questions:
$$\boxed{\begin{pmatrix}a&b\\c&d\end{pmatrix},\quad (a-d)^2=-4bc,\quad ad-bc\neq 0}$$ has repeated eigenvalues, so I consider it fitting (iii), but can the kernel corresponding to that one eigenvalue be spanned by two eigenvectors? Similar questions for $$\boxed{\begin{pmatrix}a&a\\a&b\end{pmatrix},\quad 5a^2+b^2=2ab,\quad a(b-a)\neq 0}\\
\boxed{\begin{pmatrix}a&a\\b&c\end{pmatrix},\quad a^2+c^2=2ac-4ab,\quad a(c-b)\neq 0}\\
\boxed{\begin{pmatrix}0&a\\b&c\end{pmatrix},\quad c^2=-4ab,\quad ab\neq 0}$$ I'm not even sure if there are other possibilities for (iii) and (iv).

Tuesday, 8 March 2016

Evaluating Vandermonde Determinants

Suppose we have four points $(x_1,y_1),(x_2,y_2),(x_3,y_3),(x_4,y_4)$. If we want to find a cubic polynomial $a+bx+cx^2+dx^3$ passing through these points, we have to solve $$\underbrace{\begin{pmatrix}1&x_1&x_1^2&x_1^3\\1&x_2&x_2^2&x_2^3\\1&x_3&x_3^2&x_3^3\\1&x_4&x_4^2&x_4^3\end{pmatrix}}_{=A}\begin{pmatrix}a\\b\\c\\d\end{pmatrix}=\begin{pmatrix}y_1\\y_2\\y_3\\y_4\end{pmatrix}.$$ There is always a unique solution when $A$ is invertible. $A$ is actually a Vandermonde matrix, which is invertible if all the $x_i$'s are distinct. How do we evaluate a Vandermonde determinant? As an example, let's consider $$A=\begin{pmatrix}1&3&9&27\\1&2&4&8\\1&4&16&64\\1&5&25&125\end{pmatrix}.$$ Replace the last row with $(1,x,x^2,x^3)$ and consider the determinant $$f(x)=\begin{vmatrix}1&3&9&27\\1&2&4&8\\1&4&16&64\\1&x&x^2&x^3\end{vmatrix}.$$ We see that $f(2)=f(3)=f(4)=0$ (two rows then coincide) and $\det(A)=f(5)$. Since $f(x)$ is a cubic polynomial, we must have $f(x)=k(x-2)(x-3)(x-4)$, where $k$ is a constant. Now we perform cofactor expansion along the last row: $$f(x)=1\cdot c_{41}+x\cdot c_{42}+x^2\cdot c_{43}+x^3\cdot c_{44},$$ where $c_{4j}$ denotes the cofactor of the $(4,j)$ entry. Comparing the coefficients of $x^3$ of the two expressions, we have $$k=c_{44}=\begin{vmatrix}1&3&9\\1&2&4\\1&4&16\end{vmatrix}.$$ Do you notice something? This problem is related to recursion! Using the same trick, we can argue that $c_{44}=(4-3)(4-2)(2-3)=-2$. Finally, $f(5)=-2(5-2)(5-3)(5-4)=-12$.
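As a sanity check, here is a short numpy sketch (my own) that builds the same Vandermonde matrix and compares its determinant with the product formula $\prod_{i<j}(x_j-x_i)$; the node order $3,2,4,5$ matches the rows above.

```python
import numpy as np
from itertools import combinations

x = [3, 2, 4, 5]                       # nodes, in the same row order as A above
A = np.vander(x, increasing=True)      # rows are [1, x_i, x_i^2, x_i^3]

det_numeric = np.linalg.det(A)
det_formula = np.prod([x[j] - x[i] for i, j in combinations(range(4), 2)])

print(round(det_numeric), det_formula)   # both should be -12
```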

Reference:
vandermonde

Sunday, 9 August 2015

Orthogonal functions

Legendre polynomials (important in spherical harmonics)
There are three general ways of forming these polynomials.
1. Schmidt orthogonalisation of $\{x^n\}$ on $[-1,1]$
Let $p_1(x)=\dfrac{1}{\lVert 1\rVert}$. Then $\lVert 1\rVert^2=\int_{-1}^1 1dx=2$ and thus $\boxed{p_1(x)=\dfrac{1}{\sqrt{2}}}$. Notice that $\int_{-1}^1 f(x)dx=0$ for any odd function $f(x)$. So $x$ is orthogonal to $1$. We then find $p_2(x)=\dfrac{x}{\lVert x \rVert}$. Since $\lVert x \rVert^2=\int_{-1}^1 x^2dx=\dfrac{2}{3}$, we have $\boxed{p_2(x)=\sqrt{\dfrac{3}{2}}x}$. Although $x^2$ is orthogonal to $x$ (and hence $p_2$), it is not orthogonal to $1$. Let $q_3(x)=x^2-\langle x^2,p_1 \rangle p_1$, and then normalise to get $p_3=\dfrac{q_3}{\lVert q_3 \rVert}$. Now $\langle x^2,p_1 \rangle=\dfrac{1}{\sqrt{2}}\int_{-1}^1 x^2 dx=\dfrac{\sqrt{2}}{3}$, so $q_3(x)=x^2-\dfrac{\sqrt{2}}{3}\dfrac{1}{\sqrt{2}}=x^2-\dfrac{1}{3}$. $\lVert q_3 \rVert^2=\int_{-1}^1 (x^2-\dfrac{1}{3})^2dx=\dfrac{8}{45}$. Thus $\boxed{p_3(x)=\sqrt{\dfrac{45}{8}}(x^2-\dfrac{1}{3})=\sqrt{\dfrac{5}{2}}(\dfrac{3}{2}x^2-\dfrac{1}{2})}$.
The whole orthonormal set of functions is $\{\sqrt{\dfrac{2n+1}{2}}P_n(x)\}$, where $P_n(x)$ denotes the Legendre polynomial of degree $n$.
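The same computation can be automated. Here is a small sympy sketch of the orthonormalisation (my own code, following the steps above):

```python
from sympy import symbols, integrate, sqrt, simplify

x = symbols('x')

def inner(f, g):
    """L^2 inner product on [-1, 1]."""
    return integrate(f * g, (x, -1, 1))

def gram_schmidt(funcs):
    """Orthonormalise a list of functions on [-1, 1]."""
    ortho = []
    for f in funcs:
        for p in ortho:                       # subtract projections onto earlier functions
            f = f - inner(f, p) * p
        ortho.append(simplify(f / sqrt(inner(f, f))))
    return ortho

p1, p2, p3 = gram_schmidt([1, x, x**2])
print(p1, p2, p3)
# expected: sqrt(2)/2, sqrt(6)*x/2, and a multiple of x**2 - 1/3 (i.e. sqrt(10)*(3*x**2 - 1)/4)
```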

2. Solution of Legendre's differential equation
$$\dfrac{d}{dx}[(1-x^2)\dfrac{df}{dx}]+l(l+1)f=0,$$ where $l$ is a positive integer or zero.
One can verify that the polynomials that are generated by orthogonalisation of $\{x^n\}$ on $[-1,1]$ are solutions of this differential equation. Now we attempt a power series solution $\sum_n a_nx^n$ for $f$. Then $$\dfrac{d}{dx}[(1-x^2)\sum_n na_nx^{n-1}]+[l(l+1)\sum_n a_nx^n]=0\\ \dfrac{d}{dx}[\sum_n na_nx^{n-1}-\sum_n na_nx^{n+1}]+\sum_n l(l+1)a_nx^n=0\\ \sum_n n(n-1)a_nx^{n-2}-\sum_n n(n+1)a_nx^n+\sum_n l(l+1)a_nx^n=0.$$ Collecting all similar powers of $x$, we obtain $$\sum_n x^n\underbrace{[(n+2)(n+1)a_{n+2}-n(n+1)a_n+l(l+1)a_n]}_{(*)}=0.$$ Since the functions $\{x^n\}$ are linearly independent, $(*)$ must be zero. This gives the recurrence relation for the expansion coefficients: $$a_{n+2}=\dfrac{[n(n+1)-l(l+1)]a_n}{(n+2)(n+1)}.$$ This equation allows us to find every other coefficient when we are given the first. For example, if $a_0=1$, then $a_2=\dfrac{-l(l+1)}{2},a_4=\dfrac{6-l(l+1)}{12}\cdot \dfrac{-l(l+1)}{2}$, and so on.

If $l$ is even, $n$ eventually equals $l$ for some term; then $n(n+1)=l(l+1)$, so $a_{l+2}$ and all subsequent even coefficients vanish. Thus the series of even powers terminates at $n=l$, whereas the infinite series of odd powers diverges at $x=\pm 1$. The finite series of even powers gives $$\begin{align}l=0\quad &a_0=1\quad &f_0=1\\ l=2\quad &a_0=1\\ &a_2=-3\quad &f_2=-3x^2+1,\end{align}$$ which are proportional to the Legendre polynomials derived by orthogonalisation. We can choose $a_0$ such that the polynomial satisfies $f_l(+1)=1$. Then $$\begin{align}l=0\quad &a_0=1\quad &P_0=1\\l=2\quad &a_0=\dfrac{-1}{2}\\&a_2=\dfrac{3}{2}\quad &P_2=\dfrac{3}{2}x^2-\dfrac{1}{2}.\end{align}$$
If $l$ is odd, the series in odd powers terminates at $n=l$, and the series in even powers is an infinite series, divergent at $x=\pm 1$. Again, we choose $a_1$ such that $f_l(+1)=1$. Then $$\begin{align}l=1\quad &a_1=1\quad &P_1=x\\ l=3\quad &a_1=\dfrac{-3}{2}\\ &a_3=\dfrac{5}{2}\quad &P_3=\dfrac{5}{2}x^3-\dfrac{3}{2}x.\end{align}$$
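The recurrence can be coded directly. The sketch below (my own, using exact fractions) starts the even or odd series, applies the recurrence until it terminates at $n=l$, and rescales so that $P_l(1)=1$; the results agree with $P_2$ and $P_3$ above.

```python
from fractions import Fraction

def legendre_coeffs(l):
    """Coefficients a_0..a_l of P_l(x) from the recurrence
    a_{n+2} = [n(n+1) - l(l+1)] a_n / [(n+2)(n+1)], normalised so that P_l(1) = 1."""
    a = [Fraction(0)] * (l + 1)
    a[l % 2] = Fraction(1)                       # start the even or odd series with 1
    for n in range(l % 2, l - 1, 2):
        a[n + 2] = Fraction(n * (n + 1) - l * (l + 1), (n + 2) * (n + 1)) * a[n]
    scale = sum(a)                               # value of the unscaled polynomial at x = 1
    return [c / scale for c in a]

print(legendre_coeffs(2))   # coefficients of 1, x, x^2: -1/2, 0, 3/2   -> P_2 = (3/2)x^2 - 1/2
print(legendre_coeffs(3))   # coefficients 0, -3/2, 0, 5/2              -> P_3 = (5/2)x^3 - (3/2)x
```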
3. Generating function
A generating function for a set of functions is a function of two variables such that when this function is expanded as a power series in one of the variables, the coefficients are the set of functions in the other variable. Often many useful relations may be derived from a generating function; nonetheless, there are no general methods for constructing a generating function.

The generating function for Legendre polynomials is $$F(x,t)=(1-2xt+t^2)^{-1/2}=\sum_n P_n(x)t^n.$$ It can be expanded as a binomial series $$F(x,t)=1+\dfrac{1\cdot 1}{2\cdot 1!}(2xt-t^2)+\dfrac{1\cdot 3\cdot 1}{2\cdot 2\cdot 2!}(2xt-t^2)^2+\frac{1\cdot 3\cdot 5\cdot 1}{2\cdot 2\cdot 2\cdot 3!}(2xt-t^2)^3+\cdots$$ and then rearranged in powers of $t$: $$\begin{align}F(x,t)&=1+(x)(t)+(\dfrac{-1}{2}+\dfrac{3}{2}x^2)(t^2)+(\dfrac{-3}{2}x+\dfrac{5}{2}x^3)(t^3)+\cdots\\&=\sum_n P_n(x)t^n.\end{align}$$
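If you want to see the rearrangement done mechanically, here is a sympy sketch (my own, not from the book) that expands the generating function in $t$ and compares the coefficients with the built-in Legendre polynomials:

```python
from sympy import symbols, sqrt, series, expand, legendre, simplify

x, t = symbols('x t')

F = 1 / sqrt(1 - 2*x*t + t**2)
expansion = series(F, t, 0, 4).removeO()      # expand in powers of t up to t^3

for n in range(4):
    coeff = expand(expansion).coeff(t, n)     # coefficient of t^n, a polynomial in x
    print(n, coeff, simplify(coeff - legendre(n, x)) == 0)   # True: it equals P_n(x)
```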
Hermite polynomials (important in Quantum Mechanics)
They are solutions of the differential equation
$$y''-2xy'+2py=0\quad y(0)=1,\quad y'(0)=1.$$
Chebyshev polynomials (important in numerical analysis)
They are solutions of the differential equation
$$(1-x^2)y''-xy'+n^2y=0\quad y(0)=1,\quad y'(0)=0.$$
They can also be represented as $$T_n(x)=\cos (n\cos^{-1} x).$$
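A quick numerical sanity check of that representation for $n=3$ (my own snippet; it uses the standard fact that $T_3(x)=4x^3-3x$):

```python
import math

# T_3(x) = 4x^3 - 3x should coincide with cos(3*arccos(x)) on [-1, 1].
for x in [-0.9, -0.3, 0.0, 0.5, 0.8]:
    lhs = math.cos(3 * math.acos(x))
    rhs = 4 * x**3 - 3 * x
    print(x, round(lhs - rhs, 12))   # differences should be ~0
```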
Applications
$\min \int_{-1}^1 |x^3-a-bx-cx^2|^2\,dx$
Geometrically, the minimum, call it $Q$, is the square of the distance from $x^3$ to the subspace $S$ spanned by the functions $1,x,x^2$. We can compute the orthogonal projection $\phi(x)$ of $x^3$ onto $S$ and use Pythagoras' theorem: $Q=\lVert x^3 \rVert^2-\lVert \phi(x)\rVert^2$. We first need to transform the basis $\{1,x,x^2\}$ into an orthonormal one, namely $\{p_1,p_2,p_3\}$ as above. $\phi(x)=\langle x^3,p_1\rangle p_1+\langle x^3,p_2 \rangle p_2+\langle x^3,p_3 \rangle p_3$. $x^3$ is orthogonal to $p_1$ and $p_3$ by oddness. Since $\langle x^3,p_2 \rangle=\sqrt{\dfrac{3}{2}}\int_{-1}^1 x^3\cdot x\, dx=\sqrt{\dfrac{3}{2}}\cdot\dfrac{2}{5}$, we have $\lVert\phi(x)\rVert^2=\langle x^3,p_2 \rangle^2\lVert p_2 \rVert^2=\dfrac{3}{2}\cdot\dfrac{4}{25}=\dfrac{6}{25}$. With $\lVert x^3 \rVert^2=\int_{-1}^1 x^6 dx=\dfrac{2}{7}$, we have $Q=\dfrac{2}{7}-\dfrac{6}{25}=\dfrac{8}{175}$.
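The same minimum can be found directly by calculus, which gives an independent check of the projection argument. A sympy sketch (my own):

```python
from sympy import symbols, integrate, diff, solve, simplify

x, a, b, c = symbols('x a b c')

Q = integrate((x**3 - a - b*x - c*x**2)**2, (x, -1, 1))    # squared L^2 distance
critical = solve([diff(Q, v) for v in (a, b, c)], (a, b, c))
print(critical)                                  # expected: a = 0, c = 0, b = 3/5
print(simplify(Q.subs(critical)))                # expected: 8/175
```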

physical applications [later]

The Legendre polynomials are important in some quantum-chemical problems. They are the basis for the wave functions for angular momentum, and thus occur in problems involving spherical motion, such as that of the electron in a hydrogen atom or the rotations of a molecule. They are also important in describing the angular dependence of one-electron atomic orbitals; this dependence, in turn, forms the basis of the geometry of chemical compounds.

Reference:
Mathematics for Quantum Chemistry by Jay Martin Anderson

Saturday, 8 August 2015

Mathematics for quantum chemistry

Introduction
Quantum mechanics grew out of two different points of view, which represent two analogous mathematical formulations of eigenvalue problems. One is the wave mechanics of Schrödinger. In wave mechanics, operators are differential expressions, such as $\dfrac{d}{dx}$, the eigenvalue equation takes the form of a differential equation, and relies on calculus for its solution. The other formulation is the matrix mechanics of Heisenberg, in which operators are represented by algebraic entities called matrices; instead of a function in the eigenvalue equation, the matrix operator operates on a vector $\zeta$ to transform $\zeta$ into a vector parallel to $\zeta$, but $q$ times as long: $$Q\zeta=q\zeta.$$ This is the matrix-mechanical formulation of the eigenvalue problem. The solution of this form of the eigenvalue problem relies on algebra.

These two approaches to quantum mechanical problems are deeply interrelated; the work of Dirac shows the underlying equivalence of the two points of view, as well as of the corresponding mathematical techniques. [...]

Orthogonal functions
Definition: The expansion interval is the range of the independent variables assumed by the functions under consideration.

Definition: The inner product of the two complex-valued functions $f$ and $g$ of a continuous variable on their expansion interval $[a,b]$ is $$\langle f|g \rangle=\int_a^b f(x)^*g(x)dx.$$
The order is not important for real-valued functions but important for complex-valued functions: $$\begin{align}\langle g|f \rangle&=\int g(x)^*f(x)dx\\&=\big(\int f(x)^*g(x)dx\big)^*\\ &=\langle f|g \rangle^*.\end{align}$$

Definition: Two functions $f(x)$ and $g(x)$ are said to be orthogonal on the interval $[a,b]$ if their inner product on $[a,b]$ is zero: $$\langle f|g \rangle=\int_a^b f^*g=0=\int_a^b g^*f=\langle g|f \rangle.$$

Definition: The norm of a function on the interval $[a,b]$ is the inner product of the function with itself, and may be symbolised by $N$: $$N(f)=\langle f|f \rangle=\int_a^b f^*f.$$

The norm of a function is a real, positive quantity: $$f^*f=(\text{Re}\:f-i\text{Im}\:f)(\text{Re}\:f+i\text{Im}\:f)=(\text{Re}\:f)^2+(\text{Im}\:f)^2.$$

Definition: A function is said to be normalised if its norm is one: $\langle f|f \rangle=1$.
Suppose $f$ has a norm $N$. Then the function $\dfrac{f}{\sqrt{N}}$ will have a norm of one, since $$\langle \dfrac{f}{\sqrt{N}}|\dfrac{f}{\sqrt{N}} \rangle=\dfrac{1}{N}\langle f|f \rangle=\dfrac{N}{N}=1.$$
Reminder: The expansion interval must be specified before a statement about orthogonality or normality can be made. On $[-1,1]$, $\langle x|x^2 \rangle=\int_{-1}^1 x^*x^2 dx=\int_{-1}^1 x^3 dx=\dfrac{x^4}{4}|_{-1}^1=0$, so we can say that on $[-1,1]$, $x$ and $x^2$ are orthogonal functions. However, on $[0,1]$, $\langle x|x^2 \rangle=\int_0^1 x^3 dx=\dfrac{x^4}{4}|_0^1=\dfrac{1}{4}\neq 0$, so the functions are not orthogonal in this interval.

Definition: A set of functions $\{F_i\}$ is complete if any function $f$ may be expressed as a linear combination of members of the set $\{F_i\}$ on a prescribed expansion interval. [Namely, if the set $\{F_i\}$ is complete, then we may expand $f$ in terms of the functions $F_i$: $f(x)=\sum_{n=1}^\infty a_n F_n(x)$.]

Definition: An orthogonal set of functions is a set of functions each of which is orthogonal on a prescribed interval to all other members of the set. [That is, the set $\{F_i\}$ is an orthogonal set if every member is orthogonal to every other member: $\langle F_i|F_j \rangle=0$ for all $i,j$ such that $i\neq j$.]

Example: The set $\{\sin nx, \cos nx\}$, on $[-\pi,\pi]$, for $n$ zero or positive integers, is an orthogonal set of functions.

Definition: An orthonormal set of functions is an orthogonal set of functions, each of which is normalised. [That is, the set $\{F_i\}$ is orthonormal if $\langle F_i|F_j \rangle=0$ for all $i\neq j$ and $\langle F_i|F_i \rangle=1$ for all $i$.]

These two equations occur so often in discussing orthonormal functions that a special symbol has been introduced to combine them. The Kronecker delta symbol $\delta_{ij}$ has the meaning $\delta_{ij}=0$ for $i\neq j$, $\delta_{ij}=1$ for $i=j$.

The Fourier series is an expansion of a function in the orthonormal functions which are proportional to $\{\sin mx,\cos mx\}$. Let us derive their norm on the interval $[-\pi,\pi]$. $$N(\sin mx)=\langle \sin mx|\sin mx \rangle=\int_{-\pi}^\pi \sin^2 mx dx=\dfrac{1}{m}\int_{-m\pi}^{m\pi}\sin^2 y dy=\pi.$$ Similarly, $$N(\cos mx)=\langle \cos mx|\cos mx \rangle=\dfrac{1}{m}\int_{-m\pi}^{m\pi}\cos^2 y dy=\pi.$$ Also, $$N(\sin 0x)=\langle \sin 0x|\sin 0x \rangle=\int_{-\pi}^\pi 0 dx=0$$ and $$N(\cos 0x)=\langle \cos 0x|\cos 0x \rangle=\int_{-\pi}^{\pi}1dx=2\pi.$$ Summarising the results, $$N(\sin mx)=\begin{cases} \pi\quad m\neq 0\\ 0\quad m=0 \end{cases}\\ N(\cos mx)=\begin{cases} \pi\quad m\neq 0\\ 2\pi \quad m=0 \end{cases}$$ or using the Kronecker delta, we have $$N(\sin mx)=\pi-\pi\delta_{m0}\\N(\cos mx)=\pi+\pi\delta_{m0}.$$ To calculate the expansion coefficients for an expansion in orthonormal functions, we use the set of functions $\{\dfrac{1}{\sqrt{2\pi}},\dfrac{1}{\sqrt{\pi}}\sin mx,\dfrac{1}{\sqrt{\pi}}\cos mx\}$ for $m=1,2,\cdots$, for expansion on $[-\pi,\pi]$. For these functions, the expansion coefficients will be $$a_0=\langle \dfrac{1}{\sqrt{2\pi}}|f\rangle\\a_m=\langle \dfrac{1}{\sqrt{\pi}}\cos mx|f\rangle\\b_m=\langle \dfrac{1}{\sqrt{\pi}}\sin mx|f\rangle,$$ where the expansion is $$f(x)=a_0\dfrac{1}{\sqrt{2\pi}}+\sum_{m=1}^\infty a_m(\dfrac{1}{\sqrt{\pi}}\cos mx)+\sum_{m=1}^\infty b_m(\dfrac{1}{\sqrt{\pi}}\sin mx).$$ Usually the constants are removed from the terms by explicitly writing out the expansion coefficients $$f(x)=\dfrac{1}{2\pi}\langle 1|f\rangle+\dfrac{1}{\pi}\sum_{m=1}^\infty [\langle \cos mx|f\rangle \cos mx+\langle \sin mx|f\rangle \sin mx].$$ This gives the final result $$f(x)=\dfrac{c_0}{2}+\sum_{m=1}^\infty c_m \cos mx+\sum_{m=1}^\infty d_m \sin mx\\c_m=\dfrac{1}{\pi}\langle \cos mx|f \rangle\\ d_m=\dfrac{1}{\pi}\langle \sin mx|f \rangle$$ on $[-\pi,\pi]$. This is the usual form of the Fourier series. Note the lead term is divided by two because the norm of $\cos 0x$ is $2\pi$, whereas for all other values of $m$, the norm of $\cos mx$ is $\pi$.

Using the property of evenness and oddness of functions, we can make two simple extensions of the Fourier series. The Fourier expansion on $[-\pi,\pi]$ of an odd function is made up of sine terms $$f(x)=\sum_{m=1}^\infty d_m \sin mx$$ since if $f$ is odd, all inner products $\langle \cos mx|f \rangle \equiv 0$. The Fourier expansion on $[-\pi,\pi]$ of an even function is made up only of cosine terms: $$f(x)=\dfrac{c_0}{2}+\sum_{m=1}^\infty c_m \cos mx$$ since if $f$ is even, all inner products $\langle \sin mx|f \rangle \equiv 0$.
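As a concrete example of these formulas, here is a small sympy sketch (my own) computing the first few coefficients for the odd function $f(x)=x$ on $[-\pi,\pi]$; all cosine coefficients vanish and $d_m=2(-1)^{m+1}/m$.

```python
from sympy import symbols, integrate, sin, cos, pi

x = symbols('x')
f = x                                                  # an odd function on [-pi, pi]

for m in range(1, 5):
    c_m = integrate(f * cos(m*x), (x, -pi, pi)) / pi   # cosine coefficients
    d_m = integrate(f * sin(m*x), (x, -pi, pi)) / pi   # sine coefficients
    print(m, c_m, d_m)
# expected: c_m = 0 and d_m = 2, -1, 2/3, -1/2, i.e. 2*(-1)**(m+1)/m
```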

[...]

Reference:
Mathematics for Quantum Chemistry by Jay Martin Anderson

Wednesday, 5 August 2015

Orthogonality and Gram Schmidt process

Orthogonal bases
Let $\{v_1,\cdots,v_n\}$ be an orthogonal basis of the Euclidean space $V$. We want to find coordinates of the vector $u$ in this basis, namely the numbers $a_1,\cdots,a_n$ such that $$u=a_1v_1+\cdots+a_nv_n.$$ We can take the inner product with $v_1$, so $$\langle u,v_1\rangle=a_1\langle v_1,v_1\rangle+\cdots+a_n\langle v_n,v_1\rangle.$$ Since $v_1,\cdots,v_n$ are pairwise orthogonal, we have $\langle v_i,v_j \rangle=0$ for $i\neq j$. Thus, $\langle u,v_1\rangle=a_1\langle v_1,v_1\rangle \Rightarrow a_1=\dfrac{\langle u,v_1\rangle}{\langle v_1,v_1\rangle}$. Similarly, multiplying by $v_2,\cdots,v_n$, we have other coefficients $$a_2=\dfrac{\langle u,v_2\rangle}{\langle v_2,v_2\rangle},\cdots,a_n=\dfrac{\langle u,v_n\rangle}{\langle v_n,v_n\rangle}.$$ These coefficients are called Fourier coefficients of the vector $u$ with respect to basis $\{v_1,\cdots,v_n\}$.

Theorem: Let $\{v_1,\cdots,v_n\}$ be an orthogonal basis of the Euclidean space $V$. Then for any vector $u$, $$u=\dfrac{\langle u,v_1\rangle}{\langle v_1,v_1\rangle}v_1+\cdots+\dfrac{\langle u,v_n\rangle}{\langle v_n,v_n\rangle}v_n.$$ This expression is called Fourier decomposition and can be obtained in any Euclidean space.

Projections

The projection of the vector $v$ along the vector $w$ is the vector $\text{proj}_w v=cw$ such that $u=v-cw$ is orthogonal to $w$. We have $\langle u,w \rangle=0$, which means $$\langle v-cw,w \rangle=0 \iff \langle v,w \rangle-c\langle w,w \rangle=0 \iff c=\dfrac{\langle v,w \rangle}{\langle w,w \rangle}.$$ Thus the component of $v$ orthogonal to $w$ is $$u=v-\text{proj}_w v=v-\dfrac{\langle v,w \rangle}{\langle w,w \rangle}w.$$ With this formula, we can find the distance between a point and a plane. Now, let's generalise our constructions.

Theorem: Let $W$ be a subspace with orthogonal basis $\{w_1,\cdots,w_n\}$. The projection $\text{proj}_w v$ of any vector $v$ along $W$ is $$\text{proj}_w v=\dfrac{\langle v,w_1 \rangle}{\langle w_1,w_1 \rangle}w_1+\cdots+\dfrac{\langle v,w_n \rangle}{\langle w_n,w_n \rangle}w_n.$$ [This follows from the Fourier decomposition above.] In particular, $u=v-\text{proj}_w v$ is orthogonal to the subspace $W$.

Proof: Take the inner product of $u$ with $w_i$, $$\langle u,w_i \rangle=\langle v,w_i \rangle-\bigg(\dfrac{\langle v,w_1 \rangle}{\langle w_1,w_1 \rangle}\langle w_1,w_i \rangle+\cdots+\dfrac{\langle v,w_n \rangle}{\langle w_n,w_n \rangle}\langle w_n,w_i \rangle\bigg).$$ Since $\langle w_i,w_j \rangle =0$ for $i\neq j$, we have $$\begin{align}\langle u,w_i \rangle&=\langle v,w_i \rangle-\dfrac{\langle v,w_i \rangle}{\langle w_i,w_i \rangle} \langle w_i,w_i \rangle\\&=\langle v,w_i \rangle-\langle v,w_i \rangle\\&=0.\end{align}$$ Thus $u$ is orthogonal to every $w_i$, and so it is orthogonal to $W$. $\Box$
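A numerical illustration with numpy (my own example): project a vector of $\mathbb{R}^3$ onto the plane spanned by an orthogonal pair $\{w_1,w_2\}$, and read off the distance from the point to the plane as $\lVert v-\text{proj}_W v\rVert$.

```python
import numpy as np

def proj(v, basis):
    """Orthogonal projection of v onto span(basis); basis vectors must be pairwise orthogonal."""
    return sum((v @ w) / (w @ w) * w for w in basis)

v  = np.array([1.0, 2.0, 3.0])
w1 = np.array([1.0, 1.0, 0.0])        # orthogonal basis of the plane x = y
w2 = np.array([0.0, 0.0, 1.0])

p = proj(v, [w1, w2])
u = v - p                             # component of v orthogonal to the plane
n = np.array([1.0, -1.0, 0.0])        # a normal vector of the same plane

print(p)                                          # [1.5 1.5 3. ]
print(np.linalg.norm(u))                          # distance from v to the plane, ~0.7071
print(abs(v @ n) / np.linalg.norm(n))             # same distance via the normal vector
```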

Gram Schmidt orthogonalisation process
Let $\{v_1,\cdots,v_n\}$ be a basis in the Euclidean space. We can construct orthogonal basis $\{w_1,\cdots,w_n\}$ of this space. $$\begin{align}w_1&=v_1\\w_2&=v_2-\dfrac{\langle v_2,w_1\rangle}{\langle w_1,w_1\rangle}w_1\\w_3&=v_3-\dfrac{\langle v_3,w_1\rangle}{\langle w_1,w_1\rangle}w_1-\dfrac{\langle v_3,w_2\rangle}{\langle w_2,w_2\rangle}w_2\\ \cdots\\ w_n&=v_n-\dfrac{\langle v_n,w_1\rangle}{\langle w_1,w_1 \rangle}w_1-\dfrac{\langle v_n,w_2\rangle}{\langle w_2,w_2\rangle}w_2-\cdots-\dfrac{\langle v_n,w_{n-1}\rangle}{\langle w_{n-1},w_{n-1}\rangle}w_{n-1}.\end{align}$$
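A direct transcription of these formulas into numpy (a sketch of my own; it returns an orthogonal, not yet normalised, basis):

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn a list of linearly independent vectors into a pairwise orthogonal list."""
    ws = []
    for v in vectors:
        w = v.astype(float)
        for u in ws:
            w = w - (v @ u) / (u @ u) * u   # subtract the projection of v onto each earlier w
        ws.append(w)
    return ws

vs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
ws = gram_schmidt(vs)
for i in range(len(ws)):
    for j in range(i + 1, len(ws)):
        print(i, j, round(ws[i] @ ws[j], 12))   # all pairwise inner products should be ~0
```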

[...]

Friday, 31 July 2015

Exact sequences

A motivating example:
In linear algebra, we know that a linear transformation $f:V\to W$ induces the linear transformations $V\xrightarrow{g}\text{im}(f)\xrightarrow{i}W$, where $i$ is the inclusion map and $g$ is the same as $f$ except that the codomain of $f$ has been restricted to $\text{im}(f)$. Then $f=i\circ g$, and so we can represent this by a commutative diagram. Intuitively, we can think of an exact sequence as a factorisation of maps satisfying a certain condition.

Definition: Let $U\xrightarrow{f}V\xrightarrow{g}W$ be a sequence of linear transformations. (This means $f\in L(U,V)$ and $g\in L(V,W)$.) We say $U\xrightarrow{f}V\xrightarrow{g}W$ is an exact sequence at $V$ if $\text{im}(f)=\text{ker}(g)$. A sequence $V_1\xrightarrow{f_1}V_2\xrightarrow{f_2}\cdots \xrightarrow{f_{n-2}}V_{n-1}\xrightarrow{f_{n-1}}V_n$ is an exact sequence if $\text{im}(f_{i-1})=\text{ker}(f_i), 2\leq i\leq n-1$, that is, if the sequence is exact at $V_2,\cdots,V_{n-1}$.

When writing sequences of linear transformations, it is customary to write $0$ for the vector space $\{0\}$.

Lemma:
1. The sequence $0\to V\to 0$ is exact iff $V=\{0\}$.
2. The sequence $0\to U\xrightarrow{f} V$ is exact iff $f$ is injective.
3. The sequence $V\xrightarrow{g}W\to 0$ is exact iff $g$ is surjective.
4. The sequence $0\to V\xrightarrow{f}W\to 0$ is exact iff $f$ is an isomorphism.
5. The sequence $0\to U\xrightarrow{f}V\xrightarrow{g}W\to 0$ is exact iff $f$ is injective and $g$ is surjective, and $\text{im}(f)=\text{ker}(g)$.
Proof: Just note that from the definition, $\text{im}(0\to V)=0$ and $\text{ker}(V\to 0)=V.\:\Box$

Definition: A short exact sequence is a sequence of five vector spaces, which is exact at all three internal nodes: $0\to U\xrightarrow{f}V\xrightarrow{g}W\to 0$. A long exact sequence is a doubly infinite sequence of vector spaces which is exact at every node.

A nice feature of exact sequences is that there are simple rules to create new sequences from old ones. The simplest way to do this is to split and glue exact sequences as follows.

Splitting and gluing exact sequences
Splitting: If $V_1\xrightarrow{f_1}V_2\xrightarrow{f_2}V_3\xrightarrow{f_3}V_4$ is an exact sequence, the two sequences $V_1\xrightarrow{f_1}V_2\xrightarrow{f_2}W\to 0$ and $0\to W\xrightarrow{i}V_3\xrightarrow{f_3}V_4$ are also exact, where $W=\text{im}(f_2)=\text{ker}(f_3)$, the map $V_2\to W$ is $f_2$ with its codomain restricted to $W$, and $i$ is the inclusion of $\text{ker}(f_3)$ in $V_3$.

Gluing: Conversely, if $V_1\xrightarrow{f_1}V_2\xrightarrow{f_2}W\to 0$ and $0\to W\xrightarrow{i}V_3\xrightarrow{f_3}V_4$ are exact sequences with $W \subset V_3$ and $i$ being the inclusion, the sequence $V_1\xrightarrow{f_1}V_2\xrightarrow{i\circ f_2}V_3\xrightarrow{f_3}V_4$ is also exact.

Proposition: Assume that $0\to V_1\xrightarrow{f_1}V_2\xrightarrow{f_2}\cdots\xrightarrow{f_{n-1}}V_n\to 0$ is an exact sequence of finite dimensional vector spaces, $n\geq 1$. Then $\sum\limits_{i=1}^n (-1)^i \: \text{dim}(V_i)=0$.

Proof by MI: If $n=1$, then $V_1=\{0\}$, and so $\text{dim}(V_1)=0$.
If $n=2$, then $V_1 \cong V_2$ by the lemma, and so $\text{dim}(V_1)-\text{dim}(V_2)=0$.
If $n=3$, then $0\to V_1\xrightarrow{f_1}V_2\xrightarrow{f_2}V_3\to 0$ is an exact sequence.
By the dimension theorem, we have $$\begin{align}\text{dim}(V_2)&=\text{dim}\:\text{ker}(f_2)+\text{dim}\:\text{im}(f_2)\\
&=\text{dim}\:\text{im}(f_1)+\text{dim}(V_3)\\
&=\text{dim}(V_1)-\text{dim}\:\text{ker}(f_1)+\text{dim}(V_3)\\
&=\text{dim}(V_1)+\text{dim}(V_3),\end{align}$$ which means $-\text{dim}(V_1)+\text{dim}(V_2)-\text{dim}(V_3)=0$. So the result holds for $n=3$.
For the general case, suppose $n\geq 4$. To use the induction assumption, we can split the given exact sequence into $$0\to V_1\xrightarrow{f_1}\cdots\xrightarrow{f_{n-3}}V_{n-2}\xrightarrow{f_{n-2}}\text{im}(f_{n-2})\to 0,$$ $$0\to \text{im}(f_{n-2})\xrightarrow{i}V_{n-1}\xrightarrow{f_{n-1}}V_n\to 0$$ because $\text{im}(f_{n-2})=\text{ker}(f_{n-1})$. By induction, we have $\sum\limits_{i=1}^{n-2} (-1)^i \:\text{dim}(V_i)+(-1)^{n-1}\:\text{dim}(\text{im}(f_{n-2}))=0$ and $(-1)^{n-2}\:\text{dim}(\text{im}(f_{n-2}))+(-1)^{n-1}\:\text{dim}(V_{n-1})+(-1)^n\:\text{dim}(V_n)=0$.
Adding these two gives $\sum\limits_{i=1}^n (-1)^i \:\text{dim}(V_i)=0.\:\Box$

Direct proof: Applying the rank nullity theorem to all vector spaces $V_1,\cdots,V_n$, we have $$\begin{align}\sum\limits_{i=1}^{n-1}(-1)^i\:\text{dim}(V_i)&=\sum\limits_{i=1}^{n-1}(-1)^i \:(\text{dim}\:\text{ker}(f_i)+\text{dim}\:\text{im}(f_i))\\
&=-\text{dim}\:\text{ker}(f_1)+(-1)^{n-1}\text{dim}\:\text{im}(f_{n-1})\\&+\sum\limits_{i=2}^{n-1}(-1)^i \:(\text{dim}\:\text{ker}(f_i)-\text{dim}\:\text{im}(f_{i-1})).\end{align}$$
But $\text{dim}\:\text{ker}(f_1)=0$ since $f_1$ is injective, $\text{dim}\:\text{im}(f_{n-1})=\text{dim}(V_n)$ since $f_{n-1}$ is surjective, and $\text{dim}\:\text{ker}(f_i)=\text{dim}\:\text{im}(f_{i-1})$ for all $i=2,\cdots,n-1$ by exactness.
Therefore $\sum\limits_{i=1}^n (-1)^i \: \text{dim}(V_i)=(-1)^{n-1}\:\text{dim}(V_n)+(-1)^n\:\text{dim}(V_n)=0.\:\Box$

Linear algebra
Dimension theorems
For any short exact sequence of finite-dimensional vector spaces $0\to V_1\xrightarrow{T_1}V_2\xrightarrow{T_2}V_3 \to 0$, $\text{dim}(V_2)=\text{dim}(V_1)+\text{dim}(V_3)$. [This is just the $n=3$ case of the proposition above. Here we provide a proof without the aid of rank nullity theorem.]
Proof: Suppose $\text{dim}(V_1)=n$. Then we can take $\{e_i\}_{i=1}^n$ to be a basis of $V_1$. Thus $\{T_1(e_i)\}_{i=1}^n$ span the image of $T_1$. Since $T_1$ is injective, $\{T_1(e_i)\}_{i=1}^n$ are linearly independent in $V_2$. We can find $\{v_j\}_{j=1}^r$ such that $\{T_1(e_i)\}_{i=1}^n \cup \{v_j\}_{j=1}^r$ form a basis of $V_2$. Thus $\text{dim}(V_2)=n+r$. Now $T_2$ is surjective, so $\{T_2(v_j)\}_{j=1}^r$ span $V_3$. It remains to show that $\{T_2(v_j)\}_{j=1}^r$ is linearly independent. Suppose $\sum_j a_jT_2(v_j)=0$. Then $\sum_j T_2(a_jv_j)=0\Rightarrow \sum a_jv_j \in \text{ker}(T_2)=\text{im}(T_1)$. Since $\{T_1(e_i)\}_{i=1}^n$ span the image of $T_1$, we have $\sum\limits_{j=1}^r a_jv_j=\sum\limits_{i=1}^n b_i T_1(e_i)$ for some constants $\{b_i\in \mathbb{R}\}_{i=1}^n$. In particular, $\sum\limits_{i=1}^n -b_i T_1(e_i)+\sum\limits_{j=1}^r a_jv_j=0$. Since $\{T_1(e_i)\}_{i=1}^n \cup \{v_j\}_{j=1}^r$ are linearly independent in $V_2$, all the coefficients $\{b_i\}$ and $\{a_j\}$ must be zero. Therefore, $\text{dim}(V_3)=r$. We have established that $\text{dim}(V_2)=n+r=\text{dim}(V_1)+\text{dim}(V_3).\:\Box$

Using this, we can derive many results in linear algebra.

Theorem: If $T:V\to W$, then $0 \to \text{ker}(T) \xrightarrow{i} V \xrightarrow{T} \text{im}(T) \to 0$ is a short exact sequence, where $i$ is the inclusion map.
Corollary (Rank-nullity): If $T:V\to W$ is a linear map of finite-dimensional vector spaces, then $\text{dim}(\text{ker}(T))+\text{dim}(\text{im}(T))=\text{dim}(V)$.
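A quick sympy illustration of the corollary (my own example matrix): the rank plus the dimension of the null space equals the number of columns.

```python
from sympy import Matrix

# T(x) = Ax as a map from R^4 to R^3; dim(ker T) + dim(im T) should equal 4.
A = Matrix([[1, 2, 0, 1],
            [0, 1, 1, 1],
            [1, 3, 1, 2]])             # third row = first + second, so the rank is 2

rank = A.rank()                        # dim(im T)
nullity = len(A.nullspace())           # dim(ker T)
print(rank, nullity, rank + nullity)   # 2, 2, 4
```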

Theorem: If $A,B \subset V$ are subspaces such that $V=\text{span}(A\cup B)$, then $0\to A \cap B \to A \oplus B \to V \to 0$ is a short exact sequence, where $A \cap B\to A\oplus B$ is the diagonal map $v\mapsto (v,v)$, and the map $A\oplus B\to V$ is the difference map $(a,b)\mapsto a-b$.
Corollary: $\text{dim}(V)=\text{dim}(A)+\text{dim}(B)-\text{dim}(A \cap B)$.

Theorem: If $W\subset V$ is a subspace, then $0\to W \xrightarrow{i} V \xrightarrow{g} V/W \to 0$ is exact, where $i$ is the inclusion map and $g$ is the quotient map $g(v)=[v]$.
Corollary: $\text{dim}(V/W)=\text{dim}(V)-\text{dim}(W)$.

Dual space
Proposition: If $U\xrightarrow{f}V\xrightarrow{g}W$ is an exact sequence of linear transformations of vector spaces, then $W^*\xrightarrow{g^*}V^*\xrightarrow{f^*}U^*$ is also an exact sequence of linear transformations.

Corollary: Let $f\in L(V,W)$ and let $f^*\in L(W^*,V^*)$ be the induced linear transformation on dual spaces.
1. If $f$ is injective, then $f^*$ is surjective.
2. If $f$ is surjective, then $f^*$ is injective.
Proof:
If $f$ is injective, then $0\to V\to W$ is exact at $V$ and so $W^*\to V^*\to 0$ is an exact sequence. Thus $f^*$ is surjective.
If $f$ is surjective, then $V\to W\to 0$ is exact at $W$ and so $0\to W^*\to V^*$ is an exact sequence. Thus $f^*$ is injective.

More intuition about exact sequences
In Euclidean space $\mathbb{R}^n$, if $V$ is a subspace of $\mathbb{R}^n$, then we think of it as filling out "some of" the dimensions in $\mathbb{R}^n$ and its orthogonal complement $V^\perp$ fills out the other directions. Together they span $\mathbb{R}^n$ with no redundancies: $\mathbb{R}^n=V\oplus V^\perp$. See here for more examples of direct sum.

More generally, we don't have an inner product and so we can't form orthogonal complements, but we can still talk about submodules and quotients.

If $A$ is a submodule of $B$, then $A$ fills up "some of the directions" in $B$, and the remaining directions are encoded in $B/A$. When we have a submodule $A\subset B$, a surjection $B \to C$, and if $A$ happens to be the kernel of the map $B \to C$, then we are in the previous situation: $A$ fills out some of the directions in $B$, and all the complementary directions are encoded in $C$. So we introduce the terminology "$0\to A\to B\to C\to 0$ is a short exact sequence" to capture this situation.

Graph theory: Euler's formula
Inclusion-exclusion

References
Intuitive meaning of exact sequence
Linear algebra [p38-40,75,76,84]
Exact sequences in linear algebra [p4-6]
Definition of short exact sequence [p1,2]
Splitting and gluing [p1,2,5]
Simple groups and short exact sequences [p2,4]

More to explore:
Homological algebra

Tuesday, 28 July 2015

Quotient spaces

Recall the definition of sum of subspaces: if $U,W$ are subspaces of $V$, then $U+W=\{u+w\:|\:u \in U\:\text{and}\: w\in W\}$. In case one of the sets contains only one element, we can say $\{u_0\}+W=u_0+W=\{u_0+w\:|\: w\in W\}$. We call $u_0+W$ a coset (translate) of $W$. [A coset $v+W$ is not a subspace unless $v \in W$. The simple reason is that after translation, the zero element is no longer in $v+W$. In fact, such cosets are examples of affine subspaces.] Geometrically, all of the sets $u_0+W$ are parallel pictures of $W$ that are translated in $V$ as we change $u_0$. We can also say that $U$ and $W$ are parallel and denote $W \parallel U$ if $U=u_0+W$ for some $u_0\in V$.

Note that several of these translates can be equal. Let's take a look at an example. Let $W=\{(x,0)\:|\: x\in \mathbb{R}\},v_1=(0,1),v_2=(1,1)$. Then $v_1+W=\{(x,1)\:|\:x\in \mathbb{R}\}=v_2+W$. Also notice that $v_2-v_1=(1,0)\in W$. Thus translating $W$ by $v_1$ or $v_2$ is the same. This motivates the following proposition.

Proposition (i): $v+W=v'+W \iff v-v'\in W$.
Proof: Suppose $v+W=v'+W$. Then $v=v'+w$ for some $w\in W$, implying $v-v'=w\in W$.
Conversely, suppose $v-v'\in W$. Note that if $z\in W$, then $z+W=W$ since $W$ is a subspace. Then $v+W=v'+(v-v')+W=v'+W.\:\Box$

According to the proposition, two elements $v$ and $v'$ of $V$ determine the same element of $V /W$ if and only if $v-v'\in W$. So we can think of $V/W$ as the set of all elements of $V$ under the extra condition that two elements of $V$ are declared to be the same if their difference is in $W$. Thus we can define $V/W$ to be the set of equivalence classes of the equivalence relation $∼$ on $V$ defined by $v \sim v'$ if $v-v'\in W$. Notice how similar this definition of relation is to congruence relation. So we can also say $v \equiv v' \:\text{mod}\:W$ if $v-v' \in W$. As an exercise, try to prove that the equivalence classes of $\equiv \text{mod}\:W$ are precisely the cosets of $W$.

Definition: A quotient set $V/W$, pronounced $V\:\text{mod}\:W$, is the set of all cosets of $W$: $V/W=\{[v]\:|\:v\in V\}=\{v+W\:|\:v\in V\}$.
[Informally, a quotient set is what we get when we "divide" a set $A$ by $B\subset A$, wherein we set all elements of $B$ to the identity in $A$. For example, if $A=\mathbb{Z}$ and $B=\{5n\:|\:n\in \mathbb{Z}\}$, then we're making all multiples of $5$ zero, so the quotient is $\{0,1,2,3,4\}$ (the set of all equivalence classes).]

We want to turn $V/W$ into a vector space (quotient space), so we have to define addition and scalar multiplication.

Definition: The sum of two elements $v+W$ and $v'+W$ of $V/W$ is defined by $(v+W)+(v'+W) := (v+v')+W$.
For $a\in \mathbb{F}$, the scalar multiplication of $a$ on $v+W$ is defined by $a\cdot (v+W) := av+W$.

It is necessary to check that these definitions are well defined, that is, these operations should not depend on which element of $v+W$ we choose to represent $v+W$.

Proposition (ii): Suppose $v+W=v'+W$. Then $(v+W)+(v''+W)=(v'+W)+(v''+W)$ for any $v''+W\in V/W$ and $a\cdot (v+W)=a\cdot (v'+W)$ for any $a\in \mathbb{F}$.
Proof:
WLOG it is enough to show that $(v+W)+(v''+W) \subset (v'+W)+(v''+W)$. Let $u\in(v+W)+(v''+W)=(v+v'')+W$. Then there exists $w\in W$ such that $u=(v+v'')+w$. Since $v+W=v'+W$, we have $v-v'\in W$. Denote this element by $w'$. Then $v=v'+w'$. Thus $u=(v+v'')+w=((v'+w')+v'')+w=(v'+v'')+(w'+w)$, showing that $u\in (v'+v'')+W$ since $w'+w\in W$.
We conclude that $(v+W)+(v''+W)=(v'+W)+(v''+W)$.
WLOG it is enough to show that $a\cdot (v+W)\subset a\cdot (v'+W)$. Let $u\in a\cdot (v+W)=av+W$. Then $u=av+w$ for some $w\in W$. Denote by $w'$ the element $v-v'\in W$. Then $u=av+w=a(v'+w')+w=av'+(aw'+w)\in av'+W$ since $aw'+w\in W$ as $W$ is closed under addition and scalar multiplication.
We have $a\cdot (v+W)=a\cdot (v'+W)$.
Thus both addition and scalar multiplication on $V/W$ are well-defined. $\Box$

Note that the above depends crucially on the assumption that $W$ is a subspace of $V$; if $W$ were just an arbitrary subset of $V$, addition and scalar multiplication on $V/W$ would not be well-defined. With these operations then, $V/W$ becomes a vector space over $\mathbb{F}$.

It is important to keep in mind that the quotient space is in general not a subspace, nor naturally equivalent to one. But because of the presence of a natural inner product on $\mathbb{R}^n$, there is a natural way to choose a second subspace orthogonal to the given one, and that second subspace serves as a natural isomorphic model of the quotient space by the given subspace. Namely, if $V$ has an inner product, then $V/W$ is isomorphic to $W^\perp=\{v\:|\:\langle v,w \rangle =0\: \forall\: w\in W\}$.

Summary of concepts: sum of subspaces, coset (translate), subspaces, equivalence relation & equivalence class, congruence relation, quotient set



Examples:

The quotient space in $\mathbb{R}^3$ by a line is a two-dimensional vector space that is isomorphic to a plane. It does not necessarily equal a plane in $\mathbb{R}^3$. In particular, the quotient space doesn't need to be a subset of $\mathbb{R}^3$. The quotient space is something very abstract. It's an entirely new vector space.

The quotient space of $\mathbb{R}^3$ by the plane $z=0$ is the set of all planes parallel to it. So any plane with equation $z=k$ is an element of the quotient space, which is isomorphic to $\mathbb{R}$, but note that the quotient space does not necessarily equal a line in $\mathbb{R}^3$.

In the quotient $\mathbb{R}[x]/(x^2+1)$, the polynomial $x^2+1$ is equivalent to $0$. This follows from the fact that $0$ and $x^2+1$ are in the same equivalence class since $x^2+1=0+(x^2+1)$. So we have $x^2=-1$, which means the class of $x$ plays the role of $i$ (or $-i$). Since an arbitrary element of $\mathbb{R}[x]$ looks like $a_0+a_1x+a_2x^2+\cdots+a_nx^n,\; a_i\in \mathbb{R}$, in the quotient $\mathbb{R}[x]/(x^2+1)$ the polynomial becomes $f=a_0+a_1 i+a_2 i^2+\cdots+a_n i^n$. Since $i^2=-1$, we have $i^3=-i, i^4=1$. Collecting terms we can write $f$ as $a+bi$ for some $a,b\in \mathbb{R}$. Therefore, we can say $\mathbb{R}[x]/(x^2+1)\cong \mathbb{C}$.
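A small sympy check of this identification (my own example): multiplying polynomials modulo $x^2+1$ agrees with multiplying the corresponding complex numbers under $x \leftrightarrow i$.

```python
from sympy import symbols, rem, expand, I

x = symbols('x')

f, g = 1 + x, 3 + 2*x                           # correspond to 1 + i and 3 + 2i
product_mod = rem(expand(f * g), x**2 + 1, x)   # multiply, then reduce modulo x^2 + 1
print(product_mod)                              # 5*x + 1
print(expand((1 + I) * (3 + 2*I)))              # 1 + 5*I -- the same coefficients
```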

Quotient space $\mathbb{P}_n/\mathbb{R}$
Let $V=\mathbb{P}_n=\{a_nx^n+a_{n-1}x^{n-1}+\cdots+a_1x+a_0\:|\:a_0,a_1,\cdots,a_n\in \mathbb{R}\}$ and $W=\{a_0\:|\:a_0\in \mathbb{R}\}=\mathbb{R}$.
Let $[f(x)], [g(x)]\in V/W$. Then $[f(x)]=[g(x)] \iff f(x)-g(x) \in W \iff f(x)-g(x)=c$ for $c\in \mathbb{R}$.
Let a differential operator be $D:V \to V$ where $D[f(x)]=\frac{df}{dx}$.
We have $D[c]=\frac{d}{dx}(c)=0$, which means $\text{ker}(D)=[c]=W$, the set of constant polynomials. This operator is not 1-1 because polynomials that differ by a constant have the same derivative.
Now let's change the domain of the differential operator to $\overline{D}:V/W \to V$ where $\overline{D}[f(x)]=\frac{df}{dx}$. Now $\text{ker}(\overline{D})=[c]=W=[0]+W=0_{V/W}$. So the operator is 1-1 in the quotient space.

The key idea is this: if a mapping is not 1-1 in which different points map to the same point, we can take all those different points to be elements of a common coset. In other words, we are identifying points that are equivalent to one another in some equivalence relation. With the quotient space formed by squashing those points to the same coset, we have a 1-1 mapping. Similarly, we can modify the codomain and turn a mapping into a surjection, which leads to the following proposition.

Proposition (iii) Canonical epimorphism: The function $\pi:V\to V/W$ defined by $\pi(v)=[v]=v+W$ is a linear transformation with $\text{ker}(\pi)=W, \text{im}(\pi)=V/W$.
Proof:
$\pi(u+v)=(u+v)+W=(u+W)+(v+W)=\pi(u)+\pi(v)$
$\pi(av)=(av)+W=a(v+W)=a\pi(v)$
It follows that $\pi$ is a linear transformation.
The image of $\pi$ is $V/W$ by definition.
For the kernel of $\pi$, $v\in \text{ker}(\pi) \iff v+W=0+W \iff v\in W.\:\Box$

Corollary: Let $V$ be a vector space and $W$ be a subspace of $V$. Then $\text{dim}(W)+\text{dim}(V/W)=\text{dim}(V)$.
Proof: Let $\pi:V\to V/W$ be the canonical epimorphism. By the dimension theorem, we have $\text{dim}\:\text{ker}(\pi)+\text{dim}\:\text{im}(\pi)=\text{dim}(V)$. But $\text{ker}(\pi)=W$ and $\text{im}(\pi)=V/W$, so the claim follows. $\Box$

Theorem: Let $W$ be a subspace of $V$ with quotient mapping $T:V\to V/W$. A subspace $U$ is a complement to $W$ iff $T|_U:U\to V/W$ is an isomorphism.

Universal property of the quotient
Let $V$ be a vector space over $\mathbb{F}$ and $W$ be a subspace of $V$. Define $\pi:V\to V/W$ by $v \mapsto [v]$. If $W'$ is a vector space over $\mathbb{F}$ and $f:V\to W'$ is a linear map whose kernel contains $W$, then there exists a unique linear map $g:V/W\to W'$ such that $f=g\circ \pi$. This is summarized by the following commutative diagram:

$V \xrightarrow{\qquad f \qquad} W'\\ \; \pi \searrow \quad \qquad \nearrow g\\ \qquad\;\; V/W$

Proof:
Let $W'$ be a vector space over $\mathbb{F}$, $f:V\to W'$ be a linear map with $W \subset \text{ker}f$. Then define $g:V/W\to W'$ to be the map $[v]\mapsto f(v)$.
$g$ is well-defined because if $[v]\in V/W$ and $v_1,v_2\in V$ are both representatives of $[v]$, then there exists $w\in W$ such that $v_1=v_2+w$.
Hence $f(v_1)=f(v_2+w)=f(v_2)+f(w)=f(v_2)$ since f is linear.
The linearity of $g$ comes from linearity of $f$.
Also, by definition $f=g\circ \pi$.
It remains to prove $g$ is unique.
Suppose $h:V/W\to W'$ is such that $h\circ \pi=f=g\circ \pi$. Then $h([v])=g([v])$ for all $v\in V$. Since $\pi$ is surjective, $h=g.\:\Box$

First isomorphism theorem
Let $V,W$ be vector spaces over $\mathbb{F}$ and $T:V \to W$ a linear map. Then $V/\text{ker}(T) \cong \text{im}(T)$, namely $V/\text{ker}(T)$ is isomorphic to $\text{im}(T)$.

Proof: Consider the mapping $\overline{T}:V/\text{ker}(T) \to W \;\;\; \overline{T}:[v]=v+\text{ker}(T) \mapsto T(v)$.

A commutative diagram would express this as
$V \xrightarrow{\qquad T \qquad} W\\ \; \pi \searrow \quad \qquad \nearrow \overline{T}\\ \qquad V/ \text{ker}(T)$

We can use similar arguments as the previous example to prove that $\overline{T}$ is well-defined and linear.
Now suppose $\overline{T}([v])=0$. Then $T(v)=0$, which implies $v\in \text{ker}(T)$. This means $[v]=0_{V/\text{ker}(T)}$. Then $\overline{T}$ is injective. Hence if we confine the codomain of $\overline{T}$ to $\text{im}(T)$, then $\overline{T}$ becomes a bijection and hence $V/\text{ker}(T) \cong \text{im}(T).\:\Box$

Example:
$T:\mathbb{R}^{n\times n} \to \mathbb{R}^{n\times n}\quad T(A)=A-A^T$
$\text{ker}(T)=\{A\in \mathbb{R}^{n\times n}\:|\:A=A^T\}$, the set of symmetric matrices.
$\text{im}(T)=\{B=A-A^T\:|\:A\in \mathbb{R}^{n\times n}\}$: every such $B$ satisfies $B^T=A^T-A=-B$, and conversely every antisymmetric $B$ equals $T(B/2)$, so the image is the set of antisymmetric matrices.
$\mathbb{R}^{n\times n}/ \{n\times n\;\text{symmetric matrices}\} \cong \{n\times n\;\text{antisymmetric matrices}\}$
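The dimensions can be checked numerically by writing $T$ as a matrix acting on the $n^2$ entries of $A$ (a numpy sketch of my own; for $n=4$ one expects nullity $n(n+1)/2=10$ and rank $n(n-1)/2=6$, consistent with the isomorphism above).

```python
import numpy as np

n = 4
# Build the n^2 x n^2 matrix of T(A) = A - A^T in the standard basis E_ij of R^{n x n}.
M = np.zeros((n*n, n*n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        M[:, i*n + j] = (E - E.T).flatten()   # column = vectorised image of the basis matrix

rank = np.linalg.matrix_rank(M)
print(rank, n*n - rank)    # 6 10  -> dim(im T) = n(n-1)/2, dim(ker T) = n(n+1)/2
```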

Strong form of the first isomorphism theorem
Let $T:V\to W'$ be a linear transformation of vector spaces over $\mathbb{F}$. Suppose $W \subset \text{ker}(T)$. Then there exists a unique linear transformation $\overline{T}:V/W\to \text{im}(T)$ such that $\overline{T}\circ \pi=T$, where $\pi:V\to V/W$, and $\overline{T}$ is surjective. Also, $\text{ker}(\overline{T})\cong \text{ker}(T)/W$ and $(V/W)/(\text{ker}(T)/W)\cong \text{im}(T)$.

Second isomorphism theorem
Let $U,W$ be subspaces of $V$. Then $(U+W)/U \cong W/(U \cap W)$.

Third isomorphism theorem
Let $V$ be a vector space and $U\subset W\subset V$ be subspaces. Then $(V/U)/(W/U)\cong V/W$.

References:
An awesome video on quotient space
Definition of quotient set
Basis for $V/U$
Examples of quotient spaces
Intuition about quotient spaces
Constructing $\mathbb{C}$ from $\mathbb{R}$

Saturday, 25 July 2015

Commutative diagrams

Commutative diagrams are graphic or reasoning devices that express the same result in two or more ways by composition. [See here for the formal definition.] They give us freedom to choose the method that best suits our purpose. In complex situations, they help us make sense of relationships we may have otherwise missed. Examples of commutative diagrams abound in linear algebra and abstract algebra.

Examples:
Extension by linearity
Let $V$ be a vector space over $\mathbb{F}$ with basis $B$. Then given any vector space $W$ over $\mathbb{F}$ and any function $f:B\to W$, there exists a unique linear transformation (called the linear extension) $\overline{f}:V\to W$ which extends $f$ in the sense that $\overline{f}(v)=f(v)$ for all $v\in B$, so that the diagram below commutes:

$B \xrightarrow{\quad f \quad} W\\ \; i \searrow \quad \;\; \nearrow \overline{f}\\ \qquad \; V$

[$\overline{f}(v)=f(v)$ for all $v\in B$ can be replaced by $\overline{f}|_B=f$.]

This theorem tells us that to define a linear map, we only need to specify where the basis elements are mapped to. Once this is done, we can always extend the mapping to the whole space by linearity. Also, there is no constraint on how such basis elements should be mapped. That is, a linear map is entirely determined by its action on a basis.

Now suppose a linear map $f:X \to Y$ is given between finite-dimensional spaces $X,Y$ and that $\{e_1,\cdots,e_n\},\{v_1,\cdots,v_m\}$ are bases for $X,Y$ respectively. From what we have seen above, the mapping $f$ is uniquely determined by specifying $f(e_i),i=1,\cdots,n$. Since $\{v_1,\cdots,v_m\}$ is a basis for $Y$, each $f(e_i)$ can be expressed as a linear combination of $\{v_1,\cdots,v_m\} \Rightarrow f(e_i)=\sum\limits_{j=1}^m a_{ji}v_j$.

In matrix notation, this means
$\begin{pmatrix} f(e_1) & \cdots & f(e_n)\end{pmatrix}=\begin{pmatrix} v_1 & \cdots & v_m \end{pmatrix} \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ a_{21} & \cdots & a_{2n} \\ & \cdots & \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}.$

Hence every element $x=\sum\limits_{i=1}^n x_ie_i$ is mapped, by the linearity of $f$, as
$f(\sum\limits_{i=1}^n x_ie_i)=\sum\limits_{i=1}^n x_i f(e_i)=\sum\limits_{i=1}^n \sum\limits_{j=1}^m a_{ji}x_iv_j=\sum\limits_{i=1}^m(\sum\limits_{j=1}^n a_{ij}x_j)v_i$.

In other words, when we express $f(\sum\limits_{i=1}^n x_ie_i)$ as $\sum\limits_{i=1}^m y_iv_i$, the coefficients $y_1,\cdots,y_m$ are given by
$\begin{pmatrix}y_1\\y_2\\ \vdots \\y_m \end{pmatrix}=\begin{pmatrix} a_{11} & \cdots & a_{1n}\\ a_{21} & \cdots & a_{2n} \\ & \cdots & \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix}x_1\\x_2\\ \vdots \\x_n\end{pmatrix}$.

This is the matrix representation of $f$ (with respect to the bases $\{e_1,\cdots,e_n\}$ and $\{v_1,\cdots,v_m\}$).

Change of coordinates
Let $V$ be an $n$-dimensional vector space. Let $\{v_1,v_2,\cdots,v_n\}$ be a basis for $V$ and $g_1:V\to \mathbb{F}^n$ be the coordinate mapping corresponding to this basis. Let $\{u_1,u_2,\cdots,u_n\}$ be another basis for $V$ and $g_2:V\to \mathbb{F}^n$ be the coordinate mapping corresponding to this basis.

$\qquad V\\ g_1 \swarrow \;\; \searrow g_2\\ \mathbb{F}^n \xrightarrow{\qquad\;} \mathbb{F}^n$

The composition $g_2\circ g_1^{-1}$ is a linear operator on $\mathbb{F}^n$. It has the form $x \mapsto Ax$, where $A$ is an $n\times n$ matrix. $A$ is the transition matrix from $\{v_1,v_2,\cdots,v_n\}$ to $\{u_1,u_2,\cdots,u_n\}$. Columns of $A$ are coordinates of the vectors $\{v_1,v_2,\cdots,v_n\}$ with respect to the basis $\{u_1,u_2,\cdots,u_n\}$.

Change of coordinates for a linear operator
Let $T:V\to V$ be a linear operator on a vector space $V$.
Let $A$ be the matrix of $T$ relative to a basis $\{a_1,a_2,\cdots,a_n\}$ for $V$.
Let $B$ be the matrix of $T$ relative to a basis $\{b_1,b_2,\cdots,b_n\}$ for $V$.
Let $U$ be the transition matrix from the basis $\{a_1,a_2,\cdots,a_n\}$ to $\{b_1,b_2,\cdots,b_n\}$.

$\begin{array}[c]{ccc}\qquad \text{a-coordinates of v}&\xrightarrow{\quad A\quad}&\text{a-coordinates of T(v)}\\ \scriptstyle{U}\Bigg\downarrow& &\Bigg\downarrow \scriptstyle{U}\\ \qquad \text{b-coordinates of v} & \xrightarrow{\quad B\quad} & \text{b-coordinates of T(v)} \end{array}$

It follows that $UAx=BUx$ for all $x \in \mathbb{F}^n \Rightarrow UA=BU.$ Then $A=U^{-1}BU$ and $B=UAU^{-1}$.
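A numerical check of the relation $B=UAU^{-1}$ (a numpy sketch with a made-up operator and bases, not part of the original post):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])            # matrix of T in basis a (taken here to be the standard basis)

P = np.array([[1.0, 1.0],
              [1.0, 2.0]])            # columns are the b-basis vectors written in basis a
U = np.linalg.inv(P)                  # transition matrix from a-coordinates to b-coordinates

B = U @ A @ np.linalg.inv(U)          # matrix of T in basis b

v = np.array([1.0, -1.0])             # a-coordinates of some vector
print(np.allclose(U @ (A @ v), B @ (U @ v)))   # True: the diagram commutes
```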

Another similar commutative diagram [p1]
More similar diagrams [p10,14,15]

Isomorphism
A linear map $T:V\to W$ is an isomorphism if we can find $L:W\to V$ such that $TL=I_W$ and $LT=I_V$.

$\begin{array}{lll} V & \xrightarrow{\quad T\quad} & W \\
\uparrow {I_V} & & \uparrow{I_W} \\
V & \xleftarrow{\quad L\quad} & W \\
\end{array}$

Universal property of the quotient
Let $V$ be a vector space over $\mathbb{F}$ and $W$ be a subspace of $V$. Define $\pi:V\to V/W$ by $v \mapsto [v]$. If $W'$ is a vector space over $\mathbb{F}$ and $f:V\to W'$ is a linear map whose kernel contains $W$, then there exists a unique linear map $g:V/W\to W'$ such that $f=g\circ \pi$. This is summarized by the following commutative diagram:

$V \xrightarrow{\qquad f \qquad} W'\\ \; \pi \searrow \quad \qquad \nearrow g\\ \qquad\;\; V/W$

Proof:
Let $W'$ be a vector space over $\mathbb{F}$, $f:V\to W'$ be a linear map with $W \subset \text{ker}f$. Then define $g:V/W\to W'$ to be the map $[v]\mapsto f(v)$.
$g$ is well-defined because if $[v]\in V/W$ and $v_1,v_2\in V$ are both representatives of $[v]$, then there exists $w\in W$ such that $v_1=v_2+w$.
Hence $f(v_1)=f(v_2+w)=f(v_2)+f(w)=f(v_2)$ since f is linear.
The linearity of $g$ comes from linearity of $f$.
Also, by definition $f=g\circ \pi$.
It remains to prove $g$ is unique.
Suppose $h:V/W\to W'$ is such that $h\circ \pi=f=g\circ \pi$. Then $h([v])=g([v])$ for all $v\in V$. Since $\pi$ is surjective, $h=g.\:\Box$

Homomorphism theorem
Let $X,Y$ be vector spaces over $\mathbb{F}$ and $f:X \to Y$ a linear map. Then $X/\text{ker}f \cong \text{im}f$, namely $X/\text{ker}f$ is isomorphic to $\text{im}f$.

Proof: Consider the mapping $\overline{f}:X/\text{ker}f \to Y \; \overline{f}:[x]=x+\text{ker}f \mapsto f(x)$.

A commutative diagram would express this as
$X \xrightarrow{\qquad f \qquad} Y\\ \; \pi \searrow \quad \qquad \nearrow \overline{f}\\ \qquad X/ \text{ker}f$

We can use similar arguments as the previous example to prove that $\overline{f}$ is well-defined and linear.
Now suppose $\overline{f}([x])=0$. Then $f(x)=0$, which implies $x\in \text{ker}f$. This means $[x]=0_{X/\text{ker}f}$. Then $\overline{f}$ is injective. Hence if we confine the codomain of $\overline{f}$ to $\text{im}f$, then $\overline{f}$ becomes a bijection and hence $X/\text{ker}f \cong \text{im}f.\:\Box$

Complexification of vector spaces

Universal property of free groups
Let $G$ be a group with a generating set $X \subset G$. Then $G$ is free on $X$ iff every map $\phi:X\to H$ from $X$ into a group $H$ can be extended to a unique homomorphism $\overline{\phi}:G\to H$, so that the diagram below commutes:


$X \xrightarrow{\quad \phi \quad} H\\ \; i \searrow \quad \;\; \nearrow \overline{\phi}\\ \qquad \; G$

where $X \xrightarrow{i} G$ is the inclusion of $X$ into $G$.

As you may have noticed, extension by linearity, the universal property of the quotient, the universal property of free groups, and the universal property of complexification all share a similar commutative diagram. All of these come with three sets and three maps (one of them injective or surjective, another a natural/standard embedding, and lastly a unique map). They are called initial properties in category theory. Check out the formal definition of universal property.



Applications:

The reason we can use Fourier transform to solve differential equations is that the following diagram commutes.


$\begin{array}[c]{ccc}\qquad \text{problem}&\xrightarrow{\text{Fourier transform}}&\text{transformed problem}\\ \scriptstyle{\text{solve (hard)}}\Bigg\downarrow& &\Bigg\downarrow \scriptstyle{\text{solve (easy)}}\\ \qquad \text{solution} & \xleftarrow{\text{Inverse Fourier transform}} & \text{transformed solution} \end{array}$


Outside of the realm of mathematics, we can also see instances of commutative diagram. A famous example is Hess's law in physical chemistry, which states that the change of enthalpy in a chemical reaction is independent of the pathway between the initial and final states. What this means is that we have the following commutative diagram:

$\text{reactants A} \xrightarrow{\Large \text{route 1}} \text{products B}\\ \quad \qquad\ \searrow \quad \qquad \qquad \nearrow\\ \qquad \quad \quad \text{intermediates}\\ \qquad \qquad \quad\text{(route 2)}$



In fact, we have been using this kind of 'reasoning' all along.

Examples:
  • In the derivation of normal distribution, we express the same probability in two different ways.
  • Some trigonometric identities can be proved employing this idea: evaluating the integral $\int \sin 2x \cos x \:dx$ in two different ways can yield the identity $\cos 3x=4\cos^3 x-3\cos x$.
  • Proofs using combinatorial arguments count the same quantity in two different ways.

More to explore:
Category theory
Universal property
More examples of commutative diagrams
Definition of commutative diagrams
Free groups

References:
Commutative diagram
Universal property of free groups [p6]
Examples of commutative diagrams [p4-6,28]
Quotient space [p1-3]
Extension by linearity
From Vector Spaces to Function Spaces: Introduction to Functional Analysis with Applications by Yutaka Yamamoto [p16,17,29]

Tuesday, 16 June 2015

Complexification of vector spaces

Motivation to complexify real vector spaces:
One reason is to solve equations. If we want to prove theorems about real solutions to a system of real linear equations or a system of real linear differential equations, it can be convenient to examine the complex solution space, which gives us information about the real solution space.

Difference between direct sum and Cartesian product of two vector spaces
For the direct sum, the result is a new object of the same type: it has an algebraic structure. But the Cartesian product is just a set. For example, $\mathbb{R}\oplus \mathbb{R}$ is the set $\mathbb{R}\times \mathbb{R}$ together with the operation $(a,b)+(c,d)=(a+c,b+d)$ (and the corresponding scalar multiplication).

Complexifying with direct sums
Definition: The complexification of a real vector space $W$ is the complex vector space $W_\mathbb{C}=W\oplus W$, the set of all ordered pairs $(w_1,w_2)$ with $w_1,w_2 \in W$, obtained by formally extending scalar multiplication to include multiplication by complex numbers. Vector addition and multiplication by complex scalars are defined as follows:

$(w_1,v_1)+(w_2,v_2)=(w_1+w_2,v_1+v_2)$ for $w_1,v_1,w_2,v_2 \in W$
$(a+bi)(w_1,w_2)=(aw_1-bw_2,bw_1+aw_2)$ for $a,b \in \mathbb{R}, w_1,w_2 \in W$

This rule of multiplication is reasonable if you think about a pair $(w_1,w_2)\in W\oplus W$ as a formal sum of $w_1+iw_2$: $(a+bi)(w_1+iw_2)=aw_1+aiw_2+biw_1-bw_2=(aw_1-bw_2)+i(bw_1+aw_2)$.

In particular, $i(w_1,w_2)=(-w_2,w_1)$.
[$(a+bi)(w_1,w_2)"="a(w_1,w_2)+bi(w_1,w_2)$. For this expression to equal $(aw_1-bw_2)+i(bw_1+aw_2)$, we must have $i(w_1,w_2)=(-w_2,w_1)$.]

Using this multiplication rule, one can check that $W_\mathbb{C}$ is a complex vector space. Since $i(w,0)=(0,w)$, we have $(w_1,w_2)=(w_1,0)+(0,w_2)=(w_1,0)+i(w_2,0)$. The $\mathbb{R}$-linear function $w \mapsto (w,0)$ is called the standard embedding of $W$ into $W_\mathbb{C}$. With this, we can regard $W_\mathbb{C}$ as $W+iW$.
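The definition is easy to make concrete. Below is a minimal Python sketch (using numpy, with $W=\mathbb{R}^n$; the class and function names are made up for illustration) of the pair model of $W_\mathbb{C}$, its complex scalar multiplication, and the standard embedding.

```python
import numpy as np

class Complexification:
    """A pair (w1, w2) in W ⊕ W, thought of as w1 + i*w2, for W = R^n.
    A minimal sketch of the definition above, not a full vector-space API."""

    def __init__(self, w1, w2):
        self.w1 = np.asarray(w1, dtype=float)
        self.w2 = np.asarray(w2, dtype=float)

    def __add__(self, other):
        # (w1, v1) + (w2, v2) = (w1 + w2, v1 + v2)
        return Complexification(self.w1 + other.w1, self.w2 + other.w2)

    def scale(self, z):
        # (a + bi)(w1, w2) = (a*w1 - b*w2, b*w1 + a*w2)
        a, b = z.real, z.imag
        return Complexification(a * self.w1 - b * self.w2,
                                b * self.w1 + a * self.w2)

    def __repr__(self):
        return f"({self.w1}) + i({self.w2})"

def embed(w):
    """Standard embedding w -> (w, 0) of W into W_C."""
    w = np.asarray(w, dtype=float)
    return Complexification(w, np.zeros_like(w))

v = Complexification([1.0, 2.0], [3.0, 4.0])
print(v.scale(1j))                       # ([-3. -4.]) + i([1. 2.]), i.e. i(w1, w2) = (-w2, w1)
print(embed([1.0, 0.0]).scale(2 + 1j))   # ([2. 0.]) + i([1. 0.])
```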

Examples:
The complexifications of $\mathbb{R}^n,M_n(\mathbb{R}),\mathbb{R}[X]$ are isomorphic to $\mathbb{C}^n,M_n(\mathbb{C}),\mathbb{C}[X]$ by sending an ordered pair $(w_1,w_2)$ in $\mathbb{R}^n \oplus \mathbb{R}^n,M_n(\mathbb{R})\oplus M_n(\mathbb{R}),\mathbb{R}[X]\oplus \mathbb{R}[X]$ to $w_1+iw_2$ in $\mathbb{C}^n,M_n(\mathbb{C}),\mathbb{C}[X]$.
[$\mathbb{R}[X]$ is the set of all polynomials with real coefficients in the variable $x$.]

Namely, we can identify the complexification $(\mathbb{R}^2)_\mathbb{C}$ with $\mathbb{C}^2$ by $(w_1,w_2)\mapsto w_1+iw_2\in \mathbb{R}^2+i\mathbb{R}^2=\mathbb{C}^2$ and this sends the basis vectors $\tiny\begin{pmatrix} 1\\0 \end{pmatrix}$ and $\tiny\begin{pmatrix} 0\\1 \end{pmatrix}$ of $\mathbb{R}^2 \subset (\mathbb{R}^2)_\mathbb{C}$ to $\tiny\begin{pmatrix} 1\\0 \end{pmatrix}$ and $\tiny\begin{pmatrix} 0\\1 \end{pmatrix}$ in $\mathbb{C}^2$, which are the standard basis vectors of $\mathbb{C}^2$ as a complex vector space.

Similarly, the identifications of $(\mathbb{R}^n)_\mathbb{C},M_n(\mathbb{R})_\mathbb{C},\mathbb{R}[X]_\mathbb{C}$ with $\mathbb{C}^n,M_n(\mathbb{C}),\mathbb{C}[X]$ turn every real basis of $\mathbb{R}^n,M_n(\mathbb{R}),\mathbb{R}[X]$ into a complex basis of $\mathbb{C}^n,M_n(\mathbb{C}),\mathbb{C}[X]$.

Denote the space of complex $n\times n$ matrices by $M_n(\mathbb{C})$, the space of anti-Hermitian matrices ($A^*=-A$) by $u(n)$, and the space of Hermitian matrices ($A^*=A$) by $iu(n)$. Then $M_n(\mathbb{C})=u(n)\oplus iu(n)$ as real vector spaces. Thus $u(n)$ is a real form of $M_n(\mathbb{C})$ and $M_n(\mathbb{C})$ is a complexification of $u(n)$.

Definitions:
A map $T$ between complex vector spaces is said to be $\mathbb{C}$-linear if it is additive and $T(\lambda w)=\lambda T(w)$ for all $w$ in the domain of $T$ and for all $\lambda \in \mathbb{C}$.
A map $T$ between complex vector spaces is said to be $\mathbb{C}$-antilinear or conjugate linear if it is additive and $T(\lambda w)=\overline{\lambda}T(w)$ for all $w$ in the domain of $T$ and for all $\lambda \in \mathbb{C}$.

Theorem: If $W=0$, then $W_\mathbb{C}=0$. If $W\neq 0$ and $\{e_1,\cdots,e_n\}$ is an $\mathbb{R}$-basis of $W$, then $\{(e_1,0),\cdots,(e_n,0)\}$ is a $\mathbb{C}$-basis of $W_\mathbb{C}$. In particular, $\text{dim}_\mathbb{C}(W_\mathbb{C})=\text{dim}_\mathbb{R}(W)$ for all $W$.

Proof: We want to prove that $\{(e_1,0),\cdots,(e_n,0)\}$ is a basis for $W_\mathbb{C}$: it is a spanning set and it is linearly independent.
Suppose $W\neq 0$ and $\{e_1,\cdots,e_n\}$ is an $\mathbb{R}$-basis of $W$.
Spanning:
If $\{e_1,\cdots,e_n\}$ is an $\mathbb{R}$-basis of $W$, then $\{(e_1,0),\cdots,(e_n,0),(0,e_1),\cdots,(0,e_n)\}$ is a basis for the real vector space $W\oplus W$.
Let $w\in W_\mathbb{C}$. Using the basis for $W\oplus W$, there exist $a_1,\cdots,a_n,b_1,\cdots,b_n \in \mathbb{R}$ such that
$\begin{align}w&=a_1(e_1,0)+\cdots+a_n(e_n,0)+b_1(0,e_1)+\cdots+b_n(0,e_n) & \qquad [*]\\
&=a_1(e_1,0)+\cdots+a_n(e_n,0)+ib_1(e_1,0)+\cdots+ib_n(e_n,0) &[ib(e_j,0)=b(0,e_j)]\\
&=(a_1+ib_1)(e_1,0)+\cdots+(a_n+ib_n)(e_n,0) & \qquad [**]\end{align}$
$[*]: \mathbb{R}$-linear combination of $(e_j,0)$ and $(0,e_j)$
$[**]: \mathbb{C}$-linear combination of $(e_j,0)$
Therefore, $\{(e_1,0),\cdots,(e_n,0)\}$ is a $\mathbb{C}$-linear spanning set of $W_\mathbb{C}$.
Linear independence:
Suppose we can write $(0,0)$ as a finite $\mathbb{C}$-linear combination of $(e_j,0)$.
$(a_1+ib_1)(e_1,0)+\cdots+(a_n+ib_n)(e_n,0)=(0,0)$ for some real $a_i,b_i\;\;i=1,\cdots,n$
Equivalently, $(a_1e_1+\cdots+a_ne_n,b_1e_1+\cdots+b_ne_n)=(0,0).\qquad [bi(e_j,0)=(0,be_j)]$
Therefore, $\sum\limits_{i=1}^n a_ie_i=0$ and $\sum\limits_{i=1}^n b_ie_i=0$ in $W$.
Since $e_j$'s are linearly independent over $\mathbb{R}$, all the coefficients $a_j$ and $b_j$ are 0, so $a_j+ib_j=0$ for all $j=1,\cdots,n\:\Box$.

Universal property of complexification
Theorem: For any $\mathbb{R}$-linear map $W\xrightarrow{f}V$ from $W$ into a complex vector space $V$, there is a unique $\mathbb{C}$-linear map $W_\mathbb{C}\xrightarrow{\tilde{f}}V$ making the diagram

$\;\; W \xrightarrow{\quad \quad} W_\mathbb{C}\\ \quad f \searrow \quad \nearrow \tilde{f}\\ \qquad\quad V$

commute, where the map $W\to W_\mathbb{C}$ is the standard embedding.

This theorem says that the standard embedding $W\to W_\mathbb{C}$ is an initial object: among $\mathbb{R}$-linear maps from $W$ into complex vector spaces, it admits a unique compatible ($\mathbb{C}$-linear) morphism to every other one.

Complexification of linear transformation
Theorem: Every $\mathbb{R}$-linear transformation $\varphi:W\to W'$ of real vector spaces extends in a unique way to a $\mathbb{C}$-linear transformation of the complexifications: there is a unique $\mathbb{C}$-linear map $\varphi_\mathbb{C}:W_\mathbb{C}\to W'_\mathbb{C}$ making the diagram

$\qquad \qquad \;W\xrightarrow{\quad\;\varphi\;\quad}W'\\ w\mapsto (w,0) \Bigg\downarrow \qquad\qquad \Bigg\downarrow \varphi(w)\mapsto (\varphi(w),0) \\ \qquad \qquad W_\mathbb{C} \xrightarrow{\quad \varphi_\mathbb{C} \quad} W'_\mathbb{C}$

commute, where the vertical maps are the standard embeddings of real vector spaces into their complexifications.

Proof: If such a $\mathbb{C}$-linear map $\varphi_\mathbb{C}$ exists, then the commutativity of the diagram says $\varphi_\mathbb{C}(w,0)=(\varphi(w),0)$ for all $w\in W$. Therefore, for $(w_1,w_2)\in W_\mathbb{C}$, we have

$\begin{align} \varphi_\mathbb{C}(w_1,w_2)&=\varphi_\mathbb{C}(w_1,0)+\varphi_\mathbb{C}(0,w_2)\\
&=\varphi_\mathbb{C}(w_1,0)+i\varphi_\mathbb{C}(w_2,0)\\
&=(\varphi(w_1),0)+i(\varphi(w_2),0)\\
&=(\varphi(w_1),0)+(0,\varphi(w_2))\\
&=(\varphi(w_1),\varphi(w_2)) \end{align}$

This tells us what $\varphi_\mathbb{C}$ must be, which proves uniqueness. So define $\varphi_\mathbb{C}:W_\mathbb{C}\to W'_\mathbb{C}$ by $\varphi_\mathbb{C}(w_1,w_2)=(\varphi(w_1),\varphi(w_2))$. We need to check that $\varphi_\mathbb{C}$ is $\mathbb{C}$-linear. It is clearly additive and $\mathbb{R}$-linear, and it commutes with multiplication by $i$: $\varphi_\mathbb{C}(i(w_1,w_2))=\varphi_\mathbb{C}(-w_2,w_1)=(-\varphi(w_2),\varphi(w_1))=i(\varphi(w_1),\varphi(w_2))=i\varphi_\mathbb{C}(w_1,w_2)$, so it is $\mathbb{C}$-linear. $\Box$

In other words, to an operator $\varphi:W\to W'$, there corresponds an operator $\varphi_\mathbb{C}:W_\mathbb{C}\to W'_\mathbb{C}$ given by the formula $\varphi_\mathbb{C}(a+ib)=\varphi(a)+i\varphi(b)$. The operator $\varphi_\mathbb{C}$ is called the complexification of $\varphi$.

If $\{w_1,\cdots,w_n\}$ is a basis for $W$, $\{w'_1,\cdots,w'_m\}$ is a basis for $W'$, and $A$ is the matrix representation of $\varphi$ with respect to these bases, then $A$, regarded as a complex matrix, is also the matrix representation of $\varphi_\mathbb{C}$ with respect to the corresponding bases of $W_\mathbb{C}$ and $W'_\mathbb{C}$.

So the complexification process is a formal, coordinate-free way of saying: take the matrix $A$ of $\varphi$, with its real entries, but operate on it as a complex matrix. The advantage of this abstract definition is that we are not forced to fix a choice of coordinates and work with matrix representations. Say we want to argue about the complex eigenvalues and eigenvectors of an operator $\varphi:W\to W$; non-real eigenvalues and eigenvectors, by definition, cannot exist for a transformation of real vector spaces, so what we really mean are the eigenvalues and eigenvectors of $\varphi_\mathbb{C}$. Also, the complexification process generalises without change to infinite-dimensional spaces.
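A small numpy sketch of this point of view (the rotation matrix is an assumed example, not from the post): the real matrix of $\varphi$ is simply reused as a complex matrix, and $\varphi_\mathbb{C}(a+ib)=\varphi(a)+i\varphi(b)$ is just matrix multiplication with a complex vector. Rotation by $90^\circ$ has no real eigenvalues, but its complexification has eigenvalues $\pm i$.

```python
import numpy as np

# phi is rotation by 90 degrees on R^2: no real eigenvalues.  Its
# complexification phi_C uses the same matrix A, now acting on C^2,
# and phi_C(a + ib) = phi(a) + i*phi(b) is just A times a complex vector.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

print(np.linalg.eig(A)[0])              # eigenvalues of phi_C: i and -i

a = np.array([1.0, 0.0])                # real part
b = np.array([0.0, 2.0])                # imaginary part
lhs = A @ (a + 1j * b)                  # phi_C(a + ib)
rhs = A @ a + 1j * (A @ b)              # phi(a) + i*phi(b)
print(np.allclose(lhs, rhs))            # True
```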

Complexifying with tensor products [pending]

Conjugations
In the complexification $W_\mathbb{C}$ of a real vector space $W$, we can define complex conjugation by $\overline{w}=\overline{u+iv}=u-iv$ for $w=u+iv$ with $u,v\in W$.

In an arbitrary complex vector space, however, there is no natural, basis-independent way of defining complex conjugation of vectors. For example, if we set the complex conjugate of a vector $u=(a+bi)e_j$ to be $\overline{u}=(a-bi)e_j$, this definition will give a different answer in the basis $\{ie_j\}$ since $u=[-i(a+bi)](ie_j) \Rightarrow \overline{u}=[i(a-bi)](ie_j)=-(a-bi)e_j$.

Thus the concept of complex conjugation of vectors requires prior knowledge of the 'real part' of a complex vector space. The complexification of a real space has precisely the required extra structure needed to define complex conjugation of vectors.

Later, we will use conjugation to describe the real subspaces which have the given complex vector space as its complexification.

Realification/decomplexification
One way of creating a real vector space $W_\mathbb{R}$ from a complex vector space $W$ is to forget altogether about the possibility of multiplying vectors by complex numbers and only allow scalar multiplication with real numbers. In abstract algebra, we call this the restriction of scalars from complex numbers to real numbers. In this process a pair of vectors $u$ and $iu$ must be regarded as linearly independent vectors in $W_\mathbb{R}$ for any non-zero vector $u\in W$. Thus if $W$ is finite-dimensional and $\text{dim}W=n$, then $W_\mathbb{R}$ is $2n$-dimensional, namely $\text{dim}_\mathbb{R} W=2\text{dim}_\mathbb{C}W$, for if $\{e_1,\cdots,e_n\}$ is a basis of $W$ then $\{e_1,\cdots,e_n,ie_1,\cdots,ie_n\}$ is a linearly independent set of vectors spanning $W_\mathbb{R}$.

We can verify that if $A=B+iC$ is the matrix of a linear map $A:V\to W$ with respect to bases $\{e_1,\cdots,e_n\}$ of $V$ and $\{\epsilon_1,\cdots,\epsilon_m\}$ of $W$, where the matrices $B$ and $C$ are real, then the matrix of the linear map $A_\mathbb{R}$ with respect to the bases $\{e_1,\cdots,e_n,ie_1,\cdots,ie_n\}$ and $\{\epsilon_1,\cdots,\epsilon_m,i\epsilon_1,\cdots,i\epsilon_m\}$ is of the form $\begin{pmatrix} B & -C \\ C & B \end{pmatrix}$.
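A quick numerical check of this block form with randomly chosen $B$ and $C$ (the sizes and the use of numpy are assumptions for illustration): applying $A=B+iC$ to a complex vector and then splitting into real and imaginary parts agrees with applying $\begin{pmatrix} B & -C\\ C & B\end{pmatrix}$ to the stacked real and imaginary parts.

```python
import numpy as np

rng = np.random.default_rng(0)

# A = B + iC as a complex m x n matrix, versus the real block matrix
# [[B, -C], [C, B]] acting on real parts stacked on top of imaginary parts.
n, m = 3, 2
B = rng.standard_normal((m, n))
C = rng.standard_normal((m, n))
A = B + 1j * C
A_real = np.block([[B, -C],
                   [C,  B]])

z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
w = A @ z

z_real = np.concatenate([z.real, z.imag])
print(np.allclose(A_real @ z_real,
                  np.concatenate([w.real, w.imag])))   # True
```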

Example:
Say $n=2$: if $\{a+ib,\,c+id\}$ is a basis for a complex vector space $W$, then $\{(a,b,0,0),(-b,a,0,0),(0,0,c,d),(0,0,-d,c)\}$ is the associated basis for $W_\mathbb{R}\cong\mathbb{R}^4$.

More to explore:
Complexification of Lie algebras – a tool useful for representation theory
Complex spectral theorem – deducing a real classification for orthogonal matrices
Splitting characteristic polynomials
Extension of scalars & restriction of scalars

References:
Complexification
More to explore
Difference between Cartesian product and tensor product of two vector spaces
[?] Change of basis matrix
Complexification and realification [p3,5]
A Course in Modern Mathematical Physics Groups, Hilbert Space and Differential Geometry by Peter Szekeres [p154-156]
Ordinary Differential Equations by Vladimir I. Arnol'd [p177-189]
Problems and Theorems in Linear Algebra by Viktor Vasil'evich Prasolov [p55-57]

Sunday, 14 June 2015

Morphisms

General maps
  • one-to-one: injective
  • onto: surjective
  • one-to-one and onto: bijective

Linear maps
  • any map: homomorphism
  • one-to-one: monomorphism
  • onto: epimorphism
  • one-to-one and onto: isomorphism

Linear self-maps
  • any map: endomorphism
  • one-to-one and onto: automorphism

Examples of isomorphisms and the maps that realise them:
  • $M_{1,3}(\Bbb F) \cong M_{3,1}(\Bbb F)\\ (x_1,x_2,x_3)\mapsto \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}$

  • $M_{2,2}(\Bbb F) \cong \Bbb F^4\\ \begin{pmatrix}a&b\\ c&d \end{pmatrix}\mapsto (a,b,c,d)$

  • $M_{2,3}(\Bbb F) \cong M_{3,2}(\Bbb F)\\ A\mapsto A^T$

  • The plane $z=0$ in $\Bbb R^3$ is isomorphic to $\Bbb R^2$.
    $(x,y,0)\mapsto (x,y)$

  • $P_n\cong \Bbb R^{n+1}\\ a_0+a_1x+\cdots+a_nx^n\mapsto (a_0,a_1,\cdots,a_n)$

  • $P$ is isomorphic to $\Bbb R_0^\infty$.
    $a_0+a_1x+\cdots+a_nx^n\mapsto (a_0,a_1,\cdots,a_n,0,0,\cdots)$

  • $M_{m,n}(\Bbb F) \cong L(\Bbb F^n, \Bbb F^m)$
    $A\mapsto L_A$, where $L_A(x)=Ax$.

  • Any vector space $V$ of dimension $n$ is isomorphic to $\Bbb F^n$.
    $v\mapsto [v]_\alpha$, where $\alpha$ is a basis for $V$.

Category theory
A morphism $f:A\to B$ in a category $C$ is an isomorphism if there exists a morphism $g:B \to A$ such that both ways to compose $f$ and $g$ give the identity morphisms on the respective objects, namely $$gf=1_A, \quad fg=1_B.$$
A morphism $f:A\to B$ in a category $C$ is a monomorphism if for every object $Z$ and every pair of morphisms $g,h:Z\to A$, the condition $fg=fh$ implies $g=h$. Namely, the following diagram commutes:

$Z \underset{h}{\overset{g}{\rightrightarrows}} A \xrightarrow{\;f\;} B$

When we say a diagram commutes, it means that all ways of composing morphisms along the arrows give the same morphism; here, that $fg=fh$. Therefore, whenever $f$ is a monomorphism and this diagram commutes, we can conclude $g=h$. The key idea is that monomorphisms allow us to “cancel” $f$ from the left side of a composition.

The corresponding property for cancelling on the right is defined similarly.

A morphism $f:A\to B$ in a category $C$ is an epimorphism if for every object $Z$ and every pair of morphisms $g,h:B\to Z$, the condition $gf=hf$ implies $g=h$. In diagram form:

$A \xrightarrow{\;f\;} B \underset{h}{\overset{g}{\rightrightarrows}} Z$

Whenever $f$ is an epimorphism and this diagram commutes (i.e. $gf=hf$), we can conclude $g=h$: epimorphisms allow us to “cancel” $f$ from the right side of a composition.
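In the category of sets, the monomorphisms are exactly the injective functions. Here is a tiny brute-force Python sketch of left-cancellation for one injective $f$ (the finite sets and maps are made up purely for illustration):

```python
from itertools import product

# f: A -> B is injective, hence a monomorphism in the category of sets.
# Brute-force check of left-cancellation: for all g, h: Z -> A,
# f∘g = f∘h implies g = h.
A = [0, 1]
Z = ['p', 'q']
f = {0: 'x', 1: 'y'}          # injective map into B = {'x', 'y', 'z'}

maps_Z_to_A = [dict(zip(Z, values)) for values in product(A, repeat=len(Z))]

for g, h in product(maps_Z_to_A, repeat=2):
    fg = {z: f[g[z]] for z in Z}
    fh = {z: f[h[z]] for z in Z}
    if fg == fh:
        assert g == h         # f can be cancelled on the left
print("left-cancellation holds for the injective f")
```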

Reference:
Examples of isomorphisms
Properties of morphisms
An awesome video on morphisms

Friday, 5 June 2015

Four fundamental subspaces

Prerequisites

Orthogonal vectors & subspaces
Two vectors $v_1$ and $v_2$ are orthogonal if $v_1^T v_2=\left\lVert v_1 \right\rVert \left\lVert v_2 \right\rVert \cos\theta=0$.
Two subspaces $M_1$ and $M_2$ are orthogonal if $v_1 \bot v_2$ for all $v_1 \in M_1$ and $v_2 \in M_2$.

Orthogonal complement
The orthogonal complement to a subspace $M \subset R^n$ is defined by $M^\bot=\{w \in R^n: w^T v=0 \: \forall v \in M \}$.
$M^\bot$ is the largest (unique) subspace orthogonal to $M$.
The following holds:
$M^\bot$ is a subspace.
$R^n=M \oplus M^\bot$
If $\text{dim}\:M=k$, then $\text{dim}\:M^\bot=n-k$.
$(M^\bot)^\bot=M$



The four fundamental subspaces

Definitions
$A=\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \cdots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}=\begin{pmatrix} \vec{a_1} & \cdots & \vec{a_n} \end{pmatrix}=\begin{pmatrix} \overline{a}_1^T \\  \overline{a}_2^T \\  \vdots \\ \overline{a}_m^T \end{pmatrix}$

Column space
$C(A)=\{Ax\:|\:x \in R^n \}=\{ \sum_{j=1}^n \vec{a_j}x_j\:|\:x_j \in R, j=1,2,\cdots,n\}$

Nullspace
$N(A)=\{x \in R^n\:|\:Ax=0\}=\{x \in R^n\:|\:\overline{a}_i^T x=0, i=1,2,\cdots,m\}$

Row space
$C(A^T)=\{A^Tu\:|\:u \in R^m \}=\{ \sum_{i=1}^m \overline{a}_i u_i\:|\:u_i \in R, i=1,2,\cdots,m\}$

Left nullspace
$N(A^T)=\{u \in R^m\:|\:A^T u=0\}=\{u \in R^m\:|\:\vec{a_j}^T u=0, j=1,2,\cdots,n \}=\{u \in R^m\:|\:u^T A=0\}$

The left nullspace is the space of all vectors y such that $A^Ty=0$, or equivalently $y^TA=0$. Thus the term left nullspace.
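A small sympy sketch computing bases for all four subspaces of a concrete matrix (the particular $3\times 4$ matrix is an assumption chosen for illustration):

```python
import sympy as sp

# Bases for the four fundamental subspaces of a concrete 3 x 4 matrix.
A = sp.Matrix([[1, 2, 0, 1],
               [2, 4, 1, 4],
               [3, 6, 1, 5]])            # rank r = 2 (row 3 = row 1 + row 2)

col_space  = A.columnspace()             # C(A)    in R^3
null_space = A.nullspace()               # N(A)    in R^4
row_space  = A.T.columnspace()           # C(A^T)  in R^4
left_null  = A.T.nullspace()             # N(A^T)  in R^3

print(len(row_space), len(null_space))   # 2 and 2:  r and n - r
print(len(col_space), len(left_null))    # 2 and 1:  r and m - r

# Every row-space basis vector is orthogonal to every nullspace basis vector.
print(all((v.T * w)[0] == 0 for v in row_space for w in null_space))   # True
```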

Geometric picture [later]

Relation to RREF [later]

Summary

Let $A$ be an $m\times n$ matrix, and $r$ be its rank.

subspace        | notation   | subspace of | dimension
row space       | $C(A^T)$   | $R^n$       | $r$
column space    | $C(A)$     | $R^m$       | $r$
nullspace       | $N(A)$     | $R^n$       | $n-r$
left nullspace  | $N(A^T)$   | $R^m$       | $m-r$

$\text{dim}\:C(A^T)+\text{dim}\:N(A)=\text{dim}\:R^n\\
\text{dim}\:C(A)+\text{dim}\:N(A^T)=\text{dim}\:R^m$



Fundamental Theorem of Linear Algebra

Let A be an $m \times n$ matrix.
1. $N(A)$ is the orthogonal complement of $C(A^T)$ in $R^n$.
2. $N(A^T)$ is the orthogonal complement of $C(A)$ in $R^m$.
3. $N(A) \oplus C(A^T)=R^n$
4. $N(A^T) \oplus C(A)=R^m$

The following orthogonality relations hold:
$C(A)^\bot=N(A^T)\\
N(A^T)^\bot=C(A)\\
C(A^T)^\bot=N(A)\\
N(A)^\bot=C(A^T)$

Explanations:

Orthogonal subspaces
We will examine an $m \times n$ matrix $A$. Then both the row space and the nullspace are subspaces of $R^n$. [$A^T$ is $n \times m$ and $u$ is $m \times 1$, so by matrix multiplication the vectors $A^Tu$ spanning the row space are $n \times 1$.] Suppose $x$ is a vector in the nullspace of $A$. Then $Ax=0$. From the definition of matrix multiplication, we have $$\begin{pmatrix} \text{row 1} \\ \text{row 2} \\ \vdots \\ \text{row m} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}=0.$$ As the row space consists of linear combinations of the rows, the dot product of $x$ with any vector in the row space must be $0$. So if $v\in C(A^T)$ and $w\in N(A)$, we must have $v^T w=0$. Therefore, the row space and the nullspace of $A$ are orthogonal.

Similarly, every vector in the left nullspace of $A$, $N(A^T)$, is perpendicular to every vector in the column space of $A$, $C(A)$. Hence, the column space of A and the left nullspace of A are orthogonal.

Orthogonal complements
We have established that the row space and the nullspace are orthogonal. It turns out the nullspace is in fact the orthogonal complement of the row space. The dimension of the nullspace is $n-r$, where $r$ is the rank (dimension of the row space) of $A$. If there were a vector orthogonal to the row space but not in the nullspace, then the dimension of $C(A^T)^\bot$ would be at least $n-r+1$. But this would be too large for both $C(A^T)$ and $C(A^T)^\bot$ to fit in $R^n$. Therefore, the nullspace is the largest subspace perpendicular to the row space, and we have $C(A^T)^\bot=N(A)$. Using similar arguments, we have $N(A)^\bot=C(A^T)$.

What is so useful about complements is that we can use them to break down vectors into components. Namely, for an $m \times n$ matrix $A$, any vector $x \in R^n$ can be written as the sum of a component $x_r$ in the row space and a component $x_n$ in the nullspace: $x=x_r+x_n$.

Now, let's consider $Ax=b$. We claim that every output vector $b$ comes from one and only one vector in the row space. Write $x=x_r+x_n$. Then $Ax=Ax_r+Ax_n=Ax_r+0=Ax_r$. Thus, we know there is a vector $x_r$ in the row space such that $Ax_r=b$. Suppose $x_r'$ is another vector in the row space such that $Ax_r'=b$. Note that $x_r-x_r'$ is in the row space since it is the difference of two vectors in the row space. Now $A(x_r-x_r')=Ax_r-Ax_r'=b-b=0$, so $x_r-x_r'$ is also in the nullspace. As the row space and nullspace are orthogonal complements, the only vector in both of them is $0$. So, $x_r-x_r'=0$ and $x_r=x_r'.\:\Box$
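A numerical sketch of this decomposition using numpy (the matrix and the vector $x$ are made-up examples): $A^{+}A$ is the orthogonal projector onto the row space, so $x_r=A^{+}Ax$ and $x_n=x-x_r$.

```python
import numpy as np

# Split x in R^4 into a row-space component x_r and a nullspace component x_n.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [2.0, 4.0, 1.0, 4.0],
              [3.0, 6.0, 1.0, 5.0]])
x = np.array([1.0, -1.0, 2.0, 0.5])

P_row = np.linalg.pinv(A) @ A            # orthogonal projector onto C(A^T)
x_r = P_row @ x
x_n = x - x_r

print(np.allclose(A @ x_n, 0))           # x_n lies in N(A)
print(np.allclose(A @ x, A @ x_r))       # A x = A x_r
print(np.isclose(x_r @ x_n, 0))          # the two components are orthogonal
```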

Relation to domain and codomain
Consider an $m \times n$ matrix $A$. The matrix $A$ maps from $R^n$ to $R^m$; we say that its domain is $R^n$ and its codomain is $R^m$. A vector in the domain either maps to zero or to a non-zero vector in the range, so the domain splits into the nullspace of $A$ and the row space, which $A$ maps one-to-one onto its range. Likewise, the codomain splits into the range of $A$ and the part we cannot reach with $A$. Now, the matrix $A^T$ maps from $R^m$ to $R^n$: its range is the row space of $A$, and its nullspace is precisely the part of $R^m$ we cannot reach with $A$.

To sum up, we have the four fundamental subspaces as
• an orthogonal decomposition of the domain of $A$:
nullspace of $A$ $\oplus$ range of $A^T$
• an orthogonal decomposition of the codomain of $A$:
range of $A$ $\oplus$ nullspace of $A^T$

Useful:
http://xoax.net/math/crs/linear_algebra_mit/lessons/Lecture10/

Reference:
https://rip94550.wordpress.com/2009/09/07/linear-algebra-the-four-fundamental-subspaces/

Thursday, 4 June 2015

Spanning and linearly independence

A basis of a vector space V is a subset $\{v_1,\cdots,v_n\} \subset V$ satisfying the two properties:
(i) $\{v_1,\cdots,v_n\}$ spans V;
(ii) $\{v_1,\cdots,v_n\}$ is linearly independent.

(i) To show spanning: Take an arbitrary vector $v \in V$, and show that $v$ can be written as a linear combination of $v_1, \cdots,v_n$, namely, $v=a_1v_1+\cdots+a_nv_n$.

(ii) To show linear independence: Prove that $a_1v_1+\cdots+a_nv_n=0$ implies $a_1=a_2=\cdots=a_n=0$.

For two differentiable functions $u_1$ and $u_2$, linear independence can often be checked with the Wronskian. Let $au_1(x)+bu_2(x)=0$ for all $x$. Differentiating $\Rightarrow au_1'(x)+bu_2'(x)=0$.
If $W(u_1,u_2)=\begin{vmatrix} u_1 & u_2 \\ u_1' & u_2' \end{vmatrix}=u_1u_2'-u_2u_1' \neq 0$ for some $x$, then $a=b=0$ and $u_1$ and $u_2$ are linearly independent.
Note: $W(u_1, u_2)$ is known as the Wronskian determinant of the functions $u_1$ and $u_2$.

Examples:
Spanning

$1$ and $i$ are linearly independent in $\mathbb{C}$ regarded as a vector space over $\mathbb{R}$ but they are linearly dependent if $\mathbb{C}$ is regarded as a complex vector space.



$u_1(x)=\sin x$ and $u_2(x)=\cos x$ are linearly independent.

Proof: Let $a\sin x +b\cos x=0$.
Differentiating $\Rightarrow a\cos x-b\sin x=0$.
So $a=b \sin x (\cos x)^{-1}$ (for $x$ with $\cos x \neq 0$) $\Rightarrow b(\sin^2 x (\cos x)^{-1}+\cos x)=0$
$\Rightarrow b (\cos x)^{-1}=0 \Rightarrow b=0$. Thus $a=b=0$.

Alternatively, $W(u_1,u_2)=u_1u_2'-u_2u_1'=-\sin^2 x-\cos^2 x=-1 \Rightarrow$ linear independence.



Show that the functions $x, xe^x, e^{-x}$ are linearly independent in the vector space $C^\infty(\mathbb{R})$.

Proof: Suppose $ax+bxe^x+ce^{-x}=0$ for all $x \in \mathbb{R}$, where $a,b,c$ are constants.

Method I – Differentiation:
$\begin{align}\: & ax+bxe^x+ce^{-x}=0 \; &(1)\\
\text{Differentiate both sides:}\: & a+be^x+bxe^x-ce^{-x}=0 \; &(2)\\
\text{Differentiate again:}\: & 2be^x+bxe^x+ce^{-x}=0 \; &(3)\\
\text{Differentiate yet again:}\: & 3be^x+bxe^x-ce^{-x}=0 \; &(4)\\
\text{And again:}\: & 4be^x+bxe^x+ce^{-x}=0 \; &(5) \end{align}$

$(5)-(3) \Rightarrow 2be^x=0 \Rightarrow b=0$
Substitute $b=0$ in $(3) \Rightarrow c=0$.
Substitute $b=c=0$ in $(2) \Rightarrow a=0$.

Method II – Limit:
For any $x \neq 0$ divide both sides by $xe^x$. Then, $ae^{-x}+b+cx^{-1}e^{-2x}=0$
L.H.S. approaches $b$ as $x \to +\infty \Rightarrow b=0$.
Now $ax+ce^{-x}=0$ for all $x \in \mathbb{R}$.
For any $x \neq 0$ divide both sides by $x$. Then $a+cx^{-1}e^{-x}=0$.
L.H.S. approaches $a$ as $x \to +\infty \Rightarrow a=0$.
Now $ce^{-x}=0 \Rightarrow c=0$.
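Method III, in the spirit of the Wronskian criterion above, carried out with sympy (the use of sympy is my own addition; the computation itself is routine): the $3\times 3$ Wronskian of $x$, $xe^x$, $e^{-x}$ is $2x^2+3x+2$, which never vanishes, so the three functions are linearly independent.

```python
import sympy as sp

x = sp.symbols('x')
funcs = [x, x * sp.exp(x), sp.exp(-x)]

# Wronskian: rows are the functions, their first and their second derivatives.
W = sp.Matrix([[sp.diff(f, x, k) for f in funcs] for k in range(3)])
print(sp.simplify(W.det()))   # 2*x**2 + 3*x + 2: no real roots, never zero,
                              # so x, x*e^x, e^{-x} are linearly independent
```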

Linear Independence in function spaces



Applications in ordinary differential equations [later]

Reference:
Linear independence

Tuesday, 2 June 2015

Direct sum

The sum of two subspaces is the subspace $U+V=\text{span}(U \cup V)=\{u+v \in W\:|\:u \in U \wedge v \in V\}$
If $U \cap V=\{\boldsymbol{0}\}$, then the sum is called the direct sum and is denoted by $U \oplus V$.

Put differently, $W=U\oplus V$ iff for every $w \in W$ there exist unique vectors $u \in U$ and $v \in V$ such that $w=u+v$.

If $U_i \leq W$ for $1 \leq i \leq k$, then $W=U_1\oplus \cdots \oplus U_k$ iff $W=U_1+U_2+\cdots+U_k$ and $U_r \cap \sum\limits_{i \neq r} U_i=\{0\}$ for $1 \leq r \leq k$.
Note: It is not sufficient that $U_i \cap U_j=\{0\}$ whenever $i \neq j$.

Examples:
The complex numbers are the direct sum of the real and purely imaginary numbers. $\mathbb{C}=\mathbb{R}\oplus i\mathbb{R}$

$\langle e_1 \rangle+\langle e_2,e_3 \rangle=\langle e_1 \rangle \oplus \langle e_2,e_3 \rangle$ is a direct sum but $\langle e_1,e_2 \rangle+\langle e_2,e_3 \rangle$ is not a direct sum since $\langle e_1,e_2 \rangle \cap \langle e_2,e_3 \rangle= \langle e_2 \rangle \neq \{\boldsymbol{0} \}$.

Any function is the sum of an even function and an odd function.
$E=\{f \in F(\mathbb{R})\:|\:f(t)=f(-t)\},\quad O=\{f \in F(\mathbb{R})\:|\:f(t)=-f(-t)\}$
$E\oplus O=F(\mathbb{R})$
$f(t)=\frac{1}{2}[f(t)+f(-t)]+\frac{1}{2}[f(t)-f(-t)]$, where the first term is even and the second is odd.

Denote the vector space of square matrices by $M_{n \times n}(\mathbb{R})$.
Denote by $M_{n \times n}(\mathbb{R})_+, M_{n \times n}(\mathbb{R})_-$ the vector subspaces of symmetric and skew-symmetric matrices.
$M_{n \times n}(\mathbb{R})=M_{n \times n}(\mathbb{R})_+\oplus M_{n \times n}(\mathbb{R})_-$
Let $X \in M_{n \times n}(\mathbb{R})$. Then $X_+=\frac{1}{2}(X+X^T)$ and $X_-=\frac{1}{2}(X-X^T)$.
$X=\frac{1}{2}(X+X^T)+\frac{1}{2}(X-X^T)$

The space of complex square matrices is the direct sum of the Hermitian and the skew-Hermitian matrices.
Denote the vector space of complex square matrices by $M_{n \times n}(\mathbb{C})$.
Denote by $M_{n \times n}(\mathbb{C})_+, M_{n \times n}(\mathbb{C})_-$ the vector subspaces of Hermitian and skew-Hermitian matrices.
$M_{n \times n}(\mathbb{C})=M_{n \times n}(\mathbb{C})_+\oplus M_{n \times n}(\mathbb{C})_-$
Let $A \in M_{n \times n}(\mathbb{C})$. Then $A_+=\frac{1}{2}(A+A^*)$ and $A_-=\frac{1}{2}(A-A^*)$.
$A=\frac{1}{2}(A+A^*)+\frac{1}{2}(A-A^*)$
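A quick numerical check of both decompositions with numpy (the random matrices are assumed examples):

```python
import numpy as np

rng = np.random.default_rng(1)

# Real case: X = symmetric part + skew-symmetric part.
X = rng.standard_normal((3, 3))
X_sym, X_skew = (X + X.T) / 2, (X - X.T) / 2
print(np.allclose(X, X_sym + X_skew),
      np.allclose(X_sym, X_sym.T),
      np.allclose(X_skew, -X_skew.T))      # True True True

# Complex case: A = Hermitian part + skew-Hermitian part.
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A_h, A_s = (A + A.conj().T) / 2, (A - A.conj().T) / 2
print(np.allclose(A, A_h + A_s),
      np.allclose(A_h, A_h.conj().T),
      np.allclose(A_s, -A_s.conj().T))     # True True True
```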

Important idea:
If $U \cap V=\{ \boldsymbol{0} \}$ and $U, V$ are subspaces of a vector space with bases $\{u_1, \cdots, u_k\}$ and $\{v_1, \cdots, v_l\}$ respectively, then $\{u_1, \cdots, u_k, v_1, \cdots, v_l \}$ is a basis for $U \oplus V$. Therefore, knowing that $X \oplus Y=X+Y$ allows us to find a basis of $X+Y$ easily: we just find the union of bases of $X$ and $Y$. Namely, if $V=U_1 \oplus U_2 \oplus \cdots \oplus U_k$ and $B_i$ is a basis of $U_i$ then $B_1 \cup B_2 \cup \cdots \cup B_k$ is a basis of V. In particular, $\text{dim}\:V=\sum\limits_{i=1}^k \text{dim}\: U_i$.

Direct sum of vectors
Any vector $\vec{v}\in V$ can be written as a linear combination of the basis vectors, $\vec{v}=v_1e_1+\cdots+v_ne_n$. We normally express this fact in the form of a column vector, $\vec{v}=(v_1,v_2,\cdots,v_n)^T$. Similarly, for any vector $\vec{w}$, we can write $\vec{w}=(w_1,w_2,\cdots,w_m)^T$. Thus, $\vec{v}=(v_1,v_2,\cdots,v_n,0,0,\cdots,0)^T$ and $\vec{w}=(0,0,\cdots,0,w_1,w_2,\cdots,w_m)^T$. Then a direct sum of two non-zero vectors is $\vec{v}\oplus \vec{w}=(v_1,v_2,\cdots,v_n,w_1,w_2,\cdots,w_m)^T$.

Block-diagonal form
A matrix is actually a linear map from a vector space to another vector space. Let's consider the endomorphism $A:V\to V, \vec{v}\mapsto A\vec{v}$, or $\begin{pmatrix} v_1\\v_2\\ \vdots\\ v_n\end{pmatrix}\mapsto \begin{pmatrix} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn}\end{pmatrix}\begin{pmatrix} v_1\\v_2\\ \vdots\\ v_n\end{pmatrix}.$ Similarly, $B:W\to W, \vec{w}\mapsto B\vec{w}$. On the direct sum space $V\oplus W$, the matrix $A\oplus B$ still acts on the vectors such that $\vec{v}\mapsto A\vec{v}$ and $\vec{w}\mapsto B\vec{w}$. This is achieved by lining up all matrix elements in a block-diagonal form, $$A\oplus B=\begin{pmatrix}A&0_{n\times m}\\ 0_{m\times n}&B\end{pmatrix}.$$
For example, if $n=2,m=3$, and $A=\begin{pmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{pmatrix},B=\begin{pmatrix}b_{11}&b_{12}&b_{13}\\b_{21}&b_{22}&b_{23}\\b_{31}&b_{32}&b_{33}\end{pmatrix}$, then $$A\oplus B=\begin{pmatrix}a_{11}&a_{12}&0&0&0\\a_{21}&a_{22}&0&0&0\\0&0&b_{11}&b_{12}&b_{13}\\0&0&b_{21}&b_{22}&b_{23}\\0&0&b_{31}&b_{32}&b_{33}\end{pmatrix}.$$
When we act with it on $\vec{v}\oplus \vec{w}$, $$(A\oplus B)(\vec{v}\oplus \vec{w})=\begin{pmatrix}A&0_{n\times m}\\ 0_{m\times n}&B\end{pmatrix}\begin{pmatrix}\vec{v}\\ \vec{w}\end{pmatrix}=\begin{pmatrix}A\vec{v}\\B\vec{w}\end{pmatrix}=(A\vec{v})\oplus(B\vec{w}).$$ If we have two such matrices, their product is computed on each vector space separately, $$(A_1 \oplus B_1)(A_2\oplus B_2)=(A_1A_2)\oplus (B_1B_2).$$
Note that not every matrix on $V\oplus W$ can be written as a direct sum of a matrix on $V$ and another on $W$. The space of matrices on $V\oplus W$ has dimension $(n+m)^2$, while the matrices on $V$ and $W$ only account for $n^2$ and $m^2$ dimensions, respectively. The remaining $(n+m)^2-n^2-m^2=2mn$ dimensions correspond to matrices that cannot be written as a direct sum.

Other useful formulae are $\text{det}(A\oplus B)=\text{det}(A)\text{det}(B)$ and $\text{Tr}(A\oplus B)=\text{Tr}(A)+\text{Tr}(B)$.
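A short numpy/scipy sketch verifying these formulas for randomly chosen $A$ and $B$ (the sizes $2\times 2$ and $3\times 3$ are assumptions; `scipy.linalg.block_diag` builds $A\oplus B$):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(2)

A = rng.standard_normal((2, 2))
B = rng.standard_normal((3, 3))
v = rng.standard_normal(2)
w = rng.standard_normal(3)

AB = block_diag(A, B)                                       # A ⊕ B

print(np.allclose(AB @ np.concatenate([v, w]),
                  np.concatenate([A @ v, B @ w])))          # (A⊕B)(v⊕w) = (Av)⊕(Bw)
print(np.isclose(np.linalg.det(AB),
                 np.linalg.det(A) * np.linalg.det(B)))      # det(A⊕B) = det(A)det(B)
print(np.isclose(np.trace(AB), np.trace(A) + np.trace(B)))  # Tr(A⊕B) = Tr(A)+Tr(B)
```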

More to learn:
geometric picture of sum of two subspaces and examples
http://algebra.math.ust.hk/vector_space/13_direct_sum/lecture2.shtml

more on direct sum
https://math.dartmouth.edu/archive/m22s02/public_html/VectorSpaces.pdf

examples
http://mathwiki.ucdavis.edu/Algebra/Linear_algebra/04._Vector_spaces/4.4_Sums_and_direct_sum

projection operators
https://people.maths.ox.ac.uk/flynn/genus2/alg0506/LALect02.pdf

problems on direct sum
http://en.wikibooks.org/wiki/Linear_Algebra/Combining_Subspaces

http://math.stackexchange.com/questions/1151335/what-is-direct-sum-decomposition
https://web.math.princeton.edu/~jonfick/2012SpMAT204/Direct_Sums.pdf
http://www.math.nyu.edu/faculty/hausner/quotientspaces.pdf
http://www.math.ttu.edu/~drager/Classes/05Summer/m5399/struct.pdf
http://www.mathsman.co.uk/LinearAlgebra-Ver1.4[1].pdf
http://planetmath.org/directsumofmatrices