Friday 31 July 2015

Exact sequences

A motivating example:
In linear algebra, a linear transformation $f:V\to W$ induces the linear transformations $V\xrightarrow{g}\text{im}(f)\xrightarrow{i}W$, where $i$ is the inclusion map and $g$ is the same as $f$ except that the codomain has been restricted to $\text{im}(f)$. Then $f=i\circ g$, and so we can represent this by a commutative diagram. Intuitively, we can think of an exact sequence as a factorisation of maps satisfying certain conditions.

Definition: Let $U\xrightarrow{f}V\xrightarrow{g}W$ be a sequence of linear transformations. (This means $f\in L(U,V)$ and $g\in L(V,W)$.) We say $U\xrightarrow{f}V\xrightarrow{g}W$ is an exact sequence at $V$ if $\text{im}(f)=\text{ker}(g)$. A sequence $V_1\xrightarrow{f_1}V_2\xrightarrow{f_2}\cdots \xrightarrow{f_{n-2}}V_{n-1}\xrightarrow{f_{n-1}}V_n$ is an exact sequence if $\text{im}(f_{i-1})=\text{ker}(f_i), 2\leq i\leq n-1$, that is, if the sequence is exact at $V_2,\cdots,V_{n-1}$.
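For concrete matrices, exactness at $V$ can be tested numerically: $\text{im}(f)\subseteq\text{ker}(g)$ is equivalent to $g\circ f=0$, and the two subspaces are then equal iff $\text{rank}(f)=\dim V-\text{rank}(g)$. Here is a minimal sketch in Python (numpy assumed; the function name and the example matrices are my own illustrative choices):

```python
import numpy as np

def is_exact_at_V(F, G, tol=1e-10):
    """Check im(F) = ker(G) for matrices F: U -> V and G: V -> W.

    im(F) is contained in ker(G) iff G @ F = 0; the subspaces are then
    equal iff rank(F) = dim V - rank(G), by rank-nullity.
    """
    dim_V = F.shape[0]
    composite_is_zero = np.allclose(G @ F, 0, atol=tol)
    ranks_match = np.linalg.matrix_rank(F) == dim_V - np.linalg.matrix_rank(G)
    return composite_is_zero and ranks_match

# Example: 0 -> R -> R^2 -> R -> 0, with F the inclusion of the x-axis
# and G the projection onto the y-coordinate.
F = np.array([[1.0], [0.0]])   # R -> R^2
G = np.array([[0.0, 1.0]])     # R^2 -> R
print(is_exact_at_V(F, G))     # True: im(F) = x-axis = ker(G)
```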

When writing sequences of linear transformations, it is customary to write $0$ for the vector space $\{0\}$.

Lemma:
1. The sequence $0\to V\to 0$ is exact iff $V=\{0\}$.
2. The sequence $0\to U\xrightarrow{f} V$ is exact iff $f$ is injective.
3. The sequence $V\xrightarrow{g}W\to 0$ is exact iff $g$ is surjective.
4. The sequence $0\to V\xrightarrow{f}W\to 0$ is exact iff $f$ is an isomorphism.
5. The sequence $0\to U\xrightarrow{f}V\xrightarrow{g}W\to 0$ is exact iff $f$ is injective and $g$ is surjective, and $\text{im}(f)=\text{ker}(g)$.
Proof: Just note that from the definition, $\text{im}(0\to V)=0$ and $\text{ker}(V\to 0)=V.\:\Box$

Definition: A short exact sequence is a sequence of five vector spaces, which is exact at all three internal nodes: $0\to U\xrightarrow{f}V\xrightarrow{g}W\to 0$. A long exact sequence is a doubly infinite sequence of vector spaces which is exact at every node.

A nice feature of exact sequences is that there are simple rules to create new sequences from old ones. The simplest way to do this is to split and glue exact sequences as follows.

Splitting and gluing exact sequences
Splitting: If $V_1\xrightarrow{f_1}V_2\xrightarrow{f_2}V_3\xrightarrow{f_3}V_4$ is an exact sequence, the two sequences $V_1\xrightarrow{f_1}V_2\xrightarrow{f_2}W\to 0$ and $0\to W\xrightarrow{i}V_3\xrightarrow{f_3}V_4$ are also exact, where $W=\text{im}(f_2)=\text{ker}(f_3)$, the map $V_2\to W$ is $f_2$ with its codomain restricted to $W$, and $i$ is the inclusion of $W=\text{ker}(f_3)$ in $V_3$.

Gluing: Conversely, if $V_1\xrightarrow{f_1}V_2\xrightarrow{f_2}W\to 0$ and $0\to W\xrightarrow{i}V_3\xrightarrow{f_3}V_4$ are exact sequences with $W \subset V_3$ and $i$ the inclusion, the glued sequence $V_1\xrightarrow{f_1}V_2\xrightarrow{i\circ f_2}V_3\xrightarrow{f_3}V_4$ is also exact.

Proposition: Assume that $0\to V_1\xrightarrow{f_1}V_2\xrightarrow{f_2}\cdots\xrightarrow{f_{n-1}}V_n\to 0$ is an exact sequence of finite dimensional vector spaces, $n\geq 1$. Then $\sum\limits_{i=1}^n (-1)^i \: \text{dim}(V_i)=0$.

Proof by induction: If $n=1$, then $V_1=\{0\}$, and so $\text{dim}(V_1)=0$.
If $n=2$, then $V_1 \cong V_2$ by the lemma, and so $\text{dim}(V_1)-\text{dim}(V_2)=0$.
If $n=3$, then $0\to V_1\xrightarrow{f_1}V_2\xrightarrow{f_2}V_3\to 0$ is an exact sequence.
By the dimension theorem, we have $$\begin{align}\text{dim}(V_2)&=\text{dim}\:\text{ker}(f_2)+\text{dim}\:\text{im}(f_2)\\
&=\text{dim}\:\text{im}(f_1)+\text{dim}(V_3)\\
&=\text{dim}(V_1)-\text{dim}\:\text{ker}(f_1)+\text{dim}(V_3)\\
&=\text{dim}(V_1)+\text{dim}(V_3),\end{align}$$ which means $-\text{dim}(V_1)+\text{dim}(V_2)-\text{dim}(V_3)=0$. So the result holds for $n=3$.
For the general case, suppose $n\geq 4$. To use the induction assumption, we can split the given exact sequence into $$0\to V_1\xrightarrow{f_1}\cdots\xrightarrow{f_{n-3}}V_{n-2}\xrightarrow{f_{n-2}}\text{im}(f_{n-2})\to 0,$$ $$0\to \text{im}(f_{n-2})\xrightarrow{i}V_{n-1}\xrightarrow{f_{n-1}}V_n\to 0$$ because $\text{im}(f_{n-2})=\text{ker}(f_{n-1})$. By induction, we have $\sum\limits_{i=1}^{n-2} (-1)^i \:\text{dim}(V_i)+(-1)^{n-1}\:\text{dim}(\text{im}(f_{n-2}))=0$ and $(-1)^{n-2}\:\text{dim}(\text{im}(f_{n-2}))+(-1)^{n-1}\:\text{dim}(V_{n-1})+(-1)^n\:\text{dim}(V_n)=0$.
Adding these two gives $\sum\limits_{i=1}^n (-1)^i \:\text{dim}(V_i)=0.\:\Box$

Direct proof: Applying the rank-nullity theorem to all vector spaces $V_1,\cdots,V_n$, we have $$\begin{align}\sum\limits_{i=1}^{n-1}(-1)^i\:\text{dim}(V_i)&=\sum\limits_{i=1}^{n-1}(-1)^i \:(\text{dim}\:\text{ker}(f_i)+\text{dim}\:\text{im}(f_i))\\
&=-\text{dim}\:\text{ker}(f_1)+(-1)^{n-1}\text{dim}\:\text{im}(f_{n-1})\\&+\sum\limits_{i=2}^{n-1}(-1)^i \:(\text{dim}\:\text{ker}(f_i)-\text{dim}\:\text{im}(f_{i-1})).\end{align}$$
But $\text{dim}\:\text{ker}(f_1)=0$ since $f_1$ is injective, $\text{dim}\:\text{im}(f_{n-1})=\text{dim}(V_n)$ since $f_{n-1}$ is surjective, and $\text{dim}\:\text{ker}(f_i)=\text{dim}\:\text{im}(f_{i-1})$ for all $i=2,\cdots,n-1$ by exactness.
Therefore $\sum\limits_{i=1}^n (-1)^i \: \text{dim}(V_i)=(-1)^{n-1}\:\text{dim}(V_n)+(-1)^n\:\text{dim}(V_n)=0.\:\Box$
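As a numerical sanity check of the proposition, one can take any matrix $T$, form the short exact sequence $0\to\text{ker}(T)\to V\to\text{im}(T)\to 0$ of the next section, and confirm that the alternating sum of dimensions vanishes. A sketch (numpy and scipy assumed; the random matrix is an arbitrary test case):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 6))        # a linear map T: R^6 -> R^4

dim_V   = T.shape[1]                   # dim of the middle space R^6
dim_ker = null_space(T).shape[1]       # dim ker(T)
dim_im  = np.linalg.matrix_rank(T)     # dim im(T)

# Alternating sum over 0 -> ker(T) -> R^6 -> im(T) -> 0:
print(-dim_ker + dim_V - dim_im)       # 0, as the proposition predicts
```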

Linear algebra
Dimension theorems
For any short exact sequence of finite-dimensional vector spaces $0\to V_1\xrightarrow{T_1}V_2\xrightarrow{T_2}V_3 \to 0$, $\text{dim}(V_2)=\text{dim}(V_1)+\text{dim}(V_3)$. [This is just the $n=3$ case of the proposition above. Here we provide a proof without the aid of the rank-nullity theorem.]
Proof: Suppose $\text{dim}(V_1)=n$. Then we can take $\{e_i\}_{i=1}^n$ to be a basis of $V_1$, so $\{T_1(e_i)\}_{i=1}^n$ spans the image of $T_1$. Since $T_1$ is injective, $\{T_1(e_i)\}_{i=1}^n$ is linearly independent in $V_2$, and we can find $\{v_j\}_{j=1}^r$ such that $\{T_1(e_i)\}_{i=1}^n \cup \{v_j\}_{j=1}^r$ forms a basis of $V_2$. Thus $\text{dim}(V_2)=n+r$. Now $T_2$ is surjective, and $T_2$ kills each $T_1(e_i)$ since $\text{im}(T_1)=\text{ker}(T_2)$, so $\{T_2(v_j)\}_{j=1}^r$ spans $V_3$. It remains to show that $\{T_2(v_j)\}_{j=1}^r$ is linearly independent. Suppose $\sum_j a_jT_2(v_j)=0$. Then $T_2(\sum_j a_jv_j)=0\Rightarrow \sum_j a_jv_j \in \text{ker}(T_2)=\text{im}(T_1)$. Since $\{T_1(e_i)\}_{i=1}^n$ spans the image of $T_1$, we have $\sum\limits_{j=1}^r a_jv_j=\sum\limits_{i=1}^n b_i T_1(e_i)$ for some scalars $b_1,\cdots,b_n$. In particular, $\sum\limits_{i=1}^n -b_i T_1(e_i)+\sum\limits_{j=1}^r a_jv_j=0$. Since $\{T_1(e_i)\}_{i=1}^n \cup \{v_j\}_{j=1}^r$ is linearly independent in $V_2$, all the coefficients $\{b_i\}$ and $\{a_j\}$ must be zero. Therefore, $\text{dim}(V_3)=r$, and we have established that $\text{dim}(V_2)=n+r=\text{dim}(V_1)+\text{dim}(V_3).\:\Box$

Using this, we can derive many results in linear algebra.

Theorem: If $T:V\to W$ is a linear map, then $0 \to \text{ker}(T) \xrightarrow{i} V \xrightarrow{T} \text{im}(T) \to 0$ is a short exact sequence, where $i$ is the inclusion map.
Corollary (Rank-nullity): If $T:V\to W$ is a linear map of finite-dimensional vector spaces, then $\text{dim}(\text{ker}(T))+\text{dim}(\text{im}(T))=\text{dim}(V)$.

Theorem: If $A,B \subset V$ are subspaces such that $V=\text{span}(A\cup B)$, then $0\to A \cap B \to A \oplus B \to V \to 0$ is a short exact sequence, where $A \cap B\to A\oplus B$ is the diagonal map $v\mapsto (v,v)$, and the map $A\oplus B\to V$ is the difference map $(a,b)\mapsto a-b$.
Corollary: $\text{dim}(V)=\text{dim}(A)+\text{dim}(B)-\text{dim}(A \cap B)$.

Theorem: If $W\subset V$ is a subspace, then $0\to W \xrightarrow{i} V \xrightarrow{g} V/W \to 0$ is exact, where $i$ is the inclusion map and $g$ is the quotient map $g(v)=[v]$.
Corollary: $\text{dim}(V/W)=\text{dim}(V)-\text{dim}(W)$.

Dual space
Proposition: If $U\xrightarrow{f}V\xrightarrow{g}W$ is an exact sequence of linear transformations of vector spaces, then $W^*\xrightarrow{g^*}V^*\xrightarrow{f^*}U^*$ is also an exact sequence of linear transformations.

Corollary: Let $f\in L(V,W)$ and let $f^*\in L(W^*,V^*)$ be the induced linear transformation on dual spaces.
1. If $f$ is injective, then $f^*$ is surjective.
2. If $f$ is surjective, then $f^*$ is injective.
Proof:
If $f$ is injective, then $0\to V\to W$ is exact at $V$ and so $W^*\to V^*\to 0$ is an exact sequence. Thus $f^*$ is surjective.
If $f$ is surjective, then $V\to W\to 0$ is exact at $W$ and so $0\to W^*\to V^*$ is an exact sequence. Thus $f^*$ is injective.
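In coordinates, $f$ is a matrix and $f^*$ its transpose, so the corollary reduces to $\text{rank}(f)=\text{rank}(f^T)$: full column rank of $f$ (injectivity) is full row rank of $f^T$ (surjectivity of $f^*$). A quick numerical illustration (the specific matrix is an arbitrary choice):

```python
import numpy as np

f = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])                              # f: R^2 -> R^3

injective  = np.linalg.matrix_rank(f) == f.shape[1]     # rank = dim of domain
surjective = np.linalg.matrix_rank(f.T) == f.T.shape[0] # f*: R^3 -> R^2 is onto
print(injective, surjective)                            # True True
```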

More intuition about exact sequences
In Euclidean space $\mathbb{R}^n$, if $V$ is a subspace of $\mathbb{R}^n$, then we think of it as filling out "some of" the dimensions in $\mathbb{R}^n$ and its orthogonal complement $V^\perp$ fills out the other directions. Together they span $\mathbb{R}^n$ with no redundancies: $\mathbb{R}^n=V\oplus V^\perp$. See here for more examples of direct sum.

More generally, we don't have an inner product and so we can't form orthogonal complements, but we can still talk about submodules and quotients.

If $A$ is a submodule of $B$, then $A$ fills up "some of the directions" in $B$, and the remaining directions are encoded in $B/A$. When we have a submodule $A\subset B$, a surjection $B \to C$, and $A$ happens to be the kernel of the map $B \to C$, then we are in the previous situation: $A$ fills out some of the directions in $B$, and all the complementary directions are encoded in $C$. We introduce the terminology "$0\to A\to B\to C\to 0$ is a short exact sequence" to capture this situation.

Graph theory: Euler's formula
Inclusion-exclusion

References
Intuitive meaning of exact sequence
Linear algebra [p38-40,75,76,84]
Exact sequences in linear algebra [p4-6]
Definition of short exact sequence [p1,2]
Splitting and gluing [p1,2,5]
Simple groups and short exact sequences [p2,4]

More to explore:
Homological algebra

Tuesday 28 July 2015

Quotient spaces

Recall the definition of the sum of subspaces: if $U,W$ are subspaces of $V$, then $U+W=\{u+w\:|\:u \in U\:\text{and}\: w\in W\}$. In case one of the sets contains only one element, we write $\{u_0\}+W=u_0+W=\{u_0+w\:|\: w\in W\}$. We call $u_0+W$ a coset (translate) of $W$. [A coset $v+W$ is not a subspace unless $v \in W$. The simple reason is that after translation, the zero element is no longer in $v+W$. In fact, such cosets are examples of affine subspaces.] Geometrically, the sets $u_0+W$ are parallel copies of $W$ that are translated in $V$ as we change $u_0$. We can also say that a subset $U$ is parallel to $W$, denoted $U \parallel W$, if $U=u_0+W$ for some $u_0\in V$.

Note that several of these translates can be equal. Let's take a look at an example. Let $W=\{(x,0)\:|\: x\in \mathbb{R}\},v_1=(0,1),v_2=(1,1)$. Then $v_1+W=\{(x,1)\:|\:x\in \mathbb{R}\}=v_2+W$. Also notice that $v_2-v_1=(1,0)\in W$. Thus translating $W$ by $v_1$ or $v_2$ is the same. This motivates the following proposition.

Proposition (i): $v+W=v'+W \iff v-v'\in W$.
Proof: Suppose $v+W=v'+W$. Then $v=v'+w$ for some $w\in W$, implying $v-v'=w\in W$.
Conversely, suppose $v-v'\in W$. Note that if $z\in W$, then $z+W=W$ since $W$ is a subspace. Then $v+W=v'+(v-v')+W=v'+W.\:\Box$

According to the proposition, two elements $v$ and $v'$ of $V$ determine the same element of $V/W$ if and only if $v-v'\in W$. So we can think of $V/W$ as the set of all elements of $V$ under the extra condition that two elements of $V$ are declared to be the same if their difference is in $W$. Thus we can define $V/W$ to be the set of equivalence classes of the equivalence relation $\sim$ on $V$ defined by $v \sim v'$ if $v-v'\in W$. Notice how similar this relation is to the congruence relation on the integers. So we can also say $v \equiv v' \:\text{mod}\:W$ if $v-v' \in W$. As an exercise, try to prove that the equivalence classes of $\equiv \text{mod}\:W$ are precisely the cosets of $W$.

Definition: The quotient set $V/W$, pronounced $V\:\text{mod}\:W$, is the set of all cosets of $W$: $V/W=\{[v]\:|\:v\in V\}=\{v+W\:|\:v\in V\}$.
[Informally, a quotient set is what we get when we "divide" a set $A$ by $B\subset A$, wherein we set all elements of $B$ to the identity in $A$. For example, if $A=\mathbb{Z}$ and $B=\{5n\:|\:n\in \mathbb{Z}\}$, then we're making all multiples of $5$ zero, so the quotient is $\{0,1,2,3,4\}$ (the set of all equivalence classes).]
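Computationally, one can work with cosets of a line $W\subset\mathbb{R}^2$ by picking a canonical representative of each coset, subtracting off the $W$-component (this uses the dot product, as in the remark on $W^\perp$ below); two vectors are then equivalent exactly when their representatives coincide. A hedged sketch, reusing the example $W=x$-axis, $v_1=(0,1)$, $v_2=(1,1)$ from above (the helper name `rep` is my own):

```python
import numpy as np

w = np.array([1.0, 0.0])               # W = span{w}, the x-axis in R^2

def rep(v):
    """Canonical representative of the coset v + W: subtract the W-component."""
    return v - (v @ w) / (w @ w) * w

v1 = np.array([0.0, 1.0])
v2 = np.array([1.0, 1.0])
print(np.allclose(rep(v1), rep(v2)))   # True: v2 - v1 = (1, 0) lies in W
```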

We want to turn $V/W$ into a vector space (quotient space), so we have to define addition and scalar multiplication.

Definition: The sum of two elements $v+W$ and $v'+W$ of $V/W$ is defined by $(v+W)+(v'+W) := (v+v')+W$.
For $a\in \mathbb{F}$, the scalar multiplication of $a$ on $v+W$ is defined by $a\cdot (v+W) := av+W$.

It is necessary to check that these definitions are well defined, that is, these operations should not depend on which element of $v+W$ we choose to represent $v+W$.

Proposition (ii): Suppose $v+W=v'+W$. Then $(v+W)+(v''+W)=(v'+W)+(v''+W)$ for any $v''+W\in V/W$ and $a\cdot (v+W)=a\cdot (v'+W)$ for any $a\in \mathbb{F}$.
Proof:
WLOG it is enough to show that $(v+W)+(v''+W) \subset (v'+W)+(v''+W)$. Let $u\in(v+W)+(v''+W)=(v+v'')+W$. Then there exists $w\in W$ such that $u=(v+v'')+w$. Since $v+W=v'+W$, we have $v-v'\in W$. Denote this element by $w'$. Then $v=v'+w'$. Thus $u=(v+v'')+w=((v'+w')+v'')+w=(v'+v'')+(w'+w)$, showing that $u\in (v'+v'')+W$ since $w'+w\in W$.
We conclude that $(v+W)+(v''+W)=(v'+W)+(v''+W)$.
WLOG it is enough to show that $a\cdot (v+W)\subset a\cdot (v'+W)$. Let $u\in a\cdot (v+W)=av+W$. Then $u=av+w$ for some $w\in W$. Denote by $w'$ the element $v-v'\in W$. Then $u=av+w=a(v'+w')+w=av'+(aw'+w)\in av'+W$ since $aw'+w\in W$ as $W$ is closed under addition and scalar multiplication.
We have $a\cdot (v+W)=a\cdot (v'+W)$.
Thus both addition and scalar multiplication on $V/W$ are well-defined. $\Box$

Note that the above depends crucially on the assumption that $W$ is a subspace of $V$; if $W$ were just an arbitrary subset of $V$, addition and scalar multiplication on $V/W$ would not be well-defined. With these operations then, $V/W$ becomes a vector space over $\mathbb{F}$.

It is important to keep in mind that the quotient space is in general not a subspace, nor naturally equivalent to one. But because of the presence of a natural inner product on $\mathbb{R}^n$, there is a natural way to choose a second subspace orthogonal to the given one, and that second subspace serves as a natural isomorphic model of the quotient space by the given subspace. Namely, if $V$ has an inner product, then $V/W$ is isomorphic to $W^\perp=\{v\:|\:\langle v,w \rangle =0\: \forall\: w\in W\}$.

Summary of concepts: sum of subspaces, coset (translate), subspaces, equivalence relation & equivalence class, congruence relation, quotient set



Examples:

The quotient space in $\mathbb{R}^3$ by a line is a two-dimensional vector space that is isomorphic to a plane. It does not necessarily equal a plane in $\mathbb{R}^3$. In particular, the quotient space doesn't need to be a subset of $\mathbb{R}^3$. The quotient space is something very abstract. It's an entirely new vector space.

The quotient space of $\mathbb{R}^3$ by the plane $z=0$ is the set of all planes parallel to that plane: any plane with equation $z=k$ is an element of the quotient space. This quotient is isomorphic to $\mathbb{R}$, but note that it does not necessarily equal a line in $\mathbb{R}^3$.

In the quotient $\mathbb{R}[x]/(x^2+1)$, the polynomial $x^2+1$ is equivalent to $0$. This follows from the fact that $0$ and $x^2+1$ are in the same equivalence class, since $x^2+1=0+(x^2+1)$. So we have $x^2=-1$; that is, the class of $x$ behaves like $i$ (or $-i$). Since an arbitrary element of $\mathbb{R}[x]$ looks like $a_0+a_1x+a_2x^2+\cdots+a_nx^n,\; a_i\in \mathbb{R}$, in the quotient $\mathbb{R}[x]/(x^2+1)$ the polynomial becomes $f=a_0+a_1 i+a_2 i^2+\cdots+a_n i^n$. Since $i^2=-1$, we have $i^3=-i, i^4=1$. Collecting terms we can write $f$ as $a+bi$ for some $a,b\in \mathbb{R}$. Therefore, we can say $\mathbb{R}[x]/(x^2+1)\cong \mathbb{C}$.
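The reduction behind $\mathbb{R}[x]/(x^2+1)\cong\mathbb{C}$ can be done mechanically: take the remainder of a polynomial on division by $x^2+1$ and compare with evaluating the polynomial at $i$. A sketch using numpy's polynomial division (the sample polynomial is an arbitrary choice):

```python
import numpy as np
from numpy.polynomial import polynomial as P

f   = [5.0, -2.0, 3.0, 1.0]            # 5 - 2x + 3x^2 + x^3 (ascending coefficients)
mod = [1.0, 0.0, 1.0]                  # x^2 + 1

_, r = P.polydiv(f, mod)               # remainder a + bx stands for a + bi
print(r[0] + r[1] * 1j)                # (2-3j)
print(np.polyval(f[::-1], 1j))         # (2-3j) as well: evaluate f at x = i
```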

Quotient space $\mathbb{P}_n/\mathbb{R}$
Let $V=\mathbb{P}_n=\{a_nx^n+a_{n-1}x^{n-1}+\cdots+a_1x+a_0\:|\:a_0,a_1,\cdots,a_n\in \mathbb{R}\}$ and $W=\{a_0\:|\:a_0\in \mathbb{R}\}=\mathbb{R}$.
Let $[f(x)], [g(x)]\in V/W$. Then $[f(x)]=[g(x)] \iff f(x)-g(x) \in W \iff f(x)-g(x)=c$ for $c\in \mathbb{R}$.
Let $D:V \to V$ be the differential operator $D(f(x))=\frac{df}{dx}$.
We have $D(c)=\frac{d}{dx}(c)=0$, which means $\text{ker}(D)=W$, the set of constant polynomials. This operator is not 1-1 because polynomials that differ by a constant have the same derivative.
Now let's change the domain of the differential operator to the quotient: $\overline{D}:V/W \to V$ where $\overline{D}[f(x)]=\frac{df}{dx}$. Now $\text{ker}(\overline{D})=W=[0]=0_{V/W}$, so the operator is 1-1 on the quotient space.

The key idea is this: if a mapping is not 1-1, so that different points map to the same point, we can take all those different points to be elements of a common coset. In other words, we identify points that are equivalent to one another under some equivalence relation. With the quotient space formed by squashing those points into the same coset, we obtain a 1-1 mapping. Similarly, we can modify the codomain and turn a mapping into a surjection, which leads to the following proposition.

Proposition (iii) Canonical epimorphism: The function $\pi:V\to V/W$ defined by $\pi(v)=[v]=v+W$ is a linear transformation with $\text{ker}(\pi)=W, \text{im}(\pi)=V/W$.
Proof:
$\pi(u+v)=(u+v)+W=(u+W)+(v+W)=\pi(u)+\pi(v)$
$\pi(av)=(av)+W=a(v+W)=a\pi(v)$
It follows that $\pi$ is a linear transformation.
The image of $\pi$ is $V/W$ by definition.
For the kernel of $\pi$, $v\in \text{ker}(\pi) \iff v+W=0+W \iff v\in W.\:\Box$

Corollary: Let $V$ be a vector space and $W$ be a subspace of $V$. Then $\text{dim}(W)+\text{dim}(V/W)=\text{dim}(V)$.
Proof: Let $\pi:V\to V/W$ be the canonical epimorphism. By the dimension theorem, we have $\text{dim}\:\text{ker}(\pi)+\text{dim}\:\text{im}(\pi)=\text{dim}(V)$. But $\text{ker}(\pi)=W$ and $\text{im}(\pi)=V/W$, so the claim follows. $\Box$

Theorem: Let $W$ be a subspace of $V$ with quotient mapping $T:V\to V/W$. A subspace $U$ is a complement to $W$ iff $T|_U:U\to V/W$ is an isomorphism.

Universal property of the quotient
Let $V$ be a vector space over $\mathbb{F}$ and $W$ be a subspace of $V$. Define $\pi:V\to V/W$ by $v \mapsto [v]$. If $W'$ is a vector space over $\mathbb{F}$ and $f:V\to W'$ is a linear map whose kernel contains $W$, then there exists a unique linear map $g:V/W\to W'$ such that $f=g\circ \pi$. This is summarized by the following commutative diagram:

$V \xrightarrow{\qquad f \qquad} W'\\ \; \pi \searrow \quad \qquad \nearrow g\\ \qquad\;\; V/W$

Proof:
Let $W'$ be a vector space over $\mathbb{F}$, $f:V\to W'$ be a linear map with $W \subset \text{ker}f$. Then define $g:V/W\to W'$ to be the map $[v]\mapsto f(v)$.
$g$ is well-defined because if $[v]\in V/W$ and $v_1,v_2\in V$ are both representatives of $[v]$, then there exists $w\in W$ such that $v_1=v_2+w$.
Hence $f(v_1)=f(v_2+w)=f(v_2)+f(w)=f(v_2)$ since f is linear.
The linearity of $g$ comes from linearity of $f$.
Also, by definition $f=g\circ \pi$.
It remains to prove $g$ is unique.
Suppose $h:V/W\to W'$ is such that $h\circ \pi=f=g\circ \pi$. Then $h([v])=g([v])$ for all $v\in V$. Since $\pi$ is surjective, $h=g.\:\Box$

First isomorphism theorem
Let $V,W$ be vector spaces over $\mathbb{F}$ and $T:V \to W$ a linear map. Then $V/\text{ker}(T) \cong \text{im}(T)$, namely $V/\text{ker}(T)$ is isomorphic to $\text{im}(T)$.

Proof: Consider the mapping $\overline{T}:V/\text{ker}(T) \to W \;\;\; \overline{T}:[v]=v+\text{ker}(T) \mapsto T(v)$.

A commutative diagram would express this as
$V \xrightarrow{\qquad T \qquad} W\\ \; \pi \searrow \quad \qquad \nearrow \overline{T}\\ \qquad V/ \text{ker}(T)$

We can use similar arguments as in the previous example to prove that $\overline{T}$ is well-defined and linear.
Now suppose $\overline{T}([v])=0$. Then $T(v)=0$, which implies $v\in \text{ker}(T)$. This means $[v]=0_{V/\text{ker}(T)}$. Then $\overline{T}$ is injective. Hence if we confine the codomain of $\overline{T}$ to $\text{im}(T)$, then $\overline{T}$ becomes a bijection and hence $V/\text{ker}(T) \cong \text{im}(T).\:\Box$

Example:
$T:\mathbb{R}^{n\times n} \to \mathbb{R}^{n\times n}\quad T(A)=A-A^T$
$\text{ker}(T)=\{A\in \mathbb{R}^{n\times n}\:|\:A=A^T\}$, the set of symmetric matrices.
$\text{im}(T)=\{B=A-A^T\:|\:A\in \mathbb{R}^{n\times n}\}$: any such $B$ satisfies $B^T=A^T-A=-B$, and conversely every antisymmetric $B$ equals $T(B/2)$, so $\text{im}(T)$ is the set of antisymmetric matrices.
$\mathbb{R}^{n\times n}/ \{n\times n\;\text{symmetric matrices}\} \cong \{n\times n\;\text{antisymmetric matrices}\}$
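The isomorphism can be checked at the level of dimensions: representing $T(A)=A-A^T$ as a matrix acting on vectorised $n\times n$ matrices, its rank should be $\frac{n(n-1)}{2}$ (the antisymmetric matrices) and its nullity $\frac{n(n+1)}{2}$ (the symmetric matrices). A sketch (numpy assumed):

```python
import numpy as np

n = 4
cols = []
for k in range(n * n):
    A = np.zeros((n, n))
    A.flat[k] = 1.0                    # k-th standard basis matrix of R^(n x n)
    cols.append((A - A.T).flatten())   # its image under T, vectorised
Tmat = np.column_stack(cols)           # the matrix of T acting on R^(n*n)

rank = np.linalg.matrix_rank(Tmat)
print(rank, n * (n - 1) // 2)          # 6 6   : dim im(T) = antisymmetric matrices
print(n * n - rank, n * (n + 1) // 2)  # 10 10 : dim ker(T) = symmetric matrices
```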

Strong form of the first isomorphism theorem
Let $T:V\to W'$ be a linear transformation of vector spaces over $\mathbb{F}$. Suppose $W \subset \text{ker}(T)$. Then there exists a unique linear transformation $\overline{T}:V/W\to \text{im}(T)$ such that $\overline{T}\circ \pi=T$, where $\pi:V\to V/W$, and $\overline{T}$ is surjective. Also, $\text{ker}(\overline{T})\cong \text{ker}(T)/W$ and $(V/W)/(\text{ker}(T)/W)\cong \text{im}(T)$.

Second isomorphism theorem
Let $U,W$ be subspaces of $V$. Then $(U+W)/U \cong W/(U \cap W)$.

Third isomorphism theorem
Let $V$ be a vector space and $U\subset W\subset V$ be subspaces. Then $(V/U)/(W/U)\cong V/W$.

References:
An awesome video on quotient space
Definition of quotient set
Basis for $V/U$
Examples of quotient spaces
Intuition about quotient spaces
Constructing $\mathbb{C}$ from $\mathbb{R}$

Saturday 25 July 2015

Commutative diagrams

Commutative diagrams are graphic or reasoning devices that express the same result in two or more ways by composition. [See here for the formal definition.] They give us freedom to choose the method that best suits our purpose. In complex situations, they help us make sense of relationships we may have otherwise missed. Examples of commutative diagrams abound in linear algebra and abstract algebra.

Examples:
Extension by linearity
Let $V$ be a vector space over $\mathbb{F}$ with basis $B$. Then given any vector space $W$ over $\mathbb{F}$ and any function $f:B\to W$, there exists a unique linear transformation (called the linear extension) $\overline{f}:V\to W$ which extends $f$ in the sense that $\overline{f}(v)=f(v)$ for all $v\in B$, so that the diagram below commutes:

$B \xrightarrow{\quad f \quad} W\\ \; i \searrow \quad \;\; \nearrow \overline{f}\\ \qquad \; V$

[$\overline{f}(v)=f(v)$ for all $v\in B$ can be replaced by $\overline{f}|_B=f$.]

This theorem tells us that to define a linear map, we only need to specify where the basis elements are mapped to. Once this is done, we can always extend the mapping to the whole space by linearity. Also, there is no constraint on how such basis elements should be mapped. That is, a linear map is entirely determined by its action on a basis.

Now suppose a linear map $f:X \to Y$ is given between finite-dimensional spaces $X,Y$ and that $\{e_1,\cdots,e_n\},\{v_1,\cdots,v_m\}$ are bases for $X,Y$ respectively. From what we have seen above, the mapping $f$ is uniquely determined by specifying $f(e_i),i=1,\cdots,n$. Since $\{v_1,\cdots,v_m\}$ is a basis for $Y$, each $f(e_i)$ can be expressed as a linear combination of $\{v_1,\cdots,v_m\} \Rightarrow f(e_i)=\sum\limits_{j=1}^m a_{ji}v_j$.

In matrix notation, this means
$\begin{pmatrix} f(e_1) & \cdots & f(e_n)\end{pmatrix}=\begin{pmatrix} v_1 & \cdots & v_m \end{pmatrix} \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ a_{21} & \cdots & a_{2n} \\ & \cdots & \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}.$

Hence every element $x=\sum\limits_{i=1}^n x_ie_i$ is mapped, by the linearity of $f$, as
$f(\sum\limits_{i=1}^n x_ie_i)=\sum\limits_{i=1}^n x_i f(e_i)=\sum\limits_{i=1}^n \sum\limits_{j=1}^m a_{ji}x_iv_j=\sum\limits_{i=1}^m(\sum\limits_{j=1}^n a_{ij}x_j)v_i$.

In other words, when we express $f(\sum\limits_{i=1}^n x_ie_i)$ as $\sum\limits_{i=1}^m y_iv_i$, the coefficients $y_1,\cdots,y_m$ are given by
$\begin{pmatrix}y_1\\y_2\\ \vdots \\y_m \end{pmatrix}=\begin{pmatrix} a_{11} & \cdots & a_{1n}\\ a_{21} & \cdots & a_{2n} \\ & \cdots & \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix}x_1\\x_2\\ \vdots \\x_n\end{pmatrix}$.

This is the matrix representation of $f$ (with respect to the bases $\{e_1,\cdots,e_n\}$ and $\{v_1,\cdots,v_m\}$).
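To see the recipe in action, take $f=\frac{d}{dx}$ from cubics to quadratics with the monomial bases $\{1,x,x^2,x^3\}$ and $\{1,x,x^2\}$: the $i$-th column of the matrix is the coordinate vector of $f(e_i)$. A minimal sketch (this particular example is my own, not from the text):

```python
import numpy as np

# f(e_i) for e_i = x^i: d/dx x^i = i x^(i-1); record its coordinates in {1, x, x^2}.
cols = []
for i in range(4):                     # basis 1, x, x^2, x^3 of the domain
    coords = np.zeros(3)
    if i > 0:
        coords[i - 1] = i              # i * x^(i-1)
    cols.append(coords)
M = np.column_stack(cols)
print(M)
# [[0. 1. 0. 0.]
#  [0. 0. 2. 0.]
#  [0. 0. 0. 3.]]

# Applying M to the coordinates of p(x) = 1 + 2x + 3x^2 + 4x^3:
print(M @ np.array([1, 2, 3, 4]))      # [ 2.  6. 12.] = coords of p'(x)
```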

Change of coordinates
Let $V$ be an $n$-dimensional vector space. Let $\{v_1,v_2,\cdots,v_n\}$ be a basis for $V$ and $g_1:V\to \mathbb{F}^n$ be the coordinate mapping corresponding to this basis. Let $\{u_1,u_2,\cdots,u_n\}$ be another basis for $V$ and $g_2:V\to \mathbb{F}^n$ be the coordinate mapping corresponding to this basis.

$\qquad V\\ g_1 \swarrow \;\; \searrow g_2\\ \mathbb{F}^n \xrightarrow{\;g_2\circ g_1^{-1}\;} \mathbb{F}^n$

The composition $g_2\circ g_1^{-1}$ is a linear operator on $\mathbb{F}^n$. It has the form $x \mapsto Ax$, where $A$ is an $n\times n$ matrix. $A$ is the transition matrix from $\{v_1,v_2,\cdots,v_n\}$ to $\{u_1,u_2,\cdots,u_n\}$. Columns of $A$ are coordinates of the vectors $\{v_1,v_2,\cdots,v_n\}$ with respect to the basis $\{u_1,u_2,\cdots,u_n\}$.

Change of coordinates for a linear operator
Let $T:V\to V$ be a linear operator on a vector space $V$.
Let $A$ be the matrix of $T$ relative to a basis $\{a_1,a_2,\cdots,a_n\}$ for $V$.
Let $B$ be the matrix of $T$ relative to a basis $\{b_1,b_2,\cdots,b_n\}$ for $V$.
Let $U$ be the transition matrix from the basis $\{a_1,a_2,\cdots,a_n\}$ to $\{b_1,b_2,\cdots,b_n\}$.

$\begin{array}[c]{ccc}\qquad \text{a-coordinates of v}&\xrightarrow{\quad A\quad}&\text{a-coordinates of T(v)}\\ \scriptstyle{U}\Bigg\downarrow& &\Bigg\downarrow \scriptstyle{U}\\ \qquad \text{b-coordinates of v} & \xrightarrow{\quad B\quad} & \text{b-coordinates of T(v)} \end{array}$

It follows that $UAx=BUx$ for all $x \in \mathbb{F}^n \Rightarrow UA=BU.$ Then $A=U^{-1}BU$ and $B=UAU^{-1}$.
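The relation $B=UAU^{-1}$ can be confirmed numerically: encode each basis as the matrix whose columns are its vectors in standard coordinates, so the coordinate maps are matrix inverses and the transition matrix is $U=P_b^{-1}P_a$. A sketch with arbitrary (almost surely invertible) random choices:

```python
import numpy as np

rng = np.random.default_rng(1)
T  = rng.standard_normal((3, 3))       # the operator in standard coordinates
Pa = rng.standard_normal((3, 3))       # columns: basis {a_i} in standard coords
Pb = rng.standard_normal((3, 3))       # columns: basis {b_i} in standard coords

A = np.linalg.inv(Pa) @ T @ Pa         # matrix of T relative to {a_i}
B = np.linalg.inv(Pb) @ T @ Pb         # matrix of T relative to {b_i}
U = np.linalg.inv(Pb) @ Pa             # transition matrix from {a_i} to {b_i}

print(np.allclose(B, U @ A @ np.linalg.inv(U)))   # True
```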

Another similar commutative diagram [p1]
More similar diagrams [p10,14,15]

Isomorphism
A linear map $T:V\to W$ is an isomorphism if we can find $L:W\to V$ such that $TL=I_W$ and $LT=I_V$.

$\begin{array}{lll} V & \xrightarrow{\quad T\quad} & W \\
\uparrow {I_V} & & \uparrow{I_W} \\
V & \xleftarrow{\quad L\quad} & W \\
\end{array}$

Universal property of the quotient
Let $V$ be a vector space over $\mathbb{F}$ and $W$ be a subspace of $V$. Define $\pi:V\to V/W$ by $v \mapsto [v]$. If $W'$ is a vector space over $\mathbb{F}$ and $f:V\to W'$ is a linear map whose kernel contains $W$, then there exists a unique linear map $g:V/W\to W'$ such that $f=g\circ \pi$. This is summarized by the following commutative diagram:

$V \xrightarrow{\qquad f \qquad} W'\\ \; \pi \searrow \quad \qquad \nearrow g\\ \qquad\;\; V/W$

(See the quotient spaces post above for the proof.)

Homomorphism theorem
Let $X,Y$ be vector spaces over $\mathbb{F}$ and $f:X \to Y$ a linear map. Then $X/\text{ker}f \cong \text{im}f$, namely $X/\text{ker}f$ is isomorphic to $\text{im}f$.

Proof: Consider the mapping $\overline{f}:X/\text{ker}f \to Y \; \overline{f}:[x]=x+\text{ker}f \mapsto f(x)$.

A commutative diagram would express this as
$X \xrightarrow{\qquad f \qquad} Y\\ \; \pi \searrow \quad \qquad \nearrow \overline{f}\\ \qquad X/ \text{ker}f$

We can use similar arguments as the previous example to prove that $\overline{f}$ is well-defined and linear.
Now suppose $\overline{f}([x])=0$. Then $f(x)=0$, which implies $x\in \text{ker}f$. This means $[x]=0_{X/\text{ker}f}$. Then $\overline{f}$ is injective. Hence if we confine the codomain of $\overline{f}$ to $\text{im}f$, then $\overline{f}$ becomes a bijection and hence $X/\text{ker}f \cong \text{im}f.\:\Box$

Complexification of vector spaces

Universal property of free groups
Let $G$ be a group with a generating set $X \subset G$. Then $G$ is free on $X$ iff every map $\phi:X\to H$ from $X$ into a group $H$ can be extended to a unique homomorphism $\overline{\phi}:G\to H$, so that the diagram below commutes:


$X \xrightarrow{\quad \phi \quad} H\\ \; i \searrow \quad \;\; \nearrow \overline{\phi}\\ \qquad \; G$

where $X \xrightarrow{i} G$ is the inclusion of $X$ into $G$.

As you may have noticed, extension by linearity, the universal property of the quotient, the universal property of free groups, and the universal property of complexification all share a similar commutative diagram. All these come with three objects and three maps (one of them injective or surjective, another a natural/standard embedding, and lastly a unique induced map). They are called initial properties in category theory. Check out the formal definition of universal property.



Applications:

The reason we can use the Fourier transform to solve differential equations is that the following diagram commutes.


$\begin{array}[c]{ccc}\qquad \text{problem}&\xrightarrow{\text{Fourier transform}}&\text{transformed problem}\\ \scriptstyle{\text{solve (hard)}}\Bigg\downarrow& &\Bigg\downarrow \scriptstyle{\text{solve (easy)}}\\ \qquad \text{solution} & \xleftarrow{\text{Inverse Fourier transform}} & \text{transformed solution} \end{array}$
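As a toy instance of the diagram, solve $u''-u=f$ with periodic boundary conditions: the Fourier transform turns the differential equation into the algebraic equation $(-k^2-1)\hat{u}=\hat{f}$, solved by division, and the inverse transform recovers $u$. A sketch with numpy's FFT (the right-hand side is chosen so the exact solution is $\sin 3x$):

```python
import numpy as np

N = 256
x = np.linspace(0, 2 * np.pi, N, endpoint=False)
k = np.fft.fftfreq(N, d=2 * np.pi / N) * 2 * np.pi  # integer wavenumbers

f = -10.0 * np.sin(3 * x)              # rhs with exact solution u(x) = sin(3x)

f_hat = np.fft.fft(f)                  # transform the problem
u_hat = f_hat / (-k**2 - 1)            # solve: (-k^2 - 1) u_hat = f_hat
u = np.fft.ifft(u_hat).real            # transform the solution back

print(np.allclose(u, np.sin(3 * x)))   # True
```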


Outside the realm of mathematics, we can also see instances of commutative diagrams. A famous example is Hess's law in physical chemistry, which states that the change of enthalpy in a chemical reaction is independent of the pathway between the initial and final states. What this means is that we have the following commutative diagram:

$\text{reactants A} \xrightarrow{\Large \text{route 1}} \text{products B}\\ \quad \qquad\ \searrow \quad \qquad \qquad \nearrow\\ \qquad \quad \quad \text{intermediates}\\ \qquad \qquad \quad\text{(route 2)}$



In fact, we have been using this kind of 'reasoning' all along.

Examples:
  • In the derivation of normal distribution, we express the same probability in two different ways.
  • Some trigonometric identities can be proved employing this idea: evaluating the integral $\int \sin 2x \cos x \:dx$ in two different ways can yield the identity $\cos 3x=4\cos^3 x-3\cos x$.
  • Proofs using combinatorial arguments count the same quantity in two different ways.

More to explore:
Category theory
Universal property
More examples of commutative diagrams
Definition of commutative diagrams
Free groups

References:
Commutative diagram
Universal property of free groups [p6]
Examples of commutative diagrams [p4-6,28]
Quotient space [p1-3]
Extension by linearity
From Vector Spaces to Function Spaces: Introduction to Functional Analysis with Applications by Yutaka Yamamoto [p16,17,29]

Tuesday 14 July 2015

Tensor product

Key ideas: Cartesian product, free vector space, quotient space, equivalence relations

Take the Cartesian product $U\times V$. Consider every element to be a basis element, and form every possible linear combination (over the field $\mathbb{F}$) of the pairs $(u,v)$. This gives us the 'free vector space' over $U\times V$, $F(U\times V)$.

For example, an element of $F(\mathbb{R}\times \mathbb{R})$ can be $2(1,0)+3(2,-2)+4(3,-2)$. We can't simplify this any further.

But we don't actually want this vector space; it's 'too big'. We want to form a quotient space by modding out a subspace, $W$. We're going to specify $W$ by specifying its basis elements, which are all elements of the form: $$(u,v+v')-(u,v)-(u,v')\\(u+u',v) - (u,v) - (u',v)\\(cu,v) - c(u,v)\\(u,cv) - c(u,v).$$
'Modding these out' effectively sets all of these to $0$ in the quotient space $F(U\times V)/W$.

So the vectors of $U\otimes V$ are actually the equivalence classes of $F(U\times V)$ under the following equivalence relations: $$(u,v+v')\sim (u,v)+(u,v')\\(u+u',v)\sim (u,v)+(u',v)\\(cu,v)\sim c(u,v)\sim (u,cv).$$
An element $(u,v)+W$ is written $u\otimes v$. Then the above rules become: $$u\otimes (v+v')=u\otimes v+u\otimes v'\\(u+u')\otimes v=u\otimes v+u'\otimes v\\c(u\otimes v)=(cu\otimes v)=(u\otimes cv).$$
In other words, $\otimes$ is bilinear on $U\times V$.

Let's look at $2(1,0)+3(2,-2)+4(3,-2)$ as it occurs in the tensor product $\mathbb{R}\otimes \mathbb{R}$. It becomes:
$$\begin{align}2(1\otimes 0)+3(2\otimes -2)+4(3\otimes -2)&=2\otimes 0+6\otimes-2+12\otimes-2\\&=2\otimes 0+(6+12)\otimes-2\\&=2\otimes 0+18\otimes -2\\&=2\otimes (9\cdot 0)+18\otimes -2\\&=18\otimes 0+18\otimes -2\\&=18\otimes (0+(-2))\\&=18\otimes -2\\&=-2(18\otimes 1)\\&=-36(1\otimes 1).\end{align}$$
In fact, for any two real numbers $a,b$, we have $(a\otimes b)=ab(1\otimes 1)$. So $\{1\}$ is a basis for $\mathbb{R}$ and $\{1\otimes 1\}$ is a basis for $\mathbb{R}\otimes \mathbb{R}$, which is thus a $1$-dimensional vector space over $\mathbb{R}$, and hence isomorphic to $\mathbb{R}$, namely $\mathbb{R}\otimes \mathbb{R} \cong \mathbb{R}$ (as vector spaces). But note that we can do something in the vector space $\mathbb{R}\otimes \mathbb{R}$ we can't do ordinarily in a vector space: we can 'multiply vectors'.

If $U$ has a basis $\{e_1,\cdots,e_n\}$ and $V$ has a basis $\{e'_1,\cdots,e'_m\}$ then $U\otimes V$ has the basis $\{e_i\otimes e'_j\}$, where $i=1,\cdots,n$ and $j=1,\cdots,m$. Thus $U\otimes V$ has dimension $mn$. In general $U\otimes V$ can be thought of as a generic bilinear map on $U\times V$ (if $U=V=\mathbb{F}$, this becomes ordinary field multiplication). What bilinearity means is that $$u\otimes v=(\sum_{i=1}^n u_ie_i) \otimes (\sum_{j=1}^m v_je'_j)=\sum_{i=1}^n \sum_{j=1}^m u_i v_j(e_i\otimes e'_j).$$ Namely, $u\otimes v$ is a vector in $U\otimes V$, spanned by the basis vectors $e_i\otimes e'_j$, and the coefficients are given by $u_iv_j$ for each basis vector.

Let $n=2,m=3$. The tensor product space is $mn=6$-dimensional. The basis vectors are $e_1\otimes e'_1,e_1\otimes e'_2,e_1\otimes e'_3,e_2\otimes e'_1,e_2\otimes e'_2,e_2\otimes e'_3$. Written as six-component column vectors, they are $$e_1\otimes e'_1=(1,0,0,0,0,0)^T,e_1\otimes e'_2=(0,1,0,0,0,0)^T,e_1\otimes e'_3=(0,0,1,0,0,0)^T\\e_2\otimes e'_1=(0,0,0,1,0,0)^T,e_2\otimes e'_2=(0,0,0,0,1,0)^T,e_2\otimes e'_3=(0,0,0,0,0,1)^T.$$ For general vectors $u$ and $v$, the tensor product is $$u\otimes v=\begin{pmatrix}u_1v_1\\u_1v_2\\u_1v_3\\u_2v_1\\u_2v_2\\u_2v_3\end{pmatrix}.$$
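In this column-vector convention, the tensor product of coordinate vectors is exactly the Kronecker product, so numpy produces the six-component vector directly:

```python
import numpy as np

u = np.array([1.0, 2.0])               # coordinates in U (n = 2)
v = np.array([3.0, 4.0, 5.0])          # coordinates in V (m = 3)
print(np.kron(u, v))                   # [ 3.  4.  5.  6.  8. 10.] = (u1*v, u2*v)
```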

Let's now consider the map $A:U\to U, u\mapsto Au$, or $$\begin{pmatrix} u_1\\u_2\\ \vdots\\ u_n\end{pmatrix}\mapsto \begin{pmatrix} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\a_{n1}&a_{n2}&\cdots&a_{nn}\end{pmatrix}\begin{pmatrix} u_1\\u_2\\ \vdots\\ u_n\end{pmatrix}.$$ On the tensor product space $U\otimes V$, the same matrix can still act on the vectors, so that $u\mapsto Au$ while $v\mapsto v$ is left untouched. The matrix is $A\otimes I$ (here $n=2$, $m=3$): $$A\otimes I=\begin{pmatrix}a_{11}&0&0&a_{12}&0&0\\0&a_{11}&0&0&a_{12}&0\\0&0&a_{11}&0&0&a_{12}\\a_{21}&0&0&a_{22}&0&0\\0&a_{21}&0&0&a_{22}&0\\0&0&a_{21}&0&0&a_{22}\\\end{pmatrix}$$ If we act with it on $u\otimes v$, $$\begin{align}(A\otimes I)(u\otimes v)&=\begin{pmatrix}a_{11}&0&0&a_{12}&0&0\\0&a_{11}&0&0&a_{12}&0\\0&0&a_{11}&0&0&a_{12}\\a_{21}&0&0&a_{22}&0&0\\0&a_{21}&0&0&a_{22}&0\\0&0&a_{21}&0&0&a_{22}\\\end{pmatrix} \begin{pmatrix}u_1v_1\\u_1v_2\\u_1v_3\\u_2v_1\\u_2v_2\\u_2v_3\end{pmatrix}\\&=\begin{pmatrix}(a_{11}u_1+a_{12}u_2)v_1\\(a_{11}u_1+a_{12}u_2)v_2\\(a_{11}u_1+a_{12}u_2)v_3\\(a_{21}u_1+a_{22}u_2)v_1\\(a_{21}u_1+a_{22}u_2)v_2\\(a_{21}u_1+a_{22}u_2)v_3\end{pmatrix}\\&=(Au)\otimes v \end{align}$$ We can see that the matrix $A$ indeed acts only on $u\in U$, and leaves $v\in V$ untouched.

Similarly, the matrix $B:V\to V, w\mapsto Bw$ can also act on $U\otimes V$ as $I\otimes B$: $$I\otimes B=\begin{pmatrix}b_{11}&b_{12}&b_{13}&0&0&0\\b_{21}&b_{22}&b_{23}&0&0&0\\b_{31}&b_{32}&b_{33}&0&0&0\\0&0&0&b_{11}&b_{12}&b_{13}\\0&0&0&b_{21}&b_{22}&b_{23}\\0&0&0&b_{31}&b_{32}&b_{33}\end{pmatrix},$$ which acts on $u\otimes v$ as $$\begin{align}(I\otimes B)(u\otimes v)&=\begin{pmatrix}b_{11}&b_{12}&b_{13}&0&0&0\\b_{21}&b_{22}&b_{23}&0&0&0\\b_{31}&b_{32}&b_{33}&0&0&0\\0&0&0&b_{11}&b_{12}&b_{13}\\0&0&0&b_{21}&b_{22}&b_{23}\\0&0&0&b_{31}&b_{32}&b_{33}\end{pmatrix}\begin{pmatrix}u_1v_1\\u_1v_2\\u_1v_3\\u_2v_1\\u_2v_2\\u_2v_3\end{pmatrix}\\
&=\begin{pmatrix}u_1(b_{11}v_1+b_{12}v_2+b_{13}v_3)\\ u_1(b_{21}v_1+b_{22}v_2+b_{23}v_3)\\u_1(b_{31}v_1+b_{32}v_2+b_{33}v_3)\\ u_2(b_{11}v_1+b_{12}v_2+b_{13}v_3)\\u_2(b_{21}v_1+b_{22}v_2+b_{23}v_3)\\
u_2(b_{31}v_1+b_{32}v_2+b_{33}v_3)\end{pmatrix}\\
&=u\otimes (Bv).\end{align}$$ If we have two such matrices, their products act on each vector space separately: $$(A_1\otimes I)(A_2\otimes I)=(A_1A_2)\otimes I\\ (I\otimes B_1)(I\otimes B_2)=I\otimes(B_1B_2)\\(A\otimes I)(I\otimes B)=(I\otimes B)(A\otimes I)=A\otimes B.$$
We can write out $A\otimes B$ explicitly: $$A\otimes B=\begin{pmatrix} a_{11}b_{11}&a_{11}b_{12}&a_{11}b_{13}&a_{12}b_{11}&a_{12}b_{12}&a_{12}b_{13}\\
a_{11}b_{21}&a_{11}b_{22}&a_{11}b_{23}&a_{12}b_{21}&a_{12}b_{22}&a_{12}b_{23}\\
a_{11}b_{31}&a_{11}b_{32}&a_{11}b_{33}&a_{12}b_{31}&a_{12}b_{32}&a_{12}b_{33}\\
a_{21}b_{11}&a_{21}b_{12}&a_{21}b_{13}&a_{22}b_{11}&a_{22}b_{12}&a_{22}b_{13}\\
a_{21}b_{21}&a_{21}b_{22}&a_{21}b_{23}&a_{22}b_{21}&a_{22}b_{22}&a_{22}b_{23}\\
a_{21}b_{31}&a_{21}b_{32}&a_{21}b_{33}&a_{22}b_{31}&a_{22}b_{32}&a_{22}b_{33}
\end{pmatrix}.$$ One can verify that $(A\otimes B)(u\otimes v)=(Au)\otimes (Bv)$.
Other useful formulae are $\text{det}(A\otimes B)=(\text{det}A)^m(\text{det}B)^n$ and $\text{Tr}(A\otimes B)=(\text{Tr}A)(\text{Tr}B)$.
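All of these identities can be spot-checked with numpy's Kronecker product (random $2\times 2$ and $3\times 3$ matrices, matching $n=2$, $m=3$ above):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 2, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
u = rng.standard_normal(n)
v = rng.standard_normal(m)

print(np.allclose(np.kron(A, B) @ np.kron(u, v), np.kron(A @ u, B @ v)))   # True
print(np.allclose(np.kron(A, np.eye(m)) @ np.kron(np.eye(n), B),
                  np.kron(A, B)))                                          # True
print(np.isclose(np.linalg.det(np.kron(A, B)),
                 np.linalg.det(A)**m * np.linalg.det(B)**n))               # True
print(np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B)))      # True
```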



Now we present the formal definitions related to tensor product.

As with most of linear algebra, one has the usual choice between a concrete, base-oriented definition and an abstract, base-free definition. The latter definition has clear advantages since we can use it for important, more general applications. Nevertheless, to build intuition, we first provide the concrete (base-oriented) definition of the tensor product of vector spaces over a field.

Definition:
Suppose $A$ and $B$ are vector spaces over a field $\mathbb{F}$, with bases $\{a_1,\cdots,a_m\}$ and $\{b_1,\cdots,b_n\}$ respectively. The tensor product $A\otimes B$ is defined as the vector space of dimension $mn$ over $\mathbb{F}$ with base $\{a_i \otimes b_j: 1\leq i \leq m, 1\leq j \leq n\}$. For any $a=\sum_i \alpha_i a_i \in A$ and $b=\sum_j \beta_j b_j \in B$ with $\alpha_i, \beta_i$ in $\mathbb{F}$, we define the simple tensor $a\otimes b$ to be $\sum_{i,j} \alpha_i \beta_j a_i\otimes b_j$. Thus $\alpha(a\otimes b)=\alpha a\otimes b=a\otimes \alpha b$, $\forall \alpha \in \mathbb{F}$.

Despite its intuitive immediacy, this definition hampers development of the theory because of its reliance on the choice of base.

We now try an abstract approach, which provides direct, computation-free proofs of all the important properties. An extra benefit is the ability to work with modules and algebras over arbitrary commutative rings, since the existence of a base is no longer required to define the tensor product.

A balanced map is a function $\psi:M\times N \to G$, where $M$ is a right $R$-module, $N$ is a left $R$-module, and $G=(G,+)$ is an Abelian group, satisfying the following properties:
$$\psi(a_1+a_2,b)=\psi(a_1,b)+\psi(a_2,b)\\
\psi(a,b_1+b_2)=\psi(a,b_1)+\psi(a,b_2)\\
\psi(ar,b)=\psi(a,rb)$$
for all $a_i\in M,b_i\in N,r\in R$. The tensor product is a group $M\otimes_R N$, endowed with a balanced map $\phi_{M,N}:M\times N\to M\otimes_R N$, such that for any balanced map $\psi:M\times N \to G$ there is a unique group homomorphism $\overline{\psi}:M\otimes_R N\to G$ such that $$\overline{\psi}\phi_{M,N}=\psi,$$
namely, the diagram
$$M\times N \xrightarrow{\qquad \psi \qquad} G\\ \; \phi_{M,N} \searrow \quad \qquad \nearrow \overline{\psi}\\ \qquad\;\; M\otimes_R N$$
is commutative. Our universal construction is achieved by taking the free Abelian group, that is, the free $\mathbb{Z}$-module, generated by what are to become the simple tensors, and factoring out all the desired relations.

Definition: Given a right $R$-module $M$ and an $R$-module $N$ over an arbitrary ring $R$, $M\otimes_R N$ is defined to be $F/K$, where $F$ is the free $\mathbb{Z}$-module with basis $M\times N$, the set of ordered pairs $\{(a,b):a\in M, b\in N\}$, and where $K$ is the submodule generated by the elements
$$(a_1+a_2,b)-(a_1,b)-(a_2,b);\\ (a,b_1+b_2)-(a,b_1)-(a,b_2);\\ (ar,b)-(a,rb)$$
for all $a_i\in M,b_i\in N,r\in R$.

However abstract the construction of $M\otimes_R N$, it must be "correct", being unique up to isomorphism, by "abstract nonsense".

References
What are tensor products?
Algebra by Ernest Shult
Graduate Algebra: Noncommutative View by Louis Halle Rowen [p137-140]

More to read
How to lose your fear of tensor products

Sunday 12 July 2015

Famous inequalities

In this post we prove several important inequalities that often arise in functional analysis.

Cauchy-Schwarz inequality
Let $x$ and $y$ be two vectors in $\mathbb{R}^n$. The Cauchy-Schwarz inequality states that $|\langle x,y \rangle| \leq \lVert x \rVert \lVert y \rVert$. Equality holds if and only if $x$ and $y$ are linearly dependent, that is, if $x=\lambda y$ or $y=\lambda x$ for some scalar $\lambda$.

Proof 1: dot product
Note that $w \cdot w=w_1^2+w_2^2+\cdots+w_n^2 \geq 0$ for any $w \in \mathbb{R}^n$. If $x=0$ or $y=0$ the inequality is trivial, so assume both are nonzero. We rescale $x$ and $y$ to new vectors which have the same length, namely $\lVert x \rVert y$ and $ \lVert y \rVert x$. Then

$$\begin{align} \langle \lVert x \rVert y-\lVert y \rVert x,\lVert x \rVert y-\lVert y \rVert x \rangle &\geq 0\\
\lVert x \rVert^2 \langle y,y \rangle-2\lVert x \rVert \lVert y \rVert \langle x,y \rangle+\lVert y \rVert^2 \langle x,x \rangle & \geq 0\\
2\lVert x \rVert^2 \lVert y \rVert^2-2\lVert x \rVert \lVert y \rVert \langle x,y \rangle & \geq 0\\
\langle x,y \rangle &\leq \lVert x \rVert \lVert y \rVert
\end{align}$$

Similarly, setting $\langle \lVert x \rVert y+\lVert y \rVert x,\lVert x \rVert y+\lVert y \rVert x \rangle \geq 0$ gives us $-\langle x,y \rangle \leq \lVert x \rVert \lVert y \rVert$. Combining both cases, we have $|\langle x,y \rangle| \leq \lVert x \rVert \lVert y \rVert.\:\Box$

Proof 2: orthogonality
We orthogonally decompose $x$ into a component parallel to $y$ and a component perpendicular to $y$, and use the fact that the perpendicular component has a non-negative square norm (assume $y\neq 0$; otherwise the inequality is trivial).

$$\begin{align}\lVert x-\frac{\langle x,y \rangle}{\langle y,y \rangle}y \rVert^2 & \geq 0\\ \langle x-\frac{\langle x,y \rangle}{\langle y,y \rangle}y,\:x-\frac{\langle x,y \rangle}{\langle y,y \rangle}y \rangle & \geq 0\\ \langle x,x \rangle-2\frac{\langle x,y \rangle^2}{\langle y,y \rangle}+\frac{\langle x,y \rangle^2}{\langle y,y \rangle} & \geq 0\\ \langle x,x \rangle-\frac{\langle x,y \rangle^2}{\langle y,y \rangle} & \geq 0 \quad\Box \end{align}$$

Proof 3: discriminant
$\langle \lambda x-y, \lambda x-y \rangle=\lambda^2\langle x,x \rangle-2\lambda\langle x,y \rangle+\langle y,y \rangle$
This is a quadratic polynomial in $\lambda$, which is nonnegative for every $\lambda$, hence its discriminant $\Delta \leq 0$.
So $\langle x,y \rangle^2-\langle x,x \rangle \langle y,y \rangle \leq 0 \iff |\langle x,y \rangle|^2 \leq \lVert x \rVert^2 \lVert y \rVert^2.\:\Box$

Proof 4: Lagrange's identity [Check this out for more details.]
$|\langle x,y \rangle|=\lVert x \rVert \lVert y \rVert \iff$ $x$ and $y$ are linearly dependent $\iff$ the sequences $(a_i)$ and $(b_i)$ are proportional, where $x=(a_1,a_2,\cdots,a_n)^T$ and $y=(b_1,b_2,\cdots,b_n)^T$.

Define $(a_i)$ and $(b_i)$ to be proportional if $a_ib_j=a_jb_i$ for every pair $i,j$. Expanding the squares gives Lagrange's identity:

$$\begin{align}\sum_{i,j} (a_ib_j-a_jb_i)^2&=\sum_{i,j} (a_i^2b_j^2-2a_ia_jb_ib_j+a_j^2b_i^2)\\
&=2(\sum_i a_i^2)(\sum_j b_j^2)-2(\sum_i a_ib_i)^2. \end{align}$$

The left-hand side is a sum of squares, so $(\sum_i a_ib_i)^2 \leq (\sum_i a_i^2)(\sum_j b_j^2)$, which is the Cauchy-Schwarz inequality; equality holds iff $a_ib_j=a_jb_i$ for every pair $i,j$, i.e. iff the two sequences are proportional. $\Box$

Intuition for this inequality:
We restate the inequality as $|\langle x,y \rangle| \leq \lVert x \rVert$ for every normalised vector $y$, namely $\lVert y \rVert=1$. Then $\langle x,y \rangle$ is the component of the vector $x$ in the direction $y$. So the inequality can be thought of as saying that the component of a vector $x$ in any single direction is bounded by the magnitude of $x$.
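A quick random test of the inequality, together with the equality case $x=\lambda y$:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(5)
y = rng.standard_normal(5)
print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))            # True

x = 2.5 * y                                                           # dependent case
print(np.isclose(abs(x @ y), np.linalg.norm(x) * np.linalg.norm(y)))  # True
```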



Lemma
Let $0<\lambda<1$. Then $t^\lambda \leq 1-\lambda+\lambda t$ for all $t \geq 0$. This inequality becomes equality only when $t=1$.

Proof
Define $f: [0,\infty) \to \mathbb{R}$ by $f(t)=1-\lambda+\lambda t-t^\lambda$.
Then $f'(t)=\lambda-\lambda t^{\lambda-1}=\lambda(1-\dfrac{1}{t^{1-\lambda}})$.
$f'(t) \begin{cases} >0\;\text{if}\; t>1 \\ <0\;\text{if}\; 0<t<1 \end{cases}$
Thus $f(1)=0=\min f$ and $f(t)\geq 0$ for all $t \geq 0$, with equality iff $t=1$.
$\Rightarrow t^\lambda \leq 1-\lambda+\lambda t \;\; \forall t \geq 0, \quad t^\lambda=1-\lambda+\lambda t \;\; \text{iff} \;\; t=1\:\Box$

The inequalities that follow all use the notion of conjugate exponents.

Conjugate exponents are positive real numbers $p,q$ such that $\dfrac{1}{p}+\dfrac{1}{q}=1$.
For example, $1,\infty$ is regarded as a limiting pair of conjugate exponents, since $q \to \infty$ as $p \to 1$.
If $p,q$ are integers, the only pair of conjugate exponents is $2,2$.

Young’s inequality
Let $p,q$ be conjugate exponents with $1<q<\infty$. Then $ab \leq \dfrac{a^p}{p}+\dfrac{b^q}{q}$ for all $a,b \geq 0$.
Equality holds iff $a^p=b^q$.

Proof 1: the above lemma
If $b=0$, there is nothing to prove.
Assume $b>0$. In the lemma, substitute $\frac{1}{p}<1$ for $\lambda$ and $a^pb^{-q}$ for $t$.
Then we have
$$\begin{align}(a^pb^{-q})^{\frac{1}{p}} &\leq 1-\frac{1}{p}+\frac{1}{p} a^pb^{-q}\\
ab^{\frac{-q}{p}} &\leq \frac{1}{p} a^pb^{-q}+\frac{1}{q} \quad [\frac{1}{p}+\frac{1}{q}=1]\\
ab^{\frac{-q}{p}+q} &\leq \frac{a^p}{p}+\frac{b^q}{q} \quad [\text{multiplying by}\; b^q]\\
ab &\leq \frac{a^p}{p}+\frac{b^q}{q} \quad [\frac{q}{p}+1=q \Rightarrow \frac{-q}{p}+q=1],\end{align}$$
with equality exactly when $t=1$, namely $a^pb^{-q}=1$, i.e. $a^p=b^q. \:\Box$

[How do we know we can prove $ab \leq \frac{a^p}{p}+\frac{b^q}{q}$ by proving $t^\lambda \leq 1-\lambda+\lambda t$? We divide Young's inequality by $b^q$, which gives us $\frac{a}{b^{q-1}}\leq \frac{1}{p}\cdot \frac{a^p}{b^q}+\frac{1}{q}$.]

Proof 2i: convex function
Recall that a function $f:I \to \mathbb{R}$ is convex if for all $x,y \in I$ and $\lambda \in [0,1]$, $f((1-\lambda)x+\lambda y)\leq (1-\lambda)f(x)+\lambda f(y)$.
We can make use of the convexity of the exponential function.
Substitute $\ln a^p$ for $x$, $\ln b^q$ for $y$, and $\frac{1}{q}$ for $\lambda$ (for $a,b>0$; the case $ab=0$ is trivial).
Since $\frac{1}{p}+\frac{1}{q}=1$, we have $\large e^{\frac{1}{p}\ln a^p+\frac{1}{q}\ln b^q} \leq \frac{1}{p}e^{\ln a^p}+\frac{1}{q}e^{\ln b^q}$.
Thus $ab \leq \frac{a^p}{p}+\frac{b^q}{q}.\:\Box$

Proof 2ii: concave function
Alternatively, we can use the concavity of the logarithmic function.
$\log(\frac{a^p}{p}+\frac{b^q}{q}) \geq \frac{1}{p}\log a^p+\frac{1}{q}\log b^q=\log(ab)\:\Box$

Proof 3: integration
Consider $f(x)=x^{q-1}$ for $q>1$.
Its inverse function is $g(y)=y^{p-1}$, where $\frac{1}{p}+\frac{1}{q}=1$, since $(p-1)(q-1)=1$.
Comparing the area $ab$ of a rectangle with the areas under the graph of $f$ and of its inverse (Young's integral inequality), $ab \leq \int_0^b x^{q-1}\:dx+\int_0^a y^{p-1}\:dy=\frac{b^q}{q}+\frac{a^p}{p}.\:\Box$

Proof 4: differentiation
Let $f(a)=\frac{a^p}{p}+\frac{b^q}{q}-ab$. Then $f'(a)=a^{p-1}-b$.
Setting $f'(a)=0$, we have $a=b^\frac{1}{p-1} \Rightarrow a^p=b^\frac{p}{p-1}=b^q$.
$\min f=f(b^\frac{q}{p})=0 \Rightarrow 0 \leq \frac{a^p}{p}+\frac{b^q}{q}-ab \Rightarrow ab \leq \frac{a^p}{p}+\frac{b^q}{q}.\:\Box$

Remark: Young's inequality is one of the most important inequalities. It says that a product can be estimated by a sum. The AM-GM inequality is a special case of it. The Cauchy-Schwarz inequality can be proved from the AM-GM inequality. [to be explored in another post]

Hölder's inequality
Let $p,q$ be conjugate exponents with $1<q<\infty$. For any integer $n\geq 1$, assume that $a_1,\cdots,a_n$ and $b_1,\cdots,b_n$ are nonnegative. Then $\sum\limits_{k=1}^n a_kb_k \leq (\sum\limits_{k=1}^n a_k^p)^\frac{1}{p} (\sum\limits_{k=1}^n b_k^q)^\frac{1}{q}$.

Proof from Young's inequality:
Let $A=(\sum\limits_{k=1}^n a_k^p)^\frac{1}{p}$ and $B=(\sum\limits_{k=1}^n b_k^q)^\frac{1}{q}$. If $A=0$ or $B=0$ the inequality is trivial, so assume $A,B>0$.
Then $\dfrac{\sum\limits_{k=1}^n a_k^p}{A^p}=1$ and $\dfrac{\sum\limits_{k=1}^n b_k^q}{B^q}=1$.
Put $a=\dfrac{a_k}{A}, b=\dfrac{b_k}{B}$ in Young's inequality.
$\begin{align}\frac{a_kb_k}{AB} &\leq \frac{a_k^p}{pA^p}+\frac{b_k^q}{qB^q} \quad \forall k=1,2,\cdots,n\\
\sum\limits_{k=1}^n \frac{a_kb_k}{AB} &\leq \frac{1}{p}+\frac{1}{q}=1\\
\sum\limits_{k=1}^n a_kb_k &\leq (\sum\limits_{k=1}^n a_k^p)^\frac{1}{p} (\sum\limits_{k=1}^n b_k^q)^\frac{1}{q}\:\Box \end{align}$
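And a quick random test of Hölder's inequality itself (nonnegative random sequences, arbitrary conjugate pair):

```python
import numpy as np

rng = np.random.default_rng(4)
p = 3.0
q = p / (p - 1)
a = rng.random(10)                     # nonnegative sequences
b = rng.random(10)

lhs = np.sum(a * b)
rhs = np.sum(a**p)**(1 / p) * np.sum(b**q)**(1 / q)
print(lhs <= rhs)                      # True
```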

We will delve into the applications of these inequalities in another post.

References:
proof 1 of Young’s inequality, Hölder's inequality
proof 4 of Cauchy-Schwarz inequality
Intuition of Cauchy-Schwarz inequality

Related:
Equality in Young's inequality for convolution
Young's inequality for three variables
Hölder's inequality with three functions

Wednesday 8 July 2015

Multiplicativity of determinants

$$|AB|=|A||B|$$ Proof 1: direct computation (The most elementary proof)
Let $A=\begin{pmatrix}a&b\\c&d\end{pmatrix}$ and $B=\begin{pmatrix}x&y\\z&t\end{pmatrix}$. Then $AB=\begin{pmatrix}ax+bz&ay+bt\\cx+dz&cy+dt\end{pmatrix}$. We thus have $$\begin{align}|AB|&=(ax+bz)(cy+dt)-(ay+bt)(cx+dz)\\
&=acxy+adxt+bcyz+bdtz-acxy-bcxt-adyz-bdzt\\
&=(ad-bc)xt-(ad-bc)yz\\
&=(ad-bc)(xt-yz)\\
&=|A|\cdot |B| \quad \Box
\end{align}$$ Proof 2: exterior algebra / wedge product
$(ABe_1)\wedge \cdots \wedge (ABe_n)=|AB|\,(e_1\wedge \cdots \wedge e_n)$, while also $(A(Be_1))\wedge \cdots \wedge (A(Be_n))=|A|\,(Be_1\wedge \cdots \wedge Be_n)=|A|\cdot |B|\,(e_1\wedge \cdots \wedge e_n)$, so $|AB|=|A||B|. \;\Box$

Proof 3: elementary row operations
Suppose A is singular. Then $|A|=0 \Rightarrow |A||B|=0$. On the other hand, $|AB|=0$ since $\text{rank}(AB)\leq\text{rank}(A)$. Thus $|AB|=|A||B|=0$.

Suppose A is invertible. Then $A$ is a product of elementary matrices $A=E_1E_2\cdots E_k$. Thus $AB=E_1E_2\cdots E_kB$. Then $|AB|=|E_1||E_2|\cdots|E_k||B|=|A||B|. \quad \Box$
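A quick numerical check with random matrices:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))   # True
```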

More to explore
More proofs
Proof by analysis -- fundamental theorem of algebra

Friday 3 July 2015

AM-GM inequality

Let $a_1,\cdots,a_n$ be positive real numbers. Then $$\sqrt[n]{a_1\cdots a_n}\leq \dfrac{a_1+\cdots+a_n}{n},\qquad (*)$$ with equality iff all $a_k\:(1\leq k\leq n)$ are equal.

Proof by standard induction
Let $H(n)$ stand for the hypothesis that inequality $(*)$ is valid for $n$. For $n=2$, we have $$\sqrt{a_1a_2}\leq \dfrac{a_1+a_2}{2}.\qquad (**)$$ This follows from $(\sqrt{a_1}-\sqrt{a_2})^2\geq 0$. Moreover, equality in $(**)$ holds iff $a_1=a_2$. Now we will show that $H(2)$ and $H(n)$ together imply $H(n+1)$. Let $$A_{n+1}=\dfrac{a_1+\cdots+a_n+a_{n+1}}{n+1}.$$ Then $$\begin{align}A_{n+1}&=\dfrac{1}{2n}[(n+1)A_{n+1}+(n-1)A_{n+1}]\\&=\dfrac{1}{2n}[(a_1+\cdots+a_n)+(a_{n+1}+A_{n+1}+\cdots+A_{n+1})]\\&\geq \dfrac{1}{2n}(n \sqrt[n]{a_1\cdots a_n}+n\sqrt[n]{a_{n+1}A_{n+1}^{n-1}})\\&\geq \sqrt[2n]{a_1\cdots a_n a_{n+1} A_{n+1}^{n-1}},\end{align}$$ where the first inequality follows from $H(n)$, while the second inequality follows from $H(2)$. Now, $H(n+1)$ follows from $$A_{n+1}^{2n}\geq a_1\cdots a_{n+1}A_{n+1}^{n-1}$$ on dividing both sides by $A_{n+1}^{n-1}$ and taking $(n+1)$-th roots. Equality holds iff $a_1=\cdots=a_n$ and $a_{n+1}=A_{n+1}$, that is, iff $a_1=\cdots=a_{n+1}$. $\Box$
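A quick numerical check of the inequality and its equality case (the random positive reals are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
a = rng.random(8) + 0.1                # positive reals
print(a.prod() ** (1 / len(a)) <= a.mean())              # True: GM <= AM

a = np.full(8, 2.3)                    # all equal: the equality case
print(np.isclose(a.prod() ** (1 / len(a)), a.mean()))    # True
```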

Proof by analysis
The inequality $$\inf f +\inf g \leq \inf (f+g)$$ for real-valued functions can be generalised to $$\sum\limits_{i=1}^n \inf f_i(x) \leq \inf \sum\limits_{i=1}^n f_i(x).$$ [Below we use $\sum$ to mean $\sum\limits_{i=1}^n$.]
Taking logs, $(*)$ is equivalent to $$\sum \ln a_i\leq n \ln(\dfrac{\sum a_i}{n}),$$
or, adding $n$ to both sides, $$\sum (1+\ln a_i)\leq n+n \ln(\dfrac{\sum a_i}{n}).$$
Let $f_i(x)=a_ix-\ln x$. Then $f_i'(x)=a_i-\frac{1}{x}$ vanishes at $x=\frac{1}{a_i}$ and $f_i''(x)=\dfrac{1}{x^2}>0$, so $\min f_i(x)=f_i(\frac{1}{a_i})=1+\ln a_i$.
Now let $f(x)=\sum f_i(x)=(\sum a_i)x-n\ln x$. Then $f'(x)=\sum a_i-\frac{n}{x}$ vanishes at $x=\dfrac{n}{\sum a_i}$ and $f''(x)=\dfrac{n}{x^2}>0$, so $\min f(x)=n+n\ln \dfrac{\sum a_i}{n}$.
By the inequality above, $\sum (1+\ln a_i)=\sum \min f_i \leq \min f = n+n\ln \dfrac{\sum a_i}{n}$, that is, $\dfrac{1}{n}\ln(a_1\cdots a_n)\leq \ln\dfrac{\sum a_i}{n}.\;\Box$

An application
Prove that $\{(1+\frac{1}{n})^n\}$ is monotonic increasing and bounded.
Proof: By the AM-GM inequality, we have $$\sqrt[n+1]{(1+\frac{1}{n})^n\cdot 1}<\dfrac{1}{n+1}[n(1+\frac{1}{n})+1]=1+\dfrac{1}{n+1}.$$ This is equivalent to $$(1+\dfrac{1}{n})^n<(1+\dfrac{1}{n+1})^{n+1},$$ which shows that $\{(1+\frac{1}{n})^n\}$ is monotonic increasing as required. To find an upper bound of the sequence, for $n>5$, using the AM-GM inequality once more, we have $$\sqrt[n+1]{(\dfrac{5}{6})^6\cdot 1^{n-5}}<\dfrac{1}{n+1}[6\cdot \dfrac{5}{6}+(n-5)\cdot 1]=\dfrac{n}{n+1},$$ and so $(\dfrac{5}{6})^6<(\dfrac{n}{n+1})^{n+1}$, or equivalently, $(1+\dfrac{1}{n})^{n+1}<(\dfrac{6}{5})^6$. Therefore, $$(1+\dfrac{1}{n})^n<(1+\dfrac{1}{n})^{n+1}<(\dfrac{6}{5})^6<3,\quad (\text{for}\; n>5).$$ Since $(1+\dfrac{1}{n})^n<3$, we conclude that $\{(1+\frac{1}{n})^n\}$ is bounded. $\Box$

More applications

Reference:
Excursions in Classical Analysis: Pathways to Advanced Problem Solving and Undergraduate Research