Tuesday 6 September 2016

Convolution as Weighted Average

We know that the average of $f$ over $[x-T,x+T]$ for any fixed $x$ is given by $$f_{\text{avg}}(x)=\frac{1}{2T}\int_{x-T}^{x+T} f(t)\,\mathrm{d}t.$$ Define $\phi=\frac{\Pi_T}{2T}$, where \begin{align}\Pi_T(t)=\begin{cases}1,&\quad -T\leq t\leq T\\
0,&\quad |t|>T.\end{cases}\end{align} We then have $$\int_{-\infty}^\infty \phi(t)\,\mathrm{d}t=\frac{1}{2T}\int_{-T}^T \Pi_T(t)\,\mathrm{d}t=1.$$ For any fixed $x$, \begin{align}\phi(x-t)=\begin{cases}\frac{1}{2T},&\quad x-T\leq t\leq x+T\\
0,&\quad \text{else}.\end{cases}\end{align} Thus, for any given function $f$ and any fixed value of $x$, \begin{align}f(t)\phi(x-t)=\begin{cases}\frac{1}{2T}f(t),&\quad x-T\leq t\leq x+T\\
0,&\quad \text{else}.\end{cases}\end{align} We conclude that $$f_{\text{avg}}(x)=\int_{-\infty}^\infty f(t)\phi(x-t)\,\mathrm{d}t.$$ More generally, when we replace $\phi$ with an arbitrary function, we have the definition of a convolution: the convolution of two functions $f$ and $g$ is given by $$f*g(x)=\int_{-\infty}^\infty f(t)g(x-t)\,\mathrm{d}t.$$ Reference: The Mathematics of Imaging by Timothy G. Freeman

Friday 1 July 2016

Nth root

We first study the sequence $$x_{n+1}=\dfrac{1}{2}(x_n+\dfrac{a}{x_n}),$$ where $a=2,x_1=1$ and examine the numbers $x_2,x_3,x_4,x_5$ : $$x_2=\dfrac{1}{2}(1+2)=1.5\\x_3=\dfrac{1}{2}(1.5+\dfrac{2}{1.5})\approx 1.4167\\x_4=\dfrac{1}{2}(1.4167+\dfrac{2}{1.4167})\approx1.4142\\x_5=\dfrac{1}{2}(1.4142+\dfrac{2}{1.4142})\approx 1.4142\\ \cdots$$ We can show that the sequence tends to $\sqrt{2}$. More generally, take $l=\lim x_n$, so $l=\lim x_{n+1}$ and so\begin{align*}l&=\dfrac{1}{2}(l+\dfrac{a}{l})\\ 2l^2&=l^2+a\\ l&=\sqrt{a}\;\;\text{(reject the negative number because }\;x_n>0).\end{align*} This sequence helps us generate the square root of any positive number. You may ask: how can we come up with such a sequence? The reasoning is as follows: if $x_1\neq \sqrt{a}$, then we have $x_1<\sqrt{a}$ or $x_1>\sqrt{a}$. In the first case, we have $\sqrt{a} x_1<a$ or $\sqrt{a}<\dfrac{a}{x_1}$. In the other case, if $x_1>\sqrt{a}$, then $\sqrt{a}>\dfrac{a}{x_1}$. Therefore, we have $x_1<\sqrt{a}<\dfrac{a}{x_1}$ or $\dfrac{a}{x_1}<\sqrt{a}<x_1$. If we take the average of $x_1$ and $\dfrac{a}{x_1}$, namely $x_2=\dfrac{1}{2}(x_1+\dfrac{a}{x_1})$, then the result will be between $x_1$ and $\dfrac{a}{x_1}$. Repeat this process several times: $$x_3=\dfrac{1}{2}(x_2+\dfrac{a}{x_2}), x_4=\dfrac{1}{2}(x_3+\dfrac{a}{x_3}) \cdots$$ The sequence $(x_n)$ tends to $\sqrt{a}$.

In fact, the sequence for square root can be generalised to one for nth root. $x_{n+1}=\dfrac{1}{p}\bigg((p-1)x_n+\dfrac{a}{x_n^{p-1}}\bigg)$ The reasoning is similar. We consider $\underbrace{x_n\times x_n\times \cdots\times x_n}_{p-1\; times}\times \dfrac{a}{x_n^{p-1}}=a$ and take the average of these $p$ factors.

These sequences are related to the method of Newton-Raphson. Finding the square root is equivalent to solving the equation $x^2-a=0$. Let $f(x)=x^2-a$. Then, $$x_{n+1}=x_n-\dfrac{f(x_n)}{f'(x_n)}=x_n-\dfrac{x_n^2-a}{2x_n}=\dfrac{1}{2}(x_n+\dfrac{a}{x_n}).$$ As for the nth root, we can solve the equation $x^p-a=0$ and we have $$x_{n+1}=x_n-\dfrac{x_n^p-a}{px_n^{p-1}}=x_n-\dfrac{1}{p}x_n+\dfrac{a}{px_n^{p-1}}=\dfrac{1}{p}\bigg((p-1)x_n+\dfrac{a}{x_n^{p-1}}\bigg).$$

Saturday 21 May 2016

Deriving Power Sums by Polynomial Interpolation

Most of us have seen these formulas: $$\sum_{i=k}^n k=1+2+\cdots+n=\dfrac{1}{2}n(n+1)\\ \sum_{i=k}^n k^2=1+4+\cdots+n^2=\dfrac{1}{6}n(n+1)(2n+1)\\ \sum_{i=k}^n k^3=1+8+\cdots+n^3=\dfrac{1}{4}n^2(n+1)^2\\ \sum_{i=k}^n k^4=1+16+\cdots+n^4=\dfrac{1}{30}n(n+1)(2n+1)(3n^2+3n-1).$$ These can be proved by mathematical induction, but we can actually derive them on our own!

We simply acknowledge the following fact: for any positive integer $p$, the power sum $\sum_{k=1}^n k^p$ can always be expressed as a polynomial $f(n)$ that has degree $p+1$ and has rational coefficients.

Let's try to derive the first and the last formula. For the first, let $f(n)=an^2+bn+c$. We have $f(1)=1,f(2)=3,f(3)=6$. We want to solve the linear system: $$a+b+c=1\\ 4a+2b+c=3\\ 9a+3b+c=6,$$ or equivalently in matrix form $$\begin{pmatrix}1&1&1\\ 4&2&1\\ 9&3&1\end{pmatrix}\begin{pmatrix}a\\b\\c\end{pmatrix}=\begin{pmatrix}1\\3\\6\end{pmatrix}.$$ Note that $$\begin{pmatrix}1&1&1\\ 4&2&1\\ 9&3&1\end{pmatrix}$$ is a Vandermonde matrix. Just type in vander([1 2 3])\[1;3;6] in Matlab and we have the solution: $a=1/2,b=1/2,c=0$.

For the last formula, let $f(n)=an^5+bn^4+cn^3+dn^2+en+f$. Then $f(1)=1,f(2)=17,f(3)=98,f(4)=354,f(5)=979,f(6)=2275$. Type vander([1 2 3 4 5 6]\[1;17;98;354;979;2275] in Matlab and we have $a=1/5, b=1/2, c=1/3, d=0, e=-1/30, f=0$. Finally, factor(1/5*x^5+1/2*x^4+1/3*x^3-1/30*x) gives the formula. The reader can also try to derive the formulas for $\sum_{i=k}^n k^5$ and $\sum_{i=k}^n k^6$.

Sunday 15 May 2016

Manipulations of Sums

Let $a_1,a_2,\cdots$ be any sequence of numbers. We are often interested in sums such as $a_1+\cdots+a_n$. This sum can be written more compactly using either of the following equivalent notations: $$\sum_{i=1}^n a_i\quad \text{or}\quad \sum_{1\leq i\leq n} a_i.$$ In general, if $R(i)$ is any relation involving $i$, the symbol $$\sum_{R(i)} a_i$$ means the sum of all $a_i$, where $i$ is an integer satisfying the condition $R(i)$. The letter $i$ is a dummy variable or index variable that may be renamed without changing the result.

We now discuss four important algebraic operations on sums.

1) The distributive law, for products of sums $$\bigg(\sum_{R(i)} a_i\bigg)\bigg(\sum_{S(j)} b_j\bigg)=\sum_{R(i)}\bigg(\sum_{S(j)} a_ib_j\bigg)$$ Consider for example $$\begin{align}\bigg(\sum_{i=1}^2 a_i\bigg)\bigg(\sum_{j=1}^3 b_j\bigg)&=(a_1+a_2)(b_1+b_2+b_3)\\
&=(a_1b_1+a_1b_2+a_1b_3)+(a_2b_1+a_2b_2+a_2b_3)\\
&=\sum_{i=1}^2\bigg(\sum_{j=1}^3 a_ib_j\bigg).\end{align}$$ 2) Interchanging order of summation $$\sum_{R(i)}\sum_{S(j)} a_{ij}=\sum_{S(j)}\sum_{R(i)} a_{ij}$$ For example, $$\sum_{R(i)}\sum_{j=2}^2 a_{ij}=\sum_{R(i)}(a_{i1}+a_{i2})\\ \sum_{j=1}^2 \sum_{R(i)} a_{ij}=\sum_{R(i)} a_{i1}+\sum_{R(i)} a_{i2}.$$ This means that $$\sum_{R(i)}(b_i+c_i)=\sum_{R(i)}b_i+\sum_{R(i)}c_i,$$ where $b_i=a_{i1}$ and $c_i=a_{i2}$. This operation is very useful because we often know a simple form for $\sum_{R(i)} a_{ij}$ but not for $\sum_{S(j)} a_{ij}$. More generally, we have $$\sum_{R(i)}\sum_{S(i,j)}a_{ij}=\sum_{S'(j)}\sum_{R'(i,j)}a_{ij},$$ where $S'(j)$ is the relation 'there is an integer $i$ s.t. both $R(i)$ and $S(i,j)$ are true'; and $R'(i,j)$ is the relation 'both $R(i)$ and $S(i,j)$ are true'. For example, if the summation is $\sum_{i=1}^n\sum_{j=1}^i a_{ij}$, then $S'(j)$ is the relation 'there is an integer $i$ s.t. $1\leq i\leq n$ and $1\leq j\leq i$, namely $1\leq j\leq n$; and $R'(i,j)$ is the relation '$1\leq i\leq n$ and $1\leq j\leq i$', namely $j\leq i\leq n$. Thus, $$\sum_{i=1}^n\sum_{j=1}^i a_{ij}=\sum_{j=1}^n\sum_{i=j}^n a_{ij}.$$ 3) Change of variable
$$\sum_{R(i)} a_i=\sum_{R(j)} a_j=\sum_{R(p(j))} a_{p(j)}$$ The first equality holds because we are simply changing the dummy variable. In the second case, $p(j)$ is a function of $j$ that represents a permutation of the range. For each integer $i$ satisfying the relation $R(i)$, there is a unique integer $j$ satisfying the relation $p(j)=i$. This condition is always satisfied in cases $p(j)=c+j$ and $p(j)=c-j$, where $c$ is an integer not depending on $j$. For instance, $$\sum_{1\leq i\leq n} a_i=\sum_{1\leq i-1\leq n} a_{i-1}=\sum_{2\leq i\leq n+1} a_{i-1}.$$ Note that for infinite sums, replacing $j$ by $p(j)$ is valid iff $\sum_{R(j)}|a_j|$ exists.

4) Manipulating the domain
If $R(i)$ and $S(i)$ are arbitrary relations, we have $$\sum_{R(i)}a_i+\sum_{S(i)}a_i=\sum_{R(i)\;\text{or}\;S(i)}a_i+\sum_{R(i)\;\text{and}\;S(i)}a_i.$$ For example, $$\sum_{1\leq i\leq m}a_i+\sum_{m\leq i\leq n}a_i=\bigg(\sum_{1\leq i\leq n}a_i\bigg)+a_m,$$ assuming that $1\leq m\leq n$.

Examples from high school:
[The sum of a geometric progression] Assume that $r\neq 1,n\geq 0$. A conventional way of deriving the formula is to consider $S=a+ar+\cdots+ar^n$ and $S-rS$. $$a+ar+\cdots+ar^n\\ \underline{-)\; ar+ar^2+\cdots+ar^{n+1}}\\ a-ar^{n+1}=S(1-r)\\ \Rightarrow S=a\dfrac{1-r^{n+1}}{1-r}$$ This can be easily derived too by manipulating sums: $$\begin{align}a+ar+\cdots+ar^n&=a+\sum_{i=1}^n ar^i\\
&=a+r\sum_{i=1}^n ar^{i-1}\\
&=a+r\sum_{i=0}^{n-1} ar^i\\
&=a+r\sum_{i=0}^n ar^i-ar^{n+1}\\
&=a\dfrac{1-r^{n+1}}{1-r}\end{align}$$
[The sum of a arithmetic progression] Assume $n\geq 0$. Then $$\begin{align}a+(a+d)+\cdots+(a+nd)&=\sum_{i=0}^n (a+di)\\
&=\sum_{n-i=0}^n (a+d(n-i))\\
&=\sum_{i=0}^n (a+dn-di)\\
&=\sum_{i=0}^n (2a+dn)-\sum_{i=0}^n (a+di)\\
&=(n+1)(2a+dn)-\sum_{i=0}^n (a+di)\\
&=a(n+1)+\dfrac{1}{2}dn(n+1)\\
&=\dfrac{1}{2}(n+1)(2a+dn)\end{align}$$ Another example:
Let $$\begin{align}S&=\sum_{i=0}^n\sum_{j=0}^i a_ia_j\\
&=\sum_{j=0}^n\sum_{i=j}^n a_ia_j\\
&=\sum_{i=0}^n\sum_{j=i}^n a_ia_j
\end{align}$$ Then $$\begin{align}2S&=\sum_{i=0}^n\bigg(\sum_{j=0}^i a_ia_j+\sum_{j=i}^n a_ia_j\bigg)\\
&=\sum_{i=0}^n\bigg((\sum_{j=0}^n a_ia_j)+a_ia_i\bigg)\\
&=\bigg(\sum_{i=0}^n a_i\bigg)^2+\bigg(\sum_{i=0}^n a_i^2\bigg)\end{align}$$ We have derived the identity $$\sum_{i=0}^n\sum_{j=0}^i a_ia_j=\dfrac{1}{2}\bigg(\bigg(\sum_{i=0}^n a_i\bigg)^2+\bigg(\sum_{i=0}^n a_i^2\bigg)\bigg).$$

Examples in analysis
In Riemann integration, we have the following result: Suppose $f\in \mathscr{R}[a,b]$ and $g\in \mathscr{C}([m,M])$, where $M=\sup\{f(x)\mid a\leq x\leq b\}$ and $m=\inf\{f(x)\mid a\leq x\leq b\}$. Then $g\circ f\in \mathscr{R}[a,b]$.

Idea: Consider continuous points and discontinuous points separately.

Proof: Let $\varepsilon>0$. Since $g$ is uniformly continuous on $[m,M]$, there exists $\delta>0$ s.t. $|g(x)-g(y)|<\varepsilon$ as $x,y\in [m,M]$ and $|x-y|<\delta$. Also, since $f\in \mathscr{R}[a,b]$, there exists a partition $P=\{x_0,x_1,\cdots,x_n\}$ of $[a,b]$ s.t. $U(f,P)-L(f,P)<\varepsilon \delta$. Consider $A=\{i\mid osc(f)< \delta\}$ (index w.r.t. continuous points) and $A^c=\{i\mid osc(f)\geq \delta\}$ (index w.r.t. discontinuous points). $i\in A$ does not cause much trouble. To handle $i\in A^c$, consider $$\begin{align}\sum_{i\in A^c}(x_i-x_{i-1})&=\dfrac{1}{\delta}\sum_{i\in A^c}\delta (x_i-x_{i-1})\\
&\leq \dfrac{1}{\delta}\sum_{i=1}^n(M_i(f)-m_i(f))(x_i-x_{i-1})\\
&=\dfrac{1}{\delta}(U(f,P)-L(f,P))\\
&<\varepsilon. \end{align}$$Therefore, $$\begin{align}U(g\circ f,P)-L(g\circ f,P)&=\sum_{i=1}^n\bigg(\sup_{[x_{i-1},x_i]}(g\circ f)-\inf_{[x_{i-1},x_i]}(g\circ f)\bigg)(x_i-x_{i-1})\\
&=\sum_{i\in A}\bigg(\sup_{[x_{i-1},x_i]}(g\circ f)-\inf_{[x_{i-1},x_i]}(g\circ f)\bigg)(x_i-x_{i-1})\\&+\sum_{i\in A^c}\bigg(\sup_{[x_{i-1},x_i]}(g\circ f)-\inf_{[x_{i-1},x_i]}(g\circ f)\bigg)(x_i-x_{i-1})\\
&\leq \varepsilon \sum_{i\in A}(x_i-x_{i-1})+C\sum_{i\in A^c}(x_i-x_{i-1})\\
&\leq \varepsilon(b-a)+C\varepsilon\\
&=(b-a+C)\varepsilon,\end{align}$$ where $C=\max\{g(u)\mid m\leq u\leq M\}-\min\{g(u)\mid m\leq u\leq M\}$. It follows that $g\circ f\in \mathscr{R}[a,b]$. $\Box$

Say we want to evaluate $\sum_{n=1}^\infty \dfrac{2n-1}{2^n}$ (which is convergent by ratio test). Consider $$\begin{align}s_n&=\sum_{k=1}^n \dfrac{2k-1}{2^k}\\
&=\dfrac{1}{2}+\sum_{k=2}^n\bigg(\dfrac{2k-1}{2^k}-\dfrac{2k-3}{2^k}\bigg)+\sum_{k=2}^n \dfrac{2k-3}{2^k}\\
&=\dfrac{1}{2}+\sum_{k=2}^n\dfrac{1}{2^{k-1}}+\sum_{k=2}^n \dfrac{2k-3}{2^k}\\
&=\dfrac{1}{2}+\sum_{k=2}^n\dfrac{1}{2^{k-1}}+\sum_{k=2}^n \dfrac{2(k-1)-1}{2^{k-1+1}}\\
&=\dfrac{1}{2}+\sum_{k=2}^n\dfrac{1}{2^{k-1}}+\sum_{k=1}^{n-1} \dfrac{2k-1}{2^{k+1}}\\
&=\dfrac{1}{2}+\sum_{k=2}^n\dfrac{1}{2^{k-1}}+\dfrac{1}{2}s_n-\dfrac{2n-1}{2^n}\\
&=1+2\bigg(1-\dfrac{1}{2^{n-1}}\bigg)-\dfrac{2n-1}{2^n}\end{align}$$ As $n\to \infty$, $s_n\to 3$.

Many manipulations of sums and other formulas become significantly simpler if we use the bracket notation: $$[\text{statement}]=\begin{cases} 1,\quad \text{if the statement is true;}\\ 0,\quad \text{if the statement is false.}\end{cases}$$ Then we can write $$\sum_{R(i)} a_i=\sum_i a_i[R(i)]$$ since the terms of that infinite sums are zero when $R(i)$ is false. In fact, we have seen this expression before -- Kronecker delta symbol is a special case of bracket notation: $$\delta_{ij}=[i=j]=\begin{cases}1,\quad \text{if }i=j\\ 0,\quad \text{if }i\neq j\end{cases}.$$ With bracket notation, we can derive rule (3) from rule (1) and (2): $$\begin{align}\sum_{R(p(j))} a_{p(j)}&=\sum_j a_{p(j)}[R(p(j))]\\
&=\sum_j \sum_i a_i [R(i)][i=p(j)]\\
&=\sum_i a_i[R(i)]\sum_j [i=p(j)].\end{align}$$ If $p$ is a permutation of the range, then we are left with $\sum_i a_i [R(i)]$, which is $\sum_{R(i)} a_i$.

We can also derive rule (2): $$\sum_{i=1}^n\sum_{j=1}^i a_{ij}=\sum_{i,j} a_{ij}[1\leq i\leq n][1\leq j\leq i]=\sum_{i,j}a_{ij}[1\leq j\leq n][j\leq i\leq n]=\sum_{j=1}^n\sum_{i=j}^n a_{ij},$$ since $[1\leq j\leq n][1\leq j\leq i]=[1\leq j\leq i\leq n]=[1\leq j\leq n][j\leq i\leq n]$.

More to explore:
Möbius inversion formula
arithmetic function

Reference
illustration

Saturday 7 May 2016

Evaluate $\int_0^\pi \log(1-2a\cos x+a^2)\;dx$

From the identity $$1-a^{2n}=(1-a^2)\prod_{i=1}^{n-1}(1-2a\cos \dfrac{i\pi}{n}+a^2),\quad a\in \Bbb R$$ prove that $$\int_0^\pi \log(1-2a\cos x+a^2)\;dx=\begin{align}\begin{cases}0\quad &|a|<1\\ 2\pi\log|a|\quad &|a|>1.\end{cases}\end{align}$$


The identity follows from finding the roots of $\dfrac{1-a^{2n}}{1-a^2}$, whose proof is similar to this problem.

I: Elementary real analysis
$$1-a^{2n}=(1-a^2)\prod_{i=1}^{n-1}(1-2a\cos \dfrac{i\pi}{n}+a^2)\\
\dfrac{1-a^{2n}}{1-a^2}=\prod_{i=1}^{n-1}(1-2a\cos \dfrac{i\pi}{n}+a^2)\\
1+a^2+a^4+\cdots+a^{2n-2}=\prod_{i=1}^{n-1}(1-2a\cos \dfrac{i\pi}{n}+a^2)\\
\dfrac{\pi}{n}\log(1+a^2+a^4+\cdots+a^{2n-2})=\dfrac{\pi}{n}[\log(1-2a\cos \dfrac{\pi}{n}+a^2)+\log(1-2a\cos \dfrac{2\pi}{n}+a^2)+\cdots\\+\log(1-2a\cos \dfrac{(n-1)\pi}{n}+a^2)]$$Recall the motivation of the definition of Riemann sum: we want to approximate an area bounded by a curve by calculating areas of rectangles. Riemann sum is the sum of areas of rectangles. We partition $[0,\pi]$ into $n$ subintervals and find the sum of areas of rectangles each with width $\dfrac{\pi}{n}$ and lengths $\log(1-2a\cos \dfrac{i\pi}{n}+a^2)$ for $i=1,2,\cdots,n$. Therefore, we have $$\begin{align}\int_0^\pi \log(1-2a\cos x+a^2)\;dx&=\lim_{n\to \infty}\dfrac{\pi}{n}\sum_{i=1}^n \log(1-2a\cos \dfrac{i\pi}{n}+a^2)\\
&=\lim_{n\to \infty}\dfrac{\pi}{n}\log(1+a^2+a^4+\cdots+a^{2n-2}).\end{align}$$ When $|a|<1$, $\log(1+a^2+a^4+\cdots+a^{2n-2})=0$. It follows that the integral is $0$ when $|a|<1$. As for $|a|>1$, the trick is to make use of the case $|a|<1$. Note that $a>1$ implies that $\dfrac{1}{a}<1$: $$\lim_{n\to \infty} \dfrac{\pi}{n}\log\bigg(a^{2n-2}(\dfrac{1}{a^{2n-2}}+\dfrac{1}{a^{2n-4}}+\cdots+1)\bigg)=\lim_{n\to \infty} \dfrac{2n-2}{n}\pi \log a=2\pi\log a.$$
II: Complex analysis

More to explore
other proofs
another proof
yet another proof
Evaluating a similar integral by complex analysis methods