To apply these methods to a system Ax = b, the matrix A should be
- Symmetric
- Positive definite
For such a system, a function f is defined as
\begin{equation} f = \frac{1}{2} x^{T}Ax - b^{T}x+c \end{equation}
Then the gradient of f is
\begin{equation} \nabla f = Ax-b \end{equation}
Hence, if f is minimised, then \nabla f = 0, which ensures that Ax = b. The converse holds as well: the minimiser of f solves Ax = b.
\begin{equation} \nabla f = Ax-b = -r, \quad \text{where } r = b - Ax \text{ is the residual} \end{equation}
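As a quick sanity check, the gradient formula \nabla f = Ax - b can be verified numerically. This is a NumPy sketch using the example matrix from below; the test point is arbitrary.

```python
import numpy as np

# Verify that the gradient of f(x) = 1/2 x^T A x - b^T x + c equals A x - b.
A = np.array([[1.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, 1.0])
c = 0.0

def f(x):
    return 0.5 * x @ A @ x - b @ x + c

x = np.array([0.3, -0.7])          # arbitrary test point
analytic = A @ x - b               # gradient from the formula above

# Central finite differences as an independent check
h = 1e-6
numeric = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(np.allclose(analytic, numeric))   # the two gradients agree
```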
Steepest descent method
Solve Ax = b where,
\[ A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} , \quad b = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]
Solution:
\[ \begin{split} f & =\frac{1}{2} x^{T}Ax - b^{T}x+c\\ & =\frac{1}{2} \begin{bmatrix} x_{1} & x_{2} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} - \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} +c\\ &= \frac{1}{2} \begin{bmatrix} x_{1} & x_{2} \end{bmatrix} \begin{bmatrix} x_{1} \\ 2x_{2} \end{bmatrix} - (x_{1} + x_{2}) +c\\ &= \frac{1}{2}x_{1}^{2}+x_{2}^{2} - x_{1} - x_{2}+c\\ f=0 \rightarrow\; &\frac{1}{2}(x_{1}^{2} - 2x_{1}+1)+\left(x_{2}^{2} - 2\cdot\frac{1}{2}x_{2}+\left(\frac{1}{2}\right)^{2}\right)+c - \frac{3}{4}=0\\ &\rightarrow \frac{(x_{1} -1)^{2}}{2}+\left(x_{2}-\frac{1}{2}\right)^{2}=\frac{3}{4}-c \text{. If the level curve } f=0 \text{ passes through the origin, } c=0\\ &\rightarrow \frac{(x_{1} -1)^{2}}{\left(\sqrt{\frac{3}{2}}\right)^{2}}+\frac{\left(x_{2} -\frac{1}{2}\right)^{2}}{\left(\sqrt{\frac{3}{4}}\right)^{2}} = 1 \end{split} \]
This is an ellipse with centre (1, \frac{1}{2}) that passes through the origin. Select the origin as the initial guess, x^{0} = (0, 0); the solution is the centre of the ellipse, x^{c} = (1, \frac{1}{2}).
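A one-line NumPy check confirms that the centre of the ellipse is indeed the solution of Ax = b:

```python
import numpy as np

# The ellipse centre (1, 1/2) should satisfy A x = b.
A = np.array([[1.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, 1.0])
x_c = np.array([1.0, 0.5])
print(A @ x_c)        # equals b
```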
\begin{equation} x^{1} = x^{0}+\alpha_{0} r_{0} \end{equation}
where r_{0} is the residual, which gives the first search direction, and \alpha_{0} is the step length.
\begin{equation} f(x^{1}) = \frac{1}{2} (x^{0}+\alpha_{0} r_{0})^{T}A(x^{0}+\alpha_{0} r_{0}) - b^{T}(x^{0}+\alpha_{0} r_{0})+c \end{equation}
For f(x^{1}) to be a minimum, its derivative with respect to \alpha_{0} must be zero:
\[ \frac{1}{2} r_{0}^{T}A(x^{0}+\alpha_{0} r_{0})+ \frac{1}{2}(x^{0}+\alpha_{0} r_{0})^{T}Ar_{0} - b^{T}r_{0}=0 \]
\begin{equation} \alpha_{0}r_{0}^{T}Ar_{0}+\frac{1}{2}r_{0}^{T}Ax_{0}+\frac{1}{2}x_{0}^{T}Ar_{0}- b^{T}r_{0}=0 \end{equation}
But we have
\[ \begin{split} x_{0}^{T}Ar_{0} & =(x_{0}^{T}Ar_{0})^{T} \quad \text{(a scalar equals its transpose)}\\ & = (Ar_{0})^{T}x_{0}\\ & =r_{0}^{T}A^{T}x_{0}\\ & =r_{0}^{T}Ax_{0} \quad \text{(since } A^{T}=A \text{)}\\ (b^{T}r_{0})^{T}& =r_{0}^{T}b \end{split} \]
Using the above, (6) becomes
\[ \alpha_{0}r_{0}^{T}Ar_{0}+r_{0}^{T}\overbrace{(Ax_{0}-b)}^{-r_{0}}=0 \]
\[ \alpha_{0}r_{0}^{T}Ar_{0}-r_{0}^{T}r_{0}=0 \]
\begin{equation} \alpha_{0}=\frac{r_{0}^{T}r_{0}}{r_{0}^{T}Ar_{0}} \end{equation}
\[ \begin{split} x^{1} &= x^{0}+\alpha_{0} r_{0}\\ \alpha_{1}&=\frac{r_{1}^{T}r_{1}}{r_{1}^{T}Ar_{1}}\\ x^{2} &= x^{1}+\alpha_{1} r_{1}\\ &\;\;\vdots \end{split} \]
The direction of r_{1} is perpendicular to r_{0}, so the iterates zig-zag in a step-like fashion towards the final solution x^{c}.
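The steepest descent iteration above can be sketched in NumPy as follows; the tolerance and iteration cap are arbitrary choices, not part of the derivation.

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Minimise f(x) = 1/2 x^T A x - b^T x for symmetric positive
    definite A by stepping along the residual r = b - A x with the
    exact step length alpha = (r.r) / (r.A.r)."""
    x = x0.astype(float)
    for _ in range(max_iter):
        r = b - A @ x                     # residual = -grad f = search direction
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ A @ r)     # optimal step length
        x = x + alpha * r
    return x

A = np.array([[1.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, 1.0])
x = steepest_descent(A, b, np.zeros(2))
print(x)   # approaches [1, 0.5] through perpendicular zig-zag steps
```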
Conjugate gradient method
The first search direction is the same as in steepest descent, p_{0}=r_{0}. The second direction p_{1} is chosen to point directly at the solution x_{c}, i.e. for some scalar k,
\begin{equation} p_{1} = k (x_{c} - x^{1}) \end{equation}
\[ \begin{split} p_{1} & =k (x_{c} - x^{1})\\ A p_{1} & =k (Ax_{c} - Ax^{1})\\ & =k (b - Ax^{1})\\ & =k r_{1} \\ r_{0}^{T}A p_{1} & =k r_{0}^{T} r_{1} =0 \quad \text{(since } r_{1} \perp r_{0}\text{)}\\ p_{0}^{T}A p_{1} & =0 \end{split} \]
That is, p_{0} is A-orthogonal (conjugate) to the new direction p_{1}.
\begin{equation} p_{0}^{T}A p_{1} =0 \end{equation}
Now we can write p_{1} as a linear combination of r_{1} and p_{0} (= r_{0}).
\begin{equation} p_{1} = r_{1} - \beta _{1} p_{0} \end{equation}
Substituting (10) in (9),
\begin{equation} \beta _{1} = \frac{p_{0}^{T}A r_{1} }{p_{0}^{T}A p_{0} } \end{equation}
\begin{equation} x^{2} = x^{1}+\alpha_{1} p_{1} \end{equation}
What should \alpha_{1} be? It should be chosen so that f is minimised.
\begin{equation} f(x^{2}) = \frac{1}{2} (x^{1}+\alpha_{1} p_{1})^{T}A(x^{1}+\alpha_{1} p_{1}) - b^{T}(x^{1}+\alpha_{1} p_{1})+c \end{equation}
Equating the partial derivative of f with respect to \alpha_{1} to zero,
\[ \frac{1}{2} p_{1}^{T}A(x^{1}+\alpha_{1} p_{1})+ \frac{1}{2}(x^{1}+\alpha_{1} p_{1})^{T}Ap_{1} - b^{T}p_{1}=0 \]
\begin{equation} \alpha_{1}p_{1}^{T}Ap_{1}+\frac{1}{2}p_{1}^{T}Ax_{1}+\frac{1}{2}x_{1}^{T}Ap_{1}- b^{T}p_{1}=0 \end{equation}
\[ \alpha_{1}p_{1}^{T}Ap_{1}+p_{1}^{T}\overbrace{(Ax_{1}-b)}^{-r_{1}}=0 \]
\begin{equation} \alpha_{1}=\frac{p_{1}^{T}r_{1}}{p_{1}^{T}Ap_{1}} \end{equation}
Continuing the example…
Taking x^{0} = (0,0),
\[ \begin{split} r_{0} & = b - A x^{0} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\ r_{0}^{T}r_{0} & = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} =2\\ r_{0}^{T}Ar_{0} & = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} =3\\ \alpha_{0}& =\frac{r_{0}^{T}r_{0}}{r_{0}^{T}Ar_{0}}=\frac{2}{3}\\ x^{1} & = x^{0}+\alpha_{0} r_{0} = \begin{bmatrix} \frac{2}{3} \\ \frac{2}{3} \end{bmatrix} \\ r_{1} &= b - Ax^{1} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} \frac{2}{3} \\ \frac{2}{3} \end{bmatrix} = \begin{bmatrix} \frac{1}{3} \\ -\frac{1}{3} \end{bmatrix} \end{split} \]
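These hand calculations of the first step can be reproduced numerically (a NumPy sketch):

```python
import numpy as np

# Reproduce the first steepest-descent step of the worked example.
A = np.array([[1.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, 1.0])
x0 = np.zeros(2)

r0 = b - A @ x0                        # [1, 1]
alpha0 = (r0 @ r0) / (r0 @ A @ r0)     # 2/3
x1 = x0 + alpha0 * r0                  # [2/3, 2/3]
r1 = b - A @ x1                        # [1/3, -1/3]
print(alpha0, x1, r1)
```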
The steepest descent method would continue in this fashion, computing \alpha_{1}, x^{2}, r_{2}, \ldots
Conjugate gradient method:
Note that p_{0} = r_{0}.
\[ \begin{split} \beta _{1} &= \frac{p_{0}^{T}A r_{1} }{p_{0}^{T}A p_{0} } \\ p_{0}^{T}A r_{1} &= \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} \frac{1}{3} \\ -\frac{1}{3} \end{bmatrix} = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{3} \\ -\frac{2}{3} \end{bmatrix} = -\frac{1}{3} \\ p_{0}^{T}A p_{0} &= r_{0}^{T}A r_{0} =3\\ \beta _{1} &= -\frac{1}{9}\\ p_{1} &= r_{1} - \beta _{1} p_{0}\\ &= \begin{bmatrix} \frac{1}{3} \\ -\frac{1}{3} \end{bmatrix} + \frac{1}{9}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\\ &=\begin{bmatrix} \frac{4}{9} \\ -\frac{2}{9} \end{bmatrix} \\ \alpha_{1}&=\frac{p_{1}^{T}r_{1}}{p_{1}^{T}Ap_{1}}\\ &=\frac{\begin{bmatrix} \frac{4}{9} & -\frac{2}{9} \end{bmatrix} \begin{bmatrix} \frac{1}{3} \\ -\frac{1}{3} \end{bmatrix} } { \begin{bmatrix} \frac{4}{9} & -\frac{2}{9} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} \frac{4}{9} \\ -\frac{2}{9} \end{bmatrix} }\\ &=\frac{\frac{2}{9}} { \begin{bmatrix} \frac{4}{9} & -\frac{2}{9} \end{bmatrix} \begin{bmatrix} \frac{4}{9} \\ -\frac{4}{9} \end{bmatrix} }\\ &=\frac{\frac{2}{9}} { \frac{8}{27}}\\ &=\frac{3}{4}\\ x^{2} &= x^{1}+\alpha_{1} p_{1}\\ &= \begin{bmatrix} \frac{2}{3} \\ \frac{2}{3} \end{bmatrix} + \frac{3}{4} \begin{bmatrix} \frac{4}{9} \\ -\frac{2}{9} \end{bmatrix} \\ &= \begin{bmatrix} 1 \\ \frac{1}{2} \end{bmatrix} \end{split} \]
We have reached the exact solution x^{2} = x^{c} = (1, \frac{1}{2}) in just two steps.
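The whole procedure can be sketched in NumPy, following the sign convention of (10), p_{1} = r_{1} - \beta_{1} p_{0} (many texts instead write the update with a plus sign and the opposite sign of \beta; the iterates are identical). The stopping tolerance is an arbitrary choice.

```python
import numpy as np

def conjugate_gradient(A, b, x0):
    """Conjugate gradient for symmetric positive definite A, using
    the alpha and beta formulas derived above. In exact arithmetic it
    terminates in at most n steps for an n x n system."""
    x = x0.astype(float)
    r = b - A @ x
    p = r.copy()                         # p0 = r0
    for _ in range(len(b)):
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)       # step length along p
        x = x + alpha * p
        r = b - A @ x                    # new residual
        if np.linalg.norm(r) < 1e-12:
            break
        beta = (p @ (A @ r)) / (p @ Ap)  # p^T A r_new / p^T A p
        p = r - beta * p                 # next A-orthogonal direction
    return x

A = np.array([[1.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, 1.0])
x = conjugate_gradient(A, b, np.zeros(2))
print(x)   # matches the worked example after two steps
```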