Gradient search algorithms

To apply these methods to a system Ax = b, the matrix A should be

  • Symmetric
  • Positive definite

For such cases, a function f is defined as

\begin{equation}
f = \frac{1}{2} x^{T}Ax - b^{T}x+c
\end{equation}

Then the gradient of f is

\begin{equation}
\nabla f = Ax-b
\end{equation}

Hence, if f is minimised, then \nabla f = 0, which ensures that Ax = b. The converse also holds: the minimiser of f solves Ax = b.

\begin{equation}
\nabla f = Ax-b = -r, \quad \text{where } r \text{ is the residual}
\end{equation}
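
As a quick sanity check of equation (2), here is a minimal NumPy sketch (the test matrix and the helper names f and grad_f are my own illustrative choices) comparing the analytic gradient Ax - b against a finite-difference approximation:

```python
import numpy as np

# An arbitrary symmetric positive definite test system (illustrative values)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def f(x):
    """Quadratic form f(x) = 1/2 x^T A x - b^T x, taking c = 0."""
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    """Analytic gradient: A x - b."""
    return A @ x - b

x = np.array([0.7, -0.3])
eps = 1e-6
# Central finite differences approximate each gradient component
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
               for e in np.eye(2)])
print(np.allclose(fd, grad_f(x), atol=1e-6))  # True: numerics agree with (2)
```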

Steepest descent method

Solve Ax = b where,

A = 
\begin{bmatrix}
   1 & 0 \\
   0 & 2
\end{bmatrix}
,
b = \begin{bmatrix}
   1  \\
   1 
\end{bmatrix}

Solution:

\begin{split}
f  & =\frac{1}{2} x^{T}Ax - b^{T}x+c\\
& =\frac{1}{2}
\begin{bmatrix}
   x_{1}  & x_{2}  
\end{bmatrix}
\begin{bmatrix}
   1 & 0 \\
   0 & 2
\end{bmatrix}
\begin{bmatrix}
   x_{1} \\ x_{2} 
\end{bmatrix}
- 
\begin{bmatrix}
   1 & 1 
\end{bmatrix}
\begin{bmatrix}
   x_{1} \\ x_{2} 
\end{bmatrix}
+c\\

&=
\frac{1}{2}
\begin{bmatrix}
   x_{1}  & x_{2}  
\end{bmatrix}
\begin{bmatrix}
   x_{1} \\ 2x_{2} 
\end{bmatrix}
-
\begin{bmatrix}
   x_{1}  + x_{2}  
\end{bmatrix}
+c\\
&=
\frac{1}{2}x_{1} ^{2}+x_{2} ^{2} - x_{1} - x_{2}+c\\
f=0 \rightarrow &\frac{1}{2}(x_{1}^{2} - 2x_{1}+1)+\left(x_{2}^{2} - 2x_{2}\cdot\frac{1}{2}+\left(\tfrac{1}{2}\right)^{2}\right)+c - \frac{3}{4}=0\\
&\rightarrow  \frac{(x_{1} -1)^{2}}{2}+\left(x_{2}-\frac{1}{2}\right)^{2}=\frac{3}{4}-c \text{.  If the contour } f=0 \text{ passes through the origin, } c=0\\
&\rightarrow  \frac{(x_{1} -1)^{2}}{(\sqrt{\frac{3}{2}})^{2}}+\frac{(x_{2} -\frac{1}{2})^{2}}{(\sqrt{\frac{3}{4}})^{2}} = 1\\
\end{split}

This is an ellipse with centre (1, \frac{1}{2}) that passes through the origin. Select the initial guess as the origin, x^{0} = (0, 0); the solution is the centre of the ellipse, x^{c} = (1, \frac{1}{2}).
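
A two-line NumPy check confirms that the centre is indeed the solution of the system:

```python
import numpy as np
# The centre of the ellipse is exactly A^{-1} b
print(np.linalg.solve(np.array([[1.0, 0.0], [0.0, 2.0]]),
                      np.array([1.0, 1.0])))   # [1.  0.5]
```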

\begin{equation} 
x^{1} = x^{0}+\alpha_{0} r_{0}
\end{equation}

where r_{0} is the residual, which gives the first search direction, and \alpha_{0} is the step length along it.

\begin{equation}
f(x^{1}) = \frac{1}{2} (x^{0}+\alpha_{0} r_{0})^{T}A(x^{0}+\alpha_{0} r_{0}) - b^{T}(x^{0}+\alpha_{0} r_{0})+c
\end{equation}

For f(x^{1}) to be a minimum, its derivative with respect to \alpha_{0} must be zero:

\frac{1}{2} r_{0}^{T}A(x^{0}+\alpha_{0} r_{0})+ \frac{1}{2}(x^{0}+\alpha_{0} r_{0})^{T}Ar_{0} - b^{T}r_{0}=0
\begin{equation}
\alpha_{0}r_{0}^{T}Ar_{0}+\frac{1}{2}r_{0}^{T}Ax^{0}+\frac{1}{2}(x^{0})^{T}Ar_{0}- b^{T}r_{0}=0
\end{equation}

But we have

\begin{split}
(x^{0})^{T}Ar_{0} & =\left((x^{0})^{T}Ar_{0}\right)^{T} \quad \text{(a scalar equals its own transpose)}\\
& =(Ar_{0})^{T}x^{0}\\
& =r_{0}^{T}A^{T}x^{0}\\
& =r_{0}^{T}Ax^{0} \quad \text{(since } A \text{ is symmetric)}\\
(b^{T}r_{0})^{T}& =r_{0}^{T}b
\end{split}

Using the above, (6) becomes

\alpha_{0}r_{0}^{T}Ar_{0}+r_{0}^{T}\overbrace{(Ax^{0}-b)}^{-r_{0}}=0
\begin{equation}
\alpha_{0}=\frac{r_{0}^{T}r_{0}}{r_{0}^{T}Ar_{0}}
\end{equation}
\begin{split}
x^{1} &= x^{0}+\alpha_{0} r_{0}\\
\alpha_{1} &= \frac{r_{1}^{T}r_{1}}{r_{1}^{T}Ar_{1}}\\
x^{2} &= x^{1}+\alpha_{1} r_{1}\\
&\ \ \vdots
\end{split}

The direction r_{1} is perpendicular to r_{0}, so the iterates zig-zag in a step-like fashion towards the final solution x^{c}; a code sketch of the full iteration follows.
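
The whole iteration fits in a few lines. Below is a minimal NumPy sketch of steepest descent using the step length from (7); the function name, tolerance, and iteration cap are illustrative choices, not a reference implementation:

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Minimise f by repeatedly stepping along the residual direction."""
    x = x0.astype(float)
    for k in range(max_iter):
        r = b - A @ x                 # residual = -grad f = search direction
        if np.linalg.norm(r) < tol:   # stop once Ax is (numerically) b
            return x, k
        alpha = (r @ r) / (r @ (A @ r))   # step length from equation (7)
        x = x + alpha * r
    return x, max_iter

A = np.array([[1.0, 0.0],
              [0.0, 2.0]])
b = np.array([1.0, 1.0])
x, iters = steepest_descent(A, b, np.zeros(2))
print(x, iters)   # converges to (1, 0.5) along a zig-zag path
```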

Conjugate gradient method

The first search direction is taken to be the residual, p_{0}=r_{0}. Ideally, the next direction p_{1} should point from x^{1} straight at the solution x^{c}, i.e. for some scalar k,

\begin{equation}
p_{1} = k (x^{c} - x^{1})
\end{equation}
\begin{split}
p_{1} & =k (x^{c} - x^{1})\\
A p_{1} & =k (Ax^{c} - Ax^{1})\\
 & =k (b - Ax^{1})\\
 & =k r_{1} \\
r_{0}^{T}A p_{1} & =k r_{0}^{T} r_{1} \\
 & =0 \quad \text{(consecutive residuals are orthogonal)}\\
p_{0}^{T}A p_{1} & =0
\end{split}

That is, p_{0} is A-orthogonal (conjugate) to the new direction p_{1}:

\begin{equation}
p_{0}^{T}A p_{1} =0
\end{equation}

Now we can write p_{1} as a linear combination of r_{0} and r_{1}:

\begin{equation}
p_{1} = r_{1} - \beta _{1} p_{0}
\end{equation}

Substituting (10) into (9),

\begin{equation}
\beta _{1} = \frac{p_{0}^{T}A r_{1} }{p_{0}^{T}A p_{0} }
\end{equation}
\begin{equation}
x^{2} = x^{1}+\alpha_{1} p_{1}
\end{equation}

What is \alpha_{1}? It should be chosen such that f is minimised.

\begin{equation}
f(x^{2}) = \frac{1}{2} (x^{1}+\alpha_{1} p_{1})^{T}A(x^{1}+\alpha_{1} p_{1}) - b^{T}(x^{1}+\alpha_{1} p_{1})+c
\end{equation}

Equating the partial derivative of f with respect to \alpha_{1} to zero,

\frac{1}{2} p_{1}^{T}A(x^{1}+\alpha_{1} p_{1})+ \frac{1}{2}(x^{1}+\alpha_{1} p_{1})^{T}Ap_{1} - b^{T}p_{1}=0
\begin{equation}
\alpha_{1}p_{1}^{T}Ap_{1}+\frac{1}{2}p_{1}^{T}Ax^{1}+\frac{1}{2}(x^{1})^{T}Ap_{1}- b^{T}p_{1}=0
\end{equation}
\alpha_{1}p_{1}^{T}Ap_{1}+p_{1}^{T}\overbrace{(Ax^{1}-b)}^{-r_{1}}=0
\begin{equation}
\alpha_{1}=\frac{p_{1}^{T}r_{1}}{p_{1}^{T}Ap_{1}}
\end{equation}
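
Collecting formulas (10), (11), (12), and (15), and applying them at every step k rather than just step 1, gives the following NumPy sketch of the conjugate gradient iteration. The variable names and convergence test are my own, and this direct form recomputes the residual from scratch rather than using the optimised recurrences found in production CG codes:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """CG sketch: p0 = r0, then each new direction is made
    A-orthogonal to the previous one via p = r - beta * p."""
    x = x0.astype(float)
    r = b - A @ x
    p = r.copy()                          # p0 = r0
    for _ in range(len(b)):               # exact in at most n steps (in exact arithmetic)
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)        # equation (15)
        x = x + alpha * p                 # equation (12)
        r = b - A @ x                     # fresh residual
        beta = (p @ (A @ r)) / (p @ Ap)   # equation (11)
        p = r - beta * p                  # equation (10)
    return x

A = np.array([[1.0, 0.0],
              [0.0, 2.0]])
b = np.array([1.0, 1.0])
print(conjugate_gradient(A, b, np.zeros(2)))   # [1.  0.5] in two steps
```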

Continuing the example…

Taking x^{0} = (0,0)

\begin{split}
r_{0} & = b - A x^{0} =
\begin{bmatrix}
   1  \\
   1 
\end{bmatrix} - 
\begin{bmatrix}
   1 & 0 \\
   0 & 2
\end{bmatrix}
\begin{bmatrix}
   0  \\
   0 
\end{bmatrix}  
= \begin{bmatrix}
   1  \\
   1 
\end{bmatrix} 
\\
r_{0}^{T}r_{0} & = 
\begin{bmatrix}
   1  &
   1 
\end{bmatrix} 
\begin{bmatrix}
   1  \\
   1 
\end{bmatrix}
=2\\
r_{0}^{T}Ar_{0} & = 
\begin{bmatrix}
   1  &
   1 
\end{bmatrix} 
\begin{bmatrix}
   1 & 0 \\
   0 & 2
\end{bmatrix}
\begin{bmatrix}
   1  \\
   1 
\end{bmatrix}
=3\\
\alpha_{0}& =\frac{r_{0}^{T}r_{0}}{r_{0}^{T}Ar_{0}}=\frac{2}{3}\\
x^{1} & = x^{0}+\alpha_{0} r_{0} = \begin{bmatrix}
   \frac{2}{3}  \\
   \frac{2}{3} 
\end{bmatrix} \\
r_{1} &= b - Ax^{1} =
 \begin{bmatrix}
   1  \\
   1 
\end{bmatrix}
-
\begin{bmatrix}
   1 & 0 \\
   0 & 2
\end{bmatrix}
\begin{bmatrix}
   \frac{2}{3}  \\
   \frac{2}{3} 
\end{bmatrix} =
\begin{bmatrix}
   \frac{1}{3}  \\
   -\frac{1}{3} 
\end{bmatrix} 
\end{split}

Steepest descent would continue in the same way, computing \alpha_{1}, x^{2}, r_{2}, and so on.

Conjugate gradient method:

Note that p_{0} = r_{0}.

\begin{split}
\beta_{1} &= \frac{p_{0}^{T}A r_{1}}{p_{0}^{T}A p_{0}} \\
p_{0}^{T}A r_{1} &= \begin{bmatrix}
   1  &
   1 
\end{bmatrix} 
\begin{bmatrix}
   1 & 0 \\
   0 & 2
\end{bmatrix}
\begin{bmatrix}
   \frac{1}{3}  \\
   -\frac{1}{3} 
\end{bmatrix} 
=
 \begin{bmatrix}
   1  &
   1 
\end{bmatrix} 
\begin{bmatrix}
   \frac{1}{3}  \\
   -\frac{2}{3} 
\end{bmatrix} 
= -\frac{1}{3} 
\\
p_{0}^{T}A p_{0} &= r_{0}^{T}A r_{0} =3\\
\beta_{1} &= -\frac{1}{9}\\
p_{1} &= r_{1} - \beta_{1} p_{0}\\
&= \begin{bmatrix}
   \frac{1}{3}  \\
   -\frac{1}{3} 
\end{bmatrix} 
+
\frac{1}{9}\begin{bmatrix}
   1  \\
   1 
\end{bmatrix}\\ 
&=\begin{bmatrix}
   \frac{4}{9}  \\
   -\frac{2}{9} 
\end{bmatrix} \\
\alpha_{1}&=\frac{p_{1}^{T}r_{1}}{p_{1}^{T}Ap_{1}}\\
&=\frac{\begin{bmatrix}
   \frac{4}{9}  &
   -\frac{2}{9} 
\end{bmatrix}  \begin{bmatrix}
   \frac{1}{3}  \\
   -\frac{1}{3} 
\end{bmatrix} }
{ \begin{bmatrix}
   \frac{4}{9}  &
   -\frac{2}{9} 
\end{bmatrix} 
\begin{bmatrix}
   1 & 0 \\
   0 & 2
\end{bmatrix}
 \begin{bmatrix}
   \frac{4}{9}  \\
   -\frac{2}{9} 
\end{bmatrix} }\\
&=\frac{\frac{2}{9}}
{ \begin{bmatrix}
   \frac{4}{9}  &
   -\frac{2}{9} 
\end{bmatrix} 
 \begin{bmatrix}
   \frac{4}{9}  \\
   -\frac{4}{9} 
\end{bmatrix} }\\
&=\frac{\frac{2}{9}}{\frac{8}{27}}\\
&=\frac{3}{4}\\
x^{2} &= x^{1}+\alpha_{1} p_{1}\\
&=  \begin{bmatrix}
   \frac{2}{3}  \\
   \frac{2}{3} 
\end{bmatrix}
+
\frac{3}{4} 
\begin{bmatrix}
   \frac{4}{9}  \\
   -\frac{2}{9} 
\end{bmatrix} \\
&=  \begin{bmatrix}
   1  \\
   \frac{1}{2} 
\end{bmatrix}
\end{split}

We have reached the exact solution x^{2} = x^{c} = (1, \frac{1}{2}) in just two steps; a short script verifying these numbers follows.
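
As a final check, here is a small self-contained NumPy script reproducing the hand computation above step by step and comparing against a direct solve:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0]])
b = np.array([1.0, 1.0])

# Reproduce the two hand-computed conjugate gradient steps
x0 = np.zeros(2)
r0 = b - A @ x0                           # (1, 1)
p0 = r0
a0 = (p0 @ r0) / (p0 @ A @ p0)            # 2/3
x1 = x0 + a0 * p0                         # (2/3, 2/3)
r1 = b - A @ x1                           # (1/3, -1/3)
b1 = (p0 @ A @ r1) / (p0 @ A @ p0)        # -1/9
p1 = r1 - b1 * p0                         # (4/9, -2/9)
a1 = (p1 @ r1) / (p1 @ A @ p1)            # 3/4
x2 = x1 + a1 * p1                         # (1, 1/2)

print(x2)                                 # [1.  0.5]
print(np.linalg.solve(A, b))              # [1.  0.5] -- matches the direct solve
```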
