Steepest descent

From DDWiki

Jump to: navigation, search


Definition: The "Steepest Descent Method" is a specific type of unconstrained multivariate optimization which determines the minima of a continuous function. This method analyzes the gradient of a specific function at a specific initial point, then determines the minima points of the function along that gradient direction, settles on the global minima along that gradient direction, and sets the next point to that global minima. The user then checks to see if the change in value from the initial point to the new point is within terminating criteria, and if not the method is repeated with the new point placing the initial point.

English Definition: The "Steepest Descent Method" takes a function of multiple variables that have no constraint on the range of variables and an initial point, calculates the direction where the function descends the fastest from that point, and finds all the points along that direction where the function does not descend anymore. The new point is compared to the original point and if the difference of function values of those points is small enough then we assume that the function has found a point of optimization (in this case a minima).

Contents

General Equation

The general equation for the steepest descent method is:

 \mathbf{x}_{i+1} = \mathbf{x}_{i} - \alpha_{i}(\nabla f(\mathbf{x}_{i})^{T})

where  \mathbf{x}_{i} is the current vector of independent variables  \mathbf{x} in the function  f(\mathbf{x}_{i})  ,  \nabla f(\mathbf{x}_{i})^{T} is the transpose of the gradient of the function  f(\mathbf{x}_{i})  , and αi is the stepsize for that specific point which defines how much we move in the direction of the gradient to approach the new point of independent variables  f(\mathbf{x}_{i+1})  .

Algorithm

Given:  f(\mathbf{x})  containing  \mathbf{x} independent variables AND  \mathbf{x}_{i} as the initial set of independent variables:

  1. Determine the gradient of the function  \nabla f(\mathbf{x}) and the specific gradient  \nabla f(\mathbf{x}_{i}) at point  \mathbf{x}_{i} .
  2. Plug point  \mathbf{x}_{i} and  \nabla f(\mathbf{x}_{i})^{T} into the general equation to solve for  \mathbf{x}_{i+1} in terms of αi.
  3. Redefine  f(\mathbf{x}_{i+1})  in terms of  \mathbf{x}_{i+1} to result in fi).
  4. Optimize fi) by setting f'(αi) = 0 to acheive critical values of αi.
  5. Compare fi) for each α value found in step 4 and select the global minima αi.
  6. Plug αi into the general equation and solve for  \mathbf{x}_{i+1} .
  7. If  |f(\mathbf{x}_{i+1}) - f(\mathbf{x}_{i})| < \epsilon , where ε is your termination criteria, then  \mathbf{x}_{i+1} is your local minima values. Otherwise set  f(\mathbf{x}_{i+1}) = f(\mathbf{x}_{i}) and repeat steps 1-7.

Example

Consider the following unconstrained optimization problem:

Minimize  f(x)=(x_2-x_1^2)^2+ \frac{1}{10}(1-x_1)^2

Image:Sd_example_eqns.gif

Convergence check: is  \quad |f(x_{1}) - f(x)_{0})| < \epsilon ?

In this case  \quad |f(x_{1}) - f(x)_{0})| =.0413 . In most cases ε is a very small value, usually ε=10-6, and so we have not reached termination criteria yet. Therefore we set  \quad f(x_{1}) = f(x)_{0}) and repeat steps 1-7 again.

After enough iterations we will approach the final solution, as shown in the graph:

Image:Steepest_descent.gif


Advantages and Disadvantages

The "Steepest Descent" method is a very effective method even with a simple algorithm. It can efficiently locate a minima with as few steps as necessary, and the availability of a variable stepsize allows us to move along the gradient without missing the local minima.

However, certain equations can cause problems for this method. If we are at a point with multiple minima and maxima along a gradient direction of equal value, we may not be able to definately determine the global minima. Also, depending on the value of ε the termination criteria may not settle on the local minima if the function decreases at a very small rate along our gradient direction. These problems can usually be fixed by altering the starting point and ε values.