======================================================================
=                        Simulated annealing                         =
======================================================================

                             Introduction
======================================================================
Simulated annealing (SA) is a probabilistic technique for
approximating the global optimum of a given function. Specifically, it
is a metaheuristic to approximate global optimization in a large
search space for an optimization problem. It is often used when the
search space is discrete (e.g., the traveling salesman problem). For
problems where finding an approximate global optimum is more important
than finding a precise local optimum in a fixed amount of time,
simulated annealing may be preferable to alternatives such as gradient
descent.

The name and inspiration come from annealing in metallurgy, a
technique involving heating and controlled cooling of a material to
increase the size of its crystals and reduce their defects. Both are
attributes of the material that depend on its thermodynamic free
energy. Heating and cooling the material affects both the temperature
and the thermodynamic free energy.
The simulation of annealing can be used to find an approximation of a
global minimum for a function with many variables.

This notion of slow cooling implemented in the simulated annealing
algorithm is interpreted as a slow decrease in the probability of
accepting worse solutions as the solution space is explored. Accepting
worse solutions is a fundamental property of metaheuristics because it
allows for a more extensive search for the global optimal solution.
In general, simulated annealing algorithms work as follows. At each
time step, the algorithm randomly selects a solution close to the
current one, measures its quality, and decides whether to move to it
or to stay with the current solution; this decision is made with one
of two probabilities, chosen according to whether the new solution is
better or worse than the current one. During the search, the
temperature is progressively decreased from an initial positive value
to zero, which affects the two probabilities: at each step, the
probability of moving to a better new solution is either kept at 1 or
changed towards a positive value, while the probability of moving to a
worse new solution is progressively changed towards zero.
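The loop described above can be sketched in Python. This is a minimal
illustration, not a reference implementation: the geometric cooling
schedule, the default parameter values, and the user-supplied 'energy'
and 'neighbour' functions are all assumptions made for the example.

```python
import math
import random

def simulated_annealing(initial, energy, neighbour,
                        t0=10.0, cooling=0.995, steps=10_000):
    """Minimal SA loop: geometric cooling with Metropolis acceptance."""
    current = initial
    e_current = energy(current)
    best, e_best = current, e_current
    t = t0
    for _ in range(steps):
        candidate = neighbour(current)       # random solution close to current
        e_candidate = energy(candidate)
        delta = e_candidate - e_current
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / t), which shrinks as t decreases.
        if delta <= 0 or random.random() < math.exp(-delta / t):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
        t *= cooling                         # slow "cooling" towards zero
    return best, e_best
```

For example, minimizing f(x) = x² from a starting point of 10.0, with a
neighbour that perturbs x by a uniform random step, drives the energy
close to zero.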

The simulation can be performed either by solving kinetic equations
for density functions, or by stochastic sampling. The method is an
adaptation of the Metropolis-Hastings
algorithm, a Monte Carlo method to generate sample states of a
thermodynamic system, published by N. Metropolis et al. in 1953.


                               Overview
======================================================================
The state of the system is described by a function 'E'('s') to be
minimized, which is analogous to the internal energy of a physical
system in state 's'. The goal is to bring the system, from an
arbitrary 'initial state', to a state with the minimum possible
energy.


 The basic iteration
=====================
At each step, the simulated annealing heuristic considers some
neighboring state 's*' of the current state 's', and probabilistically
decides between moving the system to state 's*' or staying in state
's'.  These probabilities ultimately lead the system to move to states
of lower energy.  Typically this step is repeated until the system
reaches a state that is good enough for the application, or until a
given computation budget has been exhausted.


 The neighbours of a state
===========================
Optimization of a solution involves evaluating the neighbours of a
state of the problem, which are new states produced through
conservatively altering a given state. For example, in the travelling
salesman problem each state is typically defined as a permutation of
the cities to be visited, and its neighbours are the set of
permutations produced by reversing the order of any two successive
cities. The well-defined way in which the states are altered to
produce neighbouring states is called a "move", and different moves
give different sets of neighbouring states. These moves usually result
in minimal alterations of the last state, in an attempt to
progressively improve the solution through iteratively improving its
parts (such as the city connections in the traveling salesman
problem).
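A "move" of the kind described for the travelling salesman problem can
be sketched as follows; the function name and the list representation
of a tour are assumptions made for this example.

```python
import random

def tsp_neighbour(tour):
    """Produce a neighbouring tour by swapping two successive cities:
    a conservative move that alters the given state minimally."""
    i = random.randrange(len(tour) - 1)
    candidate = list(tour)                   # copy; the original state is kept
    candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
    return candidate
```

Each such move changes only two adjacent positions, so the candidate
remains a valid permutation of the cities while differing only
slightly from the current tour.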

Simple heuristics like hill climbing, which move by finding better
neighbour after better neighbour and stop when they reach a solution
with no better neighbours, cannot guarantee to find any of the
existing better solutions: their outcome may easily be just a local
optimum, while the actual best solution would be a global optimum
that could be different.
Metaheuristics use the neighbours of a solution as a way to explore
the solutions space, and although they prefer better neighbours, they
also accept worse neighbours in order to avoid getting stuck in local
optima; they can find the global optimum if run for a long enough
amount of time.
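The contrast with hill climbing can be made concrete. The sketch below
(a generic greedy descent, with the discrete landscape in the usage
example invented for illustration) stops at the first state with no
better neighbour, even when a better optimum exists elsewhere.

```python
def hill_climb(initial, energy, neighbours):
    """Greedy descent: move to the best neighbour while it improves,
    stop at the first state with no better neighbour (a local optimum)."""
    current = initial
    while True:
        best = min(neighbours(current), key=energy, default=current)
        if energy(best) >= energy(current):
            return current        # no better neighbour: stuck here
        current = best
```

On a one-dimensional landscape with energies [2, 1, 2, 3, 0, 3],
starting at index 0 the descent stops in the local optimum at index 1,
never reaching the global optimum at index 4; simulated annealing's
occasional uphill moves are what allow it to escape such traps.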


 Acceptance probabilities
==========================
The probability of making the transition from the current state s to a
candidate new state s' is specified by an 'acceptance probability
function' P(e, e', T), that depends on the energies e = E(s) and e' =
E(s') of the two states, and on a global time-varying parameter T
called the 'temperature'.  States with a smaller energy are better
than those with a greater energy.  The probability function P must be
positive even when e' is greater than e.  This feature prevents the
method from becoming stuck at a local minimum that is worse than the
global one.

When T tends to zero, the probability P(e, e', T) must tend to zero if
e' > e and to a positive value otherwise. For sufficiently small
values of T, the system will then increasingly favor moves that go
"downhill" (i.e., to lower energy values), and avoid those that go
"uphill."  With T=0 the procedure reduces to the greedy algorithm,
which makes only the downhill transitions.
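One common choice of acceptance probability function with these
properties, following the Metropolis criterion (the text leaves the
exact function open, so this is one illustrative possibility), is:

```python
import math

def acceptance_probability(e, e_new, t):
    """Metropolis-style acceptance: downhill moves are always taken
    (P = 1); uphill moves are taken with probability
    exp(-(e_new - e) / t), which tends to 0 as t tends to 0."""
    if e_new < e:
        return 1.0
    if t == 0:
        return 0.0                # t = 0 reduces to the greedy algorithm
    return math.exp(-(e_new - e) / t)
```

Note that for e_new > e the probability is positive whenever t > 0, so
the method can still leave a local minimum, and it decays towards zero
as the temperature is lowered.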

In the original description of simulated annealing, the probability