The three cooling schedules used each have a number of tunable parameters, such as the initial temperature, the rate of cooling and reheating, and the number of iterations at each temperature. We spent quite some time attempting to find optimal values for these parameters.
Specifying an initial temperature for annealing is usually a straightforward procedure. Infinite temperature means that the changes are always accepted (i.e. the acceptance ratio is 1), which produces random configurations. In practice the initial temperature should not be too large, or else the cooling will take too long, and the annealing will spend too much time at higher temperatures where little useful work is being done. The initial temperature is usually chosen to be the lowest possible value that still gives an acceptance ratio that is fairly close to 1. White [49] has suggested that a good value for the starting temperature is the standard deviation in the cost at infinite temperature.
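White's suggestion can be implemented directly: sample configurations at effectively infinite temperature (every proposed move accepted) and take the standard deviation of the observed costs as the starting temperature. The following is a minimal sketch; the `cost` and `random_move` hooks are hypothetical problem-specific callbacks, not part of the original implementation.

```python
import random
import statistics

def initial_temperature(cost, random_move, state, n_samples=1000):
    """Estimate a starting temperature following White's suggestion:
    the standard deviation of the cost sampled at infinite temperature,
    where every proposed move is accepted."""
    costs = []
    for _ in range(n_samples):
        state = random_move(state)  # unconditional acceptance
        costs.append(cost(state))
    return statistics.stdev(costs)
```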
Choosing an initial temperature is much more difficult when we use
a preprocessor to provide a good starting configuration,
rather than the usual random initial configuration. In that case,
we must be very careful that the initial temperature and the
acceptance ratio are not so high that the good initial configuration
is randomized to the point where it loses its usefulness. However, the
initial temperature must be high enough so that we can still make
effective changes in the configuration.
Figure 7 shows the results of various runs using different
initial temperatures for geometric cooling and for adaptive cooling with
reheating.
In both cases there is an optimal value of the initial temperature,
although using the optimal value appears to have much greater effect
in the latter case.
We have used this optimal initial temperature value for all of our runs.
The acceptance ratio at this temperature value is approximately 0.6.
We also need to specify a criterion for stopping the annealing procedure, when it is believed that the solution is good enough that we do not expect to obtain much improvement by further cooling. We used a standard stopping criterion [29, 30], whereby the run was terminated when no moves had been accepted for five steps (a step is a full Markov chain at a single temperature).
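The stopping criterion above fits naturally into the outer annealing loop: count consecutive steps (full Markov chains at a fixed temperature) with no accepted moves, and terminate after five. The sketch below illustrates this under assumed hooks (`cost`, `neighbor`) and a geometric temperature update; it is not the paper's implementation.

```python
import math
import random

def anneal(state, cost, neighbor, t0, alpha=0.95, chain_len=100, stop_after=5):
    """Minimal SA loop with the stopping criterion described above:
    terminate once no move has been accepted for `stop_after` consecutive
    steps, a step being one full Markov chain at a single temperature."""
    t = t0
    e = cost(state)
    idle = 0
    while idle < stop_after:
        accepted = 0
        for _ in range(chain_len):
            cand = neighbor(state)
            e_new = cost(cand)
            # Metropolis rule: accept improvements always,
            # uphill moves with probability exp(-dE / T)
            if e_new <= e or random.random() < math.exp(-(e_new - e) / t):
                state, e = cand, e_new
                accepted += 1
        idle = idle + 1 if accepted == 0 else 0
        t *= alpha  # geometric cooling between steps
    return state, e
```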
The goal in choosing an annealing schedule is to reduce the temperature as
rapidly as possible while still keeping the system close to equilibrium at
each temperature. This is easy to do at high temperatures, where the system
equilibrates very rapidly, but near the phase transition the number of
iterations required to equilibrate increases dramatically. For an infinite
system it actually diverges at the phase transition, while for a finite system
it increases as a power of the specific heat near the phase transition,
and as a power of the system size at the phase transition
(this is referred to as ``critical slowing down'') [4, 39].
Johnson et al. [30] noted in their SA implementation for the
traveling salesman problem (TSP) that the number of iterations
(or the size of the Markov chain) at each temperature
needed to be at least proportional to the ``neighborhood'' size
in order to maintain a high-quality result. From our experiments
we found the same to be true for the scheduling problem, even though it is
very different from the TSP. In our case the neighborhood size can be taken
to be the number of classes.
A number of techniques have been suggested for dynamically determining when the system has reached equilibrium at a given temperature during annealing [27, 40]. However, for large systems such as we have used, and particularly for temperatures near or below the phase transition, which are the most crucial, it is extremely difficult to determine when the system has equilibrated. We have therefore adopted an empirical approach. The rate at which the temperature is decreased, and the number of iterations at each temperature, are closely interrelated. For example, if we halve the number of iterations at each temperature value, but use twice as many temperature values, the results will be very similar. We have attempted to find values for the combination of these two variables that keep the system close to equilibrium, and produce near-optimal results.
For geometric cooling, the temperature reduction factor
is usually chosen to be in the region 0.9 to 0.99.
We found only a small variation in the results for this parameter range,
and decided to arbitrarily fix the reduction factor at 0.95, using
the number of iterations at each temperature as the variable with which
to compare solution quality.
We found good results when we set the number of iterations at each
temperature to be the product of the number of classes (the system size)
and the variance of the cost values.
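A geometric schedule and this chain-length heuristic can be sketched as follows; the function names and the rounding of the product to an integer are assumptions made for illustration.

```python
import statistics

def geometric_schedule(t0, alpha=0.95, t_min=1e-3):
    """Yield the geometric cooling sequence T_k = alpha**k * t0."""
    t = t0
    while t > t_min:
        yield t
        t *= alpha

def chain_length(num_classes, cost_samples):
    # Heuristic from the text: iterations per temperature proportional to
    # the neighborhood size (number of classes) times the cost variance.
    return max(1, round(num_classes * statistics.variance(cost_samples)))
```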
For adaptive cooling, we set the number of iterations at each temperature to be the same as for geometric cooling,
and did a number of runs with different values of the
parameter a, which controls the rate of cooling (see equation 13).
Slower cooling (smaller a) should give better results, but there should
be a point at which further reduction of a does not markedly improve the
result. Our aim is to identify this point, i.e. the largest value of a
(the fastest cooling) which still gives near-optimal results.
Figure 8 shows the results of these runs, indicating a plateau
for a less than about 0.0001. For our adaptive cooling runs we used
a=0.00011.
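Equation 13 is not reproduced in this excerpt, so as an illustration we assume the common Huang-style adaptive rule, in which the temperature decrement is scaled by the measured cost fluctuations at the current temperature; the exact form used in the paper may differ.

```python
import math
import statistics

def adaptive_next_temperature(t, cost_samples, a=0.00011):
    """One adaptive cooling update, assuming the Huang-style rule
    T' = T * exp(-a * T / sigma(T)), where sigma(T) is the standard
    deviation of costs sampled at the current temperature.
    Smaller `a` gives slower cooling."""
    sigma = statistics.stdev(cost_samples)
    return t * math.exp(-a * t / sigma)
```

With this form, cooling automatically slows near the phase transition, where the fluctuations (and hence sigma) change rapidly.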
Some time could probably be saved at high temperatures by adjusting
the number of iterations at each temperature adaptively, based on one of
the techniques for determining when the system has reached
equilibrium [27, 40]. However, when we use a preprocessor, we do not
start at a very high temperature, so this would not greatly reduce the
overall simulation time.
The adaptive cooling and reheating methods both rely on the measurement of the specific heat, which is shown in Figure 5. There is a clear maximum, indicating a phase transition. Reheating increases the temperature so that it is above the phase transition, which gives the system a better chance to change the structure of the schedule and thus escape local minima.
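The specific heat can be estimated from the cost fluctuations at each temperature using the standard thermodynamic estimator C(T) = var(cost) / T^2; a maximum in C(T) locates the phase transition. The paper's exact reheating rule (equation 12, with parameter K) is not reproduced here, so this sketch covers only the measurement step.

```python
import statistics

def specific_heat(cost_samples, t):
    """Estimate the specific heat at temperature t from the fluctuations
    in the cost: C(T) = var(cost) / T**2. The temperature at which this
    peaks marks the phase transition used to target reheating."""
    return statistics.variance(cost_samples) / t ** 2
```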
If the temperature increase used in reheating is very small or very large, the reheating should have little effect, so we expect to be able to find an optimal value somewhere in between. Figure 9 shows that there is indeed a clear optimum value of the parameter K, which controls the amount of reheating (see equation 12), and we have used this value in our reheating runs.
The cost as a function of temperature is shown in Figure 3
for individual annealing runs using geometric and adaptive cooling.
The results are fairly similar, apart from the expected fluctuations in the
region of the phase transition (the maximum of the specific heat), but on
average the adaptive cooling gives better results, and is also faster.
Figure 4 shows the cost as a function of the number of steps (temperature values), this time for annealing runs using geometric cooling and adaptive cooling with reheating. Again, the two are quite similar, until we get towards the end of the runs, at low temperature. Here the cost using geometric cooling reaches a plateau, while the cost for adaptive cooling continues to decrease slightly, until the reheating kicks in, causing the procession of spikes. Note that the reheating phase results in a further reduction in the lowest cost, without substantially more computational effort.