The performance of any application of simulated annealing depends strongly on the method used to select a new trial configuration of the system for the Metropolis update. For the annealing algorithm to work well, it must use efficient update moves that sample the parameter space effectively and have a reasonably high probability of being accepted.
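For reference, the Metropolis update that each trial move feeds into can be sketched as follows. This is the textbook acceptance rule, not necessarily the exact code of our implementation; `delta_cost` is assumed to be the change in the total constraint cost produced by a move, and `temperature` the current annealing temperature.

```python
import math
import random


def metropolis_accept(delta_cost: float, temperature: float, rng: random.Random) -> bool:
    """Return True if a trial move with cost change delta_cost is accepted.

    Moves that do not increase the cost are always accepted; uphill moves
    are accepted with probability exp(-delta_cost / temperature).
    """
    if delta_cost <= 0:
        return True
    if temperature <= 0:
        # Zero temperature: pure descent, never accept an uphill move.
        return False
    return rng.random() < math.exp(-delta_cost / temperature)
```

A move that lowers the cost is always kept, while an uphill move of size `delta_cost` at temperature T survives only with probability exp(-delta_cost/T). This is why moves that greatly increase the cost are almost always wasted once the temperature is low.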
In our implementation, the moves are chosen by a sophisticated rule-based system: a modified version of the rule-based expert system that serves as the preprocessor for annealing (described in Section 3.2). Using the rule-based system ensures that every trial move satisfies the hard constraints, while the Metropolis update in the annealing algorithm takes care of minimizing the costs of the medium and soft constraints.
One of the main modifications is that, whereas the preprocessor version of the rule-based system is completely deterministic, the version used to choose moves for annealing selects at random among multiple possibilities that satisfy the rules equally well. In addition, some of the rules dealing with the medium and soft constraints are softened or eliminated here, since reducing the cost of these constraints is the job of the annealing algorithm. This extra freedom in choosing new schedules, together with the randomness inherent in the annealing update, helps prevent the system from getting trapped in a local minimum before it reaches a valid schedule, a problem that afflicts the standard deterministic rule-based system.
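The difference between the two versions of the rule-based system can be illustrated with a small tie-breaking helper. This is a hypothetical sketch: `choose_best` and its `score` argument (a lower score meaning the candidate satisfies the rules better) are illustrative names, not part of the system described in Section 3.2.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def choose_best(candidates: Sequence[T],
                score: Callable[[T], float],
                rng: Optional[random.Random] = None) -> T:
    """Return a candidate with the lowest rule score (lower is better).

    rng=None: deterministic, always the first best candidate (preprocessor style).
    rng given: uniform choice among all equally good candidates (annealing moves).
    """
    if not candidates:
        raise ValueError("no candidate satisfies the hard constraints")
    scores = [score(c) for c in candidates]
    best = min(scores)
    tied = [c for c, s in zip(candidates, scores) if s == best]
    return tied[0] if rng is None else rng.choice(tied)
```

With the random generator supplied, repeated calls from the same state can return different, equally acceptable choices, which is the extra freedom that lets annealing explore alternatives the deterministic preprocessor would never try.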
For class scheduling, a move consists of either assigning a class to an unused timeslot and room, or swapping two classes in time (timeslots) and/or space (rooms). The simplest way to do this is to choose the classes, timeslots and rooms at random. However, this is impractical: random allocation or swapping of classes will almost always violate one of the hard constraints or greatly increase the cost of the medium and soft constraints (especially if the schedule is already fairly good), and will therefore almost always be rejected by the Metropolis procedure. The simple method is thus extremely inefficient, since considerable computation is spent computing the change in cost and performing the Metropolis step, only to reject the move in the vast majority of cases.
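To make the two kinds of move concrete, the sketch below represents a schedule as a mapping from class identifiers to (timeslot, room) pairs. The `Slot` type, the `Schedule` alias and both move functions are hypothetical illustrations; they apply a move without checking any constraints, which is the behaviour of the naive random approach described above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Slot:
    timeslot: int
    room: str


# Hypothetical representation: class id -> assigned (timeslot, room).
Schedule = Dict[str, Slot]


def assign_move(schedule: Schedule, cls: str, slot: Slot) -> Schedule:
    """Place an unscheduled class in an unused (timeslot, room) pair."""
    if cls in schedule:
        raise ValueError(f"{cls} is already scheduled")
    if slot in schedule.values():
        raise ValueError(f"{slot} is already in use")
    new = dict(schedule)
    new[cls] = slot
    return new


def swap_move(schedule: Schedule, a: str, b: str,
              swap_time: bool = True, swap_room: bool = True) -> Schedule:
    """Swap classes a and b in time (timeslots) and/or space (rooms).

    No constraint checking is done here: swapping only the timeslot, for
    example, can put a class in a room that is already occupied at the new
    time, exactly the kind of hard-constraint violation that random moves
    run into.
    """
    sa, sb = schedule[a], schedule[b]
    new = dict(schedule)
    new[a] = Slot(sb.timeslot if swap_time else sa.timeslot,
                  sb.room if swap_room else sa.room)
    new[b] = Slot(sa.timeslot if swap_time else sb.timeslot,
                  sa.room if swap_room else sb.room)
    return new
```

Both functions return a new schedule rather than mutating the old one, so a rejected Metropolis step can simply discard the proposal.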
What is needed is a strategy for choosing moves that are more likely to be accepted. The choice of room is a simple example. If a new room is chosen at random from the list of all rooms, the move will most likely be rejected, since the room may be too small for the class, or may be an auditorium when a laboratory is needed. A more effective strategy is to construct, for each class, the subset of rooms that fulfill the hard constraints on the room for that class, such as size and room type. A candidate new room for the class can then be chosen from this subset of feasible rooms; the acceptance probability will be much higher, saving the time that would otherwise be wasted on moves that the Metropolis algorithm would simply reject. This is precisely what our rule-based system does (see Section 3.2), although rather than choosing at random from the list of feasible rooms that obey the hard constraints, it goes a step further and attempts to choose the ``best'' available room, for example by selecting the room that most closely matches the size requirement and by preferentially selecting rooms in the department's home building.
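Filtering rooms by the hard constraints and then picking the ``best'' one might look like the following sketch. The `Room` and `ClassInfo` records, their fields, and the ranking used in `best_room` (home building first, then the smallest surplus capacity) are assumptions chosen for illustration; the actual rules, and how they weigh size against building, are those of Section 3.2. Room availability in the chosen timeslot is also ignored here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Room:
    name: str
    capacity: int
    kind: str       # e.g. "lecture", "lab", "auditorium"
    building: str


@dataclass(frozen=True)
class ClassInfo:
    name: str
    size: int
    room_kind: str
    home_building: str


def feasible_rooms(cls: ClassInfo, rooms: List[Room]) -> List[Room]:
    """Rooms satisfying the hard room constraints: large enough and of the right type."""
    return [r for r in rooms if r.capacity >= cls.size and r.kind == cls.room_kind]


def best_room(cls: ClassInfo, rooms: List[Room]) -> Optional[Room]:
    """Feasible room in the home building if possible, then with the tightest fit."""
    feasible = feasible_rooms(cls, rooms)
    if not feasible:
        return None
    # False sorts before True, so home-building rooms come first.
    return min(feasible,
               key=lambda r: (r.building != cls.home_building, r.capacity - cls.size))
```

Replacing `min` by a random choice over `feasible_rooms` gives the simpler strategy of sampling uniformly from the feasible subset.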
Using the rule-based system to select the moves means that the classes to be moved are chosen not at random but according to the same rules used by the preprocessor; for example, unscheduled classes are handled first, and classes are selected in order of size. Since the classes are not chosen at random, it is very unlikely that the Metropolis updates satisfy the detailed balance criterion, which means that the configurations produced at each temperature will not follow a Boltzmann probability distribution [4, 39]. However, we assume that the actual distribution approximates a Boltzmann distribution closely enough that standard annealing techniques which rely on a Boltzmann distribution, such as adaptive cooling, remain effective.
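A hypothetical version of this ordering rule is sketched below. Sorting larger classes first and breaking ties by name are assumptions, since the text only says that classes are selected in order of size; `MoveCandidate` and `move_order` are illustrative names.

```python
from typing import List, NamedTuple


class MoveCandidate(NamedTuple):
    name: str
    size: int         # enrolment
    scheduled: bool   # False if the class has no timeslot/room yet


def move_order(classes: List[MoveCandidate]) -> List[str]:
    """Return class names in the order they are considered for moves.

    Hypothetical rule: unscheduled classes first, then larger classes
    before smaller ones, with ties broken by name so the order is fully
    deterministic.
    """
    ranked = sorted(classes, key=lambda c: (c.scheduled, -c.size, c.name))
    return [c.name for c in ranked]
```

Because the class to be moved next is a deterministic function of the current schedule, a move and its reverse are in general not proposed with equal probability, which is why detailed balance cannot be expected to hold.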
Many of the rules in the rule-based system address the medium and soft constraints in an effort to reduce the overall cost. Some of these are relaxed when choosing trial moves for annealing, to allow a wider selection of moves. However, most of them are still included (sometimes in modified form), since they generally improve the acceptance probability. For example, swapping a higher-level class (e.g. graduate) with a lower-level class (e.g. first- or second-year) generally has a higher acceptance rate, since there is little overlap between the students taking these classes. We have done a few experiments testing modifications of these ``soft'' rules to try to improve the acceptance rate and the final results. For example, we found that at high temperatures there is little difference in acceptance rate between local (within the same school) and global (between different schools) class swaps, and most swaps are accepted. At low temperatures the acceptance rate is much lower, but again about the same for local and global swaps. At intermediate temperatures, local swaps have a slightly higher acceptance rate. Thus the rule of first attempting local class swaps and then global swaps works well with annealing.
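The local-then-global rule, together with the per-kind acceptance bookkeeping needed for experiments like those above, might be sketched as follows. `propose_swap`, its `allowed` predicate (a stand-in for the hard-constraint checks of the rule-based system) and `AcceptanceTally` are hypothetical names, and treating ``first'' as ``the first local partner that passes the hard constraints'' is an assumption about how the rule is applied.

```python
from collections import defaultdict
from typing import Callable, Dict, Optional, Sequence, Tuple


def propose_swap(cls: str,
                 school_of: Dict[str, str],
                 candidates: Sequence[str],
                 allowed: Callable[[str, str], bool]) -> Optional[Tuple[str, str]]:
    """Return (kind, partner) for the first allowed swap partner of cls.

    Local partners (same school) are tried before global ones (other
    schools); `allowed` is a hypothetical hard-constraint check.
    """
    others = [c for c in candidates if c != cls]
    local = [c for c in others if school_of[c] == school_of[cls]]
    remote = [c for c in others if school_of[c] != school_of[cls]]
    for kind, pool in (("local", local), ("global", remote)):
        for partner in pool:
            if allowed(cls, partner):
                return kind, partner
    return None


class AcceptanceTally:
    """Counts attempted and accepted moves per move kind (e.g. at one temperature)."""

    def __init__(self) -> None:
        self.tried: Dict[str, int] = defaultdict(int)
        self.accepted: Dict[str, int] = defaultdict(int)

    def record(self, kind: str, accepted: bool) -> None:
        self.tried[kind] += 1
        if accepted:
            self.accepted[kind] += 1

    def rate(self, kind: str) -> float:
        """Fraction of attempted moves of this kind that were accepted (0.0 if none)."""
        tried = self.tried[kind]
        return self.accepted[kind] / tried if tried else 0.0
```

Keeping one tally per temperature gives the local versus global acceptance rates at each stage of the cooling schedule.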
In general, the rules used to choose the moves can be regarded as heuristics for pruning the neighborhood, or narrowing the search space, which yields much more efficient moves, as demonstrated by the good acceptance rates shown in Figure 6.