One popular method, though it has its own limitations, is the genetic algorithm (GA, for short). The following is pseudocode of the general schema:
[Figure 1.1: pseudocode of the general genetic-algorithm schema (not reproduced here)]
The schema of Figure 1.1 leaves a number of things unspecified so that it can be adapted to a variety of global optimization problems, such as the TSP and scheduling. For example, to adapt it to the TSP, one needs to specify: K and K', the methods for generating starting solutions (tours), the local optimization algorithm A, the mating strategy, the nature of the crossover and mutation operators, the selection strategy, and the criterion for convergence.
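As an illustration, here is a minimal Python sketch of one such adaptation to the TSP, with hypothetical choices for the unspecified ingredients: random starting tours, a simple 2-opt procedure as the local optimization algorithm A, order crossover for mating, a swap mutation, and truncation selection. All names and parameter values are illustrative assumptions, not a prescribed implementation.

```python
import random

def tour_length(tour, dist):
    """Total length of a closed tour under the distance matrix dist."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(tour, dist):
    """Simple 2-opt improvement, standing in for the local optimizer A."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]  # reverse a segment
                if tour_length(cand, dist) < tour_length(tour, dist):
                    tour, improved = cand, True
    return tour

def order_crossover(p1, p2):
    """Order crossover (OX): keep a slice of p1, fill the rest in p2's order."""
    a, b = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b] = p1[a:b]
    rest = [c for c in p2 if c not in child]
    for k in range(len(child)):
        if child[k] is None:
            child[k] = rest.pop(0)
    return child

def genetic_tsp(dist, k=10, k_prime=5, generations=20):
    n = len(dist)
    # Steps 1-2: generate K random starting tours and locally optimize each.
    pop = [two_opt(random.sample(range(n), n), dist) for _ in range(k)]
    for _ in range(generations):
        # Steps 3.x: mate K' pairs, mutate occasionally, locally optimize offspring.
        offspring = []
        for _ in range(k_prime):
            p1, p2 = random.sample(pop, 2)
            child = order_crossover(p1, p2)
            if random.random() < 0.2:            # mutation: swap two cities
                i, j = random.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            offspring.append(two_opt(child, dist))  # local optimization of offspring
        # Selection: keep the K best of parents + offspring.
        pop = sorted(pop + offspring, key=lambda t: tour_length(t, dist))[:k]
    return pop[0]
```

The `two_opt` calls after initialization and after producing each offspring correspond to the local optimization in Steps 2 and 3.4 of the schema.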
Another feature is that the application of local optimization to the individual solutions in Steps 2 and 3.4 could be viewed as an almost heretical addition: in the context of the original biological motivation for the genetic approach, it embodies the discredited Lamarckian principle that learned traits can be inherited. Nonetheless, such local optimization steps appear to be essential if competitive results are to be obtained for hard problems like the TSP. Therefore, for general optimization problems, I'll assume that the genetic algorithms of Figure 1.1 use Steps 2 and 3.4.
Even without the local optimization steps of Figure 1.1, a genetic algorithm can properly be classified as a variant of local search. Implicit in it is a neighborhood structure in which the neighbors of a solution are those solutions reachable by a single mutation or mating operation. With the local optimization steps, the schema can also be viewed simply as a variant of the best-of-k-runs approach to local optimization: for the TSP, instead of independent random starts, we use the genetically motivated operations to construct what we hope will be better starting tours, tours that incorporate knowledge obtained from previous runs. This latter view turns out to be quite productive when dealing with optimization problems like the TSP.
In general, the main disadvantage of GAs is their lack of mathematical rigor: it is quite hard to understand why the method should work at all, let alone what it tells us about natural evolution. Perhaps the main difficulty lies in the fact that the algorithms combine two different search strategies: a random search by mutation, and a biased search by recombination of the strings contained in the population. Another big problem is that applying genetic algorithms to general combinatorial optimization problems leads to what is called the "genetic representation problem": a representation of the problem must be found under which mutation and recombination create feasible offspring with high probability.
Tabu Search (TS, for short). Like simulated annealing and other techniques, it is motivated by the observation that not all locally optimal solutions need be good solutions. Thus it may be desirable to modify a pure local optimization algorithm with some mechanism that helps us escape local optima and continue the search. One such mechanism is simply to perform repeated runs of a local optimization algorithm, using a randomized starting heuristic to provide different starting solutions.
One reason for the limited effectiveness of the random-restart policy is that it does not exploit the possibility that locally optimal solutions cluster together; that is, for any given local optimum, a better one may well be nearby. If this is true, it is better to restart the search close to the solution just found rather than at a randomly chosen location. This, in essence, is what TS does.
The general strategy is always to make the best move found, even if that move makes the current solution worse, i.e., is an uphill move. Thus, assuming that all neighbors of the current solution are examined at each step, TS alternates between looking for a local optimum and, once one has been found, identifying the best neighboring solution, which is then used as the starting point for a new local optimization phase. If one did just this, however, there would be a substantial danger that the best move from this "best neighboring solution" would take us right back to the local optimum we just left, or to some other recently visited solution. This is where the tabu in TS comes in: information about the most recently made moves is kept in one or more tabu lists, and this information is used to disqualify new moves that would undo the work of those recent moves.
Before outlining the pseudo code of the algorithm, we need to specify the following basic ingredients:
The tabu list T of length |T| = k (fixed or variable) is used as a queue: whenever a move from a solution S to S* is made, S is added at the end of T and the oldest solution is removed from T. All moves back to S are then forbidden for the next |T| iterations; S is a tabu solution, and any move going back to S is a tabu move. An aspiration function A(Z) is defined for every value Z of the objective function; it determines when a move is admissible despite being on the tabu list. If a move to a neighboring solution Si is a tabu move but gives f(Si) <= A(Z = f(S)), then the tabu status of the move is dropped and Si is considered a normal member of the sample V*. One termination criterion for the tabu procedure is a limit on the number of consecutive moves for which no improvement occurs.
[Figure: pseudocode of the tabu search procedure (not reproduced here)]
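As a concrete sketch of these ingredients, the following Python fragment implements the procedure with the tabu list as a bounded queue. The aspiration function is simplified here to "better than the best value seen so far"; this is a common choice but only one of many, and `neighbors` and `f` are caller-supplied assumptions.

```python
from collections import deque

def tabu_search(start, neighbors, f, tabu_size=7, max_no_improve=20):
    """Minimal tabu search sketch (minimization).

    neighbors(s) returns the neighboring solutions of s; f is the objective.
    The tabu list holds recently visited solutions; a deque with maxlen
    drops the oldest entry automatically, acting as the queue T.
    """
    current = start
    best, best_val = start, f(start)
    tabu = deque([start], maxlen=tabu_size)
    no_improve = 0
    while no_improve < max_no_improve:
        # Build the sample V*: drop tabu neighbors unless aspiration holds
        # (here: a tabu move is admissible if it beats the best value so far).
        candidates = [s for s in neighbors(current)
                      if s not in tabu or f(s) < best_val]
        if not candidates:
            break
        current = min(candidates, key=f)   # best move, even if uphill
        tabu.append(current)
        if f(current) < best_val:
            best, best_val = current, f(current)
            no_improve = 0
        else:
            no_improve += 1                # termination: too many non-improving moves
    return best
```

For example, minimizing (x - 3)^2 over the integers with neighbors x - 1 and x + 1, the search walks downhill to 3 and then wanders away without improvement until the termination criterion fires, returning 3.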
The aspiration function is part of the aspiration-level conditions used by TS, which provide exceptions to the general tabu rules, typically in situations where there is some guarantee that the supposedly forbidden move will not take us back to a previously seen solution. There are also diversification rules, which can provide something like a random restart; intensification rules, which constrain the search to remain in the vicinity of a previously found good solution; and a host of other rules and conditions for dynamically modifying the underlying neighborhood structure.
Some features, advantages, and disadvantages of the algorithm:
The following is an outline of a parallel genetic algorithm (PGA) adopting the distributed scheme.
[Figure 1.3: outline of the distributed parallel genetic algorithm (not reproduced here)]
Note that each individual may use a different local hill-climbing method. This is quite an important feature for problems where the efficiency of a particular hill-climbing method depends on the problem instance. In the above algorithm (Figure 1.3), information exchange within the whole population is a diffusion process, because the neighborhoods of the individuals overlap. Moreover, all decisions are made by the individuals themselves: the algorithm of Figure 1.3 is totally distributed, with no central control. It models the natural evolutionary process, which organizes itself.
As mentioned in the previous section, the algorithm of Figure 1.1 is also a parallel genetic algorithm. If an efficient implementation on parallel hardware is desired, one needs to pay extra attention to the selection scheme in order to avoid or minimize overheads. The only points in the algorithm where serial operation seems to be required are between generations and in the selection procedure.
The capacity to separate sub-populations (for separate processing) leads to a number of possible population models such as:
Another, higher-level parallel approach to this method is to perform independent searches, each of which starts from a different initial solution.
To achieve any significant reduction in the computational effort of each iteration, one needs to employ an aspiration criterion and dynamically change the sizes of the tabu list and the long-term memory. Also, the intensification strategy needs to be based on the intermediate-term rather than the long-term memory, restricting the search to a neighborhood.