Given by Geoffrey C. Fox at CPSP713 Case studies in Computational Science on Spring Semester 1996. Foils prepared 15 March 1996
Summary of Material
Physical Optimization applies a set of optimization (minimization) methods motivated by physical processes to general optimization problems
These include simulated annealing, neural networks, deterministic annealing, simulated tempering and genetic algorithms
We look at the general TSP, clustering in physical spaces, track finding, navigation, school class scheduling, random field Ising models, and data decomposition and other computing optimization problems
We discuss when methods such as neural networks are effective
Geoffrey C. Fox
Syracuse University
315.443.2163
Third Keck Symposium on Computational Biology
Houston, Texas - Nov. 1, 1992
Examples of Physical Computation
Physical (pertaining to Nature) optimization and associated fields
These can be compared with conventional approaches
Nature is often solving optimization problems
Physics laws can usually be formulated as variational (optimization) problems
We can also - and indeed this should be the norm? - combine methods
What is the shape of the objective function?
Do you need the exact global minimum?
Local minima in "physics" problems are closely correlated with the true minimum
"Computer Science" grand challenges in optimization might look different
Different methods have different trade-offs
Neural networks do simple things on large data sets and parallelize easily
Expert systems do complex things on small data sets and parallelize with difficulty
Combinatorial Optimization
Physical Optimization
As computers get more powerful, we need to solve larger problems, and physical computation methods become more attractive
Illustrates that "computer science" benefits from a broad intellectual base that includes biology and physics as well as mathematics and electrical engineering
Minimize E = E(parameters y)
y can be continuous or discrete, or a mix
Introduce a fake temperature T and set β = 1/T
Probability of state y = exp(-βE)/Z, where Z = Σ_y exp(-βE(y)) is the partition function
As β → ∞, the minimum-E state (ground state) dominates
Find y(T) by minimizing the Free Energy
F = E - TS = -(1/β) log Z
Annealing tracks the global minimum by initializing the search at temperature T with the minimum found at temperature T + dT
Movement at a given temperature
Global minimum moving with decreasing temperature
Simulated annealing: find y(T) by Monte Carlo as a mean over configurations at that temperature
Neural networks: y is discrete; find y by a mean field approximation
Elastic net: y is discrete, but use an improved mean field including some or all constraints
Deterministic annealing: leave the important y out of the sum over states; find them by simple iterative optimization
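The annealing idea above can be sketched in a few lines of Python. This is a minimal illustration, not code from the foils: the double-well energy E, the geometric cooling schedule, and all parameter values are invented for the example.

```python
import math
import random

def simulated_anneal(energy, x0, t_start=2.0, t_end=0.01, cooling=0.95, sweeps=200):
    """Metropolis Monte Carlo with a slowly decreasing temperature:
    the minimum found at T seeds the search at the next, lower T."""
    rng = random.Random(0)
    x = x0
    best_x, best_e = x, energy(x)
    t = t_start
    while t > t_end:
        for _ in range(sweeps):
            x_new = x + rng.gauss(0.0, math.sqrt(t))    # larger moves at high T
            d_e = energy(x_new) - energy(x)
            if d_e <= 0.0 or rng.random() < math.exp(-d_e / t):  # Metropolis rule
                x = x_new
                if energy(x) < best_e:
                    best_x, best_e = x, energy(x)
        t *= cooling                                    # geometric annealing schedule
    return best_x, best_e

# Hypothetical test energy: shallow local minimum near x = -1,
# deeper global minimum near x = +1, with a barrier in between
def E(x):
    return (x * x - 1.0) ** 2 - 0.3 * x

x_min, e_min = simulated_anneal(E, x0=-1.0)
```

Starting in the false minimum at x = -1, the high-temperature phase lets the walker cross the barrier, and cooling then localizes it in the deeper well near x = +1.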
The Test Problem (x = real center)
For each data point x there is an energy E_x(j) for its association with the cluster C_j. The probability that x belongs to cluster C_j is:
P_x(j) = exp(-βE_x(j)) / Z_x
where Z_x is the partition function Z_x = Σ_k exp(-βE_x(k))
and F_x is the free energy F_x = -(1/β) log Z_x
The total free energy is: F = Σ_x F_x
Use the squared-distance energy E_x(j) = |x - y_j|²
Optimizing the free energy (∂F/∂y_j = 0)
yields the solution for the centers of the clusters:
y_j = Σ_x x P_x(j) / Σ_x P_x(j)
Note 1/β^(1/2) is the distance scale at which clusters are observed
Isodata is a "greedy" algorithm likely to find local minima
More clusters emerge as the temperature is lowered and one looks at the system with better resolution
The clustering problem - like any good physical system - exhibits phase transitions as one lowers the temperature
Can be viewed as constrained clustering, which is the approach that "derives" the elastic net from deterministic annealing
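The deterministic-annealing iteration above (soft memberships P_x(j), weighted-mean centers, β raised gradually) can be sketched with numpy. The two-blob data set, the schedule, and all parameters are invented for illustration, not taken from the foils.

```python
import numpy as np

def da_cluster(x, k=2, beta0=0.05, beta_max=50.0, rate=1.2, iters=40):
    """Deterministic-annealing clustering: memberships P(x in C_j)
    proportional to exp(-beta |x - y_j|^2), centers set to the
    probability-weighted means, beta raised geometrically (T lowered)."""
    rng = np.random.default_rng(0)
    # start all centers near the overall mean, with a tiny symmetry-breaking jitter
    y = x.mean(0) + 1e-3 * rng.standard_normal((k, x.shape[1]))
    beta = beta0
    while beta < beta_max:
        for _ in range(iters):
            d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared distances
            p = np.exp(-beta * (d2 - d2.min(1, keepdims=True)))   # stable softmax
            p /= p.sum(1, keepdims=True)                          # P_x(j)
            y = (p.T @ x) / p.sum(0)[:, None]                     # weighted means
        beta *= rate        # lower T: resolve structure at scale 1/sqrt(beta)
    return y, p

# Two well-separated blobs: below the critical beta both centers sit at the
# overall mean; past the phase transition they split onto the true clusters.
pts = np.concatenate([np.random.default_rng(1).normal(-3, 0.3, (50, 2)),
                      np.random.default_rng(2).normal(+3, 0.3, (50, 2))])
centers, memberships = da_cluster(pts)
```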
Classical Neural Network Approach to TSP
Elastic net (roughly)
Simic showed how the neural network (Hopfield-Tank) and the elastic net come from making different mean field approximations to the same physical free energy
Review of the TSP application in neural variables
Typical Hopfield-Tank energy functions
Typical elastic net energy functions
Energy = real goal (Σ of intercity distances d)
Consider a physical system with this energy function at temperature T
Minimize F = free energy = E - TS as T → 0
Use the "mean field" approximation to make annealing deterministic
Hopfield-Tank Neural Network
Simic's Derivation of the Elastic Net
Showing the temperature dependence of the path
In the elastic net use x(i) (position in city space at time i)
Equivalently, consider a multistate neuron h_i taking values in the space of x
Binary neurons → Ising model
Multistate neurons → Potts model (if x is discrete) or continuous spin model (if x is continuous)
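The elastic-net variables x(i) lend themselves to a short sketch in the spirit of the Durbin-Willshaw algorithm: a closed ring of points is attracted to the cities while an elastic term keeps it short, and the length scale is annealed toward zero. The random cities and all parameter values here are illustrative assumptions.

```python
import numpy as np

def elastic_net_tsp(cities, ratio=2.5, kappa=0.2, alpha=0.05, tension=2.0,
                    decay=0.995, steps=2000):
    """Elastic-net TSP sketch: soft city-to-string assignments at scale
    kappa, attraction toward cities, elastic smoothing, annealed kappa."""
    n, m = len(cities), int(ratio * len(cities))
    t = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
    y = cities.mean(0) + 0.1 * np.c_[np.cos(t), np.sin(t)]  # small initial ring
    for _ in range(steps):
        d2 = ((cities[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        logw = -d2 / (2.0 * kappa ** 2)
        w = np.exp(logw - logw.max(1, keepdims=True))       # stable soft assignment
        w /= w.sum(1, keepdims=True)                        # each city's weight sums to 1
        pull = w.T @ cities - w.sum(0)[:, None] * y         # attraction toward cities
        spring = np.roll(y, 1, 0) - 2.0 * y + np.roll(y, -1, 0)  # keep ring short
        y += alpha * pull + alpha * tension * kappa * spring
        kappa *= decay                                      # anneal the distance scale
    return y

rng = np.random.default_rng(0)
cities = rng.random((10, 2))
ring = elastic_net_tsp(cities)
```

As kappa shrinks the soft assignments harden, and the ring ends up passing close to every city; reading the cities off in ring order gives the approximate tour.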
Can be applied to the single or multiple vehicle problem
Note temperature T allows large changes at high T,
e.g. to jump over a large obstacle
T is again a physical scale
Neural net variables h_i(x,t) are redundant
Elastic net (string) variables x_i(t)
New effect: more than one (2 in fact) Lagrange multipliers
exp[-β₁E₁ - β₂E₂], with β_i = β_i(T)
In this problem the goal (E₁) is easy to satisfy but the constraints (E₂) are hard
At high temperatures make E₂ go away
(the elastic string of zero length is the global minimum)
Decrease the temperature and gradually "switch on" the constraint such that as T → 0 the constraint is rigorously enforced,
e.g. β₁ = 1/T, β₂ = 1/T²
Neural Networks From Gurewitz-Rose-Fox |
(Ghandi, Fox unpublished) |
Most work on navigation concentrates on exact methods (combinatorial optimization) for a few degrees of freedom
Physical optimization allows very many vehicles to be navigated
But we must use the elastic net - neural networks gave us the right general idea, but too many constraints
Hamiltonian (used) = Hamiltonian_basic (i.e. that given by Nature) + H_C
H_C = Hamiltonian of constraints from experimental measurements or chemical knowledge
Then one can either evolve the system by Monte Carlo,
or equivalently H_C is minimized by Simulated Annealing
Or one can evolve the system by Newton's Laws,
or equivalently H_C is minimized by Deterministic Annealing
General, Travelling Salesman, Quadratic Assignment
Scheduling
Track Finding
Robot Path Planning, Navigation
Character Recognition
Image Analysis
Clustering, Vector Quantization (coding), Electronic Packaging
A new method, Simulated Tempering, by Marinari (Rome, Syracuse) and Parisi (Rome)
Conventional Monte Carlo methods do not work well for the Random Field Ising Model (RFIM)
Spins s_i live on a 3D lattice
s_i = ±1, to be determined
Fixed magnetic field
h_i = |h| r_i
|h| = 1
r_i = ±1 with random probability 1/2
Add β to the list of dynamical variables
β chosen out of {β_m} with probability proportional to exp(-β_m E + g_m) for suitable weights g_m
(ladder of inverse temperatures β₁ < β₂ < β₃ < β₄ < β₅, increasing)
Now the system can move up in temperature and jump barriers
One does not have to decide on an a priori annealing schedule, i.e. the list of β_m and the computer time to be spent at each β_m
Showing that multiple "global" minima are visited
Showing the system stuck in a local minimum
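The tempering move can be sketched on a toy problem: here a 1D double well stands in for the RFIM, β hops on a ladder {β_m} by a Metropolis rule, and the weights g_m are crude untuned guesses. Everything in this example is an illustrative assumption.

```python
import math
import random

def tempering_run(energy, betas, g, x0=-1.0, steps=20000, seed=1):
    """Simulated-tempering sketch: the ladder index m of the inverse
    temperature is itself a dynamical variable, so the chain can warm
    up, cross a barrier, and cool back down."""
    rng = random.Random(seed)
    x, m = x0, len(betas) - 1          # start cold, trapped near x0
    crossed = False
    for _ in range(steps):
        xn = x + rng.gauss(0.0, 0.5)   # ordinary move at the current temperature
        d_e = energy(xn) - energy(x)
        if d_e <= 0.0 or rng.random() < math.exp(-betas[m] * d_e):
            x = xn
        mn = m + rng.choice((-1, 1))   # propose a hop on the temperature ladder
        if 0 <= mn < len(betas):
            loga = -(betas[mn] - betas[m]) * energy(x) + (g[mn] - g[m])
            if loga >= 0.0 or rng.random() < math.exp(loga):
                m = mn
        if x > 0.5:
            crossed = True             # the chain has jumped the barrier
    return x, crossed

# Toy double well: shallow minimum near x = -1, deeper one near x = +1
E = lambda x: (x * x - 1.0) ** 2 - 0.3 * x
betas = [0.2, 0.5, 1.0, 2.0, 4.0]      # ladder beta_1 < ... < beta_5
g = [b for b in betas]                 # crude weights ~ beta * <E>, not tuned
x_end, crossed = tempering_run(E, betas, g)
```

A fixed-β run at β = 4 would stay trapped near x = -1 for a very long time; the tempering chain escapes by visiting the low-β rungs of the ladder.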
Schedule observations on a planetary orbiter subject to physical constraints (power, which instruments are near each other, etc.) - and do it dynamically (a new "moon" is discovered → change the schedule). Several hundred people are devoted to this at JPL.
A similar space telescope problem
There is a major scheduling problem in shuttle operations
Originally NASA hoped to turn the shuttle around in ~2 weeks; it actually takes ~4 months
In the orbiter processing facility
Peterson and Soderberg (Lund) applied physical optimization to the scheduling of smaller high schools
Currently Syracuse University spends ~3 weeks scheduling freshman classes (the time-critical case)
Variables are "multi-state" neurons x_i
Student preferences are the "quanta" of force acting on particles i. These particles move in the space of (classroom, time slot).
This is the thesis topic of Saleh ElMohammed
x_i must be single-valued, i.e. a (teacher, class) event occupies one spacetime slot
x_i ≠ x_j if i ≠ j
A given teacher can only teach one class at a time
Each student should be allowed a "lunch class" if possible |
The different meetings of a given class should be spread over a week |
Each student's MWF and T Th classes should be roughly balanced |
Distance constraints between classrooms -- students and Professors should not have to run between classes |
Student must meet eligibility requirements for class |
Do not enroll athletes in courses that conflict with their sports practice (hard or soft constraints depending on sport) |
Balance enrollment in different sections of a given class |
Balance gender in Honors Seminar sections |
Enroll students in all parts of a course when linked e.g. recitation and lecture |
Satisfy student preferences |
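Constraints like these are typically folded into the energy as penalty terms, hard ones with large weights and soft preferences with small ones, so that annealing minimizes the total. The encoding below (the `schedule_energy` function, its weights, and the toy instance) is a hypothetical sketch, not the actual thesis formulation.

```python
def schedule_energy(assign, teacher_of, prefs=(), lam_hard=10.0, lam_soft=1.0):
    """Penalty-energy sketch for the scheduling neurons x_i: assign maps
    each (teacher, class) event i to its slot x_i = (room, period)."""
    e = 0.0
    seen_slots, seen_teacher = set(), set()
    for i, (room, period) in assign.items():
        if (room, period) in seen_slots:   # x_i != x_j: one event per slot
            e += lam_hard
        seen_slots.add((room, period))
        key = (teacher_of[i], period)      # a teacher teaches at most
        if key in seen_teacher:            # one class per period
            e += lam_hard
        seen_teacher.add(key)
    for i, wanted_period in prefs:         # soft term: unmet student preferences
        if assign[i][1] != wanted_period:
            e += lam_soft
    return e

# Hypothetical toy instance: in `clash`, events 0 and 1 collide in room R1, period 1
clash = {0: ("R1", 1), 1: ("R1", 1), 2: ("R2", 2)}
clean = {0: ("R1", 1), 1: ("R2", 1), 2: ("R1", 2)}
teacher_of = {0: "A", 1: "B", 2: "A"}
```

With weights chosen this way, any annealing method that drives the hard terms to zero as T → 0 produces a legal timetable while trading off the soft preferences.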
Dynamical Systems
Statistical Systems
Are the approaches consistent?
A hierarchy of mapping problems
We would like to optimize the overall mapping
Execution time - the main focus of the HPCC community?
User happiness - the main focus of the software engineering community
Various optimizations are possible at each stage of the mapping. These can be (but usually are not) formulated as minimizing an objective function
This is a difficulty if the constraint ensures correctness of the calculation, as in the code generation phase of an optimizing compiler for a digital computer. Then we must have E₂ = 0 at T = 0.
This "correctness" issue might not be present in the generation of "programs" for neural (as opposed to digital) supercomputers.
We can study the use of physical optimization in the generation or execution of a program
Both Message Routing
Problem is to find MAP in
DISTRIBUTION (MAP) in Fortran D / HPF (High Performance Fortran)
A user (compiler) generated data decomposition
User (compiler) supplies:
What is to be distributed
"Graph" - the algorithmic connection between the entities to be distributed
These entities are:
Given: data set and algorithm/solver
Assumptions: data parallelism; loosely synchronous computation model; distributed-memory MIMD multiprocessor
Definitions: computation graph G_C = (V_C, E_C); multiprocessor graph G_M = (V_M, E_M)
MAP: V_C → V_M
such that the execution time of the parallel algorithm/solver is minimized
Represent the mapping configuration by MAP[v_i] = p_j
Allocation problem as a graph isomorphism problem
Allocation problem as a quadratic assignment problem
Allocation problem as a minmax problem
Geometry-based allocation approach
This optimization can also be thought of as finding the ground state of a "physics" problem. The particles are "entities" (grid points) moving around in a space with "number of processors" discrete positions.
Repulsive force - load balance
Attractive force - communication
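A common surrogate energy for such a MAP makes the two forces explicit: a load term ("repulsive", minimized by equal loads) plus a cut term ("attractive", keeping communicating vertices on one processor). The function, the weight mu, and the ring instance below are illustrative assumptions.

```python
def mapping_cost(edges, nverts, nprocs, assign, mu=1.0):
    """Surrogate execution-time energy for MAP: V_C -> V_M.
    Sum of squared loads is minimized by perfect balance; each cut
    edge adds a communication penalty weighted by mu."""
    load = [0] * nprocs
    for v in range(nverts):
        load[assign[v]] += 1
    balance = sum(l * l for l in load)                    # repulsive: spread work out
    cut = sum(1 for a, b in edges if assign[a] != assign[b])  # attractive: avoid cuts
    return balance + mu * cut

# Hypothetical 6-point ring mapped onto 2 processors
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
good = [0, 0, 0, 1, 1, 1]   # contiguous halves: balanced, only 2 edges cut
bad  = [0, 1, 0, 1, 0, 1]   # alternating: balanced, but every edge is cut
```

Both mappings are perfectly balanced, so the energy difference comes entirely from the communication term, which is exactly the trade-off the physical methods explore.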
Decompose the geometric data structures into a specified number of subdomains (substructures) so that:
Allocate the subdomains to processors, so that:
Decouple (color) the processors so that:
from Chrisochoides
Recursive Bisection, Neural Network, Simulated Annealing and Genetic Algorithms
Here we compare the time to do the mapping -- the previous page describes the time to execute the mapped code
Parallel algorithms for
Large problems (the "real world") require multiscale algorithms, just as multigrid is faster than Gauss-Seidel for large P.D.E.'s (time → log N_node for d = 2, 3)
Any approximate method can be dangerous when applied naively
The published exact, simulated annealing (etc.) papers have been greatly improved, but
each new heuristic for the TSP shows how its method is better than published results which are not "state of the art" ....
Again, many say "simulated annealing" or "genetic algorithms" are too slow for data decomposition
This is only true for some (naive) implementations
Roy Williams at Caltech
Neural networks work well for the NP-complete load balancing problem but fail for the formally equivalent TSP. Why?
Label data or processes by i = 1 ... M
The redundant choice of N·M neural (binary) variables
fails for the same reason as for the TSP
The nonredundant choice (NO constraints in E) of M log₂ N neural variables
⇒ Not all NP-complete problems are created equal
Neural networks work well for data decomposition because the neural variables are a natural nonredundant description
In the "analogous" TSP and navigation problems, the constraints on the redundant neural variables ⇒ the elastic net (which can be viewed as a generalized neural net) is better
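The variable counting behind this argument is easy to make concrete. The two helper functions below are hypothetical names for the bookkeeping, not code from the foils: one counts the redundant one-hot (Hopfield-Tank style) neurons, the other the nonredundant binary encoding in which every bit pattern is a legal decomposition.

```python
import math

def redundant_vars(n_procs, m_points):
    """One-hot encoding: a binary neuron for every (point, processor)
    pair, N*M variables, plus constraint terms to keep it valid."""
    return n_procs * m_points

def nonredundant_vars(n_procs, m_points):
    """Binary encoding: ceil(log2 N) bits per point directly name its
    processor, M*log2(N) variables and no validity constraints at all."""
    return m_points * math.ceil(math.log2(n_procs))

# e.g. mapping M = 1024 grid points onto N = 64 processors
n_onehot = redundant_vars(64, 1024)      # 64 * 1024
n_binary = nonredundant_vars(64, 1024)   # 1024 * 6
```

The tenfold reduction in variables is secondary; the real win is that the binary encoding needs no constraint terms in E, which is exactly why the neural approach works here and struggles on the TSP.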
Why do all methods work so well for graph partitioning, when computer scientists are taught to be terrified of such NP-complete problems?
Advice on which algorithms and which machine to use |
Choose compiler transformations and strategy: loop unrolling, interchange - a discrete set of choices at each of many program fragments
Given a transformation, find the performance on a given machine
Static and Dynamic Decomposition |
Local Register Assignments, peephole optimizations, pipelining, etc. |
Given a bunch of measurements x_m(t), find the best tracks given certain prejudices as to the nature of the particles or missiles
Case I: 1 to 20 tracks
Case II: very many tracks
Analogy to the Travelling Salesperson Problem:
TRW has implemented this approach to tracking
Neural network degrees of freedom are independent of the number of tracks! (The Kalman filter gets difficult as the number of tracks becomes large and unknown)
The neural network has essentially unlimited parallelism
Again, neural networks "work" when these are the natural degrees of freedom
Case III: intermediate number of tracks, or
few tracks with dirty data
A very "dirty" data set
So not only do different problems need different methods, but within a fixed problem, varying parameter values causes the best method to change
Tracking is like a multiple TSP, with each track like a salesman
The quality of solution needed depends on the quality of the data. Also, as time advances one can get new information, and physical computation naturally allows time-dependent data
E.g. in related navigation problems, you may start with some information about the terrain and update it with new sensory data
Physical Optimization is a class of naturally parallel heuristics that solve "hard problems" quickly but approximately
Monte Carlo or Deterministic
Choice of variables is important
There is no universally "good" method; even in a given problem, different methods are appropriate for different parameter values
Temperature controls a generalized multiscale approach