
Foil 43 Compilers, Caches and Data Locality

From the Fox presentation, Fall 1995: CPS615 Basic Simulation Track for Computational Science -- Fall Semester 97, by Geoffrey C. Fox


1 As the owner's-compute rule is obeyed, a good parallelizing compiler should be able to find the "much better" algorithm "automatically", since loop inversion (interchange) and blocking are standard optimization strategies
2 Note that the "much better" parallel algorithm is also a good sequential algorithm, as it naturally reuses each j value N-1 times: a block of j values stays fixed in cache while the i values are cycled through
  • Now the block size J is set by the cache size, not by the per-processor memory size as in the parallel case
  • Note that each i value is used J times and each j value N-1 times (see the sketch after these points)
3 The general lesson is that the amount of computation and the amount of data re-use are as important as the amount of communication
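As an illustration of the blocking described in points 1 and 2, here is a minimal C sketch of an O(N^2) pairwise computation in which a block of J values of j is held in cache while every i is cycled past it. The array names, block size, and interaction kernel are illustrative assumptions, not taken from the course material.

    /* Minimal sketch of cache blocking for an O(N^2) pairwise interaction.
       Assumed: positions x[], accumulated results f[], block size J chosen
       so that x[jb..jb+J-1] fits in cache. Kernel is a placeholder. */
    #include <stddef.h>

    #define N 4096
    #define J 256   /* block of j values sized to fit in cache */

    void compute_forces(const double x[N], double f[N])
    {
        for (size_t jb = 0; jb < N; jb += J) {          /* loop over j blocks          */
            size_t jend = (jb + J < N) ? jb + J : N;
            for (size_t i = 0; i < N; i++) {            /* cycle every i past the block */
                double fi = 0.0;
                for (size_t j = jb; j < jend; j++) {    /* each x[j] reused for every i */
                    if (i != j) {
                        double d = x[i] - x[j];
                        fi += 1.0 / (d * d + 1.0);      /* placeholder pairwise kernel  */
                    }
                }
                f[i] += fi;
            }
        }
    }

With this loop order each cached x[j] is reused for all i except i == j (the N-1 reuses noted above), while each x[i] is touched J times per pass over a block.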



© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
