Basic HTML version of Foils prepared April 22 2000

Foil 32 Sequential and Parallel Performance II

From Parallel FFT and use in PDE Solvers Computational Science Class CPS615 -- Winter Semester 2000. by Geoffrey C. Fox


This implies a reordering of computations, so that L digits are done at a time and one fixes storage indices d-1, d-2, d-3, .. L and calculates 2L point FFT on lower indices
  • This 2L point FFT can be calculated by iterating 2 point FFT's as in Cooley Tukey or by some custom FFT of this size as in FFTW
  • Clearly this makes excellent use of cache and each data point is used L times (multiplied by number of times used in basic single digit FFT -- 2.5 times on average per 1 floating point word)
So basic operation needed is one of reordering:
  • Move L digits from current positions to another set of L positions
  • Assume these are not overlapping
  • Time taken is 2(d-P).2(complex floats) . (1-2L) with one memory read and one write each
This re-ordering is also algorithm used to undo bit reversal if needed



© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu

If you have any comments about this server, send e-mail to webmaster@npac.syr.edu.

Page produced by wwwfoil on Mon Apr 24 2000