Basic HTML version of Foils prepared April 22 2000

Foil 22 Basic Parallel FFT Algorithms I

From Parallel FFT and use in PDE Solvers Computational Science Class CPS615 -- Winter Semester 2000. by Geoffrey C. Fox


Traditionally algorithms are iterative, going across each phase in the tree starting with all the N/2 2 point FFT's at phase d-1 and ending with combining two N/2 point FFT's at phase 0.
  • However FFTW papers at their web site points out that most such implementations make poor use of cache; recursive version can be better in cache use and we will describe this in general later
We can decompose in many ways but here we take the simplest case of block decomposition illustrated for N=16 on next foil.
Let Nproc = 2P be the number of processors in the parallel computer
Consider the d digits in binary representation of m
  • The bottom d-P digits md-P-1 through m0 representation location inside processor labeled by top P digits md-1 through md-P
P digits labeling 2P processors
d-P digits labeling position
in processor



© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu

If you have any comments about this server, send e-mail to webmaster@npac.syr.edu.

Page produced by wwwfoil on Mon Apr 24 2000