This assignment asks you both to implement a parallel 3D fast fourier transform and to model your implementation.
Again, you'll be working in groups of two to three people. This assignment is due on (or by) Friday, March 21st (the last day before spring break). The TAs will then test and time the codes and the results of the contest will be announced after spring break.
A list of references which should be enough to get you started is provided here . The references listed range from extremely basic introductions to the fourier and fast fourier transforms to publically available codes for the FFT to recent papers on efficient parallelization of the FFT.
Check out a Java demo of a 1D Fourier Transform located here.
A.The inverse Fourier transform performs the frequency to time domain transformation. The Discrete Inverse FT is computed by (where h_k is the kth value in the time domain):
h_k = 1/N * SUM(H_n* e^(-2*pi*i*k*n/N)) where n varies from 0 to N-1
Go to the 1D Fourier Transform Java Demo. Clear all points to zero, and then place a single spike in the frequency domain (F(K) real plane at the frequency 1 (the second point, since the first point is the 0 frequency). Notice that the inverse of this single spike is a cos wave in the real plane and a negative sin wave in the imaginary plane. Given the formula above, return the symbolic answer for the real and imaginary time-domain functions as a function of k. (HINT: H_n is nonzero only when n == 1.) Note that e^(i*x) = cos(x) + i*sin(x). Assume that the amplitude of the spike is A.
B. What is the DFT of a constant function, e.g. y = 4 + 0*i , and Why? (You can use the Java demo as a guide.)
C. The formula for the 1D DFT is:
H_n = SUM(h_k*e^(2*pi*i*k*n/N)) where k varies from 0 to N-1. Using the Java Demo, zero everything and then manually draw a cosine wave with amplitude A in the real plane of the time domain.
Notice that this is transformed into two spikes. Why is this transformed into two spikes? What frequency does each spike represent? Are they the same or different frequencies? Given that the amplitude of the cosine wave you drew was A, what is the amplitude of each spike? Show your math. (HINT: It might be easier for you to think in terms of the inverse DFT, e.g. given these two spikes, what is the inverse transform in each of the imaginary and real planes?) Note that h_k in this example is simply (A*cos(2*pi*k/N)).
Check out the LogP paper located on the links page.
In order to estimate the running time of your 3D FFT Algorithm, you will have to create a model of the underlying network. One such model is the LogP model, which models the overhead, latency, and gap of short messages. There is an extension to LogP, called LogGP which models not only the cost for short messages but also the cost for large (bulk) messages as well.
After you have picked the language you are going to use for the 3D FFT implementation, start out by measuring the latency and bandwidth for the communication primitives in your language.
For Split-C,
For MPI
For HPF
For Sun Threads
This should give you a feel for the costs of the different communication primitives in your language which will help you:
The basic rules for the contest are as follows:
Turn in a description of the different parallel algorithms/implementations you considered as well as an explanation of why you chose the algorithm/implementation you did. Describe anything special your code does (for example, does it work on 3D blocks that aren't cubes?).
You should hand in a pointer to a directory that contains your code, or alternatively a pointer to a web page that contains links to your code (and your Makefile(s)).
We'd like the following format:
To build your code for timing only (no output is needed): make fast
In other words, your makefile should have a dependency for 'fast' which will end up building your code without any output.
To build your code for correctness: make
NOTE: The code should be the same. IOW - don't use two separate algorithms for performance and correctness. We just need an easy way to compile your code when we are testing for performance vs. correctness.
Your program should output the FFT data and/or timing information to stdout or a file. For example, your program should be able to run either via: a.out > fft.output or a.out fft.output
It's your choice - you don't have to support both options. Just document your code using a README file in the code directory telling us which method you've chosen, and any other environment variables that you need set, etc.
Using your performance assessment of the communication primitives, and a model of the speed of the processor, develop a model of the run time of your 3D FFT given N. Plot the estimated running time and the real running time for two different input sizes. What caused your model to behave incorrectly?