Performance of Simplest Parallel DIT FFT V
Alternatively, we could send the even-indexed members of fa to b and the odd-indexed members of fb to a.
- Processors a and b would then each take the resultant vector members they hold and combine them in pairs to get the FFT components
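The exchange above can be sketched as a small single-process simulation. This is an illustration only, not the foil's own code: the function name half_exchange_butterfly, the dictionaries standing in for messages, and the twiddle convention w = exp(-2*pi*i/(2m)) are all assumptions made for the example. It keeps the odd-indexed fa on a and the even-indexed fb on b, so each processor ends up with matching pairs for half of the indices.

```python
import cmath

def half_exchange_butterfly(fa, fb):
    """Simulate one DIT butterfly step between two processors a and b.

    fa is held by processor a, fb by processor b, both of length m.
    Returns (lo, hi, words_sent) with
        lo[k] = fa[k] + w^k * fb[k]
        hi[k] = fa[k] - w^k * fb[k]
    where w = exp(-2*pi*i / (2*m)) (assumed sign convention).
    """
    m = len(fa)
    w = [cmath.exp(-2j * cmath.pi * k / (2 * m)) for k in range(m)]

    # "Messages": a sends its even-indexed entries to b,
    # b sends its odd-indexed entries to a.
    a_to_b = {k: fa[k] for k in range(0, m, 2)}
    b_to_a = {k: fb[k] for k in range(1, m, 2)}
    words_sent = len(a_to_b) + len(b_to_a)

    lo = [None] * m
    hi = [None] * m

    # Processor a pairs its own odd-indexed fa[k] with the received fb[k].
    for k, fb_k in b_to_a.items():
        t = w[k] * fb_k
        lo[k], hi[k] = fa[k] + t, fa[k] - t

    # Processor b pairs the received even-indexed fa[k] with its own fb[k].
    for k, fa_k in a_to_b.items():
        t = w[k] * fb[k]
        lo[k], hi[k] = fa_k + t, fa_k - t

    return lo, hi, words_sent
```

With vectors of length m, this moves m words in total (m/2 in each direction), whereas swapping the full vectors would move 2m, and both processors do the same number of butterflies.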
Communication overhead f_comm = (T_parallel * N_proc / T_sequential) - 1 is now given by
We have avoided load imbalance and halved the communication
compared to the simple algorithm. In later foils we will find even better
methods that get rid of the log2 N term.
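As a rough illustration of why halving the communication halves the overhead, the definition f_comm = T_parallel * N_proc / T_sequential - 1 can be evaluated directly. The sketch below assumes perfect load balance, so that T_parallel = T_sequential / N_proc + T_comm; the timing values are hypothetical placeholders, not measurements from these foils.

```python
def comm_overhead(t_parallel, n_proc, t_sequential):
    """Communication overhead f_comm = T_parallel * N_proc / T_sequential - 1."""
    return t_parallel * n_proc / t_sequential - 1.0

def parallel_time(t_sequential, n_proc, t_comm):
    """Assumed model: perfectly balanced compute plus a communication term."""
    return t_sequential / n_proc + t_comm

# Hypothetical numbers, in arbitrary time units.
T_SEQ = 1000.0
N_PROC = 4
T_COMM_SIMPLE = 50.0

f_simple = comm_overhead(parallel_time(T_SEQ, N_PROC, T_COMM_SIMPLE), N_PROC, T_SEQ)
f_half = comm_overhead(parallel_time(T_SEQ, N_PROC, T_COMM_SIMPLE / 2), N_PROC, T_SEQ)

print("simple algorithm overhead:", f_simple)
print("half-exchange overhead:   ", f_half)
```

Under this model f_comm reduces to T_comm * N_proc / T_sequential, so it scales linearly with the communication time.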