Performance of Simplest Parallel DIT FFT I
As discussed, we have two types of phases
The sequential time Tsequential(phase p) is identical for every phase p and equals N Tbutterfly/2
Tparallel(phases 0 ≤ p ≤ d-P-1) = N Tbutterfly/(2 Nproc), as these phases are perfectly parallel (load balanced)
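The two per-phase formulas above can be checked with a small calculation. This is a minimal sketch, assuming N = 2^d points and Nproc = 2^P processors (which is what the phase range 0 ≤ p ≤ d-P-1 suggests). The function name phase_times and its parameter names are hypothetical, chosen only for illustration.

```python
def phase_times(n_points, n_proc, t_butterfly):
    """Per-phase times for the communication-free phases.

    Assumes n_points = 2**d and n_proc = 2**P (an assumption, not
    stated explicitly in the text above).
    Returns (t_sequential, t_parallel, number_of_local_phases).
    """
    for value in (n_points, n_proc):
        if value < 1 or value & (value - 1):
            raise ValueError("n_points and n_proc must be powers of two")

    d = n_points.bit_length() - 1      # log2(N)
    big_p = n_proc.bit_length() - 1    # log2(Nproc)

    # N/2 butterflies per phase when run on one processor.
    t_sequential = n_points * t_butterfly / 2
    # The same work split evenly over Nproc processors.
    t_parallel = n_points * t_butterfly / (2 * n_proc)
    # Phases 0 .. d-P-1 need no communication: d-P phases in total.
    return t_sequential, t_parallel, d - big_p


t_seq, t_par, n_local = phase_times(1024, 8, 1.0)
print(t_seq, t_par, n_local)   # 512.0 64.0 7
print(t_seq / t_par)           # 8.0, i.e. a speedup of Nproc in these phases
```

The ratio of the two times is exactly Nproc, which is what "perfectly parallel" means for these phases.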
In the remaining phases, every butterfly computation requires communication, because fE and fO (DIT) must be exchanged between two processors. This must be done for every one of the N/Nproc points stored in each processor
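The split into local and communicating phases can be illustrated by counting, for each phase, how many of a processor's points have their butterfly partner on another processor. This is only a sketch under two assumptions that are not stated in the text above: each processor holds a contiguous block of N/Nproc indices, and phase p pairs indices that differ in bit p. The function name remote_points_per_phase is hypothetical.

```python
def remote_points_per_phase(n_points, n_proc, proc=0):
    """Count off-processor butterfly partners in each phase.

    Assumptions (for illustration only): block decomposition with
    contiguous indices, and phase p pairs index i with i XOR 2**p.
    """
    d = n_points.bit_length() - 1          # log2(N), N a power of two
    block = n_points // n_proc             # N/Nproc points per processor
    first = proc * block                   # first index owned by proc

    counts = []
    for p in range(d):
        remote = 0
        for i in range(first, first + block):
            partner = i ^ (1 << p)         # butterfly partner of point i
            if partner // block != proc:   # partner lives elsewhere
                remote += 1
        counts.append(remote)
    return counts


print(remote_points_per_phase(16, 4))      # [0, 0, 4, 4]
```

With N = 16 and Nproc = 4 (d = 4, P = 2), phases 0 and 1 need no communication, while phases 2 and 3 each exchange all N/Nproc = 4 points held by the processor.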