Tparallel (phases d-P? p< d) = N(Tcalc+Tcomm)/(Nproc) and again this is perfectly parallel (load balanced) |
Tcomm is time to swap one complex word between 2 processors (to reduce latency this would be done as a block transfer of N/Nproc words) |
Tcalc = T+ + T* as each processor must multiply and add/subtract. Note this implies that one loses parallelism in multiplication as both processors must do it. |
Communication overhead fcomm = Tparallel *Nproc/Tsequential -1 |