1 | For the communication phases $p$ with $d-P \le p < d$, the time per phase is $T_\text{parallel} = N(T_\text{calc} + T_\text{comm})/N_\text{proc}$; again the work is perfectly parallel (load balanced). |
2 | $T_\text{comm}$ is the time to swap one complex word between two processors (to amortize message latency, the swap would be done as a single block transfer of $N/N_\text{proc}$ words). |
3 | $T_\text{calc} = T_+ + T_*$, since each processor must do one multiplication and one addition or subtraction per point. This means parallelism is lost in the multiplication: both processors of a pair must compute the same product. |
4 | Communication overhead: $f_\text{comm} = T_\text{parallel} N_\text{proc}/T_\text{sequential} - 1$ |
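To make the duplicated multiplication concrete, here is a minimal sketch of one communication phase on a single processor pair. It assumes processor 0 holds the "a" inputs and processor 1 the "b" inputs of each butterfly; the function names and the twiddle values are hypothetical, chosen only for illustration (in a real FFT the twiddles depend on the phase and index).

```python
def butterfly_pair(a_block, b_block, twiddles):
    """One communication phase on a processor pair (illustrative sketch).

    Processor 0 owns a_block, processor 1 owns b_block. After the block
    swap (the T_comm step), EACH processor computes w * b itself, so the
    multiplication is done twice; proc 0 then adds, proc 1 subtracts.
    """
    # Block swap: each processor receives a full copy of its partner's block.
    recv_on_0 = list(b_block)
    recv_on_1 = list(a_block)
    # Processor 0: a + w*b  (one multiply + one add per point)
    out0 = [a + w * b for a, b, w in zip(a_block, recv_on_0, twiddles)]
    # Processor 1: a - w*b  (the same multiply again + one subtract)
    out1 = [a - w * b for a, b, w in zip(recv_on_1, b_block, twiddles)]
    return out0, out1


def butterfly_sequential(a_block, b_block, twiddles):
    """The same phase on one processor: one multiply shared by both outputs."""
    out0, out1 = [], []
    for a, b, w in zip(a_block, b_block, twiddles):
        t = w * b  # computed once, reused for the sum and the difference
        out0.append(a + t)
        out1.append(a - t)
    return out0, out1


if __name__ == "__main__":
    a = [1 + 0j, 2 + 0j]
    b = [3 + 0j, 4 + 0j]
    w = [1 + 0j, 1j]  # hypothetical twiddle factors
    print(butterfly_pair(a, b, w))
    print(butterfly_sequential(a, b, w))
```

Both versions give identical results; the difference is that the pair performs $2 \times N/N_\text{proc}$ multiplications between them where the sequential code performs $N/N_\text{proc}$.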
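The formulas above can be evaluated numerically with a small model. This sketch rests on two assumptions not stated in these rows: the sequential cost of one phase is taken as $N/2$ butterflies of one multiply and two add/subtracts, i.e. $T_\text{sequential} = N(T_+ + T_*/2)$, and a message of $k$ words is assumed to cost $t_\text{latency} + k\,t_\text{word}$. Under these assumptions $f_\text{comm} = (T_*/2 + T_\text{comm})/(T_+ + T_*/2)$, so the duplicated multiplication shows up as overhead even when communication is free. All function and parameter names are hypothetical.

```python
def effective_tcomm(n, nproc, t_word, t_latency, block=True):
    """Per-word swap time under an ASSUMED linear message-cost model:
    a message of k words costs t_latency + k * t_word."""
    words = n // nproc  # words each processor must exchange per phase
    if block:
        # One block transfer: latency paid once, spread over all words.
        return (t_latency + words * t_word) / words
    # Word-by-word transfer: latency paid for every word.
    return t_latency + t_word


def phase_times(n, nproc, t_add, t_mul, t_comm):
    """Per-phase T_parallel, T_sequential and f_comm for a communication phase."""
    t_calc = t_add + t_mul  # each processor multiplies AND adds/subtracts
    t_parallel = n * (t_calc + t_comm) / nproc
    # ASSUMED sequential cost: n/2 butterflies, 1 multiply + 2 add/subtract each.
    t_sequential = (n / 2) * (t_mul + 2 * t_add)
    f_comm = t_parallel * nproc / t_sequential - 1
    return t_parallel, t_sequential, f_comm


if __name__ == "__main__":
    # Free communication: overhead comes only from the duplicated multiply.
    _, _, f = phase_times(n=1024, nproc=4, t_add=1.0, t_mul=1.0, t_comm=0.0)
    print(round(f, 3))  # 0.333
    print("efficiency:", 1 / (1 + f))

    # Block versus word-by-word swap with a large (hypothetical) latency.
    print(effective_tcomm(1024, 4, t_word=1.0, t_latency=100.0, block=True))
    print(effective_tcomm(1024, 4, t_word=1.0, t_latency=100.0, block=False))
```

Since the overhead is defined relative to $T_\text{sequential}$, the efficiency follows directly as $\varepsilon = T_\text{sequential}/(N_\text{proc} T_\text{parallel}) = 1/(1 + f_\text{comm})$.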