Consider an $N \times N$ array of grid points on $P$ processors, with grain size $n = N^2/P$
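Sequential Time $T_1 = 4 (N-2)^2\, t_{float}$, roughly $4 N^2\, t_{float}$ for large $N$ (only interior points are updated)
Parallel Time $T_P = 4 n\, t_{float} + 4 \sqrt{n}\, t_{comm}$
Speed up $S = T_1 / T_P = P\, (1 - 2/N)^2 / \left(1 + t_{comm} / (\sqrt{n}\, t_{float})\right)$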
Both overheads decrease like $1/\sqrt{n}$ as $n$ increases (at fixed $P$, $2/N = 2/\sqrt{nP}$)
This ignores communication latency but is otherwise accurate
Speed up is reduced from $P$ by both overheads
Load imbalance is the factor $(1 - 2/N)^2$; communication overhead is the term $t_{comm} / (\sqrt{n}\, t_{float})$
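
The speed-up formula above can be checked numerically. The following is a minimal Python sketch, not taken from the original slides: the function names `speedup` and `speedup_from_times` and the sample values of `t_float` and `t_comm` are hypothetical, and it assumes that the sequential time counts only the $(N-2)^2$ interior points, which is what produces the $(1 - 2/N)^2$ factor.

```python
import math


def speedup(N, P, t_float, t_comm):
    """Closed form: S = P (1 - 2/N)^2 / (1 + t_comm / (sqrt(n) t_float))."""
    n = N * N / P                                       # grain size per processor
    load_factor = (1 - 2 / N) ** 2                      # load imbalance factor
    comm_overhead = t_comm / (math.sqrt(n) * t_float)   # communication overhead
    return P * load_factor / (1 + comm_overhead)


def speedup_from_times(N, P, t_float, t_comm):
    """Same quantity computed directly as T_1 / T_P."""
    n = N * N / P
    # Assumption: only the (N-2)^2 interior points are updated sequentially.
    t1 = 4 * (N - 2) ** 2 * t_float
    tp = 4 * n * t_float + 4 * math.sqrt(n) * t_comm
    return t1 / tp


if __name__ == "__main__":
    P = 16
    # Illustrative values only: communication as costly as one flop.
    for N in (16, 64, 256, 1024):
        s = speedup(N, P, t_float=1.0, t_comm=1.0)
        print(f"N={N:5d}  n={N * N // P:7d}  S={s:7.3f}  efficiency={s / P:.3f}")
```

Running the loop shows the efficiency $S/P$ approaching 1 as the grain size grows at fixed $P$, consistent with both overheads falling off like $1/\sqrt{n}$.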