This will depend on only three parameters: |
n, the grain size: the amount of the problem stored on each processor (bounded by local memory) |
tfloat, the typical time to do one calculation on one node |
tcomm, the typical time to communicate one word between two nodes |
The most important omission here is communication latency |
Time to communicate = tlatency + (Num Words) × tcomm
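As a minimal sketch of the communication-time formula, the Python function below evaluates it for a message of a given length. The function name `comm_time`, its argument names, and the numbers in the usage lines are illustrative assumptions, not values from the original; setting `t_latency` to zero recovers the simplified model that ignores latency.

```python
def comm_time(num_words, t_comm, t_latency=0.0):
    """Time to send num_words between two nodes.

    t_comm    -- time to communicate one word (tcomm in the text)
    t_latency -- fixed start-up cost per message (tlatency in the text);
                 the default of 0.0 is the simplified model without latency
    All times must be in the same (arbitrary) unit.
    """
    if num_words < 0:
        raise ValueError("num_words must be non-negative")
    return t_latency + num_words * t_comm


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    print(comm_time(1000, t_comm=0.5))                   # latency ignored
    print(comm_time(1000, t_comm=0.5, t_latency=100.0))  # latency included
```

For short messages the fixed `t_latency` term dominates the total, which is why leaving it out is the most significant simplification in the three-parameter model.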
[Figure: two nodes, A and B, each with a CPU (tfloat) and a memory (n), connected by a communication link (tcomm)]