1 | "Back-of-the-envelope estimates", which give good intuition for why the performance is what it is, |
2 | and precise simulations |
3 | Communication overhead in the classical data-parallel case (edge over area), with NO memory hierarchy and NO latency, where dim is the geometric dimension and tcomm and tfloat are |
4 | typical communication and computation times: |
5 | $a \, \dfrac{t_{\mathrm{comm}}}{t_{\mathrm{float}}} \cdot \dfrac{1}{(\text{grain size})^{1/\mathrm{dim}}}$ |
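As a rough illustration, the estimate above can be read as overhead = a × (tcomm / tfloat) / (grain size)^(1/dim) and evaluated directly. The sketch below is only a back-of-the-envelope calculator: the function name `comm_overhead`, its parameter names, and the default `a = 1.0` are assumptions made for the example, and `a` is treated as a plain constant prefactor.

```python
def comm_overhead(grain_size, dim, t_comm, t_float, a=1.0):
    """Edge-over-area estimate of communication overhead.

    grain_size : number of data points held per processor
    dim        : geometric dimension of the problem
    t_comm     : typical time to communicate one value
    t_float    : typical time for one floating-point operation
    a          : constant prefactor (assumed to be 1.0 here for illustration)

    Ignores memory hierarchy and latency, as the estimate itself does.
    """
    if grain_size <= 0 or dim <= 0 or t_float <= 0:
        raise ValueError("grain_size, dim and t_float must be positive")
    # The surface-to-volume ratio of a dim-dimensional block of grain_size
    # points scales as grain_size ** (-1 / dim).
    return a * (t_comm / t_float) / grain_size ** (1.0 / dim)


if __name__ == "__main__":
    # Hypothetical machine where communicating a value costs 10 flops.
    for dim in (1, 2, 3):
        for grain in (100, 10_000, 1_000_000):
            f = comm_overhead(grain, dim, t_comm=10.0, t_float=1.0)
            print(f"dim={dim} grain={grain:>9} overhead={f:.4f}")
```

Under these assumptions, overhead falls as the grain size grows, and for a fixed grain size it falls more slowly in higher dimensions, because a larger share of each block sits on its edge.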