1 | "Back of the envelope estimates" where you get good intuition for why the performance is what it is |
2 | and Precise Simulations |
3 | Communication Overhead in classical data parallel (edge over area) with NO memory hierarchy NO latency where dim is geometric dimension and tcomm and tfloat are |
4 | typical communication and computation times |