"Back of the envelope estimates" where you get good intuition for why the performance is what it is |
and Precise Simulations |
Communication Overhead in classical data parallel (edge over area) with NO memory hierarchy NO latency where dim is geometric dimension and tcomm and tfloat are |
typical communication and computation times |