On a coarse grained MIMD machine, there is a more efficient parallel Metropolis algorithm. This requires a checkerboard partitioning of the lattice. The data communication is done as large blocks rather than single values, by passing a block of edge data to neighboring processors after each red or black update.
This blocked communication greatly reduces the latency time for communication, thereby improving efficiency. It is also possible to overlap communications with computation, by computing edge values first, and sending them while the interior points are being computed.