Since the Metropolis algorithm for spin models is local and regular, it should parallelize very efficiently, even for the Ising model, which requires very little computation per site.
For a 2-d grid of processors, use a (BLOCK,BLOCK) distribution of the sites of a 2-d lattice over the processor grid, so that every processor has an $l \times l$ sub-lattice.
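As a minimal sketch of what a (BLOCK,BLOCK) distribution means in practice, the Python fragment below maps each processor of a P x P grid to the rows and columns of an L x L lattice that it owns. The function name `block_owner_ranges` and the parameters `L` and `P` are hypothetical, and the sketch assumes P divides L evenly, so that the sub-lattice side is l = L / P.

```python
def block_owner_ranges(L, P):
    """Map each processor (pi, pj) of a P x P grid to its block of an L x L lattice.

    Assumption for this sketch: P divides L evenly, so every block is l x l.
    """
    if L % P != 0:
        raise ValueError("this sketch assumes P divides L evenly")
    l = L // P  # side length of each sub-lattice
    return {
        (pi, pj): (range(pi * l, (pi + 1) * l), range(pj * l, (pj + 1) * l))
        for pi in range(P)
        for pj in range(P)
    }


# Example: an 8 x 8 lattice on a 2 x 2 processor grid gives 4 x 4 blocks.
blocks = block_owner_ranges(L=8, P=2)
rows, cols = blocks[(1, 0)]
print(list(rows), list(cols))  # [4, 5, 6, 7] [0, 1, 2, 3]
```

Each site belongs to exactly one processor, and only the sites on the boundary of a block have neighbours stored on another processor.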
Communication time $\propto$ number of edge sites of the sub-lattice (i.e. its perimeter) $\propto l$.
Calculation time $\propto$ number of sites of the sub-lattice (i.e. its volume) $\propto l^2$.
Thus the communication/calculation ratio scales as $1/l$, so as long as $l$ is large enough, the ratio will be small and the efficiency will be near 1.
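The perimeter-to-volume argument can be checked numerically with a short Python sketch. The helper names `edge_sites` and `comm_to_calc_ratio` are hypothetical, and the ratio here simply counts sites: it ignores the relative cost of sending a site versus updating one, which would multiply the ratio by a machine-dependent constant.

```python
def edge_sites(l):
    """Number of sites on the boundary of an l x l sub-lattice."""
    interior = max(l - 2, 0) ** 2  # sites with all neighbours on the same processor
    return l * l - interior


def comm_to_calc_ratio(l):
    """Edge sites divided by total sites (a site count, not a timing)."""
    return edge_sites(l) / (l * l)


for l in (4, 16, 64, 256):
    print(l, edge_sites(l), comm_to_calc_ratio(l))
```

For l >= 2 the number of edge sites is 4l - 4, so the printed ratio falls off roughly as 4/l as the sub-lattice grows.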