Using Gordon Erlebacher's nice summary, I find it rather surprising that the Compaq compute rate is most impressive compared to IBM/SGI on real applications like Climate and Sweep3D or the BT/SP NAS benchmarks. Here the IBM/Compaq ratio is around 2. On kernels like Livermore and the earlier NAS benchmarks, IBM is only 20% slower than Compaq. The reason is presumably that the multiple floating-point units on the IBM chip are more effective on classic compute kernels.

I do not understand why IBM does so well on the throughput test; I don't know what codes this test involved.

If you look at the benchmarks, IBM improves compared to Compaq as you increase the number of nodes. This is presumably because Compaq wins by more on CPU capability than on the switch.

We quantify this using the following formula, valid for classic parallelization (using MPI) of algorithms with "no algorithmic parallel overhead". I believe the NAS benchmark suite and most PDE algorithms are of this type.

    T(P CPUs) = A*TCALC/P + B*TCOMM/(P**(1-1/d))

Here:

    A and B depend on the problem and problem size but are independent of P (machine size) and of the nature of the machine.
    TCALC is the typical calculation time, set by CPU speed (which has an obvious manufacturer dependence and some problem dependence, as it depends on how well the cache is used etc.).
    TCOMM is the typical communication time (which has an obvious manufacturer dependence and some problem dependence, as it depends on message size and latency/bandwidth trade-offs).
    d is the "dimension of decomposition" and is typically 1, 2 or 3.

Taking "COMM Value" = T(P2) - T(P1)*P1/P2 with P2 > P1, we find that the A*TCALC/P2 terms subtract out, so this cancels TCALC and is

    "COMM Value" = TCOMM * B * ((1/P2)**(1-1/d)) * (1 - (P1/P2)**(1/d))

The P1 and P2 dependence of "COMM Value" depends on the assumption that communication is an edge effect, but all that matters here is that "COMM Value" only reflects TCOMM and cancels TCALC due to the assumed perfect algorithmic parallelization. Further, B * ((1/P2)**(1-1/d)) * (1 - (P1/P2)**(1/d)) is machine independent. Thus we can use the ratio of "COMM Values" as a measure of the ratio of realized communication performance.
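The cancellation of TCALC can be checked numerically. The following is a minimal Python sketch of the model above, not part of the original analysis: the function names and the input numbers (P1, P2, d, and the products A*TCALC and B*TCOMM) are hypothetical and chosen only for illustration. It computes "COMM Value" from two model timings and compares it with the closed form.

```python
import math

def model_time(p, a_tcalc, b_tcomm, d):
    """Model run time on p CPUs: A*TCALC/P + B*TCOMM/P**(1-1/d)."""
    return a_tcalc / p + b_tcomm / p ** (1.0 - 1.0 / d)

def comm_value(t_p1, t_p2, p1, p2):
    """COMM Value = T(P2) - T(P1)*P1/P2, computed from two timings."""
    return t_p2 - t_p1 * p1 / p2

def comm_value_closed_form(b_tcomm, p1, p2, d):
    """B*TCOMM * (1/P2)**(1-1/d) * (1 - (P1/P2)**(1/d))."""
    return b_tcomm * (1.0 / p2) ** (1.0 - 1.0 / d) * (1.0 - (p1 / p2) ** (1.0 / d))

# Hypothetical inputs for illustration only (not benchmark data).
P1, P2, D = 4, 128, 3
B_TCOMM = 50.0

# Changing A*TCALC should leave the COMM Value unchanged.
for a_tcalc in (100.0, 1000.0):
    t1 = model_time(P1, a_tcalc, B_TCOMM, D)
    t2 = model_time(P2, a_tcalc, B_TCOMM, D)
    print(a_tcalc, comm_value(t1, t2, P1, P2),
          comm_value_closed_form(B_TCOMM, P1, P2, D))
```

Both printed COMM Values should agree with the closed form to rounding error whatever A*TCALC is, which is the property that lets the ratio of COMM Values between machines stand in for the ratio of TCOMM.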
Below we take the lowest available P1 >= 4 to avoid intranode parallelization on the 4-way SMPs for IBM/Compaq.

Looking at the incomplete compilation of "COMM Values", we see that the benchmarks imply that on switch performance IBM and Compaq are competitive and SGI is mediocre.

If you scale to the final machine, I think it is worrying for Compaq that they are not improving their switch performance, as it is easy to see on, say, Sweep3D that communication cost dominates calculation cost for 128 nodes. I therefore think Compaq is incorrect in extrapolating their results to the "final configuration" by a simple factor. The results will have very different and poorer scaling.

COMP Values
                          Compaq   IBM     IBM/Compaq   SGI     SGI/Compaq
64**3 Sweep3D 4 CPU       4.5      10.5    2.3          7.8     1.7
128**3 Sweep3D 4 CPU      39       95      2.4          69.6    1.8
256**3 Sweep3D 4 CPU      309      827     2.7          563     1.8

Inverse Livermore         Compaq   IBM     SGI
Harmonic                  1        1.24    1.21
Geometric                 1        1.2     1.10

COMM Value (secs)
                                                Compaq   IBM     SGI
Climate, 64 v. 32 CPUs                          497      695     1042
Aero, 128 v. 32 CPUs                            14       32      38
64**3 Sweep3D, 4 CPU compared to 128 CPUs       0.46     0.47    2.9
128**3 Sweep3D, 4 CPU compared to 128 CPUs      1.08     0.84    4.2
256**3 Sweep3D, 4 CPU compared to 128 CPUs      4.9      3.2     9.4
512**3 Sweep3D, 16 CPU compared to 128 CPUs     24       34      59
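The ratio of COMM Values between machines can be tabulated directly from the figures above. This is a small Python sketch, with the caveat that the row labels reflect one reading of the COMM Value table (in particular, that the 64 v. 32 CPU figures belong to Climate and the 128 v. 32 CPU figures to Aero); the dictionary and function names are hypothetical.

```python
# COMM Values in seconds as (Compaq, IBM, SGI), copied from the figures above.
# Row labels are an assumed reading of the table layout.
COMM_VALUES = {
    "Climate, 64 v. 32 CPUs": (497, 695, 1042),
    "Aero, 128 v. 32 CPUs": (14, 32, 38),
    "64**3 Sweep3D, 4 v. 128 CPUs": (0.46, 0.47, 2.9),
    "128**3 Sweep3D, 4 v. 128 CPUs": (1.08, 0.84, 4.2),
    "256**3 Sweep3D, 4 v. 128 CPUs": (4.9, 3.2, 9.4),
    "512**3 Sweep3D, 16 v. 128 CPUs": (24, 34, 59),
}

def ratios_to_compaq(row):
    """Return (IBM/Compaq, SGI/Compaq) for one (Compaq, IBM, SGI) row."""
    compaq, ibm, sgi = row
    return ibm / compaq, sgi / compaq

for label, row in COMM_VALUES.items():
    ibm_ratio, sgi_ratio = ratios_to_compaq(row)
    print(f"{label:32s} IBM/Compaq {ibm_ratio:5.2f}   SGI/Compaq {sgi_ratio:5.2f}")
```

Since COMM Value is a time, a ratio above 1 means that machine spends longer in communication than Compaq on that benchmark, and a ratio below 1 means it spends less.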