Using Gordon Erlebacher's nice summary, I find it rather surprising that Compaq's
compute rate is most impressive compared to IBM/SGI on real applications like Climate
and Sweep3D or the BT/SP NAS benchmarks. Here the IBM/Compaq ratio is around 2.
On kernels like Livermore and the earlier NAS benchmarks, IBM is only 20% slower than
Compaq. The reason presumably is that the multiple floating point units on the IBM chip are
more effective on classic compute kernels.
I do not understand why IBM does so well on the throughput test, but I don't know what codes this test involved.

If you look at the benchmarks, IBM improves relative to Compaq as you increase the number of nodes.
This is presumably because Compaq wins by more on CPU capability than on the switch.
We quantify this using the following formula, valid for classic parallelization (using MPI)
of algorithms with "no algorithmic parallel overhead". I believe the NAS benchmark suite
and most PDE algorithms are of this type.
T(P CPUs) = A*TCALC/P + B*TCOMM/(P**(1-1/d))
Here A and B depend on the problem and problem size but are independent of P (machine size) and
the nature of the machine.
TCALC is the typical calculation time set by CPU speed (which has an obvious manufacturer dependence and some problem dependence,
	as it depends on how well the cache is used etc.)
TCOMM is the typical communication time (which has an obvious manufacturer dependence and some problem
	dependence, as it depends on message size and latency/bandwidth trade-offs)
d is the "dimension of decomposition" and is typically 1, 2 or 3
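As a rough illustration of the formula above, here is a minimal Python sketch that just evaluates T(P). The function name model_time and all the numbers in the loop are my own placeholders for illustration; they are not fitted to any of the benchmark results.

```python
def model_time(p, a, b, tcalc, tcomm, d):
    """Evaluate T(P) = A*TCALC/P + B*TCOMM/(P**(1-1/d))."""
    calc = a * tcalc / p                      # perfectly parallel compute term
    comm = b * tcomm / p ** (1.0 - 1.0 / d)   # edge (surface) communication term
    return calc + comm


if __name__ == "__main__":
    # Hypothetical values of A, B, TCALC, TCOMM, chosen only to show the shape
    for p in (4, 32, 128):
        t = model_time(p, a=1000.0, b=10.0, tcalc=1.0, tcomm=1.0, d=3)
        print(f"P = {p:4d}   T(P) = {t:.3f}")
```

Note that for d = 1 the communication term does not shrink with P at all, while for d = 3 it falls off as P**(2/3), slower than the 1/P of the compute term.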

Taking "COMM Value" = 	T(P2) - T(P1)*P1/P2	with P2 > P1, we find this cancels TCALC and is
			TCOMM * B * (1/P2)**(1-1/d) * (1 - (P1/P2)**(1/d))
The P1 and P2 dependence of "COMM Value" depends on the assumption that communication is an edge effect,
but all that matters here is that "COMM Value" only reflects TCOMM and cancels TCALC due to the
assumed perfect algorithmic parallelization. Further, B * (1/P2)**(1-1/d) * (1 - (P1/P2)**(1/d))
is machine independent. Thus we can use the ratio of "COMM Values" as a measure of the ratio of realized
communication performance.
Below we take the lowest available P1 >= 4 to avoid intranode parallelization on the 4-way SMPs for IBM/Compaq.
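The cancellation can be checked numerically. The sketch below is only a sanity check under the model's own assumptions: it builds T(P) for two made-up machines X and Y (same A and B, different TCALC and TCOMM), forms the "COMM Value" from the two timings, and compares it with the closed form. All function names and parameter values are hypothetical.

```python
def model_time(p, a, b, tcalc, tcomm, d):
    """T(P) = A*TCALC/P + B*TCOMM/(P**(1-1/d))."""
    return a * tcalc / p + b * tcomm / p ** (1.0 - 1.0 / d)


def comm_value(t_p1, t_p2, p1, p2):
    """ "COMM Value" = T(P2) - T(P1)*P1/P2, from two measured times."""
    return t_p2 - t_p1 * p1 / p2


def comm_value_closed_form(p1, p2, b, tcomm, d):
    """TCOMM * B * (1/P2)**(1-1/d) * (1 - (P1/P2)**(1/d))."""
    return tcomm * b * (1.0 / p2) ** (1.0 - 1.0 / d) * (1.0 - (p1 / p2) ** (1.0 / d))


def comm_value_for(p1, p2, a, b, tcalc, tcomm, d):
    """COMM Value computed from model timings at P1 and P2."""
    t1 = model_time(p1, a, b, tcalc, tcomm, d)
    t2 = model_time(p2, a, b, tcalc, tcomm, d)
    return comm_value(t1, t2, p1, p2)


if __name__ == "__main__":
    a, b, d, p1, p2 = 1000.0, 10.0, 3, 4, 128       # hypothetical problem
    x = comm_value_for(p1, p2, a, b, tcalc=1.0, tcomm=1.0, d=d)  # machine X
    y = comm_value_for(p1, p2, a, b, tcalc=2.0, tcomm=1.5, d=d)  # machine Y
    print("COMM Value X:", x, "closed form:", comm_value_closed_form(p1, p2, b, 1.0, d))
    print("ratio Y/X:", y / x, "(TCOMM ratio is 1.5)")
```

The ratio Y/X comes out as the ratio of the two TCOMM values, whatever TCALC is, which is the property used in the tables.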

Looking at the incomplete compilation of "COMM Values", we see that the benchmarks imply that
on switch performance IBM and Compaq are competitive and SGI is mediocre.

If you scale to the final machine, I think it is worrying for Compaq that they are not improving
their switch performance, as it is easy to see on, say, Sweep3D that communication cost dominates
calculation cost for 128 nodes. I therefore think Compaq is incorrect in extrapolating their
results to the "final configuration" as a simple factor. The results will have very different and
poorer scaling.
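To make the scaling point concrete: in the model above the ratio of communication to calculation time is (B*TCOMM)/(A*TCALC) * P**(1/d), so it grows with machine size and a single multiplicative factor cannot capture it. A small Python sketch of this, with hypothetical function names and parameter values of my own:

```python
def comm_to_calc_ratio(p, a, b, tcalc, tcomm, d):
    """Communication time divided by calculation time at P CPUs."""
    calc = a * tcalc / p
    comm = b * tcomm / p ** (1.0 - 1.0 / d)
    return comm / calc


def ratio_growth(p1, p2, d):
    """Factor by which the comm/calc ratio grows going from P1 to P2 CPUs."""
    return (p2 / p1) ** (1.0 / d)


if __name__ == "__main__":
    # Hypothetical A, B, TCALC, TCOMM; only the trend with P matters here
    for p in (4, 32, 128, 1024):
        r = comm_to_calc_ratio(p, a=1000.0, b=10.0, tcalc=1.0, tcomm=1.0, d=3)
        print(f"P = {p:5d}   comm/calc = {r:.4f}")
    print("growth from 4 to 128 CPUs, d = 3:", ratio_growth(4, 128, 3))
```

For d = 3, going from 4 to 128 CPUs raises the ratio by 32**(1/3), a bit over 3, and it keeps rising for larger P.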


COMP Values		Compaq	IBM	IBM/Compaq	SGI	SGI/Compaq
64**3 Sweep3D 4 CPU	4.5	10.5	2.3		7.8	1.7
128**3 Sweep3D 4 CPU	39	95	2.4		69.6	1.8
256**3 Sweep3D 4 CPU	309	827	2.7		563	1.8
Inverse Livermore 	
	Harmonic	1		1.24			1.21
	Geometric	1		1.2			1.10





COMM Value (secs)			Compaq		IBM		SGI
Climate, 64 v. 32 CPUs			497		695		1042
Aero, 128 v. 32 CPUs			14		32		38
64**3 Sweep3D, 4 v. 128 CPUs		0.46		0.47		2.9
128**3 Sweep3D, 4 v. 128 CPUs		1.08		0.84		4.2
256**3 Sweep3D, 4 v. 128 CPUs		4.9		3.2		9.4
512**3 Sweep3D, 16 v. 128 CPUs		24		34		59
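
For convenience, the ratios of "COMM Values" relative to Compaq can be tabulated with a few lines of Python. The numbers are copied from the table above; the row labels are my reading of the table layout, and the function name ratios_to_compaq is just a placeholder.

```python
# COMM Values in secs as (Compaq, IBM, SGI), copied from the table
comm_values = {
    "Climate, 64 v. 32 CPUs": (497, 695, 1042),
    "Aero, 128 v. 32 CPUs": (14, 32, 38),
    "64**3 Sweep3D, 4 v. 128 CPUs": (0.46, 0.47, 2.9),
    "128**3 Sweep3D, 4 v. 128 CPUs": (1.08, 0.84, 4.2),
    "256**3 Sweep3D, 4 v. 128 CPUs": (4.9, 3.2, 9.4),
    "512**3 Sweep3D, 16 v. 128 CPUs": (24, 34, 59),
}


def ratios_to_compaq(values):
    """Return {label: (IBM/Compaq, SGI/Compaq)} for each benchmark row."""
    return {
        label: (ibm / compaq, sgi / compaq)
        for label, (compaq, ibm, sgi) in values.items()
    }


if __name__ == "__main__":
    for label, (ibm_r, sgi_r) in ratios_to_compaq(comm_values).items():
        print(f"{label:32s} IBM/Compaq {ibm_r:5.2f}   SGI/Compaq {sgi_r:5.2f}")
```

A ratio above 1 means more time attributed to communication than on Compaq for that row.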