Subject: Re: C461:Emmerald: A Fast Matrix-Matrix Multiply Using Intel SIMD Technology
Resent-Date: Fri, 21 Jul 2000 10:36:24 -0400
Resent-From: Geoffrey Fox <fox@mailer.scri.fsu.edu>
Resent-To: Geoffrey Fox <gcfpc@csit.fsu.edu>
Date: Wed, 12 Jul 2000 15:44:00 +1000
From: Doug ABERDEEN <daa@discus.anu.edu.au>
To: fox@csit.fsu.edu

On Sat, Jun 17, 2000 at 12:51:40PM -0400, Geoffrey Fox(Concurrency) wrote:
> C461:Emmerald: A Fast Matrix-Matrix Multiply Using Intel SIMD Technology
>
> We would be happy to publish your paper if you addressed the changes
> suggested by the referee. I think this is quite easy!
> Please include a discussion of your changes and how they answer the
> referee in your resubmittal.

Please find the redrafted paper attached. I have discussed responses
to the referee's report below.

> C461 Referee Report
> -------------------
>
> This is a useful well written paper. I would suggest that the authors
> add a short discussion as to what other applications and chip architectures
> (e.g. Sun or ICM) would benefit from their techniques and if improvement factors
> would be equally impressive.

A new section (Section 7) is now dedicated to a discussion of how to port
the software to other SIMD architectures. It includes a brief example
arguing that the AltiVec (G4) instructions would yield better performance
than the Intel SSE instructions.

Section 6, which was previously Future Work, has been changed to
describe the work carried out since initial submission. It discusses
an application of the work that performs distributed neural network
training on a 196-processor Beowulf cluster, achieving a price/performance
ratio of USD $1 per MFlop/s (single precision).

> As a minor point, I would suggest the phrase "Intel SIMD Technology" used
> in title and abstract is obscure to most readers. It is clear from paper but
> title starts one thinking of iWARP i860 and other Intel adventures.
> Something like "optimal Pentium Floating Point" or equivalent would be clearer

The title has been changed to "Emmerald: A Fast Matrix-Matrix Multiply
Using Intel's SSE Instructions".

This perhaps does not address the comment completely, but we feel
it is important to indicate that we are using special features of
the processor. The change should also make it clear that we are using
new instructions rather than some obscure new processor.

On a final note, some other sections, mostly the results section, have
been changed to reflect performance improvements achieved since initial
submission.

Thanks!
Doug Aberdeen
--
-Doug  -- http://beaker.anu.edu.au, Ph:(02) 6279-8608, Fax:(02) 6279-8651
Good languages grow obsolete, a good algorithm is immortal.

    ---------------------------------------------------------------------
                          Name: matrixmult.ps.gz
                          Type: Postscript Document
                                (application/postscript)
   matrixmult.ps.gz   Encoding: base64
                   Description: C461 redraft: Emmerald: A Fast
                                Matrix-Matrix Multiply Using Intel's SSE
                                Instructions