Subject: Re: C461:Emmerald: A Fast Matrix-Matrix Multiply Using Intel SIMD Technology
Resent-Date: Fri, 21 Jul 2000 10:36:24 -0400
Resent-From: Geoffrey Fox
Resent-To: Geoffrey Fox
Date: Wed, 12 Jul 2000 15:44:00 +1000
From: Doug ABERDEEN
To: fox@csit.fsu.edu

On Sat, Jun 17, 2000 at 12:51:40PM -0400, Geoffrey Fox(Concurrency) wrote:
> C461:Emmerald: A Fast Matrix-Matrix Multiply Using Intel SIMD Technology
>
> We would be happy to publish your paper if you addressed the changes
> suggested by the referee. I think this is quite easy!
> Please include a discussion of your changes and how they answer the
> referee in your resubmittal.

Please find the redrafted paper attached. My responses to the referee's
report are given below.

> C461 Referee Report
> -------------------
>
> This is a useful well written paper. I would suggest that the authors
> add a short discussion as to what other applications and chip architectures
> (e.g. Sun or ICM) would benefit from their techniques and if improvement factors
> would be equally impressive.

A new section (7) is now dedicated to a discussion of how to port the
software to other SIMD architectures, with a brief example suggesting that
the Altivec (G4) instructions would yield better performance than the
Intel SSE instructions.

Section 6, which was previously Future Work, has been changed to describe
the work carried out since the initial submission. It discusses an
application of the work which performs distributed neural network training
on a 196-processor Beowulf cluster, achieving a price/performance ratio of
USD$1 / MFlop/s (single precision).

> As a minor point, I would suggest the phrase "Intel SIMD Technology" used
> in title and abstract is obscure to most readers. It is clear from paper but
> title starts one thinking of iWARP i860 and other Intel adventures.
> Something like "optimal Pentium Floating Point" or equivalent would be clearer

The title has been changed to "Emmerald: A Fast Matrix-Matrix Multiply
Using Intel's SSE Instructions". This perhaps does not address the comment
completely, but we feel it is important to give some indication that we are
using the special features of the processor, and the change should make it
clear that we are using new instructions rather than some obscure new
processor.

On a final note, some other sections, mostly the results section, have been
changed to reflect improvements in the performance achieved since the
initial submission.

Thanks!
Doug Aberdeen

--
-Doug
-- http://beaker.anu.edu.au, Ph:(02) 6279-8608, Fax:(02) 6279-8651
Good languages grow obsolete, a good algorithm is immortal.

---------------------------------------------------------------------
Name: matrixmult.ps.gz
Type: Postscript Document (application/postscript)
Encoding: base64
Description: C461 redraft: Emmerald: A Fast Matrix-Matrix Multiply Using Intel's SSE Instructions