REFEREE'S REPORT
Concurrency and Computation: Practice and Experience
---------------------------------------------------------------------------

A: General Information

Please return to:
Geoffrey C. Fox
Electronically Preferred fox@csit.fsu.edu
Concurrency and Computation: Practice and Experience
Computational Science and Information Technology
Florida State University
400 Dirac Science Library
Tallahassee Florida 32306-4130
Office FAX 850-644-0098
Office Phone 850-644-4587 but best is cell phone 3152546387

Please fill in Summary Conclusions (Sec. C) and details as appropriate in Secs. D, E and F.

B: Refereeing Philosophy

We encourage a broad range of readers and contributors. Please judge papers on their technical merit and separate comments on this from those on style and approach. Keep in mind the strong practical orientation that we are trying to give the journal. Note that the forms attached provide separate paper for comments that you wish only the editor to see and those that both the editor and author receive. Your identity will of course not be revealed to the author.

C: Paper and Referee Metadata

* Paper Number: C518
* Date: 27 August 2001
* Paper Title: A framework for high-performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low-level kernels
* Author(s): Vinod Valsalam and Anthony Skjellum
* Referee: Maurice Clint
* Address:
  School of Computer Science
  The Queen's University of Belfast
  Belfast BT7 1NN
  N Ireland, UK
  e-mail: m.clint@qub.ac.uk
  tel: +44-(0)2890-274651
  fax: +44-(0)2890-683890

Referee Recommendations. Please indicate overall recommendations here, and details in following sections.

2. accepted provided changes suggested are made

D: Referee Comments (For Editor Only)

Is there an editorial policy on the length of papers?
If the paper needs to be shortened there is an opportunity for the authors to give less generous coverage to the background to their work - though this may slightly impair its smooth readability.

The readability of the graphs would be improved through the use of colour. Does the journal support colour printing?

E: Referee Comments (For Author and Editor)

The paper addresses important issues arising in the context of developing efficient, flexible, portable mathematical library software for execution on machines having different architectural characteristics. The advent of grid computing has intensified interest in the areas addressed in the paper.

The considerable strength of the paper derives from the authors' successful attempt to integrate the recent work of others within a unified framework, to extend and refine their work and to demonstrate the effectiveness of the proposed approach to mathematical software development through an investigation of the implementation of matrix multiplication. The authors show deep familiarity with the important work of DS Wise et al., S Chatterjee et al. and J Rice. The paper has a pronounced practical orientation but the work rests on solid scientific foundations.

The employment of polyalgorithms in the development of mathematical software is likely to become widespread and the paper does much to show the effectiveness of their use. The authors have established, empirically, that their three-layer approach to the development of a matrix multiplication routine results in implementations which are competitive with, and often surpass in performance, the code generated by systems such as ATLAS, and there is scope for further improvement of their software.

There are a number of points which should be (briefly) addressed before publication:

(a) A more precise description of the oscillating execution of loops on page 22 should be given - though the diagram in Figure 7 compensates somewhat for the woolliness.
(b) For a GIVEN architecture, how is the best multiplier decided upon? Is this by experiment using different algorithmic combinations and then deciding which is best for particular orders of matrices? One presumes that a user of the software would be unaware of the particular algorithm(s) used - the choice would be a function of the matrix dimensions only?

(c) Is the general approach flexible enough to allow the selection of DIFFERENT algorithm combinations depending upon the availability of varying computing resources - on a massively parallel computer, say?

(d) How does the work reported sit with respect to GRID COMPUTING, where the mixture of resources allocated to an operation may be fluid during the execution of an applications program and where the architectures of the machines involved may be heterogeneous in nature?

F: Presentation Changes

(a) To shorten the paper the second paragraph on p 29 could be omitted; all except the second paragraph (on p 38) of section 8 could be left out; and the treatment of the work of others in the earlier sections of the paper could be trimmed.

(b) In the discussion of 'peeling' in the last paragraph on p 18 change the matrix name to A and make the requisite changes for the matrix elements.

(c) p 25 line 1: '... for full applicability' --> '...optimum performance'
    p 25 line 5: '...out of Strassen' --> '...from Strassen'
    p 28 section 6 line 9: '...poorly' --> '...less well'
    p 28 section 6 line 10: '...is the reverse' --> '...reversed'
    p 28 section 6 line 15: '...has slightly' --> '...have slightly'
    p 28 section 6 line 16: '...which offers' --> '...offering'
    p 28 section 6 last line: '..ensuing' --> '...arising'
    p 29 line 1: '..offers in constructing' --> '..offers for constructing'
    p 29 3rd last line: 'Any existing legacy...' --> 'Legacy....'
    p 30 section 7 line 4: '...accounted for in' --> '..included in'
    p 30 section 7 line 6: '...III machine, the details of which..' --> '...III machine. Details of these...'
    p 30 last line: '...which includes' --> '... which include'
    p 31 section 7.1 lines 10 & 11: '..ordered blocks, which..' --> '..ordered blocks, the numbers of which..'
    p 32 line 9: '...climbing back up again..' --> '..climbing again..'
    p 32 line 12: '..mitigates' --> '..alleviates'
    p 33 section 7.2 line 1: 'Now the best..' --> 'The best...'
    p 33 footnote: '..algorithms as precisely' --> '..algorithms precisely'
    p 34 line 1: '..smoothes' --> '..smooths'
    p 36 line 5: '..are a function of' --> '..are functions of'
    p 36 section 7.3 line 6: '..performance steady' --> '..steady performance'

(d) Readability would be significantly enhanced if the Figures showing performance results were in colour.