Subject: Re: Request to review a paper C506
From: Victor Eijkhout
Date: Mon, 04 Jun 2001 11:54:09 -0400
To: fox@csit.fsu.edu

> Jack Dongarra suggested that you could referee this paper from Japan for a Special issue I
> am preparing for Concurrency and Computation: Practice and Experience

"Practice and experience". Ok, with that in mind ...

Referee report on

Performance Optimization of GeoFEM on Various Computer Architecture
by Kazuo Minami

This paper discusses various techniques that were applied to the GeoFEM code to improve its performance. The architectures alluded to in the title are the SX-4, VPP5000, SR8000, and the DEC Alpha.

First off, the English of this paper is really bad. In many sentences the reader has to guess the meaning.

Then, the author omits explanation of various relevant details. For one, there is mention of forward and backward substitution, but there is no explicit mention of what the preconditioner is. I presume a block Jacobi with an ILU local solve, but I would like to have an explicit statement. More importantly, the author never describes the shape of the problem domain, the discretisation used on it, and the exact data structure, all important factors for the understanding of the code transformations. Section 3.2 mentions the "cyclic multicolor on hyperplace/RCM", but this phrase is never explained.

The author should delete the explanation of pipelining (section 3.1) and replace it with a clear explanation of the data structure. Without this, the code fragments, starting with figure 1, are unintelligible. Figure 3 especially puzzles me. I do not recognise any kind of solver in this.

The main content of this paper is the five "performance factors" shown at the end of section 3.6, and the code transformations used to satisfy them. However, there is insufficient explanation of how the transformations achieve this. By the way, in factor (4), "latency" should be "ratio", I think.

The data structure transformation of section 3.7 might be interesting, but the author should first explain what the original data structure was, and devote more space to how and why the transformation is beneficial.

The beginning of section 3.8 is a good example of the author's writing style. "Both (1) and (2) of performance factor [...] was implemented to the model coding of Fig.4." Apart from this being execrable English, it would be so much easier to understand this if the author would simply substitute what performance factors 1 and 2 and figure 4 are about. Now the reader has to leaf back and forth through the paper to understand this sort of comment.

According to its caption, figure 12 is the code of "Model coding of direct access". The author should mention that the algorithm is the substitution part of the solver.

One more general comment: performance tests are done on two vector pipeline machines and one superscalar chip. It looks to me as though all the transformations are inspired solely by the pipelines. The author in no place remarks on architectural differences, and whether for the Alpha chip different optimisations would have been preferable.

In sum, the code transformations presented here are moderately interesting, and could be of use to readers, but the presentation precludes any such usefulness.

> Referee Recommendations. Please indicate overall recommendations here, and
> details in following sections.
>
> 1. publish as is
> 2. accepted provided changes suggested are made
> 3. reject

Reject for now. I'm willing to take another look at it provided the author makes some drastic changes in the presentation.

> D: Referee Comments (For Editor Only)
>
> E: Referee Comments (For Author and Editor)

See above.

> F: Presentation Changes

This needs a lot of work on English and style, and the author needs to be much clearer in his explanations.