Referee 1 ***************************************************************

This is an interesting paper which MUST improve its use of English, but with this change it should be published. There are so many English problems that I cannot list them! As a start, the title could be something like: "Sequential Performance Optimization of GeoFEM on vector and scalar architectures for the Earth Simulator".

Referee 2 ***************************************************************

This paper discusses various techniques that were applied to the GeoFEM code to improve its performance. The architectures alluded to in the title are the SX-4, VPP5000, SR8000, and the DEC Alpha.

First off, the English of this paper is really bad. In many sentences the reader has to guess the meaning.

Second, the author omits explanations of various relevant details. For one, there is mention of forward and backward substitution, but there is no explicit mention of what the preconditioner is. I presume a block Jacobi with an ILU local solve, but I would like to have an explicit statement. More importantly, the author never describes the shape of the problem domain, the discretisation used on it, or the exact data structure, all of which are important for understanding the code transformations. Section 3.2 mentions the "cyclic multicolor on hyperplace/RCM", but this phrase is never explained. The author should delete the explanation of pipelining (section 3.1) and replace it with a clear explanation of the data structure. Without this, the code fragments, starting with figure 1, are unintelligible. Figure 3 especially puzzles me: I do not recognise any kind of solver in it. Perhaps there is another paper in the special issue that can be referenced for this needed background.

The main content of this paper is the five "performance factors" shown at the end of section 3.6, and the code transformations used to satisfy them. However, there is insufficient explanation of how the transformations achieve this.
By the way, in factor (4), "latency" should be "ratio", I think.

The data structure transformation of section 3.7 might be interesting, but the author should first explain what the original data structure was, and devote more space to how and why the transformation is beneficial.

The beginning of section 3.8 is a good example of the author's writing style: "Both (1) and (2) of performance factor [...] was implemented to the model coding of Fig.4." Apart from this being execrable English, it would be so much easier to understand if the author simply stated what performance factors 1 and 2 and figure 4 are about. As it stands, the reader has to leaf back and forth through the paper to understand this sort of comment.

According to its caption, figure 12 is the code of "Model coding of direct access". The author should mention that the algorithm is the substitution part of the solver.

One more general comment: performance tests are done on two vector pipeline machines and one superscalar chip. It looks to me as though all the transformations are inspired solely by the pipelines. Nowhere does the author remark on architectural differences, or on whether different optimisations would have been preferable for the Alpha chip.

In sum, the code transformations presented here are moderately interesting and could be of use to readers, but the presentation precludes any such usefulness.

Presentation Changes

This needs a lot of work. The English and the style must improve, and the author needs to be much clearer in his explanations.