Referee 1 ***************************************************************

This is an interesting paper that MUST improve its use of English, but
with this change it should be published. There are too many English
problems to list individually.
As a start the title could be something like:

Sequential Performance Optimization of GeoFEM on vector and scalar
architectures for the Earth Simulator

Referee 2 ***************************************************************

This paper discusses various techniques that were applied to the GeoFEM
code to improve its performance. The architectures alluded to in the title
are the SX-4, VPP5000, SR8000, and the DEC Alpha.

First off, the English of this paper is really bad. In many sentences the
reader has to guess the meaning.

Second, the author omits explanations of various relevant details. For
one, forward and backward substitution are mentioned, but the
preconditioner is never explicitly identified. I presume a
block Jacobi with an ILU local solve, but I would like to have an
explicit statement.

More importantly, the author never describes the shape of the problem
domain, the discretisation used on it, and the exact data structure,
all important factors for the understanding of the code
transformations. Section 3.2 mentions the "cyclic multicolor on
hyperplace/RCM", but this phrase is never explained. The author should
delete the explanation of pipelining (section 3.1) and replace it with
a clear explanation of the data structure. Without this, the code
fragments, starting with figure 1, are unintelligible. Figure 3
especially puzzles me. I do not recognise any kind of solver in this.
Perhaps there is another paper in the special issue that you can reference
for this needed background.

The main content of this paper is the five "performance factors" shown
at the end of section 3.6, and the code transformations used to
satisfy them. However, there is insufficient explanation of how the
transformations achieve this. By the way, in factor (4), "latency"
should be "ratio", I think.

The data structure transformation of section 3.7 might be interesting,
but the author should first explain what the original data structure
was, and devote more space to how and why the transformation is
beneficial.

The beginning of section 3.8 is a good example of the author's writing
style. "Both (1) and (2) of performance factor [...] was implemented
to the model coding of Fig.4." Apart from this being execrable
English, it would be so much easier to understand this if the author
would simply state what performance factors 1 and 2 and figure 4
are about. Now the reader has to leaf back and forth through the paper
to understand this sort of comment.

According to its caption, figure 12 is the code of "Model coding of
direct access". The author should mention that the algorithm is the
substitution part of the solver.

One more general comment: performance tests are done on two vector
pipeline machines and one superscalar chip. It looks to me as though
all the transformations are inspired solely by the pipelines. The
author nowhere remarks on architectural differences, or on whether
different optimisations would have been preferable for the Alpha chip.

In sum, the code transformations presented here are moderately
interesting, and could be of use to readers, but the presentation
precludes any such usefulness.

Presentation Changes

 This needs a lot of work: the English and the style must be improved,
and the author needs to be much clearer in his explanations.