Referee 1
*******************************************************************

I think this paper should be improved before publication. There should be a deeper discussion of the two applications:

1) The GA method should discuss results from more than two processors.

2) The Virtual California discussion seems inaccurate and incomplete; it should be enhanced.

Further, code should only be presented to illustrate essential issues. Appendix A, and perhaps some of the pseudocode in Section 3, should be placed on a web site and referenced in the text. This material is useful but not suitable for publication.

I would not classify BSP as an automatic parallelization scheme unless it has changed recently; it is of the same type as MPI. Further, I think the statement "In addition, MPI assumes that the system is either a massively parallel processor (MPP), or a cluster of nearly identical machines [2,4]" is incorrect: MPI works on any parallel system to which it has been ported. Of course, more user effort is needed on a heterogeneous system to make load balancing work.

I see Sections 2, 3, and 4 as inconsistent. Section 2 settles on a distributed-memory parallelism model (MPI), and Section 3 applies it to a genetic algorithm. For some reason this "encourages" the authors to think about parallelizing the Virtual California model. Although that model will parallelize, its computational structure has little in common with the GA method, so I do not see the connection. Further, Section 4 proposes an SMP (shared-memory) parallelism scheme, even though the Beowulf cluster and the GA example are MPI-style distributed memory. The PDE algorithms in Virtual California are suitable for MPI.

Referee 2
*******************************************************************

This paper does not break fundamentally new ground in terms of algorithmic strategy, but it brings some of the parallelization issues for genetic algorithms and Green's functions up to date with respect to the hardware currently available to a modest university effort. In addition, it is significant in providing simple examples of parallel migration for workers in the earthquake field, which has few documented parallel applications to guide practitioners. I recommend publication in essentially the current form, with the changes detailed below.

Presentation changes:

1) The genetic algorithm description appears incomplete, or else difficult to follow. I could not determine the initial range of values for each parameter, or whether the 100 levels for each remain static through the many generations or become somehow adapted; if not, it is hard to see how solutions can ever get closer than 1% of the initial range. A sentence describing crossbreeding would also help, as would a statement of if and how mutation (randomization?) is applied: is some fraction of the population assigned new random values at each step to ensure genetic richness, or are all members, or just some, subject to random gene substitution? I realize some of this is in the references, but those contain many variant techniques and (I expect) undetermined strategic parameters that guide the algorithm. These should be called out in an unambiguous way, with some explanation of how one selects good values. Similarly, the top-level GA code is missing but should be included, or covered schematically, as you do for the Virtual California code; see the first sketch at the end of this report.

2) Just prior to Eq. 2 appears the variable "N". It is not immediately clear whether this is the number of fault segments, or the number of stress coefficients, or whether these are the same. Since your faults are all vertical with purely horizontal slip, I suppose they are the same, but the text could be clearer with a very slight modification.

3) Figure 4 is confusing; I see no output from EQ_Simulator.c. Some minor rearrangement might help, along with a brief expansion of the caption to the effect that the kinematic and stress Green's function modules enable deformation and fault failure (respectively) in the simulation.

4) The B.1.2 pseudocode appears wrong, in that time is updated inside the loop over segments (N). I suspect the time update belongs in the next loop up. Is "time" the same as "t"?

5) It is not clear to me that so much detailed (GA) deformation code is helpful; are two versions of "fit" and such functions as "read_field_data" and "uxsph" worth the paper they take up? I do not insist on the point, but if the editor wishes to save space, I raise the question. In general I find the extent of quoted GA code annoying, especially as it does not detail the GA part but merely the parallelization of the chi-square (fitness) function, as noted in point 1 above. I suppose it is the editor's choice whether to print it all or demand it be cut back. It is helpful to have a code example, but functions that bear only on geophysics or instrument data input could be reduced to one-line descriptions. Meanwhile, the top-level code that implements the GA strategy is completely missing; I realize it is not where the concurrency is, but its absence makes the story feel incomplete.

6) The pseudocode appears to indicate that you divide the Green's function problem by partitioning the fault-system segments among the processors, dealing each out to available resources. Each processor then has the geometry for the full problem, and the master processor collects all the results. This is a plausible strategy so long as the problem fits in single-processor memory, but it may not scale to the ultimately possible sizes. Similarly, your EQ simulator appears to partition over those segments that fail at a given step. If that is correct, it would be helpful to the reader to say so in another paragraph or two under Section 4.3. In particular, it helps to state the principle used to partition the full problem over the processors, followed by a few words on why this strategy was chosen and how you expect it to work for much larger problems. If you can say anything definite about load balance and communication, that also aids the reader's intuition. The second sketch at the end of this report shows the pattern I believe is implied.
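First sketch (re point 1). For concreteness, here is a minimal sketch of the kind of top-level GA driver I am asking for. Every detail here is my own assumption, not taken from the paper: the population size, parameter count, tournament selection, single-point crossover, and per-gene mutation rate are all placeholders, and evaluate_fitness stands in for the parallelized chi-square misfit.

#include <stdio.h>
#include <stdlib.h>

#define POP_SIZE 100     /* population size (assumed) */
#define N_PARAMS 10      /* number of model parameters (assumed) */
#define N_GENS   500     /* number of generations (assumed) */
#define MUT_RATE 0.01    /* per-gene mutation probability (assumed) */

typedef struct {
    double gene[N_PARAMS];  /* model parameters, e.g. fault slip rates */
    double fitness;         /* e.g. negative chi-square misfit */
} Individual;

static double rand01(void) { return rand() / (double)RAND_MAX; }

/* Placeholder fitness; in the paper this would be the parallelized
   chi-square comparison against the field data. */
static double evaluate_fitness(const Individual *ind)
{
    double sum = 0.0;
    for (int i = 0; i < N_PARAMS; i++)
        sum -= ind->gene[i] * ind->gene[i];
    return sum;
}

/* Single-point crossover between two parents. */
static void crossover(const Individual *a, const Individual *b,
                      Individual *child)
{
    int cut = rand() % N_PARAMS;
    for (int i = 0; i < N_PARAMS; i++)
        child->gene[i] = (i < cut) ? a->gene[i] : b->gene[i];
}

/* Randomize a small fraction of genes to preserve genetic richness. */
static void mutate(Individual *ind, double lo, double hi)
{
    for (int i = 0; i < N_PARAMS; i++)
        if (rand01() < MUT_RATE)
            ind->gene[i] = lo + (hi - lo) * rand01();
}

int main(void)
{
    Individual pop[POP_SIZE], next[POP_SIZE];
    const double lo = -1.0, hi = 1.0;  /* initial parameter range (assumed) */

    srand(1);  /* fixed seed for reproducibility */
    for (int i = 0; i < POP_SIZE; i++)
        for (int j = 0; j < N_PARAMS; j++)
            pop[i].gene[j] = lo + (hi - lo) * rand01();

    for (int g = 0; g < N_GENS; g++) {
        /* The evaluation loop is the only concurrent step; this is
           where the quoted MPI fitness code would be invoked. */
        for (int i = 0; i < POP_SIZE; i++)
            pop[i].fitness = evaluate_fitness(&pop[i]);

        for (int i = 0; i < POP_SIZE; i++) {
            /* Two-way tournament selection for each parent. */
            int p1 = rand() % POP_SIZE, p2 = rand() % POP_SIZE;
            int q1 = rand() % POP_SIZE, q2 = rand() % POP_SIZE;
            const Individual *a =
                pop[p1].fitness > pop[p2].fitness ? &pop[p1] : &pop[p2];
            const Individual *b =
                pop[q1].fitness > pop[q2].fitness ? &pop[q1] : &pop[q2];
            crossover(a, b, &next[i]);
            mutate(&next[i], lo, hi);
        }
        for (int i = 0; i < POP_SIZE; i++)
            pop[i] = next[i];
    }

    /* Report the best member of the final population. */
    int best = 0;
    for (int i = 0; i < POP_SIZE; i++) {
        pop[i].fitness = evaluate_fitness(&pop[i]);
        if (pop[i].fitness > pop[best].fitness) best = i;
    }
    printf("best fitness: %g\n", pop[best].fitness);
    return 0;
}

Even a half-page schematic of this sort would let the reader see where the strategic parameters enter and where the concurrency lives.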
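Second sketch (re point 6). The following is a minimal sketch of the partitioning pattern I believe the pseudocode implies, assuming MPI, a block decomposition of the N fault segments, and a gather of the stress coefficients onto rank 0. The names (N_SEG, compute_greens_row) are hypothetical, and the Green's function itself is a placeholder.

#include <mpi.h>
#include <stdlib.h>

#define N_SEG 1000   /* number of fault segments (assumed) */

/* Placeholder for one row of stress-transfer coefficients:
   the influence of segment i on every other segment. */
static void compute_greens_row(int i, double *row)
{
    for (int j = 0; j < N_SEG; j++)
        row[j] = (i == j) ? 1.0 : 0.0;
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Block decomposition: each rank owns a contiguous range of
       segments; every rank holds the full fault geometry. */
    int per = (N_SEG + size - 1) / size;
    int lo  = rank * per;
    int hi  = (lo + per < N_SEG) ? lo + per : N_SEG;

    double *local = calloc((size_t)per * N_SEG, sizeof(double));
    for (int i = lo; i < hi; i++)
        compute_greens_row(i, local + (size_t)(i - lo) * N_SEG);

    /* Rank 0 collects the full coefficient matrix.  This works while
       the matrix fits in one processor's memory, which is exactly the
       scaling concern raised in point 6. */
    double *full = NULL;
    if (rank == 0)
        full = malloc((size_t)per * size * N_SEG * sizeof(double));
    MPI_Gather(local, per * N_SEG, MPI_DOUBLE,
               full,  per * N_SEG, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    free(local);
    if (rank == 0) free(full);
    MPI_Finalize();
    return 0;
}

If this is roughly what the code does, saying so in a paragraph under Section 4.3, along with whether the dealing is block or cyclic and what the communication volume is, would settle the load-balance and scalability questions directly.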
Referee 3
*******************************************************************

I believe it is premature to publish your work at this stage. The results that you present for the GA running on two workstations are not very illuminating. You should discuss in more detail how the Virtual California simulation was parallelized, and present some results. I think it would be best if you waited until the large Beowulf system you describe in Section 2.1 is ready for use, and then ran your codes on that system. It would then be possible to get some idea of the scalability of your parallel algorithm, and of the extent to which you are able to exploit the low-cost cluster-based approach in your simulations.