Referee 1
*******************************************************************

I think this paper should be improved before publication. There should be a deeper discussion of the two applications:

1) The GA method should discuss results from more than two processors.

2) The Virtual California discussion seems inaccurate and incomplete; it should be enhanced.

Further, code should only be presented to illustrate essential issues. Appendix A, and perhaps some of the pseudocode in Section 3, should be placed on a web site and referenced in the text. This material is useful but not suitable for publication.

I would not classify BSP as an automatic parallelization scheme unless it has changed recently; it is of the same type as MPI. Further, I think the statement "In addition, MPI assumes that the system is either a massively parallel processor (MPP), or a cluster of nearly identical machines [2,4]" is incorrect: MPI works on any parallel system to which it has been ported. Of course, more user effort is needed on a heterogeneous system to make load balancing work.

I see Sections 2, 3, and 4 as inconsistent. Section 2 settles on a distributed-memory parallelism model (MPI), and Section 3 applies it to a genetic algorithm. For some reason this "encourages" the authors to think about parallelizing the Virtual California model. Although that model will parallelize, its computational structure has little in common with the GA method, so I do not see the connection. Further, Section 4 proposes an SMP (shared-memory) parallelism scheme, even though the Beowulf cluster and the GA example are MPI-style distributed memory. The PDE algorithms in Virtual California are suitable for MPI.

Referee 2
*******************************************************************

This paper does not break fundamentally new ground in terms of algorithmic strategy, but it brings some of the parallelization issues for genetic algorithms and Green's functions up to date with respect to the hardware currently available to a modest university effort. In addition, it is significant in providing simple examples of parallel migration for workers in the earthquake field, which has few documented parallel applications to guide practitioners. I recommend publication in essentially the current form, with the changes detailed below.

Presentation changes:

1) The genetic algorithm description appears incomplete, or else difficult to follow. I could not determine the initial range of values for each parameter, or whether the 100 levels for each remain static through the many generations or become somehow adapted; if not, it is hard to see how solutions can ever get closer than 1% of the initial range. A sentence describing crossbreeding would also help, as would a statement of if and how mutation (randomization?) is applied: is some fraction of the population assigned new random values at each step to ensure genetic richness, or are all members, or just some, subject to random gene substitution? I realize some of this is in the references, but those contain many variant techniques and (I expect) undetermined strategic parameters that guide the algorithm. These should be called out in an unambiguous way, with some explanation of how one selects good values. Similarly, the top-level GA code is missing but should be included, or covered schematically, as you do for the Virtual California code; see the first sketch at the end of this report.

2) Just prior to Eq. 2 appears the variable "N". It is not immediately clear whether this is the number of fault segments, or the number of stress coefficients, or whether these are the same. Since your faults are all vertical with purely horizontal slip, I suppose they are the same, but the text could be clearer with a very slight modification.

3) Figure 4 is confusing; I see no output from EQ_Simulator.c. Some minor rearrangement might help, along with a brief expansion of the caption to the effect that the kinematic and stress Green's function modules enable deformation and fault failure (respectively) in the simulation.

4) The B.1.2 pseudocode appears wrong, in that time is updated inside the loop over segments (N). I suspect the time update belongs in the next loop up. Is "time" the same as "t"?

5) It is not clear to me that so much detailed (GA) deformation code is helpful; are two versions of "fit" and such functions as "read_field_data" and "uxsph" worth the paper they take up? I do not insist on the point, but if the editor wishes to save space, I raise the question. In general I find the extent of quoted GA code annoying, especially as it does not detail the GA part but merely the parallelization of the chi-square (fitness) function, as noted in point 1 above. I suppose it is the editor's choice whether to print it all or demand it be cut back. It is helpful to have a code example, but functions that bear only on geophysics or instrument data input could be reduced to one-line descriptions. Meanwhile, the top-level code that implements the GA strategy is completely missing; I realize it is not where the concurrency is, but its absence makes the story feel incomplete.

6) The pseudocode appears to indicate that you divide the Green's function problem by partitioning the fault-system segments among the processors, dealing each out to available resources. Each processor then has the geometry for the full problem, and the master processor collects all the results. This is a plausible strategy so long as the problem fits in single-processor memory, but it may not scale to the ultimately possible sizes. Similarly, your EQ simulator appears to partition over those segments that fail at a given step. If that is correct, it would be helpful to the reader to say so in another paragraph or two under Section 4.3. In particular, it helps to state the principle used to partition the full problem over the processors, followed by a few words on why this strategy was chosen and how you expect it to work for much larger problems. If you can say anything definite about load balance and communication, that also aids the reader's intuition. The second sketch at the end of this report shows the pattern I believe is implied.
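First sketch (re point 1). For concreteness, here is a minimal sketch of the kind of top-level GA driver I am asking for. Every detail here is my own assumption, not taken from the paper: the population size, parameter count, tournament selection, single-point crossover, and per-gene mutation rate are all placeholders, and evaluate_fitness stands in for the parallelized chi-square misfit.

#include <stdio.h>
#include <stdlib.h>

#define POP_SIZE 100     /* population size (assumed) */
#define N_PARAMS 10      /* number of model parameters (assumed) */
#define N_GENS   500     /* number of generations (assumed) */
#define MUT_RATE 0.01    /* per-gene mutation probability (assumed) */

typedef struct {
    double gene[N_PARAMS];  /* model parameters, e.g. fault slip rates */
    double fitness;         /* e.g. negative chi-square misfit */
} Individual;

static double rand01(void) { return rand() / (double)RAND_MAX; }

/* Placeholder fitness; in the paper this would be the parallelized
   chi-square comparison against the field data. */
static double evaluate_fitness(const Individual *ind)
{
    double sum = 0.0;
    for (int i = 0; i < N_PARAMS; i++)
        sum -= ind->gene[i] * ind->gene[i];
    return sum;
}

/* Single-point crossover between two parents. */
static void crossover(const Individual *a, const Individual *b,
                      Individual *child)
{
    int cut = rand() % N_PARAMS;
    for (int i = 0; i < N_PARAMS; i++)
        child->gene[i] = (i < cut) ? a->gene[i] : b->gene[i];
}

/* Randomize a small fraction of genes to preserve genetic richness. */
static void mutate(Individual *ind, double lo, double hi)
{
    for (int i = 0; i < N_PARAMS; i++)
        if (rand01() < MUT_RATE)
            ind->gene[i] = lo + (hi - lo) * rand01();
}

int main(void)
{
    Individual pop[POP_SIZE], next[POP_SIZE];
    const double lo = -1.0, hi = 1.0;  /* initial parameter range (assumed) */

    srand(1);  /* fixed seed for reproducibility */
    for (int i = 0; i < POP_SIZE; i++)
        for (int j = 0; j < N_PARAMS; j++)
            pop[i].gene[j] = lo + (hi - lo) * rand01();

    for (int g = 0; g < N_GENS; g++) {
        /* The evaluation loop is the only concurrent step; this is
           where the quoted MPI fitness code would be invoked. */
        for (int i = 0; i < POP_SIZE; i++)
            pop[i].fitness = evaluate_fitness(&pop[i]);

        for (int i = 0; i < POP_SIZE; i++) {
            /* Two-way tournament selection for each parent. */
            int p1 = rand() % POP_SIZE, p2 = rand() % POP_SIZE;
            int q1 = rand() % POP_SIZE, q2 = rand() % POP_SIZE;
            const Individual *a =
                pop[p1].fitness > pop[p2].fitness ? &pop[p1] : &pop[p2];
            const Individual *b =
                pop[q1].fitness > pop[q2].fitness ? &pop[q1] : &pop[q2];
            crossover(a, b, &next[i]);
            mutate(&next[i], lo, hi);
        }
        for (int i = 0; i < POP_SIZE; i++)
            pop[i] = next[i];
    }

    /* Report the best member of the final population. */
    int best = 0;
    for (int i = 0; i < POP_SIZE; i++) {
        pop[i].fitness = evaluate_fitness(&pop[i]);
        if (pop[i].fitness > pop[best].fitness) best = i;
    }
    printf("best fitness: %g\n", pop[best].fitness);
    return 0;
}

Even a half-page schematic of this sort would let the reader see where the strategic parameters enter and where the concurrency lives.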
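Second sketch (re point 6). The following is a minimal sketch of the partitioning pattern I believe the pseudocode implies, assuming MPI, a block decomposition of the N fault segments, and a gather of the stress coefficients onto rank 0. The names (N_SEG, compute_greens_row) are hypothetical, and the Green's function itself is a placeholder.

#include <mpi.h>
#include <stdlib.h>

#define N_SEG 1000   /* number of fault segments (assumed) */

/* Placeholder for one row of stress-transfer coefficients:
   the influence of segment i on every other segment. */
static void compute_greens_row(int i, double *row)
{
    for (int j = 0; j < N_SEG; j++)
        row[j] = (i == j) ? 1.0 : 0.0;
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Block decomposition: each rank owns a contiguous range of
       segments; every rank holds the full fault geometry. */
    int per = (N_SEG + size - 1) / size;
    int lo  = rank * per;
    int hi  = (lo + per < N_SEG) ? lo + per : N_SEG;

    double *local = calloc((size_t)per * N_SEG, sizeof(double));
    for (int i = lo; i < hi; i++)
        compute_greens_row(i, local + (size_t)(i - lo) * N_SEG);

    /* Rank 0 collects the full coefficient matrix.  This works while
       the matrix fits in one processor's memory, which is exactly the
       scaling concern raised in point 6. */
    double *full = NULL;
    if (rank == 0)
        full = malloc((size_t)per * size * N_SEG * sizeof(double));
    MPI_Gather(local, per * N_SEG, MPI_DOUBLE,
               full,  per * N_SEG, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    free(local);
    if (rank == 0) free(full);
    MPI_Finalize();
    return 0;
}

If this is roughly what the code does, saying so in a paragraph under Section 4.3, along with whether the dealing is block or cyclic and what the communication volume is, would settle the load-balance and scalability questions directly.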
Referee 3
*******************************************************************

I believe it is premature to publish your work at this stage. The results that you present for the GA running on two workstations are not very illuminating. You should discuss in more detail how the Virtual California simulation was parallelized, and present some results. I think it would be best if you waited until the large Beowulf system you describe in Section 2.1 is ready for use, and then ran your codes on that system. It would then be possible to get some idea of the scalability of your parallel algorithm, and of the extent to which you are able to exploit the low-cost cluster-based approach in your simulations.