Dr. J.S. Reeve
Concurrent Computation Group
Department of Electronics and Computer Science
University of Southampton
Southampton SO17 1BJ
UK

Dear Dr. Reeve

We append 2 referee reports on your paper

C459: A Parallel Viterbi Decoding Algorithm

We would be happy to publish your paper if you addressed the changes suggested by the referees. Please include in your resubmittal a discussion of your changes and how they answer the referees. If this is persuasive, we can publish your paper without further refereeing.

I thank you for your interest in Concurrency: Practice and Experience and apologize for not replying earlier.

Please send all communication electronically if possible, using the address fox@csit.fsu.edu

If you should need a "real address", please use:

Geoffrey Fox
Computational Science and Information Technology
Florida State University
400 Dirac Science Library
Tallahassee Florida 32306-4130
850-644-4587
but easiest is cell phone 3152546387

C459 Referee Reports

Referee One
-----------

This paper would greatly benefit from a more detailed explanation of the Viterbi algorithm. Many readers will not be familiar with how to interpret some of the figures, particularly Figures 1 and 2.

Also, given that memory is presented as the prime reason for wanting a parallel algorithm, it would be useful to know what lengths of the generating shift register are typical in applications, and how strong the motivation is to go to greater lengths. This would allow the potential impact of the parallel algorithm to be assessed.

In the summary section, the author says that the timing difference between communication alone in the parallel algorithm and the complete parallel algorithm is always less than 5%. This means that more than 95% of the time in the parallel algorithm is spent on communication. This doesn't seem to be borne out by the results shown in the tables.
For example, in Table IV for the code (127,106,7) the time on one processor is 259 seconds and on four processors is 174 seconds. This would imply that the time spent on communication is 174 - 259/4 = 109.25 seconds, which is only about 63% of the total time.

Can the use of FPGA technology address the large memory requirements of the Viterbi algorithm mentioned in the second paragraph of the introduction?

There are a couple of minor issues that need to be addressed.

1. In line 6 of the introduction the word "codes" is repeated.
2. In Figure 2 the text says that thick lines are used to show path branches for input bit 0. All the lines look the same thickness to me. The only difference is that some of the arrow heads are larger than others.
3. The text says that Figure 3 shows the matrix for BCH (15,7,2), whereas the figure caption says it shows the matrix for BCH (31,16,7).

Referee Two
-----------

This paper needs a stronger introduction to motivate the Viterbi algorithm and the need for its parallelization. More recent references need to be cited to demonstrate the importance of your work. Please provide a general description of the Viterbi decoding algorithm and its parallelization.