Referee 1
****************************************************************************

This paper presents a detailed performance analysis of VI primitives and its implications for the emerging InfiniBand technology. The paper is easy to read and clear in presentation. The related work section is particularly helpful for readers working on improving the performance of network communication and protocols.

Units - The authors use MB and Kbytes -- for consistency and clarity, the units should be Mbytes and Kbytes throughout.

Particular Comments
-------------------

Page 2, column 2, 4th paragraph - change "these anasylses did" to "this analysis did"

Page 4, column 2, section 3.3 - the authors mention "the latency for a single message" - the authors should indicate what this message size is.

The authors should explain the annotations in Table 2 and footnote 1. For the latter, it is not clear what the difference in terminology is between Split-C synchronous/asynchronous calls and VIA blocking/non-blocking calls.

I am also curious how AM/VIA and Split-C perform using a software implementation of VIA, e.g., MVIA, and how easy it is to port existing TCP codes to VIA. I would also like to see the authors discuss the implications of assuming reliable delivery of messages, and the effects of message loss.

Referee 2
****************************************************************************

Publish as is

Good work. Especially interesting are the parts discussing the implications for Infiniband.

Referee 3
****************************************************************************

This is a good systems implementation paper. The main contributions of the paper are two: (1) a description of mapping AMs and Split-C abstractions onto VIA, and (2) an evaluation of the implementation, in which the cost of implementing AMs on top of VIA primitives is analyzed in terms of the cost of VIA primitives.
In addition, the authors use this experience to discuss implications for Infiniband and future high-performance network architectures.

Sections 5 and 6 are very short. VIA is widely available and accessible to many researchers and parallel programmers, unlike the NIC software of other parallel machines on which AMs are implemented. The paper would be very useful and have great impact if the authors described the implementation of the AM functions on VIA in more detail. This paper, along with the code, could be of great development value and an excellent educational tool for parallel programmers.

Presentation Changes
--------------------

Remove Figures 3 and 4. They are redundant and hard to read, especially the legends.

Referee 4
****************************************************************************

Overall the paper is very good, well written, and of interest to the community. I recommend that the paper be published essentially as is. Several typographical errors were discovered and these are noted below in section F. Some comments are provided here which the authors might consider in developing their final version of the paper.

The paper is very ambitious and could have been written as two to four separate papers that could easily have stood on their own. There are many degrees of freedom being discussed at the same time and several systems that have to be understood (e.g., AM, Split-C, VIA, Infiniband) and described in a finite space. There are places where the material may be too succinct to give adequate coverage of necessary data or too dense to be easily assimilated by the reader. In a number of cases, the authors assume background knowledge on the reader's part that may not be warranted, and at other times information is provided later in the paper that would have been useful earlier on. Finally, the discussion about Infiniband is not required for this paper and is too brief to provide a meaningful foundation for understanding.
Some specifics follow:

In section 2, the doorbell mechanism is mentioned but defined only much later. Also here, the LANai 7 processor is mentioned without informing the reader that it is associated with Myrinet. Later this is cleared up.

Section 3.1 would truly benefit from a pair of diagrams showing the general interaction and state of communication and perhaps the (somewhat simplified) control state transitions. This is also true for sections 4.1 and 4.2, to help the reader picture what is going on.

In Table 1, you do not identify the LANai as being from Myricom, while both Giganet (the name has changed, by the way) and Compaq are mentioned.

In the last paragraph of section 3.2, "receive credits" and "NAKs" are not defined but perhaps should be.

How sensitive are the timings and rates of Table 1 to message length?

In the second paragraph of section 3.3, a subjective statement is made, without justification, in support of the LANai chip in spite of its lower performance.

I think the description of the VIA operations should be introduced first in section 3.1, instead of waiting until Tables 3 & 4 several sections later to show them to the reader.

The caption for Table 2 refers to both AM and Split-C, but the table itself seems to convey data only about AM; perhaps I am missing something here.

In section 9, a statement is made indicating that large-scale systems are enabled by these technologies. But the body of the paper does not talk about scalability, and this statement can't be supported by the data presented.

Presentation Changes
--------------------

[section 3.1, 3rd paragraph, 2nd sentence] "... posts it into appropriate ..." -> "... posts it into -an- appropriate ..."
[section 3.1, 4th paragraph, 5th sentence] "... can be be piggybacked ..." -> "... can be piggybacked ..."
[section 3.2, 1st paragraph, 1st sentence] "... a brief overview the infiniband ..." -> "... a brief overview -of- the ..."
[section 4.1, 3rd paragraph, 6th sentence] "... includes of a pair ..." -> "... includes a pair ..."
[section 6, 1st bullet] "... read operation in an optional ..." -> "... read operation -is- an optional ..."