Here's my review -- let me know if you need any additional information.

--
Arthur B. (Barney) Maccabe          http://www.cs.unm.edu/~maccabe
Computer Science Department         email: maccabe@cs.unm.edu
The University of New Mexico        voice: (505) 277-6504
Albuquerque, NM 87131-1386          FAX: (505) 277-6927

C: Paper and Referee Metadata

Paper Number Cnnn: C508
Date:
Paper Title: Comparing Windows NT, Linux, and QNX as the Basis for Cluster Systems
Author(s): Avi Kavas and Dror G. Feitelson
Referee: Arthur Maccabe
Address: maccabe@cs.unm.edu
         Computer Science Department
         The University of New Mexico
         Albuquerque, NM 87131

Referee Recommendations. Please indicate overall recommendations here, and details in following sections.

accept with major revisions

D: Referee Comments (For Editor Only)

E: Referee Comments (For Author and Editor)

The basic premise of this paper -- a comparison of which OS to use in the construction of a cluster system -- is intriguing. This could be a very useful paper, although I'm not sure that I would include QNX. Unfortunately, there are a number of problems that make the paper unacceptable in its current form. The writing style is very comfortable and easy to read.

My biggest problem with this paper is that it doesn't clearly identify the operational mode for the cluster the authors want to build. Is the cluster intended to provide a compute service for a large number of independent jobs (e.g., a Condor flock), is it intended to be a giant Web server, is it intended as a large transaction processing system, is it going to be used for graphics and/or rendering, or is it intended to provide a capability for parallel scientific applications? These are very different applications of cluster technology and will introduce different considerations. For example, if I'm building a giant Web server, I am likely to be more concerned about access to shared disks and connections to external networks.
On the other hand, if I'm building a system to provide a new capability, I will focus on the internal network structure. The time to launch an application (as long as it is within reason) will not be important, as I only expect to launch a few tens of jobs per day. Without knowing the operational mode, the paper reads like a list of things that the authors picked out of the air and that were easy to measure.

While the paper seems aimed at system developers, the biggest issue in any system development is the applications. As an example, it is my understanding that the NT clusters at NCSA were underutilized not because of any performance issues, but because they lacked the development tools that the application programmers were familiar with.

The networking section is particularly weak. There is no discussion of MPI or Gigabit technologies, which are clearly dominating high-end cluster development. Moreover, it might be useful to have information on UDP performance, since UDP is being rediscovered by many of the cluster tools.

Even if they are not particularly interested in developing clusters for capability computing, the authors should at least note the results from the "Top 500" list. We all know that the list is flawed, but it is also clear that there are problems with the other metrics the authors are using.

The conclusions need to be strengthened. There are no significant conclusions. There are clear areas in which Linux is the dominant clustering OS (scientific computing -- due to application writer familiarity) and others where NT is the dominant clustering OS (visualization clusters -- because of the availability of graphics drivers).

F: Presentation Changes