Subject: Paper review - C508
From: "Stephen L. Scott"
Date: Tue, 15 May 2001 17:01:45 -0400
To: Geoffrey Fox
CC: scottsl@ornl.gov

Geoffrey,

Below is my paper review for C508. Please email or call if you have any questions regarding my review.

thanks,
stephen.

--
------------------------------------------------------------------------
>>> Team PVM >>>
HARNESS the power for the next generation!!
Heterogeneous Adaptable Reconfigurable NEtworked SystemS
------------------------------------------------------------------------
Stephen L. Scott, Ph.D.               voice: 865-574-3144
Oak Ridge National Laboratory         fax: 865-574-0680
P. O. Box 2008, Bldg. 6012, MS-6367   scottsl@ornl.gov
Oak Ridge, TN 37831-6367              http://www.epm.ornl.gov/~sscott/
------------------------------------------------------------------------

C: Paper and Referee Metadata

* Paper Number Cnnn: C508
* Date: May 11, 2001
* Paper Title: Comparing Windows NT, Linux, and QNX as the Basis for Cluster Systems
* Author(s): Avi Kavas, Dror G. Feitelson
* Referee: Stephen L. Scott
* Address: Oak Ridge National Laboratory, Bethel Valley Road, Oak Ridge, TN scottsl@ornl.gov

Referee Recommendations. Please indicate overall recommendations here, and details in following sections.

XXX 2. accepted provided changes suggested are made

E: Referee Comments (For Author and Editor)

I would like to see comments below considered by the authors. However, all changes need not be incorporated for this work to go forward to publication.

This work provides a comparison between the Windows NT, Linux, and QNX operating systems for use as a cluster computing operating system. Three broad areas of comparison are made: 1. kernel services and api, 2. performance, and 3. ease of use. I found the content of the paper to be well informed and to serve as a good overview of these operating system environments with respect to cluster computing.
I do question the value of QNX being included in this work, as I do not know of it being used in cluster computing. Furthermore, it is somewhat distracting when most readers will only be concerned with the issues of NT and Linux. Perhaps the authors could make a better case for its inclusion.

Other specific comments regarding the work follow:

1. Perhaps the authors should upgrade this work to include complete coverage of Windows 2000, as it appears that Microsoft is abandoning the Windows NT operating system. This could be included in the final version of the paper: while there are many similarities between the two operating systems, there are some significant differences.

2. If not including a Windows 2000 discussion, then clarify in the document that the information is only relevant for Windows NT. Windows 2000 is mentioned a number of times in the text, and this occasionally left me confused as to which OS was being discussed.

3. Code examples in sections where code is discussed would be beneficial. Why not provide a web link to the actual codes so that I may try them too?

4. Page 1 - 1. Introduction: Word "throughput" should be "throughout".

5. Page 2 - 1.2 Operating Systems Compared: Regarding NT and POSIX. Should inform readers of the POSIX restrictions - namely that NT is only partially POSIX compliant, and if the program is compiled with POSIX compliance turned on, then the application can't use native Windows API calls. It is an either/or option.

6. Page 2 - 1.2 Operating Systems Compared: While NT can run on Alphas, it is no longer supported for the Alpha processor. This point should be mentioned to the reader.

7. Page 3 - 2. kernel services and API comparison: What is "(man 2 section)" - perhaps a reference not completed?

8. Page 4 - 2.1.1 process creation: discussion of DCOM with respect to NT. I thought that Microsoft calls its DCOM equivalent COM+, and that it will be called ".NET" in the future.

9. Page 8 - Table 2: set column text to "left justify" to make the text more readable without large spaces between words. Also, turn off hyphenation for table information.

10. Page 9 - 2.1.8. process termination detection and error handling: duplicate use of "the the".

11. Page 11 - Table 3: what about NT 4.0 Workstation? Its SMP support should be listed to complete the table. Also, what kernel version of Linux supports 16PE SMP?

12. Page 11 - 2.3.1 "Windows NT Server should be used on ..." Explain your recommendations for a specific version of the Windows NT OS.

13. Page 11 - 2.3.1 - is "NTS/E" NT Server Enterprise? If so, the paper should say so somewhere before the abbreviation is used on its own. Remember that some readers may only be Linux literate and you must help them understand too.

14. Page 11 - 2.3.1 - explain further what "User-space threads are not." means. Does this mean that all of a user's threads stay on the original parent PE on an SMP box?

15. Page 12 - 2.3.2 processor affinity: paragraph starting with "In general..." There is a good paper discussing the ramifications of specifying thread/processor affinity. Perhaps some comments regarding this paper and a citation are appropriate. Take a look at the following paper for some NT scheduling insights - "SIAM News," December 1998 - Parallel Computation of Hessian Matrices Under Microsoft Windows NT.

16. Page 16 - 2.6 time measurement and timers: Windows timer functions are discussed but not provided. This is also one place where timer codes would be valuable.

17. Page 17 - 3.1 methodological issues: show codes here - as a reader, I want to see this.

18. Page 18 - 3.2.1 process creation and termination: I want access to the actual code - either in the text or from a web page. I should be able to run your exact codes on my machines too.

19. Page 19 - 3.2.2 process suspension and resumption: again, your codes...

20. Page 20 - #2: Reference [27] is to work on Alphas - your work was done on Intel boxes. I think this is a flaw in your comparison. If you disagree, you should address the issue in the paper and tell the reader why Alpha tests validate your Intel-based tests.

21. Page 21 - #4: It looks to me like Windows stabilizes at 16 bytes and QNX at 32. These are the first points that are in the same "scope" as all subsequent points, which implies the first stable point. However, it is possible that you can see better with the actual data, as I could only go by the graph points.

22. Page 22 - 4.1 "Counting the number... metrics for..." Should be "metric" - no "s" on the end.

23. Page 24 - 4.4: Text leading to reference [36] specifically sounds as though you are talking of channel bonding. If this is the case, please say so, as your reference is to the entire chapter, which contains much information. And if you do mean channel bonding, this is an inaccurate statement, as most Beowulf systems do not use channel bonding and simply use standard Fast Ethernet (100BT) network communications.

24. Page 25 - 4.4: Regarding the Windows NT "reboot on new drivers" issue - drivers do not often change once a cluster node is built. Thus, while true, this becomes more of a "non-issue" when discussed in terms of clusters versus desktop NT machines. Of course, I can't recall the last time I bothered to update drivers on my desktop NTs either...

25. Page 25 - 4.5: What about the Windows 2000 command line capabilities for most administration tools?

26. Page 25 - 4.5: Regarding registry entries - these can be easily moved as a block from one system to another.

27. Page 25 - 4.5: For remote PC access - the best tool I have found is VNC - IMHO it outperforms its commercial counterpart and it is FREE on the web. http://www.uk.research.att.com/vnc/

28. Page 26 - 5.: fifth line of text - starting with "the QNX" should read "that QNX".
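
To illustrate the kind of listing requested in comments 16 through 19, here is a minimal sketch of a process creation and termination timing loop. It is not the authors' code and makes no claim about their methodology. It assumes a POSIX system (it relies on fork, so it covers the Linux case only), and the function names time_fork_exit and summarize and the repetition count are hypothetical choices made for this example.

```python
import os
import statistics
import time


def time_fork_exit(repetitions=200):
    """Return per-iteration wall-clock times (seconds) for fork + exit + wait.

    Hypothetical helper for illustration; POSIX only, since os.fork
    is not available on Windows.
    """
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        pid = os.fork()
        if pid == 0:
            # Child: terminate immediately, skipping cleanup handlers.
            os._exit(0)
        # Parent: block until the child has terminated.
        _, status = os.waitpid(pid, 0)
        samples.append(time.perf_counter() - start)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            raise RuntimeError("child did not exit cleanly")
    return samples


def summarize(samples):
    """Report minimum, median and maximum in microseconds."""
    return {
        "min_us": min(samples) * 1e6,
        "median_us": statistics.median(samples) * 1e6,
        "max_us": max(samples) * 1e6,
    }


if __name__ == "__main__":
    print(summarize(time_fork_exit()))
```

Reporting the minimum and median alongside the maximum, rather than a single mean, makes it easier for a reader rerunning the test to separate the cost of the operation from scheduling noise on their own machine.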