Referee 1 ****************************************************************
E: Referee Comments (For Author and Editor)
This paper presents "a survey and history" of distributed applications. It contains many references to existing work, and it discusses some future implications of distributed applications. Overall, the paper lacks technical merit because it does not discuss the issues and difficulties faced by programmers/developers. Moreover, the title of the paper is misleading: one would think that the paper is proposing a common framework for developing distributed applications.

Referee 2 ****************************************************************
E: Referee Comments (For Author and Editor)
This paper has a worthwhile aim: to define a common development framework specifically for distributed applications. On page 1, DAs are defined to include parallel, fault-tolerant, real-time, and web applications. (There is a further category - functional specialisation - which does not seem to have the well-known currency of the others.) The paper introduces the topic, then goes on to look at concurrency, real-time, distributed and parallel systems in detail. This breakdown does not match the definition on page 1 (which comes from a 1989 reference); it would be better to leave that definition out and start by defining the world as it is going to be covered in the paper. Indeed, the breakdown the paper gives is by far the better-known and more rational one.
Each section looks at aspects particular to its area, and in the first three cases reaches a conclusion. There are no conclusions for parallel processing. Section 7 then looks for the strengths of each discipline and the overlaps between them, makes a case for integrated approaches, and identifies the main problems and future work. There is then an example-based appendix (more of which later).
The paper contains nearly four pages of references, which I regard as excessive.
Even though the paper has survey tendencies, it should be possible to trim these down to those that are really essential to support the work.
The problem with much of the paper is that what is said is already well known and present in textbooks on operating systems or distributed systems. There is also a certain datedness associated with some of the subsections; for example, 2.3 refers to LANs and WANs, but does not mention middleware or web services. It would only be valuable to restate this information in Section 2 if a framework were then developed for the following four sections to follow. This is not the case.
Section 3 discusses the theory that might assist in a common development framework. The discussion relates to work done in the 1980s and concludes that it has had little impact. If this is the case, then this section has a minimal bearing on any proposal. It could be reduced simply to the conclusion.
Section 4 is a fairly rich discussion of tools and techniques for real-time development, but once again goes too far back in history for a research paper (e.g. occam and Chorus are dated at 1988). The reader will be searching for the essence of the discussion that is going to lead to the common framework, but it is not present. The conclusion (4.4) makes some statements which are not defensible. For example, in 4.4 paragraph one, what is wrong with Java or C++ being produced by a tool? And the reference to UML extensions (line -4) has been superseded by events.
The compartmentalisation of concurrency and real-time also misses a point. Much of Kramer and Magee's work, as well as that of the Durra and Polylith initiatives and other distributed systems (referenced in section 5), has been shown to be directly applicable to real-time systems. Many of the examples were gleaned from this area. Thus it is not true to say that the expressive power of the methodologies for real-time systems is limited to static systems without replication structures.
These are available in Darwin and the other notations.
If the growth in distributed systems research was explosive before 1997, then it has been doubly explosive since then. Yet page 9 refers only to 1997 and earlier. The middleware discussion is particularly relevant, more so than Section 5.1. However, there are some misconceptions. Figure 2 gives the impression that the client stub and server skeleton in CORBA communicate directly. In fact, as stated in para 4, page 10, the communication is always via the ORB. While a section on design patterns is highly topical, the examples given are insufficient and once again old. In the appendix, there is more mention of newer patterns. The conclusion of Section 5 harps again on old work, and should draw in the latest in middleware technologies, from CORBA all the way up to EJB and .NET.
Section 6 on parallel processing is quite extensive and touches on a broader range of topics than the other three sections. The section on skeletons is relevant to the title of the paper, but once again refers to very old work.
The real problem with the paper is that it does not live up to its title. The reader is led at the start to imagine that a proposal will be forthcoming with guidelines for the future development of distributed systems. Instead, section 7 makes generalisations and presents limitations. It would seem that Figure 4 is a possible candidate for an integrated approach, but it is neither detailed enough nor all-embracing enough to be really useful. Recommendations regarding adaptation and formal notations are made, but there is no direct line between these and what has gone before, so they sound unsubstantiated.
Then we come to the appendix. This is much more acceptable and readable. It describes an example of a distributed system and uses modern techniques to illustrate how distributed systems should be developed.
How can the paper be improved?
-----------------------------
My recommendation is that the paper should be rejected. However, the idea is sound, it addresses a relevant problem - how to develop distributed systems - and is based on considerable background understanding. If better structured, and brought up to date, the work could be publishable. The following could be done:
1. Considerably reduce sections 3-6 by removing material which is well known, and concentrating only on the conclusion subsections.
2. Bring all the references up to date by considering the work from 1998-2001 in CCPE, Parco, Europar, SPE, IEE Computer and IEE Software.
3. Reduce the references to a max of 25 really relevant ones.
4. Draw up a table which will list the proposed essential issues in the development of distributed software and which will show which of these are already covered in the areas surveyed, and which need more work.
5. Make the example central to the paper and expand it to include an assessment of its actual implementation.
F: Presentation Changes
The presentation is fine.

Referee 3 ****************************************************************
The paper's motivation is pretty interesting and challenging -- to try and analyze commonalities of development techniques coming from the historically different fields of concurrent, distributed, real-time, and parallel computing. So the contribution, as expected, consists of two parts: a) an overview of existing approaches; b) an analysis of the relations, overlaps, cross-fertilization etc. While part a) is covered by the author quite well, the most interesting part, b), is still too weak and should be improved. Without this part the paper just provides commonly available information which one can find in textbooks. Therefore, the paper in its current version does not fully deliver on its promise. There is just too little really new insight of the kind one would expect from a journal paper.
I am strongly in favor of this paper's objective and would encourage the author to revise her/his paper along the lines stated here and to re-submit.
The plus of the paper is a very good overview of current work on distributed applications with many useful references. Also the case study in the appendix is interesting, but again, every single diagram and solution presented is well known from textbooks on the subject. Unfortunately, the example does not clarify how methods from different areas of parallel and distributed computing can be used together or interchangeably.
Some additional comments:
1. In section 2.1 you write that distributed applications mostly differ in non-functional requirements. Do you mean they differ in that respect from sequential applications, or do you mean the difference among various distributed applications? If the former, then I don't agree: performance is also very important in sequential programming, and cache optimizations are similar to optimizations in distributed programming. If you mean the latter, then please explain in detail.
2. Sometimes it is not clear where particular concepts are placed in the development cycle. Example: in section 6.4 you write that skeletons are very similar to patterns. In figure 4 you classify design patterns as design phase and skeletons as implementation phase. Probably skeletons should be made a member of the design phase as well?
3. You distinguish between formal (3.1) and semi-formal models (4.2) and mention statecharts in section 4.2. Shouldn't statecharts rather belong to the formal models?
4. In section 6.2, when mentioning BSP, you probably should also mention LogP and its extensions for the heterogeneous, i.e. distributed, case.

Referee 4 ****************************************************************
This paper claims to present a common application framework for distributed applications.
I agree with the author that such a framework is overdue and that it is about time to combine results from different research areas into a common development framework. Unfortunately, the paper only presents the state of the art in different research areas without fulfilling its own claims. In the end, the reader is disappointed because there is no common application framework presented in the paper. At least I haven't found it. Since the paper does not keep its promises, it should not be accepted for publication.
The only novel idea is the use of the waterfall model for distributed applications. This, however, needs to be debated carefully. The waterfall model is based on the assumption that there is a common hardware abstraction layer, so that platform-specific questions can be ignored during the design phase. Current parallel and distributed platforms are very diverse (shared versus distributed memory, nearest-neighbor versus switched versus broadcast network, SIMD versus MIMD design). Unfortunately, these aspects typically need to be taken into account during the design phase of high-performance applications, since otherwise sufficient performance cannot be achieved at all. The waterfall model seems not to be appropriate for high-performance applications, i.e., for one of the four problem areas mentioned on the first page.
The remaining question is whether the survey sections (2-6) are interesting or promising enough. If they were, it might make sense to ask the author to retarget the article and to turn it into a survey (that does not claim to provide technical news). However, I regard the survey sections to be far from excellent as well. Most of the survey sections are enumerations of approaches or lines of research. If the reader already knows the approaches, there is nothing to gain from the article, because the descriptions are very high-level, i.e., superficial.
If the reader does not know the approaches, the descriptions are too high-level to give a sufficient idea. The survey chapters do not provide any classification of the approaches that would empower the reader to structure or (re-)organize his own knowledge of the field. There aren't any comparisons either that would allow the reader to better understand the advantages or disadvantages of the approaches presented. Finally, the author does not discuss which of the approaches are successfully used in industry (or in applied research) today. Hence, there is not much to learn from the survey chapters. In conclusion, the subsections in the survey chapters are just lists of facts, systems, and properties without any guiding structure. They do not deliver what I expect from a good survey. (The author seems to be a skeleton expert. This particular subsection is *much* more detailed than others. This fact gives the impression that the author does not have a very deep understanding of the other areas. This impression reduces the credibility of the whole text.)
The paper has a rather long appendix. In this appendix, the author presents how various tools can be used in a case study. Unfortunately, this appendix is not tightly coupled to the article. The article does not gain a lot from the appendix. On the other hand, the appendix does not show the results of the article applied to practice. So what's the reason for having such a lengthy appendix?
To summarize, the paper does not live up to its title and its abstract. Most of the chapters feel like chapters of a survey but do not help the reader in getting an understanding of the field, nor do they provide a new classification or comparison. Hence, the paper is neither a technical contribution to the field nor a valuable survey article. Therefore, I recommend rejecting the paper in its current form.
If the editor's decision is to accept the paper (or to accept it after major changes), I'm happy to provide a list of detailed comments on grammar, on individual paragraphs, on the list of references.

Referee 5 ****************************************************************
ACCEPTED PROVIDED CHANGES SUGGESTED ARE MADE
D: Referee Comments (For Editor Only)
All comments for author and editor in section (E) below.
E: Referee Comments (For Author and Editor)
It is really hard to judge this paper, as it tries to cover a very large area of work. If the attempt is to bring together many different aspects of parallel and distributed computing, then I would say that the author has succeeded in giving the reader a flavour of the pertinent themes in the area. I found the paper interesting to read, even though, as the author claims, the coverage is not meant to be exhaustive or complete. I have a rather contentious point about the title -- perhaps it should include the words "Software Engineering" somewhere?
I would recommend the following additional references:
``An Object Oriented Framework for Data Parallelism'' J.-M. Jezequel, ACM Computing Surveys, Vol 32, Issue 1, 2000 http://www.acm.org/pubs/contents/journals/surveys/2000-32/
also in the same issue ``Programming languages and systems for prototyping concurrent applications'' Wilhelm Hasselbring
and ``Selecting locking primitives for parallel programming'' P. E. McKenney, CACM 39, 10 (October), pp 75--82, 1996
My comments are ordered by the section of the submitted paper to which they refer. I hope they are useful to the author:
SECTION 1:
I am uncertain what is specifically different about the types of problems detailed here, such that conventional software engineering techniques are not appropriate. For instance, many people have adopted existing SE approaches for developing Web based apps -- why are these approaches not suitable?
SECTION 2.1:
paragraph 1, line 3: I am not sure I understand what the author means by "should be minimal yet complete".
Also, in the same section, some of the outlined requirements, such as "dynamic change management", are surely generally applicable, and therefore not necessarily a requirement of the outlined apps? This again brings me back to the previous comment of why the author thinks conventional approaches are not suitable.
SECTION 3.2:
paragraph 1, line 1: "formal methods are still far from being widely used" -- how does the author reach this conclusion, and in what context is this true? This is a very general statement, and should be grounded in the context of the on-going discussion.
paragraph 2, line 2: not sure what "cycleromising" means.
paragraph 2, last line: Why do practitioners feel these tools have limited value? Can the author provide a reference here?
SECTION 4.2/4.3:
There is no mention of the declarative style -- and what impact this is likely to have on developing modules. Although there is mention of agents towards the end of the paper, the particular approach adopted by this community is not discussed.
SECTION 4.4:
Why does the author feel that formal semantics are necessary? What is the overall objective of modelling here in the first place?
paragraph 2, last line: Not sure what "for static systems and do not ..." means.
SECTION 5.3:
paragraph 3, last 2 lines: Need more explanation about the patterns defined in figure 3.
SECTION 5.4:
paragraph 1: What is the impact of using a different design approach -- say a top-down approach, rather than the bottom-up approach recommended? What impact is this likely to have on the adopted styles? Also, what about hybrid approaches -- what issues arise in the use of these?
paragraph 2, line 8/9: Give references for Parse and ADL.
paragraph 3: What are the pitfalls in using the approach? There is no discussion of this.
SECTION 6.2:
There is no discussion of how relevant or valid models such as BSP and PRAM are for real-world applications and architectures. What are the issues in efficiency, performance, tool support etc. (as identified in sections 1 and 2 earlier in the paper)?
SECTION 7.2:
There are lots of general statements here -- and I do not feel they have been truly quantified. For instance, the author claims that "most" structured OO design approaches do not provide replication structures -- which design approaches does he have in mind? How is this conclusion reached?
Overall I find the paper interesting to read -- as it has tried to bring together many different ideas and themes. However, what the author has not done is provide a critique of existing approaches, or explain why they are inappropriate. The material is presented in an "as is" manner -- without direct comments from the author. Although it would be a very time-consuming task (and one which I do not necessarily advocate the author do), it would help to have a working example (or multiple small examples) spread throughout the paper, highlighting the particular themes that the author is trying to suggest. The 2 examples in the Appendix do not fully bring out the issues that are highlighted and discussed in the paper.
F: Presentation Changes
SECTION 2.1 paragraph 3 (after bullet points), line 3 "serve" -> "serves"
SECTION 2.2 second bullet point ("Interaction Management") delete "which components communicate with which"
SECTION 4.1 paragraph 2, line 5 "occam" -> "Occam"
SECTION 4.3 paragraph 2, line 2 second "that" -> "there"
SECTION 4.3 paragraph 6, line 2 "Ng ..." -> "Ng et al."
SECTION 6.1 paragraph 1, line 2 insert a "," after the for second bullet point ("scalable" abstractions), after paragraph 4, line 2 re-phrase "the decomposition into tasks ..."
SECTION 6.3 paragraph 1, line 8 remove second "include"
SECTION 6.4 paragraph 3, line 8 remove "of" between `Several' and `such'
SECTION 6.4 paragraph 3, line 11 "than" -> "to"
SECTION 7.1 line 8 insert "as" between `well' and `standard'
Spelling error in Figure 4 in Implementation section "Secttion 5.2" -> "Section 5.2"
REFERENCES
Reference for CLEAVELAND, R should be CLEAVELAND, R. AND SMOLKA, S. A. This paper was written by many people -- and the actual reference says CLEAVELAND, R et al.
SKILLICORN, D. and TALIA, D. -- replace "ans" -> "and"
Spacing problem on "VICTOR, B."

Referee 6 ****************************************************************
E: Referee Comments (For Author and Editor)
This is a curious paper. While it is clear that the author intended this as a broad survey, after a careful reading I am not sure that this is the kind of submission that CCPE is looking for. My main concern about the paper is that it does not provide adequate detail about the systems and techniques described to be of real value as a survey, nor does it present any significant new direction or synthesis of ideas to be compelling. The title is somewhat misleading in this regard as well.
On the other hand, the paper does cover a very broad scope of topics and could be potentially useful -- if focused towards a particular audience and with a particular theme. As it stands this paper reads as the draft of an intro textbook chapter on distributed computing; I would hope that the audience for CCPE would be well-versed in most of these areas already. However, I believe this paper could be retasked to focus on the cross-fertilization of distributed systems techniques with parallel computing (or vice versa). This would yield an altogether different paper, which is why I am recommending rejection for this paper as it stands.
The real value here would be in introducing techniques from distributed computing fields and making specific recommendations for their application to parallel computing.
(The discussion could go in both directions, but I would advocate a more focused approach rather than attempting to bite off too much.) In particular I recommend a tutorial on applying techniques from one area (distributed systems, say) to a problem or set of problems in parallel computing. This would still essentially be a survey, although more directed at particular problems and at gaining a better understanding of both approaches.
Some specific recommendations on the text follow.
Section 2 is very high-level and apart from establishing some basic terminology simply reiterates well-known concepts. I would suggest trimming this section down considerably and extracting the basic definitions as required for other sections of the paper.
There seems to be some missing text in the last paragraph on page 5, starting with "development cycleomising [sic] development ..."
The paper uses too many forward references to technologies which are described in later portions of the text. Examples include "See Section 4.3" in section 3.2, "to be described later" in Section 4.2, and the use of the term "configuration language" before its definition in Section 5.4. Some reorganization of the presentation would be extremely helpful for clarity. In addition there are many unexpanded acronyms in the text (JSD, CODARTS, UML, etc.) that are not in common usage (at least within the parallel computing community).
Section 4.2 does not seem to be closely related to real-time systems; many of these techniques are applied to concurrent systems in general. If there are specific ways in which these techniques are used in real-time systems that are useful, it would be good to point them out.
POSIX is not only a standard for thread-based systems; you should reference POSIX 1003.1c (1995) instead. You should cite the standard.
The appendix seems somewhat contrived and does not illustrate the mechanisms very well.
Showing a set of UML diagrams is not particularly useful if the reader is not familiar with UML or the meaning of the diagram symbols. Also, it would have been very useful to include concrete examples (such as the one presented in this appendix) during the discussion throughout the paper; this would have helped to make the description of the various concepts more concrete. Leaving this material as an appendix does not really do it justice.
F: Presentation Changes