NSF Award #0096236: Data Parallel SPMD Programming Models from Fortran to Java
------------------------------------------------------------------------------

Progress report, July 2001

Principal investigator: Geoffrey C. Fox, Florida State University

Participant individuals:

  Senior personnel: Bryan Carpenter
  Graduate students: Han-Ku Lee, Sang Boem Lim

Collaborators:

  Xiaoming Li, Peking University, China. Co-author of original proposals for HPspmd.
  Vladimir Getov, University of Westminster, UK. Chair of Java Grande Message-Passing working group.

Activities and findings
-----------------------

Research Activities:

We have been developing a model for parallel programming that integrates high-level, data-parallel features from languages like High Performance Fortran (HPF) with well-established, library-based approaches to programming distributed memory parallel computers. In particular, we have been implementing and testing these ideas in an environment called HPJava, built around an "HPspmd" programming language extended from Java.

An early version of an HPJava translation system was reported last year. This year, in the light of experience with that experimental system, we undertook a major rewrite of the translator. The main goal was to implement an improved translation scheme that we hope will deliver realistic performance. We also took the opportunity to bring the standard of checking and error reporting in our HPJava front-end closer to what would be expected of a practical tool. The new translator is approaching completion, and we hope to be able to report results of early benchmarks at a forthcoming workshop (ref 1). The current translation scheme is documented in detail in (ref 2).

As a by-product of the effort to produce an efficient translator, many details of the language definition have been refined and clarified, and we can now characterize the HPJava language more definitely.

HPJava is a strict extension of Java.
It incorporates all of Java as a subset (including, for example, nested classes), and any existing Java class library can be invoked from an HPJava program without recompilation.

HPJava adds to Java a concept of multi-dimensional, distributed arrays, closely modelled on the arrays of High Performance Fortran. Sequential multi-dimensional arrays, essentially equivalent to Fortran 95 arrays, are available as a subset of the HPJava distributed arrays. Regular sections of distributed arrays are fully supported. The multi-dimensional arrays can have any rank, and the elements of distributed arrays can have any standard Java type, including Java class types and Java array types.

A translated and compiled HPJava program is a standard Java class file, which we expect to be executed by a collection of JIT-enabled Java Virtual Machines. All externally visible attributes of an HPJava class (for example, the existence of distributed-array-valued fields or method arguments) can be transparently reconstructed from Java signatures stored in the class file. This makes it possible to build libraries operating on distributed arrays while maintaining the usual portability and compatibility features of Java. The libraries themselves can be implemented in HPJava, in standard Java, or as JNI interfaces to other languages. The HPJava language specification carefully documents the mapping between distributed arrays and the standard-Java components they translate to.

Although HPJava incorporates HPF-like distributed arrays, it does not incorporate HPF-like "sequential" semantics for manipulating these arrays. Besides features designed to support library interfaces, HPJava DOES add a handful of high-level features designed to support direct programming with distributed arrays, including a distributed loop construct called "overall".
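
As a rough illustration of how a distributed array can map onto standard-Java components, the following sketch models one dimension of an HPF-style BLOCK distribution: the N global indices are cut into contiguous blocks of ceil(N/P) elements, and each of the P processes stores only its own block, as an ordinary Java array (its local segment). The class BlockDistribution and its methods are hypothetical names chosen for this example. They are not the HPJava runtime interface, and the actual translation scheme documented in (ref 2) is more general (several dimensions, sections, and so on).

```java
/**
 * Minimal sketch (hypothetical, not the HPJava runtime API) of one dimension
 * of an HPF-style BLOCK distribution: N global indices are split into
 * contiguous blocks of ceil(N / P) elements, one block per process.
 */
final class BlockDistribution {
    private final int n;      // global extent N
    private final int p;      // number of processes P
    private final int block;  // block size, ceil(N / P)

    BlockDistribution(int n, int p) {
        if (n < 0 || p <= 0) {
            throw new IllegalArgumentException("need n >= 0 and p > 0");
        }
        this.n = n;
        this.p = p;
        this.block = (n + p - 1) / p;
    }

    /** Rank of the process whose local segment holds global index i. */
    int owner(int i) {
        checkIndex(i);
        return i / block;
    }

    /** Position of global index i within its owner's local segment. */
    int localOffset(int i) {
        checkIndex(i);
        return i % block;
    }

    /** First global index held by process r (equals N if r holds nothing). */
    int localLo(int r) {
        checkRank(r);
        return Math.min(r * block, n);
    }

    /** Length of the plain Java array that process r allocates as its segment. */
    int localCount(int r) {
        checkRank(r);
        return Math.min((r + 1) * block, n) - localLo(r);
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= n) {
            throw new IndexOutOfBoundsException("global index " + i);
        }
    }

    private void checkRank(int r) {
        if (r < 0 || r >= p) {
            throw new IllegalArgumentException("rank " + r);
        }
    }

    public static void main(String[] args) {
        BlockDistribution d = new BlockDistribution(10, 4);  // N = 10 over P = 4: blocks of 3
        for (int r = 0; r < 4; r++) {
            System.out.println("process " + r + " holds " + d.localCount(r)
                    + " elements starting at global index " + d.localLo(r));
        }
        System.out.println("index 7 -> process " + d.owner(7) + ", offset " + d.localOffset(7));
    }
}
```

With N = 10 and P = 4 the blocks have three elements, so the last process holds a single element (global index 9), and global index 7 lives at offset 1 in the segment of process 2.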

To support lower-level SPMD programming directly, HPJava also provides a complete set of inquiry functions that allow the local segments of distributed arrays to be manipulated directly, if preferred.

Research Findings:

Continuing developments encourage our belief that the HPspmd approach can efficiently and elegantly combine several important paradigms for parallel computing, and our hope that the HPJava system in particular has the potential to become a practical tool for "scientific" computing in the near future. We are encouraged by reports at recent Java Grande and JavaOne conferences claiming that, on important commodity platforms, Java JIT performance is now approaching parity with C and Fortran compilers. We are also encouraged by recent discussions in Java Grande Forum meetings highlighting the pressing need for preprocessors to introduce multi-dimensional arrays and other "scientific" features into Java.

Research training:

Several graduate students have worked on the project. Sung Hoon Ko received a Ph.D. from Syracuse University last summer for work related to the project. Two graduate students are currently working full-time on developing and testing the HPJava translator and associated software, and expect eventually to complete Ph.D.s on related topics.

Education and outreach:

In the last year the senior personnel on the project (Fox and Carpenter) developed a two-part, Java-based, interdisciplinary graduate course on information technology for the School of Computational Science and Information Technology at Florida State University. This is not directly concerned with using Java for scientific applications, but is related to it:

  http://aspen.csit.fsu.edu/it1spring01/
  http://aspen.csit.fsu.edu/it2spring01/

Future activities
-----------------

Our "2nd generation" HPJava translator will be operational within the next few weeks.
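
To give a concrete picture of the intermediate Java a source-to-source translator produces, here is a hypothetical sketch of what an "overall" loop over a block-distributed, one-dimensional array might reduce to, assuming a source loop that sets each element to twice its global index. Each SPMD process computes the bounds of its own block once, then runs an ordinary Java loop over its local segment, so the loop body needs no communication and no per-element ownership test. The names OverallSketch, overallBody, localSize and simulate are invented for illustration, the processes are simulated one after another in a single JVM, and the real HPJava translation scheme (ref 2) differs in detail.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the plain Java a source-to-source translator might
 * emit for an "overall" loop over a block-distributed 1-D array whose source
 * body sets each element to twice its global index. Not the real HPJava scheme.
 */
final class OverallSketch {

    /** Translated loop, executed independently by SPMD process `me`. */
    static void overallBody(double[] aLocal, int n, int nprocs, int me) {
        int block = (n + nprocs - 1) / nprocs;  // BLOCK distribution: ceil(n / nprocs)
        int lo = Math.min(me * block, n);       // first global index held locally
        int hi = Math.min(lo + block, n);       // one past the last global index held locally
        // Bounds are computed once; the loop then touches only local elements,
        // with no communication and no per-element ownership test.
        for (int g = lo; g < hi; g++) {
            aLocal[g - lo] = 2.0 * g;           // body uses the global index g
        }
    }

    /** Local segment length for process `me` (same arithmetic as above). */
    static int localSize(int n, int nprocs, int me) {
        int block = (n + nprocs - 1) / nprocs;
        int lo = Math.min(me * block, n);
        return Math.min(lo + block, n) - lo;
    }

    /** Runs each simulated process in turn and gathers the segments for checking. */
    static double[] simulate(int n, int nprocs) {
        double[] global = new double[n];
        int offset = 0;
        for (int me = 0; me < nprocs; me++) {
            double[] aLocal = new double[localSize(n, nprocs, me)];
            overallBody(aLocal, n, nprocs, me);
            // Blocks are contiguous and in rank order, so segments concatenate.
            System.arraycopy(aLocal, 0, global, offset, aLocal.length);
            offset += aLocal.length;
        }
        return global;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(simulate(10, 4)));
    }
}
```

The gathered result matches what a sequential loop over the whole array would compute, even when some processes (as with 3 elements over 5 processes) hold empty segments.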

Once the basic translator is in place, we must find out which optimizations of the generated code are important and worthwhile. (The current translator works source-to-source, so this code is intermediate Java.) Since the generated code will subsequently be optimized by state-of-the-art JITs, answering this question will require experimentation and benchmarking.

There are also language-level issues concerning compile-time and run-time checking of various "sanity rules" in the HPspmd programming model. The current translator incorporates some, but not all, of these checks. In general the system will probably benefit from more detailed compile-time analysis to reduce run-time overheads; exactly which analyses are worthwhile needs experimental investigation.

In parallel with translator issues, we also need to address the question of the most suitable communication and support libraries, and other platform issues. In the initial system the principal communication libraries are MPI and Adlib (a library of parallel array operations developed in earlier projects), both accessed through JNI (Java Native Interface) wrappers. This is not expected to be the most portable OR efficient approach in the long run; in most cases pure (or purer) Java is likely to be a much better way to go, avoiding the overheads of crossing the JNI barrier. Hence we need to develop Java platforms for distributed memory parallel computing (refs 3, 4).

The original proposal also called for investigation and development of HPspmd bindings to other established libraries for parallel computing, such as the optimized irregular communication of CHAOS/PARTI and the one-sided access to distributed arrays of Global Arrays. These issues still have to be fully addressed. Of course, we are also eager to see the system used for real applications.
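
To make the pure-Java alternative to JNI-wrapped communication more concrete, here is a toy cyclic shift of a block-distributed array, the kind of collective array operation a library like Adlib supplies. SPMD processes are simulated by threads in one JVM, each holding an equal-sized local segment, and the only message each process sends (its last element, to its right-hand neighbour) travels through a java.util.concurrent queue. This is a sketch under strong assumptions: ShiftSketch and cyclicShift are hypothetical names, it is not the MPJ interface of (ref 3), and a real distributed memory platform would carry the messages over a network between separate JVMs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Toy, hypothetical sketch: a cyclic shift by one place of an array split into
 * equal blocks over nprocs simulated SPMD processes, written in pure Java.
 * Threads stand in for processes; queues stand in for message passing.
 */
final class ShiftSketch {

    /** Returns result with result[i] == input[(i - 1 + n) % n]. */
    static int[] cyclicShift(int[] input, int nprocs) {
        int n = input.length;
        if (nprocs <= 0 || n == 0 || n % nprocs != 0) {
            throw new IllegalArgumentException("need n > 0 divisible by nprocs > 0");
        }
        int b = n / nprocs;                                  // local segment length
        List<BlockingQueue<Integer>> inbox = new ArrayList<>();
        for (int r = 0; r < nprocs; r++) {
            inbox.add(new ArrayBlockingQueue<>(1));         // one pending message per process
        }
        int[] result = new int[n];
        Thread[] procs = new Thread[nprocs];
        for (int r = 0; r < nprocs; r++) {
            final int me = r;
            procs[r] = new Thread(() -> {
                int[] local = Arrays.copyOfRange(input, me * b, (me + 1) * b);
                try {
                    inbox.get((me + 1) % nprocs).put(local[b - 1]);   // send last element right
                    int fromLeft = inbox.get(me).take();              // receive from the left
                    int[] shifted = new int[b];
                    shifted[0] = fromLeft;
                    System.arraycopy(local, 0, shifted, 1, b - 1);    // move the rest up one place
                    System.arraycopy(shifted, 0, result, me * b, b);  // disjoint slice per thread
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            procs[r].start();
        }
        try {
            for (Thread t : procs) {
                t.join();  // join() also makes each thread's writes to result visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for processes", e);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3, 4, 5, 6, 7};
        System.out.println(Arrays.toString(cyclicShift(a, 4)));  // prints [7, 0, 1, 2, 3, 4, 5, 6]
    }
}
```

Because every process writes to a different neighbour's inbox, each queue receives exactly one message, so no put ever blocks and the exchange cannot deadlock.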

References:
-----------

1) "Translation Schemes for the HPJava Parallel Programming Language", B. Carpenter, G. Fox, H.-K. Lee, S. B. Lim, accepted for presentation at the 14th Workshop on Languages and Compilers for Parallel Computing, Cumberland Falls, KY, August 2001.

2) "Parallel Programming in HPJava", B. Carpenter, G. Zhang, H.-K. Lee, S. B. Lim, http://aspen.csit.fsu.edu/pss/HPJava/

3) "MPJ: MPI-like message passing for Java", B. Carpenter, V. Getov, G. Judd, A. Skjellum, G. Fox, Concurrency: Practice and Experience, vol. 12, no. 11, 2000, p. 1019.

4) "Scalable Computing Environments from Internet Technologies", B. Carpenter, proposal to NSF ITR program, January 2001.