This proposal is a collaboration between Cornell, Houston, and Syracuse, who are currently working together on ARMS. We have assembled an interdisciplinary team spanning the networking and HPC communities. ARMS will be the centerpiece, viewed as enabling a set of large-scale distributed applications encompassing both computing and collaboration. Our research will focus on the integration of networking into the ARMS framework and on the development and evaluation of tools that measure the network requirements of ARMS applications. We believe this will be important in understanding the technology developments needed for NGI and Internet 2. This proposal does not cover the development of those networking technologies but provides an assessment mechanism for them as they are developed. We intend to include (no-cost) international collaborators so that we can begin to examine the consequences of intercontinental networking.

Distributed Resource Management

Within this proposal we will address key technical challenges in scheduling and managing a distributed, scalable teraflops environment through an Advanced Resource Management System (ARMS) that is being developed and tested among Alliance partners. The ultimate goal is to provide a seamless environment for distributed computing in a secure and fault-tolerant fashion while making effective use of the available resources. ARMS is based on an open, layered architecture that has a rich application programming interface to facilitate integration of existing and new resources. ARMS builds on a successful collaboration between CTC and IBM that resulted in an application programming interface to LoadLeveler that allows this resource manager to be controlled by any external scheduler. At CTC a new, enhanced version of the Extensible Argonne Scheduling sYstem (EASY) was developed to work with the application programming interface (API). The key to ARMS' functionality is an intelligent, high-level scheduler that uses a comprehensive, distributed object-oriented database (being developed in collaboration with Xerox Corporation) to collect and store metadata about user code requirements, such as the location and size of data objects, preferred or required computational resources, the operating system version on which the code was developed, and the versions of compilers used. In the longer term, ARMS will provide higher-level descriptive capabilities to users so that they can succinctly describe their main requirements and leave it to the system to decide on details. The high-level scheduling system will consult the ARMS databases to determine data and execution requirements for particular codes and algorithms requested by the user. The system will automatically select the needed resources and determine more fine-grained scheduling, data transport, and caching requirements. (For example, it might determine that different parts of the overall job should be run at different sites and be reassembled for the user when completed.)
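To make the role of the metadata database concrete, the following is a minimal sketch, in the spirit of the Java components used elsewhere in this proposal, of how a high-level scheduler might filter candidate sites against stored code requirements. All class names and fields here are hypothetical illustrations, not the actual ARMS schema or API.

```java
// Illustrative sketch only: the record types and selection policy below are
// hypothetical, not the actual ARMS metadata schema or API.
import java.util.ArrayList;
import java.util.List;

class CodeMetadata {                 // metadata stored for a user code
    String codeName;
    long inputDataBytes;             // size of the data objects the code needs
    String requiredArchitecture;     // preferred or required computational resource
    String osVersion;                // OS version the code was developed on
    String compilerVersion;          // compiler version used
}

class SiteResource {                 // one Alliance mid-range resource site
    String siteName;
    String architecture;
    String osVersion;
    long freeDiskBytes;
}

class HighLevelScheduler {
    /** Consult stored metadata and return the sites that can satisfy the code's requirements. */
    List<SiteResource> candidateSites(CodeMetadata meta, List<SiteResource> sites) {
        List<SiteResource> matches = new ArrayList<>();
        for (SiteResource s : sites) {
            boolean archOk = s.architecture.equals(meta.requiredArchitecture);
            boolean osOk   = s.osVersion.equals(meta.osVersion);
            boolean diskOk = s.freeDiskBytes >= meta.inputDataBytes;
            if (archOk && osOk && diskOk) {
                matches.add(s);
            }
        }
        return matches;
    }
}
```

The resulting candidate list is exactly what is handed to the low-level scheduler described next.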
The high-level scheduler will pass lists of resources that meet users' requirements to a low-level scheduler that decides what resources a job will use and when it will start. We are working with major database companies, such as Informix/Illustra, on the development of this next-generation system. Distributed Computing Environment (DCE) software will be used to provide distributed cross-realm authentication, whereby users will be able to authenticate once to the system and have seamless access to all Alliance resources. This will join distributed administrative domains without sacrificing security. A user who travels to different locations will see the same user interface and methods of operation due to our Web-based interface. This ARMS interface will allow users to tune their job requirements and get instant feedback about when they can expect that job to run. It is currently under development in a collaboration between SYR and CTC. Figure 2 illustrates ARMS' functionality across the Alliance advanced computational environment.

We have successfully completed several milestones in developing ARMS. These include installation of the base ARMS resource management software at all mid-range sites and verification of centralized control capabilities; verification of the survivability of centralized scheduling control; successful testing of an automatic, cross-site, fail-soft capability that allows local and global resource management activities to continue even when one or several sites become unavailable (e.g., a network outage); development of a secure Web interface built on an Apache server using Kerberos authentication; deployment and testing of DCE/DFS over all sites; and initial development of the distributed database for data and computational resources, with a heterogeneous resource scheduler being built on top of the database.

Global DataSpace

Effective data management is an increasingly complex problem in high-performance computing. Very high I/O performance in the local environment is essential to keep pace with leaps in processor speed. Intelligent access, movement, and replication of data across geographically distributed sites are critical to the success of an advanced computational environment. Efficient ways to access and analyze massive amounts of information are important in both local and distributed environments. Global DataSpace consists of storage components that will be phased in at each mid-range resource site and controlled by ARMS. Components will include a common distributed file system, a high-performance file system optimized for the architecture on which it runs, and a mass storage system. CTC will use the High Performance Storage System (HPSS), which couples the mass storage and file system functions within a hierarchical storage management framework. Data automatically migrate between tape archive and an active disk cache as needed. Functional integration and automated data movement are essential so that researchers with the largest problems have use of very large amounts of disk space in an affordable manner, in essence a "virtual disk" capability. CTC has conducted early scaling tests with HPSS using hundreds of SP nodes, achieving an aggregate data rate of more than a gigabyte per second. This system should scale in the future to handle billions of files and petabytes of data. CTC is the main integration site for HPSS in an SP environment. HPSS is scheduled to go into production at CTC later this calendar year.
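The hierarchical migration just described, in which data move on demand between a tape archive and an active disk cache while users see a single "virtual disk", can be sketched as follows. This is purely illustrative; the interfaces are invented for the example and do not correspond to HPSS's actual API.

```java
// Purely illustrative sketch of hierarchical storage migration; the interfaces
// below are hypothetical and do not correspond to the actual HPSS API.
import java.util.HashMap;
import java.util.Map;

interface TapeArchive {
    byte[] readFile(String path);             // slow path: recall the file from tape
}

class DiskCache {
    private final Map<String, byte[]> cache = new HashMap<>();
    private final TapeArchive tape;

    DiskCache(TapeArchive tape) { this.tape = tape; }

    /** Return the file from the active disk cache, staging it from tape if necessary. */
    byte[] open(String path) {
        byte[] data = cache.get(path);
        if (data == null) {                    // cache miss: migrate from tape archive to disk
            data = tape.readFile(path);
            cache.put(path, data);
        }
        return data;
    }
}
```

In practice the cache would be bounded and backed by real tape movers; the point is only that callers see one namespace while data migrate beneath them.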
Mid-range resource providers will acquire mass storage systems and high-performance file systems, whether HPSS, a different system, or a combination of systems, based on their needs, interests, and areas of emphasis. Within a site, ARMS will pre-stage files from the mass storage system to the high-performance file system prior to the start of a job so that researchers do not have to wait for data to be brought in from tape. This same approach will be used in the wide-area, distributed environment. Because the underlying ARMS scheduling algorithm is based on the EASY deterministic backfill algorithm, the high-level scheduler can determine ahead of time when a particular job will start at another site and ensure that the data are in place. Further, it will provide a method of implementing a global namespace based on object-oriented access that will mask unique site configurations and dependencies.

Testbed Environment

CTC, Syracuse, and Houston will create a testbed for ARMS and the applications that will be enabled by the ARMS technology. Each testbed site will be integrated fully into the ARMS resource management environment. This will allow users to reserve required resources automatically, based on defined policies. The national testbed environment will rely on the Internet's shared facilities; thus, while ARMS will manage testbed computational resources at all sites, bandwidth management across the Internet will realistically be limited until quality-of-service mechanisms are adopted broadly. Beyond the national testbed we expect to include, at no cost, international collaborators to create an international testbed. A likely collaborator is the Royal Institute of Technology in Stockholm, which manages the Swedish University Network (SUNet) and the Nordic Net, hosts the domain name server for the region, and has a variety of computational and visualization facilities (e.g., Cray, SGI Origin, IBM SP-2, Fujitsu, ImmersaDesk). Others include Edinburgh and GMD.

Network Capabilities

Collaboration

Recent Web technology developments open the way for new collaborative computing environments which integrate traditional video conferencing, distributed simulation as developed by the DMSO office (SIMNET), and emerging ideas of computational steering. In this proposal we will link the ARMS system with TANGOsim, developed at NPAC. TANGOsim uses Java servers to manage Web collaboration in the traditional fashion already seen in Habanero (NCSA) and Shaking Hands (IBM, but originating in NPAC). The Java server supplies session management with multicasting, allowing shared applications between multiple users. TANGOsim was built to support command and control with scripted environments allowing real and virtual members in the command teams. This is implemented as an event-driven simulator replacing the simple session manager used in the other Web collaboration systems. As ARMS and TANGOsim are both built on a modern Web backbone, we can integrate them, with ARMS providing the asynchronous scheduling and TANGOsim the synchronous collaborative computational steering, linking in each case distributed computers, databases, visualization, and multiple collaborating computational scientists. TANGOsim has been developed with other funding for command and control and distance education. Here we propose funding its integration with ARMS and the special features needed for distributed computational steering.
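To make the session-management model concrete, here is a minimal sketch of a server that multicasts events from one client to all other participants in a session. It is only an illustration of the concept; the class and method names are invented and are not TANGOsim's actual API.

```java
// Minimal sketch of multicast session management; invented names, not the TANGOsim API.
import java.util.ArrayList;
import java.util.List;

interface Participant {
    void deliver(String event);              // push an event to one client (e.g., over a socket)
}

class Session {
    private final List<Participant> members = new ArrayList<>();

    synchronized void join(Participant p)  { members.add(p); }
    synchronized void leave(Participant p) { members.remove(p); }

    /** Multicast an event from one participant to every other member of the session. */
    synchronized void multicast(Participant sender, String event) {
        for (Participant p : members) {
            if (p != sender) {
                p.deliver(event);            // shared applications stay in step by replaying events
            }
        }
    }
}
```

An event-driven simulator, as in TANGOsim, would sit behind multicast(), ordering and scripting events rather than simply relaying them.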
TANGOsim is implemented as a Java server linking a net of Java applets, with a special downloadable plug-in providing the critical connectivity. It already supports client applications written in Java, JavaScript, and C++ (and hence other traditional languages). We have demonstrated a prototype visualization client for a 3-D GIS (Geographical Information System) with a custom VRML browser written in C++. Other major capabilities of TANGOsim include a video teleconferencing module and integration with a state-of-the-art digital video server supporting efficient multicast services. Further, over the next few months, we will add a database backend (using the new JDBC Java Database Connectivity) which will allow one to log the complete multimedia collaborative session. TANGOsim naturally provides traditional basic services -- chatboard, shared whiteboard, and multimedia mail. Further, we have ported several interesting education applications -- the WebWisdom dissemination systems and some nifty physics education applets -- to collaborative mode.

Selected Test Applications

Virtual Environments -- A Metacomputing Application

Recently, collaborative science and engineering, using shared virtual environments, has become practical with the advent of higher-bandwidth communications channels. Research efforts have focused on two main areas: software infrastructure and application development. Researchers concentrating on software infrastructure have developed software architectures, network protocols, and data-sharing algorithms to enable distributed, shared virtual environments. Application developers have built specific applications to serve as proofs-of-concept for the effectiveness of distributed, shared virtual environments. In order to achieve real-time collaboration over long distances, two approaches have emerged: (1) rendering graphics locally on similar platforms and exchanging state data at high rates over a communications link; and (2) rendering graphics at a single facility and delivering those graphics, over communications channels, to remote users. The first technique demands that facilities be duplicated at every user site, but can be accomplished with modest bandwidths (for example, ISDN for a two-site installation, or ATM over heterogeneous networks). The second technique can require a communication bandwidth in the gigabit/s range for real-time rendering, reasonable visual resolution, and color depth. In either case, low latencies are essential for user interaction. We propose to work with both approaches. In partnership with SGI and MCI, we will develop, test, and distribute to our partners high-performance remote rendering software. This technological innovation, based on work initially done at SGI and ongoing at the UH, will support the remote "writing" of frame buffers. Thus, a powerful graphics supercomputer at one site can effectively drive the displays of inexpensive workstations on the scientist's or engineer's desktop. We will go beyond the initial goal of SGI (one-way remote rendering) to achieve interactive displays through remotely-rendered graphics. Of course, this approach will make extraordinary demands on available bandwidth (a 1600 x 1280, 24-bit pixel display rendered at 25 frames per second requires more than 1.2 Gb/s, and real-time interaction generally precludes standard compression techniques), but our partnership with MCI will provide the necessary access to high-speed networks at the inception of the project.
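The bandwidth figure quoted above follows directly from the frame-buffer parameters; the short calculation below reproduces it and can be reused for other display configurations. The second, smaller configuration is our own illustrative example, not a display discussed in this proposal.

```java
// Uncompressed frame-buffer bandwidth: width * height * bitsPerPixel * framesPerSecond.
public class FrameBufferBandwidth {
    static double gigabitsPerSecond(int width, int height, int bitsPerPixel, int fps) {
        double bitsPerSecond = (double) width * height * bitsPerPixel * fps;
        return bitsPerSecond / 1.0e9;        // decimal gigabits, as in the text
    }

    public static void main(String[] args) {
        // The display cited above: 1600 x 1280, 24-bit color, 25 frames per second.
        System.out.printf("1600x1280 @ 25 fps: %.2f Gb/s%n",
                gigabitsPerSecond(1600, 1280, 24, 25));   // ~1.23 Gb/s
        // A hypothetical smaller desktop window, for comparison only.
        System.out.printf("1024x768  @ 25 fps: %.2f Gb/s%n",
                gigabitsPerSecond(1024, 768, 24, 25));    // ~0.47 Gb/s
    }
}
```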
Internet II and the vBNS (OC-3/OC-12 at this time, moving to OC-48 within two years) will bring this same capability to others over the next several years. We will utilize our partnership with Argonne National Laboratory to further develop the ability to link similar facilities (in this case, CAVEs in Houston and Chicago). In this effort, much of the focus will be on enhancing interaction and collaboration. Acoustic and haptic interfaces will be used to access additional sensory channels along which to communicate with the user. A useful shared virtual environment may also require visual representations of remote participants that are properly animated. Techniques such as gesture recognition, movement analysis, and voice recognition will be deployed for interactive tasks. The objective of this research thrust will be to improve the design of user interfaces (including display and interaction devices) so that the efficiency of the scientist and engineer, both singly and within team settings, is improved significantly. We will pay special attention to the unique problems of collaboration via shared virtual environments, leveraging ongoing work at the UH funded by DARPA. This activity will address, in addition to data representation and interaction with the computer, issues of nonverbal communication between team members, human figure representation (benefits and required fidelity), and multi-user interaction with the same data elements.

Another major use of visualization will be its application as a user interface to large-scale mathematical models and simulations. We will seek to make possible the more effective use of such models and simulations executing on supercomputers, locally or remotely. This will be accomplished by allowing scientists and engineers to construct and work directly with meaningful representations of the physical systems that they are attempting to understand. With this approach, the user will directly manipulate the representation of the physical system of interest in order to communicate input parameters to the model/simulation executing elsewhere. By scaling the computing resources to the difficulty of the problem, we will achieve near-real-time interaction between the engineer and the model/simulation, greatly enhancing the ability of the scientist or engineer to pose "what-if" questions and follow a reasoning "trail" by using an appropriate model or simulation.

Applications for Visualization

The application that we propose for metacomputing environments is shared virtual workspaces. Such a metacomputing application can also be viewed as a metacomputing technology, which in itself should be demonstrated and measured in terms of end-user feasibility and value. As end-user applications we propose 3-D medical imaging and video, 3-D seismic data sets for exploration and reservoir modelling, and engineering design in the manufacturing industry. These applications are selected because, first, they involve large, complex, 3-D data sets; second, interpretation of the data sets is typically done by parties in different geographical locations; and third, the end user has strong demands on quality of representation, ease of perception, and performance ("quality of service"). The University of Houston has excellent relationships with all three user categories, and a realistic assessment of accomplishments can be achieved.
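Returning to the steering interaction described above, in which the user manipulates a representation of the physical system to feed input parameters to a model or simulation executing elsewhere, the loop can be sketched as follows. The interface and class names are invented for this sketch and are not part of any existing system described in this proposal.

```java
// Illustrative steering loop: a visualization client pushes parameter changes to a
// remote simulation and pulls back updated state. All names are invented for this sketch.
import java.util.Map;

interface RemoteSimulation {
    void setParameter(String name, double value);   // e.g., sent over a network connection
    Map<String, double[]> currentFields();          // latest results for visualization
}

class SteeringClient {
    private final RemoteSimulation sim;

    SteeringClient(RemoteSimulation sim) { this.sim = sim; }

    /** Called when the user manipulates the on-screen representation of the physical system. */
    void onUserAdjusted(String parameter, double newValue) {
        sim.setParameter(parameter, newValue);       // forward the change to the remote model
        render(sim.currentFields());                 // refresh the local view with new results
    }

    private void render(Map<String, double[]> fields) {
        // A real client would update the 3-D display; here we only note the refresh.
        System.out.println("Rendering " + fields.size() + " updated field(s)");
    }
}
```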
Integration and Assessment

We propose to deploy a distributed collaborative computing environment which focuses on high-level user services. Little experience exists on either the useful features of such a system or on the needed infrastructure capabilities in the areas of networking, parallel multimedia databases for session logs, and computational services including visualization. Thus our proposal will be structured as a set of application experiments where we will carefully assess both the value of particular system features and the needed performance and functionality of the underlying infrastructure. This assessment will use traditional computer and network performance tools, augmented by enhancements in TANGOsim to log message traffic through the session manager. Here we will follow the strategy exploited in our Virtual Programming Laboratory of using a Web implementation of Illinois's Pablo system, with performance data logged in TANGOsim's backend database using the SDDF data format. The heart of this proposal is the integration of ARMS and TANGOsim, with TANGOsim, at a lower level, integrating the components of a synchronous computational steering environment. The details of this integration will depend on the evolution of Web technology, but we expect to continue the basic architecture built around Netscape's LiveConnect on the client and Java servers. Note that TANGOsim can be viewed as "just" integration technology, since collaboration systems "just" provide integration between the involved clients.
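As a concrete illustration of the logging path described above, with session message traffic and performance records flowing into TANGOsim's backend database over JDBC, here is a minimal sketch. The JDBC URL, table, and columns are placeholders invented for this example, not an actual TANGOsim schema.

```java
// Minimal sketch of logging session/performance records through JDBC.
// The JDBC URL, table, and columns are placeholders, not an actual TANGOsim schema.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class SessionLogger {
    private final Connection conn;

    SessionLogger(String jdbcUrl, String user, String password) throws SQLException {
        conn = DriverManager.getConnection(jdbcUrl, user, password);
    }

    /** Record one message that passed through the session manager. */
    void logMessage(String sessionId, String sender, int bytes, long timestampMillis)
            throws SQLException {
        String sql = "INSERT INTO session_log (session_id, sender, bytes, ts) VALUES (?, ?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, sessionId);
            ps.setString(2, sender);
            ps.setInt(3, bytes);
            ps.setLong(4, timestampMillis);
            ps.executeUpdate();
        }
    }
}
```

Performance records in Pablo's SDDF format could be logged through the same connection or archived as files alongside the relational log.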