Given by Geoffrey C. Fox at PDPTA Las Vegas on June 28 1999. Foils prepared June 27 1999
Summary of Material
Abstract and Motivation |
Portals to Computing and Business
|
Technologies of a Portal
|
Some Services of a Portal
|
Portal Programming |
Other Examples
|
How to get High Performance in a multi tier model |
PDPTA '99 Las Vegas June 28 99 Geoffrey Fox |
NPAC |
Syracuse University |
gcf@npac.syr.edu |
http://www.javagrande.org |
http://www.npac.syr.edu/users/haupt/WebFlow/demo.html |
http://www.npac.syr.edu/users/gcf/pdptajune99/index.html |
Large Scale (termed Grande) computing can benefit from the rich set of commodity technologies being developed to support the "Internet"
|
Grande computing is a small field (0.5 to 5% depending on what you count) compared to commodity computing, yet it solves problems that are just as difficult, so one can benefit by building Grande computing environments on a commodity base |
"Just" need to add "Grande Functionality"
|
There is a "philosophy/architecture" called building "Portals to X"
|
There are distributed object technologies to label, register and look up objects
|
There is a language Java which is more productive than previous languages such as Fortran or C++
|
There is a data structure metalanguage called XML which allows universal object serialization and the generation of application specific data specifications |
There are a set of services that are either
|
NCSA Biology Workbench http://biology.ncsa.uiuc.edu was one of the first computational portals |
Gateway is one of the most technically advanced computing portal projects, with a specific chemistry instantiation
|
A "Portal to X" is essentially the name for an Object Web system designed to address a particular application X |
Portal to the world is http://www.yahoo.com/ or http://my.netscape.com/ |
Portal to latest news is http://www.cnn.com |
Portal to computational chemistry is http://www.osc.edu/~kenf/theGateway/PSEactivities/CCM.html |
Portal to stock trading is http://quote.yahoo.com/ |
http://www.ibm.com is the external portal to IBM for customers. There will also be an internal portal for IBM employees used to "run the company" |
Kodak is interested in portals to family memorabilia |
More generally a portal is a web entrance to a set of resources and consists of a mix of information, computer simulations and various services |
For businesses, portals generalize the concept of a company Intranet and encompass the domain of IBM mainframes, Lotus Notes etc. |
For computing, portals are called Problem Solving Environments |
Total Portal Revenue 1998: $4.4B and 2002: $14.9B with 36% CAGR |
http://www.sagemaker.com/company/lynch.htm |
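As a quick consistency check on the revenue figures above (a worked calculation added here, not taken from the cited source), the compound annual growth rate over the four years from 1998 to 2002 follows directly from the two totals:

```latex
\mathrm{CAGR} = \left(\frac{14.9}{4.4}\right)^{1/4} - 1 \approx 1.357 - 1 \approx 0.36
```

So the quoted 36% is consistent with growth from $4.4B to $14.9B over four years.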
We can identify a set of tools that enable the construction of portals |
These are roughly equivalent to the tools needed to build a general application based on "object web technologies" |
There is also an architecture (explained ad nauseam later) implying multi-tier systems with standards-compliant interfaces
|
A common portal architecture means that portals can be conveniently linked together
|
So we can discuss some special portals
|
The latter include several projects aimed at harnessing the power of the web to do computing
|
NPAC has focussed on networks of web servers and these fit the portal model well |
However, most of the computing power lies in collections of web clients |
A server accepts input and produces output
|
IIOP and HTTP are two common protocols (formats of control data) for inter program messages |
A Web browser (Netscape or Microsoft) can access any server at "the click of a button", with data from the user refining the action |
Similar to invoking a web page |
"CORBA" or "WIDL" (pure XML CGI specification) is just CGI done right ...... |
[Figure: an Object Broker converts a generic run request into a specific request on the chosen computer, where a Fortran simulation code runs on a sequential or parallel machine.] |
A Fortran program is an important type of object. It can be built up from smaller objects, e.g. a matrix library could be an object |
But perhaps more interestingly computing portals involve building a web based problem solving environment to link together all the capabilities needed to compute |
run programs and dynamically access the status of jobs and computers -- in particular, allow a uniform interface for running a given job on any one of many backend compute servers |
compile and debug programs |
link diverse data sources with computations run on multiple backend machines |
visualize results |
web-based help systems and collections of related scientific papers |
computational steering i.e. interacting with a job (change parameters) based on dynamic results such as visualized results |
See http://www.osc.edu/~kenf/theGateway/ and http://www-fp.mcs.anl.gov/~gregor/datorr/ |
[Figure: computing portal services. Labels: Application Integration, Visualization Server, Seamless Access, Collaboration, Security Lookup, Registration, Agents/Brokers, Backend Services. Middleware: a bunch of Web Servers and Object Brokers.] |
Builds on Web-based problem solving environments built for
|
Uses distributed object middleware built for Modeling and Simulation field |
OSC NPAC Nicholls project funded by ASC MSRC at Dayton
|
Core system in alpha test today and SC99 will have fully functional demonstration |
Built with multi tier architecture using secure CORBA at middle tier and sophisticated WebFlow Java front end allowing task composition |
All object interfaces defined in XML |
Web based Interface Supporting
|
[Figure: WebFlow architecture in three tiers.
Front end (WebFlow applet): Web Browser, Data Wizard, WMS interface, Toolbar.
Middle tier: three WebFlow servers, with HTTP and IIOP links.
Back end (WebFlow modules): EDYS, CASC2D, CASC2D proxy, Data Retrieval, WMS, File Transfer, High Performance SubSystem, GLOBUS, Internet.] |
Ken Flurchick, http://www.osc.edu/~kenf/theGateway |
Gateway Computing Process |
1. Enter the Gateway system |
2. Define your problem |
3. Identify resources (software and hardware) |
4. Create input file |
5. Run your application |
6. Analyze results |
[Figure: Gateway layers. Labels: Gateway Programming Environment (Good Old Programming Tools, Multidisciplinary Control (WebFlow)); Grid Gateway Supporting Seamless Interface (Agent-based Choice of Compute Engine); Parallel DB Proxy, NEOS Control Optimization, Origin 2000 Proxy, NetSolve Linear Alg. Server, IBM SP2 Proxy; Database, Matrix Solver, Optimization Service, MPP, MPP.] |
Toolkit to build high level application specific web based problem solving environments
|
Simple strategy to convert existing codes into distributed objects that can be seamlessly launched |
Simple default job status displays |
Architecture naturally supports collaboration (using shared events and TangoInteractive) and visualization if structured as a multi tier service conforming to XML interfaces |
Compositional Programming of modules
|
Basic Vision: The current incoherent but highly creative Web will merge with distributed object technology in a multi-tier client-server-service architecture with Java based combined Web-ORB's |
Need to abstract entities (Web Pages, database entries, simulations) and services as objects with methods (interfaces)
|
COM (Microsoft) and CORBA (world) are competing cross platform and language object technologies
|
Javabeans plus RMI and JINI is 100% pure Java distributed object technology |
W3C says you should use XML which provides a convenient way to write IDL and define/serialize general object models |
How do we do this while technology is still changing rapidly? |
Need to use mix of approaches -- choosing what is good and what will last |
For example develop Web-based databases with Java objects using standard JDBC (Java Database Connectivity) interfaces
|
Use XML to record small databases in flat files |
Use XML to define all interfaces |
Use CORBA to wrap existing applications |
Use COM to access components of major PC applications such as Microsoft Excel and Word |
Use Java to build all Middleware |
Use Jini to implement dynamic registration of objects |
Use HTML and JavaScript to render everything |
Objects (at "logical backend") can be on client of course |
Front end can define a generic (proxy for a) object. The middle control tier brokers a particular instantiation |
[Figure: multi-tier architecture. Labels: Browser, Rendering Engine (twice), HTML, Broker or Server, Universal Interfaces, IDL or Templates, XML Query, XML Result.] |
An XML request for service is followed by return of an XML result |
Front end: 1) Rendering of (multiple) objects; 2) Proxy to some backend capability, used to render input to and output from the service |
Middle tier: 1) Server acts as a broker and control layer; 2) Same software as client but higher performance and multi-user; 3) Again the service is represented as a proxy, used as a token for control logic |
Back end: services with specialized software and capabilities (Database, MPP, Telescope, File System) |
Use of Java for: |
High Performance Network Computing |
Scientific and Engineering Computation |
(Distributed) Modeling and Simulation |
Parallel and Distributed Computing |
Data Intensive Computing |
Communication and Computing Intensive Commercial and Academic Applications |
HPCC Computational Grids ....... |
Immersive Applications driven by new high performance Java Chip |
Very difficult to find a "conventional name" that doesn't get misunderstood by some community! |
Bill Joy |
These exist from both a computer science and user point of view |
Grande applications are very complex but field is small (1% or so of total computing world)
|
The field needs Java as it provides a wonderful distributed computing software infrastructure on which to build applications and tools
|
Not clear that Java needs the field and so Grande field needs to be humble and persuasive in its requests |
Currently the Grande field is intrigued but skeptical due to poor Java performance |
Java community doing battle in commercially critical areas |
Need to bring communities together |
The Java Language has several good design features
|
Java with 1.7 Million developers is currently 2X as productive as C++ |
There will be more Java than C++ programmers in 2002 and goal is 10X productivity gain by 2005 |
Java has a very good set of libraries covering everything from commerce, multimedia, images to math functions (under development at http://math.nist.gov/javanumerics) |
Java has best available electronic and paper training resources |
Children will learn Java (and other POW technologies) as it is a social language with natural graphical "hello world" |
Java is rapidly getting best integrated program development environments |
Java is naturally integrated with the network, and its universal machine supports a potentially powerful "write once-run anywhere" model |
Can we exploit this in Grande Applications? We MUST try!!! |
So most existing Grande codes are written in Fortran or C, with a clearly unattractive and comparatively unproductive programming environment |
These current languages and tools are sufficient, but it does not seem likely that one can build much better environments around them
|
Five years ago, it looked as though C++ could become language of choice (perhaps with Fortran as inner core) but this appears stalled
|
Set of Workshops with increasing interest
|
Topics include compilation issues; applications; algorithms (math libraries); benchmarking; Java based programming environments (visualization, problem solving environments, MPI); and parallel computing. The largest set of papers is in distributed systems |
Other major conferences with a Java Grande component include HPCN and IPPS (April 99), Mannheim and ICS 99 Rhodes Greece (June 99) |
ISCOPE99 in December 99 will have a Java Grande component to enhance interaction between Grande community and mainstream Java and C++ world |
We set up Java Grande Forum to act as a focus for Grande community activities and coordinate the (feeble 1%) voice into mainstream! This will now have an official European branch |
Java has potential to be a better environment for "Grande application development" than any previous languages such as Fortran and C++ |
The Forum Goal is to develop community consensus and recommendations for either changes to Java or establishment of standards (frameworks) for "Grande" libraries and services |
These Language changes or frameworks are designed to realize "best ever Grande programming environment" |
First Meeting Mar 1 Palo Alto at Java 98 -- 200 Attendees set Agenda -- 30 permanent people and further meetings May 9-10, Aug 6-7 |
Public Discussion SC98 Orlando November 13 (3 hour panel with some 250 attendees) where we released our first report (54 pages on web site)
|
http://www.javagrande.org |
1) Most important in the near term -- encourage Sun to make a few key changes in Java to allow it to be a complete efficient Grande Programming Language
|
2) As a community, recognize that sometimes standards are more appropriate than creativity and pool results of experiments to produce a Java Grande framework covering libraries and computer access
|
1) requires us to work with the computing mainstream -- 2) is internal to community |
Two major working groups promoting standards and community actions |
Numerics: Java as a language for mathematics led by Ron Boisvert and Roldan Pozo from NIST
|
So Java not only will run anywhere but can be expected to get same answers everywhere
|
Natural tension between performance (both in terms of speed and precision) and reproducibility |
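The tension between speed and reproducibility can be illustrated with a small sketch. It is written in Python rather than Java purely for brevity (both use IEEE 754 double precision), and the helper name `same_answer_everywhere` is a hypothetical one introduced for this illustration. Floating-point addition is not associative, so any optimization that regroups a sum can change the result in the last bits:

```python
# Floating-point addition is not associative, so a compiler or runtime that
# regroups a sum for speed can change the low-order bits of the answer.
left_to_right = (0.1 + 0.2) + 0.3   # evaluation order as written
regrouped = 0.1 + (0.2 + 0.3)       # same sum after a "harmless" regrouping


def same_answer_everywhere(a: float, b: float) -> bool:
    """Strict reproducibility demands bit-identical results, not 'close enough'."""
    return a == b


print(left_to_right, regrouped)
print(same_answer_everywhere(left_to_right, regrouped))
```

Strict rules on evaluation order give the same answer on every machine but forbid such regroupings, which is one reason reproducibility and raw floating-point speed pull in opposite directions.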
Java has particularly poor floating point performance due to
|
Solution requires "Change in Java Rules" and better compilers |
Java Grande Forum working with internal Sun staff on drafting set of proposals |
At the June 99 ACM meeting, Bill Joy of Sun and others endorsed the Java Grande process and predicted that within two years Java would obtain competitive or better performance than C++ and Fortran |
Distributed and Parallel Computing Working Group led by Dennis Gannon and Denis Caromel (INRIA, France)
|
Development of Grande Application benchmarks
|
Core capabilities allow simple wrapping of existing codes |
A program object is defined as an XML file, e.g. my favorite physics simulation on my PC becomes:
<program name="physicssimulation1">
  <run domain="npac" machine="maryland" type="pc" os="nt" >c:\myprogs\a.out</run>
  <input button="prog1" toolbar="physics" type="htmlform" >
    <name>userinput</name>
    <field default="10" maximum="100000" >iterations</field>
    ............
  </input>
  <output> ...</output>
</program> |
Gateway generates a distributed object with
|
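As a minimal sketch of how a middle tier might read such a program description, the following Python example parses a trimmed copy of the XML above (the elided parts are simply left out) and renders its input fields as an HTML form. The function names `describe_program` and `render_form` and the form layout are assumptions made for illustration; they are not part of Gateway or WebFlow.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the program object from the foil; elided parts are omitted.
PROGRAM_XML = r"""
<program name="physicssimulation1">
  <run domain="npac" machine="maryland" type="pc" os="nt">c:\myprogs\a.out</run>
  <input button="prog1" toolbar="physics" type="htmlform">
    <name>userinput</name>
    <field default="10" maximum="100000">iterations</field>
  </input>
</program>
"""


def describe_program(xml_text):
    """Extract the run target and the user-editable fields (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    run = root.find("run")
    fields = [
        {"name": f.text.strip(), "default": f.get("default"), "maximum": f.get("maximum")}
        for f in root.findall("./input/field")
    ]
    return {
        "name": root.get("name"),
        "machine": run.get("machine"),
        "command": run.text.strip(),
        "fields": fields,
    }


def render_form(info):
    """Render the input fields as a minimal HTML form (illustrative layout only)."""
    rows = [
        '<label>{0} <input name="{0}" value="{1}" max="{2}"></label>'.format(
            f["name"], f["default"], f["maximum"])
        for f in info["fields"]
    ]
    return "<form>" + "".join(rows) + "</form>"


info = describe_program(PROGRAM_XML)
print(info["machine"], info["command"])
print(render_form(info))
```

The same XML thus drives both the user interface (the form) and the launch request (machine plus command), which is the point of defining the program object once in XML.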
For this example (running a physics program), we could use a specific machine as defined on previous foil (the Windows NT PC maryland) or a generic machine <run domain="any" machine="any" type="pc" os="nt" > |
In this case, the middle tier broker would look in a database (XML file) to find which machines were available and choose one, either automatically or with user input. |
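The brokering step can be sketched as follows, assuming a small machine list in the same XML style used elsewhere in these foils. The functions `candidates` and `choose`, the wildcard handling of "any", and the "most memory wins" default policy are all assumptions for illustration, not the Gateway implementation.

```python
import xml.etree.ElementTree as ET

# Small machine database in XML; entries follow the style of the foils.
MACHINES_XML = """
<machines domain="npac" type="pc">
  <machine os="nt" cpu="pentium2" memory="128">maryland</machine>
  <machine os="nt" cpu="pentium3" memory="256">georgia</machine>
  <machine os="95" cpu="mmx" memory="128">foxport1</machine>
</machines>
"""


def _matches(wanted, actual):
    """A missing attribute or the value "any" acts as a wildcard."""
    return wanted in (None, "any") or wanted == actual


def candidates(request, machines_xml):
    """Return (name, memory) pairs for machines satisfying a <run> request."""
    group = ET.fromstring(machines_xml.strip())
    if not (_matches(request.get("domain"), group.get("domain"))
            and _matches(request.get("type"), group.get("type"))):
        return []
    found = []
    for machine in group.findall("machine"):
        name = machine.text.strip()
        if _matches(request.get("os"), machine.get("os")) and \
                _matches(request.get("machine"), name):
            found.append((name, int(machine.get("memory", "0"))))
    return found


def choose(request, machines_xml, pick=None):
    """Choose a machine automatically, or let a user-supplied callback pick."""
    options = candidates(request, machines_xml)
    if not options:
        raise LookupError("no machine satisfies the request")
    if pick is not None:  # "with user input": the callback sees the names
        return pick([name for name, _ in options])
    return max(options, key=lambda option: option[1])[0]  # assumed policy


generic = ET.fromstring('<run domain="any" machine="any" type="pc" os="nt" />')
print(choose(generic, MACHINES_XML))
```

A specific request (for example `machine="maryland"`) passes through the same code path and simply yields a single candidate.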
Both Software and Hardware are defined in XML |
Note that databases and XML files are logically equivalent |
JDBC can access either an Oracle (or Microsoft) database or an XML ASCII file |
More generally XML can be thought of as general object serialization
|
The XML File
<machines domain="npac" type="pc" >
  <machine os="nt" cpu="pentium2" memory="128" >maryland</machine>
  <machine os="nt" cpu="pentium3" memory="256" >georgia</machine>
  <machine os="95" cpu="mmx" memory="128" >foxport1</machine>
  .....
</machines>
<machines domain="cis" >
  <machine os="solaris" cpu="sparcXX" > top</machine>
  .....
</machines> |
is equivalent to database tables such as |
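The equivalence can be sketched by loading the XML into a relational table and querying it with SQL. This Python example uses the standard `sqlite3` module as a stand-in for a JDBC-accessible database; the wrapping `<site>` element, the table layout and the `load` function are assumptions for illustration, and the elided entries are omitted.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The two <machines> groups from the foil, wrapped in a hypothetical <site>
# root so that the text is a single well-formed XML document.
MACHINES_XML = """
<site>
  <machines domain="npac" type="pc">
    <machine os="nt" cpu="pentium2" memory="128">maryland</machine>
    <machine os="nt" cpu="pentium3" memory="256">georgia</machine>
    <machine os="95" cpu="mmx" memory="128">foxport1</machine>
  </machines>
  <machines domain="cis">
    <machine os="solaris" cpu="sparcXX"> top</machine>
  </machines>
</site>
"""


def load(xml_text):
    """Copy every <machine> element into one row of an in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE machines (domain TEXT, name TEXT, os TEXT, cpu TEXT, memory INTEGER)")
    for group in ET.fromstring(xml_text.strip()).findall("machines"):
        for machine in group.findall("machine"):
            memory = machine.get("memory")
            db.execute(
                "INSERT INTO machines VALUES (?, ?, ?, ?, ?)",
                (group.get("domain"), machine.text.strip(), machine.get("os"),
                 machine.get("cpu"), int(memory) if memory is not None else None))
    return db


db = load(MACHINES_XML)
rows = db.execute("SELECT name FROM machines WHERE os = 'nt' ORDER BY name").fetchall()
print(rows)
```

Attributes become columns, element text becomes the name column, and a missing attribute (the memory of `top`) becomes NULL, so a query against the table answers the same questions as a search through the XML file.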
Every field has data of special significance -- for field xxxxxx, we imagine a group of standards realized in XML and used for exchange of information. We call this xxxxxxML
|
http://www.xml.com/xml/pub/submlist lists some standards currently proposed for XML |
The Portal for xxxxxx must support xxxxxxML |
For businesses, perhaps one needs special support for "excelML" "SAPML" (XML export format for EXCEL or SAP) as well as support for more general forms of information
|
ScienceML we define as a group of formats that support scientific data, note taking and sketches |
XSIL (Scientific data Interchange) defines metadata needed to specify scientific data files including high level parameters and methods needed to read data
|
VML is Vector Graphics Mark up Language |
DrawML is designed to support simple technical drawings (easier than VML but VML should be able to do this?) |
VRML (3D scenes) re-implemented in XML as X3D (http://www.vrml.org/news/pr990210-content.html) |
MathML Mathematical Expressions |
CML Support Chemistry -- not clear if adopted widely |
Presumably this allows scientists to make notes and record thoughts in a way that supports important scientific constructs |
At its simplest this is an authoring tool like Microsoft Word, PowerPoint or Framemaker
|
One useful utility would be a whiteboard that supported scientific notes using ScienceML |
Such a collaborative whiteboard (implemented in NPAC's Tango Interactive for instance) would be useful in research and teaching
|
Collaboration Service is an Object Sharing Service |
Assume researchers, teachers, students, engineers, shoppers, salespersons, families compute, teach, learn, collaborate, buy, sell, socialize via electronic versions of traditional human interactions combined with shared objects expressed in XML and rendered as web pages
|
Only the shared event model of collaboration (used in Tango) is capable of the necessary efficiency and customization to each user |
Tango is a fully Web-integrated system that can share general client side applications, web pages and output from databases and other servers |
Being web-based, it naturally links interactive (synchronous) and self-paced (asynchronous) collaborative models |
Runs on Windows 95, 98, NT and UNIX (Irix, Solaris, Linux) |
Supports multi-language shared server and client applications in Java, Javabeans, C++ and JavaScript (W3C DOM) |
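The shared event model can be sketched in a few lines: instead of shipping rendered output, a session relays small state-change events, and each client renders the shared state in its own way. The class names `Session` and `Client` and the temperature example are hypothetical; this is not the Tango API.

```python
class Session:
    """Relays small events to every participant instead of shipping rendered output."""

    def __init__(self):
        self.clients = []

    def join(self, client):
        self.clients.append(client)

    def publish(self, event):
        # Every participant receives the same event and applies it locally.
        for client in self.clients:
            client.apply(event)


class Client:
    """Holds a replica of the shared state and renders it per user preference."""

    def __init__(self, name, units):
        self.name = name
        self.units = units  # "K" or "C": this user's display preference
        self.state = {}

    def apply(self, event):
        self.state[event["key"]] = event["value"]

    def render(self):
        kelvin = self.state["temperature_K"]
        value = kelvin if self.units == "K" else kelvin - 273.15
        return "{:.2f} {}".format(value, self.units)


session = Session()
alice, bob = Client("alice", "K"), Client("bob", "C")
session.join(alice)
session.join(bob)
session.publish({"key": "temperature_K", "value": 300.0})
print(alice.render(), "|", bob.render())
```

Only the event travels over the network, which accounts for the efficiency, and each client decides how to display the shared object, which accounts for the customization to each user.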
Supplement existing very general tools (audio-video conferencing, chats, whiteboard etc.) by specialized applications to support particular collaborative activities such as: |
Distance Education and Training where we are teaching computational science at Jackson State and High School classes on Java |
Collaborative Visualization and Computing
|
Shared Client Side Java applets for "Rapid Prototyping" Phase of Computation |
Shared emacs and Telnet for collaborative programming |
[Figure: security architecture. Labels: Front End Applet, SECIOP (twice), Kerberos authentication and authorization, Gatekeeper, delegation, GSSAPI (twice), Globus, and HPCC resources (MPP, Database, Parallel I/O).] |
Layer 1: secure Web |
Layer 2: secure CORBA |
Layer 3: Secure access to resources |
Policies defined by resource owners |
Can use Public Key Infrastructure but for HPC Modernization use SecurID and Kerberized Workstation |
So between XML/HTML document and backend Fortran is a bunch of servers linked by XML Messages |
Software of this glue (business logic) is built in Java |
Servers can either be commercial Web Servers (Apache with servlets), CORBA brokers or custom Servers
|
Java/CORBA/WIDL Wrapper |
Style Sheets and Page Design |
We can copy a much reviled model -- Microsoft Word or PowerPoint -- Problem Solving Environments for document preparation -- to get PSE for Computing |
XML Widgets are organized into Toolbars .... |
Computing is abstracted as a set of hierarchical Toolbars. Toolbars are defined in XML and rendered in HTML for the user interface. The XML is interpreted on the middle tier as some suitable service. |
Computing Toolbars include user profile, results, visualization (where "command" could be AVS), collaboration, programming model, HPF, Dataflow, resource specification, resource status, code (application specific) |
WebFlow server is given by a hierarchy of containers and components |
These are CORBA objects written in Java acting if necessary as proxies to backend resources |
WebFlow server hosts users and services |
Each user maintains a number of applications composed of custom modules and common services |
WebFlow supports both object based and dataflow computing model with visual interface at client and both tasks and their interrelationship defined in XML |
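A dataflow application with tasks and their interrelationship defined in XML can be sketched as below. The XML vocabulary (`flow`, `module`, `link`), the module names and the `run_flow` function are assumptions for illustration and are not the WebFlow format; the example uses the standard `graphlib` module (Python 3.9 or later) to run modules in dependency order.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

# Hypothetical task graph: modules are nodes, links carry data downstream.
FLOW_XML = """
<flow name="demo">
  <module id="read"/>
  <module id="simulate"/>
  <module id="visualize"/>
  <link from="read" to="simulate"/>
  <link from="simulate" to="visualize"/>
</flow>
"""

# Stand-in implementations; each module receives a list of upstream results.
MODULES = {
    "read": lambda inputs: [1, 2, 3],
    "simulate": lambda inputs: [x * x for x in inputs[0]],
    "visualize": lambda inputs: "values: " + ", ".join(str(x) for x in inputs[0]),
}


def run_flow(xml_text, modules):
    """Run every module after all of its upstream modules have produced output."""
    root = ET.fromstring(xml_text.strip())
    predecessors = {m.get("id"): [] for m in root.findall("module")}
    for link in root.findall("link"):
        predecessors[link.get("to")].append(link.get("from"))
    results = {}
    # static_order() yields each node only after all of its predecessors.
    for node in TopologicalSorter(predecessors).static_order():
        results[node] = modules[node]([results[p] for p in predecessors[node]])
    return results


print(run_flow(FLOW_XML, MODULES)["visualize"])
```

In a real multi-tier system each entry in `MODULES` would be a proxy to a backend resource rather than a local function, but the composition logic at the middle tier has the same shape.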
We can use similar approaches to consider both seamless interfaces and metacomputing (broadly defined as linkage of multiple jobs simultaneously to address single problem)
|
Seamless Interfaces and Metacomputing are both services of a computing portal with metacomputing naturally building on top of seamless interfaces
|
Relevant Applications are metaproblems with a mix of module and data parallelism |
Modules are decomposed into parts (data parallelism) and composed hierarchically into full applications. They can be the
|
Modules are "natural" message-parallel components of problem and tend to have less stringent latency and bandwidth requirements than those needed to link data-parallel components
|
Assume that primary goal of metacomputing system (portal programming environment) is to add to existing parallel computing environments, a higher level supporting module parallelism
|
JWORB - Java Web Object Request Broker - multi-protocol middleware network server (HTTP + IIOP + DCE RPC + RMI transport) |
Current prototype integrates HTTP and IIOP i.e. acts as Web Server and CORBA Broker
|
Next step: add DCE RPC support to include Microsoft COM |
JWORB - our trial implementation of Pragmatic Object Web |
First non DMSO implementation of RTI -- HLA (distributed event driven simulation) Runtime at 5% cost(!) |
HLA is object model for distributed simulation and RTI is distributed event simulation infrastructure developed by DMSO -- Defense Modeling and Simulation Office |
NPAC Implemented DMSO RTI as JWORB service with 2 major CORBA objects: RTI Ambassador and Federate Ambassador |
Offers natural "DMSO Portal" i.e. Web interfaces to HLA simulations via HTTP or IIOP channels
|
Attractive model for High Level Interface to Metacomputing as RTI provides a rich event based model for managing and controlling the coarse grain jobs present at middle tier |
Image Processing often involves linkage of many different components, as is familiar from use of Khoros in many projects
|
From http://www.khoral.com/core.html |
mpiJava - Modeled after the C++ binding for MPI. Implementation through JNI wrappers to native MPI software. http://www.npac.syr.edu/projects/pcrc/HPJava/ |
JavaMPI - Automatic generation of wrappers to legacy MPI libraries. C-like implementation based on the JCI code generator. http://perun.hscs.wmin.ac.uk/JavaMPI/ |
MPIJ - Pure Java implementation of MPI closely based on the C++ binding. A large subset of MPI is implemented using native marshaling. http://ccc.cs.byu.edu/DOGMA/ |
Working on two MPI bindings for Java
|
Reports on Java Grande Web Page http://www.javagrande.org |
There are several forms of parallelism
|
In a Nutshell, Java is better than previous languages for a) and b) and no worse for c)
|
Thus "Java plus message passing" form of parallel computing is actually somewhat easier than in Fortran or C.
|
Coarse grain parallelism very natural in Java and we have discussed how to use this with RMI (see WebFlow example) |
"Data Parallel" language features are NOT in Java and have to be added, extending ideas from HPF and HPC++ etc
|
Java has built in "threads" and a given Java Program can run multiple threads at a time (see work of Gannon's group)
|
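The "threads plus message passing" style can be sketched in a few lines. The example is in Python rather than Java for brevity and uses a thread-safe queue as the message channel; the names `worker` and `parallel_sum` are hypothetical, and this is neither the MPI API nor one of the Java bindings listed above. It shows only the structure: every rank runs the same code on its own slice of the data and sends a partial result to the root.

```python
import queue
import threading


def worker(rank, size, data, outbox):
    """SPMD style: every rank runs this same code on its own slice of the data."""
    chunk = data[rank::size]          # cyclic decomposition of the data
    outbox.put((rank, sum(chunk)))    # "send" the partial result to the root


def parallel_sum(data, size=4):
    """Start `size` workers, then combine their messages (a reduction at the root)."""
    outbox = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, size, data, outbox))
        for rank in range(size)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Exactly one message per worker is waiting in the queue.
    return sum(outbox.get()[1] for _ in range(size))


print(parallel_sum(list(range(101))))
```

Python threads do not give a real speedup for this kind of arithmetic, so the sketch illustrates the decomposition and the message pattern only, not performance.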
Backend Parallel Computing Nodes running Classic HPCC -- MPI becomes Globus to use distributed resources |
Middle Control Tier -- JWORB or equivalent POW server runs on all Nodes |
Use separation of control and data transfer to support RTI (IIOP) on the control layer and MPI on the fast transport layer simultaneously |
[Figure: three ways to link CFD and Structures modules. Link labels: Data and Control; Data Only; Control Only. Nodes: CFD, Structures, CFD Server, Structures Server.] |
1) Simple Server Approach |
2) Classic HPCC Approach |
3) Hybrid Approach with control at server and data transfer at HPCC level |
Can switch at each mpi_init or at each MPI message |
http://www.sun.com/jini/ and also see very interesting Ninja project at UC Berkeley http://ninja.cs.berkeley.edu/ |
Jini is an innovative distributed computing capability with a classic 3 tier architecture with client, lookup service (called broker or server in other architectures) and service provider (the backend)
|
Jini enables services to be dynamically linked to users of services
|
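The three-tier pattern of client, lookup service and service provider can be sketched as a toy registry in which providers register with a time-limited lease and clients search by attributes. The class `LookupService`, its method signatures and the explicit `now` argument are assumptions for illustration; this is not the Jini API.

```python
class LookupService:
    """Toy lookup service: providers register with a lease, clients search by attributes."""

    def __init__(self):
        self._entries = {}

    def register(self, name, attributes, service, lease_seconds, now):
        # Registering the same name again simply renews the lease.
        self._entries[name] = (dict(attributes), service, now + lease_seconds)

    def lookup(self, wanted, now):
        """Return every live service whose attributes include all wanted pairs."""
        matches = []
        for attributes, service, expires in self._entries.values():
            alive = expires > now
            fits = all(attributes.get(key) == value for key, value in wanted.items())
            if alive and fits:
                matches.append(service)
        return matches


lookup_service = LookupService()
# A provider (the backend) announces itself; the service here is a plain callable.
lookup_service.register("doubler", {"kind": "solver"}, lambda x: 2 * x,
                        lease_seconds=60, now=0)

# A client finds the service dynamically and uses it without knowing the provider.
found = lookup_service.lookup({"kind": "solver"}, now=10)
print(found[0](21))
print(len(lookup_service.lookup({"kind": "solver"}, now=100)))
```

Because a lease expires unless it is renewed, services that disappear drop out of the registry on their own, which is what makes the linkage between services and their users dynamic.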
Will carry us from collections of computers to enormous collections of devices -- the next Grande problem! |
Bill Joy |
Greg Papadopoulos |
[Figure: Jini-based parallel computing (PC is Parallel Computing). Labels: Gateway, Middle Tier, Jini Lookup Service, RMI, PC Control and Services, four Jini PC Embryos, four SPMD Programs, MPI Transport Layer.] |
Don't need to rewrite existing codes in Java!
|
Develop XML standards for the data in your field
|
Conduct suitable experiments in using Java in complete Grande applications |
Make certain your interests are represented in Java Grande Forum |
Does this change your research agenda? (different types of compilers, service-based architectures, re-use commodity technologies ) |
Retrain your staff in Java, Web and distributed object technologies |
Put "High Performance Grande Forum compliant" Java support into your RFP's for hardware and software |