Evaluating New Transparent Persistence Commodity Models:
JDBC, CORBA PSS, OLEDB and W3C WOM for HPC T&E Databases
G.C. Fox, W. Furmanski and T. Pulikal
Northeast Parallel Architectures Center, Syracuse University, Syracuse NY 13244-4100
gcf@npac.syr.edu, furm@npac.syr.edu
Abstract
We analyze here the standard candidates for universal (storage medium- and vendor-independent) persistency frameworks as proposed by the leading alternative technologies for distributed objects: Java, CORBA, COM and WOM. We point out that consensus in this field is yet to be reached, and we present our Pragmatic Object Web approach to coordinating and integrating complementary technologies. We illustrate it with a few practical examples of Web linked collaboratory database environments that we recently constructed for various communities and application domains such as telemedicine, distance education, interactive FMS training, and data mining of T&E data from the Virtual Proving Ground. Finally, we summarize lessons learned and outline our recommendations for the HPC T&E database approach.
Introduction
The distributed and voluminous data produced during various testing and evaluation procedures needs to be stored and managed in an efficient, reliable and scalable manner. The data produced could be on diverse platforms, stored in various formats, and accessed using different methods. Update rates can widely vary, ranging from static, read-only historical databases accumulated in a data warehouse to dynamic ‘datafoam’ acting as transient database storage for real-time interactive simulations. Front-end requirements for advanced T&E applications such as Virtual Prototyping are also diverse and typically more demanding and graphics / visualization oriented than the traditional client-server transaction processing applications.
Recently, significant progress has been made in Web / Commodity Computing in the area of Web linked databases. Current systems are evolving from the client-server model towards 3-tier Object Web [1] models, in which distributed object / componentware middleware responsible for the ‘business / application logic’ is inserted between the Web browser based interactive front-ends and the backend databases. Such a 3-tier organization also provides us with a natural software bus framework into which specialized high performance computing modules can be plugged.
We are currently exploring various strategies for integrating the storage and management of large, heterogeneous and distributed datastores in a simple, location- and datastore-transparent way. For this purpose, we are analyzing, evaluating and comparing the following four major technologies competing as candidates for distributed object / componentware computing standards: Java by Sun Microsystems, CORBA by the Object Management Group, COM by Microsoft and XML/RDF/DOM (sometimes referred to as WOM [2]) by the World-Wide Web Consortium.
In our Pragmatic Object Web [3] approach at NPAC we adopt an integrative methodology, i.e. we set up a multiple-standards based framework in which the best assets of the various approaches accumulate and cooperate rather than compete. We implement it by building a Java server (JWORB) [4] which handles multiple network protocols; it currently supports both HTTP (Web) and IIOP (CORBA), and will soon also support DCE RPC (DCOM) and RMI (Pure Java). A mesh of collaborating JWORB servers forms the Pragmatic Object Web middleware, which connects naturally to Web browsers in the front-end and to legacy systems such as databases or HPC modules in the backend, using one of the supported standard protocols.
In a set of PET FMS tasks within the DoD High Performance Computing Modernization Program, we are adapting our generic POW technologies such as JWORB to DoD needs in the form of an emergent WebHLA framework that we believe is capable of integrating High Performance Modeling and Simulation with the advanced computing requirements of the Test and Evaluation community. Our four papers submitted to this conference address various aspects of WebHLA: Web/Commodity based front-ends [5]; JWORB based RTI software bus in the middleware [6]; POW based database backends (this paper); and WebFlow based visual authoring and integration across all tiers [7].
In this paper, we focus on the backend database layer. We first summarize and evaluate the transparent persistence models currently offered by Java, CORBA, COM and WOM. Next, we illustrate our current approach in the context of a few selected collaborative database applications such as CareWeb (telemedicine), Language Connect University (distance education) and FMS Training Space (interactive training), and we also report on our early experiments in Data Mining for a Virtual Proving Ground database. Finally, we summarize the current status of the field and our suggested strategy for FMS and T&E integration within the WebHLA framework.
We summarize here the
ongoing activities within the Java, CORBA, COM and WOM communities in the area
of universal persistence frameworks, i.e. abstract data models that would span
multiple vendors and various storage media such as relational or object
databases and flat file systems.
JDBC and JavaBlend
JavaSoft’s Java Database Connectivity (JDBC) is a standard SQL database access interface for accessing a wide range of relational databases from Java programs. It encapsulates the various DBMS vendor proprietary protocols and database operations and enables applications to use a single high level API for homogeneous data access. The JDBC API, defined as a set of classes and interfaces, supports simultaneous connections to multiple databases.
The JDBC API mainly consists of classes and interfaces representing database connections, SQL statements, result sets, database metadata, driver management etc. The main strengths of the JDBC API are its platform- and database-independence and its ease of use, combined with a powerful set of database capabilities for building sophisticated database applications. The recently released JDBC 2.0 specification adds more functionality, such as forward and backward scrolling, batch updates, advanced data types like BLOB, and Rowsets, which are JavaBeans that can be used in any JavaBean component development. Other important features include connection pooling, distributed transaction support and better support for storing Java objects in the database.
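The access pattern described above (connection, SQL statement, result set) can be sketched as follows. This is a minimal illustration, not code from any of our applications: the class name `JdbcSketch`, the URL `jdbc:hypothetical:tedb` and the table and column names are assumptions made up for the example, and the try-with-resources syntax assumes a Java version later than the one current at the time of writing.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class JdbcSketch {

    // Runs a one-parameter query and returns the first column of each row as text.
    // The same code works for any database whose JDBC driver is registered.
    static List<String> firstColumn(Connection conn, String sql, String param) throws SQLException {
        List<String> values = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, param);
            try (ResultSet rows = stmt.executeQuery()) {
                while (rows.next()) {
                    values.add(rows.getString(1));
                }
            }
        }
        return values;
    }

    // Returns true if a registered JDBC driver accepts the URL and a connection opens.
    static boolean canConnect(String url) {
        try (Connection conn = DriverManager.getConnection(url)) {
            return true;
        } catch (SQLException e) {
            return false;
        }
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical URL; pass a real JDBC URL as the first argument to try it.
        String url = args.length > 0 ? args[0] : "jdbc:hypothetical:tedb";
        if (!canConnect(url)) {
            System.out.println("No usable JDBC driver or database for " + url);
            return;
        }
        try (Connection conn = DriverManager.getConnection(url)) {
            // Hypothetical table and column names.
            String sql = "SELECT test_id FROM test_runs WHERE subsystem = ?";
            for (String id : firstColumn(conn, sql, "engine")) {
                System.out.println(id);
            }
        }
    }
}
```

Only the URL selects the vendor driver; the query code itself does not change from one database to another, which is the independence property discussed above.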
Despite its simplicity of use and wide acceptance, the JDBC API has its disadvantages. The API is primarily designed for relational database management systems and thus is not ideal for use with object databases or other non-SQL databases. Also, there are several problems due to inconsistencies among the driver implementations currently available.
JavaBlend, a
high-level database development tool from JavaSoft that will be released this
year, enables enterprises to simplify database application development and
offers increased performance, sophisticated caching algorithms and query
processing to offload bottlenecked database servers. It is highly scalable
because it is multi-threaded and has built-in concurrency control mechanisms.
It provides a good object-relational mapping that fits best into the Java
object model.
UDA: OLEDB and ADO
Universal Data Access (UDA) is Microsoft's strategy for high performance data access to a variety of information sources ranging from relational to object databases to flat file systems. UDA is based on open industry standards collectively called the Microsoft Data Access Components. OLEDB, which is the core of the Universal Data Access strategy, defines a set of COM interfaces for exposing, consuming and processing data. OLEDB consists of three types of components: data providers that expose or contain data; data consumers that use this data; and service components that process or transform this data. The data provider is the storage-specific layer that exposes the data present in various data stores such as relational DBMS, ORDBMS and flat files for use by the data consumer via the universal (store-independent) API.
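The division of roles among providers, consumers and service components can be illustrated with a small sketch. To keep a single language for our examples it is written in Java rather than against the actual OLEDB COM interfaces; the interface `DataProvider` and all class and method names below are hypothetical and only mirror the roles described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class ProviderConsumerSketch {

    // Store-independent view of tabular data (hypothetical; not an OLEDB interface).
    interface DataProvider {
        List<String[]> rows();
    }

    // Provider for a flat-file-like store: each line is one comma-separated record.
    static DataProvider flatFileProvider(List<String> lines) {
        return () -> {
            List<String[]> out = new ArrayList<>();
            for (String line : lines) {
                out.add(line.split(","));
            }
            return out;
        };
    }

    // Provider for data already held as rows, standing in for a relational store.
    static DataProvider tableProvider(List<String[]> table) {
        return () -> table;
    }

    // Service component: wraps any provider and passes through only matching rows.
    static DataProvider filter(DataProvider source, Predicate<String[]> keep) {
        return () -> {
            List<String[]> out = new ArrayList<>();
            for (String[] row : source.rows()) {
                if (keep.test(row)) {
                    out.add(row);
                }
            }
            return out;
        };
    }

    // Consumer: counts rows without knowing which kind of store sits behind the provider.
    static int countRows(DataProvider provider) {
        return provider.rows().size();
    }

    public static void main(String[] args) {
        DataProvider flat = flatFileProvider(Arrays.asList("t1,engine", "t2,brakes", "t3,engine"));
        DataProvider engineOnly = filter(flat, row -> row[1].equals("engine"));
        System.out.println(countRows(flat) + " rows in total, " + countRows(engineOnly) + " for engine");
    }
}
```

Because the service component itself implements the provider interface, it can be stacked on top of any store-specific provider without changes to the consumer.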
OLEDB consists of several core COM components such as Enumerators, Data Sources, Commands and Rowsets. Apart from these components, OLEDB also exposes interfaces for catalog information, i.e. metadata about the database, and supports event notifications by which consumers sharing a rowset can be notified of changes in real time. Features to be added to OLEDB in the future include interfaces to support authentication, authorization and administration, as well as interfaces for distributed and transparent data access across thread, process and machine boundaries.
While OLEDB is Microsoft's system-level programming interface to diverse data sources, ActiveX Data Objects (ADO) offers a popular, high-level (application-level) data consumer interface to diverse data. The main benefits of ADO are ease of use, high speed, low memory overhead, language independence and the other benefits that come with client side data caching and manipulation. Since ADO is a high-level programming interface, application developers need not be concerned about memory management and other low-level operations. Some of the main objects in the ADO object model are: Connection, Command, Recordset, Property and Field.
CORBA Persistent State Service
The initial CORBA standard accepted by OMG in the persistent objects domain was the Persistent Object Service (POS). The main goals of this service were to support corporate centric datastores and to provide a datastore independent, open architecture framework that allows new datastore products to be plugged in at any time. POS consisted of four main interfaces: the Persistent Object interface (PO) that the clients would implement; the Persistent Identifier interface (PID) for identifying the PO object; the Persistent Object Manager interface (POM) that manages the POS objects; and the Persistent Data Service interface (PDS), which performs the actual communication with the datastore.
Although this specification was adopted more than three years ago, it saw very few implementations because of its complexity and inconsistencies. The specification also exposed the notion of persistence to CORBA clients, which was not desirable, and its integration with other CORBA services was not well defined. OMG therefore issued a request for proposals for a new specification, the Persistent State Service (PSS), which is much simpler to use and to implement and is readily applicable to existing datastores.
The Persistent State
Service specification, currently still at the level of an evolving proposal to
OMG led by Iona / Orbix, uses the value
notation defined in the new Objects by Value specification for representing the
state of the mobile objects. The PSS provides a service to object implementors
which is transparent to a client. This specification focuses on how the CORBA
objects interact with the datastore through an internal interface not exposed
to the client. The persistent-values are implemented by application developers
and are specific to a datastore and can make use of the features of the
datastore. The specification also defines interfaces for
application-independent features like transaction management and association of
CORBA objects with persistent-values.
Web Object Model
The World-Wide Web Consortium (W3C) is developing a suite of new Web data representation and/or description standards such as XML (eXtensible Markup Language), DOM (Document Object Model) and RDF (Resource Description Framework). Each of these technologies has merit on its own, but when combined they can be viewed collectively as a new, very dynamic, flexible and powerful Web Object Model (WOM) [2].
XML is a subset of SGML that acts as a metamodel for specialized markup languages, i.e. it allows one to define new custom / domain specific tags and document templates or DTDs (Document Type Definitions). Such DTDs provide a natural bridge between Web and Object Technologies, since XML documents can now be viewed as instances of the associated DTD classes. DOM makes this analogy even more explicit by offering an orthodox object-oriented API (specified in CORBA IDL) to XML documents. Finally, RDF offers a metadata framework that allows one to associate a set of named properties and their values with a particular Web resource (URL). In the WOM context, RDF is used to bind, in a dynamic and transient fashion, the Web Object methods located in some programming language files with the Web Object states specified in XML files.
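The view of an XML document as an object accessed through the DOM API can be made concrete with a short sketch. It uses the standard Java binding of the W3C DOM (`org.w3c.dom`) as shipped with later Java releases; the `testRecord` markup, its DTD and the class name `DomSketch` are hypothetical and were invented for this illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomSketch {

    // Hypothetical domain-specific markup for one test record, with an inline DTD.
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE testRecord [\n"
        + "  <!ELEMENT testRecord (subsystem, courseType)>\n"
        + "  <!ATTLIST testRecord id CDATA #REQUIRED>\n"
        + "  <!ELEMENT subsystem (#PCDATA)>\n"
        + "  <!ELEMENT courseType (#PCDATA)>\n"
        + "]>\n"
        + "<testRecord id=\"t1\"><subsystem>engine</subsystem>"
        + "<courseType>gravel</courseType></testRecord>";

    // Parses XML text into a DOM tree; parse failures are rethrown unchecked.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the text content of the first element with the given tag name.
    static String childText(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        // The document element plays the role of an instance of the DTD "class".
        System.out.println("record id: " + doc.getDocumentElement().getAttribute("id"));
        System.out.println("subsystem: " + childText(doc, "subsystem"));
        System.out.println("course type: " + childText(doc, "courseType"));
    }
}
```

In this reading the DTD declares the fields of the object, the XML instance carries its state, and the DOM calls act as accessors; binding methods to that state is the part left to RDF in the WOM picture.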
Summary
As seen from the above discussion, universal data models are just emerging. Even though most major RDBMS vendors are OMG members, consensus in the CORBA database community is yet to be reached. WOM is a new, ’97 / ’98 concept, and several aspects of, and relations between, the WOM components listed above are still in the design stage. However, given the ongoing explosion of Web technologies, one can expect the WOM approach to play a critical role in shaping the universal persistence frameworks for the Internet and Intranets. At the moment, single vendor models such as JDBC by Sun and OLEDB by Microsoft are ahead of the consortia frameworks, and in fact the Microsoft UDA solution is the most complete and advanced as of mid ’98.
Sample Database Applications on the Pragmatic Object Web
In parallel with tracking
the evolution of the universal data models, we are involved in building Web
linked database environments for various communities and user domains. We
illustrate here a few examples of such specific database applications.
Practical solutions and design decisions appear often as a result of compromise
between several factors such as price vs functionality tradeoffs for various
technologies, project specific requirements, customer preferences etc. We
indicate these tradeoffs when discussing the application examples below and
then we summarize in the next section the lessons learned so far and our approach
towards the HPC T&E database technology roadmap.
CareWeb is a Web based collaborative environment for school nurses with support for: a) a student healthcare record database; b) educational materials for nurses and parents; c) collaboration and interactive consultation between nurses, nurse practitioners and pediatricians, including both asynchronous (shared patient records) and synchronous (audio, video, chat, whiteboard) tools. An early CareWeb prototype (Fig. 1) was developed at NPAC using an Oracle7 database, a programming model based on PL/SQL stored procedures, and the VIC/VAT collaboration tools. We found the Oracle7 model useful but hardly portable to other vendor models, especially after Oracle decided to integrate their Web and Database services.
The new production version of CareWeb (Fig. 2), under development by the NPAC spin-off Translet, Inc., uses exclusively Microsoft Web technologies: Internet Explorer, Access/SQL Server databases, an ASP/ADO based programming model, and NetMeeting based collaboration tools. Although this solution is also hardly portable beyond the Microsoft software domain, we found it more practical from a small business perspective, as Microsoft now offers a complete and affordably priced solution for all tiers.
Language Connect University (LCU) is another Web/Oracle service constructed by Translet, Inc. for the distance education community. Marketed by Syracuse Language Systems as an Internet extension of their successful CD-ROM based multimedia courses for several foreign languages, LCU offers a spectrum of collaboratory and management tools for students and faculty of a Virtual University. Tools include customized email, student record management, interactive multimedia assignments such as quizzes, tests and final exams, extensive query services, evaluation and grading support, course management, virtual college administration and so on (see Fig. 5).
We found the Oracle based high end database solution for LCU appropriate and satisfactory; possible follow-on projects will likely continue and extend this model by adding suitable CORBA PSS based middleware to assure compatibility with the emergent public standards for distance education such as IMS by Educom.
FMS Training Space [8][9] is an ongoing FMS PET project at NPAC within the DoD Modernization Program that develops Web based collaboratory training for FMS technologies under development by the CHSSI program. We start with SPEEDES training, which will be gradually extended towards other FMS components such as Parallel CMS, E-ModSAF, HPC RTI, Parallel IMPORT and TEMPO/Thema. FMS Training Space combines lessons learned in our previous Web/Database projects such as CareWeb and LCU with our new WebHLA middleware based on Object Web RTI. The result will be a dynamic, interactive multiuser system with real-time synchronous simulation tools similar to online multiplayer gaming systems, and with asynchronous tools for database navigation in domains such as software documentation, programming examples, a virtual programming laboratory etc. Selected screendumps from the preliminary version of the FMS Training Space, which includes some elements of the HLA, SPEEDES, CMS and ModSAF documentation databases and was demonstrated at the DoD UGC 98 Conference [8], are shown in Figs. 3 and 4. Object Web RTI based real-time multiplayer simulation support is being added to the FMS Training Space during the summer of 98 and will be demonstrated at the SIW Fall 98 Conference in Orlando [9].
The current version of the FMS Training Space uses Microsoft Web technologies: Internet Information Server, Active Server Pages, Visual Basic Script, Internet Explorer, ActiveX and Java applet plug-ins, Front Page Web Authoring and an Access database. This approach facilitates rapid prototyping with the integrated web/commodity tools from Microsoft, and we intend to extend it in the next stage by adding the corresponding support for UNIX, Java, Oracle and Netscape users and technologies.
Data Mining for Virtual Proving Ground
We are working with the Virtual Proving Ground team at ARL/ATC in Aberdeen to help with data mining services for the T&E historical test data. In particular, we recently received from the VPG an MS Access database with vehicle engineering and testing information, and we analyzed it using selected Web and data mining technologies.
The VPG data was made accessible over the Web using Active Server Pages, for ease of access across workstations and networks and for use with future distributed data mining applications. An SQL query tool was written on top of this data, which can be used to run simple queries and analyze the results. The second step was to use a simple classification algorithm, C4.5, for initial data mining experiments. We chose one of the attributes from one of the tables as our target class, dividing its values into two classes, Major and Minor. For our analysis we used a small subset of attributes that we thought would affect the classification, namely the subsystem, course condition and course type. The public domain tool that we used was See5 from RuleQuest, which implements the C5.0 KDD algorithm, a successor to Quinlan’s C4.5 decision tree algorithm. Training data and test data were randomly selected with the help of the query tool. The See5 tool was run over the training data to generate a decision tree (see Fig. 6) and a ruleset with an error rate of 3.8%. On the test cases the error rate was 12%; the gap between the two rates points to problems in the training set selection and the decision tree generation. We are in the process of refining this to get a lower error rate and to generate a better decision tree.
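For readers unfamiliar with decision tree induction, the two quantities involved can be sketched in a few lines: the entropy-based information gain on which C4.5-style trees base their choice of split attribute (C4.5 itself uses a normalized variant, the gain ratio), and the error rate used to compare training and test performance. This is an illustration only, not the See5 implementation, and the four records in `main` are invented toy values, not VPG data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SplitCriterionSketch {

    // Shannon entropy, in bits, of a set of class labels.
    static double entropy(String[] labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.length;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Information gain: entropy before the split minus the weighted entropy after it.
    static double informationGain(String[] attribute, String[] labels) {
        Map<String, List<String>> groups = new HashMap<>();
        for (int i = 0; i < labels.length; i++) {
            groups.computeIfAbsent(attribute[i], k -> new ArrayList<>()).add(labels[i]);
        }
        double remainder = 0.0;
        for (List<String> group : groups.values()) {
            double weight = (double) group.size() / labels.length;
            remainder += weight * entropy(group.toArray(new String[0]));
        }
        return entropy(labels) - remainder;
    }

    // Fraction of cases whose predicted class differs from the actual class.
    static double errorRate(String[] predicted, String[] actual) {
        int wrong = 0;
        for (int i = 0; i < actual.length; i++) {
            if (!predicted[i].equals(actual[i])) {
                wrong++;
            }
        }
        return (double) wrong / actual.length;
    }

    public static void main(String[] args) {
        // Toy records (hypothetical): two candidate attributes and the target class.
        String[] subsystem = {"engine", "engine", "brakes", "brakes"};
        String[] courseType = {"gravel", "paved", "gravel", "paved"};
        String[] severity = {"Major", "Major", "Minor", "Minor"};
        System.out.println("gain(subsystem)  = " + informationGain(subsystem, severity));
        System.out.println("gain(courseType) = " + informationGain(courseType, severity));
    }
}
```

In the toy data the subsystem attribute separates the two classes completely while the course type carries no information, so a tree builder would split on subsystem first. Applying `errorRate` separately to the training and the test cases gives the kind of comparison reported above.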
Fig 5: Sample screendump from the Language Connect University for distance education: Message Center for Collaboratory E-Mail between students and teachers
Fig 6: Decision tree generated for the Virtual Proving Ground data using the See5.0 classification algorithm
References