XML: Principles, Tools and Techniques
Oct. 02, 1997
WIDL: Application Integration with XML
Charles Allen
Abstract
The problem of direct access to Web data from within business
applications has until recently been largely ignored. The Web
Interface Definition Language (WIDL) is an application of the
Extensible Markup Language (XML) which allows the resources of the
World Wide Web to be described as functional interfaces that can be
accessed by remote systems over standard Web protocols. WIDL
provides a practical and cost-effective means for diverse systems to
be rapidly integrated across corporate intranets, extranets, and the
Internet.
Overview
The explosive growth of the World Wide Web is providing millions of
end-users access to ever-increasing volumes of information. The
resources of legacy systems, relational databases, and multi-tier
applications have all been made available to the Web browser, which
has been transformed from an occasionally informative accessory into
an essential business tool for organizations large and small.
While the Web has achieved the extraordinary feat of providing
ubiquitous accessibility to end-users, it has in many cases
reinforced manual inefficiencies in business processes as repetitive
tasks are required to transcribe or copy and paste data from browser
windows into desktop and corporate applications. This is as true of
Web data provided by remote business units and external (i.e.,
partner or supplier) organizations as it is of Web data accessible
from both public and subscription-based Web sites.
Business units that have previously been unable to agree on
middleware and data interchange standards are (by default) agreeing
on HTTP and HTML as data communication and presentation standards.
Because of the overwhelming focus on the browser, almost all Web
applications require interaction with a human user. The problem of
direct access to Web data from within business applications has been
largely ignored, as has the possibility of using the Web as a
platform for automated information exchange between organizations.
The debut of XML is set to change all this, and in the process spark
a major Web revolution: Web Automation (see Figure 1).
Figure 1 The need for Web Automation
XML enables the creation of Web documents that preserve data
structure and include "machine-readable" hooks to enable intelligent
processing by client applications. It is not necessary, however, for
Web content to exist as XML in order for XML to be used today to
automate the Web. The use of XML to deliver metadata about existing
Web resources can provide sufficient information to empower
non-browser applications to automate interactions with Web servers.
XML metadata defining interfaces to Web-enabled applications can
provide the basis for a common API across legacy systems, databases,
and middleware infrastructures, effectively transforming the Web
from an access medium into an integration platform.
Web Automation
Imagine everything a browser can do: sign on to a secure Web site;
query that site for data; download the results; upload a response.
Now imagine that your business applications can do the same thing,
automatically, without human intervention and without using a
browser. This is the power of Web Automation.
The benefits of Web Automation are numerous:
Competitive intelligence--aggregate product pricing data, news
reports
Application integration--leverage investments in Web data and
infrastructure
Implement robust ecommerce solutions without the expense and
difficulty of EDI or CORBA
Realize a 100% Web-based alternative to EDI
Put Web site functionality in the heart of customers' and
suppliers' IT infrastructures
The incredible diversity of Web resources presents significant
challenges for the automation of arbitrary tasks on the Web.
A robust infrastructure for Web Automation needs to provide:
Full interaction with HTML forms
An ability to handle both HTTP Authentication and Cookies
Both on-demand and scheduled extraction of targeted Web data
Aggregation of data from a number of Web sources
Chaining of services across multiple Web sites
An ability to integrate easily with traditional application
development languages and environments
A framework for managing change in both the locations and
structures of Web documents
webMethods has defined the Web Interface Definition Language (WIDL)
as an application of XML to lay the foundation for Web Automation.
WIDL
The goal of the Web Interface Definition Language is to enable
automation of all interactions with HTML/XML documents and forms,
providing a general method of representing request/response
interactions over standard Web protocols, and allowing the Web to be
utilized as a universal integration platform.
Where XML supports the creation of Web content that preserves data
structure, and promises Web documents that are "machine-readable,"
WIDL is an application of XML that defines interfaces and services
within and across HTML, XML, and text documents. As shown in Figure
2, services defined by WIDL map existing Web content into program
variables, allowing the resources of the Web to be made available,
without modification, in formats well-suited to integration with
diverse business systems.
Figure 2 WIDL allows Web resources such as package tracking services
to be accessed directly from business applications.
WIDL brings to the Web many of the IDL concepts that have been
implemented in distributed computing and transaction processing
platforms, including DCE and CORBA. A major part of the
value of DCE and CORBA is that they can define services offered by
applications in an abstract but highly usable fashion. WIDL
describes and automates interactions with services hosted by Web
servers on intranets, extranets and the Internet; it provides a
standard integration platform and a universal API for all
Web-enabled systems.
A service defined by WIDL is equivalent to a function call in
standard programming languages. At the highest level, WIDL files are
collections of services. WIDL defines the locations (URLs) of each
service, input parameters to be submitted (via GET or POST methods)
to each service, and output parameters to be returned by each
service.
WIDL provides the following features:
A browser is not required to drive Web applications
Service definitions are dynamically interpreted and can thus be
centrally managed
Client applications are insulated from changes in service
locations and data extraction methods
Developers are insulated from network programming concerns
Application resources can be integrated across firewalls and
proxies
WIDL can be used to describe interfaces and services for:
Static documents (HTML, XML, and plain text files)
HTML forms
URL directory structures
WIDL also has the ability to specify conditions for successful
processing and error messages to be returned to calling programs.
Conditions further enable services to be defined that span multiple
documents.
Applications of WIDL
The success of the Web has exposed the advantages of distributed
information systems to a global audience. Around the world, IT
organizations, regardless of industry, are searching for ways to
connect the Internet with new or existing applications, to use Web
technology to reduce development, deployment, and maintenance costs.
Using HTML, XML, and HTTP as corporate standards glue, application
integration requires only that target systems be Web-enabled. There
are hundreds of products in the market today which Web-enable
existing systems, from mainframes to client/server applications. The
use of standard Web technologies empowers various IT departments to
make independent technology selections. This has the effect of
lowering both the technical and "political" barriers that have
typically derailed cross-organizational integration projects.
The use of proprietary middleware infrastructures to integrate
applications requires not only that the same software product be
purchased by both organizations and successfully installed in both
target hardware environments, but also that both target applications
be tailored to support the middleware API. This type of investment
can be disastrous if one company spends six months designing a
CORBA-based business system only to discover that one of their
business units or business partners is unable to install CORBA
because it conflicts with their existing infrastructure. Conflicts
can arise because of hardware or software incompatibilities, or
simply because of difficulties in acquiring appropriate development
resources.
A number of analysts have already warned that proprietary ecommerce
platforms could lock suppliers into relationships by forcing them to
integrate their systems with one infrastructure for
business-to-business integration, making it costly for them to
switch to or integrate with other partners who have selected
alternate ecommerce platforms. Buyer-supplier integration issues
involve many-to-many relationships, and demand a standard platform
for functional integration and data exchange.
Here is a brief overview of the types of applications that WIDL
enables:
Manufacturers and distributors
Access supplier and competitor ecommerce systems automatically
to check pricing and availability
Load product data (spec sheets) from supplier Web sites
Place orders automatically (e.g., when inventory drops below
predetermined levels)
Integrate package tracking functionality for enhanced customer
service
Human resources
Automated update of new employee information into multiple
internal systems
Automated aggregation of benefits information from healthcare
and insurance providers
Governments
Kiosk systems that aggregate data and integrate services across
departments or state and local offices
Shipping and delivery services
Multi-carrier package tracking and shipment ordering
Access to currency rates, Customs regulations, etc.
Shipping companies were early leaders in bringing widely applicable
functionality to the Web. Web-based package tracking services
provide important logistics information to organizations large and
small.
Many organizations employ people for the sole purpose of manually
tracking packages to ensure customer satisfaction and to collect
refunds for packages that are delivered late. Integrating package
tracking functionality directly into warehouse management and
customer service systems is a huge benefit, boosting productivity
and enabling more efficient use of resources.
Using WIDL, the web-based package tracking services of numerous
shipping companies can be described as common application
interfaces, to be integrated with various internal systems. In
almost all cases, programmatic interfaces to different package
tracking services are identical, which means that WIDL can impose
consistency in the representation of functionality across systems.
Example 1 illustrates the use of WIDL to define a package tracking
service for Federal Express. Note that the WIDL specifies a
"Shipping" template. This indicates that there is a general class of
shipping services, and that this particular WIDL is one
implementation of the shipping interface.
Example 1 The WIDL Representation of a Package Tracking Service
The FedexShipping interface in Example 1 contains one service
(TrackPackage) which takes three input parameters (TrackingNum,
DestCountry, ShipDate) and returns three output parameters
(disposition, deliveredOn, deliveredTo). The WIDL definition
describing the TrackPackage service is stored in an ASCII file,
which is utilized by client programs at runtime to determine both
the location of the service (URL) and the structure of documents
that contain the desired data. Client programs access WIDL
definitions from local files, naming services such as LDAP, HTTP
servers, or other URL access schemes (see Figure 3).
Figure 3 WIDL files can be centrally managed with a well-known URL
or via a directory service such as LDAP.
Unlike the way CORBA and DCE IDL are normally used, WIDL is
interpreted at runtime. As a result, Service, Condition, and Variable
definitions within WIDL files can be administered without requiring
modification of client code. This usage model supports
application-to-application linkages that are far more robust and
maintainable than if they were coded by hand.
One of WIDL's most significant benefits is its ability to insulate
client programs from changes in the format and location of Web
documents. As long as the parameters of services do not change,
Service URLs, object references in variables, regions, and
conditions can all be modified without affecting applications that
utilize WIDL to access Web resources.
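As a rough illustration of runtime interpretation (this is not the article's Example 1, which is not reproduced here), the Python sketch below reads a hypothetical WIDL definition for the TrackPackage service and looks up the service's URL, HTTP method, and parameter names by name. Only the element and attribute names come from the WIDL element overview in this article; the URLs, FORMNAME values, and TYPE values are invented, and a real WIDL processor would do far more.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical WIDL file. Element and attribute names follow the article's
# element overview; the URLs, FORMNAME values, and TYPE values are invented.
WIDL_SOURCE = """
<WIDL NAME="FedexShipping" TEMPLATE="Shipping" BASEURL="http://track.example.com/">
  <SERVICE NAME="TrackPackage" URL="cgi-bin/track" METHOD="POST"
           INPUT="TrackInput" OUTPUT="TrackOutput"/>
  <BINDING NAME="TrackInput" TYPE="Input">
    <VARIABLE NAME="TrackingNum" TYPE="String" FORMNAME="trackNbr"/>
    <VARIABLE NAME="DestCountry" TYPE="String" FORMNAME="dest"/>
    <VARIABLE NAME="ShipDate" TYPE="String" FORMNAME="shipDate"/>
  </BINDING>
  <BINDING NAME="TrackOutput" TYPE="Output">
    <VARIABLE NAME="disposition" TYPE="String"/>
    <VARIABLE NAME="deliveredOn" TYPE="String"/>
    <VARIABLE NAME="deliveredTo" TYPE="String"/>
  </BINDING>
</WIDL>
"""


def describe_service(widl_text, service_name):
    """Read a WIDL definition at runtime and return what a client needs to
    call one service: its resolved URL, HTTP method, and parameter names."""
    root = ET.fromstring(widl_text.strip())
    service = root.find(f"SERVICE[@NAME='{service_name}']")
    if service is None:
        raise KeyError(f"no service named {service_name!r}")
    bindings = {b.get("NAME"): b for b in root.findall("BINDING")}

    def variable_names(binding_name):
        return [v.get("NAME") for v in bindings[binding_name].findall("VARIABLE")]

    return {
        "url": urljoin(root.get("BASEURL", ""), service.get("URL")),
        "method": service.get("METHOD"),
        "inputs": variable_names(service.get("INPUT")),
        "outputs": variable_names(service.get("OUTPUT")),
    }
```

Because a client written this way asks for TrackPackage by name each time it runs, an administrator could edit the SERVICE URL or the BASEURL in the file without touching client code; only changes to the parameters themselves would affect callers.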
There are three models for WIDL management:
Client side--where WIDL files are colocated with a client
program
Naming service--where WIDL definitions are returned from
directory services, e.g., LDAP
Server side--where WIDL files are referenced by, colocated with,
or embedded within Web documents
WIDL does not require that existing Web resources be modified in any
way. Flexible management models allow organizations to describe and
integrate Web sites that are beyond their control, as well as to
provide their business partners with interfaces to services that they
control. The ability to seamlessly migrate from independent to
shared management eases the transition from informal to formal
business-to-business integration.
Elements of WIDL
The Web Interface Definition Language (WIDL) consists of six XML
tags:
<WIDL> defines an interface, which can contain multiple services
and bindings
<SERVICE> defines a service, which consists of input and output
bindings
<BINDING> defines a binding, which specifies input and output
variables, as well as conditions for successful completion of a
service
<VARIABLE> defines input, output, and internal variables used
by a service to submit HTTP requests, and to extract data from
HTML/XML documents
<CONDITION> defines success and failure conditions for the
binding of output variables; specifies error messages to be
returned upon service failure; enables alternate binding
attempts and the chaining of services
<REGION> defines a region within an HTML/XML document; useful
for extracting regular result sets which vary in size, such as
the output of a search engine, or news stories
The complete WIDL DTD is included in Appendix A. In the next
sections the attributes of each element of WIDL are presented and
discussed by way of example.
<WIDL> is the parent element for the Web Interface Definition
Language; it defines an interface. Interfaces are groupings of
related services and bindings. The following are attributes of the
<WIDL> element:
NAME
Required. Establishes a name for an interface. The interface
name is used in conjunction with a service name for naming or
directory services.
VERSION
Optional. Specifies the version of WIDL. webMethods first
implemented WIDL as HTML extensions. Experience with customers
since late 1996 resulted in WIDL 2.0, an application of XML that
is capable of automating complex interactions across multiple
Web servers.
TEMPLATE
Optional. WIDL enables common interfaces to services provided by
multiple sites. Templates allow the specification of interfaces,
implementations of which may be available from multiple sources.
A shipping template defines a functional interface for shipping
services; various implementations can be provided for
Federal Express, UPS, and DHL.
BASEURL
Optional. BASEURL is similar to the <BASE> statement in
HTML. Some of the services within a given WIDL may be hosted
from the same Base URL. If BASEURL is defined, the URL for
various services can be defined relative to BASEURL. This
feature is useful for replicated sites which can be addressed by
changing only the BASEURL, instead of the URL for each service.
OBJMODEL
Optional. Specifies an object model to be used for extracting
data elements from HTML and XML documents. Object models are the
result of parsing HTML or XML documents. The use of object
models is central to the functionality of WIDL. Object
references are used in <VARIABLE>, <CONDITION>, and <REGION>
elements. For this reason, the object model will be briefly
discussed before proceeding with the description of the element
definitions that constitute WIDL.
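The BASEURL attribute described above can be sketched in a few lines, assuming relative service URLs are resolved the same way a browser resolves links against an HTML <BASE> (Python's urljoin follows those standard rules). The service names, paths, and hosts below are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical SERVICE URL attributes: relative URLs are resolved against
# BASEURL, while fully qualified URLs pass through unchanged.
SERVICE_URLS = {
    "TrackPackage": "track.cgi",
    "ShipRates": "rates/lookup.cgi",
    "Currency": "http://rates.example.org/fx.txt",
}


def resolve_service_urls(base_url, service_urls):
    """Return each service's fully qualified URL, given the WIDL BASEURL."""
    return {name: urljoin(base_url, url) for name, url in service_urls.items()}


primary = resolve_service_urls("http://www.example.com/cgi-bin/", SERVICE_URLS)
mirror = resolve_service_urls("http://mirror.example.net/cgi-bin/", SERVICE_URLS)
```

Moving from the primary site to the replica changed a single string; none of the individual service URLs had to be edited.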
Object model
Many of the features of WIDL require a capability to reliably
extract specific data elements from Web documents and map them to
output parameters.
Two candidate technologies for data extraction are pattern matching
and parsing. Pattern matching extracts data based on regular
expressions, and is well suited to raw text files and poorly
constructed HTML documents. There is a lot of bad HTML in the world!
Parsing, on the other hand, recovers document structure and exposes
relationships between document objects, enabling elements of a
document to be accessed with an object model.
Using an object model, an absolute reference to an element of an
HTML document might be specified:
doc.p[0].text
This reference would retrieve the text of the first paragraph of a
given document.
From both a development and an administrative point of view, pattern
matching is more labor intensive for establishing and maintaining
relationships between data elements and program variables. Regular
expressions are difficult to construct and prone to breakage as
document structures change. For instance, the addition of formatting
tags around data elements in HTML documents could easily derail the
search for a pattern. An object model, on the other hand, can see
through many such changes.
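The brittleness described above is easy to reproduce. In this Python sketch (the page content and the price pattern are invented), a regular expression written against the original markup stops matching once a <B> tag is wrapped around the price, while a small parser that collects paragraph text, a rough stand-in for the reference doc.p[0].text, still returns the same value. It uses the standard library's html.parser, not the webMethods object model.

```python
import re
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text content of each <p> element, ignoring nested tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


def first_paragraph_text(html):
    """Rough equivalent of the object reference doc.p[0].text."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs[0] if parser.paragraphs else None


# A pattern written against the page as it looked when the pattern was built.
PRICE_PATTERN = re.compile(r"<p>Price: (\$[\d.]+)</p>")


def price_by_regex(html):
    match = PRICE_PATTERN.search(html)
    return match.group(1) if match else None


OLD_PAGE = "<p>Price: $19.95</p>"
NEW_PAGE = "<p>Price: <b>$19.95</b></p>"  # same data, one extra formatting tag
```

price_by_regex returns None for NEW_PAGE, while first_paragraph_text returns "Price: $19.95" for both pages.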
Patterns must also be carefully constructed to avoid unintentional
matching. In complex cases, patterns must be nested within patterns.
The process of mapping patterns to a number of output parameters can
easily become unmanageable.
It is possible to achieve the best of both worlds by using pattern
matching when necessary to match against the attributes of elements
accessible via an object model. Using a hybrid model of pattern
matching within parsed elements provides for the extraction of
target information from preformatted text regions or text files.
For example, a reference that applies the pattern 'Currency:' to the
text of each paragraph would retrieve the text of the first paragraph
that contains 'Currency:' within a given document.
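The hybrid approach can be sketched as two steps: parse first, then apply a pattern to the text of each parsed element. The object-reference syntax for this case is not reproduced in this article, so the hypothetical function first_paragraph_matching below simply returns the text of the first paragraph whose text matches a regular expression; the sample page is invented.

```python
import re
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text content of each <p> element, ignoring nested tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


def first_paragraph_matching(html, pattern):
    """Parse the page, then pattern-match within each paragraph's text."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    for text in parser.paragraphs:
        if re.search(pattern, text):
            return text
    return None


PAGE = ("<p>Exchange rates</p>"
        "<p><b>Currency:</b> Yen 121.50</p>"
        "<p>Currency: Mark 1.75</p>")
```

Because the pattern is applied to parsed text rather than raw markup, the <b> tag around the label in the second paragraph does not prevent a match.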
Various object models for working with HTML documents have been
specified. The W3C has established a working group to define a
standard Document Object Model (DOM). The WIDL specification allows
for multiple object models. In implementing WIDL, we discovered many
functional requirements not currently addressed by existing object
models. These requirements will be demonstrated in various examples
later in this article.
We now continue with a discussion of the attributes of the elements
of the WIDL.
The <SERVICE> element describes a Web service, such as those
provided by CGI scripts, or via NSAPI, ISAPI, or other back-end Web
server programs. Services take a set of input parameters, perform
some processing, then return a dynamically generated HTML, XML or
text document.
The attributes of the <SERVICE> element map an abstract service
name into a service's actual URL, specify the HTTP method to be used
to access the service, and designate "bindings" for input and output
parameters.
NAME
Required. Establishes a name for a service. The service name is
used in conjunction with an interface for naming or directory
services.
URL
Required. Specifies the Uniform Resource Locator for the target
document. A service URL can be either a fully qualified URL or a
partial URL that is relative to the BASEURL provided as an
attribute of the <WIDL> element.
METHOD
Required. Specifies the HTTP method (GET or POST) to be used to
access the service.
INPUT
Required. Designates the <BINDING> to be used to define the
input parameters for programs that call the service. The
specified name must be that of a <BINDING> contained within the
same <WIDL> as the service.
OUTPUT
Required. Designates the <BINDING> to be used to define the
output parameters for programs that call the service. The
specified name must be that of a <BINDING> contained within the
same <WIDL> as the service.
AUTHUSER
Optional. Establishes the username for HTTP authentication.
AUTHPASS
Optional. Establishes the password for HTTP authentication.
TIMEOUT
Optional. Amount of time before the service times out.
RETRIES
Optional. Number of times to retry the service before failing.
Typically the username/password combination is set independently of
the service definitions in WIDL. The AUTHUSER and AUTHPASS attributes
allow a username and password to be defined outside of a calling
program. This is useful in cases where multiple client programs use
the same service.
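The article does not say which HTTP authentication scheme AUTHUSER and AUTHPASS feed. Assuming the common Basic scheme (RFC 7617), a client would turn the pair into an Authorization header roughly as follows; the function name basic_auth_header is hypothetical.

```python
import base64


def basic_auth_header(authuser, authpass):
    """Build an HTTP Basic Authorization header from a username/password pair
    (assumes the Basic scheme; WIDL itself does not specify one here)."""
    credentials = f"{authuser}:{authpass}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

For example, basic_auth_header("Aladdin", "open sesame") yields the header value "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", the standard example from the Basic scheme's specification. Basic credentials are base64-encoded, not encrypted.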
The <BINDING> element defines input and output variables for a
service. Input bindings describe the data provided to a Web
resource, and are analogous to the input fields in an HTML form. For
a static HTML document no input variables are required. Output
bindings describe which data elements are to be mapped from the
output document returned as a result of accessing the Web resource
with the given input variables. In most cases an output binding will
map only a subset of the available elements in the output document.
NAME
Required. Identifies the binding for reference by service
definitions and other binding definitions.
TYPE
Required. Specifies whether a binding defines input or output
parameters.
The <VARIABLE> element is used to describe both input and output
binding parameters; different attributes are used depending on the
type of parameter being described.
Common attributes are:
NAME
Required. Identifies the variable to calling programs.
VALUE
Optional. Designates a value to be assigned to the variable in
HTTP transactions. For input variables this has the effect of
rendering the variable invisible to calling programs; i.e., the
specified value is submitted to the Web service without
requiring an input from calling programs. For output variables
this has the effect of hard-coding the value returned when the
service is invoked.
USAGE
Optional. The default usage of variables is for specification of
input and output parameters. Variables can also be used
internally within WIDL, as well as to pass header information
(e.g., USER-AGENT or REFERER) in an HTTP request. The USAGE
attribute will be explored in Examples 2 and 3, which follow
this element overview.
TYPE
Required. Specifies both the data type and dimension of the
variable.
The following attributes are specific to input variables:
FORMNAME
Optional. Specifies the variable name to be submitted via GET or
POST methods. Obscure back-end variables can be given names that
are more meaningful in the context of the service described by
WIDL. Used in conjunction with WIDL Templates, FORMNAME permits
the mapping of a single variable name across multiple service
implementations. In the package tracking service in Example 1,
the FORMNAME differs from the variable name. It is also possible
to set FORMNAME="" to pass only the variable's value to the
back-end program.
OPTIONS
Optional. Captures the options of list boxes, check boxes, and
radio buttons. Useful for validating inputs prior to submitting
input parameters to a service and for transforming input
criteria into formats acceptable to back-end programs. For
example, an options list could be used to translate a meaningful
input of "full" to the "f" acceptable to a back-end program.
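FORMNAME and OPTIONS together amount to a translation step between the names and values a calling program uses and the ones a back-end form expects. A minimal sketch, assuming a GET-style URL-encoded query string; the binding below (variable names, form names, option codes) is hypothetical and mirrors the "full" to "f" example above. The FORMNAME="" case is not modeled.

```python
from urllib.parse import urlencode

# Hypothetical input binding: variable name -> FORMNAME and an optional OPTIONS
# map from meaningful caller values to the codes the back-end form accepts.
INPUT_BINDING = {
    "TrackingNum": {"formname": "trknbr", "options": None},
    "Detail": {"formname": "lvl", "options": {"full": "f", "summary": "s"}},
}


def encode_inputs(binding, values):
    """Validate caller values, translate them through OPTIONS, rename them
    to their FORMNAMEs, and return a URL-encoded query string."""
    pairs = []
    for name, spec in binding.items():
        value = values[name]
        options = spec["options"]
        if options is not None:
            # Reject inputs before anything is submitted to the service.
            if value not in options:
                raise ValueError(f"{name}: {value!r} is not one of {sorted(options)}")
            value = options[value]
        pairs.append((spec["formname"], value))
    return urlencode(pairs)
```

The caller supplies TrackingNum and Detail="full"; the request that reaches the back end carries trknbr and lvl=f.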
The following attributes are specific to output variables:
REFERENCE
Optional. Specifies an object reference to extract data from the
HTML, XML, or text document returned as the result of a service
invocation.
MASK
Optional. Masks permit the use of pattern matching and token
collecting to easily strip away unwanted labels and other text
surrounding target data items.
NULLOK
Optional. Overrides the implicit condition that all output
variables return a non-null value.
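The implicit non-null condition and its NULLOK override can be sketched as follows. This is an assumption-laden model, not the WIDL runtime: each output variable's REFERENCE and MASK are represented by a hypothetical extractor function (here a regular-expression capture group that strips the surrounding label), and the binding fails as a whole if any variable without NULLOK comes back empty. The document text is invented.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


class BindingFailed(Exception):
    """Raised when an output binding cannot be completed."""


@dataclass
class OutputVariable:
    name: str
    extract: Callable[[str], Optional[str]]  # stands in for REFERENCE plus MASK
    nullok: bool = False


def masked(pattern):
    """Hypothetical mask: keep only the regex capture group, dropping labels."""
    def extract(document):
        match = re.search(pattern, document)
        return match.group(1) if match else None
    return extract


def bind_outputs(document, variables):
    """Extract every output variable; fail the whole binding if any variable
    without NULLOK comes back null."""
    results = {}
    for var in variables:
        value = var.extract(document)
        if value is None and not var.nullok:
            raise BindingFailed(f"output variable {var.name!r} returned no value")
        results[var.name] = value
    return results


DOC = "Status: Delivered\nSigned for by: SMITH"
TRACK_OUTPUT = [
    OutputVariable("disposition", masked(r"Status: (\w+)")),
    OutputVariable("deliveredTo", masked(r"Signed for by: (\w+)")),
    OutputVariable("deliveredOn", masked(r"Delivered on: (\S+)"), nullok=True),
]
```

With NULLOK set, the missing delivery date comes back as None; without it, the same document would cause the entire binding to fail.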
Apart from the "default" behavior of variables defined in input
bindings, there are two other usage models supported by WIDL:
"internal" and "header." The USAGE attribute can define service
inputs in place of or in addition to those required by a Web
service's HTML form.
Internal variables enable variable substitution within input and
output bindings. For instance, using internal variables, a portion
of a service's URL or a pattern for matching within an object
reference can be specified as a variable that is part of an input
binding.
Header variables allow HTTP header information to be included as
part of a service request. This is useful in many situations,
including the passing of referrer information where required by
back-end systems.
In Example 2, an auto loan service is defined for a site that uses a
directory structure to organize loan information for various states.
Rather than using CGI scripts to access a database of high, low, and
average loan rates, the site links to a separate static document for
each state from a pick list, with the state abbreviation forming part
of each document's name. The
use of internal variables enables the parameterization of a portion
of the URL. In this fashion, WIDL is able to define an input binding
even though no HTML forms are present to query the user for
information. The input binding specifies a variable "state" that is
referenced in the URL attribute of the service definition as
%state%. At runtime the value passed into the "state" variable is
used to complete the service URL.
Example 2 Using Internal Variables to Parameterize Directory
Structures
Because the AutoLoan service uses a variable to complete the URL to
access a static document, an invalid input parameter results in an
invalid URL. The <CONDITION> statement in the output binding traps
the document not found condition and returns a sensible error
message to client programs.
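Internal variables such as %state% amount to string substitution performed before the request is made. A minimal sketch, assuming the %name% delimiters described above and a hypothetical URL; here an unknown variable raises an error instead of producing a malformed URL, a client-side check that complements, rather than replaces, the condition that traps missing documents.

```python
import re


def substitute(template, variables):
    """Replace each %name% in template with the value of that internal variable."""
    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for internal variable {name!r}")
        return str(variables[name])
    return re.sub(r"%(\w+)%", replace, template)


# Hypothetical URL pattern for a site with one static page per state.
AUTO_LOAN_URL = "http://www.example.com/autoloans/%state%.html"
```

The same substitution could be applied to the text of an object reference, which is how the currency example that follows uses an internal variable.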
Internal variables can also be used within object references that
use pattern matching to index into the object tree.
Example 3 uses the currency exchange service provided by the Federal
Reserve Bank to illustrate the use of internal variables to
interactively query a single static document.
Example 3 Using Internal Variables to Input Criteria in Object
References
In this example currency rates for a number of countries are
provided in a single document. The object reference for the 'rate'
variable in the output binding uses an internal variable 'Currency'
as part of the pattern that is matched to discover the current
exchange rate.
The object reference used in this example also demonstrates two
additional text manipulation features of the object model developed
by webMethods. The .line[] construct allows access to individual
lines of both preformatted text and text that has been formatted
with the <BR> line-break element. This greatly simplifies pattern
matching expressions within object references.
The Federal Reserve Currency Exchange service returns rate
information in a column from character position 53 to character
position 65. This range of characters is specified by qualifying the
.text[53-65] attribute of the line matching the input criteria.
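The .line[] and .text[53-65] constructs can be approximated in a few lines of Python. This sketch assumes that character positions are 1-based and inclusive (the article does not say) and splits lines on both newlines and <BR> tags; the table contents are made up, with each rate right-aligned so that it ends at column 65. The currency argument plays the role of the internal variable Currency.

```python
import re


def lines(text):
    """Split preformatted text or <BR>-formatted text into lines (like .line[])."""
    return re.split(r"<br\s*/?>|\r?\n", text, flags=re.IGNORECASE)


def column_from_matching_line(text, pattern, start, end):
    """Return characters start..end (1-based, inclusive) of the first line
    matching pattern, with padding stripped; None if no line matches."""
    for line in lines(text):
        if re.search(pattern, line):
            return line[start - 1:end].strip()
    return None


def rate_for(table, currency):
    # The internal variable is substituted into the pattern, escaped so that
    # it is matched literally as a whole word.
    return column_from_matching_line(table, rf"\b{re.escape(currency)}\b", 53, 65)


# Invented sample in fixed-width columns: 20 + 32 + 13 = 65 characters per line.
ROWS = [("GERMANY", "MARK", "1.7530"), ("JAPAN", "YEN", "121.5000")]
TABLE = "\n".join(f"{country:<20}{unit:<32}{rate:>13}" for country, unit, rate in ROWS)
```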
The <CONDITION> element is used in output bindings to specify
success and failure conditions for the extraction of data to be
returned to calling programs. Conditions enable branching logic
within service definitions; they are used to attempt alternate
bindings when initial bindings fail and to initiate service chains,
whereby the output variables from one service are passed into the
input bindings of a second service. Conditions also define error
messages returned to calling programs when services fail.
TYPE
Required. Specifies whether a condition is checking for the
"Success" or the "Failure" of a binding attempt.
Any variable that returns a NULL value will cause the entire
binding to fail, unless the NULLOK attribute of that variable
has been set to true. Conditions can catch the success or
failure of either a specific object reference or of an entire
binding. In the case where a condition initiates a service
chain, it is important that all variables bind properly.
REFERENCE
Optional. Specifies an object reference which extracts data from
the HTML or XML document returned as the result of a service
invocation. The REFERENCE attribute for conditions is equivalent
to the REFERENCE attribute used in variable definitions.
MATCH
Required. Specifies a text pattern that will be compared with
the object property referenced by the REFERENCE attribute.
REBIND
Optional. Specifies an alternate output binding. Typically a
failure condition indicates that the document returned cannot be
bound properly. REBIND redirects the binding attempt. This is
useful in situations where the documents returned by a service
are dependent upon the input criteria that were submitted. For
example, a retail Web site may return a different document
structure for an SKU depending on whether the item requested is
a shirt, a tie, or trousers. The use of REBIND allows
conditions to determine the appropriate binding for extracting
the desired data.
SERVICE
Optional. Specifies a service to invoke with the results of an
output binding. Aside from the obvious benefit of chaining
services to further automate the tasks that can be encapsulated
for client programs, there are many cases when target documents
can only be retrieved after visiting several Web pages in
succession. In some instances cookies are issued by an entry
page that must be visited prior to interacting with HTML forms;
in others, URLs are dynamically generated from databases for
specific user identities.
REASONTEXT
Optional. The text to be returned as an error message when a
service fails.
REASONREF
Optional. Reference to an element's attribute to be returned as
an error message when a service fails.
WAIT
Optional. Amount of time to wait before re-trying retrieval of a
document after a server has returned a 'service busy' error.
RETRIES
Optional. Number of times to retry the service before failing.
Example 4 illustrates the use of conditions to specify alternate
bindings. Alternate bindings can be used when documents returned by
services are dependent upon the inputs submitted to the service. In
some rare cases, such as the StockMarketInfo service defined in this
example, a service occasionally returns different document formats
for no apparent reason. Conditions and rebinding handle any such
situations.
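The rebinding idea can be sketched under stated assumptions: each binding is modeled as a hypothetical function that returns the extracted values or None on failure, and a REBIND table names the alternate binding to try next. The two quote layouts stand in for a service such as StockMarketInfo that returns different formats; the layouts and binding names are invented.

```python
import re
from typing import Callable, Dict, Optional

Binding = Callable[[str], Optional[Dict[str, str]]]


def regex_binding(var_name, pattern):
    """Hypothetical binding that succeeds only if pattern matches."""
    def bind(document):
        match = re.search(pattern, document)
        return {var_name: match.group(1)} if match else None
    return bind


BINDINGS: Dict[str, Binding] = {
    "QuoteLayoutA": regex_binding("price", r"Last: ([\d.]+)"),
    "QuoteLayoutB": regex_binding("price", r"Price ([\d.]+) \(delayed\)"),
}
# A failure condition on QuoteLayoutA rebinds to QuoteLayoutB.
REBIND = {"QuoteLayoutA": "QuoteLayoutB"}


def bind_with_rebind(document, first, bindings=BINDINGS, rebind=REBIND):
    """Try the first binding; on failure follow REBIND until one succeeds."""
    tried = []
    name = first
    while name is not None:
        if name in tried:
            raise RuntimeError(f"REBIND cycle at {name!r}")
        tried.append(name)
        result = bindings[name](document)
        if result is not None:
            return name, result
        name = rebind.get(name)
    # Roughly where a REASONTEXT error message would be returned to the caller.
    raise LookupError(f"no binding succeeded; tried {tried}")
```

The caller always asks for the same binding; which layout actually matched is an internal detail, and a page matching neither layout produces an error rather than empty output.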
Example 4 Conditions Initiate Alternate Attempts for Extracting
Output Values
Example 5 illustrates the use of conditions to specify a service
chain. Service chains pass the name-value pairs of an output binding
into the input binding of the service specified by a <CONDITION>
statement. Any name-value pairs matching the variables of the
chained service's input binding will be used as input parameters. In
this example, the productSearch service returns a URL when it
successfully finds a product matching the search criteria. The
success condition on the ProductSearchOutput binding causes the
ExtractPrices service to be called. Because the output binding of
productSearch matches the input binding of ExtractPrices, the
variables are passed from one service into the other.
Example 5 Conditions Initiate Service Chains
It is important to note that the ExtractPrices service can be called
independent of the productSearch service, and that the ExtractPrices
service specifies productURL as an internal variable. The output
variables from the productSearch service are not available to the
ExtractPrices service except in the case where they have been passed
via an input binding.
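The name-matching rule for service chains can be sketched in a few lines. The two services below are hypothetical stand-ins for productSearch and ExtractPrices that return canned results instead of fetching pages; only output values whose names match the chained service's input variables are passed along, so hitCount is dropped. The success condition that triggers the chain is reduced here to "the first service returned a result".

```python
def chain_inputs(outputs, input_names):
    """Keep only the output name-value pairs that match the chained
    service's input variables."""
    return {name: outputs[name] for name in input_names if name in outputs}


def product_search(inputs):
    # Hypothetical stand-in: a real service would submit inputs["query"].
    return {"productURL": "http://shop.example.com/item/42", "hitCount": "1"}


def extract_prices(inputs):
    # Hypothetical stand-in: a real service would fetch inputs["productURL"].
    return {"price": "$19.95", "fetchedFrom": inputs["productURL"]}


# service name -> (implementation, names of its input variables)
SERVICES = {
    "productSearch": (product_search, ["query"]),
    "ExtractPrices": (extract_prices, ["productURL"]),
}
# service name -> service to call when it succeeds
CHAIN = {"productSearch": "ExtractPrices"}


def call(service_name, inputs):
    """Call a service and, if it succeeds and has a chained service,
    pass the matching outputs on as that service's inputs."""
    func, _ = SERVICES[service_name]
    outputs = func(inputs)
    next_name = CHAIN.get(service_name)
    if outputs is None or next_name is None:
        return outputs
    _, next_input_names = SERVICES[next_name]
    return call(next_name, chain_inputs(outputs, next_input_names))
```

As in the text, ExtractPrices can also be called on its own with a productURL supplied directly by the caller.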
Service chains make it possible to interact with "shopping cart"
services, where multiple service calls are required to add items,
followed by a service call to submit an order.
The <REGION> element is used in output bindings to define targeted
subregions of a document. This is useful in services that return
variable arrays of information in structures that can be located
between well known elements of a page.
Regions are critical for poorly designed documents where it is
otherwise impossible to differentiate between desired data elements
(for instance, story links on a news page) and elements that also
match the search criteria.
NAME
Required. Specifies the name for a region. This name can then be
used as the root of an object reference. For instance, a region
named foo can be used in object references such as:
foo.p[0].text
START
Required. An object reference that determines the beginning of a
region.
END
Required. An object reference that determines the end of a
region.
Example 6 demonstrates the use of regions in a news service, where
the number of news stories varies day to day. Regions permit the
extraction of data elements relative to other features of a
document. The tops region begins with a text object that matches the
pattern 'Last Updated' and ends with an object that matches 'For
more*'.
Example 6 Regions Permit the Extraction of Data Elements
Variable references into the tops region collect arrays of anchors
and anchor text, regardless of the fact that the sizes of the arrays
change throughout the day. The object references within tops are
vastly simplified by the processing already provided by the region
definition:
tops.a[].text
tops.a[].href
It is also worth noting that the news service in Example 6 has no
input binding. Input bindings are not required for service
definitions.
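The region mechanics can be approximated in a few lines of Java. This sketch rests on simplifying assumptions. The page is flattened into a list of elements in document order. The region excludes the START and END elements themselves. '*' is the only wildcard metacharacter, and a pattern must match an element's entire text. The Regions class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of region extraction over a flattened page. */
public class Regions {

    /** A page element in document order; href is null for non-anchors. */
    public record Element(String tag, String text, String href) {}

    /** Complete-match wildcard test: '*' matches any run of characters. */
    public static boolean matches(String pattern, String text) {
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i])); // everything else is literal
        }
        return Pattern.matches(regex.toString(), text);
    }

    /** Elements after the first START match, up to (not including) the next END match. */
    public static List<Element> region(List<Element> doc, String start, String end) {
        List<Element> out = new ArrayList<>();
        boolean inside = false;
        for (Element e : doc) {
            if (!inside) {
                inside = matches(start, e.text());
            } else if (matches(end, e.text())) {
                break;
            } else {
                out.add(e);
            }
        }
        return out;
    }

    /** Rough equivalent of region.a[].text */
    public static List<String> anchorTexts(List<Element> region) {
        List<String> texts = new ArrayList<>();
        for (Element e : region) {
            if (e.tag().equals("a")) {
                texts.add(e.text());
            }
        }
        return texts;
    }

    /** Rough equivalent of region.a[].href */
    public static List<String> anchorHrefs(List<Element> region) {
        List<String> hrefs = new ArrayList<>();
        for (Element e : region) {
            if (e.tag().equals("a")) {
                hrefs.add(e.href());
            }
        }
        return hrefs;
    }
}
```

Applied to the tops region with START 'Last Updated' and END 'For more*', anchorTexts and anchorHrefs return one entry per story, however many stories the page carries that day.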
Object References
The default object model used by WIDL provides object references for
accessing elements and properties of HTML and XML documents. This
model is based on the JavaScript page object model, but without the
JavaScript method definitions.
Using the default object model, all elements of HTML and XML
documents can be addressed in the following ways:
By name, if the target element has a non-empty name attribute.
For example, the value of an HTML element named foo can be
referenced:
doc.foo.value
By absolute indexing, where each array of elements has a zero-based
integer index, e.g.:
doc.headings[0].text
doc.p[1].text
By relative indexing, which directs the binding algorithm to search
the VALUE attributes of each element in the array, until a match is
found. The match must be complete, which requires the use of
wildcard metacharacters for partial string matches. Note that the
search will return the first matching element, if any:
doc.tr['*pattern*'].td[1].text
By region indexing, which directs the binding algorithm to search
only within a region of a document:
myregion.a[2].href
By attribute matching, which directs the binding algorithm to search
an object's attributes until a match is found. Attribute matching is
done with parenthesis instead of square brackets:
doc.a(name='foo').href
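To make these addressing forms concrete, the following hypothetical Java tokenizer splits a reference into steps and classifies each step's selector. It is only a sketch of the syntax shown above. It does not resolve references against a document, and it assumes quoted patterns use single quotes, as in the examples.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tokenizer for WIDL-style object references. */
public class ObjectReference {

    /** One step: a name plus an optional selector such as [2], ['*x*'], [] or (name='foo'). */
    public record Step(String name, String selector) {
        public String kind() {
            if (selector == null) return "name";
            if (selector.startsWith("(")) return "attribute";
            String inner = selector.substring(1, selector.length() - 1);
            if (inner.isEmpty()) return "all";
            if (inner.startsWith("'")) return "pattern";
            return "index";
        }
    }

    public static List<Step> parse(String ref) {
        List<Step> steps = new ArrayList<>();
        int i = 0;
        while (i < ref.length()) {
            int start = i;
            // A name runs until a separator or the start of a selector.
            while (i < ref.length() && ".[(".indexOf(ref.charAt(i)) < 0) {
                i++;
            }
            String name = ref.substring(start, i);
            String selector = null;
            if (i < ref.length() && ref.charAt(i) != '.') {
                char close = ref.charAt(i) == '[' ? ']' : ')';
                int end = closingIndex(ref, i + 1, close);
                selector = ref.substring(i, end + 1);
                i = end + 1;
            }
            steps.add(new Step(name, selector));
            if (i < ref.length() && ref.charAt(i) == '.') {
                i++; // skip the separator
            }
        }
        return steps;
    }

    /** Index of the closing bracket, ignoring brackets and dots inside single quotes. */
    private static int closingIndex(String s, int from, char close) {
        boolean quoted = false;
        for (int j = from; j < s.length(); j++) {
            char c = s.charAt(j);
            if (c == '\'') {
                quoted = !quoted;
            } else if (c == close && !quoted) {
                return j;
            }
        }
        throw new IllegalArgumentException("unbalanced selector in " + s);
    }
}
```

Scanning character by character, rather than splitting on '.', keeps a pattern such as '*v1.2*' intact inside its brackets.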
The following properties are available for all objects:
.text/.txt
Returns the text of a container
.value/.val
Returns the value of a container
.source/.src
Returns the source of a container
.index/.idx
Returns the index of a container
.reference/.ref
Returns the fully qualified object reference
Attributes of HTML containers take precedence over properties of the
same name. Each property therefore also has an alternate accessor.
.text/.txt and .value/.val are equivalent except when a document
element has an identically named attribute.
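The precedence rule can be expressed as a small lookup. In this hypothetical sketch, an element is represented by two maps: its HTML attributes and its built-in properties. An attribute whose name equals the accessor wins. Otherwise, the accessor (or the full name behind a short alias) selects the property. PropertyAccess is an illustrative name, not part of any real API.

```java
import java.util.Map;

/** Hypothetical sketch of resolving a property accessor on one element. */
public class PropertyAccess {

    /** Short accessor forms mapped to their full property names. */
    private static final Map<String, String> ALIASES = Map.of(
            "txt", "text", "val", "value", "src", "source",
            "idx", "index", "ref", "reference");

    /**
     * An attribute with exactly the accessor's name wins; otherwise the
     * accessor, or the full name behind its alias, selects a built-in property.
     */
    public static String resolve(Map<String, String> attributes,
                                 Map<String, String> properties,
                                 String accessor) {
        if (attributes.containsKey(accessor)) {
            return attributes.get(accessor);
        }
        return properties.get(ALIASES.getOrDefault(accessor, accessor));
    }
}
```

For an element that carries a value attribute, .value therefore returns the attribute, while .val still reaches the property.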
Putting WIDL to Work
WIDL files can be hand-coded or developed interactively with command
line or graphical tools, which help determine the object references
used in WIDL declarations.
Once a WIDL file has been created, its use depends upon the
implementation of products that can process and understand WIDL
services. A Web integration platform based on WIDL needs to provide:
A mechanism for retrieving WIDL files, either from a local file
system, a directory service such as LDAP, or a URL
An HTML and XML parser, and text pattern matching capabilities,
providing an object model for accessing elements of Web
documents
HTTP and HTTPS support, to initiate requests and receive Web
documents
Apart from these requirements, a WIDL processor could be delivered
as a Java class or a Windows DLL, for integration directly with
client applications, or as a standalone server with middleware
interfaces, allowing thin-client access to Web automation
functionality.
Generating Code
The primary purpose of WIDL is integration with corporate business
applications. In much the same way that DCE or CORBA IDL is used to
generate code fragments, or "stubs," to be included in development
projects, WIDL provides the necessary ingredients for generating
Java, JavaScript, C/C++, and even Visual Basic client code.
webMethods has developed a suite of Web Automation products for the
development and management of WIDL files, as well as the generation
of client code from WIDL files. Client stubs, which we
affectionately call "Weblets," present developers with local
function calls, and encapsulate all the methods required to invoke a
service that has been defined by a WIDL file.
Example 7 Java Stub
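import java.io.IOException;
import watt.api.*;

// Client stub for the TrackPackage service of the package tracking WIDL
public class TrackPackage extends Object
{
    // Input parameters
    public String TrackingNum;
    public String DestCountry;
    public String ShipDate;

    // Output parameters bound by the service
    public String disposition;
    public String deliveredOn;
    public String deliveredTo;

    // Convenience form taking only the tracking number; assumes the
    // service accepts empty DestCountry and ShipDate values
    public TrackPackage(String TrackingNum)
        throws IOException, WattException, WattServiceException
    {
        this(TrackingNum, "", "");
    }

    public TrackPackage(String TrackingNum, String DestCountry, String ShipDate)
        throws IOException, WattException, WattServiceException
    {
        this.TrackingNum = TrackingNum;
        this.DestCountry = DestCountry;
        this.ShipDate = ShipDate;

        // Name-value pairs for the service's input binding
        String args[][] = {
            {"TrackingNum", TrackingNum},
            {"DestCountry", DestCountry},
            {"ShipDate", ShipDate}
        };
        Context c = new Context();
        c.loadDocument("Shipping.widl");
        Result r = c.invokeService("FedexShipping",
                                   "TrackPackage", args);
        disposition = r.getVariable("disposition");
        deliveredOn = r.getVariable("deliveredOn");
        deliveredTo = r.getVariable("deliveredTo");
    }
}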
Example 7 features a Java class generated from the package tracking
WIDL presented earlier in Example 1. This class demonstrates the
following methods that are part of the API that webMethods has
developed for processing WIDL:
Context
loadDocument
invokeService
getVariable
After declaring the variables that will be used by the
TrackPackage class, a handle c to a new Context of the webMethods
Web automation runtime is created. All API calls are then made
against this handle.
loadDocument loads and parses the specified WIDL file, in this case
Shipping.widl. Loading the WIDL defines the services of the Shipping
interface to the runtime. invokeService actually submits the input
parameters to the TrackPackage service, which makes the appropriate
HTTP request and returns either a result set containing the bound
output variables or an error message specified by a condition in the
WIDL definition. getVariable is then used to extract the values of
the output variables and assign them to class variables.
Within the Java application, the package tracking service looks like
a simple instantiation of the TrackPackage class:
TrackPackage p = new TrackPackage("12345678");
In short, an application makes a call to a local function that has
been generated by WIDL. The local function encapsulates the API
calls to the WIDL processor. The WIDL processor:
Loads the WIDL file from a local or remote file system
Passes the function's input parameters as an HTTP request
Parses the retrieved document to extract target data items
Executes any conditional logic for error checking or service
chaining
Returns the extracted data into the output parameters of the
calling function
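The second step, passing input parameters as an HTTP request, amounts to form encoding for a service that uses GET. This hypothetical sketch uses the standard java.net.URLEncoder and assumes the form field names equal the parameter names. It leaves out POST bodies and HTTPS configuration. RequestBuilder is an illustrative name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that turns input parameters into a GET request URL. */
public class RequestBuilder {

    /** Appends URL-encoded name=value pairs to the service URL. */
    public static String getUrl(String serviceUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        if (query.length() == 0) {
            return serviceUrl; // no inputs: request the URL as-is
        }
        // Respect a query string that is already part of the service URL.
        String separator = serviceUrl.contains("?") ? "&" : "?";
        return serviceUrl + separator + query;
    }
}
```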
Generated Java classes can be incorporated in standalone Java
applications, Java Applets, JavaScript routines, or server-side Java
"Servlets." Generated C/C++ code encapsulating Web services can be
deployed as DLLs, shared libraries, or standalone executables.
webMethods' implementation, the Web Automation Platform, provides
Java classes, a shared library, a Windows DLL, and an ActiveX
control that supports Visual Basic modules, which can be embedded in
spreadsheets and other Microsoft Office applications.
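A stub like the one in Example 7 follows a fixed pattern, so generating it is largely string templating. The following hypothetical generator emits source shaped like Example 7 from a service's name and its variable lists. It is not the webMethods generator. The emitted calls (Context, loadDocument, invokeService, getVariable) are copied from Example 7 and have not been checked against the watt.api library.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical generator for Weblet-style Java stubs shaped like Example 7. */
public class StubGenerator {

    public static String generate(String widlFile, String iface, String service,
                                  List<String> inputs, List<String> outputs) {
        StringBuilder src = new StringBuilder();
        src.append("import java.io.IOException;\nimport watt.api.*;\n\n");
        src.append("public class ").append(service).append(" {\n");
        // One public field per input and output variable
        for (String v : inputs) {
            src.append("    public String ").append(v).append(";\n");
        }
        for (String v : outputs) {
            src.append("    public String ").append(v).append(";\n");
        }
        String params = inputs.stream()
                .map(v -> "String " + v)
                .collect(Collectors.joining(", "));
        src.append("\n    public ").append(service).append("(").append(params).append(")\n");
        src.append("        throws IOException, WattException, WattServiceException {\n");
        for (String v : inputs) {
            src.append("        this.").append(v).append(" = ").append(v).append(";\n");
        }
        // Name-value pairs for the service's input binding
        String pairs = inputs.stream()
                .map(v -> "{\"" + v + "\", " + v + "}")
                .collect(Collectors.joining(", "));
        src.append("        String[][] args = {").append(pairs).append("};\n");
        src.append("        Context c = new Context();\n");
        src.append("        c.loadDocument(\"").append(widlFile).append("\");\n");
        src.append("        Result r = c.invokeService(\"").append(iface)
           .append("\", \"").append(service).append("\", args);\n");
        // Copy each bound output variable into its field
        for (String v : outputs) {
            src.append("        ").append(v).append(" = r.getVariable(\"")
               .append(v).append("\");\n");
        }
        src.append("    }\n}\n");
        return src.toString();
    }
}
```

Generators for other languages would walk the same service description and change only the templates.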
Conclusion
Web technology is strong on interactivity but weak on automation. The
primary applications of the Web, including Push and Agent
technologies, are almost exclusively focused on end users. Data that
is being made available in HTML format is effectively inaccessible
to business applications other than the Web browser.
On corporate intranets and extranets, the Web browser has enabled
access to business systems, but has in many cases reinforced manual
inefficiencies as data must be transcribed from browser windows into
other application interfaces.
Electronic commerce on the Web is typically driven manually via a
browser. In order to achieve business-to-business integration,
organizations have resorted to proprietary protocols. The
many-to-many nature of Web commerce demands a standard for automated
integration.
Interactions normally performed manually in a browser, such as
entering information into an HTML form, submitting the form, and
retrieving HTML documents, can be automated by capturing details
such as input parameters, service URLs, and data extraction methods
for output parameters. Mechanisms for condition processing can also
be provided to enable robust error handling.
The Web Interface Definition Language (WIDL) is an application of
the Extensible Markup Language (XML), which allows the resources of
the World Wide Web to be described as functional interfaces that can
be accessed by remote systems over standard Web protocols. WIDL
transforms the Web into a standards-based integration platform,
providing a practical and cost-effective infrastructure for
business-to-business electronic commerce over the Web.
Appendix A
Example 8 shows the WIDL DTD in its entirety.
Example 8 The WIDL DTD
About the Author
Charles Allen
3975 University Drive
Suite 360
Fairfax, VA 22030
(703) 352-8345
charles@webMethods.com
Charles Allen is VP of Product Management for webMethods, Inc., the
leading provider of Web Automation and integration solutions for the
Global 2000. Prior to joining webMethods, Mr. Allen was a founding
member of Open Environment Corporation. Most recently he was
responsible for technology acquisitions and joint ventures in the
Asia/Pacific region. An inveterate communicator, Mr. Allen has
presented extensively on the Web and distributed systems technology
at events around the world.
Copyright © 1998 Seybold Publications and O'Reilly & Associates,
Inc.
XML is a trademark of MIT and a product of the World Wide Web
Consortium.