XML: Principles, Tools and Techniques 
                        Oct. 02, 1997
                
    
            
             
            
     
            
            WIDL
            Application Integration with XML
            Charles Allen
            Abstract
            The problem of direct access to Web data from within business 
            applications has until recently been largely ignored. The Web 
            Interface Definition Language (WIDL) is an application of the 
            Extensible Markup Language (XML) which allows the resources of the 
            World Wide Web to be described as functional interfaces that can be 
            accessed by remote systems over standard Web protocols. WIDL 
            provides a practical and cost-effective means for diverse systems to 
            be rapidly integrated across corporate intranets, extranets, and the 
            Internet.
            Overview
            The explosive growth of the World Wide Web is providing millions of 
            end-users access to ever-increasing volumes of information. The 
            resources of legacy systems, relational databases, and multi-tier 
            applications have all been made available to the Web browser, which 
            has been transformed from an occasionally informative accessory into 
            an essential business tool for organizations large and small.
            While the Web has achieved the extraordinary feat of providing 
            ubiquitous accessibility to end-users, it has in many cases 
            reinforced manual inefficiencies in business processes as repetitive 
            tasks are required to transcribe or copy and paste data from browser 
            windows into desktop and corporate applications. This is as true of 
            Web data provided by remote business units and external (i.e., 
            partner or supplier) organizations as it is of Web data accessible 
            from both public and subscription-based Web sites.
            Business units that have previously been unable to agree on 
            middleware and data interchange standards are (by default) agreeing 
            on HTTP and HTML as data communication and presentation standards. 
            Because of the overwhelming focus on the browser, almost all Web 
            applications require interaction with a human user. The problem of 
            direct access to Web data from within business applications has been 
            largely ignored, as has the possibility of using the Web as a 
            platform for automated information exchange between organizations. 
            The debut of XML is set to change all this, and in the process spark 
            a major Web revolution: Web Automation (see Figure 1).
            
            Figure 1 The need for Web Automation
            XML enables the creation of Web documents that preserve data 
            structure and include "machine-readable" hooks to enable intelligent 
            processing by client applications. It is not necessary, however, for 
            Web content to exist as XML in order for XML to be used today to 
            automate the Web. The use of XML to deliver metadata about existing 
            Web resources can provide sufficient information to empower 
            non-browser applications to automate interactions with Web servers.
            XML metadata defining interfaces to Web-enabled applications can 
            provide the basis for a common API across legacy systems, databases, 
            and middleware infrastructures, effectively transforming the Web 
            from an access medium into an integration platform.
            Web Automation
            Imagine everything a browser can do: sign on to a secure Web site; 
            query that site for data; download the results; upload a response. 
            Now imagine that your business applications can do the same thing, 
            automatically, without human intervention and without using a 
            browser. This is the power of Web Automation.
            The benefits of Web Automation are numerous: 
                Competitive intelligence--aggregate product pricing data, news
                  reports
                Application integration--leverage investments in Web data and
                  infrastructure
                Implement robust ecommerce solutions without expense and
                  difficulty of EDI or CORBA
                Realize a 100% Web-based alternative to EDI
                Put Web site functionality in the heart of customers' and
                  suppliers' IT infrastructures
            The incredible diversity of Web resources presents significant
            challenges for the automation of arbitrary tasks on the Web.
            A robust infrastructure for Web Automation needs to provide:
                Full interaction with HTML forms
                An ability to handle both HTTP Authentication and Cookies
                Both on-demand and scheduled extraction of targeted Web data
                Aggregation of data from a number of Web sources
                Chaining of services across multiple Web sites
                An ability to integrate easily with traditional application
                  development languages and environments
                A framework for managing change in both the locations and
                  structures of Web documents
            webMethods has defined the Web Interface Definition Language (WIDL)
            as an application of XML to lay the foundation for Web Automation.
            WIDL
            The goal of the Web Interface Definition Language is to enable 
            automation of all interactions with HTML/XML documents and forms, 
            providing a general method of representing request/response 
            interactions over standard Web protocols, and allowing the Web to be 
            utilized as a universal integration platform.
            Where XML supports the creation of Web content that preserves data 
            structure, and promises Web documents that are "machine-readable," 
            WIDL is an application of XML that defines interfaces and services 
            within and across HTML, XML, and text documents. As shown in Figure 
            2, services defined by WIDL map existing Web content into program 
            variables, allowing the resources of the Web to be made available, 
            without modification, in formats well-suited to integration with 
            diverse business systems.
            
            Figure 2 WIDL allows Web resources such as package tracking services 
            to be accessed directly from business applications.
            WIDL brings to the Web many of the Interface Definition Language
            (IDL) concepts that have been implemented in distributed computing
            and transaction processing platforms, including DCE and CORBA. A
            major part of the
            value of DCE and CORBA is that they can define services offered by 
            applications in an abstract but highly usable fashion. WIDL 
            describes and automates interactions with services hosted by Web 
            servers on intranets, extranets and the Internet; it provides a 
            standard integration platform and a universal API for all 
            Web-enabled systems.
            A service defined by WIDL is equivalent to a function call in 
            standard programming languages. At the highest level, WIDL files are 
            collections of services. WIDL defines the locations (URLs) of each 
            service, input parameters to be submitted (via GET or POST methods) 
            to each service, and output parameters to be returned by each 
            service.
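            The function-call analogy can be sketched in a few lines of Python.
            This is only an illustration, not part of WIDL: the class names
            Service and Interface are assumptions made up for the sketch, and
            the service details are borrowed from the package tracking example
            that appears later in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """One WIDL service, viewed as a function signature (hypothetical model)."""
    name: str
    url: str
    method: str  # "GET" or "POST"
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)

    def signature(self) -> str:
        # Render the service the way a programmer would read a function.
        return f"{self.name}({', '.join(self.inputs)}) -> ({', '.join(self.outputs)})"


@dataclass
class Interface:
    """A WIDL file: a named collection of services."""
    name: str
    services: dict[str, Service] = field(default_factory=dict)

    def add(self, service: Service) -> None:
        self.services[service.name] = service


shipping = Interface("FedexShipping")
shipping.add(Service(
    name="TrackPackage",
    url="http://www.fedex.com/cgi-bin/track_it",
    method="GET",
    inputs=["TrackingNum", "DestCountry", "ShipDate"],
    outputs=["disposition", "deliveredOn", "deliveredTo"],
))
print(shipping.services["TrackPackage"].signature())
```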
            WIDL provides the following features: 
                A browser is not required to drive Web applications
                Service definitions are dynamically interpreted and can thus be
                  centrally managed
                Client applications are insulated from changes in service
                  locations and data extraction methods
                Developers are insulated from network programming concerns
                Application resources can be integrated across firewalls and
                  proxies
            WIDL can be used to describe interfaces and services for:
                Static documents (HTML, XML, and plain text files)
                HTML forms
                URL directory structures
            WIDL also has the ability to specify conditions for successful
            processing and error messages to be returned to calling programs.
            Conditions further enable services to be defined that span multiple
            documents.
            Applications of WIDL
            The success of the Web has exposed the advantages of distributed 
            information systems to a global audience. Around the world, IT 
            organizations, regardless of industry, are searching for ways to 
            connect the Internet with new or existing applications, to use Web 
            technology to reduce development, deployment, and maintenance costs.
            Using HTML, XML, and HTTP as corporate standards glue, application 
            integration requires only that target systems be Web-enabled. There 
            are hundreds of products in the market today which Web-enable 
            existing systems, from mainframes to client/server applications. The 
            use of standard Web technologies empowers various IT departments to 
            make independent technology selections. This has the effect of 
            lowering both the technical and "political" barriers that have 
            typically derailed cross-organizational integration projects.
            The use of proprietary middleware infrastructures to integrate 
            applications requires not only that the same software product be 
            purchased by both organizations and successfully installed in both 
            target hardware environments, but also that both target applications 
            be tailored to support the middleware API. This type of investment 
            can be disastrous if one company spends six months designing a 
            CORBA-based business system only to discover that one of their 
            business units or business partners is unable to install CORBA 
            because it conflicts with their existing infrastructure. Conflicts 
            can arise because of hardware or software incompatibilities, or 
            simply because of difficulties in acquiring appropriate development 
            resources.
            A number of analysts have already warned that proprietary ecommerce 
            platforms could lock suppliers into relationships by forcing them to 
            integrate their systems with one infrastructure for 
            business-to-business integration, making it costly for them to 
            switch to or integrate with other partners who have selected 
            alternate ecommerce platforms. Buyer-supplier integration issues 
            involve many-to-many relationships, and demand a standard platform 
            for functional integration and data exchange.
            Here is a brief overview of the types of applications that WIDL 
            enables:
            Manufacturers and distributors
                Access supplier and competitor ecommerce systems automatically
                  to check pricing and availability
                Load product data (spec sheets) from supplier Web sites
                Place orders automatically (e.g., when inventory drops below
                  predetermined levels)
                Integrate package tracking functionality for enhanced customer
                  service
            Human resources
                Automated update of new employee information into multiple
                  internal systems
                Automated aggregation of benefits information from healthcare
                  and insurance providers
            Governments
                Kiosk systems that aggregate data and integrate services across
                  departments or state and local offices
            Shipping and delivery services
                Multi-carrier package tracking and shipment ordering
                Access to currency rates, Customs regulations, etc.
            Shipping companies were early leaders in bringing widely applicable
            functionality to the Web. Web-based package tracking services
            provide important logistics information to organizations large and
            small.
            Many organizations employ people for the sole purpose of manually 
            tracking packages to ensure customer satisfaction and to collect 
            refunds for packages that are delivered late. Integrating package 
            tracking functionality directly into warehouse management and 
            customer service systems is a huge benefit, boosting productivity 
            and enabling more efficient use of resources.
            Using WIDL, the web-based package tracking services of numerous 
            shipping companies can be described as common application 
            interfaces, to be integrated with various internal systems. In 
            almost all cases, programmatic interfaces to different package 
            tracking services are identical, which means that WIDL can impose 
            consistency in the representation of functionality across systems.
            Example 1 illustrates the use of WIDL to define a package tracking 
            service for Federal Express. Note that the WIDL specifies a 
            "Shipping" template. This indicates that there is a general class of 
            shipping services, and that this particular WIDL is one 
            implementation of the shipping interface.
            Example 1 The WIDL Representation of a Package Tracking Service 
<WIDL NAME="FedexShipping" TEMPLATE="Shipping" 
      BASEURL="http://www.fedex.com" VERSION="2.0"> 
 
<SERVICE NAME="TrackPackage" METHOD="GET"  
         URL="/cgi-bin/track_it" 
         INPUT="TrackInput" OUTPUT="TrackOutput" /> 
 
<BINDING NAME="TrackInput" TYPE="INPUT"> 
   <VARIABLE NAME="TrackingNum" TYPE="String" FORMNAME="trk_num" /> 
   <VARIABLE NAME="DestCountry" TYPE="String" FORMNAME="dest_cntry" /> 
   <VARIABLE NAME="ShipDate" TYPE="String" FORMNAME="ship_date" /> 
</BINDING> 
 
<BINDING NAME="TrackOutput" TYPE="OUTPUT"> 
   <CONDITION TYPE="FAILURE" REFERENCE="doc.title[0].text"  
              MATCH="FedEx Warning Form" 
              REASONREF="doc.p[0].text['&.*']" /> 
   <CONDITION TYPE="SUCCESS" REFERENCE="doc.title[0].text"  
              MATCH="FedEx Airbill:*"  
              REASONREF="doc.p[1].value" /> 
   <VARIABLE NAME="disposition" TYPE="String"  
             REFERENCE="doc.h[3].value" MASK="$*" /> 
   <VARIABLE NAME="deliveredOn" TYPE="String"  
             REFERENCE="doc.h[5].value" MASK="%%%$*" /> 
   <VARIABLE NAME="deliveredTo" TYPE="String"  
             REFERENCE="doc.h[7].value" MASK="*:" /> 
</BINDING> 
 
</WIDL> 
            The FedexShipping interface in Example 1 contains one service 
            (TrackPackage) which takes three input parameters (TrackingNum, 
            DestCountry, ShipDate) and returns three output parameters 
            (disposition, deliveredOn, deliveredTo). The WIDL definition 
            describing the TrackPackage service is stored in an ASCII file, 
            which is utilized by client programs at runtime to determine both 
            the location of the service (URL) and the structure of documents 
            that contain the desired data. Client programs access WIDL 
            definitions from local files, naming services such as LDAP, HTTP 
            servers, or other URL access schemes (see Figure 3).
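            As a rough sketch of what "utilized by client programs at runtime"
            might look like, the Python below parses a trimmed copy of Example 1
            with the standard library and recovers the service location and
            parameter names. The function name describe_service is hypothetical,
            the CONDITION elements and MASK attributes are left out for brevity,
            and real WIDL tooling may work quite differently.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Trimmed copy of Example 1 (conditions and masks omitted).
WIDL_TEXT = """\
<WIDL NAME="FedexShipping" BASEURL="http://www.fedex.com" VERSION="2.0">
  <SERVICE NAME="TrackPackage" METHOD="GET" URL="/cgi-bin/track_it"
           INPUT="TrackInput" OUTPUT="TrackOutput" />
  <BINDING NAME="TrackInput" TYPE="INPUT">
    <VARIABLE NAME="TrackingNum" TYPE="String" FORMNAME="trk_num" />
    <VARIABLE NAME="DestCountry" TYPE="String" FORMNAME="dest_cntry" />
    <VARIABLE NAME="ShipDate" TYPE="String" FORMNAME="ship_date" />
  </BINDING>
  <BINDING NAME="TrackOutput" TYPE="OUTPUT">
    <VARIABLE NAME="disposition" TYPE="String" REFERENCE="doc.h[3].value" />
    <VARIABLE NAME="deliveredOn" TYPE="String" REFERENCE="doc.h[5].value" />
    <VARIABLE NAME="deliveredTo" TYPE="String" REFERENCE="doc.h[7].value" />
  </BINDING>
</WIDL>
"""


def describe_service(widl_text, service_name):
    """Look up one service and return its URL, method, and parameter names."""
    root = ET.fromstring(widl_text)
    bindings = {b.get("NAME"): b for b in root.findall("BINDING")}

    for svc in root.findall("SERVICE"):
        if svc.get("NAME") == service_name:
            break
    else:
        raise KeyError(service_name)

    def variable_names(binding_name):
        return [v.get("NAME") for v in bindings[binding_name].findall("VARIABLE")]

    return {
        # A relative service URL is resolved against the interface's BASEURL.
        "url": urljoin(root.get("BASEURL", ""), svc.get("URL")),
        "method": svc.get("METHOD"),
        "inputs": variable_names(svc.get("INPUT")),
        "outputs": variable_names(svc.get("OUTPUT")),
    }


info = describe_service(WIDL_TEXT, "TrackPackage")
print(info["method"], info["url"])
print(info["inputs"], "->", info["outputs"])
```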
            
            Figure 3 WIDL files can be centrally managed with a well-known URL
            or via a directory service such as LDAP.
            Unlike the way CORBA and DCE IDL are normally used, WIDL is
            interpreted at runtime. As a
            result, Service, Condition, and Variable definitions within WIDL 
            files can be administered without requiring modification of client 
            code. This usage model supports application-to-application linkages 
            that are far more robust and maintainable than if they were coded by 
            hand.
            One of WIDL's most significant benefits is its ability to insulate 
            client programs from changes in the format and location of Web 
            documents. As long as the parameters of services do not change, 
            Service URLs, object references in variables, regions, and 
            conditions can all be modified without affecting applications that 
            utilize WIDL to access Web resources.
            There are three models for WIDL management:
                Client side--where WIDL files are colocated with a client
                  program
                Naming service--where WIDL definitions are returned from
                  directory services, e.g., LDAP
                Server side--where WIDL files are referenced by, colocated with,
                  or embedded within Web documents
            WIDL does not require that existing Web resources be modified in any
            way. Flexible management models allow organizations to describe and 
            integrate Web sites that are beyond their control, as well as to 
            provide their business partners with interfaces to services that are 
            controlled. The ability to seamlessly migrate from independent to 
            shared management eases the transition from informal to formal 
            business-to-business integration. 
            Elements of WIDL
            The Web Interface Definition Language (WIDL) consists of six XML 
            tags: 
                <WIDL> defines an interface, which can contain multiple services
                  and bindings
                <SERVICE/> defines a service, which consists of input and output
                  bindings
                <BINDING> defines a binding, which specifies input and output
                  variables, as well as conditions for successful completion of a
                  service
                <VARIABLE/> defines input, output, and internal variables used
                  by a service to submit HTTP requests, and to extract data from
                  HTML/XML documents
                <CONDITION/> defines success and failure conditions for the
                  binding of output variables; specifies error messages to be
                  returned upon service failure; enables alternate binding
                  attempts and the chaining of services
                <REGION/> defines a region within an HTML/XML document; useful
                  for extracting regular result sets which vary in size, such as
                  the output of a search engine, or news stories
            The complete WIDL DTD is included in Appendix A. In the next
            sections the attributes of each element of WIDL are presented and 
            discussed by way of example.
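            To make the role of <CONDITION/> concrete before the attribute
            details, here is a small Python sketch of one plausible way success
            and failure conditions could be evaluated. It is an assumption, not
            the webMethods implementation: the function check_binding is
            hypothetical, MATCH values are treated as simple wildcard patterns,
            object references are looked up in a plain dictionary rather than a
            parsed document, and the page contents are invented.

```python
from fnmatch import fnmatchcase

# Conditions shaped like those in Example 1 (REASONREF simplified).
CONDITIONS = [
    {"TYPE": "FAILURE", "REFERENCE": "doc.title[0].text",
     "MATCH": "FedEx Warning Form", "REASONREF": "doc.p[0].text"},
    {"TYPE": "SUCCESS", "REFERENCE": "doc.title[0].text",
     "MATCH": "FedEx Airbill:*", "REASONREF": "doc.p[1].value"},
]


def check_binding(conditions, lookup):
    """Return (ok, reason). `lookup` maps an object reference to its text."""
    for cond in conditions:
        value = lookup(cond["REFERENCE"]) or ""
        matched = fnmatchcase(value, cond["MATCH"])
        # A FAILURE condition fails the binding when it matches;
        # a SUCCESS condition fails the binding when it does not match.
        if (cond["TYPE"] == "FAILURE") == matched:
            return False, lookup(cond.get("REASONREF"))
    return True, None


# Invented stand-ins for two parsed result pages.
warning_page = {"doc.title[0].text": "FedEx Warning Form",
                "doc.p[0].text": "Invalid airbill number"}
ok_page = {"doc.title[0].text": "FedEx Airbill: 123"}

print(check_binding(CONDITIONS, warning_page.get))
print(check_binding(CONDITIONS, ok_page.get))
```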
            <WIDL>
            <WIDL> is the parent element for the Web Interface Definition 
            Language; it defines an interface. Interfaces are groupings of 
            related services and bindings. The following are attributes of the 
            <WIDL> element:
                NAME
                    Required. Establishes a name for an interface. The interface
                    name is used in conjunction with a service name for naming or
                    directory services.
                VERSION
                    Optional. Specifies the version of WIDL. webMethods first
                    implemented WIDL as HTML extensions. Experience with customers
                    since late 1996 resulted in WIDL 2.0, an application of XML that
                    is capable of automating complex interactions across multiple
                    Web servers.
                TEMPLATE
                    Optional. WIDL enables common interfaces to services provided by
                    multiple sites. Templates allow the specification of interfaces,
                    implementations of which may be available from multiple sources.
                    A shipping template defines a functional interface for shipping
                    services; various implementations can be provided for
                    Federal Express, UPS, and DHL.
                BASEURL
                    Optional. BASEURL is similar to the <BASE HREF=""> statement in
                    HTML. Some of the services within a given WIDL may be hosted
                    from the same Base URL. If BASEURL is defined, the URL for
                    various services can be defined relative to BASEURL. This
                    feature is useful for replicated sites which can be addressed by
                    changing only the BASEURL, instead of the URL for each service.
                OBJMODEL
                    Optional. Specifies an object model to be used for extracting
                    data elements from HTML and XML documents. Object models are the
                    result of parsing HTML or XML documents. The use of object
                    models is central to the functionality of WIDL. Object
                    references are used in <VARIABLE/>, <CONDITION/> and <REGION/>
                    elements. For this reason, the object model will be briefly
                    discussed before proceeding with the description of the element
                    definitions that constitute WIDL.
            Object model
            Many of the features of WIDL require a capability to reliably 
            extract specific data elements from Web documents and map them to 
            output parameters.
            Two candidate technologies for data extraction are pattern matching 
            and parsing. Pattern matching extracts data based on regular 
            expressions, and is well suited to raw text files and poorly 
            constructed HTML documents. There is a lot of bad HTML in the world! 
            Parsing, on the other hand, recovers document structure and exposes 
            relationships between document objects, enabling elements of a 
            document to be accessed with an object model.
            Using an object model, an absolute reference to an element of an 
            HTML document might be specified:
            doc.p[0].text
            This reference would retrieve the text of the first paragraph of a 
            given document.
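            A toy version of such an object model can be built on Python's
            standard HTML parser. The sketch below is an assumption for
            illustration only: the class DocModel and its resolve method are
            hypothetical, only references of the form doc.TAG[N].text are
            handled, and the object models used with WIDL are richer than this.

```python
import re
from html.parser import HTMLParser


class DocModel(HTMLParser):
    """Collects the text of every element, grouped by tag in document order."""

    def __init__(self):
        super().__init__()
        self.elements = {}  # tag name -> list of text buffers
        self._open = []     # stack of (tag, buffer) for open elements

    def handle_starttag(self, tag, attrs):
        buf = []
        self.elements.setdefault(tag, []).append(buf)
        self._open.append((tag, buf))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for _, buf in self._open:
            buf.append(data)

    def resolve(self, reference):
        m = re.fullmatch(r"doc\.(\w+)\[(\d+)\]\.text", reference)
        if not m:
            raise ValueError(f"unsupported reference: {reference}")
        tag, index = m.group(1), int(m.group(2))
        return "".join(self.elements[tag][index]).strip()


def parse(html):
    model = DocModel()
    model.feed(html)
    model.close()
    return model


PAGE = ("<html><head><title>Rates</title></head><body>"
        "<p>First <b>paragraph</b>.</p><p>Second.</p></body></html>")
print(parse(PAGE).resolve("doc.p[0].text"))
```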
            From both a development and an administrative point of view, pattern 
            matching is more labor intensive for establishing and maintaining 
            relationships between data elements and program variables. Regular 
            expressions are difficult to construct and prone to breakage as 
            document structures change. For instance, the addition of formatting 
            tags around data elements in HTML documents could easily derail the 
            search for a pattern. An object model, on the other hand, can see 
            through many such changes.
            Patterns must also be carefully constructed to avoid unintentional 
            matching. In complex cases, patterns must be nested within patterns. 
            The process of mapping patterns to a number of output parameters can 
            easily become unmanageable.
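            The fragility described above is easy to reproduce. In the
            following Python sketch (the page snippets, the pattern, and the
            helper first_paragraph are all invented for illustration), a regular
            expression written against the raw markup stops matching once a
            formatting tag is added, while text taken from the parsed paragraph
            is unchanged.

```python
import re
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collects the text of each <p>, ignoring nested formatting tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


def first_paragraph(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs[0]


# A pattern written against the raw markup of the original page.
PATTERN = re.compile(r"<p>Status: (\w+)</p>")

before = "<p>Status: Delivered</p>"
after = "<p>Status: <b>Delivered</b></p>"  # same data, new formatting tag

print(PATTERN.search(before) is not None)  # pattern matches the old markup
print(PATTERN.search(after) is not None)   # pattern no longer matches
print(first_paragraph(before) == first_paragraph(after))
```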
            It is possible to achieve the best of both worlds by using pattern 
            matching when necessary to match against the attributes of elements 
            accessible via an Object Model. Using a hybrid model of pattern 
            matching within parsed elements provides for the extraction of 
            target information from preformatted text regions or text files.
            A hybrid reference of this kind could, for example, retrieve the 
            text of the first paragraph that contains 'Currency:' within a 
            given document.
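            The hybrid approach can be approximated as follows. This Python
            sketch is not WIDL reference syntax: the helper
            first_matching_paragraph is hypothetical, the wildcard pattern style
            is an assumption, and the page content is invented. It parses the
            document first, then applies a pattern to the text of each parsed
            paragraph rather than to the raw markup.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser


class Paragraphs(HTMLParser):
    """Collects the text of each <p>, ignoring nested formatting tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


def first_matching_paragraph(html, pattern):
    """Return the text of the first paragraph matching a wildcard pattern."""
    parser = Paragraphs()
    parser.feed(html)
    parser.close()
    for text in parser.paragraphs:
        if fnmatchcase(text.strip(), pattern):
            return text.strip()
    return None


page = ("<p>Updated daily.</p>"
        "<p>Currency: <b>USD</b></p>"
        "<p>Currency: DEM</p>")
print(first_matching_paragraph(page, "*Currency:*"))
```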
            Various object models for working with HTML documents have been 
            specified. The W3C has established a working group to define a 
            standard Document Object Model (DOM). The WIDL specification allows 
            for multiple object models. In implementing WIDL, we discovered many 
            functional requirements not currently addressed by existing object 
            models. These requirements will be demonstrated in various examples 
            later in this article.
            We now continue with a discussion of the attributes of the elements 
            of the WIDL.
            <SERVICE/>
            The <SERVICE/> element describes a Web service, such as those 
            provided by CGI scripts, or via NSAPI, ISAPI, or other back-end Web 
            server programs. Services take a set of input parameters, perform 
            some processing, then return a dynamically generated HTML, XML or 
            text document.
            The attributes of the <SERVICE/> element map an abstract service 
            name into a service's actual URL, specify the HTTP method to be used 
            to access the service, and designate "bindings" for input and output 
            parameters.
                NAME
                    Required. Establishes a name for a service. The service name is
                    used in conjunction with an interface for naming or directory
                    services.
                URL
                    Required. Specifies the Uniform Resource Locator for the target
                    document. A service URL can be either a fully qualified URL or a
                    partial URL that is relative to the BASEURL provided as an
                    attribute of the <WIDL> element.
                METHOD
                    Required. Specifies the HTTP method (GET or POST) to be used to
                    access the service.
                INPUT
                    Required. Designates the <BINDING> to be used to define the
                    input parameters for programs that call the service. The
                    specified name must be that of a <BINDING> contained within the
                    same <WIDL> as the service.
                OUTPUT
                    Required. Designates the <BINDING> to be used to define the
                    output parameters for programs that call the service. The
                    specified name must be that of a <BINDING> contained within the
                    same <WIDL> as the service.
                AUTHUSER
                    Optional. Establishes the username for HTTP authentication.
                AUTHPASS
                    Optional. Establishes the password for HTTP authentication.
                TIMEOUT
                    Optional. Amount of time before service times out.
                RETRIES
                    Optional. Number of times to retry the service before failing.
            Typically the username/password combination is set independently of
            service definitions in WIDL. The AUTHUSER and AUTHPASS attributes
            allow a username and password to be defined outside of a calling
            program. This is useful in cases where multiple client programs use
            the same service.
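            As a minimal sketch of how the URL and METHOD attributes might be
            turned into an HTTP request, the Python below resolves a service URL
            against BASEURL and prepares either a GET or a POST. The function
            build_request is hypothetical and nothing is sent over the network.
            AUTHUSER, AUTHPASS, TIMEOUT, and RETRIES are deliberately not
            handled here.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_request(baseurl, service_url, method, form_fields):
    """Prepare (but do not send) the HTTP request for a service call."""
    # A fully qualified service URL wins; a partial one is resolved
    # relative to BASEURL.
    url = urljoin(baseurl, service_url)
    data = urlencode(form_fields)
    if method.upper() == "GET":
        return Request(f"{url}?{data}" if data else url, method="GET")
    return Request(
        url,
        data=data.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


get_req = build_request("http://www.fedex.com", "/cgi-bin/track_it",
                        "GET", {"trk_num": "123456"})
post_req = build_request("http://www.fedex.com", "/cgi-bin/track_it",
                         "POST", {"trk_num": "123456"})
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```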
            <BINDING>
            The <BINDING> element defines input and output variables for a 
            service. Input bindings describe the data provided to a Web 
            resource, and are analogous to the input fields in an HTML form. For 
            a static HTML document no input variables are required. Output 
            bindings describe which data elements are to be mapped from the 
            output document returned as a result of accessing the Web resource 
            with the given input variables. In most cases an output binding will 
            map only a subset of the available elements in the output document.
                NAME
                    Required. Identifies the binding for reference by service
                    definitions and other binding definitions.
                TYPE
                    Required. Specifies whether a binding defines input or output
                    parameters.
            <VARIABLE/>
            The <VARIABLE/> element is used to describe both input and output 
            binding parameters; different attributes are used depending on the 
            type of parameter being described.
            Common attributes are:
                NAME
                    Required. Identifies the variable to calling programs.
                VALUE
                    Optional. Designates a value to be assigned to the variable in
                    HTTP transactions. For input variables this has the effect of
                    rendering the variable invisible to calling programs; i.e., the
                    specified value is submitted to the Web service without
                    requiring an input from calling programs. For output variables
                    this has the effect of hard-coding the value returned when the
                    service is invoked.
                USAGE
                    Optional. The default usage of variables is for specification of
                    input and output parameters. Variables can also be used
                    internally within WIDL, as well as to pass header information
                    (e.g., USER-AGENT or REFERER) in an HTTP request. The USAGE
                    attribute will be explored in Examples 2 and 3, which follow
                    this <VARIABLE/> element overview.
                TYPE
                    Required. Specifies both the data type and dimension of the
                    variable.
            The following attributes are specific to input variables:
                FORMNAME
                    Optional. Specifies the variable name to be submitted via GET or
                    POST methods. Obscure back-end variables can be given names that
                    are more meaningful in the context of the service described by
                    WIDL. Used in conjunction with WIDL Templates, FORMNAME permits
                    the mapping of a single variable name across multiple service
                    implementations. In the package tracking service in Example 1,
                    the FORMNAME differs from the variable name. It is also possible
                    to set FORMNAME="" to pass only the variable's value to the
                    back-end program.
                OPTIONS
                    Optional. Captures the options of list boxes, check boxes, and
                    radio buttons. Useful for validating inputs prior to submitting
                    input parameters to a service and for transforming input
                    criteria into formats acceptable to back-end programs. For
                    example, an options list could be used to translate a meaningful
                    input of "full" to the "f" acceptable to a back-end program.
                The following attributes are specific to output variables:
            REFERENCE
            Optional. Specifies an object reference to extract data from the 
                HTML, XML, or text document returned as the result of a service 
                invocation.
                MASK
            Optional. Masks permit the use of pattern matching and token 
                collecting to easily strip away unwanted labels and other text 
                surrounding target data items.
                NULLOK
            Optional. Overrides the implicit condition that all output 
                variables return a non-null value.
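
            To make the FORMNAME and OPTIONS attributes concrete, the following 
            Java fragment is a minimal sketch of how a processor might turn 
            caller-supplied inputs into a GET query string. The InputVariable 
            and QueryBuilder classes are hypothetical and are not part of any 
            webMethods API; representing OPTIONS as a simple lookup table, and 
            treating an empty FORMNAME as "send the value only," are assumptions 
            made for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical model of one input <VARIABLE/>; not part of any published WIDL API. */
final class InputVariable {
    final String name;                  // NAME: what calling programs see
    final String formName;              // FORMNAME: what the back end expects ("" = value only)
    final Map<String, String> options;  // OPTIONS: caller value -> back-end value (may be empty)

    InputVariable(String name, String formName, Map<String, String> options) {
        this.name = name;
        this.formName = formName;
        this.options = options;
    }
}

final class QueryBuilder {
    /** Builds a GET query string from caller-supplied values keyed by NAME. */
    static String build(List<InputVariable> vars, Map<String, String> callerValues) {
        List<String> parts = new ArrayList<>();
        for (InputVariable v : vars) {
            String value = callerValues.get(v.name);
            if (value == null) {
                throw new IllegalArgumentException("missing input: " + v.name);
            }
            if (!v.options.isEmpty()) {
                // Validate against the option list and translate, e.g. "full" -> "f".
                String translated = v.options.get(value);
                if (translated == null) {
                    throw new IllegalArgumentException("invalid value for " + v.name + ": " + value);
                }
                value = translated;
            }
            String encoded = URLEncoder.encode(value, StandardCharsets.UTF_8);
            // An empty FORMNAME submits only the value, with no "name=" prefix.
            parts.add(v.formName.isEmpty() ? encoded : v.formName + "=" + encoded);
        }
        return String.join("&", parts);
    }
}
```

            With a variable named trackingNumber whose FORMNAME is trk_num, and 
            a variable named detail whose FORMNAME is d and whose options map 
            "full" to "f", the inputs 12345678 and "full" produce the query 
            string trk_num=12345678&d=f. An input that is not in the option list 
            is rejected before any HTTP request is made.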
                Apart from the "default" behavior of variables defined in input 
            bindings, there are two other usage models supported by WIDL: 
            "internal" and "header." The USAGE attribute can define service 
            inputs in place of or in addition to those required by a Web 
            service's HTML form.
            Internal variables enable variable substitution within input and 
            output bindings. For instance, using internal variables, a portion 
            of a service's URL or a pattern for matching within an object 
            reference can be specified as a variable that is part of an input 
            binding.
            Header variables allow HTTP header information to be included as 
            part of a service request. This is useful in many situations, 
            including the passing of referrer information where required by 
            back-end systems.
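
            The three usage models can be pictured as a routing step that runs 
            before a request is sent. The Java fragment below is a sketch under 
            stated assumptions: the UsageRouter class, its Usage enum, and the 
            Request holder are hypothetical names invented here, and a real 
            processor would also apply FORMNAME and VALUE handling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UsageRouter {
    /** Mirrors the three USAGE models described in the text. */
    enum Usage { DEFAULT, INTERNAL, HEADER }

    /** Hypothetical holder for the three places an input value can end up. */
    static final class Request {
        final Map<String, String> formFields = new LinkedHashMap<>();     // submitted via GET or POST
        final Map<String, String> headers = new LinkedHashMap<>();        // sent as HTTP headers
        final Map<String, String> substitutions = new LinkedHashMap<>();  // used for %name% substitution
    }

    /** Routes each input value according to the USAGE declared for its variable. */
    static Request route(Map<String, Usage> usages, Map<String, String> values) {
        Request r = new Request();
        for (Map.Entry<String, String> e : values.entrySet()) {
            // Variables without an explicit USAGE behave as ordinary parameters.
            Usage u = usages.getOrDefault(e.getKey(), Usage.DEFAULT);
            switch (u) {
                case HEADER:
                    r.headers.put(e.getKey(), e.getValue());
                    break;
                case INTERNAL:
                    r.substitutions.put(e.getKey(), e.getValue());
                    break;
                default:
                    r.formFields.put(e.getKey(), e.getValue());
                    break;
            }
        }
        return r;
    }
}
```

            Only the values in formFields would be encoded into the query string 
            or POST body; header and internal values never appear there.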
            In Example 2, an auto loan service is defined for a site that uses a 
            directory structure to organize loan information for various states. 
            Rather than using CGI scripts to access a database of high, low, and 
            average loan rates, unique URLs which contain a state abbreviation 
            as part of target document names are linked from a pick list. The 
            use of internal variables enables the parameterization of a portion 
            of the URL. In this fashion, WIDL is able to define an input binding 
            even though no HTML forms are present to query the user for 
            information. The input binding specifies a variable "state" that is 
            referenced in the URL attribute of the service definition as 
            %state%. At runtime the value passed into the "state" variable is 
            used to complete the service URL.
            Example 2 Using Internal Variables to Parameterize Directory 
            Structures
            
            <WIDL NAME="autoLoan" VERSION="2.0">   
 
<SERVICE NAME="AutoLoan" METHOD="GET"   
         URL="http://www.bankrate.com/autobytel/abt%state%a.htm"   
         INPUT="AutoLoanInput" OUTPUT="AutoLoanOutput" />   
 
<BINDING NAME="AutoLoanInput" TYPE="INPUT">   
   <VARIABLE NAME="state" TYPE="String" FORMNAME="state" USAGE="INTERNAL" />   
</BINDING>   
 
<BINDING NAME="AutoLoanOutput" TYPE="OUTPUT">   
   <CONDITION TYPE="Failure" REASONTEXT="State not found" />   
   <VARIABLE NAME="state" TYPE="String"    
             REFERENCE="doc.table[4].tr[1].th[0].text" />   
   <VARIABLE NAME="avgNew" TYPE="String"   
             REFERENCE="doc.table[4].tr[2].td[1].text" />   
   <VARIABLE NAME="highNew" TYPE="String"   
             REFERENCE="doc.table[4].tr[2].td[2].text" />   
   <VARIABLE NAME="lowNew" TYPE="String"   
             REFERENCE="doc.table[4].tr[2].td[3].text" />   
   <VARIABLE NAME="avgUsed" TYPE="String"   
             REFERENCE="doc.table[4].tr[3].td[1].text" />   
   <VARIABLE NAME="highUsed" TYPE="String"   
             REFERENCE="doc.table[4].tr[3].td[2].text" />   
   <VARIABLE NAME="lowUsed" TYPE="String"   
             REFERENCE="doc.table[4].tr[3].td[3].text" />   
</BINDING>   
 
</WIDL>   
            Because the AutoLoan service uses a variable to complete the URL to 
            access a static document, an invalid input parameter results in an 
            invalid URL. The <CONDITION/> statement in the output binding traps 
            the document-not-found condition and returns a sensible error 
            message to client programs.
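
            The substitution step itself is simple. The following Java fragment 
            is a minimal sketch of how %name% placeholders in a service URL 
            could be filled from internal variables; the UrlTemplate class is 
            hypothetical, and the rule that placeholder names begin with a 
            letter is an assumption made so that ordinary URL escapes such as 
            %20 are left alone.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlTemplate {
    // A placeholder is a name between two percent signs, e.g. %state%.
    private static final Pattern PLACEHOLDER = Pattern.compile("%([A-Za-z][A-Za-z0-9_]*)%");

    /** Replaces every %name% in the template with the matching internal variable. */
    static String expand(String template, Map<String, String> internalVars) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = internalVars.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no value for %" + m.group(1) + "%");
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

            Expanding the AutoLoan URL with state set to "va" yields 
            http://www.bankrate.com/autobytel/abtvaa.htm. An unknown state 
            still produces a syntactically valid URL, which is why the failure 
            has to be caught when the document is retrieved rather than when the 
            URL is built.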
            Internal variables can also be used within object references that 
            use pattern matching to index into the object tree.
            Example 3 uses the currency exchange service provided by the Federal 
            Reserve Bank to illustrate the use of internal variables to 
            interactively query a single static document. 
            Example 3 Using Internal Variables to Input Criteria in Object 
            References
            
            <WIDL NAME="FederalReserve" TEMPLATE="Currency"   
      BASEURL="http://www.ny.frb.org/" VERSION="2.0">   
 
<SERVICE NAME="ExchangeRate" METHOD="GET"   
         URL="/pihome/mktrates/forex12.shtml"   
         INPUT="currencyInput" OUTPUT="currencyOutput" />   
 
<BINDING NAME="currencyInput" TYPE="INPUT">   
   <VARIABLE NAME="Currency" TYPE="String"    
             FORMNAME="CURRENCY" USAGE="INTERNAL" />   
</BINDING>   
 
<BINDING NAME="currencyOutput" TYPE="OUTPUT">   
   <CONDITION TYPE="FAILURE" REASONTEXT="Currency not found" />   
   <VARIABLE NAME="rate" TYPE="String"   
             REFERENCE="doc.pre[0].line['*%Currency%*'].text[53-65]" />   
</BINDING>   
 
</WIDL>   
            In this example currency rates for a number of countries are 
            provided in a single document. The object reference for the 'rate' 
            variable in the output binding uses an internal variable 'Currency' 
            as part of the pattern that is matched to discover the current 
            exchange rate.
            The object reference used in this example also demonstrates two 
            additional text manipulation features of the object model developed 
            by webMethods. The .line[] construct allows access to individual 
            lines of both preformatted text and text that has been formatted 
            with the <br> line-break element. This greatly simplifies pattern 
            matching expressions within object references.
            The Federal Reserve Currency Exchange service returns rate 
            information in a column from character position 53 to character 
            position 65. This range of characters is specified by qualifying the 
            .text[53-65] attribute of the line matching the input criteria.
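
            The combination of line matching and character ranges can be 
            sketched in a few lines of Java. This is an illustration only: the 
            LineReference class is hypothetical, and it assumes that '*' matches 
            any run of characters, that '?' matches any single character, that 
            character positions are zero-based and inclusive, and that 
            surrounding whitespace is trimmed from the result.

```java
import java.util.Optional;
import java.util.regex.Pattern;

final class LineReference {
    /** Converts a wildcard pattern ('*' = any run, '?' = any one character) to a regex. */
    static Pattern wildcard(String glob) {
        StringBuilder re = new StringBuilder();
        for (char ch : glob.toCharArray()) {
            if (ch == '*') {
                re.append(".*");
            } else if (ch == '?') {
                re.append('.');
            } else {
                re.append(Pattern.quote(String.valueOf(ch)));
            }
        }
        return Pattern.compile(re.toString());
    }

    /** Returns characters from..to (assumed zero-based, inclusive) of the first matching line. */
    static Optional<String> extract(String preText, String glob, int from, int to) {
        Pattern p = wildcard(glob);
        for (String line : preText.split("\\R")) {
            // The whole line must match, so partial matches need leading and trailing '*'.
            if (p.matcher(line).matches()) {
                int end = Math.min(to + 1, line.length());
                return from < end
                        ? Optional.of(line.substring(from, end).trim())
                        : Optional.empty();
            }
        }
        return Optional.empty(); // no line matched: the binding would fail
    }
}
```

            A call such as LineReference.extract(text, "*Japan*", 53, 65) plays 
            the role of doc.pre[0].line['*Japan*'].text[53-65]; an empty result 
            corresponds to the "Currency not found" failure condition.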
            <CONDITION/>
            The <CONDITION/> element is used in output bindings to specify 
            success and failure conditions for the extraction of data to be 
            returned to calling programs. Conditions enable branching logic 
            within service definitions; they are used to attempt alternate 
            bindings when initial bindings fail and to initiate service chains, 
            whereby the output variables from one service are passed into the 
            input bindings of a second service. Conditions also define error 
            messages returned to calling programs when services fail.
            TYPE
            Required. Specifies whether a condition is checking for the 
                "Success" or the "Failure" of a binding attempt.
                Any variable that returns a NULL value will cause the entire 
                binding to fail, unless the NULLOK attribute of that variable 
                has been set to true. Conditions can catch the success or 
                failure of either a specific object reference or of an entire 
                binding. In the case where a condition initiates a service 
                chain, it is important that all variables bind properly.
                REFERENCE
            Optional. Specifies an object reference which extracts data from 
                the HTML or XML document returned as the result of a service 
                invocation. The REFERENCE attribute for conditions is equivalent 
                to the REFERENCE attribute used in variable definitions.
                MATCH
            Required. Specifies a text pattern that will be compared with 
                the object property referenced by the REFERENCE attribute.
                REBIND
            Optional. Specifies an alternate output binding. Typically a 
                failure condition indicates that the document returned cannot be 
                bound properly. REBIND redirects the binding attempt. This is 
                useful in situations where the documents returned by a service 
                are dependent upon the input criteria that were submitted. For 
                example, a retail Web site may return a different document 
                structure for an SKU depending on whether the item requested is 
                a shirt, a tie, or trousers. The use of REBIND allows a 
                condition to determine the appropriate binding for extracting 
                the desired data.
                SERVICE
            Optional. Specifies a service to invoke with the results of an 
                output binding. Aside from the obvious benefit of chaining 
                services to further automate the tasks that can be encapsulated 
                for client programs, there are many cases when target documents 
                can only be retrieved after visiting several Web pages in 
                succession. In some instances cookies are issued by an entry 
                page that must be visited prior to interacting with HTML forms; 
                in others, URLs are dynamically generated from databases for 
                specific user identities.
                REASONTEXT
            Optional. The text to be returned as an error message when a 
                service fails.
                REASONREF
            Optional. Reference to an element's attribute to be returned as 
                an error message when a service fails.
                WAIT
            Optional. Amount of time to wait before re-trying retrieval of a 
                document after a server has returned a 'service busy' error.
                RETRIES
            Optional. Number of times to retry the service before failing.
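
            The WAIT and RETRIES attributes describe an ordinary retry loop. 
            The Java fragment below is a sketch of that loop, not the webMethods 
            implementation: the Retry class and ServiceBusyException are 
            hypothetical, WAIT is assumed to be expressed in milliseconds, and 
            RETRIES is assumed to count attempts made after the first one.

```java
import java.util.concurrent.Callable;

final class Retry {
    /** Thrown by a fetch attempt when the server reports that it is busy (hypothetical). */
    static final class ServiceBusyException extends Exception {
        ServiceBusyException(String message) {
            super(message);
        }
    }

    /**
     * Runs the attempt, retrying up to 'retries' more times when the service is busy
     * and sleeping 'waitMillis' between attempts. Other failures propagate immediately.
     */
    static <T> T call(Callable<T> attempt, int retries, long waitMillis) throws Exception {
        for (int tries = 0; ; tries++) {
            try {
                return attempt.call();
            } catch (ServiceBusyException busy) {
                if (tries >= retries) {
                    throw busy; // retries exhausted: the service fails
                }
                Thread.sleep(waitMillis);
            }
        }
    }
}
```

            Only the 'service busy' case is retried; any other exception is 
            treated as a genuine failure and reported at once.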
                Example 4 illustrates the use of conditions to specify alternate 
            bindings. Alternate bindings can be used when documents returned by 
            services are dependent upon the inputs submitted to the service. In 
            some rare cases, such as the StockMarketInfo service defined in this 
            example, a service occasionally returns different document formats 
            for no apparent reason. Conditions and rebinding handle any such 
            situations.
            Example 4 Conditions Initiate Alternate Attempts for Extracting 
            Output Values
            
            <WIDL NAME="Yahoo" VERSION="2.0">   
 
<SERVICE NAME="StockMarketInfo" METHOD="GET"   
         URL="http://quote.yahoo.com/" OUTPUT="marketOut" />   
 
<BINDING NAME="marketOut" TYPE="Output">   
   <CONDITION TYPE="Failure" REBIND="marketOut2" />   
   <VARIABLE TYPE="String[][]" NAME="info"    
             REFERENCE="doc.table[0].tr[0].td[].text" />   
   <VARIABLE TYPE="String[]" NAME="links"    
             REFERENCE="doc.table[0].tr[0].a[].href" />   
</BINDING>   
 
<BINDING NAME="marketOut2" TYPE="Output">   
   <VARIABLE TYPE="String[][]" NAME="info"    
             REFERENCE="doc.table[1].tr[0].td[].td[].text" />   
   <VARIABLE TYPE="String[]" NAME="links"    
             REFERENCE="doc.table[1].tr[0].a[].href" />   
</BINDING>   
 
</WIDL> 
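
            The rebinding behavior can be sketched as a loop that follows REBIND 
            targets until one binding succeeds. The Java fragment below is 
            illustrative only: the Rebinder and Binding classes are 
            hypothetical, a binding is modeled as a function that returns an 
            empty Optional on failure, and the guard against circular REBIND 
            chains is an assumption rather than documented WIDL behavior.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

final class Rebinder {
    /** Hypothetical output binding: an extractor plus the REBIND target used on failure. */
    static final class Binding {
        final Function<String, Optional<Map<String, String>>> extractor;
        final String rebind; // name of the alternate binding, or null if there is none

        Binding(Function<String, Optional<Map<String, String>>> extractor, String rebind) {
            this.extractor = extractor;
            this.rebind = rebind;
        }
    }

    /** Tries the starting binding, then each REBIND target in turn, until one succeeds. */
    static Map<String, String> bind(Map<String, Binding> bindings, String start, String document) {
        Set<String> tried = new HashSet<>();
        String name = start;
        // tried.add returns false for a binding already attempted, which ends a cycle.
        while (name != null && tried.add(name)) {
            Binding b = bindings.get(name);
            if (b == null) {
                throw new IllegalArgumentException("unknown binding: " + name);
            }
            Optional<Map<String, String>> result = b.extractor.apply(document);
            if (result.isPresent()) {
                return result.get();
            }
            name = b.rebind; // failure condition: redirect to the alternate binding
        }
        throw new IllegalStateException("no binding matched the document");
    }
}
```

            In terms of Example 4, marketOut would be registered with 
            marketOut2 as its rebind target, and marketOut2 with none.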
            Example 5 illustrates the use of conditions to specify a service 
            chain. Service chains pass the name-value pairs of an output binding 
            into the input binding of the service specified by a <CONDITION/> 
            statement. Any name-value pairs matching the variables of the 
            chained service's input binding will be used as input parameters. In 
            this example, the ProductSearch service returns a URL when it 
            successfully finds a product matching the search criteria. The 
            success condition on the productSearchOutput binding causes the 
            ExtractPrices service to be called. Because the output variable of 
            ProductSearch matches the input variable of ExtractPrices, its 
            value is passed from one service into the other.
            Example 5 Conditions Initiate Service Chains
            
            <WIDL NAME="EddieBauer" VERSION="2.0">   
 
<SERVICE NAME="ProductSearch" METHOD="GET"   
         URL="http://www.ebauer.com/eb/ShopEB/prod_search_results.asp"   
         INPUT="productSearchInput" OUTPUT="productSearchOutput" />   
 
<BINDING NAME="productSearchInput" TYPE="INPUT">   
   <VARIABLE NAME="searchstring" TYPE="String" FORMNAME="searchstring" />   
</BINDING>   
 
<BINDING NAME="productSearchOutput" TYPE="OUTPUT">   
   <CONDITION TYPE="Failure" REFERENCE="doc.p['*Sorry*'].text"    
              MATCH="*Sorry*" REASONREF="doc.p['*Sorry*'].text" />   
   <CONDITION TYPE="Success" SERVICE="ExtractPrices" />   
   <VARIABLE NAME="productURL" TYPE="String"    
             REFERENCE="doc.table[0].tr[1].td[3].a[0].href" />   
</BINDING>   
 
<SERVICE NAME="ExtractPrices" METHOD="GET" URL="%productURL%"   
         INPUT="ExtractPricesInput" OUTPUT="ExtractPricesOutput" />   
 
<BINDING NAME="ExtractPricesInput" TYPE="INPUT">   
   <VARIABLE NAME="productURL" TYPE="String" USAGE="INTERNAL" />   
</BINDING>   
 
<BINDING NAME="ExtractPricesOutput" TYPE="OUTPUT">   
   <VARIABLE NAME="Price" TYPE="String"   
             REFERENCE="doc.table[1].strong[0].value['*$$']" />   
</BINDING>   
 
</WIDL>   
            It is important to note that the ExtractPrices service can be called 
            independently of the ProductSearch service, and that the 
            ExtractPrices service specifies productURL as an internal variable. 
            The output variables from the ProductSearch service are not 
            available to the ExtractPrices service except where they have been 
            passed via an input binding. 
            Service chains make it possible to interact with "shopping cart" 
            services, where multiple service calls are required to add items, 
            followed by a service call to submit an order.
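
            The hand-off between chained services amounts to filtering one set 
            of name-value pairs by the names another binding declares. The Java 
            fragment below is a minimal sketch of that step; the ServiceChain 
            class is hypothetical, and exact, case-sensitive name matching is an 
            assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class ServiceChain {
    /**
     * Selects the output name-value pairs whose names match variables declared in the
     * chained service's input binding. Outputs with no matching input are dropped.
     */
    static Map<String, String> chainInputs(Map<String, String> outputs, Set<String> inputNames) {
        Map<String, String> inputs = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : outputs.entrySet()) {
            if (inputNames.contains(e.getKey())) {
                inputs.put(e.getKey(), e.getValue());
            }
        }
        return inputs;
    }
}
```

            Because only declared inputs are passed along, a chained service 
            sees exactly what it would see if a client program had called it 
            directly with the same values.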
            <REGION/>
            The <REGION/> element is used in output bindings to define targeted 
            subregions of a document. This is useful in services that return 
            variable arrays of information in structures that can be located 
            between well known elements of a page. 
            Regions are critical for poorly designed documents where it is 
            otherwise impossible to differentiate between desired data elements 
            (for instance, story links on a news page) and elements that also 
            match the search criteria.
            NAME
            Required. Specifies the name for a region. This name can then be 
                used as the root of an object reference. For instance, a region 
                named foo can be used in object references such as:
                foo.p[0].text
            START
            Required. An object reference that determines the beginning of a 
                region.
                END
            Required. An object reference that determines the end of a 
                region.
                Example 6 demonstrates the use of regions in a news service, where 
            the number of news stories varies day to day. Regions permit the 
            extraction of data elements relative to other features of a 
            document. The tops region begins with a text object that matches the 
            pattern 'Last?Updated*' and ends with an object that matches the 
            pattern 'For?more*'.
            Example 6 Regions Permit the Extraction of Data Elements
            
            <WIDL NAME="News" VERSION="2.0">   
 
<SERVICE NAME="Techweb" METHOD="GET"   
         URL="http://www.techweb.com/" OUTPUT="techwebOut" />   
 
<BINDING NAME="techwebOut" TYPE="OUTPUT">   
   <REGION NAME="tops" START="doc.font['Last?Updated*']"    
                       END="doc.b['For?more*']" />   
   <VARIABLE NAME="service" TYPE="String" VALUE="TECHWEB Top Stories" />   
   <VARIABLE NAME="url" TYPE="String" REFERENCE="doc.url" />   
   <VARIABLE NAME="stories" TYPE="String[]" REFERENCE="tops.a[].text" />   
   <VARIABLE NAME="links" TYPE="String[]" REFERENCE="tops.a[].href" />   
</BINDING>   
</BINDING>   
 
</WIDL>   
            Variable references into the tops region collect arrays of anchors 
            and anchor text, regardless of the fact that the sizes of the arrays 
            change throughout the day. The object references within tops are 
            vastly simplified by the processing already provided by the region 
            definition:
            tops.a[].text
tops.a[].href
            It is also worth noting that the news service in Example 6 has no 
            input binding. Input bindings are not required for service 
            definitions. 
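
            The effect of a region can be sketched by treating a document as a 
            list of elements in document order. The Java fragment below is an 
            illustration under stated assumptions: the Regions and Element 
            classes are hypothetical, the document is flattened rather than 
            held as a tree, and the START and END elements themselves are 
            assumed to lie outside the region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class Regions {
    /** Hypothetical flattened view of one document element, in document order. */
    static final class Element {
        final String tag;
        final String text;

        Element(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }
    }

    /** Returns the elements strictly between the first START match and the next END match. */
    static List<Element> region(List<Element> doc, Predicate<Element> start, Predicate<Element> end) {
        List<Element> inside = new ArrayList<>();
        boolean open = false;
        for (Element e : doc) {
            if (!open) {
                open = start.test(e);
            } else if (end.test(e)) {
                return inside;
            } else {
                inside.add(e);
            }
        }
        return new ArrayList<>(); // START or END never matched: no region
    }

    /** Equivalent in spirit to tops.a[].text: the text of every anchor in the region. */
    static List<String> anchorTexts(List<Element> region) {
        List<String> texts = new ArrayList<>();
        for (Element e : region) {
            if (e.tag.equals("a")) {
                texts.add(e.text);
            }
        }
        return texts;
    }
}
```

            Anchors that appear before the START element or after the END 
            element, such as navigation links, are never considered, however 
            many stories the region holds on a given day.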
            Object References
            The default object model used by WIDL provides object references for 
            accessing elements and properties of HTML and XML documents. This 
            model is based on the JavaScript page object model, but without the 
            JavaScript method definitions.
            Using the default object model, all elements of HTML and XML 
            documents can be addressed in the following ways:
            By name, if the target element has a non-empty name attribute. 
                For example, the value of an HTML element <a name="foo"> can be 
                referenced:
                doc.foo.value
            By absolute indexing, where each array of elements has a zero-based 
            integer index, e.g.:
            doc.headings[0].text       
doc.p[1].text
            By relative indexing, which directs the binding algorithm to search 
            the VALUE attributes of each element in the array, until a match is 
            found. The match must be complete, which requires the use of 
            wildcard metacharacters for partial string matches. Note that the 
            search will return the first matching element, if any:
            doc.tr['*pattern*'].td[1].text
            By region indexing, which directs the binding algorithm to search 
            only within a region of a document:
            myregion.a[2].href
            By attribute matching, which directs the binding algorithm to search 
            an object's attributes until a match is found. Attribute matching is 
            done with parentheses instead of square brackets:
            doc.a(name='foo').href
            The following properties are available for all objects:
            .text/.txt
            Returns the text of a container
                .value/.val
            Returns the value of a container
                .source/.src
            Returns the source of a container 
                .index/.idx
            Returns the index of a container 
                .reference/.ref
            Returns the fully qualified object reference
                Attributes of HTML containers take precedence over properties, which 
            have alternate accessors.
            .text/.txt and .value/.val are equivalent except when a document 
            element has an identically named attribute. 
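
            The addressing forms listed above share a regular shape, which the 
            following Java fragment makes explicit by splitting an object 
            reference into its steps. This is a sketch, not the webMethods 
            parser: the ObjectReference class and the textual description it 
            produces for each step are hypothetical, and the grammar is inferred 
            only from the references that appear in this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ObjectReference {
    // One step: a name, optionally followed by [n], [m-n], [], ['pattern'] or (attr='value').
    private static final Pattern STEP = Pattern.compile(
            "([A-Za-z_][A-Za-z0-9_]*)"
            + "(?:\\[(\\d+-\\d+|\\d*|'[^']*')\\]"
            + "|\\(([A-Za-z_][A-Za-z0-9_-]*)='([^']*)'\\))?");

    /** Splits a reference such as doc.tr['*x*'].td[1].text into described steps. */
    static List<String> parse(String reference) {
        List<String> steps = new ArrayList<>();
        Matcher m = STEP.matcher(reference);
        int pos = 0;
        while (true) {
            m.region(pos, reference.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("bad reference at offset " + pos + ": " + reference);
            }
            String name = m.group(1);
            String index = m.group(2);
            if (m.group(3) != null) {
                steps.add(name + " where @" + m.group(3) + " = " + m.group(4)); // attribute matching
            } else if (index == null) {
                steps.add(name);                                                // by name, or a property
            } else if (index.isEmpty()) {
                steps.add(name + " (all)");                                     // whole array
            } else if (index.startsWith("'")) {
                steps.add(name + " matching " + index.substring(1, index.length() - 1)); // relative
            } else if (index.contains("-")) {
                steps.add(name + " chars " + index);                            // character range
            } else {
                steps.add(name + " #" + index);                                 // absolute index
            }
            pos = m.end();
            if (pos == reference.length()) {
                return steps;
            }
            if (reference.charAt(pos) != '.') {
                throw new IllegalArgumentException("expected '.' at offset " + pos + ": " + reference);
            }
            pos++;
        }
    }
}
```

            Scanning step by step, rather than splitting on every period, keeps 
            periods inside quoted patterns from being mistaken for separators.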
            Putting WIDL to Work
            WIDL files can be hand-coded or developed interactively with command 
            line or graphical tools, which provide aids for determining object 
            references used in <VARIABLE/>, <CONDITION/>, and <REGION/> 
            declarations. 
            Once a WIDL file has been created, its use depends upon the 
            implementation of products that can process and understand WIDL 
            services. A Web integration platform based on WIDL needs to provide:
            A mechanism for retrieving WIDL files, either from a local file 
                system, a directory service such as LDAP, or a URL
                An HTML and XML parser, and text pattern matching capabilities, 
                providing an object model for accessing elements of Web 
                documents 
                HTTP and HTTPS support, to initiate requests and receive Web 
                documents 
                Apart from these requirements, a WIDL processor could be delivered 
            as a Java class or a Windows DLL, for integration directly with 
            client applications, or as a standalone server with middleware 
            interfaces, allowing thin-client access to Web automation 
            functionality.
            Generating Code
            The primary purpose of WIDL is integration with corporate business 
            applications. In much the same way that DCE or CORBA IDL is used to 
            generate code fragments, or "stubs," to be included in development 
            projects, WIDL provides the necessary ingredients for generating 
            Java, JavaScript, C/C++, and even Visual Basic client code.
            webMethods has developed a suite of Web Automation products for the 
            development and management of WIDL files, as well as the generation 
            of client code from WIDL files. Client stubs, which we 
            affectionately call "Weblets," present developers with local 
            function calls, and encapsulate all the methods required to invoke a 
            service that has been defined by a WIDL file. 
            Example 7 Java Stub
            
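            import java.io.IOException;   
import watt.api.*;   
 
public class TrackPackage extends Object   
{   
      // Input parameters of the TrackPackage service.   
      public String TrackingNum;   
      // Assumed to be fields of the stub; the listing does not show   
      // where they are set, so they are submitted as null here.   
      public String DestCountry;   
      public String ShipDate;   
 
      // Output variables bound by the service.   
      public String disposition;   
      public String deliveredOn;   
      public String deliveredTo;   
 
      public TrackPackage(String TrackingNum)   
            throws IOException, WattException, WattServiceException   
      {   
              this.TrackingNum = TrackingNum;   
 
              // Name-value pairs submitted to the service's input binding.   
              String args[][] = {   
              {"TrackingNum", TrackingNum},   
              {"DestCountry", DestCountry},   
              {"ShipDate", ShipDate}   
              };   
 
              // Handle to the Web automation runtime.   
              Context c = new Context();   
 
              c.loadDocument("Shipping.widl");   
              Result r = c.invokeService("FedexShipping",    
                                           "TrackPackage", args);   
 
              disposition = r.getVariable("disposition");   
              deliveredOn = r.getVariable("deliveredOn");   
              deliveredTo = r.getVariable("deliveredTo");   
      }   
}   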
            Example 7 features a Java class generated from the package tracking 
            WIDL presented earlier in Example 1. This class demonstrates the 
            following methods that are part of the API that webMethods has 
            developed for processing WIDL: 
            Context
                loadDocument
                invokeService
                getVariable
                After declaring the variables that will be used by the 
            TrackPackage class, a handle c to a new Context of the webMethods 
            Web automation runtime is created. All API calls are then made 
            against this handle.
            loadDocument loads and parses the specified WIDL file, in this case 
            Shipping.widl. Loading the WIDL defines the services of the Shipping 
            interface to the runtime. invokeService actually submits the input 
            parameters to the TrackPackage service, which makes the appropriate 
            HTTP request and returns either a result set which contains the 
            bound output variables or an error message specified by a 
            <CONDITION/> statement within the service's output binding. 
            getVariable is then used to extract the values of the output 
            variables and to assign them to class variables. 
            Within the Java application, the package tracking service looks like 
            a simple instantiation of the TrackPackage class:
            TrackPackage p = new TrackPackage("12345678");
            In short, an application makes a call to a local function that has 
            been generated by WIDL. The local function encapsulates the API 
            calls to the WIDL processor. The WIDL processor:
            Loads the WIDL file from a local or remote file system
                Passes the function's input parameters as an HTTP request 
                Parses the retrieved document to extract target data items
                Executes any conditional logic for error checking or service 
                chaining 
                Returns the extracted data into the output parameters of the 
                calling function
                Generated Java classes can be incorporated in standalone Java 
            applications, Java Applets, JavaScript routines, or server-side Java 
            "Servlets." Generated C/C++ encapsulating Web services can be 
            deployed as DLLs, shared libraries, or standalone executables. 
            webMethods' implementation, the Web Automation Platform, provides 
            Java classes, a shared library, a Windows DLL, and an ActiveX 
            control to support Visual Basic modules which can be embedded in 
            spreadsheets and other Microsoft Office applications.
            Conclusion
            Web technology is strong on interactivity but weak on automation. The 
            primary applications of the Web, including Push and Agent 
            technologies, are almost exclusively focused on end users. Data that 
            is being made available in HTML format is effectively inaccessible 
            to business applications other than the Web browser.
            On corporate intranets and extranets, the Web browser has enabled 
            access to business systems, but has in many cases reinforced manual 
            inefficiencies as data must be transcribed from browser windows into 
            other application interfaces.
            Electronic commerce on the Web is typically driven manually via a 
            browser. In order to achieve business-to-business integration, 
            organizations have resorted to proprietary protocols. The 
            many-to-many nature of Web commerce demands a standard for automated 
            integration.
            Interactions normally performed manually in a browser, such as 
            entering information into an HTML form, submitting the form, and 
            retrieving HTML documents, can be automated by capturing details 
            such as input parameters, service URLs, and data extraction methods 
            for output parameters. Mechanisms for condition processing can also 
            be provided to enable robust error handling.
            The Web Interface Definition Language (WIDL) is an application of 
            the Extensible Markup Language (XML), which allows the resources of 
            the World Wide Web to be described as functional interfaces that can 
            be accessed by remote systems over standard Web protocols. WIDL 
            transforms the Web into a standards-based integration platform, 
            providing a practical and cost-effective infrastructure for 
            business-to-business electronic commerce over the Web.
            Appendix A
            Example 8 shows the WIDL DTD in its entirety. 
            Example 8 The WIDL DTD
            
            <!ELEMENT WIDL ( SERVICE | BINDING )* > 
<!ATTLIST WIDL
     NAME       CDATA #IMPLIED
     VERSION (1.0 | 2.0 | ...) "2.0"
     TEMPLATE   CDATA #IMPLIED
     BASEURL    CDATA #IMPLIED
     OBJMODEL (wmobj | ...) "wmobj" >
 
<!ELEMENT SERVICE EMPTY>
<!ATTLIST SERVICE
     NAME       CDATA #REQUIRED
     URL        CDATA #REQUIRED
     METHOD (Get | Post) "Get"
     INPUT      CDATA #IMPLIED
     OUTPUT     CDATA #IMPLIED
     AUTHUSER   CDATA #IMPLIED
     AUTHPASS   CDATA #IMPLIED
     TIMEOUT    CDATA #IMPLIED
     RETRIES    CDATA #IMPLIED >
 
<!ELEMENT BINDING ( VARIABLE | CONDITION | REGION )* > 
<!ATTLIST BINDING
     NAME       CDATA #REQUIRED
     TYPE (Input | Output) "Output" >
 
<!ELEMENT VARIABLE EMPTY>
<!ATTLIST VARIABLE
     NAME       CDATA #REQUIRED
     FORMNAME   CDATA #IMPLIED
     TYPE (String | String[] | String[][]) "String" 
     USAGE (Default | Header | Internal) "Default" 
     REFERENCE  CDATA #IMPLIED
     VALUE      CDATA #IMPLIED
     MASK       CDATA #IMPLIED
     NULLOK          #BOOLEAN >
 
<!ELEMENT CONDITION EMPTY>
<!ATTLIST CONDITION
     TYPE (Success | Failure | Retry) "Success" 
     REF        CDATA #REQUIRED
     MATCH      CDATA #REQUIRED
     REBIND     CDATA #IMPLIED
     SERVICE    CDATA #IMPLIED
     REASONREF  CDATA #IMPLIED
     REASONTEXT CDATA #IMPLIED
     WAIT       CDATA #IMPLIED
     RETRIES    CDATA #IMPLIED >
 
<!ELEMENT REGION EMPTY>
<!ATTLIST REGION
     NAME       CDATA #REQUIRED
     START      CDATA #REQUIRED
     END        CDATA #REQUIRED >
 
            About the Author
            Charles Allen
                3975 University Drive
                Suite 360
                Fairfax, VA 22030
                (703) 352-8345
                charles@webMethods.com 
                Charles Allen is VP of Product Management for webMethods, Inc., the 
            leading provider of Web Automation and integration solutions for the 
            Global 2000. Prior to joining webMethods, Mr. Allen was a founding 
            member of Open Environment Corporation. Most recently he was 
            responsible for technology acquisitions and joint ventures in the 
            Asia/Pacific region. An inveterate communicator, Mr. Allen has 
            presented extensively on the Web and distributed systems technology 
            at events around the world.
               
            
            
    
    
            
            Copyright © 1998 Seybold Publications and O'Reilly & Associates, 
            Inc.
            XML is a trademark of MIT and a product of the World Wide Web 
            Consortium.