Copyright © 1999, 2000 by Birdstep Technology AS
The IBXML SAX interface component provides the SAX version 1.0 interface. It implements drivers for parsing XML documents stored as text and for retrieving XML documents from databases, as well as handlers for outputting XML documents as text and for storing XML documents in databases (see Figure 1-1). In a sense, these drivers and handlers are wrappers for the IBXML database I/O component and for XML generators of different kinds (see Figure 1-2).
The Simple API for XML (SAX) describes an event-driven interface to the process of parsing XML documents. SAX is a public domain API developed by individuals on the XML-DEV mailing list. It has no formal specification document; instead, it is defined by a public domain implementation in the Java™ programming language. An XML parser is SAX conformant if it implements the interface defined by this implementation. References to SAX version 1.0 in this text refer to the definition of SAX in [Sun99].
An event-driven interface provides a mechanism for notifying the application code as the underlying parser recognizes XML syntactic constructs in the document.
The SAX interface is implemented in C++ and therefore sometimes deviates from the defining public domain Java™ implementation because of language differences. The SAX interface component also includes drivers for parsing XML stored as plain text and in IBXML databases, and handlers for storing data in the same two formats.
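To make the event-driven model concrete, the following self-contained sketch shows a toy scanner that calls back into application code as it recognizes start tags, end tags and character data. It is not part of IBXML and is not the expat-based driver; the names ToyEvents, scan and Recorder are hypothetical, and the scanner deliberately handles only plain elements and text (no attributes, comments, entities or error checking).

```cpp
#include <string>
#include <vector>

// Hypothetical callback interface: the scanner calls these methods as it
// recognizes constructs, mirroring the shape of a SAX document handler.
struct ToyEvents {
    virtual ~ToyEvents() {}
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Toy scanner: supports only <name>, </name> and plain text.
// No attributes, comments, entities or well-formedness checks.
void scan(const std::string& doc, ToyEvents& events) {
    std::string::size_type pos = 0;
    while (pos < doc.size()) {
        if (doc[pos] == '<') {
            std::string::size_type close = doc.find('>', pos);
            if (close == std::string::npos) return;  // truncated tag: stop
            if (pos + 1 < doc.size() && doc[pos + 1] == '/')
                events.endElement(doc.substr(pos + 2, close - pos - 2));
            else
                events.startElement(doc.substr(pos + 1, close - pos - 1));
            pos = close + 1;
        } else {
            // Everything up to the next '<' is reported as character data.
            std::string::size_type next = doc.find('<', pos);
            if (next == std::string::npos) next = doc.size();
            events.characters(doc.substr(pos, next - pos));
            pos = next;
        }
    }
}

// Example handler that records each notification as a string.
struct Recorder : ToyEvents {
    std::vector<std::string> log;
    void startElement(const std::string& n) override { log.push_back("start " + n); }
    void endElement(const std::string& n) override { log.push_back("end " + n); }
    void characters(const std::string& t) override { log.push_back("text " + t); }
};
```

The application never asks for the next token; it passes a Recorder to scan and receives notifications in document order. This inversion of control is what a SAX driver such as DriverText provides, with a real XML parser doing the recognition.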
Using SAX in an application is quite simple and follows this procedure:
The application instantiates a driver for the parser.
The application registers handlers for all events which the application wants to receive.
The application transfers control to the parser which parses a given input source and makes calls back to the application code.
The application destroys the parser object.
Example 1-1. Program body of simple SAX application
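#include <iostream>
#include <ibxml/sax/SAXException.h>
#include <ibxml/sax/drivers/DriverText.h>
#include "MyHandler.h"   // declaration of MyHandler (Example 1-2); file name assumed

using namespace IBXMLSAX;

int main(int argc, char **argv)
{
    // Process command line arguments: expect the system identifier
    // of the input document as the first argument.
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <document>" << std::endl;
        return 1;
    }

    try {
        DriverText parser;                      // 1. instantiate the driver
        MyHandler handler;

        parser.setDocumentHandler(&handler);    // 2. register the handlers
        parser.setErrorHandler(&handler);
        parser.setDTDHandler(&handler);

        parser.parse(argv[1]);                  // 3. parse; events call back into handler
    }                                           // 4. parser is destroyed on leaving scope
    catch (SAXException& e) {
        // Report the error and exit with a failure status.
        std::cerr << "Error parsing " << argv[1] << ": "
                  << e.getMessage() << std::endl;
        return 1;
    }
    return 0;
}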
The procedure is illustrated by the code fragment in Example 1-1, where an instance of the class DriverText is used to read XML data from an XML document stored as plain text. DriverText encapsulates the expat parser and passes information from expat to the application's event handlers. The handlers are registered with the parser using the methods setDocumentHandler, setDTDHandler and setErrorHandler, and the parse process itself is initiated by invoking the method parse. Here, the system identifier of the input source is passed as the single parameter to parse. Because the parser is a local object inside the try block, it is destroyed automatically when that block is left, whether normally or through an exception, which covers the last step of the procedure.
Example 1-2. Declaration of MyHandler class used by the simple SAX application in Example 1-1
class MyHandler : public HandlerBase
{
public:
    MyHandler();
    virtual ~MyHandler();

    virtual void startDocument();
    virtual void endDocument();
    virtual void startElement(const ibxmlchar* name, const AttributeList& atts);
    virtual void endElement(const ibxmlchar* name);
    virtual void characters(const ibxmlchar* s, unsigned int start, unsigned int length);
};
The event handlers registered with the parser are contained in a class called MyHandler (see Example 1-2). This class is a subclass of the SAX class HandlerBase, which in turn is derived from the four SAX handler interfaces DTDHandler, ErrorHandler, DocumentHandler and EntityResolver. HandlerBase is provided as a convenience class for application developers and is well suited to simple applications, since it supplies default implementations of all handler methods. Advanced applications may instead derive handlers directly from the individual handler interface classes.
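The convenience-class idea can be sketched in isolation: an interface class declares the callbacks as pure virtual functions, a base class supplies empty default implementations, and an application class overrides only what it needs. The classes below (DocumentEvents, DefaultEvents, TagCounter, and the helper replay) are simplified, hypothetical stand-ins, not the IBXML declarations; the real HandlerBase combines four interfaces and uses ibxmlchar strings and AttributeList.

```cpp
#include <string>

// Simplified, hypothetical interface in the spirit of DocumentHandler.
class DocumentEvents {
public:
    virtual ~DocumentEvents() {}
    virtual void startDocument() = 0;
    virtual void endDocument() = 0;
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Convenience base in the spirit of HandlerBase: every callback is a no-op,
// so subclasses override only the events they care about.
class DefaultEvents : public DocumentEvents {
public:
    void startDocument() override {}
    void endDocument() override {}
    void startElement(const std::string&) override {}
    void endElement(const std::string&) override {}
};

// Application handler: overrides a single callback; all other events
// fall through to the empty defaults in DefaultEvents.
class TagCounter : public DefaultEvents {
public:
    int count = 0;
    void startElement(const std::string&) override { ++count; }
};

// Drives a handler through the interface, the way a parser would.
void replay(DocumentEvents& h) {
    h.startDocument();
    h.startElement("doc");
    h.startElement("p");
    h.endElement("p");
    h.endElement("doc");
    h.endDocument();
}
```

TagCounter compiles and works without mentioning endElement, startDocument or endDocument. A handler derived directly from DocumentEvents would instead have to implement every pure virtual method, which is the trade-off between simple and advanced applications described above.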
Example 1-3. Implementation of MyHandler class used by the simple SAX application in Example 1-1
MyHandler::MyHandler()
{
}

MyHandler::~MyHandler()
{
}

void MyHandler::startDocument()
{
    // Called when processing of the XML document starts.
}

void MyHandler::endDocument()
{
    // Called when processing of the XML document ends.
}

void MyHandler::startElement(const ibxmlchar* name, const AttributeList& atts)
{
    // Called for each opening tag in the XML document. name contains
    // the name of the tag, while atts contains a list of all the
    // attributes.
}

void MyHandler::endElement(const ibxmlchar* name)
{
    // Called for each closing tag in the XML document (for an empty
    // element such as <br/>, right after startElement). name contains
    // the name of the tag.
}

void MyHandler::characters(const ibxmlchar* s, unsigned int start, unsigned int length)
{
    // Called when a block of character data is encountered in the
    // XML document. Note that a single block of characters in the
    // document may result in several invocations of this method.
}
In the declaration of the MyHandler class, only a subset of the methods in the HandlerBase class is overridden. All other events are ignored by the default implementations in HandlerBase. Example 1-3 shows the implementation of the MyHandler class.
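Because a single run of text may arrive in several characters calls, a handler that needs whole text values typically buffers the fragments and flushes the buffer at the next element boundary. The sketch below shows that pattern with standard C++ strings. The class TextCollector, its flush helper, and the use of plain char instead of ibxmlchar are assumptions made to keep it self-contained; start and length are interpreted here, as in the Java SAX definition, as an offset and a count into s.

```cpp
#include <string>
#include <vector>

// Hypothetical handler that reassembles text split across several
// characters() calls. Plain char stands in for ibxmlchar.
class TextCollector {
public:
    std::vector<std::string> texts;  // one entry per complete text run

    // Element boundaries end the current text run.
    void startElement(const char* /*name*/) { flush(); }
    void endElement(const char* /*name*/) { flush(); }

    // Assumes (as in Java SAX) that the text is s[start .. start + length).
    void characters(const char* s, unsigned int start, unsigned int length) {
        buffer.append(s + start, length);
    }

private:
    std::string buffer;  // fragments received since the last element boundary

    // Emits the buffered text, if any, as one complete run.
    void flush() {
        if (!buffer.empty()) {
            texts.push_back(buffer);
            buffer.clear();
        }
    }
};
```

Flushing on both start and end tags keeps the pieces of mixed content such as <p>a<b>b</b>c</p> in separate runs ("a", "b" and "c"). Whitespace-only runs are kept as they arrive; an application that does not need them can filter them in flush.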