How do I construct a parser in XML4J version 2?
In XML4J version 2, the DOM API is implemented using the SAX API. XML4J
version 2 has a modular architecture and comes pre-bundled with 4 configurations of the parser (all in com.ibm.xml.parsers package). These are:
There are two ways the parser classes can be instantiated.

The first way is to create a string containing the fully qualified name of the parser class. Pass this string to the org.xml.sax.helpers.ParserFactory.makeParser method:

    import org.xml.sax.Parser;
    import org.xml.sax.helpers.ParserFactory;
    ...
    // Any of the bundled parser classes can be named here
    String parserClass = "com.ibm.xml.parsers.DOMParser";
    String xmlFile = "file:///xml4j2/data/personal.xml";
    Parser parser = ParserFactory.makeParser(parserClass);
    try {
        parser.parse(xmlFile);
    } catch (SAXException se) {
        se.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
    // The next line is only for DOM Parsers
    Document doc = ((DOMParser) parser).getDocument();
    ...

The second way is to instantiate the parser class explicitly, as shown in this example, which creates a validating DOM parser. Use this way when you know exactly which parser configuration you need, and you are sure that you will not need to switch configurations.

    import com.ibm.xml.parsers.DOMParser;
    ...
    String xmlFile = "file:///xml4j2/data/personal.xml";
    DOMParser parser = new DOMParser();
    try {
        parser.parse(xmlFile);
    } catch (SAXException se) {
        se.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
    // The next line is only for DOM Parsers
    Document doc = parser.getDocument();
    ...

Once you have the Document object, you can call any method on it as defined by the DOM specification.
How do I create a DOM parser?
Use one of the methods in the question above, and use com.ibm.xml.parsers.DOMParser
to get a validating parser and com.ibm.xml.parsers.NonValidatingDOMParser to get a non-validating parser. To access the DOM tree, you can call the
How do I create a SAX parser?
Use one of the methods in the question above, and use com.ibm.xml.parsers.ValidatingSAXParser to get a validating parser and com.ibm.xml.parsers.SAXParser to get a non-validating parser.
Once you have the parser instance, you can use the standard SAX methods to set the various handlers provided by SAX.
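As a rough illustration of setting SAX handlers, the sketch below registers a document handler and an error handler on an org.xml.sax.Parser. It assumes the parser instance comes from the JDK's JAXP factory, used here only as a stand-in so the example runs without XML4J on the classpath. The class name ElementCounter and the sample document are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

// Hypothetical example class: counts start tags through the SAX 1.0 handler interfaces.
@SuppressWarnings("deprecation") // the SAX 1.0 types are deprecated in current JDKs
class ElementCounter extends HandlerBase {
    int elements = 0;

    @Override
    public void startElement(String name, AttributeList attrs) {
        elements++;
    }

    static int count(String xml) throws Exception {
        // Assumption: a JDK-provided SAX 1.0 parser stands in for an XML4J parser instance.
        Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
        ElementCounter handler = new ElementCounter();
        parser.setDocumentHandler(handler); // receives startElement, characters, ...
        parser.setErrorHandler(handler);    // HandlerBase also implements ErrorHandler
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("<personnel><person/><person/></personnel>"));
    }
}
```

The same setDocumentHandler and setErrorHandler calls should apply to any parser that implements org.xml.sax.Parser.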
How do I create a parser compatible with XML4J version 1?
As an aid to developers currently using XML4J version 1, classes in the com.ibm.xml.parser and com.ibm.xml.xpointer packages are provided for
backward compatibility. If you need parser functionality that is provided in version 1 and there is no corresponding functionality in version 2, you can use the "TX compatibility" classes the same way you used them in version 1.
However, you cannot mix and match classes between the native classes and the TX compatibility classes. You should use the version 1 method for creating a parser class and causing the parser to read its input, as well as for setting all
options. The DOM returned by the compatibility classes will be an instance of the TX* classes from version 1. Not all the functions available on com.ibm.xml.parser.Parser are supported or implemented:
XML4J version 1 occasionally inserted extra TX nodes in its DOM tree. Even though the compatibility classes provide a TX DOM tree, these extra nodes will not be present. If your application relies on their presence, you will need to modify your code. Users who are moving to the new parser architecture but want to use the catalog file format supported by the old parser should use the com.ibm.xml.internal.TXCatalog class. See the question "How do I use catalogs?".
What's the difference between the two DOM implementations?
In XML4J version 2, there are two different DOM implementations provided:
Because XML4J version 2 is modular, you choose the DOM implementation you need for your application when you write your code. Note, however, that you cannot use both DOM's in the same parser at the same time. A summary of DOM features is shown below:
What new options are available on parsers?
How do I use the setNodeExpansion call on DOMParser and NonValidatingDOMParser?
The native DOM parser classes,
com.ibm.xml.parsers.DOMParser and com.ibm.xml.parsers.NonValidatingDOMParser, now use a DOM implementation that takes advantage of lazy evaluation to improve performance.
The setNodeExpansion call on these classes controls the use of lazy evaluation. There are two values for the argument to setNodeExpansion: FULL and DEFERRED (the default).
If node expansion is set to FULL, then the DOM classes behave as they always have, creating all nodes in the DOM tree by the end of parsing. If node expansion is set to DEFERRED, nodes in the DOM tree are only
created when they are accessed. This means that a call to getDocument will return a DOM tree that consists only of the Document node. When your program accesses a child of Document, the children of the Document node will
be created. All the immediate children of a Node are created when any of that Node's children are accessed. This shortens the time it takes to parse an XML file and create a DOM tree. This also increases the time it takes to access a
node that has not been created. After nodes have been created, they are cached, so this overhead only occurs on the first access to a Node.
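The deferred behavior described above can be pictured with a small toy model. This is only a sketch of the general lazy-creation-plus-caching idea, not the XML4J implementation; LazyNode, Spec, and the created counter are hypothetical names introduced for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of deferred node expansion (hypothetical; not XML4J code).
class LazyNode {
    /** Compact record of parsed structure, standing in for the parser's internal data. */
    static final class Spec {
        final String name;
        final Spec[] kids;
        Spec(String name, Spec... kids) { this.name = name; this.kids = kids; }
    }

    static int created = 0;          // number of nodes materialized so far

    private final Spec spec;
    private List<LazyNode> children; // null until the first access

    LazyNode(Spec spec) { this.spec = spec; created++; }

    String getName() { return spec.name; }

    List<LazyNode> getChildren() {
        if (children == null) {      // first access: create all immediate children
            children = new ArrayList<>();
            for (Spec kid : spec.kids) children.add(new LazyNode(kid));
        }
        return children;             // later accesses reuse the cached nodes
    }

    public static void main(String[] args) {
        Spec tree = new Spec("#document",
                new Spec("personnel", new Spec("person"), new Spec("person")));
        LazyNode doc = new LazyNode(tree);
        System.out.println(created);                 // only the document node exists
        LazyNode root = doc.getChildren().get(0);
        root.getChildren();
        System.out.println(created);                 // document, personnel, two persons
    }
}
```

In this model the cost of building children is paid on the first getChildren call for each node, which mirrors the trade-off between faster parsing and slower first access.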
How do I use namespaces?
In XML4J version 2, the easiest way to get namespace support is to use the TX compatibility classes that provide an API for dealing with namespace
information. There are no standard API's for namespace manipulation in the standard DOM and SAX packages. The TX Compatibility classes provide additional, non-standard API's to work with namespaces.
When using the Standard DOM API, element names containing colons (":") are treated as normal element names. NOTE: The namespace specification does not currently specify the behavior of validation in the presence of namespaces. The behavior of validating parsers (all
validating parsers, not just XML4J) when namespaces are in use is currently undefined. If you want to use "namespace-like" element names (e.g. a:foo) with validation,
create a new DTD that contains fully qualified names from all the DTD's in use. Since the colon character is treated as a normal element name character, this
merged DTD will allow you to do validation, using these "namespace-like" names.
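The merged-DTD approach can be tried out with a small sketch. It assumes the JDK's JAXP validating parser with namespace processing turned off as a stand-in for an XML4J validating parser; the element names a:order, b:item, and b:other and the class name ColonNameValidation are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: validates "namespace-like" names against a merged DTD.
class ColonNameValidation {
    // The merged DTD declares the fully qualified (prefixed) names directly.
    static final String DTD =
        "<!DOCTYPE a:order [\n"
      + "  <!ELEMENT a:order (b:item+)>\n"
      + "  <!ELEMENT b:item (#PCDATA)>\n"
      + "]>\n";

    /** Returns the validity errors reported for the given document text. */
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        factory.setNamespaceAware(false); // treat ':' as an ordinary name character
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(DTD + "<a:order><b:item>pen</b:item></a:order>"));
        System.out.println(validate(DTD + "<a:order><b:other/></a:order>"));
    }
}
```

The first document should validate cleanly, while the second should produce validity errors because b:other is not declared in the merged DTD.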
How do I use catalogs?
XML4J Version 2 supports two catalog file formats: the SGML Open catalog
that was supported in version 1, and the proposed XCatalog specification.

To use the original catalog file format, set a TXCatalog instance as the parser's EntityResolver. For example:

    XMLParser parser = new DOMParser();
    parser.getEntityHandler().setEntityResolver(catalog);

Once the catalog is installed, catalog files that conform to the TXCatalog format can be appended to the catalog by calling the loadCatalog method on the parser or the catalog instance. The following example loads the contents of two catalog files:

    parser.loadCatalog(new InputSource("catalogs/cat1.xml"));

To use the XCatalog catalog, you must first have a catalog in XCatalog format. The current version of the XCatalog catalog supports the XCatalog proposal draft 0.2 posted to the xml-dev mailing list by John Cowan. XCatalog is an XML representation of the SGML Open TR9401:1997 catalog format. The current proposal supports public identifier maps, system identifier aliases, and public identifier prefix delegates. Refer to the XCatalog DTD for the full specification of this catalog format at http://www.ccil.org/~cowan/XML/XCatalog.html.
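To make the idea of a public identifier map concrete, here is a minimal sketch of an EntityResolver that resolves public identifiers from an in-memory table. It is not TXCatalog or XCatalog and reads no catalog file; the class PublicIdMap, the public identifier, and the DTD text are all hypothetical, and the JDK's JAXP parser is assumed as a stand-in for an XML4J parser.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical in-memory stand-in for a catalog: public identifier -> DTD text.
class PublicIdMap implements EntityResolver {
    private final Map<String, String> dtdByPublicId = new HashMap<>();
    int hits = 0; // how many lookups were answered from the map

    void map(String publicId, String dtdText) {
        dtdByPublicId.put(publicId, dtdText);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtd = dtdByPublicId.get(publicId);
        if (dtd == null) return null; // not mapped: parser falls back to the system identifier
        hits++;
        return new InputSource(new StringReader(dtd));
    }

    /** Parses xml with this resolver installed and returns the resulting document. */
    Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(this);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        PublicIdMap catalog = new PublicIdMap();
        catalog.map("-//EXAMPLE//DTD Note//EN", "<!ATTLIST note lang CDATA \"en\">");
        Document doc = catalog.parse(
            "<!DOCTYPE note PUBLIC \"-//EXAMPLE//DTD Note//EN\" "
          + "\"http://example.invalid/note.dtd\"><note/>");
        // The defaulted attribute shows that the DTD came from the map, not the network.
        System.out.println(doc.getDocumentElement().getAttribute("lang"));
    }
}
```

A real catalog adds file-based loading, system identifier aliases, and prefix delegation on top of this basic lookup.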
In order to use XCatalogs, you must write the catalog files with the following restrictions:
To use this catalog in a parser, set an XCatalog instance as the parser's EntityResolver. For example:

    XMLParser parser = new SAXParser();
    parser.getEntityHandler().setEntityResolver(catalog);

Once installed, catalog files that conform to the XCatalog grammar can be appended to the catalog by calling the loadCatalog method on the parser or the catalog instance. The following example loads the contents of two catalog files:

    parser.loadCatalog(new InputSource("catalogs/cat1.xml"));

Limitations: The following are the current limitations of this XCatalog implementation:
How do I use the revalidation API?
In XML4J version 2, you can validate a document after it has been parsed and
converted to a DOM tree. To do this, use the RevalidatingDOMParser or the TXRevalidatingDOMParser classes. The validate method on this class takes a DOM node as an argument, and performs a validity check on the DOM tree
rooted at that node, using the DTD of the current document. Currently, the native DOM prevents the insertion of invalid nodes, so this feature is not as useful for the native DOM.
This is an experimental feature, and the details of its operation will change in future releases of XML4J version 2. We are including it in order to hear your feedback on the functionality of these API's.
The sample program below parses a document, inserts an illegal node into the TX DOM and then tries to re-validate the document. import java.io.IOException;
How do I handle errors?
When you create a parser instance, the default error handler does nothing. This means that your program will fail silently when it encounters an error. You should register an error handler with the parser by supplying a class which implements the org.xml.sax.ErrorHandler interface. This is true regardless of whether your parser is a DOM based or SAX based parser.
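A minimal sketch of such an error handler follows. It assumes the JDK's JAXP DOM parser as a stand-in for an XML4J parser; the class name CollectingErrorHandler and its message format are made up for the example, and only the org.xml.sax.ErrorHandler interface itself comes from SAX.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical handler that records every problem instead of ignoring it.
class CollectingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) { record("warning", e); }

    @Override
    public void error(SAXParseException e) { record("error", e); }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        record("fatal", e);
        throw e; // well-formedness errors cannot be recovered from, so stop parsing
    }

    private void record(String level, SAXParseException e) {
        messages.add(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    /** Parses xml and returns whatever the handler collected. */
    static List<String> check(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        builder.setErrorHandler(handler);
        try {
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // already recorded by fatalError
        }
        return handler.messages;
    }

    public static void main(String[] args) throws Exception {
        for (String message : check("<a><b></a>")) System.out.println(message);
    }
}
```

With a SAX parser the same handler would be passed to the parser's setErrorHandler method.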
How does entity expansion work in XML4J version 2?
If you are using the TX Compatibility classes, you can already control entity expansion. (See the API docs for details). If you are using the native 2.0 DOM classes, the function setExpandEntityReferences controls how entities appear in the DOM tree.

When setExpandEntityReferences is set to false (the default), an occurrence of an entity reference in the XML document will be represented by a subtree with an EntityReference node at the root whose children represent the entity expansion. Unlike the TX compatibility classes and XML4J version 1.1.x, the entity expansion will be a DOM tree representing the structure of the entity expansion, not a text node containing the entity expansion as text.

If setExpandEntityReferences is true, an entity reference in the XML document is represented by only the nodes that represent the entity expansion. Again, unlike the TX compatibility classes and XML4J version 1.1.x, the entity expansion will be a DOM tree representing the structure of the entity expansion, not a text node containing the entity expansion as text.
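The two settings can be compared with a short sketch. It assumes the JDK's JAXP DocumentBuilderFactory, which has a method of the same name, as a stand-in for the native XML4J DOM parser. Note that the JAXP default is true, unlike the default of false described above, and that what appears beneath an EntityReference node may differ between implementations, so the sketch only counts EntityReference nodes. EntityExpansionDemo and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example: counts EntityReference nodes under the root element.
class EntityExpansionDemo {
    static final String XML = "<!DOCTYPE r [<!ENTITY e \"hi\">]><r>a&e;b</r>";

    static int countEntityRefs(boolean expand) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setExpandEntityReferences(expand);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        NodeList kids = doc.getDocumentElement().getChildNodes();
        int refs = 0;
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ENTITY_REFERENCE_NODE) refs++;
        }
        return refs;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("expand=false: " + countEntityRefs(false) + " EntityReference node(s)");
        System.out.println("expand=true:  " + countEntityRefs(true) + " EntityReference node(s)");
    }
}
```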
Why does "non-validating" not mean "well-formedness checking only"?
Using a "non-validating" parser does not mean that only well-formedness checking is done! There are still many things that the XML specification requires of the parser, including entity substitution, defaulting of attribute values, and attribute normalization.
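A small sketch can show these three behaviors in a non-validating parse. It assumes the JDK's JAXP parser with validation left off as a stand-in for a non-validating XML4J parser; NonValidatingDemo and the sample document are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example: a non-validating parse still applies the internal DTD subset.
class NonValidatingDemo {
    static final String XML =
        "<!DOCTYPE doc [\n"
      + "  <!ENTITY who \"world\">\n"
      + "  <!ATTLIST doc kind CDATA \"memo\">\n"
      + "]>\n"
      + "<doc note=\"a\nb\">hello &who;</doc>"; // the attribute value contains a line break

    static Element parse() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // the default, stated here for emphasis
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element doc = parse();
        System.out.println(doc.getTextContent());      // entity substitution
        System.out.println(doc.getAttribute("kind"));  // attribute defaulting
        System.out.println(doc.getAttribute("note"));  // attribute normalization
    }
}
```

Here the entity reference is replaced by its text, the kind attribute receives its declared default, and the line break inside note is normalized to a space, all without validation.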
This table describes what "non-validating" really means for XML4J parsers. In this table, "no DTD" means no internal or external DTD subset is present.
How do I associate my own data with a node in the DOM tree?
The class com.ibm.xml.dom.NodeImpl provides a void setUserData(Object o) and an Object getUserData() method that you can use to attach any object to a node in the DOM tree.
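For comparison, the standard DOM Level 3 Node interface offers a keyed variant of the same idea. The sketch below uses that standard API from the JDK rather than com.ibm.xml.dom.NodeImpl, so the three-argument signature shown is the DOM Level 3 one and not the XML4J one; the key "app.payload" and the class UserDataDemo are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example: attaches an application object to a DOM node and reads it back.
class UserDataDemo {
    static Object roundTrip(Object payload) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element person = doc.createElement("person");
        doc.appendChild(person);
        // DOM Level 3 signature: key, data, optional UserDataHandler (none here)
        person.setUserData("app.payload", payload, null);
        return doc.getDocumentElement().getUserData("app.payload");
    }

    public static void main(String[] args) throws Exception {
        Object payload = new int[] {1, 2, 3};
        System.out.println(roundTrip(payload) == payload);
    }
}
```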
How do I more efficiently parse several documents sharing a common DTD?
DTDs are not currently cached by the parser. The common DTD, since it is specified in each XML document, will be re-parsed once for each document. However, there are things that you can do now, to make the process of reading DTD's more efficient: