com.sun.xml.tree
Class XmlDocumentBuilder

java.lang.Object
  |
  +--com.sun.xml.tree.XmlDocumentBuilder

public class XmlDocumentBuilder
extends java.lang.Object
implements LexicalEventListener

This class is a SAX DocumentHandler which converts a stream of parse events into an in-memory DOM document. After each Parser.parse() invocation returns, a resulting DOM Document may be accessed via the getDocument method. The parser and its builder should be used together; the builder may be used with only one parser at a time.
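
The event-to-tree conversion described above can be sketched with the JAXP classes that ship in the JDK. This is only an illustration of the technique, not the Sun implementation: SaxToDomSketch and its parse helper are hypothetical names, and the sketch uses the SAX2 DefaultHandler rather than the SAX1 DocumentHandler that this class implements. It ignores namespaces and lexical information.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for illustration; this is not the Sun implementation.
class SaxToDomSketch extends DefaultHandler {
    private Document doc;
    private final Deque<Node> open = new ArrayDeque<>();  // innermost open node is on top

    @Override
    public void startDocument() throws SAXException {
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new SAXException(e);
        }
        open.clear();
        open.push(doc);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        open.peek().appendChild(e);  // attach to the current parent
        open.push(e);                // the new element becomes the current parent
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        // Only the slice [offset, offset + len) of buf belongs to this event.
        open.peek().appendChild(doc.createTextNode(new String(buf, offset, len)));
    }

    // Meaningful once parsing has returned, like getDocument() in the text above.
    Document getDocument() {
        return doc;
    }

    // Convenience wrapper: parse a string and hand back the finished tree.
    static Document parse(String xml) {
        try {
            SaxToDomSketch handler = new SaxToDomSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getDocument();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

As with this class, the handler holds per-parse state, so one handler instance should serve only one parser at a time.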

This builder optionally does XML namespace processing, reporting conformance problems as recoverable errors using the parser's error handler. If the parser is not a Sun parser, that handler will be inaccessible and so such errors will always be fatal. Also, if that handler does not treat such errors as fatal, processing will continue without raising an exception.

To customize the document, a powerful technique involves using an element factory specifying what element tags (from a given XML namespace) correspond to what implementation classes. Parse trees produced by such a builder can have nodes which add behaviors to achieve application-specific functionality, such as modifying the tree as it is parsed.
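
The idea of mapping element tags in a namespace to implementation classes can be sketched as follows. Everything here is hypothetical: TagFactorySketch, BaseElement, PriceElement, and the register and create methods are invented for illustration and do not reproduce the ElementFactory interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a tag-to-class mapping; not the ElementFactory API.
class TagFactorySketch {
    // Generic node used when no specialized class is registered.
    static class BaseElement {
        String tag;
        String describe() { return "generic <" + tag + ">"; }
    }

    // A specialized node that adds application-specific behavior.
    static class PriceElement extends BaseElement {
        @Override
        String describe() { return "price <" + tag + ">"; }
    }

    private final Map<String, Supplier<? extends BaseElement>> constructors = new HashMap<>();

    // Key combines namespace URI and tag so the same tag can differ per namespace.
    private static String key(String namespaceUri, String tag) {
        return "{" + namespaceUri + "}" + tag;
    }

    void register(String namespaceUri, String tag, Supplier<? extends BaseElement> ctor) {
        constructors.put(key(namespaceUri, tag), ctor);
    }

    // Returns the registered specialization, or a generic element as a fallback.
    BaseElement create(String namespaceUri, String tag) {
        Supplier<? extends BaseElement> ctor = constructors.get(key(namespaceUri, tag));
        BaseElement e = (ctor != null) ? ctor.get() : new BaseElement();
        e.tag = tag;
        return e;
    }
}
```

A builder that consulted such a factory for every start tag would produce a tree in which registered tags are instances of the specialized classes and all other tags are generic nodes.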

The object model here is that XML elements are polymorphic, with semantic intelligence embedded through customized internal nodes. Those nodes are created as the parse tree is built. Such trees now build on the W3C Document Object Model (DOM), and other models may be supported by the customized nodes. This allows both generic tools (understanding generic interfaces such as the DOM core) and specialized tools (supporting specialized behaviors, such as the HTML extensions to the DOM core; or for XSL elements) to share data structures.

Normally only "model" semantics are in document data structures, but "view" or "controller" semantics can be supported if desired.

Elements may choose to intercept certain parsing events directly. They do this by overriding the default implementations of methods in the XmlReadable interface. This is normally done to make the DOM tree represent application level modeling requirements, rather than matching an XML structure that may not be optimized appropriately.


Constructor Summary
XmlDocumentBuilder()
          Default constructor is for use in conjunction with a SAX parser's DocumentHandler callback.
 
Method Summary
 void characters(char[] buf, int offset, int len)
          SAX DocumentHandler callback, not for general application use.
 java.util.Locale chooseLocale(java.lang.String[] languages)
          Chooses a client locale to use for diagnostics, using the first language specified in the list that is supported by this builder.
 void comment(java.lang.String text)
          LexicalEventListener callback, not for general application use.
 XmlDocument createDocument()
          This is a factory method, used to create an XmlDocument.
 void endCDATA()
          LexicalEventListener callback, not for general application use.
 void endDocument()
          SAX DocumentHandler callback, not for general application use.
 void endElement(java.lang.String tag)
          SAX DocumentHandler callback, not for general application use.
 void endParsedEntity(java.lang.String name, boolean included)
          LexicalEventListener callback, not for general application use.
 boolean getDisableNamespaces()
          Returns true if namespace conformance is not checked as the DOM tree is built.
 XmlDocument getDocument()
          Returns the fruits of parsing, after a SAX parser has used this as a document handler during parsing.
 Locator getDocumentLocator()
          Returns the document locator provided by the SAX parser.
 ElementFactory getElementFactory()
          Returns the factory to be associated with documents produced by this builder.
 java.util.Locale getLocale()
          Returns the locale to be used for diagnostic messages by this builder, and by documents it produces.
 Parser getParser()
          Returns the parser used by this builder, if it is recorded; only Sun parsers are now recorded.
 void ignorableWhitespace(char[] buf, int offset, int len)
          SAX DocumentHandler callback, not for general application use.
 boolean isIgnoringLexicalInfo()
          Returns true (the default) if certain lexical information is automatically discarded when a DOM tree is built, producing smaller parse trees that are easier to use.
 void processingInstruction(java.lang.String name, java.lang.String instruction)
          SAX DocumentHandler callback, not for general application use.
 void setDisableNamespaces(boolean value)
          Controls whether namespace conformance is checked during DOM tree construction, or (the default) not.
 void setDocumentLocator(Locator locator)
          SAX DocumentHandler callback, not for general application use.
 void setElementFactory(ElementFactory factory)
          Assigns the factory to be associated with documents produced by this builder.
 void setIgnoringLexicalInfo(boolean value)
          Controls whether certain lexical information is discarded; by default, that information is discarded.
 void setLocale(java.util.Locale locale)
          Assigns the locale to be used for diagnostic messages.
 void setParser(Parser p)
          Sets the parser used by this builder.
 void startCDATA()
          LexicalEventListener callback, not for general application use.
 void startDocument()
          SAX DocumentHandler callback, not for general application use.
 void startElement(java.lang.String tag, AttributeList attributes)
          SAX DocumentHandler callback, not for general application use.
 void startParsedEntity(java.lang.String name)
          LexicalEventListener callback, not for general application use.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XmlDocumentBuilder

public XmlDocumentBuilder()
Default constructor is for use in conjunction with a SAX parser's DocumentHandler callback.
Method Detail

isIgnoringLexicalInfo

public boolean isIgnoringLexicalInfo()
Returns true (the default) if certain lexical information is automatically discarded when a DOM tree is built, producing smaller parse trees that are easier to use.

setIgnoringLexicalInfo

public void setIgnoringLexicalInfo(boolean value)
Controls whether certain lexical information is discarded; by default, that information is discarded.

That information includes whitespace in element content which is ignorable (note that some nonvalidating XML parsers will not report that information); all comments; which text is found in CDATA sections; and boundaries of entity references.

"Ignorable whitespace" as reported by parsers is whitespace used to format XML markup. That is, all whitespace except that in "mixed" or ANY content models is ignorable. When it is discarded, pretty-printing may be necessary to make the document readable by humans again.

Whitespace inside "mixed" and ANY content models needs different treatment, since it could be part of the document content. In such cases XML defines an xml:space attribute which applications should use to determine whether whitespace must be preserved (value of the attribute is preserve) or whether default behavior (such as eliminating leading and trailing space, and normalizing consecutive internal whitespace to a single space) is allowed.

Parameters:
value - true indicates that such lexical information should be discarded during parsing.
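
An application-side reading of xml:space, as described above, might look like the following sketch. It uses only the standard org.w3c.dom interfaces; XmlSpaceSketch and its methods are hypothetical names, and the default treatment shown (trim, then collapse runs of whitespace) is just the example given in the text, not a required behavior.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class XmlSpaceSketch {
    // The nearest xml:space declaration wins, so walk from the element toward the root.
    static boolean mustPreserve(Element start) {
        for (Node n = start; n instanceof Element; n = n.getParentNode()) {
            String value = ((Element) n).getAttribute("xml:space");
            if ("preserve".equals(value)) return true;
            if ("default".equals(value)) return false;
        }
        return false;  // no declaration found: default handling is allowed
    }

    // One possible default treatment: drop leading and trailing whitespace,
    // then collapse each internal run of XML whitespace to a single space.
    static String normalize(String text) {
        return text.trim().replaceAll("[ \\t\\r\\n]+", " ");
    }

    static String textOf(Element e) {
        String raw = e.getTextContent();
        return mustPreserve(e) ? raw : normalize(raw);
    }

    // Test convenience: parse a string and return its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```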

getDisableNamespaces

public boolean getDisableNamespaces()
Returns true if namespace conformance is not checked as the DOM tree is built.

setDisableNamespaces

public void setDisableNamespaces(boolean value)
Controls whether namespace conformance is checked during DOM tree construction, or (the default) not. In this framework, the DOM Builder is responsible for enforcing all namespace constraints. When enabled, this makes constructing a DOM tree slightly slower. (However, at this time it can't enforce the requirement that parameter entity names not contain colons.)

setParser

public void setParser(Parser p)
Sets the parser used by this builder. If this is a Sun parser, error reports during parsing will use the parser's error handler, and DTD event processing is replaced. The parser's document handler is always set to this document builder.

getParser

public Parser getParser()
Returns the parser used by this builder, if it is recorded; only Sun parsers are now recorded.

getDocument

public XmlDocument getDocument()
Returns the fruits of parsing, after a SAX parser has used this as a document handler during parsing.

getLocale

public java.util.Locale getLocale()
Returns the locale to be used for diagnostic messages by this builder, and by documents it produces. This uses the locale of any associated parser.

setLocale

public void setLocale(java.util.Locale locale)
               throws SAXException
Assigns the locale to be used for diagnostic messages. Multi-language applications, such as web servers dealing with clients from different locales, need the ability to interact with clients in languages other than the server's default.

When an XmlDocument is created, its locale is the default locale for the virtual machine. If a parser was recorded, the locale will be associated with that parser.

See Also:
chooseLocale(java.lang.String[])

chooseLocale

public java.util.Locale chooseLocale(java.lang.String[] languages)
                              throws SAXException
Chooses a client locale to use for diagnostics, using the first language specified in the list that is supported by this builder. That locale is then automatically assigned using setLocale(). Such a list could be provided by a variety of user preference mechanisms, including the HTTP Accept-Language header field.
Parameters:
languages - Array of language specifiers, ordered with the most preferable one at the front. For example, "en-ca" then "fr-ca", followed by "zh_CN". Both RFC 1766 and Java styles are supported.
Returns:
The chosen locale, or null.
See Also:
MessageCatalog, Parser.chooseLocale(java.lang.String[])
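
The selection rule (first listed language that is supported wins) can be sketched with java.util.Locale alone. ChooseLocaleSketch and its supportedLanguages parameter are hypothetical; how this builder actually decides which languages it supports is not shown here.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of "first supported language wins"; not the builder's code.
class ChooseLocaleSketch {
    // Accepts both RFC 1766 style ("en-ca") and Java style ("zh_CN") specifiers
    // by rewriting underscores to hyphens before parsing the tag.
    static Locale toLocale(String specifier) {
        return Locale.forLanguageTag(specifier.trim().replace('_', '-'));
    }

    // languages is ordered with the most preferable entry first.
    // Returns the first candidate whose language code is supported, or null.
    static Locale choose(String[] languages, Set<String> supportedLanguages) {
        for (String specifier : languages) {
            Locale candidate = toLocale(specifier);
            if (supportedLanguages.contains(candidate.getLanguage())) {
                return candidate;
            }
        }
        return null;
    }
}
```

With the list "en-ca", "fr-ca", "zh_CN" and only French supported, the sketch returns the Canadian French locale; with nothing supported it returns null, matching the documented return value.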

setDocumentLocator

public void setDocumentLocator(Locator locator)
SAX DocumentHandler callback, not for general application use. Reports the locator object which will be used in reporting diagnostics and interpreting relative URIs in attributes and text.
Parameters:
locator - used to identify a location in an XML document being parsed.

getDocumentLocator

public Locator getDocumentLocator()
Returns the document locator provided by the SAX parser. This is commonly used in diagnostics, and when interpreting relative URIs found in XML Processing Instructions or other parts of an XML document. This locator is only valid during document handler callbacks.
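
Because the locator is only valid during callbacks, position data has to be copied out while an event is being handled. The sketch below shows that pattern with the JDK's SAX2 DefaultHandler; LocatorSketch and locate are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: record where each element starts, for diagnostics.
class LocatorSketch extends DefaultHandler {
    private Locator locator;                          // supplied by the parser, may stay null
    final List<String> positions = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Read the line number now; the locator's values change as parsing proceeds.
        int line = (locator != null) ? locator.getLineNumber() : -1;
        positions.add(qName + "@line " + line);
    }

    static List<String> locate(String xml) {
        try {
            LocatorSketch handler = new LocatorSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.positions;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```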

createDocument

public XmlDocument createDocument()
This is a factory method, used to create an XmlDocument. Subclasses may override this method, for example to provide document classes with particular behaviors, or provide particular factory behaviors (such as returning elements that support the HTML DOM methods, if they have the right name and are in the right namespace).

setElementFactory

public final void setElementFactory(ElementFactory factory)
Assigns the factory to be associated with documents produced by this builder.

getElementFactory

public final ElementFactory getElementFactory()
Returns the factory to be associated with documents produced by this builder.

startDocument

public void startDocument()
                   throws SAXException
SAX DocumentHandler callback, not for general application use. Reports that the parser is beginning to process a document.

endDocument

public void endDocument()
                 throws SAXException
SAX DocumentHandler callback, not for general application use. Reports that the document has been fully parsed.

startElement

public void startElement(java.lang.String tag,
                         AttributeList attributes)
                  throws SAXException
SAX DocumentHandler callback, not for general application use. Reports that the parser started to parse a new element, with the given tag and attributes, and calls the element's startParse method.
Throws:
SAXParseException - if XML namespace support is enabled and the tag or any attribute name contains more than one colon.
SAXException - as appropriate, such as if a faulty parser provides an element or attribute name which is illegal.
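
The colon rules stated for this callback and for processingInstruction can be expressed as two small predicates. This is a sketch of only those two stated rules under hypothetical names (QNameColonSketch and its methods); full namespace conformance involves further checks that are not shown.

```java
// Hypothetical predicates for the colon rules stated in this documentation.
class QNameColonSketch {
    static int countColons(String name) {
        int count = 0;
        for (int i = 0; i < name.length(); i++) {
            if (name.charAt(i) == ':') count++;
        }
        return count;
    }

    // Element tags and attribute names may carry at most one colon (prefix:local).
    static boolean isAcceptableName(String name) {
        return countColons(name) <= 1;
    }

    // Processing instruction names may not contain a colon at all.
    static boolean isAcceptablePiName(String name) {
        return name.indexOf(':') < 0;
    }
}
```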

endElement

public void endElement(java.lang.String tag)
                throws SAXException
SAX DocumentHandler callback, not for general application use. Reports that the parser finished the current element. The element's doneParse method is then called.
Throws:
SAXException - as appropriate

comment

public void comment(java.lang.String text)
             throws SAXException
LexicalEventListener callback, not for general application use. Reports that a comment was found in the document. If this builder is set to record lexical information (by default it ignores such information) then this callback records a comment in the DOM tree.
Specified by:
comment in interface LexicalEventListener
Parameters:
text - body of the comment.

startCDATA

public void startCDATA()
                throws SAXException
LexicalEventListener callback, not for general application use. Reports that a CDATA section has begun.

If this builder is set to record lexical information (by default it ignores such information) then this callback arranges that character data (and ignorable whitespace) be recorded as part of a CDATA section, until the matching endCDATA method is called.

Specified by:
startCDATA in interface LexicalEventListener

endCDATA

public void endCDATA()
              throws SAXException
LexicalEventListener callback, not for general application use. Reports that a CDATA section was completed. This terminates any CDATA section that is being constructed.
Specified by:
endCDATA in interface LexicalEventListener
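
The effect of recording versus ignoring lexical information on comments and CDATA sections can be sketched with the JDK's SAX2 LexicalHandler, used here as a stand-in for LexicalEventListener. LexicalInfoSketch, its ignoreLexical flag, and the string labels it records are hypothetical and only illustrate the documented behavior.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical sketch: records a flat list of node descriptions instead of a DOM tree.
class LexicalInfoSketch extends DefaultHandler2 {
    private final boolean ignoreLexical;
    final List<String> nodes = new ArrayList<>();
    private final StringBuilder pending = new StringBuilder();  // text not yet recorded

    LexicalInfoSketch(boolean ignoreLexical) {
        this.ignoreLexical = ignoreLexical;
    }

    // Records buffered characters as one node of the given kind.
    private void flush(String kind) {
        if (pending.length() > 0) {
            nodes.add(kind + "(" + pending + ")");
            pending.setLength(0);
        }
    }

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) { flush("text"); }
    @Override public void endElement(String uri, String localName, String qName) { flush("text"); }
    @Override public void characters(char[] ch, int start, int length) { pending.append(ch, start, length); }

    @Override public void comment(char[] ch, int start, int length) {
        if (ignoreLexical) return;            // comments are dropped entirely
        flush("text");
        nodes.add("comment(" + new String(ch, start, length) + ")");
    }

    @Override public void startCDATA() { if (!ignoreLexical) flush("text"); }
    @Override public void endCDATA()   { if (!ignoreLexical) flush("cdata"); }

    static List<String> record(String xml, boolean ignoreLexical) {
        try {
            LexicalInfoSketch handler = new LexicalInfoSketch(ignoreLexical);
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.nodes;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

When lexical information is ignored, the CDATA boundaries disappear and the section's characters merge with the neighboring text, which is why the resulting tree is smaller.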

characters

public void characters(char[] buf,
                       int offset,
                       int len)
                throws SAXException
SAX DocumentHandler callback, not for general application use. Reports text which is part of the document, and which will be stored as a Text node.

Some parsers report "ignorable" whitespace through this interface, which can cause portability problems. That's because there is no safe way to discard it from a parse tree without accessing DTD information, of a type which DOM doesn't expose and most applications won't want to deal with. Avoid using such parsers.

Parameters:
buf - holds text characters
offset - initial index of characters in buf
len - how many characters are being passed
Throws:
SAXException - as appropriate

ignorableWhitespace

public void ignorableWhitespace(char[] buf,
                                int offset,
                                int len)
                         throws SAXException
SAX DocumentHandler callback, not for general application use. Reports ignorable whitespace; if lexical information is not ignored (by default, it is ignored) the whitespace reported here is recorded in a DOM text (or CDATA, as appropriate) node.
Parameters:
buf - holds text characters
offset - initial index of characters in buf
len - how many characters are being passed
Throws:
SAXException - as appropriate

processingInstruction

public void processingInstruction(java.lang.String name,
                                  java.lang.String instruction)
                           throws SAXException
SAX DocumentHandler callback, not for general application use. Reports that a processing instruction was found.

Some applications may want to intercept processing instructions by overriding this method as one way to make such instructions take immediate effect during parsing, or to ensure that processing instructions in DTDs aren't ignored.

Parameters:
name - the processor to which the instruction is directed
instruction - the text of the instruction (no leading spaces)
Throws:
SAXParseException - if XML namespace support is enabled and the name contains a colon.
SAXException - as appropriate
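
Intercepting processing instructions by overriding the callback, as suggested above, follows the usual handler-subclass pattern. The sketch below shows it against the JDK's SAX2 DefaultHandler; PiInterceptSketch and collect are hypothetical names. A subclass of this builder would presumably also call the inherited method so the instruction still reaches the tree, which is an assumption rather than something stated here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: act on processing instructions as soon as they are reported.
class PiInterceptSketch extends DefaultHandler {
    final List<String> seen = new ArrayList<>();

    @Override
    public void processingInstruction(String target, String data) {
        // An application could apply the instruction here, while parsing continues.
        seen.add(target + "=" + data);
    }

    static List<String> collect(String xml) {
        try {
            PiInterceptSketch handler = new PiInterceptSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```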

startParsedEntity

public void startParsedEntity(java.lang.String name)
                       throws SAXException
LexicalEventListener callback, not for general application use. Reports the beginning of processing for a general entity.

If this builder is set to record lexical information (by default it ignores such information) then this callback arranges that an entity reference node hold data that is reported until the matching endParsedEntity callback. Otherwise that data is treated like any other content found in a document (and will not be marked as readonly).

Specified by:
startParsedEntity in interface LexicalEventListener
Parameters:
name - identifies the parsed general entity whose expansion will be represented in the DOM tree.

endParsedEntity

public void endParsedEntity(java.lang.String name,
                            boolean included)
                     throws SAXException
LexicalEventListener callback, not for general application use. Reports that the parser finished handling a general entity. If an entity reference was being recorded, this callback marks the entity being expanded as read only.
Specified by:
endParsedEntity in interface LexicalEventListener
Parameters:
name - identifies the parsed general entity whose expansion will be represented in the DOM tree.
included - lets a nonvalidating XML parser tell applications about any external entities that were recognized but not included.

