Java Project X
Release Notes

Version: Technology Release 1

This document contains notes that may help you use this library more effectively.

Please feel free to send problem reports, questions, and other feedback to the feedback alias, <xml-feedback@java.sun.com>. With respect to new feature requests, please keep in mind that we want to see packages built on top of this core for most features. The core API is intended to facilitate such a layered architecture for value-added products that leverage XML.

Conformance

The parsers conform to the W3C's XML 1.0 recommendation. Sun has done extensive testing to ensure that they conform as closely as possible to this recommendation.
The parse tree supports the XML (core) part of W3C's DOM Level 1 recommendation.
In combination, the two also support the current W3C XML Namespaces Recommendation.
The parser supports the SAX 1.0 API. Sun has done extensive testing to ensure that it conforms as closely as possible to this API.
The entity resolution used within the parser normally conforms to the IETF's RFC 2376 registration for XML-related MIME content types. This can be overridden as required. (See below; overriding may be needed since many web servers do not conform to that specification, and report incorrect character set encoding information.)
This parser supports all of the character encodings supported by the Java platform with which it is used. See the package overview for the com.sun.xml.parser package for more detailed information, including names of specific encodings that are widely used.
When used in a supported configuration (JDK 1.1.6 and later), this software is Y2K compliant; it has no date-related content.

Parser

There are two separate parsers, which share almost all of their code. The validating parser is slightly slower since it performs additional error checking.
Because SAX specifies that validation errors are ignored by default, the validating parser normally accepts invalid documents (when they are well formed). To reject such documents, use an implementation of ErrorHandler with an error() method which throws its argument; or use the new ValidatingParser(true) constructor, which uses a non-default error handler.
Both parsers normally read all external entities, and so their behavior is all but completely defined by the XML specification. An exception is that for valid standalone documents you may enable a "fast standalone" mode, which skips external parts of the DTD if validation is not being performed. Accordingly, that mode may not report the correct errors for invalid documents.
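The error-handler approach can be sketched as follows. This is a minimal illustration, assuming the JAXP validating parser bundled with a current JDK rather than the ValidatingParser class described in these notes; the class names StrictErrorHandler and StrictValidationDemo are hypothetical. Only the org.xml.sax.ErrorHandler interface is taken from SAX itself.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

/** Hypothetical handler that turns validity errors into exceptions. */
class StrictErrorHandler implements ErrorHandler {
    @Override
    public void warning(SAXParseException e) {
        // Warnings are not fatal; report them and keep parsing.
        System.err.println("warning: " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        // Validity errors arrive here; throwing the argument rejects the document.
        throw e;
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // Well-formedness errors are always fatal.
        throw e;
    }
}

/** Hypothetical driver; assumes the JDK's JAXP parser, not Sun's ValidatingParser. */
class StrictValidationDemo {
    /** Returns true if the document parses cleanly, false if the parser rejects it. */
    static boolean accepts(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new StrictErrorHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this handler installed, a well-formed document that violates its DTD is rejected instead of being silently accepted.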
Please let us know about any diagnostics produced by the parser which are misleading or confusing.
Whenever you work with text encodings other than UTF-8 and UTF-16, you should put an encoding declaration at the very beginning of all your XML files (including DTDs). If you don't do this, the parser will not be able to determine the encoding being used, and will probably be unable to parse your document. A text declaration like <?xml version='1.0' encoding='euc-jp'?> says that the document uses the "euc-jp" encoding.
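The effect of the encoding declaration can be sketched as follows. This is an illustration only, assuming the JAXP parser bundled with a current JDK; the class and method names are hypothetical, and ISO-8859-1 stands in for "euc-jp" because every Java platform is required to support it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical demo; assumes the JDK's JAXP parser. */
class EncodingDeclarationDemo {
    /** Parses raw bytes and returns the text content of the root element. */
    static String rootText(byte[] xmlBytes) {
        try {
            // Passing a byte stream lets the parser pick the encoding itself,
            // using the encoding declaration when one is present.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a Latin-1 encoded document whose declaration names that encoding. */
    static byte[] latin1Document() {
        String xml = "<?xml version='1.0' encoding='ISO-8859-1'?>"
                + "<greeting>caf\u00e9</greeting>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

If the declaration is dropped from the same bytes, the parser falls back to UTF-8 detection and the accented character is no longer a valid byte sequence.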
The parser currently reports warnings, rather than errors, in cases where the declared and actual text encodings don't match. It may give those same warnings in the common case where the encoding name used internally to Java is not the one used in the document. If the declared encoding is truly an error, you'll usually see other errors (not warnings) being reported by the parser.
Currently, many web servers report incorrect MIME types for XML documents using non-ASCII encodings. Such situations are indicated through the warning noted above, and are often followed by other errors (not warnings). You may work around such web server problems by using a properly configured com.sun.xml.parser.Resolver that disables the use of MIME typing information.
The parser currently does not report an error for content models which are not deterministic. Accordingly it may not behave well when given data which matches an "ambiguous" content model such as ((a,b)|(a,c)). DTDs with such models are in error, and must be restructured to be unambiguous. (In the example, (a,(b|c)) is an equivalent legal content model.)
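The restructured model can be checked with a short sketch. It assumes the JAXP validating parser bundled with a current JDK, not the parser described in these notes; the class name ContentModelDemo and the element names r, a, b, and c are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo; assumes the JDK's JAXP validating parser. */
class ContentModelDemo {
    // Deterministic form of the ambiguous model ((a,b)|(a,c)).
    static final String DTD =
            "<!DOCTYPE r ["
            + "<!ELEMENT r (a,(b|c))>"
            + "<!ELEMENT a EMPTY><!ELEMENT b EMPTY><!ELEMENT c EMPTY>"
            + "]>";

    /** Returns true if the given root element is valid against DTD. */
    static boolean isValid(String body) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(DTD + body)),
                    new DefaultHandler() {
                        @Override
                        public void error(SAXParseException e) throws SAXException {
                            // Treat validity errors as failures.
                            throw e;
                        }
                    });
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both sequences that the ambiguous model was meant to allow, a followed by b and a followed by c, validate against the rewritten model, while a alone does not.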
If you are using JDK 1.1 with large numbers of symbols (more than can be counted in sixteen bits) you might run into the message "panic: 16-bit string hash table overflow" as the Java VM aborts. JDK 1.2 does not have this limitation.

Object Model

Conforming to the XML specification, the parser reports all whitespace to the DOM even if it's meaningless. Many applications do not want to see such whitespace. You can remove it by invoking the ElementNode.normalize method, which merges adjacent text nodes and also canonicalizes adjacent whitespace into a single space (unless the xml:space="preserve" attribute prevents that).
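For applications that only have the standard org.w3c.dom interfaces available, a related cleanup can be sketched as follows. This is not the ElementNode.normalize method described above: the standard Node.normalize() merges adjacent text nodes but does not collapse whitespace, so the hypothetical helper below removes whitespace-only text nodes instead. It assumes the JAXP parser bundled with a current JDK, and it checks xml:space="preserve" only on the element itself, not on its ancestors.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper built on standard DOM calls only. */
class WhitespaceStripper {
    /** Recursively removes text nodes that contain nothing but whitespace. */
    static void strip(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "preserve".equals(((Element) node).getAttribute("xml:space"))) {
            return; // Leave this subtree untouched.
        }
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // Fetch before a possible removal.
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                strip(child);
            }
            child = next;
        }
    }

    /** Parses a string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A typical call is WhitespaceStripper.strip(doc.getDocumentElement()) right after parsing, before the application walks the tree.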
Currently, attribute nodes may not have children. Access their values as strings, instead of enumerating children.
When this DOM implementation is driven by a SAX parser (such as Sun's), certain kinds of nodes will not appear in DOM trees. When using Sun's parser, you may request some such nodes be reported; use the XmlDocumentBuilder.setIgnoringLexicalInfo(false) method, and invoke SAX with such a builder as its document handler. Such nodes include:
- Ignorable whitespace ... for "children" and "empty" content models, whitespace in the element content is normally discarded.
- Comments ... are always directed towards humans. Programs should only use processing instructions.
- CDATA Sections ... these are just an alternative encoding for text (with different delimiters), so they are reported as text nodes.
- Document Type Declarations ... DOM only provides partial support for exposing DTDs. Sun's parser exposes information required by DOM, and saves the full text of the <!DOCTYPE ...> declaration for use when printing.
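The difference between a tree with and without such lexical nodes can be sketched with standard JAXP switches. This example does not use XmlDocumentBuilder.setIgnoringLexicalInfo; it assumes the DocumentBuilderFactory bundled with a current JDK, whose setIgnoringComments and setCoalescing options control comments and CDATA sections in a comparable way. The class name LexicalInfoDemo and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical demo; assumes the JDK's JAXP DOM builder. */
class LexicalInfoDemo {
    /** Parses xml and returns its root, keeping or dropping lexical nodes. */
    static Element parseRoot(String xml, boolean keepLexicalInfo) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // When lexical info is dropped, comments vanish and CDATA
            // sections are folded into ordinary text.
            factory.setIgnoringComments(!keepLexicalInfo);
            factory.setCoalescing(!keepLexicalInfo);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts the direct children of parent that have the given node type. */
    static int countChildren(Element parent, short nodeType) {
        int count = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == nodeType) {
                count++;
            }
        }
        return count;
    }
}
```

Either way the text content of the element is the same; only the node types that the application sees while walking the tree differ.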
Currently, when documents are cloned the clone will not have a clone of the associated ElementFactory or DocumentType.
The in-memory representation of text nodes has not been tuned to be efficient with respect to space utilization.

Other Issues

This software is not a "Java Standard Extension" for XML processing.
Note that in this release, source code is provided for your reference. This source code may not be redistributed.
Also note that you may redistribute classes in this release with your software, so long as those classes have not been modified.
The XML messaging example (with a servlet) requires JDK 1.2 on the server side. You may work around this requirement by changing the client to use the nonstandard encoding name "UTF8" (no dash) rather than the standard "UTF-8" encoding name (with a dash) as it now does. Such workarounds may prevent alternative server side implementations, such as CGI scripts, from being able to accept these messages.

Changes since Early Access 2 (EA2)

Some API cleanup and improvement was done. Some classes which could not be subclassed are no longer documented, and some methods have moved to different classes or been renamed.
The parsers will now optionally report more "lexical" information in element content. They have always reported ignorable whitespace, but now also report CDATA text delimiters as well as comments and entity expansions. By default, this DOM implementation will not record such information.
Diagnostic messages can now be localized. This ships with English message catalogs, but additional catalogs can be provided. Diagnostics will use the Locale provided to the parser (or XmlDocument), rather than any default locale. This permits multi-language servers to provide diagnostics in a language appropriate to each client.
The com.sun.xml.tree.TreeWalker utility class has been updated with removeCurrent() and reset() methods.
Minor bugfixes have been implemented; no major bugs were reported.

Changes since Early Access 1 (EA1)

DOM support conforms to the W3C recommendation, not the proposed recommendation. Changes include renaming the "Attribute" interface to "Attr" to fix a conflict with a reserved word in OMG-IDL.
DOM space and time performance was improved.
Most classes in the com.sun.xml.tree package are now private. There are new interfaces. Some of the APIs have changed.
In the com.sun.xml.parser package, the static methods for checking character and name types have been moved out of the parser class for clarity. New interfaces have been defined, for better DOM support.
There is now an ElementFactory interface which supports namespace-aware creation of elements. You can have two element types, distinguished only by namespace URIs, which instantiate as different DOM elements. SimpleElementFactory provides a table-driven implementation of this functionality.
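The idea of a table-driven, namespace-aware factory can be sketched in a few lines. This is a self-contained illustration of the concept only: it does not reproduce the ElementFactory interface or the SimpleElementFactory class, and the class name NamespaceElementTable and its register and create methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical lookup table keyed by namespace URI plus local name. */
class NamespaceElementTable {
    private final Map<String, Supplier<Object>> table = new HashMap<>();

    /** Builds a key such as "{urn:a}title", so equal local names in different namespaces stay distinct. */
    private static String key(String namespaceUri, String localName) {
        return "{" + (namespaceUri == null ? "" : namespaceUri) + "}" + localName;
    }

    /** Associates an element type with the code that instantiates it. */
    void register(String namespaceUri, String localName, Supplier<Object> constructor) {
        table.put(key(namespaceUri, localName), constructor);
    }

    /** Creates an instance for the element type, or returns null if none is registered. */
    Object create(String namespaceUri, String localName) {
        Supplier<Object> constructor = table.get(key(namespaceUri, localName));
        return constructor == null ? null : constructor.get();
    }
}
```

Registering the local name "title" under two different namespace URIs yields two independent entries, so each lookup instantiates its own class.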
Minor bugfixes have been implemented, including implementing some missing DOM functionality.
New examples have been provided, including one using XML with HTTP for messaging.