Java Project X
Release Notes
Version: Technology Release 1
This document contains notes that may help you use this library
more effectively.
Please feel free to send problem reports,
questions, and other feedback to the feedback alias,
<xml-feedback@java.sun.com>.
With respect to new feature requests, please keep in mind that we
want to see packages built on top of this core for most features.
The core API is intended to facilitate such a layered architecture
for value-added products that leverage XML.
Conformance
- The parsers conform to the W3C's XML 1.0
recommendation. Sun has done extensive testing to ensure that
they conform as closely as possible to this recommendation.
- The parse tree supports the XML (core) part of the W3C's
DOM Level 1 recommendation.
- In combination, the two also support the current W3C
XML Namespaces recommendation.
- The parser supports the SAX 1.0 API. Sun has done extensive
testing to ensure that it conforms as closely as possible to this API.
- The entity resolution used within the parser normally
conforms to the IETF's RFC 2376 registration for XML-related
MIME content types. This can be overridden as required.
(See below; overriding may be needed since many web servers do
not conform to that specification, and report incorrect character
set encoding information.)
- This parser supports all of the character encodings supported
by the Java platform with which it is used. See the package
overview for the com.sun.xml.parser package for more
detailed information, including names of specific encodings that
are widely used.
- When used in a supported configuration (JDK 1.1.6 and later),
this software is Y2K compliant; it has no date-related content.
Parser
- There are two separate parsers, sharing almost all the same
code. The validating parser is slightly slower since it performs
additional error checking.
- Because SAX specifies that validation errors are ignored by default,
the validating parser normally accepts invalid documents (when they
are well formed). To reject such documents, use an implementation of
ErrorHandler whose error() method throws its argument, or
use the new ValidatingParser(true) constructor, which
uses a non-default error handler.
- Both parsers normally read all external entities, and so their
behavior is almost completely defined by the XML specification. The
exception is that for valid standalone documents you may enable a
"fast standalone" mode, which skips external parts of the DTD when
validation is not being performed. In that mode the parser may not
report the correct errors for invalid documents.
- Please let us know about any diagnostics produced by the
parser which are misleading or confusing.
- Whenever you work with text encodings other than UTF-8 and
UTF-16, you should put an encoding declaration at the very beginning of
all your XML files (including DTDs). If you don't do this, the
parser will not be able to determine the encoding being used, and
will probably be unable to parse your document. A text declaration
like
<?xml version='1.0' encoding='euc-jp'?>
says
that the document uses the "euc-jp" encoding.
- The parser currently reports warnings, rather than errors,
in cases where the declared and actual text encodings don't match.
It may give those same warnings in the common case where the encoding
name used internally to Java is not the one used in the document.
If the declared encoding is truly an error, you'll usually see other
errors (not warnings) being reported by the parser.
- Currently, many web servers report incorrect MIME types for
XML documents using non-ASCII encodings. Such cases trigger the
warning noted above, which is often followed by other errors (not
warnings). You may work around such web server problems with a
properly configured com.sun.xml.parser.Resolver that disables the
use of MIME typing information.
- The parser currently does not report an error for content
models which are not deterministic. Accordingly it may not behave
well when given data which matches an "ambiguous" content model
such as ((a,b)|(a,c)). DTDs with such models are in
error, and must be restructured to be unambiguous. (In the example,
(a,(b|c)) is an equivalent legal content model.)
- If you are using JDK 1.1 with large numbers of symbols
(more than can be counted in sixteen bits), the Java VM may abort
with the message "panic: 16-bit string hash table overflow".
JDK 1.2 does not have this limitation.
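A minimal sketch of the strict error handler described in the validation note above. It implements the standard org.xml.sax.ErrorHandler interface (SAX 1.0 and SAX 2.0 share its warning(), error() and fatalError() methods), so with this release you would pass it to the SAX 1.0 Parser.setErrorHandler() method. To keep the sketch runnable on its own, the hypothetical helper isValid() drives the JDK's built-in JAXP validating parser as a stand-in for this release's validating parser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

/** Error handler that turns validity errors into parse failures. */
class StrictErrorHandler implements ErrorHandler {
    @Override
    public void warning(SAXParseException e) {
        // Warnings (such as encoding mismatches) are reported, not fatal.
        System.err.println("warning: " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) throws SAXParseException {
        // SAX ignores validity errors by default; rethrow to reject the document.
        throw e;
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        // Well-formedness errors are always fatal.
        throw e;
    }

    /** Hypothetical helper: returns true if xml is well formed and valid. */
    static boolean isValid(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // JAXP stand-in for a validating parser
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setErrorHandler(new StrictErrorHandler());
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }
}
```

Rethrowing from error() makes the first validity error stop the parse, which is the behavior the note asks for; an application that wants a full list of problems could instead collect the exceptions and decide afterwards.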
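The encoding-declaration advice above can be illustrated with a small sketch that writes the declaration and the bytes with the same charset, so the two cannot disagree. The class and method names are hypothetical, and the JDK's built-in JAXP parser stands in for this release's parser when reading the bytes back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper that keeps the declared and actual encodings in sync. */
class EncodedXml {
    /** Encodes body with encodingName, preceded by a matching declaration. */
    static byte[] write(String body, String encodingName) throws IOException {
        // Throws if this JVM does not support the named charset.
        Charset charset = Charset.forName(encodingName);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charset)) {
            // The declaration must be the very first thing in the file.
            w.write("<?xml version='1.0' encoding='" + encodingName + "'?>");
            w.write(body);
        }
        return out.toByteArray();
    }

    /** Parses raw bytes; the parser learns the encoding from the declaration. */
    static Document parse(byte[] bytes) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }
}
```

To produce a document like the "euc-jp" example, call EncodedXml.write(body, "EUC-JP") on a JDK that includes that charset.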
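The MIME-typing workaround above relies on com.sun.xml.parser.Resolver, whose configuration methods are not shown here. As a hedged illustration of the same idea using only the standard org.xml.sax.EntityResolver interface, the hypothetical NoMimeResolver below hands the parser a raw byte stream with no encoding set, so the encoding is detected from the byte-order mark and the XML or text declaration rather than from a charset reported by the web server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver that ignores any MIME charset a server reports. */
class NoMimeResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        if (systemId == null) {
            return null; // let the parser fall back to its default behavior
        }
        // Open the raw bytes; no Content-Type header is consulted or passed on.
        InputStream in = URI.create(systemId).toURL().openStream();
        InputSource source = new InputSource(in);
        source.setPublicId(publicId);
        source.setSystemId(systemId); // keeps relative URIs resolvable
        // No setEncoding() call: the parser autodetects the encoding as
        // described in the XML 1.0 specification.
        return source;
    }
}
```

Install it with the parser's setEntityResolver() method. It applies to external entities such as DTDs; the document itself is opened by the application, which can likewise pass the parser a plain byte stream.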
Object Model
- Conforming to the XML specification, the parser reports all
whitespace to the DOM even if it's meaningless. Many applications
do not want to see such whitespace. You can remove it by invoking
the ElementNode.normalize method, which merges adjacent text
nodes and also canonicalizes adjacent whitespace into a single space
(unless the xml:space="preserve" attribute prevents that).
- Currently, attribute nodes may not have children. Access their
values as strings, instead of enumerating children.
- When this DOM implementation is driven by a SAX parser (such
as Sun's), certain kinds of nodes will not appear in DOM trees.
When using Sun's parser, you may request some such nodes be reported;
use the XmlDocumentBuilder.setIgnoringLexicalInfo(false)
method, and invoke SAX with such a builder as its document handler.
Such nodes include:
  - Ignorable whitespace ... for "children" and "empty" content
  models, whitespace in the element content is normally discarded.
  - Comments ... are always directed towards humans.
  Programs should rely only on processing instructions.
  - CDATA Sections ... these are just an alternative encoding
  for text (with different delimiters), so they are reported as
  text nodes.
  - Document Type Declarations ... DOM only provides partial
  support for exposing DTDs. Sun's parser exposes the information
  required by DOM, and saves the full text of the
  <!DOCTYPE ...> declaration for use when printing.
- Currently, when documents are cloned the clone will not have a
clone of the associated ElementFactory or DocumentType.
- The in-memory representation of text nodes has not been tuned
to be efficient with respect to space utilization.
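ElementNode.normalize, described above, is specific to this release. The standard DOM Node.normalize() only merges adjacent text nodes and does not touch whitespace, so the sketch below uses a different, more aggressive technique with the standard org.w3c.dom API: it deletes whitespace-only text nodes outright instead of collapsing them to a single space, while still honoring xml:space="preserve". The class name is hypothetical.

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical utility that drops whitespace-only text nodes. */
class WhitespaceStripper {
    /** Removes whitespace-only text below node, except under xml:space="preserve". */
    static void strip(Node node) {
        if (node instanceof Element
                && "preserve".equals(((Element) node).getAttribute("xml:space"))) {
            return; // the document asked for this subtree's whitespace to be kept
        }
        node.normalize(); // standard DOM: merge adjacent text nodes first
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // save before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                strip(child);
            }
            child = next;
        }
    }
}
```

Deleting rather than collapsing is only safe for element content where whitespace carries no meaning; for mixed content, collapsing as ElementNode.normalize does is the gentler choice.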
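XmlDocumentBuilder.setIgnoringLexicalInfo(false), mentioned above, is this release's switch for keeping lexical nodes. For comparison only, the later standard JAXP DocumentBuilderFactory exposes similar controls; the sketch below uses setIgnoringComments() and setCoalescing() (both already false by default, set here explicitly for clarity) to keep comments and CDATA sections as distinct nodes. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper that builds a DOM keeping lexical nodes. */
class LexicalDom {
    /** Parses xml, keeping Comment and CDATASection nodes in the tree. */
    static Document parseKeepingLexicalInfo(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(false); // keep Comment nodes
        factory.setCoalescing(false);       // keep CDATA sections separate from text
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```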
Other Issues
- This software is not a "Java Standard Extension" for
XML processing.
- Note that in this release, source code is provided for
your reference. This source code may not be redistributed.
- Also note that you may redistribute classes in this
release with your software, so long as that software has
not been modified.
- The XML messaging example (with a servlet) requires
JDK 1.2 on the server side. To run the server on JDK 1.1, you may
work around the problem by changing the client to use the nonstandard
encoding name "UTF8" (no dash) rather than the standard "UTF-8"
encoding name (with a dash) that it now uses. Such workarounds may
prevent alternative server-side implementations, such as CGI scripts,
from accepting these messages.
Changes since Early Access 2 (EA2)
- Some API cleanup and improvement was done. Some classes which
could not be subclassed are no longer documented, and some methods
have moved to different classes or been renamed.
- The parsers will now optionally report more "lexical" information
in element content. They have always reported ignorable whitespace,
but now also report CDATA text delimiters as well as comments and
entity expansions. By default, this DOM implementation will not
record such information.
- Diagnostic messages can now be localized. This release ships with English
message catalogs, but additional catalogs can be provided. Diagnostics
will use the Locale provided to the parser (or XmlDocument),
rather than any default locale. This permits multi-language servers
to provide diagnostics in a language appropriate to each client.
- The com.sun.xml.tree.TreeWalker utility class has
been updated with removeCurrent() and reset()
methods.
- Minor bugfixes have been implemented; no major bugs were
reported.
Changes since Early Access 1 (EA1)
- DOM support conforms to the W3C recommendation, not
the proposed recommendation. Changes include renaming the
"Attribute" interface to "Attr" to fix a conflict with a
reserved word in OMG-IDL.
- DOM space and time performance was improved.
- Most classes in the com.sun.xml.tree package
are now private. There are new interfaces. Some of the APIs
have changed.
- In the com.sun.xml.parser package, the
static methods for checking character and name types have
been moved out of the parser class for clarity. New
interfaces have been defined, for better DOM support.
- There is now an ElementFactory interface which
supports namespace-aware creation of elements. You can have
two element types, distinguished only by namespace URIs, which
instantiate as different DOM elements. SimpleElementFactory
provides a table-driven implementation of this functionality.
- Minor bugfixes have been implemented, including implementing
some missing DOM functionality.
- New examples have been provided, including one using XML
with HTTP for messaging.
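The namespace-aware ElementFactory described in the EA1 changes can be pictured with a small table-driven sketch. It does not use this release's ElementFactory or SimpleElementFactory APIs; every class and method name below is hypothetical, and plain Java classes stand in for DOM element implementations. The point it shows is that the lookup key combines the namespace URI with the local name, so two elements named "table" in different namespaces instantiate as different classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical base class standing in for a DOM element implementation. */
class Elem {
    String namespaceUri;
    String tagName;
}

/** Hypothetical element class for <table> in the XHTML namespace. */
class HtmlTable extends Elem {}

/** Hypothetical element class for <table> in a furniture-catalog namespace. */
class FurnitureTable extends Elem {}

/** Hypothetical table-driven factory keyed by namespace URI and local name. */
class TableElementFactory {
    private final Map<String, Supplier<? extends Elem>> table = new HashMap<>();

    // "{uri}local" key; no namespace (null) maps to "{}local".
    private static String key(String namespaceUri, String localName) {
        return "{" + (namespaceUri == null ? "" : namespaceUri) + "}" + localName;
    }

    void register(String namespaceUri, String localName, Supplier<? extends Elem> ctor) {
        table.put(key(namespaceUri, localName), ctor);
    }

    Elem create(String namespaceUri, String tagName) {
        // Strip any prefix: "f:table" and "table" share the local name "table".
        String localName = tagName.substring(tagName.indexOf(':') + 1);
        Supplier<? extends Elem> ctor = table.get(key(namespaceUri, localName));
        Elem element = (ctor != null) ? ctor.get() : new Elem(); // generic fallback
        element.namespaceUri = namespaceUri;
        element.tagName = tagName;
        return element;
    }
}
```

Keying on the namespace URI rather than the prefix matters because prefixes are chosen per document; two documents may bind different prefixes to the same URI and should still get the same element class.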