IBM's XML Parser for Java - README

IBM's XML for Java (XML4J) is an alpha-level Extensible Markup Language (XML) processor written in Java. It provides two main functions:

Parsing an XML document and constructing a Java object tree
Generating an XML document from a Java object tree
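The two functions above can be sketched with the XML APIs built into current JDKs (JAXP: javax.xml.parsers and javax.xml.transform). This illustrates the parse-then-generate round trip only; it is not the XML4J API, and the class name RoundTrip and the sample document are made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch of the two main functions using the JDK's JAXP APIs (not the XML4J API).
class RoundTrip {
    // Function 1: parse an XML document and build an object tree (a DOM Document).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse input", e);
        }
    }

    // Function 2: generate an XML document from an object tree.
    static String generate(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer(); // identity transform
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("cannot serialize tree", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<personnel><person id=\"p1\">Big Boss</person></personnel>");
        // Edit the tree in memory, then write it back out as XML text.
        Element added = doc.createElement("person");
        added.setAttribute("id", "p2");
        added.setTextContent("Worker");
        doc.getDocumentElement().appendChild(added);
        System.out.println(generate(doc));
    }
}
```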

Installation
Sample Applications
Program Development
	Programming Guide
	API Documentation
Release Notes
Changes
To Do
Limitations of MS JVM
Limitations of SUN JVM
Contact
Frequently Asked Questions

Installation

Windows95, WindowsNT, OS/2 (ZIP archive)

Install JDK-1.1.
Install the unzip or WinZip executable.
Download the XML4J distribution package in ZIP format.

Unzip the distribution package, xml4j_n_n_n.zip, into a directory:

C:\>unzip some_directory\xml4j_n_n_n.zip
C:\>cd xml4j
C:\xml4j>

You will see the following files in the xml4j directory:

FAQ.html FAQ

README.html this file

license.html license information

apiDocs\ directory for API documents

docs\ directory for documents

data\personal.dtd sample DTD file

data\personal.xml sample XML document

src\ directory for source code

xml4j_n_n_n.jar contains parser class files

xml4jSamples_n_n_n.jar contains samples class files

scripts\ directory for build scripts

samples\ sample XML4J applications

Try the following command to test your installation. This test parses the input and then regenerates the same XML document.

C:\xml4j>type data\personal.xml
C:\xml4j>jre -cp xml4j_n_n_n.jar;xml4jSamples_n_n_n.jar samples.XJParse.XJParse -d data\personal.xml

Applying the JIT patch described here is required only if you have installed JDK 1.1.6 or you experience a fatal run-time error while invoking 'XJParse'.
This fatal error is caused by a bug in the JIT compiler (symcjit.dll) shipped with JDK 1.1.6. The fix is to apply a patch, which can be downloaded from the JavaSoft website: http://www.javasoft.com/products/jdk/1.1/download-jdk-windows.html

Installing the patch involves replacing symcjit.dll with the new one.

UNIX

Install JDK-1.1 and GNU gzip.
Download a distribution package in .tar.gz format. (If you have installed the unzip command for UNIX, the ZIP format also works.)

Extract the distribution package into a directory.

# cd /usr/local
# gzip -dc some_directory/xml4j.n.n.n.tar.gz | tar xvf -
# cd xml4j

You will see the following files in the xml4j directory:

FAQ.html FAQ

README.html this file

license.html license information

api/ directory for API documents

docs/ directory for documents

data/personal.dtd sample DTD file

data/personal.xml sample XML document

src/ directory for source code

xml4j_n_n_n.jar contains parser class files

xml4jSamples_n_n_n.jar contains samples class files

scripts/ directory for build scripts

samples/ sample XML4J applications

Try the following command to test your installation. This program parses the input and then regenerates the same XML document.

# cat data/personal.xml
# jre -cp "xml4j_n_n_n.jar:xml4jSamples_n_n_n.jar" samples.XJParse.XJParse -d data/personal.xml

Sample Applications

Some of the sample applications provided are described below. All classes required to run them are in xml4jSamples_n_n_n.jar. Remember that 'jre' ignores the CLASSPATH environment variable, so you have to specify any non-standard .jar files (such as Swing) explicitly with the -cp option:

samples.XJParse.XJParse (Java application, previously named 'trlx'):

XJParse is an XML syntax checker. To check an XML document, type:

jre -cp xml4j_n_n_n.jar;xml4jSamples_n_n_n.jar samples.XJParse.XJParse -d <xml-filename>
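To illustrate what an XJParse-style syntax check involves, here is a minimal sketch of the same idea built on the SAX parser that ships with current JDKs (javax.xml.parsers), not on XML4J. WellFormedCheck is a hypothetical name, and unlike XJParse this sketch only reports well-formedness errors; it does not validate against a DTD.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Minimal XJParse-style syntax checker built on the JDK's SAX parser (not XML4J).
class WellFormedCheck {
    // Returns null if the input is well-formed, otherwise the parser's error message.
    static String check(InputSource source) {
        try {
            // DefaultHandler ignores content events and rethrows fatal errors.
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler());
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e); // I/O or configuration problem, not a syntax error
        }
    }

    static String check(String xml) {
        return check(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        for (String name : args) {
            String error = check(new InputSource(new File(name).toURI().toString()));
            System.out.println(name + ": " + (error == null ? "OK" : error));
        }
    }
}
```

For example, `java WellFormedCheck data/personal.xml` prints one line per file.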

SiteOutliner (Java application):

SiteOutliner is a Java application that scans a Web site and reports its profile in CDF format. The profile contains a list of links to the pages, showing the structure of the site. The user can limit the files to be scanned by conditions such as file type (extension) and modification date. The program can be run both from a command prompt and in a windowed environment.

CDF Editor (Java application):

CDF Editor is a Java application to edit CDF files. The user loads a CDF file and edits the channels and items.

jre -cp xml4j_n_n_n.jar;xml4jSamples_n_n_n.jar samples.CdfEditor.CdfEditor

CDF Viewer (Java applet):

CDF Viewer is an applet that parses CDF files and visualizes their structures by using a tree.

Validating Generation sample (Java application):

This sample generates a valid element tree according to the specified DTD (specify the full path name of personal.dtd).

jre -cp xml4j_n_n_n.jar;xml4jSamples_n_n_n.jar samples.Miscellaneous.GeneratingSample e:\xml4j\data\personal.dtd

XML Tree-View (Java application):

A sample application using com.ibm.xml.parser.util.TreeFactory. You need to install JFC 1.1 (Swing-1.0) to run this program.

jre -cp "xml4j_n_n_n.jar;xml4jSamples_n_n_n.jar;C:/swing-1.0.2/swingall.jar" samples.Miscellaneous.TreeView data\personal.xml

XPointer Demonstration (Java application):

A sample application using com.ibm.xml.xpointer package. You need to install JFC 1.1 (Swing-1.0) to run this program.

jre -cp "xml4j_n_n_n.jar;xml4jSamples_n_n_n.jar;C:/swing-1.0.2/swingall.jar" samples.Miscellaneous.XPointerDemo data\personal.xml

This program has two functions:

Click a node: the XPointer expression for the clicked node is displayed in a text field.
Enter an XPointer expression and press the "Go" button: the nodes pointed to by the XPointer are selected.
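The demo's two functions, turning a node into an expression and turning an expression back into nodes, can be sketched with the JDK's DOM. To stay short, this sketch uses a simplified positional path of its own (for example /r[1]/p[2]/x[1]) instead of real XPointer syntax, and it does not use the com.ibm.xml.xpointer package; NodePath and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical positional path such as "/personnel[1]/person[2]"; NOT real XPointer syntax.
class NodePath {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // "Click a node": describe an element by the 1-based position of it and each
    // ancestor among same-named element siblings.
    static String pathOf(Element element) {
        StringBuilder path = new StringBuilder();
        for (Node n = element; n instanceof Element; n = n.getParentNode()) {
            int position = 1;
            for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
                if (s instanceof Element && s.getNodeName().equals(n.getNodeName())) position++;
            }
            path.insert(0, "/" + n.getNodeName() + "[" + position + "]");
        }
        return path.toString();
    }

    // "Go": follow a path produced by pathOf() back to the element it names, or null.
    static Element resolve(Document doc, String path) {
        Node current = doc;
        for (String step : path.substring(1).split("/")) {
            int open = step.indexOf('[');
            String name = step.substring(0, open);
            int wanted = Integer.parseInt(step.substring(open + 1, step.length() - 1));
            Node match = null;
            int seen = 0;
            for (Node c = current.getFirstChild(); c != null && match == null; c = c.getNextSibling()) {
                if (c instanceof Element && c.getNodeName().equals(name) && ++seen == wanted) match = c;
            }
            if (match == null) return null;
            current = match;
        }
        return current instanceof Element ? (Element) current : null;
    }

    public static void main(String[] args) {
        Document doc = parse("<r><p/><q/><p><x/></p></r>");
        Element x = (Element) doc.getElementsByTagName("x").item(0);
        String path = pathOf(x);
        System.out.println(path);                    // /r[1]/p[2]/x[1]
        System.out.println(resolve(doc, path) == x); // true
    }
}
```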

On platforms where 'jre' is not available, you can run these samples with 'java' instead. To do so, add the parser (xml4j_n_n_n.jar) and samples (xml4jSamples_n_n_n.jar) jar files to the CLASSPATH environment variable.

Program Development

This distribution archive includes a file named xml4j_n_n_n.jar. Add this file to your CLASSPATH environment variable with a command such as:

set CLASSPATH=C:\xml4j\xml4j_n_n_n.jar;. (for Windows)

(assuming that you have installed XML for Java in C:\xml4j.)

setenv CLASSPATH /usr/local/xml4j/xml4j_n_n_n.jar:. (for UNIX, csh/tcsh)

export CLASSPATH="/usr/local/xml4j/xml4j_n_n_n.jar:." (for UNIX, ksh/bash/zsh)
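To confirm that the CLASSPATH setting took effect, a program can try to load one of the parser's classes. This minimal sketch assumes a current JDK (it uses the three-argument Class.forName and multi-catch, which JDK 1.1 lacks) and probes for com.ibm.xml.parser.SAXDriver, the SAX driver class named in the release notes; ClasspathCheck is a hypothetical name.

```java
// Reports whether a class can be loaded from the current class path.
class ClasspathCheck {
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate and load the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "com.ibm.xml.parser.SAXDriver";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found: check CLASSPATH"));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```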

The following resources are provided for application development:

Programming Guide
API Documentation

Release Notes

This version of the processor is based on the XML 1.0 Recommendation [10-Feb-1998]. The processor supports 37 encodings for `<?xml encoding="...."': ISO-10646-UCS-4, ISO-10646-UCS-2, UTF-8, UTF-16, US-ASCII, ISO-8859-1 ... ISO-8859-9, ISO-2022-JP, Shift_JIS, EUC-JP, GB2312, Big5, EBCDIC-CP-(US, CA, NL, DK, NO, FI, SE, IT, ES, GB, FR, AR1, HE, CH, ROECE, YU, IS, AR2)
Validating Generation: Applications can recognize information in the Document Type Definition (DTD) and generate a document that has correct structure. See `How to query DTD information' in the programming guide.
W3C Document Object Model (DOM) Level 1 Specification [01-Oct-1998] Support:
Simple API for XML (SAX) 1.0 Support: com.ibm.xml.parser.SAXDriver provides the SAX interface.
Namespaces in XML Proposed Recommendation [17-Nov-1998] Support: See the programming guide.
Element Digest: See the DOMHASH document.
XPointer package: the com.ibm.xml.xpointer package supports parsing XPointer expressions, generating an XPointer instance from a node in a document tree, and searching for the nodes pointed to by an XPointer instance.
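XML4J exposes SAX 1.0 through com.ibm.xml.parser.SAXDriver. The sketch below instead uses the SAX2 parser bundled with current JDKs, because its namespace-aware callbacks show both the SAX event model and namespace processing in a few lines. SAX 1.0 handlers look different (their startElement receives only the element name and an AttributeList), and SaxNames is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxNames {
    // Collects "{namespace-uri}local-name" for every start tag, in document order.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add("{" + uri + "}" + localName); // uri is "" for names in no namespace
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // report namespace URIs and local names
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse input", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<b:book xmlns:b=\"urn:example:books\"><b:title>XML</b:title><note/></b:book>";
        System.out.println(elementNames(xml));
    }
}
```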

CHANGES

19 Nov 1998

Release 1.1.9

19 Nov 1998

Fixed defects:

When using SAX, the entire document should not be loaded into memory.
NULL pointer exception in the sample CDFEditor.
Parser should not display warnings unless asked to.
XML4J code should not call printStackTrace().
Wild card defect in sample XJParse.
Stop using deprecated methods.
Start using JDK 1.2 JavaDoc for generating API documentation.

09 Nov 1998

Release 1.1.8

09 Nov 1998

Fixed defects:

The sample XJParse can now handle wild cards for filenames.
In the sample XJParse, one can turn off fetching the DTD even though the DOCTYPE line is specified.
Fixed defect in the UTF8 decoder. Can now parse Jim Clark's valid/sa/052.xml.
Parser crashes while handling NOTATIONS.
StringPool#expandTable() crashes.
Fix defect in the way language IDs are checked. They should conform to section 2.12 of the XML 1.0 Spec.
Normalization should handle entire sub-tree.
HTML DTD: TABLE content model's error.

30 Oct 1998

Release 1.1.7

30 Oct 1998

Fixed defects:

In an external DTD, a PE substitution at the end of a CDATA clause causes the parser to fail.
Parameter entity reference in an INCLUDE section generates error.
Bug in DTD getInsertableElementsForValidContent().
Make some methods in Parent class thread-safe.
Parser must check that default values for attributes conform to their type.

23 Oct 1998

Release 1.1.6

23 Oct 1998

Fixed defects:

Content Model matching fails against an entity reference.
ToXMLStringVisitor() prints entity declarations incorrectly.
TerminateSignal should not be ignored in readDTDStream().

09 Oct 1998

Release 1.1.5

09 Oct 1998

Fixed defects:

In the output of the parser, the 'quote' character is now represented as &quot; instead of &#x22;. Both Netscape and IE handle this correctly.

02 Oct 1998

Release 1.1.4

02 Oct 1998

Parser now conforms to DOM Level 1 Specification (01-Oct-1998).
Parser now additionally supports EBCDIC-CP-(DK, NO, FI, SE, IT, ES, GB, FR, AR1, HE, CH, ROECE, YU, IS, AR2) encodings.
Fixed defects:

SAX: Missing endElement() notification.
SAX: Distinguish fatalError() and error().

25 Sep 1998

Release 1.1.3

25 Sep 1998

Parser now supports EBCDIC-CP-(US, CA, NL) encodings ( = CP037 Java encoding).
Fixed defects:

SAX: reuse of parser and ErrorHandler calls.
Invalid peer exception in sample program 'GeneratingSample'.
CR's and CRLF's in CDATASection, Comment or PI's are not being replaced by LF.
Overwriting an attribute including entity references is incorrect.

18 Sep 1998

Release 1.1.2

18 Sep 1998

Parser can now be interrupted.
Fixed defects:

Fix byte mask in UTF8 Reader.

11 Sep 1998

Release 1.1.1

11 Sep 1998

Parser error message strings, in English, have been rewritten.
Documented in README.html a limitation of the MS JVM: lack of support for some IANA encodings.
Performance enhancements. Optimized input string buffering.
Fixed defects:

Parser can now handle URN's as SYSTEMID.

04 Sep 1998

Release 1.1.0

04 Sep 1998

Namespace API Change: New return values of TXElement#getNamespaceForQName() and TXElement#getNamespaceForPrefix() methods.
Fixed defects:

Conform Util.getInvalidURIChar() to RFC2396.
getDigest() for EntityReference doesn't work.
Don't do anything when newChild parameter equals oldChild parameter in Parent#replaceChild().
An adopted child must sever connections with original parents.
Reduce memory requirements when using SAX.
PIs with 0-length PI data such as <?foo?> crash the parser.
Parser reports errors when a PI occurs just before EOF in a document.
Add commands to XJParse for version # and name-space printing.
Leading '\' not handled properly in context of filenames.

28 Aug 1998

Release 1.0.9

28 Aug 1998

Support attribute-based namespace (WD-xml-names-19980802).
Two new sample DTD's are now bundled. HTML40frameset.xml.dtd and HTML40loose.xml.dtd.
Added printNonSpecifiedAttributes flag to ToXMLPrintVisitor.
Added '-stoponerror' command line option to samples.XJParse.XJParse.
Fixed defects:

Null pointer exception when com.ibm.xml.parser.SAXDriver.parse() is called a second time.
Validate only when target document has !DOCTYPE and one or more !ELEMENT declarations.
Parser should not stop after first validation error.
com.ibm.xml.parser.Parent.realInsert() should not call isCheckOwnerDocument() if it was not created by a factory.
#REQUIRED attributes return wrong getSpecified() flag.
Change ErrorListener.error() method's return type from void to int.
']]>' terminating conditional sections in DTD's are not recognized.
Can't replace the root element in TXDocument.
TXNodeList#replace() doesn't set next/previousSibling of removed Node to null.
HTMLPrintVisitor should not print comments in internal DTD.
Parser, by default, now stops parsing after an error occurs as required by the XML spec.

21 Aug 1998

Conformance to DOM Level 1 Proposed Recommendation [18-Aug-1998]
Changes due to above conformance:

Replaced the files in org.w3c.dom package by java-binding.zip in PR-DOM.
Renamed

NodeList#getSize() to NodeList#getLength()
NamedNodeMap#getSize() to NamedNodeMap#getLength()
NodeType symbols (Node.ELEMENT to Node.ELEMENT_NODE, etc.)

Added

Node#getOwnerDocument()

Removed

DocumentFragment#getMasterDoc()
Notation#setSystemId()
Notation#setPublicId()
Entity#setSystemId()
Entity#setPublicId()

Fixed defects:

Use Exception instead of IOException in API (parser\FormatPrintVisitor.java ...).
Hexadecimal character references cause errors.
TXAttribute#toXMLString() prints contents twice.
TXCDATASection#getNodeType() doesn't return Node.CDATA_SECTION
HTMLPrintVisitor can't print empty content like "<BODY></BODY>".
HTMLPrintVisitor should not print entity references.
ToXMLStringVisitor prints replaced text instead of entity references in attribute values.
Shell scripts now work with a public domain shell (Cygnus-Win32).
SAX resolveEntity() handler that returns an InputSource now works.
DOM: TXAttribute has only String value, no value as child nodes.
XPointer#point() doesn't work against a tree including EntityR.
Document inherits from Node instead of DocumentFragment.
Node#insertBefore()/replaceChild()/appendChild() check types of children.
Attribute#getParentNode()/getPreviousSibling()/getNextSibling() always returns null.
Element#getElementsByTagName() returns all elements when the parameter is "*".
Removed the ElementFactory interface because all factory functions are moved to the TXDocument class.

14 Aug 1998

The jar file has been split into two: one for parser binaries, and the other for samples binaries.
Fixed defects:

Updated the programming guide.
Reduced memory occupied by Tree nodes.
Removed dependency on Symantec's JIT patch. It is no longer necessary to install the JIT patch over JDK 1.1.6.
Shell scripts (to compile all sources) should now work.
Correctly compiling Message_ja.java with -EUCJIS encoding.

07 Aug 1998

Moved all sample applications to toplevel/samples directory.
Fixed defects:

TreeView sample crashes with Null pointer exception.
TXAttribute#toXMLString() prints attribute value twice.
DTD#makeContentElementList() doesn't return null for EMPTY/ANY elements.
Use UNIX new line conventions in tar distribution.
DOM: TXElement#normalize() isn't implemented.
XPointer#point(TXDocument) should be point(Document).

31 Jul 1998

Parser now conforms to DOM-19980720 spec.
Fixed defects:

DTD#getInsertableElementsForValidContent() doesn't return correct result.
ContentModel#checkAfterTargetPosition() is wrong.
TXComment is always printed as "".
Parser crashes by NullPointerException in init2().
TXPI("foo", "bar") is printed as "<?foobar?>".
DOM: Factory methods in TXDocument aren't used.
ToXMLStringVisitor and FormatPrintVisitor print an internal DTD subset twice, GeneralReference(&foo;) and the reference's contents.
Parent#insertAfter() doesn't work correctly.
Parser#readDTDStream() aborts by NullPointerException.

24 Jul 1998

Release 1.0.4

24 Jul 1998

Added SCCS revision control strings to source files.

22 Jul 1998

Updated documentation.
New exception classes defined for TreeTraversal.

7 Jul 1998

Added new parameter to Parser#parseSingleContent() for alias feature
Added new sample: com.ibm.xml.sample.Alias and alias.dtd, alias-sample.xml
Added Stderr#loadCatalog()
Added "-c catalogfile" option to trlx.

6 Jul 1998

Fixed a bug in TXElement#addTextElement()

Modified util.TreeFactory for current DefaultElementFactory

Replaced util.XHFactory with util.HTMLPrintVisitor

Util.backReference() doesn't convert ' and

The parser never warns about redefined entities for lt/gt/amp/quot/apos

Added new sample: com.ibm.xml.sample.HTMLPrint

3 Jul 1998

Moved Format#printSpace() to Util, Format#indent() to Util

Moved DefaultElementFactory#sortStringVector() to Util

Added new class: FormatPrintVisitor, and removed Format

Fixed a bug in Text#insert()

23 Jun 1998

Fixed a bug with a TextDecl in an external parameter entity in an external DTD subset (Parser.java, Token.java)
Fixed a bug where the encoding of a DTD wasn't set (Parser.java)

19 Jun 1998

Fixed a bug with parameter entity references in an IGNORE section.

19 Jun 1998

Release 1.0.0

18 Jun 1998

Removed com.ibm.xml.xpointer.Version
Change format of com.ibm.xml.parser.Version
Added DTD#getInsertableElementsForValidContent()

16 Jun 1998

Moved Parser#setNamespace() to TXDocument#setNamespaceParameters()

Added Parser#getNumberOfWarnings()

Added Parser#setEndBy1stError()

12 Jun 1998

Public release.

12 Jun 1998

Added new class: com.ibm.xml.xpointer.Pointed

Some changes for XPointer#point()

Renamed XPointerSample to XPointerDemo

Use javadoc in JDK-1.2beta3

11 Jun 1998

Removed com.ibm.xml.xpointer.RelTermArguments class

Added new method: XPointer#point()

Added new sample program: com.ibm.xml.sample.XPointerSample

10 Jun 1998

Renamed TXAttributeList#toArray() to makeArray() because of a conflict with java.util.Vector#toArray() in JDK-1.2beta

Fixed a bug in conditional sections in parameter entities.

Fixed a bug in TXElement#attributeElements()

Added new methods: Child#makeXPointer(), XPointer#makeXPointer(Child)

Some changes for xpointer package.

Added -xpointer option to trlx

8 Jun 1998

Fixed a bug in TXElement#searchAncestors()

Moved searchAncestors() from TXElement to Child

Renamed Namespace#getNSNs()/setNSNs() to Namespace#getNSName()/setNSName()

Removed Namespace#getNSPrefixName()/setNSPrefixName()

Added Namespace#getUniversalName()

Added TXElement#TXElement(TXDocument,String prefix,String localpart)

4 Jun 1998

Added

ElementFactory#createText(char[],int,int,boolean) and modified the parser to use this method instead of createText(String,boolean)

3 Jun 1998

Moved trlx and sample programs into com.ibm.xml.sample package

Added java.io.Serializable interface to object model classes.

Added new samples: com.ibm.xml.sample.SerializeSave and com.ibm.xml.sample.SerializeLoad

2 Jun 1998

Fixed some bugs in parameter entity handling

Fixed a bug in parsing NOTATION attributes

1 Jun 1998

Rewrite parameter entity processing

Integrate EntityValue and Entity

Added new method DTD#getEntity()

28 May 1998

Added Source#Source(InputStream,String) and Source#getEncoding()

Removed Parser#notifyNextEncoding()

Replace almost all code for parameter entities

27 May 1998

Removed TXElement#setUserData() and getUserData()

Removed some deprecated methods

Added attribute normalization (I had forgotten to implement it ;-)

25 May 1998

Removed Parser#setDebugPrintName()

22 May 1998

Fixed a bug with more than one definition for the same attribute

Added Attlist#getAttDef(String)

Fixed a bug with more than one ID attribute in an element

Added Attlist#contains()

Attlist#addElement() returns boolean

Added `isParameter' to a constructor of Entity

Fixed a bug in character encoding detection when there is no XMLDecl

Fixed a bug with 0-byte entities (xmltest/valid/not-sa/001.xml)

20 May 1998

Improved error checking for TextDecl. (xmltest/not-wf/ext-sa/{002,003}.xml)

Fixed a bug in the detection of UTF-16/UCS-2 encodings. (xmltest/valid/ext-sa/014.xml)

19 May 1998

Add a question about VisualAge for Java to FAQ.html.

Removed TXDocument#setRootName()

Removed debug code in ContentModel

Added new class: LibraryException

18 May 1998

Fixed a bug in SAX characters().

13 May 1998

Public release.

TO DO

Follow DOM and Namespace spec.
XLL support
Performance improvement
KNOWN PROBLEMS:
- NodeLists returned by getElementsByTagName() aren't LIVE.
- Multi-thread safety
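A "live" NodeList reflects changes made to the tree after the list was obtained. The sketch below shows that behavior with the JDK's built-in DOM implementation (not XML4J), which returns live lists from getElementsByTagName(); per the known problem above, a list returned by XML4J would not be expected to pick up the added element. LiveNodeList is a hypothetical name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class LiveNodeList {
    // Returns {length before, length after} adding a second <item> to the tree.
    static int[] lengthsBeforeAndAfterAppend() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("list");
        doc.appendChild(root);
        root.appendChild(doc.createElement("item"));
        NodeList items = doc.getElementsByTagName("item"); // obtained once, before the change
        int before = items.getLength();
        root.appendChild(doc.createElement("item"));        // modify the tree afterwards
        int after = items.getLength();                      // a live list sees the new element
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] lengths = lengthsBeforeAndAfterAppend();
        System.out.println("before=" + lengths[0] + " after=" + lengths[1]); // before=1 after=2
    }
}
```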

Limitation of MS JVM

Microsoft's JVM does not support all of the encodings that Sun's JVM supports. If your document uses one of the encodings it lacks (such as ISO-8859-2) and you run the parser with Microsoft's JVM on Windows, you will get a run-time error. This is a limitation of Microsoft's JVM.
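Before deploying on a particular JVM, you can test whether it recognizes the encoding names your documents declare. This sketch uses java.nio.charset.Charset, which was added in JDK 1.4, so it illustrates the check on a modern JVM rather than running on the JVMs this README describes; EncodingCheck is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

class EncodingCheck {
    // True if this JVM can decode the given encoding name (for example "ISO-8859-2").
    static boolean supported(String encodingName) {
        try {
            return Charset.isSupported(encodingName);
        } catch (IllegalCharsetNameException e) {
            return false; // the name is not even syntactically valid
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args
                : new String[] {"UTF-8", "ISO-8859-2", "Shift_JIS", "EUC-JP", "Big5"};
        for (String name : names) {
            System.out.println(name + ": " + (supported(name) ? "supported" : "NOT supported"));
        }
    }
}
```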

Limitation of SUN JVM

Current releases of the JVM from Sun Microsystems (JDK 1.1.6) do not correctly support EBCDIC encodings: they do not translate the new line character correctly.

IBM's implementation of Java 1.1.6 correctly translates EBCDIC characters to Unicode.
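The text above does not say exactly which character is mistranslated. A common source of trouble, assumed here, is EBCDIC's NL control character (byte 0x15), which Java's IBM037 converter typically maps to NEL (U+0085) rather than to LF, while XML 1.0 treats only CR and LF as line ends. This sketch uses the modern java.nio.charset API and assumes the IBM037 charset is installed in the JVM; EbcdicNewlines is a hypothetical name.

```java
import java.nio.charset.Charset;

class EbcdicNewlines {
    // Decode EBCDIC code page 037 bytes, then map NEL (U+0085) to LF.
    // Assumption: the problematic "new line" is the EBCDIC NL character (byte 0x15).
    static String decode(byte[] ebcdic) {
        String text = new String(ebcdic, Charset.forName("IBM037"));
        return text.replace('\u0085', '\n');
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC1, 0x15, (byte) 0xC2}; // "A", NL, "B" in code page 037
        System.out.println(decode(bytes).equals("A\nB"));
    }
}
```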

Contact

Technical questions and comments to alphaWorks communityXchange or xml4j@us.ibm.com.

Non-technical questions to xml4j@us.ibm.com.
