IBM's XML Parser for Java - README


IBM's XML Parser for Java (XML4J) is an Extensible Markup Language (XML) processor written in Java. XML Parser for Java provides two main functions:


Contents


Installation

Windows95, WindowsNT, OS/2 (ZIP archive)
  1. Install either JDK 1.1 with Swing 1.1 (Swing is needed for some samples; the only file you need from it is 'swingall.jar') or JDK 1.2.
  2. Install the unzip or WinZip executable.
  3. Download the XML4J distribution package in ZIP format.
  4. Unzip the distribution package, xml4j_1_1_16.zip, into a directory.
    C:\>unzip some_directory\xml4j_1_1_16.zip
    C:\>cd xml4j
    C:\xml4j>
    	    

    You will see the following files in the xml4j directory:

    FAQ.html                  FAQ
    README.html               this file
    license.html              license information
    apiDocs\                  directory for API documents
    docs\                     directory for documents
    data\personal.dtd         sample DTD file
    data\personal.xml         sample XML document
    src\                      directory for source code
    xml4j_1_1_16.jar          contains parser class files
    xml4jSamples_1_1_16.jar   contains samples class files
    scripts\                  directory for build scripts
    samples\                  sample XML4J applications

  5. Try the following command to test your installation. This test parses the input and then regenerates the same XML document.
    C:\xml4j>type data\personal.xml
    C:\xml4j>jre -cp xml4j_1_1_16.jar;xml4jSamples_1_1_16.jar samples.XJParse.XJParse -d data\personal.xml
    		
  6. This step is required only if you have installed JDK 1.1.6 or you experience a run-time fatal error while invoking 'XJParse'.

    This fatal error is because of a bug in the JIT (symcjit.dll) shipped with JDK 1.1.6. The fix is to apply a patch which can be downloaded from the JavaSoft website: http://www.javasoft.com/products/jdk/1.1/download-jdk-windows.html

    Installing the patch involves replacing symcjit.dll with the new one.

UNIX
  1. Install GNU gzip and either JDK 1.1 with Swing 1.1 (Swing is needed for some samples; the only file you need from it is 'swingall.jar') or JDK 1.2.
  2. Download a distribution package in .tar.gz format. (If you have installed the unzip command for UNIX, the ZIP format also works.)
  3. Extract the distribution package into a directory.
    # cd /usr/local
    # gzip -dc some_directory/xml4j.n.n.n.tar.gz | tar xvf -
    # cd xml4j
    	    

    You will see the following files in the xml4j directory:

    FAQ.html                  FAQ
    README.html               this file
    license.html              license information
    api/                      directory for API documents
    docs/                     directory for documents
    data/personal.dtd         sample DTD file
    data/personal.xml         sample XML document
    src/                      directory for source code
    xml4j_1_1_16.jar          contains class files
    xml4jSamples_1_1_16.jar   contains samples class files
    scripts/                  directory for build scripts
    samples/                  sample XML4J applications

  4. Try the following command to test your installation. This program parses the input and then regenerates the same XML document.
    # cat data/personal.xml
    # jre -cp "xml4j_1_1_16.jar:xml4jSamples_1_1_16.jar" samples.XJParse.XJParse -d data/personal.xml
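
The installation test parses a document and then regenerates it. As a rough illustration of that round trip, the sketch below uses the XML classes bundled with later JDKs (javax.xml.parsers and javax.xml.transform) as a stand-in, so that it runs without the XML4J jar files. It is not the XJParse source, and the class name RoundTrip is made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; a stand-in for the parse-and-regenerate test, not XML4J code.
class RoundTrip {

    /** Parses the XML text into a DOM tree, then serializes the tree back to text. */
    static String parseAndRegenerate(String xml) {
        try {
            // Assumption: the JDK's built-in parser is used here instead of XML4J.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAndRegenerate("<person id=\"a1\"><name>Boss</name></person>"));
    }
}
```

If the regenerated text differs from the input only in insignificant ways (quoting style, an added or omitted XML declaration), the parser has read the document correctly.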
    	    


Sample Applications

Some of the sample applications are described below. All classes required to run them are in xml4jSamples_1_1_16.jar. Remember that 'jre' ignores the CLASSPATH environment variable, so you must name any non-standard .jar files (such as Swing) explicitly with the -cp option.

samples.XJParse.XJParse (Java application, previously named 'trlx'):

XJParse is an XML syntax checker. To check an XML document, type:

jre -cp xml4j_1_1_16.jar;xml4jSamples_1_1_16.jar samples.XJParse.XJParse -d <xml-filename>
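
For readers who want to run the same kind of syntax check from their own code, here is a minimal sketch. It assumes the SAX parser bundled with later JDKs rather than XML4J, and the class name SyntaxCheck is hypothetical; XJParse itself may work differently.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; a stand-in for a syntax checker, not the XJParse source.
class SyntaxCheck {

    /** Returns true if the text is well-formed XML, false if the parser reports a fatal error. */
    static boolean isWellFormed(String xml) {
        try {
            // DefaultHandler ignores all content events and rethrows fatal errors.
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // A well-formedness violation ends the parse with a SAXException.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>"));
        System.out.println(isWellFormed("<a><b></a>"));
    }
}
```

This checks well-formedness only. Validation against a DTD is a separate step and is not shown here.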

SiteOutliner (Java application):

SiteOutliner is a Java application that scans a Web site and reports its profile in CDF format. The profile contains a list of links to the pages, showing the structure of the site. The user can limit the files to be scanned by using some conditions, such as file types (extensions) and modified dates. The program can be used in both command prompt and window environments.

CDF Editor (Java application):

CDF Editor is a Java application to edit CDF files. The user loads a CDF file and edits the channels and items.
jre -cp xml4j_1_1_16.jar;xml4jSamples_1_1_16.jar samples.CdfEditor.CdfEditor

CDF Viewer (Java applet):

CDF Viewer is an applet that parses CDF files and visualizes their structures by using a tree.

Validating Generation sample (Java application):

This sample generates a valid element tree according to the specified DTD (specify full pathname to personal.dtd).
jre -cp xml4j_1_1_16.jar;xml4jSamples_1_1_16.jar samples.Miscellaneous.GeneratingSample e:\xml4j\data\personal.dtd

XML Tree-View (Java application):

A sample application using com.ibm.xml.parser.util.TreeFactory. If you use JDK 1.1, you need to install JFC 1.1 (Swing-1.1) to run this program. In the command line below, you will need to replace C:/swing-1.1/ with the location of the swingall.jar file on your system.
jre -cp "xml4j_1_1_16.jar;xml4jSamples_1_1_16.jar;C:/swing-1.1/swingall.jar" samples.Miscellaneous.TreeView data\personal.xml

[A capture of TreeView]

XPointer Demonstration (Java application):

A sample application using the com.ibm.xml.xpointer package. If you use JDK 1.1, you need to install JFC 1.1 (Swing-1.1) to run this program. In the command line below, you will need to replace C:/swing-1.1/ with the location of the swingall.jar file on your system.
jre -cp "xml4j_1_1_16.jar;xml4jSamples_1_1_16.jar;C:/swing-1.1/swingall.jar" samples.Miscellaneous.XPointerDemo data\personal.xml
This program has two functions:
  1. Click a node: the XPointer expression for the clicked node is displayed in a text field.
  2. Enter an XPointer expression and press the "Go" button: the nodes pointed to by the XPointer are selected.


On certain platforms where 'jre' is not available, you can run these samples using 'java'. To do so, edit the CLASSPATH environment variable to include the parser (xml4j_1_1_16.jar) and samples (xml4jSamples_1_1_16.jar) jar files.


Program Development

This distribution archive includes a file named xml4j_1_1_16.jar. Add this file to your CLASSPATH environment variable, writing a command such as

set CLASSPATH=C:\xml4j\xml4j_1_1_16.jar;. (for Windows)

(assuming that you have installed XML Parser for Java in C:\xml4j.)

setenv CLASSPATH /usr/local/xml4j/xml4j_1_1_16.jar:. (for UNIX, csh/tcsh)

export CLASSPATH="/usr/local/xml4j/xml4j_1_1_16.jar:." (for UNIX, ksh/bash/zsh)
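
A quick way to confirm that the CLASSPATH setting took effect is to ask the JVM whether it can find a parser class. The sketch below is only an illustration; the class name ClasspathCheck is made up, and com.ibm.xml.parser.SAXDriver is used as the probe because this README names it as part of the parser.

```java
// Hypothetical helper class for checking a CLASSPATH setting; not part of XML4J.
class ClasspathCheck {

    /** Returns true if the named class can be located by this program's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to a parser class named in this README; pass another name to probe it.
        String name = args.length > 0 ? args[0] : "com.ibm.xml.parser.SAXDriver";
        System.out.println(name + (isOnClasspath(name) ? ": found" : ": NOT found, check CLASSPATH"));
    }
}
```

Run it with 'java ClasspathCheck' after setting CLASSPATH. Because the settings above also include the current directory ('.'), the compiled ClasspathCheck class itself is found as well.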

The following resources are provided for application development:


Release Notes

This version of the processor is based on XML 1.0 Recommendation [10-Feb-1998]
The processor supports 37 encodings for `<?xml encoding="...."':
ISO-10646-UCS-4, ISO-10646-UCS-2, UTF-8, UTF-16, US-ASCII, ISO-8859-1 ... ISO-8859-9, ISO-2022-JP, Shift_JIS, EUC-JP, GB2312, Big5, EBCDIC-CP-(US, CA, NL, DK, NO, FI, SE, IT, ES, GB, FR, AR1, HE, CH, ROECE, YU, IS, AR2)
Validating Generation
Applications can recognize information in the Document Type Definition (DTD) and generate a document that has correct structure. See `How to query DTD information' in the programming guide.
W3C Document Object Model (DOM) Level 1 Specification [01-Oct-1998] Support:
Simple API for XML (SAX) 1.0 Support
com.ibm.xml.parser.SAXDriver provides the SAX interface.
Namespaces in XML Proposed Recommendation [17-Nov-1998] Support
See the programming guide.
Element Digest
See the DOMHASH document.
XPointer package
The com.ibm.xml.xpointer package supports parsing XPointer expressions, generating an XPointer instance from a node in a document tree, and searching for the nodes pointed to by an XPointer instance.
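
To make the SAX 1.0 item above more concrete, here is a small sketch of a SAX 1.0 document handler that counts elements. So that it runs without the XML4J jar files, it obtains its org.xml.sax.Parser from the JDK's built-in parser. With XML4J, an instance of com.ibm.xml.parser.SAXDriver would presumably take that place, on the assumption that it implements org.xml.sax.Parser as SAX 1.0 drivers do. The class name SaxElementCounter is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

// Hypothetical example class. The SAX 1.0 types used here are deprecated in current JDKs
// but still present; they are the interfaces a SAX 1.0 driver is written against.
@SuppressWarnings("deprecation")
class SaxElementCounter extends HandlerBase {

    private int elements;

    /** Called by the parser once for every start tag (SAX 1.0 DocumentHandler callback). */
    @Override
    public void startElement(String name, AttributeList attributes) {
        elements++;
    }

    /** Parses the text with a SAX 1.0 parser and returns the number of elements seen. */
    static int countElements(String xml) {
        try {
            // Stand-in parser from the JDK (assumption; XML4J's SAXDriver would go here).
            Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
            SaxElementCounter counter = new SaxElementCounter();
            parser.setDocumentHandler(counter);
            parser.setErrorHandler(counter); // HandlerBase rethrows fatal errors
            parser.parse(new InputSource(new StringReader(xml)));
            return counter.elements;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><b/></a>"));
    }
}
```

Because the handler only receives events, no document tree has to be kept in memory, which is the usual reason for choosing SAX over DOM.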


CHANGES

2 Apr 1999
Release 1.1.16
2 Apr 1999
Fixed Defects:
TX Document cloning is broken.
Surrogate character bugs in XMLReader.
CM2op.toString() bug.
26 Mar 1999
Release 1.1.15
26 Mar 1999
Feature: DOMHash supports namespaces.
Feature: Added DTD.unregistID() for use with XPointer.
Fixed Defects:
TX DOM serializability problems.
Bugs in CMLeaf.equals(), CM2op.equals().
SAX Error handling bugs.
TXAttribute.cloneNode() does not clone all data
DTD.realInsert() doesn't add new entities to EntityPool.
XMLChar.java now compiles under Jikes 0.47.
XMLTreeViewer corner case bugs.
Javadoc incorrect: DTD.setName() deprecated.
Javadoc incorrect: TXDocument doesn't call createAttribute().
Added FAQ for Visual Age for Java 2.0.
Fixed URLs in Programming Guide.
1 Feb 1999
Release 1.1.14
1 Feb 1999
Correct Version numbering.
22 Jan 1999
Release 1.1.13
22 Jan 1999
License: To get better information on the commercial use of XML4J we have changed the license agreement. The Free Commercial Distribution License is STILL readily available. To get the free distribution license all you have to do is register (see license.html for details) and fill in a very short questionnaire. The free distribution license agreement will be emailed to the address specified in the registration form.

To avoid you having to supply the information each time you do a download, we have removed the distribution license from the archive (.zip and .tar.gz file). The archive now contains only an Evaluation License, but you get all the source and other technical assets as before.

Feature: User data Object on every DOM node.
Fixed defects:

TXDocument: Set doctype member to NULL when it is deleted.
Element#GetAttribute() should return null string, not NULL.
Accepts inappropriate non-initial chars in NAME.
Rejects (fatal) valid documents with xml:lang values. This error is now classified as a warning.
Undeclared PE's are flagged as fatal error. This error is now classified as a non-fatal error.
Declaration of unparsed entities in internal subset failed when referring to notations declared in external subset.
PE nesting VC errors are fatal. They should be non-fatal.
Samples 'XPointerDemo' & 'XMLTreeViewer' do not print helpful error messages if you don't have Swing-1.1 installed.

08 Jan 1999
Release 1.1.12
08 Jan 1999
Feature: Get the owner element of attributes.
Feature: Get parameter entities information from DTD.
Fixed defects:
TXDocument#cloneNode() problem in subclass of TXDocument.
FormatPrintVisitor stops at the first ENTITY declaration.
SAX: incorrectly accepts text with literal ']]>'.
TXDocument#getDocumentElement() returns wrong value after replaceChild().

11 Dec 1998
Release 1.1.11
11 Dec 1998
Change Swing package from com.sun.java.swing to javax.swing. Now some samples require Swing 1.1 API.
Added statement in FAQ.html about 100% Pure Java compliance.
Fixed defects:
Null pointer exception in sample CdfViewer.
Access to protected member exception. This defect is not seen when XML4J source is compiled with JDK 1.2.
DOMDuplicator bug for CDATA.
Replace '//' by '#' in error message strings.

04 Dec 1998
Release 1.1.10
04 Dec 1998
New command line switch in sample XJParse to redirect errors from stderr to stdout.
Added new utility class com.ibm.xml.domutil.DOMDuplicator.
Add TXDocument#setAddFixedAttributes to reduce memory usage.
Fixed defects:
ClassCastException in TXElement#normalize().
Error in sample documentation file CdfViewer.html.
Set incorrect ignorable flag in TXElement#normalize().
SAX: Crash if oldSystemID is null.
DOM: Predefined entities should not make EntityReference nodes.

19 Nov 1998
Release 1.1.9
19 Nov 1998
Fixed defects:
When using SAX, the entire document should not be loaded into memory.
NULL pointer exception in the sample CDFEditor.
Parser should not display warnings unless asked to.
XML4J code should not call printStackTrace().
Wild card defect in sample XJParse.
Stop using deprecated methods.
Start using JDK 1.2 JavaDoc for generating API documentation.

09 Nov 1998
Release 1.1.8
09 Nov 1998
Fixed defects:
The sample XJParse can now handle wild cards for filenames.
In the sample XJParse one can turn off fetching the DTD even though the DOCTYPE line is specified.
Fixed defect in the UTF8 decoder. Can now parse Jim Clark's valid/sa/052.xml.
Parser crashes while handling NOTATIONS.
StringPool#expandTable() crashes.
Fix defect in the way language ID's are checked. It should conform with section 2.12 of XML 1.0 Spec.
Normalization should handle entire sub-tree.
HTML DTD: TABLE content model's error.

30 Oct 1998
Release 1.1.7
30 Oct 1998
Fixed defects:
In an external DTD, a PE substitution at the end of a CDATA clause causes the parser to fail.
Parameter entity reference in an INCLUDE section generates error.
Bug in DTD getInsertableElementsForValidContent().
Make some methods in Parent class thread-safe.
Parser must check that default values for attributes conform to their types.

23 Oct 1998
Release 1.1.6
23 Oct 1998
Fixed defects:
Content Model matching fails against an entity reference.
ToXMLStringVisitor() prints entity declarations incorrectly.
TerminateSignal should not be ignored in readDTDStream().

09 Oct 1998
Release 1.1.5
09 Oct 1998
Fixed defects:
In the output of the parser, the 'quote' character is now represented as &#34; instead of &#x22;. Both Netscape and IE handle this correctly.

02 Oct 1998
Release 1.1.4
02 Oct 1998
Parser now conforms to DOM Level 1 Specification (01-Oct-1998).
Parser now additionally supports EBCDIC-CP-(DK, NO, FI, SE, IT, ES, GB, FR, AR1, HE, CH, ROECE, YU, IS, AR2) encodings.
Fixed defects:
SAX: Missing endElement() notification.
SAX: Distinguish fatalError() and error().

25 Sep 1998
Release 1.1.3
25 Sep 1998
Parser now supports EBCDIC-CP-(US, CA, NL) encodings ( = CP037 Java encoding).
Fixed defects:
SAX: reuse of parser and ErrorHandler calls.
Invalid peer exception in sample program 'GeneratingSample'.
CR's and CRLF's in CDATASection, Comment or PI's are not being replaced by LF.
Overwriting an attribute including entity references is incorrect.

18 Sep 1998
Release 1.1.2
18 Sep 1998
Parser can now be interrupted.
Fixed defects:
Fix byte mask in UTF8 Reader.

11 Sep 1998
Release 1.1.1
11 Sep 1998
Parser error message strings, in English, have been rewritten.
Documented, in README.html, limitation of MS JVM in lack of support for IANA encodings.
Performance enhancements. Optimized input string buffering.
Fixed defects:
Parser can now handle URN's as SYSTEMID.

04 Sep 1998
Release 1.1.0
04 Sep 1998
Namespace API Change: New return values of TXElement#getNamespaceForQName() and TXElement#getNamespaceForPrefix() methods.
Fixed defects:
Conform Util.getInvalidURIChar() to RFC2396.
getDigest() for EntityReference doesn't work.
Don't do anything when newChild parameter equals oldChild parameter in Parent#replaceChild().
An adopted child must sever connections with original parents.
Reduce memory requirements when using SAX.
PIs with 0-length PI data such as <?foo?> crash the parser.
Parser reports errors when a PI occurs just before EOF in a document.
Add commands to XJParse for version # and name-space printing.
Leading '\' not handled properly in context of filenames.

28 Aug 1998
Release 1.0.9
28 Aug 1998
Support attribute-based namespace (WD-xml-names-19980802).
Two new sample DTD's are now bundled. HTML40frameset.xml.dtd and HTML40loose.xml.dtd.
Added printNonSpecifiedAttributes flag to ToXMLPrintVisitor.
Added '-stoponerror' command line option to samples.XJParse.XJParse.
Fixed defects:
Null pointer exception when com.ibm.xml.parser.SAXDriver.parse() is called a second time.
Validate only when target document has !DOCTYPE and one or more !ELEMENT declarations.
Parser should not stop after first validation error.
com.ibm.xml.parser.Parent.realInsert() should not call isCheckOwnerDocument() if it was not created by a factory.
#REQUIRED attributes return wrong getSpecified() flag.
Change ErrorListener.error() method's return type from void to int.
']]>' terminating conditional sections in DTD's are not recognized.
Can't replace the root element in TXDocument.
TXNodeList#replace() doesn't set next/previousSibling of removed Node to null.
HTMLPrintVisitor should not print comments in internal DTD.
Parser, by default, now stops parsing after an error occurs as required by the XML spec.

21 Aug 1998
Conformance to DOM Level 1 Proposed Recommendation [18-Aug-1998]
Changes due to above conformance:
Replaced the files in org.w3c.dom package by java-binding.zip in PR-DOM.
Renamed
NodeList#getSize() to NodeList#getLength()
NamedNodeMap#getSize() to NamedNodeMap#getLength()
NodeType symbols (Node.ELEMENT to Node.ELEMENT_NODE, etc.)
Added
Node#getOwnerDocument()
Removed
DocumentFragment#getMasterDoc()
Notation#setSystemId()
Notation#setPublicId()
Entity#setSystemId()
Entity#setPublicId()
Fixed defects:
Use Exception instead of IOException in API (parser\FormatPrintVisitor.java ...).
Hexadecimal character references cause errors.
TXAttribute#toXMLString() prints contents twice.
TXCDATASection#getNodeType() doesn't return Node.CDATA_SECTION
HTMLPrintVisitor can't print empty content like "<BODY></BODY>".
HTMLPrintVisitor should not print entity references.
ToXMLStringVisitor prints replaced text instead of entity references in attribute values.
Shell scripts now work with a public domain shell (Cygnus-Win32).
SAX resolveEntity() handler that returns an InputSource now works.
DOM: TXAttribute has only String value, no value as child nodes.
XPointer#point() doesn't work against a tree including EntityR.
Document inherits from Node instead of DocumentFragment.
Node#insertBefore()/replaceChild()/appendChild() check types of children.
Attribute#getParentNode()/getPreviousSibling()/getNextSibling() always returns null.
Element#getElementsByTagName() returns all elements when the parameter is "*".
Removed the ElementFactory interface because all factory functions are moved to the TXDocument class.
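
The DOM names introduced by this entry (getLength(), Node.ELEMENT_NODE, getOwnerDocument()) and the "*" wildcard of getElementsByTagName() can be exercised as in the sketch below. It uses only org.w3c.dom interfaces, so the same calls should apply to a tree built by XML4J. The tree here is built with the JDK's own parser as a stand-in, and the class name DomLevel1Names is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; exercises DOM Level 1 names only, no XML4J-specific API.
class DomLevel1Names {

    /** Builds a DOM tree with the JDK's parser (assumption: stand-in for XML4J). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    /** Counts all elements below the root using the "*" wildcard. */
    static int countDescendantElements(Document doc) {
        NodeList all = doc.getDocumentElement().getElementsByTagName("*");
        return all.getLength(); // NodeList#getLength(), formerly getSize()
    }

    /** Checks that every element child of the root reports the document as its owner. */
    static boolean childrenShareOwner(Document doc) {
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Node.ELEMENT_NODE, formerly Node.ELEMENT
            if (child.getNodeType() == Node.ELEMENT_NODE && child.getOwnerDocument() != doc) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b/><c><d/></c></a>");
        System.out.println(countDescendantElements(doc));
        System.out.println(childrenShareOwner(doc));
    }
}
```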

14 Aug 1998
The jar file has been split into two: one for parser binaries, and the other for samples binaries.
Fixed defects:
Updated the programming guide.
Reduced memory occupied by Tree nodes.
Removed dependency on Symantec's JIT patch. It is no longer necessary to install the JIT patch over 1.1.6.
Shell scripts (to compile all sources) should now work.
Correctly compiling Message_ja.java with -EUCJIS encoding.

07 Aug 1998
Moved all sample applications to toplevel/samples directory.
Fixed defects:
TreeView sample crashes with Null pointer exception.
TXAttribute#toXMLString() prints attribute value twice.
DTD#makeContentElementList() doesn't return null for EMPTY/ANY elements.
Use UNIX new line conventions in tar distribution.
DOM: TXElement#normalize() isn't implemented.
XPointer#point(TXDocument) should be point(Document).
31 Jul 1998
Parser now conforms to DOM-19980720 spec.
Fixed defects:
DTD#getInsertableElementsForValidContent() doesn't return correct result.
ContentModel#checkAfterTargetPosition() is wrong.
TXComment is always printed as "<!--null-->".
Parser crashes with a NullPointerException in init2().
TXPI("foo", "bar") is printed as "<?foobar?>".
DOM: Factory methods in TXDocument aren't used.
ToXMLStringVisitor and FormatPrintVisitor print an internal DTD subset twice, GeneralReference(&foo;) and the reference's contents.
Parent#insertAfter() doesn't work correctly.
Parser#readDTDStream() aborts with a NullPointerException.
24 Jul 1998
Release 1.0.4
24 Jul 1998
Added SCCS revision control strings to source files.
22 Jul 1998
Updated documentation.
New exceptions classes defined for TreeTraversal.
7 Jul 1998
Added new parameter to Parser#parseSingleContent() for alias feature
Added new sample: com.ibm.xml.sample.Alias and alias.dtd, alias-sample.xml
Added Stderr#loadCatalog()
Added "-c catalogfile" option to trlx.
6 Jul 1998
Fixed a bug of TXElement#addTextElement()
Modified util.TreeFactory for current DefaultElementFactory
Replaced util.XHFactory with util.HTMLPrintVisitor
Util.backReference() doesn't convert ' and
The parser never warns about redefined entities for lt/gt/amp/quot/apos
Added new sample: com.ibm.xml.sample.HTMLPrint
3 Jul 1998
Moved Format#printSpace() to Util, Format#indent() to Util
Moved DefaultElementFactory#sortStringVector() to Util
Added new class: FormatPrintVisitor, and removed Format
Fixed a bug of Text#insert()
23 Jun 1998
Fixed a bug of a TextDecl in an external parameter entity in an external DTD subset (Parser.java, Token.java)
Fixed a bug that an encoding of DTD wasn't set (Parser.java)
19 Jun 1998
Fixed a bug of parameter entity references in an IGNORE section.
19 Jun 1998
Release 1.0.0
18 Jun 1998
Removed com.ibm.xml.xpointer.Version
Change format of com.ibm.xml.parser.Version
Added DTD#getInsertableElementsForValidContent()
16 Jun 1998
Moved Parser#setNamespace() to TXDocument#setNamespaceParameters()
Added Parser#getNumberOfWarnings()
Added Parser#setEndBy1stError()
12 Jun 1998
Public release.
12 Jun 1998
Added new class: com.ibm.xml.xpointer.Pointed
Some changes for XPointer#point()
Renamed XPointerSample to XPointerDemo
Use javadoc in JDK-1.2beta3
11 Jun 1998
Removed com.ibm.xml.xpointer.RelTermArguments class
Added new method: XPointer#point()
Added new sample program: com.ibm.xml.sample.XPointerSample
10 Jun 1998
Renamed TXAttributeList#toArray() to makeArray() because of a conflict with java.util.Vector#toArray() in JDK-1.2beta
Fixed a bug of conditional section in parameter entities.
Fixed a bug of TXElement#attributeElements()
Added new methods: Child#makeXPointer(), XPointer#makeXPointer(Child)
Some changes for xpointer package.
Added -xpointer option to trlx
8 Jun 1998
Fixed a bug of TXElement#searchAncestors()
Moved searchAncestors() from TXElement to Child
Renamed Namespace#getNSNs()/setNSNs() to Namespace#getNSName()/setNSName()
Removed Namespace#getNSPrefixName()/setNSPrefixName()
Added Namespace#getUniversalName()
Added TXElement#TXElement(TXDocument,String prefix,String localpart)
4 Jun 1998
Added ElementFactory#createText(char[],int,int,boolean) and modified the parser to use this method instead of createText(String,boolean)
3 Jun 1998
Moved trlx and sample programs into com.ibm.xml.sample package
Added java.io.Serializable interface to object model classes.
Added new samples: com.ibm.xml.sample.SerializeSave and com.ibm.xml.sample.SerializeLoad
2 Jun 1998
Fixed some bugs of parameter entity
Fixed a bug of parsing NOTATION attribute
1 Jun 1998
Rewrite parameter entity processing
Integrate EntityValue and Entity
Added new method DTD#getEntity()
28 May 1998
Added Source#Source(InputStream,String) and Source#getEncoding()
Removed Parser#notifyNextEncoding()
Replace almost all code for parameter entities
27 May 1998
Removed TXElement#setUserData() and getUserData()
Removed some deprecated methods
Added attribute normalization (I had forgotten to implement it ;-)
25 May 1998
Removed Parser#setDebugPrintName()
22 May 1998
Fixed a bug of more than one definition for the same attribute
Added Attlist#getAttDef(String)
Fixed a bug of more than one ID attribute in an element
Added Attlist#contains()
Attlist#addElement() returns boolean
Added `isParameter' to a constructor of Entity
Fixed a bug of character encoding detection when no XMLDecl
Fixed a bug of 0Byte entities (xmltest/valid/not-sa/001.xml)
20 May 1998
Improved error checking for TextDecl. (xmltest/not-wf/ext-sa/{002,003}.xml)
Fixed a bug of detection of UTF-16/UCS-2 encodings. (xmltest/valid/ext-sa/014.xml)
19 May 1998
Add a question about VisualAge for Java to FAQ.html.
Removed TXDocument#setRootName()
Removed debug code in ContentModel
Added new class: LibraryException
18 May 1998
Fixed a bug of SAX characters().
13 May 1998
Public release.


TO DO


Limitation of MS JVM

Microsoft's JVM does not support the same encodings as Sun's JVM implementation. If your document uses an encoding that Microsoft's JVM lacks (ISO-8859-2, for example) and you run in a Windows environment with Microsoft's JVM, you will get a run-time error. This is a limitation of Microsoft's JVM.
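
Whether a particular JVM knows an encoding can be checked before parsing. The sketch below relies on java.nio.charset.Charset, which was added to Java in version 1.4 and is therefore not available on the JDK 1.1 and 1.2 releases this README targets; treat it as an illustration for later JVMs. The class name EncodingSupport is made up. Note also that the names in an XML encoding declaration (EBCDIC-CP-US, for example) are not always Java charset names, so the parser has to map between them.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper class; requires Java 1.4 or later for java.nio.charset.
class EncodingSupport {

    /** Returns true if this JVM can decode the named charset; false if unknown or malformed. */
    static boolean isSupported(String encodingName) {
        try {
            return Charset.isSupported(encodingName);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "ISO-8859-1", "ISO-8859-2", "Shift_JIS", "EUC-JP"};
        for (String name : names) {
            System.out.println(name + ": " + (isSupported(name) ? "supported" : "not supported"));
        }
    }
}
```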


Limitation of SUN JVM

Current releases of the JVM from Sun Microsystems (JDK 1.1.6) do not correctly support EBCDIC encodings: the new line character is not translated correctly.

IBM's implementation of Java 1.1.6 correctly translates EBCDIC characters to Unicode.


Contact

Technical questions and comments to alphaWorks communityXchange or xml4j@us.ibm.com.

Non-technical questions to xml4j@us.ibm.com.

