IBM's XML Parser for Java - Programming Guide

How to get the parse tree and capture errors
How to traverse the parse tree
White Space
How to get a filtered parse tree
How to make new XML document
Adding Parser functionality by subclassing TX classes
Query DTD information
How can I turn off validation
Namespace
Element Digest
What names should I use in encoding declaration of XML files

How to get the parse tree and capture errors

The simplest example: read and parse an XML file

In the following program, the parse tree is read in from a file. The key statements are the ones that create the Parser object and perform the initial read from the file.

import com.ibm.xml.parser.*;
import java.io.*;

public class GetParseTree {

    public static void main (String args[]) {

        String filename = null;
        if (args.length > 0) {
            filename = args[0];

            if (filename != null) {
                InputStream is;
                try {
                    is = new FileInputStream(filename);
                } catch (FileNotFoundException notFound) {
                    System.err.println(notFound);
                    return;
                }

                //*** The doc is the root of the DOM Tree. It is of type
                //*** TXDocument which implements the DOM Document interface.
                TXDocument doc = new Parser(filename).readStream(is);
            }
        }
    }
}

Note that Parser#readStream() never returns null. By default, the parser prints parse errors to the standard error stream.

A Parser instance cannot be reused. An application can call the Parser#readStream() method only once per instance.

Write the parse tree to a stream or file

You can output the parse tree in XML format into a stream. These lines can be inserted right after the last statement above,
and the XML will be echoed to standard output.

                //*** Output Document as XML
                String charset = "ISO-8859-1";    // MIME charset name
                String jencode = MIME2Java.convert(charset);
                PrintWriter pw;
                try {
                    pw = new PrintWriter(new OutputStreamWriter(System.out, jencode));
                } catch (UnsupportedEncodingException unsupported) {
                    System.err.println(unsupported);
                    return;
                }
                doc.setEncoding(charset);
                try {
                    doc.print(pw, jencode);
                } catch (IOException io) {
                    System.err.println(io);
                    return;
                    
                }

To access, traverse, and modify a parse tree, use TXDocument#getDocumentElement()
(We will cover this below in How to traverse the parse tree).

Set parse options

You can configure the parser's behavior after creating a Parser instance and before calling readStream(), by calling the Parser's option-setting methods. For example:

....

                Parser parser = new Parser(filename);
                //*** Set some Parse options...
                parser.setWarningNoDoctypeDecl(false);
                parser.setWarningNoXMLDecl(false);
                parser.setPreserveSpace(false);
                parser.setKeepComment(false);
                TXDocument doc = parser.readStream(is);
....

Redirect parse errors

You can control the output of errors produced by the parser. You might want to do this if you want to handle errors yourself, or if you don't want error messages printed to stderr. To handle errors yourself, create an instance of a class implementing the ErrorListener interface, and pass that instance to the Parser constructor.

The Object key parameter of the error() method is an instance of String or Exception. When key is a String, it identifies the type of error (see the source file com/ibm/xml/parser/r/Message.java).

Let the parser be silent (disable error output to stderr)

import com.ibm.xml.parser.*;
import java.io.*;

class ErrorIgnorer implements ErrorListener {
          public int error(String fname, int lineno,
                            int charoff, Object key, String mes) {
              // do nothing
              return 0;
          }
}

    
....
                //*** Parser uses ErrorIgnorer class
                Parser parser = new Parser(filename, new ErrorIgnorer(), null);
....

Insert errors into an AWT TextArea

import com.ibm.xml.parser.*;
import java.io.*;
import java.awt.TextArea;
import java.awt.Frame;

class ErrorEater extends TextArea implements ErrorListener {
    
    public int error(String fname, int lineno,
                    int charoff, Object key, String mes) {
        append( fname+":"+lineno+":"+mes+"\n");
        return 1;
    }
}

public class UseErrorEater {

    public static void main (String args[]) {
        
        String filename = null;
        if (args.length > 0) {
            filename = args[0];

            if (filename != null) {
                InputStream is;
                try {
                    is = new FileInputStream(filename);
                } catch (FileNotFoundException notFound) {
                    System.err.println(notFound);
                    return;
                }

                //*** Parser uses ErrorEater TextArea class
                ErrorEater ee = new ErrorEater();
                Frame f = new Frame();
                // allows us to close the frame with the mouse.
                f.addWindowListener(new java.awt.event.WindowAdapter() {
                    public void windowClosing(java.awt.event.WindowEvent e) {
                        System.exit(0);
                    }
                });
                    
                f.setSize(400,300);
                f.add("Center", ee);
                f.show();
                //*** Here is the usage of our ErrorEater...
                Parser parser = new Parser(filename, ee, null);
                TXDocument doc = parser.readStream(is);
            }
        }
    }
}

Here is one source file which includes everything we've learned so far, GetParseTreeAndErrors.java.

See the sources, com/ibm/xml/parser/trlxml.java, com/ibm/xml/parser/Stderr.java for additional information.

How to traverse the parse tree

Using only the DOM interfaces

Since our Parser's TXDocument and TXElement are implementation classes of the DOM interfaces Document and Element
respectively, you can write a client that traverses the tree using only the DOM interfaces, without referring to our implementation classes. For example:

....

import org.w3c.dom.*;

....

    //*** Only refer to DOM Interfaces...
    public static void traverseDOMBranch(Node node) {
        // do what you want with this node here...
        System.out.println(node.getNodeName()+":"+node.getNodeValue());
        
        if (node.hasChildNodes()) {
            NodeList nl = node.getChildNodes();
            int size = nl.getLength();
            for (int i = 0; i < size; i++) {
                traverseDOMBranch(nl.item(i));
            }
        }
    }
....

                //*** Note how we refer only to DOM Interface references.
                Document doc = parser.readStream(is);

                Element root = (Element)doc.getDocumentElement();
                traverseDOMBranch(root);

You might be wondering when to use the TXDocument classes instead of the DOM classes. The TX* classes provide additional features not present in the DOM, such as the ability to ask whether inserting an Element as a child is valid.

Using the TX classes to traverse and manipulate the parse tree

TXDocument can have one TXElement instance, zero or one DTD instances, and instances of TXPI and TXComment as children. All children of TXDocument can be accessed with TXDocument#getChildren() / TXDocument#getChildrenArray(). The TXElement instance can be accessed with TXDocument#getDocumentElement() also.

TXElement can have some instances of TXElement, TXText, TXPI, and TXComment as children. All children of TXElement can be accessed with TXElement#getChildren() / TXElement#getChildrenArray().

Some methods of TXDocument and TXElement return instances of the Child interface. These Child instances are also instances of TXElement, TXText, TXPI, TXComment, or DTD (if a child of TXDocument). To find out which class an instance belongs to, use Node#getNodeType() or the instanceof operator, like this:

import com.ibm.xml.parser.*;
import org.w3c.dom.*;
import java.util.Enumeration;
    ....
    //*** Refer to TX classes...
    public static void traverseTX(Node node) {
        // do what you want with this node here...
        if (node instanceof TXElement) {
            TXElement el = (TXElement)node;
            // Do fancy TXElement stuff here...
            System.out.println(node.getNodeName()+":"+node.getNodeValue());
        } else if (node instanceof TXText) {
            TXText te = (TXText)node;
            // Do fancy TXText stuff here...
            System.out.println(node.getNodeName()+":"+node.getNodeValue());
        } 

        if (node.hasChildNodes()) {
            NodeList nl = node.getChildNodes();
            int size = nl.getLength();
            for (int i = 0; i < size; i++) {
                traverseTX(nl.item(i));
            }
        }
    }

....

                Document doc = parser.readStream(is);

                Element root = (Element)doc.getDocumentElement();
                traverseTX(root);

These traversal examples may be found in ParseTree.java.

White Space

The XML4J parser keeps all spaces and passes them to applications, according to 2.10 White Space Handling in XML 1.0 Proposed Recommendation. The processor sets the IsIgnorableWhitespace() flag to true on TXText instances which consist of only white spaces.

<MEMBERS>
  <PERSON>Hiroshi</PERSON>
  <PERSON>Naohiko</PERSON>
  <PERSON>
    Kent
  </PERSON>
</MEMBERS>

XML4J parses this Element in the following way:

TXElement (getName():"MEMBERS", getText():"\n  Hiroshi\n  Naohiko\n  \n    Kent\n  \n")
  TXText ("\n  ", ignorable)
  TXElement (getName():"PERSON", getText():"Hiroshi")
    TXText ("Hiroshi")
  TXText ("\n  ", ignorable)
  TXElement (getName():"PERSON", getText():"Naohiko")
    TXText ("Naohiko")
  TXText ("\n  ", ignorable)
  TXElement (getName():"PERSON", getText():"\n    Kent\n  ")
    TXText ("\n    Kent\n  ")
  TXText ("\n", ignorable)

You might find it useful to call TXText#trim(String) / TXText#trim(String,boolean,boolean) when your application does not need to retain leading or trailing spaces.
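To see the same white space behavior without XML4J on the classpath, the sketch below parses the MEMBERS example with the JDK's built-in DOM parser (javax.xml.parsers). That parser is a stand-in here, not the XML4J API, and the class and method names are made up for illustration. It counts the text nodes directly under MEMBERS that consist only of white space, which correspond to the TXText nodes marked ignorable above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class WhitespaceChildren {

    // The MEMBERS example from this section.
    static final String XML =
        "<MEMBERS>\n" +
        "  <PERSON>Hiroshi</PERSON>\n" +
        "  <PERSON>Naohiko</PERSON>\n" +
        "  <PERSON>\n" +
        "    Kent\n" +
        "  </PERSON>\n" +
        "</MEMBERS>";

    // Counts the direct children of the root element that are
    // text nodes containing nothing but white space.
    static int countWhitespaceOnlyChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n.getNodeType() == Node.TEXT_NODE
                        && n.getNodeValue().trim().isEmpty()) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Four white-space-only text nodes surround the three PERSON elements.
        System.out.println(countWhitespaceOnlyChildren(XML));
    }
}
```

The text inside the third PERSON element is not counted, because it contains "Kent" as well as white space. An application would trim that text itself.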

How to get a "filtered" parse tree

Creating an end tag hook

In this example, we create a "hook", so we can do processing when specific end tags are encountered.

import com.ibm.xml.parser.*;
import java.io.*;

// AElementHandler handles an Element and doesn't Filter it
class AElementHandler implements ElementHandler {
     public TXElement handleElement(TXElement el) {
        System.out.println("handling:"+el.getName());
        return el;
     }
 }
....

            Parser parser = new Parser(filename);
            parser.addElementHandler(new AElementHandler(), "SPEECH");
            TXDocument doc = parser.readStream(is);

The ElementHandler#handleElement() method is called during Parser#readStream(), after each matching end tag (</SPEECH>) is parsed and before the element is added to its parent. The parser adds to the parent the TXElement instance returned by handleElement(). If handleElement() returns null, the parser doesn't add the TXElement to the parent.

Filtering out specific tags

This second example shows you how to filter out tags, by not allowing them to be placed into the parse tree.

import com.ibm.xml.parser.*;
import java.io.*;
// FilterElementHandler handles an Element and Filters LINE elements...
class FilterElementHandler implements ElementHandler {
     public TXElement handleElement(TXElement el) {
        System.out.println("handling:"+el.getName());
        if (el.getName().equals("LINE"))
            return null;
        else
            return el;
     }
 }

....

            Parser parser = new Parser(filename);
            parser.addElementHandler(new FilterElementHandler());
            TXDocument doc = parser.readStream(is);

There are two methods to set ElementHandler:

For a specific TXElement:
    addElementHandler(handler, "SPEECH");
    This handler is called for each </SPEECH> tag encountered.
For all TXElements:
    addElementHandler(handler);
    This handler is called for each end tag.

Remember, to filter out tags, return null from handleElement(). These examples can be found in ElementHandlers.java.

The Order of Calling ElementHandlers

When more than one ElementHandler is registered with the parser, the parser will first call ElementHandlers for specific TXElement's (first set, first called) and then will call ElementHandlers for all TXElement.

Even if an ElementHandler changes the name of a TXElement, the parser calls other ElementHandlers with the original name. When an ElementHandler returns null, the parser stops calling other ElementHandlers.

    Parser parse = new Parser(...);
    parse.addElementHandler(handler1);
    parse.addElementHandler(handler2, "SPEECH");
    parse.addElementHandler(handler3, "SPEECH");
    parse.addElementHandler(handler4);
    TXDocument doc = parse.readStream(is);

In this case, when the parser processes the </SPEECH> tag, the parser calls handler2 first, then handler3, handler1 and handler4, in that order.
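The ordering rule can be modeled in a few lines of plain Java. This is only a sketch of the rule as described above, not XML4J code: the HandlerOrderSketch class, its Registration type, and the string labels are hypothetical, and the rule that a null return stops the chain is left out.

```java
import java.util.ArrayList;
import java.util.List;

class HandlerOrderSketch {

    // Hypothetical stand-in for one registered handler: a label plus the
    // tag name it was registered for (null means "all elements").
    static final class Registration {
        final String label;
        final String tagName;
        Registration(String label, String tagName) {
            this.label = label;
            this.tagName = tagName;
        }
    }

    private final List<Registration> registrations = new ArrayList<Registration>();

    void add(String label) {
        registrations.add(new Registration(label, null));
    }

    void add(String label, String tagName) {
        registrations.add(new Registration(label, tagName));
    }

    // Handlers for the specific tag come first, in registration order,
    // followed by the handlers registered for all elements.
    List<String> callOrder(String tagName) {
        List<String> order = new ArrayList<String>();
        for (Registration r : registrations) {
            if (tagName.equals(r.tagName)) order.add(r.label);
        }
        for (Registration r : registrations) {
            if (r.tagName == null) order.add(r.label);
        }
        return order;
    }

    // Mirrors the registration sequence shown in the example above.
    static List<String> demo() {
        HandlerOrderSketch s = new HandlerOrderSketch();
        s.add("handler1");
        s.add("handler2", "SPEECH");
        s.add("handler3", "SPEECH");
        s.add("handler4");
        return s.callOrder("SPEECH");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [handler2, handler3, handler1, handler4]
    }
}
```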

How to make new XML document

Make a TXDocument instance.

TXDocument doc = new TXDocument();

Create something, a DTD or root Element.
Element root = doc.createElement("ROOT");

Append the newly created Element.
doc.appendChild(root);

Append something to the root Element you have added.

root.appendChild(doc.createElement("FOO"));
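For comparison, here is a sketch of the same steps using only the DOM interfaces and the JDK's built-in javax.xml.parsers and javax.xml.transform packages in place of TXDocument. The class and method names are invented for this illustration, and the serializer is the JDK's, not TXDocument#print().

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class BuildDocumentSketch {

    // Creates an empty document, adds a ROOT element, then adds FOO under it.
    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
            Element root = doc.createElement("ROOT");
            doc.appendChild(root);
            root.appendChild(doc.createElement("FOO"));
            return doc;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Writes the document as XML text, without the XML declaration.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(build()));
    }
}
```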

Common XML tasks

Use the quick reference below to see how to do common XML tasks using XML4J.

to create this XML representation: Use this code:

<?xml version="1.0" encoding="ISO-8859-1"?> TXDocument doc = new TXDocument();
doc.setVersion("1.0");
doc.setEncoding("ISO-8859-1");

<?footarget foodata?> TXPI pi = (TXPI)doc.createProcessingInstruction("footarget","foodata");

<?footarget?> TXPI pi = (TXPI)doc.createProcessingInstruction("footarget", "");

<!-- comment --> TXComment comm = (TXComment)doc.createComment(" comment ");

<!DOCTYPE ROOT SYSTEM "root.dtd"> DTD dtd = doc.createDTD("ROOT", new ExternalID("root.dtd"));

<!DOCTYPE ROOT [...]> DTD dtd = doc.createDTD("ROOT", null);
dtd.addElement(...);

<!ELEMENT ROOT EMPTY> ElementDecl ed = doc.createElementDecl("ROOT", doc.createContentModel(ElementDecl.EMPTY));

<!ELEMENT ROOT (#PCDATA|FOO|BAR)*> CMNode model = new CM1op('*', new CM2op('|', new CM2op('|', new CMLeaf("#PCDATA"), new CMLeaf("FOO")), new CMLeaf("BAR")));
ContentModel cm = doc.createContentModel(model);
ElementDecl ed = doc.createElementDecl("ROOT", cm);
or

ContentModel cm = doc.createContentModel(ElementDecl.MODEL_GROUP);
cm.setPseudoContentModel("(#PCDATA|FOO|BAR)*");
ElementDecl ed = doc.createElementDecl("ROOT", cm);
(A DTD including this instance can't be used for validation. It can only be used for printing.)

<!ELEMENT ROOT (FOO?, (DL|DD)+, BAR*)> CMNode model = new CM2op(',', new CM2op(',', new CM1op('?', new CMLeaf("FOO")), new CM1op('+', new CM2op('|', new CMLeaf("DL"), new CMLeaf("DD")))),new CM1op('*', new CMLeaf("BAR")));
ContentModel cm = doc.createContentModel(model);
ElementDecl ed = doc.createElementDecl("ROOT", cm);
or

ContentModel cm = doc.createContentModel(ElementDecl.MODEL_GROUP);
cm.setPseudoContentModel("(FOO?, (DL|DD)+, BAR*)");
ElementDecl ed = doc.createElementDecl("ROOT", cm);
(A DTD including this instance can't be used for validation. It can only be used for printing.)

<!ATTLIST ROOT
att1 CDATA #IMPLIED
att2 (A|B|O|AB) "A"> Attlist al = doc.createAttlist("ROOT");
AttDef ad = doc.createAttDef("att1");
ad.setDeclaredType(AttDef.CDATA);
ad.setDefaultType(AttDef.IMPLIED);
al.addElement(ad);
ad = doc.createAttDef("att2");
ad.setDeclaredType(AttDef.NAME_TOKEN_GROUP);
ad.addElement("A");
ad.addElement("B");
ad.addElement("O");
ad.addElement("AB");
ad.setDefaultStringValue("A");
al.addElement(ad);

<!NOTATION png SYSTEM "viewpng.exe"> TXNotation no = doc.createNotation("png", new ExternalID("viewpng.exe"));

<!ENTITY version.num "1.1.6"> Entity ent = doc.createEntityDecl("version.num", "1.1.6", false);

<!ENTITY version.num SYSTEM "version.ent"> Entity ent = doc.createEntityDecl("version.num", new ExternalID("version.ent"), null);

<!ENTITY logoicon SYSTEM "logo.png" NDATA png> Entity ent = doc.createEntityDecl("logoicon", new ExternalID("logo.png"), "png");

<ROOT att1="val1" att2="val2">any text</ROOT> TXElement el = doc.createElement("ROOT");
el.setAttribute("att1", "val1");
el.setAttribute("att2", "val2");
el.addElement(doc.createText("any text"));

<![CDATA[any text]]> TXCDATASection cd = (TXCDATASection)doc.createCDATASection("any text");

&foobar; GeneralReference gr = (GeneralReference)doc.createEntityReference("foobar");

NOTE: Any XML node can be created manually using the PseudoNode construct, i.e. with `new PseudoNode("literal");'. For example, `dtd.addElement(new PseudoNode("<!ELEMENT ROOT (FOO, BAR)*>"));' will create `<!ELEMENT ROOT (FOO, BAR)*>'. However, you can use a tree including PseudoNode instances only for printing.

Example: Creating a DOM tree with an inline DTD

Here is an example program which uses a subset from the table above to generate a DOM tree with an inline DTD. The program also prints out the XML. If you redirect this to a file, you can check for correctness using XJParse.

import com.ibm.xml.parser.*;
import java.io.*;
import org.w3c.dom.*;

/**
 * This class tests the various table entries in the 
 * Programming Guide - guide.html, under the section
 *
 * How to make a new XML document.
 *
 * This is to verify the code snippets for accuracy with
 * the latest DOM.
 */
public class MakeNewDocument {

    public static void main (String args[]) {
        
        TXDocument doc = new TXDocument();
//<?xml version="1.0"
//encoding="ISO-8859-1"?> 
        doc.setVersion("1.0");
        doc.setEncoding("ISO-8859-1"); 
//<?footarget foodata?> 
        TXPI pi = (TXPI)doc.createProcessingInstruction("footarget", " foodata"); 
        doc.appendChild(pi);
// <!-- comment --> 
        TXComment comm = (TXComment)doc.createComment(" comment "); 
        doc.appendChild(comm);
//<!DOCTYPE ROOT [...]> 
        DTD dtd = doc.createDTD("ROOT", null);
        doc.appendChild(dtd);
        
//<!ELEMENT ROOT (#PCDATA|FOO|BAR)*> 
        CMNode model = new CM1op('*', new CM2op('|', new
        CM2op('|', new CMLeaf("#PCDATA"), new CMLeaf("FOO")),
        new CMLeaf("BAR")));
        ContentModel cm = doc.createContentModel(model);
        ElementDecl ed = doc.createElementDecl("ROOT", cm); 
        
        dtd.appendChild(ed);
         
        ElementDecl foodecl =  doc.createElementDecl("FOO", 
            doc.createContentModel(new CMLeaf("#PCDATA"))); 
        ElementDecl bardecl = doc.createElementDecl("BAR", 
            doc.createContentModel(new CMLeaf("#PCDATA"))); 
            
        dtd.appendChild(foodecl);
        dtd.appendChild(bardecl);

 //<!ATTLIST ROOT
 //att1 CDATA #IMPLIED
 //att2 (A|B|O|AB) "A"> 
        Attlist al = doc.createAttlist("ROOT");
        AttDef ad = doc.createAttDef("att1");
        ad.setDeclaredType(AttDef.CDATA);
        ad.setDefaultType(AttDef.IMPLIED);
        al.addElement(ad);
        ad = doc.createAttDef("att2");
        ad.setDeclaredType(AttDef.NAME_TOKEN_GROUP);
        ad.addElement("A");
        ad.addElement("B");
        ad.addElement("O");
        ad.addElement("AB");
        ad.setDefaultStringValue("A");
        al.addElement(ad);
                           
        dtd.appendChild(al);
        
 //<!NOTATION png SYSTEM "viewpng.exe"> 
        TXNotation no = doc.createNotation("png", new
        ExternalID("viewpng.exe")); 
        
        dtd.appendChild(no);
 //<!ENTITY version.num "1.1.6"> 
        Entity ent = doc.createEntityDecl("version.num", "1.1.6", false); 
        dtd.appendChild(ent);
 //<ROOT att1="val1"
 //att2="val2">any
 //text</ROOT> 
        TXElement rt = (TXElement)doc.createElement("ROOT");
        rt.setAttribute("att1", "val1");
        rt.setAttribute("att2", "B");
        rt.appendChild(doc.createTextNode("any text")); 
        TXElement foo = (TXElement)doc.createElement("FOO");
        TXElement bar = (TXElement)doc.createElement("BAR");
        rt.appendChild(foo);
        rt.appendChild(bar);
        doc.appendChild(rt);
         
        String encode = MIME2Java.convert("ISO-8859-1");
        PrintWriter pw;
        try {
            pw = new PrintWriter(new OutputStreamWriter(System.out, encode));
        } catch (UnsupportedEncodingException badEncoding) {
            System.err.println(badEncoding);
            return;
        }
        
        doc.setEncoding("ISO-8859-1");
        try {
            doc.print(pw, encode);
        } catch (IOException io) {
            System.err.println(io);
        }
    }
}

Adding parser functionality by subclassing TX classes

If you want to add functionality to the TXElement class, you must subclass TXElement. Then you must subclass the TXDocument class and call Parser#setElementFactory(my-new-TXDocument-subclass). The setElementFactory name is kept for backwards compatibility.

Design a subclass of the TXElement class.
Design a subclass of the TXDocument class.
Call Parser#setElementFactory() with an instance of the TXDocument subclass.

For example, suppose we wanted to write an XML browser or editor. Not only do we need to record errors, but we also need to map these errors to DOM nodes. To do this, we need to associate the current node with each error as the XML is being parsed. We have already seen how to implement ErrorListener, and how to implement ElementHandler, which we could use to keep track of the current TXElement. However, now we want to capture errors on Nodes, whether they are TXElements or anything else.

 class MyElement extends TXElement {
     ....
 }
 class MyText extends TXText {
     ....
 }
 class MyFactory extends TXDocument {
    //*** The current Node!
    public static Node currentNode;

     public TXElement createElement(String name) {
         MyElement el = new MyElement(name);
         el.setFactory(this);
         currentNode = el;
         return el;
     }
     public TXText createText(String data, boolean ignorable) {
         MyText te = new MyText(data);
         te.setFactory(this);
         te.setIsIgnorableWhitespace(ignorable);
         currentNode = te;
         return te;
     }
     ....
 }
 class ErrorFlagger implements ErrorListener {

    // Maps each Node to the error messages reported while it was current.
    // Requires java.util.Hashtable and org.w3c.dom.Node to be imported.
    static Hashtable errorNodes = new Hashtable();

    // error() returns int, as in the ErrorListener examples earlier.
    public int error(String fname, int lineno,
                    int charoff, Object key, String mes) {
        String errorString = mes;
        Object previous = errorNodes.get(MyFactory.currentNode);
        if (previous != null)
            errorString += (String)previous;
        errorNodes.put(MyFactory.currentNode, errorString);
        return 1;
    }
    
    public static String getError(Node node) {
        return (String)errorNodes.get(node);
    }
    
    public static Hashtable getErrorNodes() {
        return (Hashtable)errorNodes.clone();
    }
 }

     ....
     Parser parse = new Parser(...);
     parse.setElementFactory(new MyFactory());
     TXDocument doc = parse.readStream(is);
     // doc has MyElement instances instead of  TXElement instances
     // doc has MyText instances instead of  TXText instances

Now, we have a mapping in ErrorFlagger between DOM Nodes and error messages. If we wanted to capture more or all error information, we can create a class/object to wrap all the error functions and put them into the errorNodes Hashtable instead of a simple String. Obviously, this example isn't very safe, and a few more wrapping functions should be added.

Querying DTD information

Loading a DTD without loading a document

    String systemlit = "http://.../foobar.dtd";
    InputStream is = (new URL(systemlit)).openStream();
    Parser parse = new Parser(...);
    DTD dtd = parse.readDTDStream(is);

What attributes can be set in element "FOO"?

    Enumeration en = dtd.getAttributeDeclarations("FOO");
    while (en.hasMoreElements()) {
        AttDef attd = (AttDef)en.nextElement();
        // attd.getName() is attribute name
    }

What value can an attribute have?

First, get AttDef instance by the above method or by DTD#getAttributeDeclaration(String,String).

Second, check the attribute type by AttDef#getDeclaredType(), which returns one of the following values.

AttDef.CDATA

Any text value.

AttDef.ENTITIES

A subset of unparsed entity names. Names are chained with " " when you want to specify more than one value. For example: "name1 name2 name3".

    Enumeration en = dtd.getEntities();
    while (en.hasMoreElements()) {
        EntityValue ev = (EntityValue)en.nextElement();
        if (ev.isNDATA()) {
            // Each ev.getName() is valid value.
        }
    }

AttDef.ENTITY

One of unparsed entity names (See above).

AttDef.NAME_TOKEN_GROUP

One of AttDef#elements().

    Enumeration en = attd.elements();
    while (en.hasMoreElements()) {
        String s = (String)en.nextElement();
        // Each s is valid.
    }

AttDef.ID

Any Name for which DTD#checkID() returns null.

    String newid = ...
    if (null != dtd.checkID(newid)) {
        // Can't use newid
    } else
        dtd.registID(element, newid);

AttDef.IDREF

One of registered IDs.

    Enumeration en = dtd.IDs();
    while (en.hasMoreElements()) {
        String id = (String)en.nextElement();
        // The attribute can have any one of these ids as its value.
    }

AttDef.IDREFS

A subset of registered IDs. IDs are chained with " " when you want to specify more than one value.

AttDef.NMTOKEN

One Nmtoken.

AttDef.NMTOKENS

A set of Nmtokens. Nmtokens are chained with " " when you want to specify more than one value.

AttDef.NOTATION

One of AttDef#elements().

    Enumeration en = attd.elements();
    while (en.hasMoreElements()) {
        String s = (String)en.nextElement();
        // Each s is valid.
    }

What element can be inserted into an element "FOO" as a child?

<!ELEMENT PERSON (NAME, HEIGHT, WEIGHT, EMAIL?)>

This DTD declaration means that we must first insert a "NAME" element, then a "HEIGHT" element, a "WEIGHT" element, and finally we may optionally insert an "EMAIL" element.

Applications can get information about these validation rules using DTD#getInsertableElements() or DTD#getAppendableElements().

    TXElement el = new TXElement("PERSON");
    ....
    switch (dtd.getContentType("PERSON")) {
      case 0:
        // This element is not declared.
        break;
      case ElementDecl.EMPTY:
        // No element is insertable.
        break;
      case ElementDecl.ANY:
        // Any element is insertable.
        break;
      case ElementDecl.MODEL_GROUP:
        Hashtable tab = dtd.prepareTable("PERSON");
            // This hashtable is reusable for any elements.
        dtd.getAppendableElements(el, tab);
        if (((InsertableElement)tab.get(DTD.CM_ERROR)).status) {
            // This element has incorrect structure.
        } else {
            Enumeration en = tab.elements();
            while (en.hasMoreElements()) {
                InsertableElement ie = (InsertableElement)en.nextElement();
                if (!ie.name.equals(DTD.CM_ERROR)
                    && !ie.name.equals(DTD.CM_EOC)
                    && ie.status) {
                    if (ie.name.equals(DTD.CM_PCDATA)) {
                        // Can append TextElement instance to el.
                    } else {
                        // Can append Element instance named ie.name.
                    }
                }
            }
        }
        break;
    }
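To make the idea of "appendable" elements concrete, here is a small self-contained sketch for the PERSON declaration above. It does not use the XML4J DTD class. It hard-codes the sequence (NAME, HEIGHT, WEIGHT, EMAIL?) in two arrays and only handles simple sequences with distinct names, where each item is either required or optional. All names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AppendableSketch {

    // Hard-coded model for <!ELEMENT PERSON (NAME, HEIGHT, WEIGHT, EMAIL?)>.
    static final String[] NAMES = { "NAME", "HEIGHT", "WEIGHT", "EMAIL" };
    static final boolean[] OPTIONAL = { false, false, false, true };

    // Returns the element names that may be appended after the given
    // children, or null if the children already violate the model.
    static List<String> appendable(List<String> children) {
        int pos = 0; // index of the next model item that may match
        for (String child : children) {
            // Skip optional items that were left out.
            while (pos < NAMES.length && OPTIONAL[pos] && !NAMES[pos].equals(child)) {
                pos++;
            }
            if (pos == NAMES.length || !NAMES[pos].equals(child)) {
                return null; // incorrect structure
            }
            pos++;
        }
        List<String> result = new ArrayList<String>();
        for (int i = pos; i < NAMES.length; i++) {
            result.add(NAMES[i]);
            if (!OPTIONAL[i]) break; // nothing past a required item is allowed yet
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(appendable(Arrays.asList("NAME", "HEIGHT")));           // [WEIGHT]
        System.out.println(appendable(Arrays.asList("NAME", "HEIGHT", "WEIGHT"))); // [EMAIL]
        System.out.println(appendable(Arrays.asList("HEIGHT")));                   // null
    }
}
```

A real content model also has choices and repetition, which is why the parser's own DTD methods work from the full model instead of a flat list like this one.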

How can I turn off validation

XML4J allows you to turn off validation of the content model. This means that the parser will not check whether the XML document being parsed follows the definitions in the DTD. You will still get an error if the parser cannot find the .dtd file referred to in the DOCTYPE declaration.

To turn off validation

Make a subclass of TXDocument.
Override isCheckValidity() to return false.
Call Parser#setElementFactory() with an instance of this subclass.

Here is the code illustrating how this can be done.

    ...
    Parser p = new Parser(fileName);
    p.setElementFactory(new TXDocument() {
        public boolean isCheckValidity() {
            return false;
        }
    });
    TXDocument doc = p.readStream(is);
    ...
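The effect of switching validation on and off can be tried with the JDK's built-in parser, used here only as a stand-in for XML4J. In this sketch, DocumentBuilderFactory#setValidating() plays the role of overriding isCheckValidity(), and a SAX ErrorHandler plays the role of ErrorListener. The class name and the sample document are made up for illustration. The document is well-formed but breaks its inline DTD, because ROOT contains BAR where FOO is required.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class ValidationToggleSketch {

    // Well-formed, but ROOT must contain FOO according to the inline DTD.
    static final String XML =
        "<?xml version=\"1.0\"?>\n" +
        "<!DOCTYPE ROOT [\n" +
        "  <!ELEMENT ROOT (FOO)>\n" +
        "  <!ELEMENT FOO EMPTY>\n" +
        "  <!ELEMENT BAR EMPTY>\n" +
        "]>\n" +
        "<ROOT><BAR/></ROOT>";

    // Parses XML and returns how many validity errors were reported.
    static int countErrors(boolean validate) {
        final List<String> errors = new ArrayList<String>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validate);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) {
                    // warnings are ignored in this sketch
                }
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // validity errors arrive here
                }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // well-formedness errors still stop the parse
                }
            });
            builder.parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return errors.size();
    }

    public static void main(String[] args) {
        System.out.println("validating: " + countErrors(true) + " error(s)");
        System.out.println("not validating: " + countErrors(false) + " error(s)");
    }
}
```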

Namespaces

NOTE: The namespace specification is Work in Progress. The implementation of namespaces in XML4J is experimental.

By default, the XML4J processor doesn't understand namespaces. Call Parser#setProcessNamespace(true) when you need to enable the namespace feature.
When enabled, namespaces are supported in element names and attribute names.
Element#getTagName() and Node#getNodeName() always return qualified names (Prefix+localPart).
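
The same switch exists in the JDK's built-in DOM parser, which makes it easy to see what namespace processing changes. The sketch below uses DocumentBuilderFactory#setNamespaceAware() as an analogue of Parser#setProcessNamespace(). It is not XML4J code, and the class name and the urn:example:people URI are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

class NamespaceSketch {

    static final String XML =
        "<p:list xmlns:p=\"urn:example:people\"><p:item/></p:list>";

    // Returns { node name, local name, namespace URI } of the root element.
    static String[] rootNames(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            Element root = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return new String[] {
                root.getNodeName(), root.getLocalName(), root.getNamespaceURI()
            };
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] names = rootNames(true);
        // The node name stays qualified (prefix + local part) either way.
        System.out.println(names[0] + " / " + names[1] + " / " + names[2]);
    }
}
```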

Element Digest (DOMHash)

TXElement / TXText / TXComment / TXPI have getDigest() methods, which return hash digest values calculated for subtrees of the parse tree. By default, getDigest() returns a 128-bit MD5 digest for the current element and all its children. Therefore, when a child element is modified, each ancestor element's getDigest() will return a new digest value.

You can use getDigest for fast and efficient comparison of two parse trees, or for detecting changes in XML documents.
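
The following sketch shows the general idea of a subtree digest with the JDK's MessageDigest and the standard DOM interfaces. It is a simplified illustration and not the DOMHASH algorithm or the XML4J getDigest() implementation: it hashes only node type, node name, node value, and the digests of the children, and it ignores attributes. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class SubtreeDigestSketch {

    // Computes an MD5 digest over a node and, recursively, its children.
    static byte[] digest(Node node) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        md.update((byte) node.getNodeType());
        md.update(node.getNodeName().getBytes(StandardCharsets.UTF_8));
        md.update((byte) 0); // separator between name and value
        String value = node.getNodeValue();
        if (value != null) {
            md.update(value.getBytes(StandardCharsets.UTF_8));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            md.update(digest(children.item(i))); // fold in each child's digest
        }
        return md.digest();
    }

    // Parses the XML text and digests the subtree under the root element.
    static byte[] digestOfRoot(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return digest(root);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] a = digestOfRoot("<A><B>x</B></A>");
        byte[] b = digestOfRoot("<A><B>x</B></A>");
        byte[] c = digestOfRoot("<A><B>y</B></A>");
        System.out.println(Arrays.equals(a, b)); // identical trees
        System.out.println(Arrays.equals(a, c)); // one leaf differs
    }
}
```

Because each parent folds in the digests of its children, a change to any leaf propagates up to the root, so comparing two root digests is enough to tell whether two trees differ.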

See the DOMHASH document for additional details.

What names should I use in encoding declaration of XML files

As per the XML specification, the encoding declaration is optional only for XML files in UTF-8 encoding. If the XML file is in any other encoding, the 'encoding' attribute must be present in the XML declaration on the first line of the file. The value of the 'encoding' attribute must be a supported encoding name. A list of the encoding names supported by XML4J may be found in the file com.ibm.xml.parser.MIME2Java.html.
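
As a quick check of how the declaration is used, the sketch below encodes a small document as ISO-8859-1 bytes, declares that encoding in the XML declaration, and parses it with the JDK's built-in DOM parser (a stand-in here for XML4J). The class name and sample text are made up. The parser decodes the non-ASCII character correctly because the declaration names the encoding actually used for the bytes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class EncodingDeclarationSketch {

    // Parses an ISO-8859-1 encoded document and returns the root's text.
    static String parseLatin1() {
        String xml =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><ROOT>caf\u00e9</ROOT>";
        // The bytes on "disk" are ISO-8859-1, matching the declaration.
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The text has four characters, the last one being e with an acute accent.
        System.out.println(parseLatin1().length());
    }
}
```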