java.lang.Object
   |
   +----hplb.xml.Parser
This class has only a very shallow understanding of HTML; in practice it has none. Handling <p> and <li> tags correctly requires special code. This parser does not know that an "li" element can be terminated by another "li" start tag or by a "ul" end tag. It therefore treats "li" as an empty tag, which means that in the generated parse tree the content that belongs inside an "li" element appears as its following siblings instead of as its children.
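The effect described above can be illustrated with a small self-contained model. This is a sketch only, not the hplb.xml.Parser source: the class EmptyTagSketch, its Node type, the pre-split token array and the "#doc" root label are all hypothetical names introduced for illustration. It assumes a tree builder that descends into every start tag unless the element name is in a set of empty elements, which is the behaviour the description implies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the behaviour described above; NOT the hplb.xml.Parser code.
final class EmptyTagSketch {
    /** Minimal tree node: an element name or a text run, plus its children. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(String label) { this.label = label; }

        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return child;
        }

        /** Renders the subtree as label(child child ...). */
        String render() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder(label).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(children.get(i).render());
            }
            return sb.append(')').toString();
        }
    }

    /**
     * Builds a tree from pre-split tokens: "<name>" is a start tag,
     * "</name>" an end tag, anything else is text (an assumed input format).
     */
    static String parse(String[] tokens, Set<String> emptyElms) {
        Node root = new Node("#doc");
        Node current = root;
        for (String t : tokens) {
            if (t.startsWith("</")) {
                String name = t.substring(2, t.length() - 1);
                // Close the nearest open element with this name; ignore stray end tags.
                Node n = current;
                while (n != root && !n.label.equals(name)) n = n.parent;
                if (n != root) current = n.parent;
            } else if (t.startsWith("<")) {
                String name = t.substring(1, t.length() - 1);
                Node elm = current.add(new Node(name));
                // An empty element never becomes the current node, so it gets no children.
                if (!emptyElms.contains(name)) current = elm;
            } else {
                current.add(new Node(t));
            }
        }
        return root.render();
    }

    public static void main(String[] args) {
        String[] tokens = {"<ul>", "<li>", "one", "<li>", "two", "</ul>"};
        // "li" treated as empty: its content becomes siblings under "ul".
        System.out.println(parse(tokens, Set.of("li")));  // prints #doc(ul(li one li two))
        // "li" not empty and no terminator knowledge: the second "li" nests in the first.
        System.out.println(parse(tokens, Set.of()));      // prints #doc(ul(li(one li(two))))
    }
}
```

Neither result is the tree an HTML author would expect. The first is flat but at least keeps each text run next to its "li"; the second shows what happens if "li" is not marked empty and nothing terminates it.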
protected Hashtable emptyElms
protected Hashtable terminators
protected Tokenizer tok
protected DOM dom
protected Document root
protected Node current
public Parser()
public DOM setDOM(DOM dom)
public Tokenizer getTokenizer()
public void addEmptyElms(String elms[])
public void clearEmptyElmSet()
public boolean isEmptyElm(String elmName)
public void setElmTerminators(String elmName, String elmTerms[])
public void addTerminator(String elmName, String elmTerm)
public static final Dictionary putIds(Dictionary dict, String sary[])
protected Document root()
public Document parse(InputStream in) throws Exception
public void startDocument()
public void endDocument()
public void doctype(String name, String publicID, String systemID)
public void startElement(String name, AttributeMap attributes)
public void endElement(String name)
public void characters(char ch[], int start, int length)
public void ignorable(char ch[], int start, int length)
public void processingInstruction(String target, String remainder)
public AttributeList getDOMAttrs(AttributeMap attrs)
public static void main(String args[]) throws Exception