Class hplb.xml.Parser

java.lang.Object
   |
   +----hplb.xml.Parser

public class Parser
extends Object
implements DocumentHandler
Parses a stream of MarkupTokens into a tree structure. Uses Tokenizer.

This class has only a very shallow (effectively no) understanding of HTML. Correct handling of <p> tags requires some special code, as does correct handling of <li>. This parser doesn't know that an "li" tag can be terminated by another "li" tag or by a "ul" end tag. Hence "li" is treated as an empty tag here, which means that in the generated parse tree the children of an "li" element are represented as its siblings.
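To make the "li" behaviour concrete, here is a minimal sketch of a stack-based tree builder fed with start-tag, end-tag and text events, in the spirit of the DocumentHandler callbacks listed in the method index. The class LiSiblingDemo, its Node type and the render helper are hypothetical illustrations, not part of hplb.xml. The sketch assumes well-nested input and treats "li" as empty, so for <ul><li>one<li>two</ul> the text is attached next to each "li" rather than inside it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Hashtable;
import java.util.List;

// Hypothetical sketch: a tiny stack-based tree builder that, like the Parser
// described above, never descends into elements listed as "empty".
// It is NOT the hplb.xml.Parser implementation.
class LiSiblingDemo {
    // Minimal tree node: an element name, or "#text" with a text value.
    static class Node {
        final String name;
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String name, String text) { this.name = name; this.text = text; }
    }

    // Hashtable used as a set (name -> name), echoing the emptyElms field.
    private final Hashtable<String, String> emptyElms = new Hashtable<>();
    private final Node root = new Node("#document", null);
    private Node current = root;
    // Parents to return to when a non-empty element is closed.
    private final Deque<Node> open = new ArrayDeque<>();

    LiSiblingDemo(String... empties) {
        for (String e : empties) emptyElms.put(e, e);
    }

    void startElement(String name) {
        Node elm = new Node(name, null);
        current.children.add(elm);
        // Empty elements never become the insertion point, so content that
        // follows them is attached to their parent, i.e. as their siblings.
        if (!emptyElms.containsKey(name)) {
            open.push(current);
            current = elm;
        }
    }

    void endElement(String name) {
        // End tags of empty elements are ignored; otherwise assume the end
        // tag closes the current element (well-nested input only).
        if (emptyElms.containsKey(name)) return;
        if (!open.isEmpty()) current = open.pop();
    }

    void characters(String s) {
        current.children.add(new Node("#text", s));
    }

    // Renders a node as name(child child ...) for easy inspection.
    static String render(Node n) {
        if (n.text != null) return "\"" + n.text + "\"";
        StringBuilder sb = new StringBuilder(n.name);
        if (!n.children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < n.children.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(render(n.children.get(i)));
            }
            sb.append(')');
        }
        return sb.toString();
    }

    // Feeds the events for <ul><li>one<li>two</ul> and renders the tree.
    static String demo() {
        LiSiblingDemo b = new LiSiblingDemo("li");
        b.startElement("ul");
        b.startElement("li");
        b.characters("one");
        b.startElement("li");
        b.characters("two");
        b.endElement("ul");
        return render(b.root);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running main prints #document(ul(li "one" li "two")): both text nodes are children of "ul", i.e. siblings of the "li" elements, exactly the shape the paragraph above warns about.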

Author:
Anders Kristensen
See Also:
Tokenizer

Variable Index

 o current
 o dom
 o emptyElms
Set of elements which the parser will expect to be empty, i.e. elements for which it will not expect an end tag.
 o root
 o terminators
Maps element names to a list of names of other elements which terminate that element.
 o tok

Constructor Index

 o Parser()

Method Index

 o addEmptyElms(String[])
Add the set of HTML empty elements to the set of tags recognized as empty tags.
 o addTerminator(String, String)
 o characters(char[], int, int)
 o clearEmptyElmSet()
 o doctype(String, String, String)
 o endDocument()
 o endElement(String)
 o getDOMAttrs(AttributeMap)
 o getTokenizer()
 o ignorable(char[], int, int)
 o isEmptyElm(String)
 o main(String[])
 o parse(InputStream)
 o processingInstruction(String, String)
 o putIds(Dictionary, String[])
 o root()
 o setDOM(DOM)
 o setElmTerminators(String, String[])
 o startDocument()
 o startElement(String, AttributeMap)

Variables

 o emptyElms
 protected Hashtable emptyElms
Set of elements which the parser will expect to be empty, i.e. it will not expect an end tag (e.g. the HTML elements IMG and META). End tags for any of these are ignored...
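One possible way to hold such a set is sketched below, assuming a Hashtable used as a set (each name mapped to itself), as the field's declared type suggests. Only the method names addEmptyElms, clearEmptyElmSet and isEmptyElm come from this page; their bodies, the dropIgnoredEndTags helper and the convention of writing an end tag as "/name" are hypothetical. Matching here is case-sensitive, which may differ from the real parser.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Hypothetical sketch of an empty-element set kept in a Hashtable.
// Method names mirror those listed for Parser; the bodies are assumptions,
// not the hplb source.
class EmptyElmSet {
    // Hashtable used as a set: each element name maps to itself.
    private final Hashtable<String, String> emptyElms = new Hashtable<>();

    void addEmptyElms(String[] elms) {
        for (String e : elms) emptyElms.put(e, e);
    }

    void clearEmptyElmSet() {
        emptyElms.clear();
    }

    boolean isEmptyElm(String elmName) {
        return emptyElms.containsKey(elmName);
    }

    // Drops end tags (written "/name" in this sketch) of empty elements;
    // start tags, other end tags and text tokens pass through unchanged.
    List<String> dropIgnoredEndTags(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.startsWith("/") && isEmptyElm(t.substring(1))) continue;
            out.add(t);
        }
        return out;
    }
}
```

With "img" registered as empty, the token list p, img, /img, text, /p becomes p, img, text, /p: the stray end tag is simply discarded instead of closing anything.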

 o terminators
 protected Hashtable terminators
Maps element names to a list of names of other elements which terminate that element. So, for example, "dt" might be mapped to ("dt", "dd") and "p" might be mapped to all block-level HTML elements.
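The terminators map can be sketched as follows, assuming that a start tag closes the currently open element whenever its name appears in that element's terminator list. The class TerminatorDemo, its event log and its policy of ignoring unmatched end tags are hypothetical; only the setElmTerminators name is taken from the method index. Text content is omitted to keep the trace short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Hashtable;
import java.util.List;

// Hypothetical sketch of the "terminators" idea: a start tag listed as a
// terminator of the open element implicitly closes that element first.
// Names and structure are assumptions, not hplb code.
class TerminatorDemo {
    private final Hashtable<String, String[]> terminators = new Hashtable<>();
    // Stack of open element names; the top is the current element.
    private final Deque<String> open = new ArrayDeque<>();
    // Flat log of the resulting structure, e.g. "<dt>", "</dt>".
    private final List<String> events = new ArrayList<>();

    void setElmTerminators(String elmName, String[] elmTerms) {
        terminators.put(elmName, elmTerms);
    }

    private boolean terminates(String openElm, String newElm) {
        String[] terms = terminators.get(openElm);
        return terms != null && Arrays.asList(terms).contains(newElm);
    }

    void startElement(String name) {
        // Close every open element that this start tag terminates.
        while (!open.isEmpty() && terminates(open.peek(), name)) {
            events.add("</" + open.pop() + ">");
        }
        events.add("<" + name + ">");
        open.push(name);
    }

    void endElement(String name) {
        // Ignore unmatched end tags; otherwise close up to and including name.
        if (!open.contains(name)) return;
        String top;
        do {
            top = open.pop();
            events.add("</" + top + ">");
        } while (!top.equals(name));
    }

    // Events for <dl><dt><dd><dt></dl>, with dt and dd terminated by dt or dd.
    static String demo() {
        TerminatorDemo p = new TerminatorDemo();
        String[] dtdd = {"dt", "dd"};
        p.setElmTerminators("dt", dtdd);
        p.setElmTerminators("dd", dtdd);
        p.startElement("dl");
        p.startElement("dt");
        p.startElement("dd");
        p.startElement("dt");
        p.endElement("dl");
        return String.join("", p.events);
    }
}
```

Here demo() returns <dl><dt></dt><dd></dd><dt></dt></dl>: each new "dt" or "dd" implicitly closes its predecessor, and the "dl" end tag closes the last "dt" before closing itself.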

 o tok
 protected Tokenizer tok
 o dom
 protected DOM dom
 o root
 protected Document root
 o current
 protected Node current

Constructors

 o Parser
 public Parser()

Methods

 o setDOM
 public DOM setDOM(DOM dom)
 o getTokenizer
 public Tokenizer getTokenizer()
 o addEmptyElms
 public void addEmptyElms(String elms[])
Add the set of HTML empty elements to the set of tags recognized as empty tags.

 o clearEmptyElmSet
 public void clearEmptyElmSet()
 o isEmptyElm
 public boolean isEmptyElm(String elmName)
 o setElmTerminators
 public void setElmTerminators(String elmName,
                               String elmTerms[])
 o addTerminator
 public void addTerminator(String elmName,
                           String elmTerm)
 o putIds
 public static final Dictionary putIds(Dictionary dict,
                                       String sary[])
 o root
 protected Document root()
 o parse
 public Document parse(InputStream in) throws Exception
 o startDocument
 public void startDocument()
 o endDocument
 public void endDocument()
 o doctype
 public void doctype(String name,
                     String publicID,
                     String systemID)
 o startElement
 public void startElement(String name,
                          AttributeMap attributes)
 o endElement
 public void endElement(String name)
 o characters
 public void characters(char ch[],
                        int start,
                        int length)
 o ignorable
 public void ignorable(char ch[],
                       int start,
                       int length)
 o processingInstruction
 public void processingInstruction(String target,
                                   String remainder)
 o getDOMAttrs
 public AttributeList getDOMAttrs(AttributeMap attrs)
 o main
 public static void main(String args[]) throws Exception
