Given by Lukasz Beca at CPS714 Computational Science Information Track on July 7, 1999. Foils prepared August 1, 1999.
XML documents and grammar specifications |
XML in applications |
XSL stylesheets |
Lukasz Beca |
beca@npac.syr.edu |
3-287 CST |
NPAC, Syracuse University |
Structure of XML document |
Elements of XML markup |
Document grammar specifications |
Valid and well-formed documents |
Namespaces |
Prolog |
Body - structured data with one root element |
Comments |
Entity references |
Character references |
Processing instructions |
CDATA sections |
Start tags and end tags |
Empty elements |
XML markup specifies the structure of the document. All text that is not markup is the character data of the document: |
Comments make the structure of the document clearer |
Can appear anywhere in a document outside other markup, except before the XML declaration |
Comments are not part of the document's character data; an XML parser may, but need not, pass their text to the application |
Example: |
<name> |
<!--This is a short comment--> |
Smith |
</name> |
An entity is a named unit of data; an entity reference (&name;) stands for that data |
The XML parser substitutes the entity's data for each entity reference |
Entities can also refer to binary data (unparsed entities) |
Predefined entities: amp, lt, gt, apos, quot, which stand for &, <, >, ', " |
Example: |
<statement>5 &lt; 8</statement> |
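The substitution can be seen with a short Java program. This is a minimal sketch assuming a current JDK and its standard JAXP API (javax.xml.parsers, org.w3c.dom), which postdates these foils; the class EntityDemo and the method rootText are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parse a string and return the root element's character data
class EntityDemo {
  static String rootText(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    // By now the parser has replaced every entity reference with its data
    return doc.getDocumentElement().getTextContent();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(rootText("<statement>5 &lt; 8</statement>"));           // 5 < 8
    System.out.println(rootText("<q>&quot;A&quot; &amp; &apos;B&apos;</q>"));  // "A" & 'B'
  }
}
```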
A character reference denotes a character from the ISO/IEC 10646 character set, usually one not directly accessible from available input devices |
It is written as a decimal (&#NNN;) or hexadecimal (&#xHHHH;) character code |
Example: |
&#x000d; is a carriage return |
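A similar sketch, under the same JAXP assumption (CharRefDemo is a hypothetical name), shows decimal and hexadecimal references both turning into ordinary characters in the parsed text:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class CharRefDemo {
  // Parse a string and return the root element's text, with references resolved
  static String rootText(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)))
        .getDocumentElement().getTextContent();
  }

  public static void main(String[] args) throws Exception {
    // &#x41; is hexadecimal for 'A', &#66; is decimal for 'B', &#x000d; is carriage return
    String text = rootText("<c>&#x41;&#66;&#x000d;</c>");
    for (char ch : text.toCharArray()) {
      System.out.println((int) ch);   // 65, 66, 13 (one per line)
    }
  }
}
```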
Processing instructions are not part of the document's data but must be passed through to the application |
They hold processing directions and information passed through the parser to programs |
A PI begins with a target name that identifies the application |
Example: |
<?xml version = "1.0" ?>   (the XML declaration: written like a PI, but formally not one) |
<?xml-stylesheet type="text/xsl" href="mystyle.xsl"?> |
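With SAX, an application receives each processing instruction as a target plus a data string. A sketch assuming the JAXP SAX API of a current JDK (PIDemo is a hypothetical name); note that the XML declaration is not reported as a PI:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PIDemo {
  // Collects "target | data" for every processing instruction the parser reports
  static List<String> processingInstructions(String xml) throws Exception {
    List<String> found = new ArrayList<>();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void processingInstruction(String target, String data) {
        found.add(target + " | " + data);
      }
    };
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return found;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<?xml version=\"1.0\"?>"
        + "<?xml-stylesheet type=\"text/xsl\" href=\"mystyle.xsl\"?><doc/>";
    System.out.println(processingInstructions(xml));
  }
}
```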
A CDATA section stores text containing markup characters so that the parser does not interpret them as markup |
CDATA sections are useful for storing XML markup as data |
Example: |
<buffer> |
<![CDATA[<price>50</price>]]> |
</buffer> |
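A sketch under the same JAXP assumption (CDataDemo is a hypothetical name): the parser keeps the CDATA content as text and does not create a price element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CDataDemo {
  static Element parseRoot(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml))).getDocumentElement();
  }

  public static void main(String[] args) throws Exception {
    Element buffer = parseRoot("<buffer><![CDATA[<price>50</price>]]></buffer>");
    // The markup inside the CDATA section is kept as text, not parsed into elements
    System.out.println(buffer.getElementsByTagName("price").getLength());               // 0
    System.out.println(buffer.getFirstChild().getNodeType() == Node.CDATA_SECTION_NODE); // true
    System.out.println(buffer.getTextContent());                                         // <price>50</price>
  }
}
```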
Denote the start and end of the element |
Element: <tagname>content</tagname> |
Start tag can contain attributes |
Content of the element can contain markup and character data |
Example: |
<name>Smith</name> |
Empty element tag has special form |
<tagname/> |
Represent elements that have no content |
Example: |
<img align="left" source="picture.jpg" /> |
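The empty-element tag is shorthand for a start tag immediately followed by its end tag, so both forms below should parse to equal nodes. A sketch assuming the JDK's JAXP DOM parser (EmptyElementDemo is a hypothetical name):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class EmptyElementDemo {
  static Element parseRoot(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml))).getDocumentElement();
  }

  public static void main(String[] args) throws Exception {
    Element a = parseRoot("<img align=\"left\" source=\"picture.jpg\" />");
    Element b = parseRoot("<img align=\"left\" source=\"picture.jpg\"></img>");
    // Neither element has children, and the two nodes compare equal
    System.out.println(a.hasChildNodes() + " " + b.hasChildNodes());  // false false
    System.out.println(a.isEqualNode(b));                               // true
  }
}
```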
<?xml version="1.0" ?> |
<!DOCTYPE doc SYSTEM "pubgrammar.dtd"> |
<doc> |
<publication number="pn1"> |
<title>Collaborative Virtual Workspace</title> |
<author> |
<lastname>Spellman</lastname> |
<firstname>Peter</firstname> |
</author> |
<date>1997</date> |
<keywords> |
<keyword>collaboration framework</keyword> |
<keyword>virtual environments</keyword> |
</keywords> |
</publication> |
</doc> |
Document Type Definition |
Provides definition of: |
Enables validation of the document |
XML Schema |
External DTD |
<?xml version="1.0"?> |
<!DOCTYPE greeting SYSTEM "hello.dtd"> |
<greeting>Hello</greeting> |
Internal DTD |
<?xml version="1.0"?> |
<!DOCTYPE greeting [ |
<!ELEMENT greeting (#PCDATA)> |
]> |
<greeting>Hello</greeting> |
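Validation against the internal DTD can be sketched with the JAXP DOM parser of a current JDK, assuming its validating mode (setValidating(true)); DtdValidationDemo and validityErrors are hypothetical names. Validity errors are delivered to error(), while well-formedness errors go to fatalError():

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationDemo {
  // Parses with DTD validation switched on and returns the validity errors reported
  static List<String> validityErrors(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true);   // check the document against its DOCTYPE
    DocumentBuilder builder = factory.newDocumentBuilder();
    List<String> errors = new ArrayList<>();
    builder.setErrorHandler(new ErrorHandler() {
      public void warning(SAXParseException e) { }
      public void error(SAXParseException e) { errors.add(e.getMessage()); }            // invalid
      public void fatalError(SAXParseException e) throws SAXParseException { throw e; } // not well-formed
    });
    builder.parse(new InputSource(new StringReader(xml)));
    return errors;
  }

  public static void main(String[] args) throws Exception {
    String dtd = "<!DOCTYPE greeting [<!ELEMENT greeting (#PCDATA)>]>";
    System.out.println(validityErrors(dtd + "<greeting>Hello</greeting>"));               // []
    System.out.println(validityErrors(dtd + "<greeting><b/>Hello</greeting>").isEmpty()); // false
  }
}
```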
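The standalone document declaration states whether the document depends on markup declarations outside the document entity (for example in an external DTD); standalone="yes" declares that no external declarations affect what the parser passes to the application |
<?xml version="1.0" standalone="yes"?> |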
(expression) - expression treated as a unit |
(a, b) - sequence: a followed by b |
(a|b) - choice: a or b but not both |
a? - a or nothing |
a+ - one or more occurrences of a |
a* - zero or more occurrences of a |
Example: |
(title,author+,date,keywords?)* |
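One way to read these operators is as a regular expression over the sequence of child element names. The sketch below is only an analogy, not how a parser implements content models: it assumes each child name is written followed by a space, so ',' becomes plain concatenation while ?, + and * keep their regex meaning (ContentModelDemo is a hypothetical name).

```java
import java.util.List;
import java.util.regex.Pattern;

class ContentModelDemo {
  // DTD model (title,author+,date,keywords?)* written as a regex over "name " tokens
  static final Pattern MODEL = Pattern.compile("(title (author )+date (keywords )?)*");

  // True if the sequence of child element names satisfies the content model
  static boolean matches(List<String> childNames) {
    StringBuilder tokens = new StringBuilder();
    for (String name : childNames) {
      tokens.append(name).append(' ');
    }
    return MODEL.matcher(tokens).matches();
  }

  public static void main(String[] args) {
    System.out.println(matches(List.of("title", "author", "date")));            // true
    System.out.println(matches(List.of("title", "author", "author", "date",
                                       "keywords", "title", "author", "date"))); // true
    System.out.println(matches(List.of("title", "date")));                      // false: author+ missing
  }
}
```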
Element declaration indicates the element's type and its content |
The content: |
Example: |
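<!ELEMENT thing (#PCDATA|container)*> |
<!ELEMENT container (#PCDATA|thing)*> |
where PCDATA stands for parsed character data; mixed content (character data together with child elements) must be declared in the form (#PCDATA|name|...)* |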
Attributes are used to associate name-value pairs with elements |
Attribute specifications may appear only within start-tags and empty-element tags |
Declarations may be used to: |
Attribute types: |
Attribute defaults: |
Declaration of 'name' attribute for element 'thing': |
<!ATTLIST thing name CDATA #REQUIRED> |
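Besides #REQUIRED, an attribute declaration can give a literal default value, which the parser fills in when the attribute is absent. A sketch assuming the JAXP DOM parser of a current JDK; the color attribute and its default "red" are hypothetical, added only for illustration. Even a non-validating parser applies defaults declared in the internal DTD subset:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeDefaultDemo {
  static final String XML =
      "<!DOCTYPE thing ["
    + "<!ELEMENT thing EMPTY>"
    + "<!ATTLIST thing name CDATA #REQUIRED color CDATA \"red\">"   // color: hypothetical
    + "]>"
    + "<thing name=\"box\"/>";

  static Element parseThing() throws Exception {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(XML))).getDocumentElement();
  }

  public static void main(String[] args) throws Exception {
    Element thing = parseThing();
    System.out.println(thing.getAttribute("name"));                      // box (given in the document)
    System.out.println(thing.getAttribute("color"));                     // red (default from the DTD)
    System.out.println(thing.getAttributeNode("color").getSpecified());  // false
  }
}
```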
Entities are replaced in the document by the parser with actual data |
Entity types |
Location of data |
Parsed entities - contain XML text, which the parser inserts into the document wherever the entity is referenced |
Unparsed entities - mostly binary files; they can be referenced only by name, as values of attributes declared with type ENTITY or ENTITIES |
<!ENTITY picture SYSTEM "mypicture.gif" NDATA gif> |
Notation is used to identify an external binary format |
A notation can also name a helper application that the parser or application might use to launch or process the binary entity |
<!NOTATION gif SYSTEM "Editor.exe"> |
<!ATTLIST picture |
format NOTATION (gif | jpg) #REQUIRED> |
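The parser records unparsed entities and notations in the document type but does not read the binary file itself. A sketch assuming the DOM DocumentType interface as implemented in a current JDK; the doc element and its image attribute are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Entity;
import org.xml.sax.InputSource;

class UnparsedEntityDemo {
  static final String XML =
      "<!DOCTYPE doc ["
    + "<!ELEMENT doc EMPTY>"
    + "<!ATTLIST doc image ENTITY #REQUIRED>"     // hypothetical ENTITY-typed attribute
    + "<!NOTATION gif SYSTEM \"Editor.exe\">"
    + "<!ENTITY picture SYSTEM \"mypicture.gif\" NDATA gif>"
    + "]>"
    + "<doc image=\"picture\"/>";

  static Document parse() throws Exception {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(XML)));
  }

  public static void main(String[] args) throws Exception {
    Document doc = parse();
    DocumentType type = doc.getDoctype();
    // The attribute value is just the entity's name; look the entity up in the DTD
    String name = doc.getDocumentElement().getAttribute("image");
    Entity picture = (Entity) type.getEntities().getNamedItem(name);
    System.out.println(name + " -> notation " + picture.getNotationName());  // picture -> notation gif
    System.out.println(type.getNotations().getNamedItem("gif") != null);    // true
  }
}
```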
<!ELEMENT doc (publication)*> |
<!ELEMENT publication (title,author+,date,keywords?)> |
<!ELEMENT title (#PCDATA)> |
<!ELEMENT author (lastname, firstname)> |
<!ELEMENT date (#PCDATA)> |
<!ELEMENT lastname (#PCDATA)> |
<!ELEMENT firstname (#PCDATA)> |
<!ELEMENT keywords (keyword)+> |
<!ELEMENT keyword (#PCDATA)> |
<!ATTLIST publication number ID #REQUIRED> |
Well-formed document: |
Valid document: a well-formed document that also complies with its DTD |
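Well-formedness can be checked with a non-validating parser: it accepts any well-formed document, with or without a DTD, and rejects the rest. A sketch assuming the JDK's JAXP API (WellFormedDemo is a hypothetical name):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedDemo {
  // Non-validating parse: succeeds exactly when the document is well-formed
  static boolean isWellFormed(String xml) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    builder.setErrorHandler(new DefaultHandler());  // quiet; its fatalError() rethrows
    try {
      builder.parse(new InputSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(isWellFormed("<doc><a/></doc>"));  // true
    System.out.println(isWellFormed("<doc><a></doc>"));   // false: <a> is never closed
    System.out.println(isWellFormed("<a/><b/>"));         // false: two root elements
  }
}
```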
A namespace is a collection of related element and attribute names identified by a URI reference |
NOTE: URIs are used because they are globally unique, which avoids collisions between namespace names |
Provides unique names for elements and attributes by adding context to the tags |
Enables reuse of grammar specifications |
<?xml version="1.0" ?> |
<!DOCTYPE doc SYSTEM "pubgrammar.dtd"> |
<doc xmlns="http://npac.syr.edu/publications"> |
<publication number="pn1"> |
<title>Collaborative Virtual Workspace</title> |
<author> |
<lastname>Spellman</lastname> |
<firstname>Peter</firstname> |
</author> |
<date>1997</date> |
<keywords> |
<keyword>collaboration framework</keyword> |
<keyword>virtual environments</keyword> |
</keywords> |
</publication> |
</doc> |
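With a namespace-aware parser, each element reports the namespace URI it inherits from the default xmlns declaration. A sketch assuming the JAXP API of a current JDK; the DOCTYPE line is left out so the parser does not try to fetch pubgrammar.dtd, and NamespaceDemo is a hypothetical name:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceDemo {
  static Element parseRoot(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);   // JAXP factories are not namespace-aware by default
    return factory.newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml))).getDocumentElement();
  }

  public static void main(String[] args) throws Exception {
    Element doc = parseRoot("<doc xmlns=\"http://npac.syr.edu/publications\">"
        + "<publication number=\"pn1\"/></doc>");
    Element pub = (Element) doc.getFirstChild();
    System.out.println(doc.getLocalName() + " in " + doc.getNamespaceURI());
    System.out.println(pub.getLocalName() + " in " + pub.getNamespaceURI());  // default namespace inherited
    // Unprefixed attributes are in no namespace at all
    System.out.println(pub.getAttributeNode("number").getNamespaceURI());     // null
  }
}
```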
XML parsers |
Operations on XML documents |
Storing objects as XML documents |
XML and databases |
Validation |
Application Programming Interfaces |
[Figure: SAX architecture - the SAXParser reports errors to an ErrorHandler and document events to a DocumentHandler] |
[Figure: DOM tree of the sample document - Document, Element doc, Element publication, Element author, Element title, Element lastname, Element firstname, Text nodes] |
Browsing document |
Document modification |
Saving changes |
Validation of XML documents |
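import org.xml.sax.HandlerBase; |
import org.xml.sax.AttributeList; |
// SAX 1 handler: prints an indented trace of element starts, text content and element ends |
public class MyHandler extends HandlerBase { |
  String tag = "outside";   // "inside" between a start tag and the next end tag |
  int indent = 0;           // current indentation depth |
  void printIndent() { |
    for (int i = 0; i < indent; i++) { System.out.print(" "); } |
  } |
  public void startElement (String name, AttributeList atts) { |
    indent = indent + 2; |
    printIndent(); |
    System.out.println("Start element: " + name); |
    tag = "inside"; |
  } |
  public void characters (char[] ch, int start, int length) { |
    String text = new String(ch, start, length).trim();   // ignore whitespace-only text |
    if (tag.equals("inside") && text.length() > 0) { |
      printIndent(); |
      System.out.println("Content: " + text); |
    } |
  } |
  public void endElement (String name) { |
    printIndent(); |
    System.out.println("End element: " + name); |
    indent = indent - 2; |
    tag = "outside"; |
  } |
} |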
import org.xml.sax.Parser; |
import org.xml.sax.DocumentHandler; |
import org.xml.sax.helpers.ParserFactory; |
public class XMLContent { |
static final String parserClass = "com.ibm.xml.parsers.SAXParser"; |
public static void main (String args[]) throws Exception { |
Parser parser = ParserFactory.makeParser(parserClass); |
DocumentHandler handler = new MyHandler(); |
parser.setDocumentHandler(handler); |
for (int i = 0; i < args.length; i++) { |
parser.parse(args[i]); |
} |
} |
} |
Start element: doc |
Start element: publication |
Start element: title |
Content: Collaborative Virtual Workspace |
End element: title |
Start element: author |
Start element: lastname |
Content: Spellman |
End element: lastname |
Start element: firstname |
Content: Peter |
End element: firstname |
End element: author |
Start element: date |
Content: 1997 |
End element: date |
Start element: keywords |
Start element: keyword |
Content: collaboration framework |
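The SAX 1 classes used above (HandlerBase, AttributeList, ParserFactory) were later deprecated in favour of SAX 2. The sketch below assumes the JAXP SAXParserFactory of a current JDK rather than the IBM parser, and records the same kind of trace as MyHandler in a list; SaxTraceDemo is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTraceDemo {
  // SAX 2 handler that records start tags, non-blank text and end tags in order
  static List<String> trace(String xml) throws Exception {
    List<String> events = new ArrayList<>();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("Start element: " + qName);
      }
      @Override
      public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();   // skip indentation whitespace
        if (!text.isEmpty()) {
          events.add("Content: " + text);
        }
      }
      @Override
      public void endElement(String uri, String localName, String qName) {
        events.add("End element: " + qName);
      }
    };
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return events;
  }

  public static void main(String[] args) throws Exception {
    for (String event : trace("<author><lastname>Spellman</lastname>"
        + "<firstname>Peter</firstname></author>")) {
      System.out.println(event);
    }
  }
}
```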
import org.w3c.dom.*; |
import com.ibm.xml.parsers.DOMParser; |
public class DOMAccess { |
static final String parserClass = "com.ibm.xml.parsers.DOMParser"; |
public static void main (String args[]) throws Exception { |
DOMParser parser = new DOMParser(); |
Document document; |
Element root; |
NodeList publications; |
Element publication; |
Element author; |
Element lastname; |
Text nameString; |
parser.parse(args[0]); |
document = parser.getDocument(); |
root = document.getDocumentElement(); |
System.out.println("Node name: " + root.getNodeName()); |
publications = root.getElementsByTagName("publication"); |
publication = (Element) publications.item(0); |
System.out.println("Node name: "+publication.getNodeName()); |
author = (Element) (publication.getElementsByTagName("author")).item(0); |
System.out.println("Node name: " + author.getNodeName()); |
lastname = (Element) (author.getElementsByTagName("lastname")).item(0); |
System.out.println("Node name: " + lastname.getNodeName()); |
nameString = (Text) lastname.getFirstChild(); |
System.out.println("Last Name: " + nameString.getData()); |
} |
} |
Node name: doc |
Node name: publication |
Node name: author |
Node name: lastname |
Last Name: Spellman |
import org.w3c.dom.*; |
import com.ibm.xml.parser.*; |
import java.io.*; |
public class DocModification { |
public static void main (String args[]) throws Exception { |
Parser parser = new Parser(args[0]); |
InputStream in = new FileInputStream(args[0]); |
TXDocument document; |
Element root; |
Element publication; |
NodeList publications; |
Element title; |
Element lastname; |
Text titleText; |
PrintWriter pw; |
document = parser.readStream(in); |
root = document.getDocumentElement(); |
System.out.println("Node name: " + root.getNodeName()); |
publications = root.getElementsByTagName("publication"); |
publication = (Element) publications.item(0); |
System.out.println("Node name: "+publication.getNodeName()); |
title = (Element) (publication.getElementsByTagName("title")).item(0); |
System.out.println("Node name: " + title.getNodeName()); |
titleText = (Text) title.getFirstChild(); |
System.out.println("Title: " + titleText.getData()); |
titleText.setData("Something More Interesting"); |
System.out.println("Title: " + titleText.getData()); |
pw = new PrintWriter(new BufferedWriter (new FileWriter(args[0]))); |
document.printWithFormat(pw); |
pw.close();   // flush the buffered writer so the changes actually reach the file |
} |
} |
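The TXDocument and printWithFormat() calls belong to the IBM XML4J API of the time. With a current JDK, a modified DOM tree can instead be written out through an identity Transformer; a sketch under that assumption (SaveChangesDemo and retitle are hypothetical names):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class SaveChangesDemo {
  // Replaces the text of the first <title> and returns the serialized document
  static String retitle(String xml, String newTitle) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    Element title = (Element) doc.getElementsByTagName("title").item(0);
    title.setTextContent(newTitle);   // replaces the title's text, like setData() in DocModification
    Transformer identity = TransformerFactory.newInstance().newTransformer();
    identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();   // use new StreamResult(file) to save to disk instead
    identity.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(retitle("<doc><publication><title>Collaborative Virtual Workspace"
        + "</title></publication></doc>", "Something More Interesting"));
  }
}
```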
import org.xml.sax.Parser; |
import org.xml.sax.ErrorHandler; |
import org.xml.sax.helpers.ParserFactory; |
public class Validator { |
static final String parserClass = "com.ibm.xml.parsers.ValidatingSAXParser"; |
public static void main (String args[]) throws Exception { |
Parser parser = ParserFactory.makeParser(parserClass); |
ErrorHandler handler = new ErrorReport(); |
parser.setErrorHandler(handler); |
parser.parse(args[0]); |
} |
} |
import org.xml.sax.ErrorHandler; |
import org.xml.sax.SAXException; |
import org.xml.sax.SAXParseException; |
public class ErrorReport |
implements ErrorHandler { |
/** Warning. */ |
public void warning(SAXParseException ex) { |
System.err.println("[Warning] "+ |
getLocationString(ex)+": "+ |
ex.getMessage()); |
} |
/** Error. */ |
public void error(SAXParseException ex) { |
System.err.println("[Error] "+ |
getLocationString(ex)+": "+ |
ex.getMessage()); |
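} |
/** Fatal error: report it and stop parsing. */ |
public void fatalError(SAXParseException ex) throws SAXException { |
System.err.println("[Fatal Error] "+ |
getLocationString(ex)+": "+ |
ex.getMessage()); |
throw ex; |
} |
/** Returns "file:line:column", e.g. publications.xml:15:24. */ |
private String getLocationString(SAXParseException ex) { |
String systemId = ex.getSystemId(); |
if (systemId != null) { |
// keep only the file name part of the system identifier |
systemId = systemId.substring(systemId.lastIndexOf('/') + 1); |
} |
return systemId + ":" + ex.getLineNumber() + ":" + ex.getColumnNumber(); |
} |
} |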
<doc> |
<publications number="pn1"> |
<title>Something More Interesting</title> |
<author> |
<lastname>Spellman</lastname> |
<firstname>Peter</firstname> |
</author> |
<date>1997</date> |
<keywords> |
<keyword>collaboration framework</keyword> |
<keyword>virtual environments</keyword> |
</keywords> |
</publications> |
. |
. |
. |
D:\docs\cis\domexample>java Validator ..\publications.xml |
[Error] publications.xml:15:24: Attribute, "number", is not declared in element, "publications". |
[Error] publications.xml:26:18: Element, "publications" is not declared in the DTD |
[Error] publications.xml:56:7: Element "<doc>" is not valid because it does not follow the rule, "(publication)*".) |
Customizing Java serialization mechanism |
Solution for JavaBeans |
Part name   Part ID   Price   In stock |
window      001       $40     yes |
muffler     002       $150    yes |
door        003       $30     no |
<store> |
<part id="p001"> |
<part-name>window</part-name> |
<price>40</price> |
<instock>yes</instock> |
</part> |
<part id="p002"> |
<part-name>muffler</part-name> |
<price>150</price> |
<instock>yes</instock> |
</part> |
</store> |
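Producing this document from Java objects can be sketched with the JDK's DOM and Transformer APIs. The Part class and StoreWriter are hypothetical, and this is only one simple mapping (one child element per field), not the serialization mechanism mentioned above:

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class StoreWriter {
  // Hypothetical Java object for one row of the parts table
  static final class Part {
    final String id;
    final String name;
    final int price;          // in dollars
    final boolean inStock;
    Part(String id, String name, int price, boolean inStock) {
      this.id = id; this.name = name; this.price = price; this.inStock = inStock;
    }
  }

  // Builds <store><part id="p...">...</part>...</store> from the objects
  static String toXml(List<Part> parts) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    Element store = doc.createElement("store");
    doc.appendChild(store);
    for (Part p : parts) {
      Element part = doc.createElement("part");
      part.setAttribute("id", "p" + p.id);
      addChild(doc, part, "part-name", p.name);
      addChild(doc, part, "price", Integer.toString(p.price));
      addChild(doc, part, "instock", p.inStock ? "yes" : "no");
      store.appendChild(part);
    }
    Transformer t = TransformerFactory.newInstance().newTransformer();
    t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    t.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  // Appends <tag>text</tag>; the serializer escapes characters such as & and <
  static void addChild(Document doc, Element parent, String tag, String text) {
    Element child = doc.createElement(tag);
    child.setTextContent(text);
    parent.appendChild(child);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(toXml(List.of(
        new Part("001", "window", 40, true),
        new Part("002", "muffler", 150, true),
        new Part("003", "door", 30, false))));
  }
}
```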
Process |
Result tree construction |
XSL template element |
XSL patterns |
Important XSL elements |
Displaying XML data in Web browsers |
Construction of source tree from XML document |
Transformation of source tree to result tree using stylesheet contained in XSL document |
Application of style rules to each node of result tree |
Display of document by user agent using appropriate styling on a display, on paper or some other medium |
[Figure: Source Tree, Result Tree, Interpretation of result tree] |
Stylesheet - set of template rules |
Template rule |
Creation of result tree: finding the template rule for the root node and instantiating its template |
The xsl:template element describes a template rule |
match attribute - source node to which the rule applies |
Content - the template, may contain XSL formatting vocabulary |
Conflict resolution |
Namespaces used to distinguish XSL instructions from other template content |
Matching by name |
<xsl:template match="publication"> |
Matching by ancestry |
<xsl:template match="publication/title"> |
Matching several names |
<xsl:template match="title|keyword"> |
Matching the root |
<xsl:template match="/"> |
Wildcard matches |
<xsl:template match="*"> |
Matching by ID |
<xsl:template match="id(pn1)"> |
Matching by attribute |
<xsl:template match="publication[attribute(number) = 'pn1']"> |
Matching by child |
<xsl:template match="publication[date]"> |
Matching by position |
<xsl:template match="publication[first-of-type()]"> |
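These patterns come from the December 1998 XSL working draft; the later XPath 1.0 Recommendation expresses several of them differently (roughly, [@number='pn1'] for the attribute test and [1] for the position test). A sketch assuming the javax.xml.xpath API of a current JDK; the second publication is made-up sample data, and id() is left out because it needs an ID attribute declared in a DTD (PatternDemo is a hypothetical name):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PatternDemo {
  // Small sample; the second publication is invented for illustration
  static final String XML =
      "<doc>"
    + "<publication number=\"pn1\"><title>Collaborative Virtual Workspace</title>"
    + "<date>1997</date></publication>"
    + "<publication number=\"pn2\"><title>Another Title</title></publication>"
    + "</doc>";

  // Returns how many nodes an XPath 1.0 expression selects in the sample
  static int count(String expression) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(XML)));
    XPath xpath = XPathFactory.newInstance().newXPath();
    NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    return nodes.getLength();
  }

  public static void main(String[] args) throws Exception {
    String[] expressions = {
        "//publication",                  // by name
        "//publication/title",            // by ancestry
        "//title | //date",               // several names
        "/",                              // the root
        "/doc/*",                         // wildcard
        "//publication[@number='pn1']",   // by attribute
        "//publication[date]",            // by child
        "//publication[1]"                // by position
    };
    for (String e : expressions) {
      System.out.println(e + " -> " + count(e));
    }
  }
}
```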
<xsl:apply-templates select="pattern"/> |
Applies template rules to the children of the node |
<xsl:value-of select="pattern"/> |
Inserts the text value of the node selected by pattern |
<xsl:for-each select="pattern"> |
Instantiates its content once for each node selected by pattern |
<xsl:sort select="key"/> |
Used inside apply-templates or for-each; sorts the selected nodes according to the key |
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl" xmlns:HTML="http://www.w3.org/Profiles/XHTML-transitional"> |
<xsl:template><xsl:apply-templates/></xsl:template> |
<xsl:template match="/doc"> |
<HTML> |
<HEAD> |
<TITLE>Publications</TITLE> |
</HEAD> |
<BODY> |
<xsl:apply-templates/> |
</BODY> |
</HTML> |
</xsl:template> |
<xsl:template match="/doc/publication"> |
<P> |
<H1> |
Title: <xsl:value-of select="title"/> |
</H1> |
Author: |
<BR/> |
<xsl:apply-templates select="author"/> |
<BR/> |
Date: <xsl:value-of select="date"/> |
<BR/> |
Keywords: |
<BR/> |
<xsl:apply-templates select="keywords"/> |
</P> |
<HR/> |
</xsl:template> |
<xsl:template match="/doc/publication/author"> |
<B> |
<xsl:value-of select="firstname"/> |
<xsl:value-of select="lastname"/> |
</B> <BR/> |
</xsl:template> |
<xsl:template match="/doc/publication/keywords"> |
<I> |
<xsl:apply-templates select="keyword"/> |
</I> <BR/> |
</xsl:template> |
<xsl:template match="/doc/publication/keywords/keyword"> |
<I> |
<xsl:value-of/> |
</I> <BR/> |
</xsl:template> |
</xsl:stylesheet> |
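The stylesheet above targets the 1998 working-draft namespace (http://www.w3.org/TR/WD-xsl) supported by browsers of that time; an XSLT 1.0 processor, such as the one in a current JDK, expects http://www.w3.org/1999/XSL/Transform instead. The sketch below assumes that processor and a cut-down, hypothetical XSLT 1.0 rendering of the publication and author templates (XslDemo is a hypothetical name):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslDemo {
  // Hypothetical XSLT 1.0 version of two templates, not the original stylesheet
  static final String STYLESHEET =
      "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
    + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
    + "<xsl:template match=\"/doc\">"
    + "<HTML><BODY><xsl:apply-templates select=\"publication\"/></BODY></HTML>"
    + "</xsl:template>"
    + "<xsl:template match=\"publication\">"
    + "<H1>Title: <xsl:value-of select=\"title\"/></H1>"
    + "<B><xsl:value-of select=\"author/firstname\"/><xsl:text> </xsl:text>"
    + "<xsl:value-of select=\"author/lastname\"/></B>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  // Applies the stylesheet to an XML string and returns the result tree as text
  static String transform(String xml) throws Exception {
    Transformer t = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
    StringWriter out = new StringWriter();
    t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(transform(
        "<doc><publication number=\"pn1\"><title>Collaborative Virtual Workspace</title>"
      + "<author><lastname>Spellman</lastname><firstname>Peter</firstname></author>"
      + "</publication></doc>"));
  }
}
```

The xsl:text element keeps the space between first and last name, which a whitespace-only text node in the stylesheet would lose.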
XML Applications by Frank Boumphrey et al. |
XML Complete by Steven Holtzner |
Extensible Stylesheet Language (XSL) Version 1.0, W3C Working Draft 16-Dec-98 |
Extensible Markup Language (XML) 1.0, W3C Recommendation 10-Feb-98 |
SAX - http://www.megginson.com/SAX |
Document Object Model (DOM) Level 1 Specification, Version 1.0, W3C Recommendation 1-Oct-98 |
Various Info - www.xml.com |
Parsers and Tools - http://www.alphaworks.ibm.com/tech/xml/ |