Given by Nancy McCracken, Ozgur Balsoy, and Tom Scavo at Basic Information Track Computational Science Course CPS616 in Spring Semester 1999. Foils prepared May 19, 1999
Summary of Material
Overview of XML and its relationship to HTML and SGML |
XML extensible tags: DTD |
Some simple examples |
XSL and XLL |
XML tools |
Details on writing XML |
Example showing the use of XML to store a simple database |
Nancy McCracken, |
Ozgur Balsoy, Tom Scavo |
Northeast Parallel Architectures Center at Syracuse University |
111 College Place, Syracuse, NY 13244 |
http://www.npac.syr.edu/XML |
XML Complete, Steven Holzner [McGraw-Hill, 1998, ISBN 0-07-913702-4] |
"XML, Java, and the future of the Web", Jon Bosak, Sun Microsystems, 1997 |
"Weaving a Better Web", S. Mace, U. Flohr, R. Dobson, T. Graham, Byte, March 1998, pp.58-68 |
NPAC's XML Resources page, http://www.npac.syr.edu/projects/tutorials/XML/ |
HTML = Hypertext Markup Language
|
HTML 2.0 spec completed in Nov 95 |
HTML+ and HTML 3.0 never released |
HTML 3.2 (Jan 97) added tables, applets, and other capabilities (approximately 70 tags)
|
HTML 4.0 spec released in Dec 97 |
Limitations of HTML:
|
XML = eXtensible Markup Language |
XML is a subset of Standard Generalized Markup Language, but unlike the latter, XML is specifically designed for the web |
How XML fits into the new HTML world:
|
The logical design of a document (content) should be separate from its visual design (presentation) |
Separation of logical and visual design
|
XML can be used to define the logical design, while the XSL (Extensible Style Language) is used to define the visual design. |
SGML = Standard Generalized ML |
An SGML document carries with it a grammar called a Document Type Definition (DTD). The DTD defines the tags and the meaning of those tags. |
Presentation is governed by a style sheet written in the Document Style Semantics and Specification Language (DSSSL) |
Note that HTML is a fixed SGML application, a hard-wired set of about 70 tags and 50 attributes, and does not need to have a DTD. |
A simple SGML document with embedded DTD: <!DOCTYPE DOCUMENT [ <!ELEMENT DOCUMENT O O (p*,BIGP*)> <!ELEMENT p - O (#PCDATA)> <!ELEMENT BIGP - O (#PCDATA)> ]> <DOCUMENT> <p>Welcome to <BIGP>XML Style! </DOCUMENT> |
A corresponding DSSSL style sheet: <!DOCTYPE style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN"> (root (make simple-page-sequence)) (element p (make paragraph)) (element BIGP (make paragraph font-size: 24pt space-before: 12pt)) |
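XML, by contrast, is a simplified subset of SGML rather than a fixed application. Since XML is extensible (it is a metalanguage), an XML document that is to be validated must be accompanied by its DTD; a document that only needs to be well-formed may omit it |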
XML is a compromise between the non-extensible, limited capabilities of HTML and the full power and complexity of SGML |
XML offers "80% of the benefits of SGML for 20% of its complexity"
|
XML allows you to define your own tags and to describe nested hierarchies of information. |
1) XML shall be usable over the Internet |
2) XML shall support a variety of applications |
3) XML shall be compatible with SGML |
4) It shall be easy to write programs that process XML documents |
5) Optional features in XML shall be kept to the absolute minimum, ideally zero |
6) XML documents should be human-legible and reasonably clear |
7) Design of XML should be prepared quickly |
8) Design of XML shall be formal and concise |
9) XML documents shall be easy to create |
10) Terseness in XML markup is of minimal importance |
First draft of XML spec released by W3C in Nov 96 (four other drafts published in 1997) |
The first XML parser (written in Java) released by Microsoft in July 97 |
Microsoft released version 1.8 of its XML parser (which supports XML 1.0) in Jan 98 |
W3C finalized the XML 1.0 spec in Feb 98 |
First XML-aware beta versions of NC and IE5.0 released in June 98 |
Sun announced Java Standard Extension for XML (XML API) in March 99 |
An XML document with external DTD: <?xml version="1.0"?> <!DOCTYPE greeting SYSTEM "hello.dtd"> <greeting>Hello World!</greeting> |
An XML document with embedded DTD: <?xml version="1.0"?> <!DOCTYPE greeting [ <!ELEMENT greeting (#PCDATA)> ]> <greeting>Hello World!</greeting> |
Document Type Definition (DTD), which defines the tags and their relationships |
Extensible Style Language (XSL) style sheets, which specify the presentation of the document |
Extensible Link Language (XLL), which defines link-handling details |
The DTD specifies the logical structure of the document; it is a formal grammar describing document syntax and semantics |
The DTD does not describe the physical layout of the document; this is left to the style sheets and the scripts |
It is no mean task to write a DTD, so most users will adopt predefined DTDs |
DTDs can be written in separate files to facilitate re-use. |
Content-providers, industries and other groups can collaborate to define sets of tags. |
For the data contained in an XML document to be parsed correctly, its markup must be well-formed, meaning that properly nested and nonabbreviated starting and ending tags are used.
|
Scenario #1: the server offers the XML document without its DTD, the parser does a syntax check, and the DTD follows if the XML document is "well-formed" |
Scenario #2: the server checks the XML document against its DTD ("validity") before sending the document to the client |
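The well-formedness check in Scenario #1 can be sketched with a non-validating parser. The example below is a minimal illustration that uses Python's standard xml.etree.ElementTree module, which was not among the tools discussed in these foils; the helper name is_well_formed and the sample strings are assumptions made for the illustration. It checks syntax only and never consults a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: one good document and three common mistakes.
good = "<greeting>Hello <b>World</b>!</greeting>"
bad_nesting = "<greeting>Hello <b>World</greeting></b>"  # tags overlap
bad_case = "<H1>Title</h1>"                              # tag names differ in case
two_roots = "<a/><b/>"                                   # more than one root element

for sample in (good, bad_nesting, bad_case, two_roots):
    print(is_well_formed(sample), sample)
```

Validity (Scenario #2) is a stronger property: it additionally requires checking the document against its DTD, which this sketch does not attempt.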
Another example which could be used for URL exchanges between network capable applications: <LINK> <TITLE>XML Recommendation</TITLE> <URL> http://www.w3.org/TR/REC-xml </URL> <DESCRIPTION> The official XML spec from W3C </DESCRIPTION> </LINK> |
A document may have many such links: <DOCUMENT> <LINKS> <LINK>...</LINK> <LINK>...</LINK> ... </LINKS> </DOCUMENT> |
Now write a DTD for this document: <!ELEMENT DOCUMENT (LINKS)> <!ELEMENT LINKS (LINK)*> <!ELEMENT LINK (TITLE,URL,DESCRIPTION)> <!ELEMENT TITLE (#PCDATA)> <!ELEMENT URL (#PCDATA)> <!ELEMENT DESCRIPTION (#PCDATA)> |
PCDATA stands for "parsed character data" |
Store the DTD in a file (links.dtd) and write an XML document based on this DTD: <?xml version="1.0"?> <!DOCTYPE DOCUMENT SYSTEM "links.dtd"> <DOCUMENT> <LINKS> <LINK>...</LINK> <LINK>...</LINK> ... </LINKS> </DOCUMENT> |
Note that you need an XML compiler to generate regular HTML in Netscape browsers - Internet Explorer 5.0 has a compiler built in. |
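As a rough illustration of how a network application might consume the LINK document above, the following sketch reads each LINK into a dictionary. It assumes Python's standard xml.etree.ElementTree module (not a tool named in these foils); the function name read_links is hypothetical. The DOCTYPE line is left out because this parser does not validate against links.dtd.

```python
import xml.etree.ElementTree as ET

# The sample document from the foils, with one LINK filled in.
DOCUMENT = """<?xml version="1.0"?>
<DOCUMENT>
  <LINKS>
    <LINK>
      <TITLE>XML Recommendation</TITLE>
      <URL> http://www.w3.org/TR/REC-xml </URL>
      <DESCRIPTION> The official XML spec from W3C </DESCRIPTION>
    </LINK>
  </LINKS>
</DOCUMENT>"""


def read_links(text):
    """Return one dict per LINK element, keyed by lower-cased child tag name."""
    root = ET.fromstring(text)
    links = []
    for link in root.iter("LINK"):
        # Strip the white space that surrounds the text inside each child.
        links.append({child.tag.lower(): (child.text or "").strip() for child in link})
    return links


links = read_links(DOCUMENT)
print(links)
```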
In XML terminology, a pair of start and end tags is an element. |
XML documents allow only one root element. |
XML documents must have a strict hierarchical structure.
|
Empty tags are allowed as elements in XML documents.
|
All attribute values must be within single or double quotes. <FONT COLOR="#FF00CC"> quoted attribute </FONT> |
XML tags are case-sensitive (<H1> is not the same as <h1>). |
White space in the data between tags is relevant. But within the markup itself and within quoted attribute values, white space is normalized (removed). |
XML allows you to specify different character set encodings. <?xml version='1.0' encoding='UTF-8' ?> |
Predefined entities:
|
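XML 1.0 predefines five entities: &lt; &gt; &amp; &apos; and &quot;. The sketch below shows text being escaped with them and decoded again by a parser. It assumes Python's standard xml.sax.saxutils and xml.etree.ElementTree modules, and the sample string is invented for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text containing characters that would otherwise be read as markup.
raw = 'if (a < b && c > d) print "ok"'

# escape() always handles &, < and >; the extra mapping adds the two quote entities.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# A parser turns the entity references back into the original characters.
element = ET.fromstring("<code>" + escaped + "</code>")
print(element.text)
```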
An optional, but powerful feature of XML that provides a formal set of rules to define a document structure |
Defines the elements that may be used, and dictates where they may be applied in relation to each other; therefore specifies the document hierarchy and granularity |
Comprises a set of declarations that define a document structure tree |
Declarations are stored either at the top of each document that must conform to the rules or, more usually, in a separate file referenced by a document type declaration (<!DOCTYPE ...>) at the top of each document. |
Each DTD element must either be a container element, or be empty (a place holder). Container elements may contain text, child elements, or a mixture of both. |
DTD also specifies the names of attributes, and dictates which elements they may appear in. For each attribute it specifies whether it is optional or required. |
DTD tree describing a book as containing a number of Chapter elements, with each chapter containing either a number of Paragraph elements, or a single Sections element |
A particular document tree has a node for each actual chapter and paragraph present, and may omit some of the optional elements |
A DTD allows you to create new tags by writing grammar rules which the tags must obey. The rules specify which tags and attributes are valid and their context. |
A DTD element declaration looks like: <!ELEMENT person (name, email*)>
|
Each declaration must follow the markup format <!...>, and can use only one of the following keywords:
|
Declarations are grouped within a DTD <!DOCTYPE MYDTD [ <!-- The MYDTD appears here --> <!......> ]> |
Declarations stored externally and shared by different documents linked as: <!DOCTYPE MYDTD SYSTEM "EXTRNL.DTD" [ <!-- Some of MYDTD appears here --> <!......> ]> |
Keyword ELEMENT Introduces a new element <!ELEMENT title .........> |
Element names must begin with a letter, `_', or `:', and may additionally contain digits and the punctuation characters `.', `-', `_', and `:' |
If an element can hold no child elements and no text, it is known as an empty element and is declared with the keyword EMPTY |
An element declared to have a content model of ANY may contain all of the other elements declared in the DTD |
<!ELEMENT p ANY> <!ELEMENT image EMPTY> |
Empty element usage: <image></image> or <image/> |
A model group is used to define an element that has mixed content or element content. |
A model group is bounded by parentheses, and contains at least one token. |
When a model group contains more than one content token, the child elements are controlled using two logical connector operators; sequence connector `,', and choice connector `|' |
<!ELEMENT element1 (a, b, c)> indicates that a is followed by b, which in turn is followed by c. |
<!ELEMENT element2 (a | b | c)> indicates that exactly one of a, b, or c may appear. |
Combinations are possible: (a,b,(c|d)), or ((a,b,c) | d) |
Quantity indicators can also be used.
|
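A model group is in effect a regular expression over child element names, with the quantity indicators ? (optional), * (zero or more), and + (one or more). The sketch below makes this concrete by translating an element-content model into a Python regular expression and testing sequences of child names against it. It is an illustration only: the function names model_to_regex and children_match are hypothetical, and #PCDATA and mixed content are not handled.

```python
import re


def model_to_regex(model: str) -> "re.Pattern":
    """Translate a DTD model group such as '(a, b, (c | d))' into a regex."""
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.\-]*|[(),|?*+]", model):
        if token == "(":
            parts.append("(?:")        # non-capturing group
        elif token == ",":
            continue                   # sequence is plain concatenation
        elif token in ")|?*+":
            parts.append(token)        # same meaning in regex syntax
        else:
            # Each element name is matched together with a trailing space.
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(model: str, children: list) -> bool:
    """Check whether a list of child element names satisfies the model."""
    text = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(text) is not None


print(children_match("(a, b, (c | d))", ["a", "b", "d"]))
print(children_match("(a, b, (c | d))", ["a", "b"]))
print(children_match("(name, email*, link?)", ["name", "email", "email"]))
```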
Document text is indicated by the keyword PCDATA (Parsed Character Data) <!ELEMENT emph (#PCDATA|sub|super)*> <!ELEMENT sub (#PCDATA)> <!ELEMENT super (#PCDATA)> <emph>H<sub>2</sub>O is water.</emph> |
The rules for attribute declarations follow a similar structure to elements. <!ATTLIST person gender (male|female) #IMPLIED >
|
The keywords following an attribute definition can be
|
Enumerated types (male|female|unknown) |
CDATA type is character data - may include markup <!ATTLIST form method CDATA #FIXED "POST"> |
Tokenized types include the following tokens with special meanings:
|
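To show what an attribute declaration asks a validating parser to check, here is a small sketch that applies two rules by hand: a #REQUIRED attribute must be present, and an enumerated attribute must take one of its listed values. The dictionary layout and the function name check_attributes are hypothetical; a real validating parser would read these rules from the ATTLIST declarations themselves.

```python
# Hypothetical in-memory form of:
#   <!ATTLIST person id     ID            #REQUIRED
#                    gender (male|female) #IMPLIED >
PERSON_ATTRS = {
    "id": {"type": "ID", "default": "#REQUIRED"},
    "gender": {"type": ("male", "female"), "default": "#IMPLIED"},
}


def check_attributes(rules, attrs):
    """Return a list of error messages; an empty list means the rules hold."""
    errors = []
    for name, rule in rules.items():
        if name not in attrs:
            if rule["default"] == "#REQUIRED":
                errors.append(f"missing required attribute '{name}'")
            continue
        allowed = rule["type"]
        # A tuple stands for an enumerated type; other types are not checked here.
        if isinstance(allowed, tuple) and attrs[name] not in allowed:
            errors.append(f"'{name}' must be one of {allowed}, got '{attrs[name]}'")
    for name in attrs:
        if name not in rules:
            errors.append(f"undeclared attribute '{name}'")
    return errors


print(check_attributes(PERSON_ATTRS, {"id": "B.WALLACE", "gender": "male"}))
print(check_attributes(PERSON_ATTRS, {"gender": "femail"}))
```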
The DTD of an XML document can contain entity declarations. These are like constants in other languages.
|
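The comparison with constants can be seen in a short sketch: an internal entity is declared once in the DTD and expanded wherever it is referenced. The entity name npac and the note element are invented for this illustration, and the parser used is Python's standard xml.etree.ElementTree, which expands internal entities declared in the document's own DTD subset.

```python
import xml.etree.ElementTree as ET

# The entity "npac" is declared once and referenced as &npac; in the content.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ENTITY npac "Northeast Parallel Architectures Center">
]>
<note>Prepared at the &npac;.</note>"""

note = ET.fromstring(DOC)
print(note.text)
```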
Create a DTD file for an address book named "ab.dtd" |
<!ELEMENT addressBook (person)+> |
<!ELEMENT person (name, email*, link?) > |
<!ATTLIST person id ID #REQUIRED > |
<!ATTLIST person gender (male|female) #IMPLIED > |
<!ELEMENT name (#PCDATA | family | given)*> |
<!ELEMENT family (#PCDATA)> |
<!ELEMENT given (#PCDATA)> |
<!ELEMENT email (#PCDATA)> |
<!ELEMENT link EMPTY > <!ATTLIST link manager IDREF #IMPLIED subordinates IDREFS #IMPLIED > |
<?xml version="1.0"?> <!DOCTYPE addressBook SYSTEM "ab.dtd"> <addressBook> <person id="B.WALLACE" gender="male"> <name> <family>Wallace</family> <given>Bob</given> </name> <email>bwallace@megacorp.com</email> <link manager="C.TUTTLE"/> </person> <person id="C.TUTTLE" gender="female"> <name> <family>Tuttle</family> <given>Claire</given> </name> <email>ctuttle@megacorp.com</email> <link subordinates="B.WALLACE"/> </person> </addressBook> |
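The address book can be queried like a small database. The sketch below follows the IDREF and IDREFS attributes on link back to the person elements whose ID they name. It assumes Python's standard xml.etree.ElementTree module; the helper names full_name, manager_of, and subordinates_of are hypothetical. The DOCTYPE line is omitted because this parser does not validate against ab.dtd, so the ID lookup is done by hand with a dictionary.

```python
import xml.etree.ElementTree as ET

ADDRESS_BOOK = """<addressBook>
  <person id="B.WALLACE" gender="male">
    <name><family>Wallace</family><given>Bob</given></name>
    <email>bwallace@megacorp.com</email>
    <link manager="C.TUTTLE"/>
  </person>
  <person id="C.TUTTLE" gender="female">
    <name><family>Tuttle</family><given>Claire</given></name>
    <email>ctuttle@megacorp.com</email>
    <link subordinates="B.WALLACE"/>
  </person>
</addressBook>"""

root = ET.fromstring(ADDRESS_BOOK)
# Index every person by its ID attribute so that IDREFs can be resolved.
by_id = {person.get("id"): person for person in root.findall("person")}


def full_name(person):
    name = person.find("name")
    return f"{name.findtext('given').strip()} {name.findtext('family').strip()}"


def manager_of(person_id):
    """Follow the single IDREF in link/@manager, if there is one."""
    link = by_id[person_id].find("link")
    ref = link.get("manager") if link is not None else None
    return full_name(by_id[ref]) if ref else None


def subordinates_of(person_id):
    """Follow the space-separated IDREFS in link/@subordinates."""
    link = by_id[person_id].find("link")
    refs = (link.get("subordinates") or "").split() if link is not None else []
    return [full_name(by_id[ref]) for ref in refs]


print(manager_of("B.WALLACE"))
print(subordinates_of("C.TUTTLE"))
```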
Encoding, internationalization and languages |
Entities: Internal and External, the constants and macro-processing of XML |
Processing instructions - allow documents to contain instructions for applications |
Using a language such as Java to write a parser for XML documents. |
W3C is considering draft proposals for schemas that would allow you an easier syntax to write DTD grammar rules. |
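On the topic of processing XML from a programming language: the foils mention Java, but the same event-driven style can be sketched with Python's standard xml.sax module, used here only for illustration. The parser reports each start and end tag to a handler; the handler class TagCounter and what it counts are assumptions made for the example.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Count how often each element occurs and track the deepest nesting."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self.depth = 0
        self.max_depth = 0

    def startElement(self, name, attrs):
        # Called by the parser for every start tag (and every empty-element tag).
        self.counts[name] = self.counts.get(name, 0) + 1
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def endElement(self, name):
        self.depth -= 1


handler = TagCounter()
xml.sax.parseString(b"<DOCUMENT><LINKS><LINK/><LINK/></LINKS></DOCUMENT>", handler)
print(handler.counts, handler.max_depth)
```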
XSL is to XML as Cascading Style Sheets (CSS) are to HTML |
Like a CSS style sheet, an XSL style sheet describes the presentation of the XML document |
Advanced layout features of XSL include: rotated text, multiple columns, and independent regions |
Development of XSL lags behind XML |
Content of XML documents is intended to be easily read by both people and software, but raw XML data is not suitable for viewing by people who are not interested in structure. |
To publish information held in XML format it is necessary to replace the tags with appropriate text styles. |
Extensible Style Language (XSL) meets this requirement. |
A style rule is used to assign a style to a particular XML element. |
It is possible to embed a style rule within an attribute: <p xsl::font-size="9pt" xsl::color="blue">A blue, 9pt paragraph.</p> |
The problem with this approach is that the rule must be repeated each time the element is used. The style sheets are developed to solve this problem so that rules are grouped together, and can be shared by multiple files. |
In general, XSL first specifies how to process the source tree to get a result tree. |
The pattern of each template rule is matched against the source tree, and the matched part is replaced by the rule's template. |
The result tree is then processed with formatting to achieve a document suitable for display, printing, speech or other media. |
XSL and XML may have their own namespaces for rules. |
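The source-tree-to-result-tree step described above can be imitated in a few lines. This is not XSL: it is a Python analogy in which a table of hypothetical rules maps each source element name to a result (HTML) element name, and a recursive function builds the result tree. The rule table, the fallback to span, and the sample article are all assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical template rules: source element name -> result (HTML) element name.
RULES = {"article": "div", "title": "h1", "para": "p", "emph": "em"}


def transform(source):
    """Build a result tree by applying the matching rule to every source node."""
    result = ET.Element(RULES.get(source.tag, "span"))  # unmatched tags become span
    result.text = source.text
    result.tail = source.tail
    for child in source:
        result.append(transform(child))
    return result


source = ET.fromstring(
    "<article><title>XML</title>"
    "<para>Tags are <emph>extensible</emph>.</para></article>"
)
html = ET.tostring(transform(source), encoding="unicode")
print(html)
```

A real XSL processor would go on to apply formatting to the result tree; here the result is simply serialized as HTML.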
General Structure <?xml-stylesheet href="article.stl" type="text/xsl"?> <!DOCTYPE article .........> <article> ... </article> |
Each style sheet has a root element called xsl. It may contain any combination of Rule and Stylerule elements. |
<xsl> <rule>...</rule><rule>...</rule><stylerule>...</stylerule> </xsl> |
XLL supports simple links (like HTML) plus:
|
XLL components: XLink and XPointer |
XML can be used as Electronic Data Interchange (EDI):
|
An XML parser (in Java or C++) and an XSL parser are available from Microsoft and IBM |
Internet Explorer 5.0 supports XML |
An HTML browser may be retrofitted with an XML plugin or applet, but if XML is to survive, full-fledged XML browsers must be developed |
Coupled with XSL, DOM, and a scripting language, XML provides a powerful alternative to HTML |
Search engines may make better use of XML documents |
Prediction: XML will replace HTML! |
Few browsers support XML at this time |