Introduction to XML
Extensible Markup Language
What is XML
- XML stands for eXtensible Markup Language.
- A markup language is used to provide information about a document.
- Tags are added to the document to provide the extra information.
- HTML tags tell a browser how to display the document.
- XML tags give a reader some idea of what the data means.
What is XML Used For?
- XML documents are used to transfer data from one place to another, often over the Internet.
- XML-based languages (loosely called XML subsets) are designed for particular applications.
  - One is RSS (Rich Site Summary or Really Simple Syndication). It is used to send news headlines and updates from one web site to another.
  - A number of fields have their own languages. These include chemistry, mathematics, and book publishing.
  - Many of these are published as open specifications, some of them by the World Wide Web Consortium (W3C), and are available for anyone's use.
Advantages of XML
- XML is text (Unicode) based.
  - A plain-text document is portable and, although verbose, compresses well.
  - It can be transmitted efficiently.
- One XML document can be displayed differently in different media.
  - Examples include HTML, video, CD, and DVD.
  - You only have to change the XML document in order to change all the rest.
- XML documents can be modularized. Parts can be reused.
Example of an HTML Document
<html>
<head><title>Example</title></head>
<body>
<h1>This is an example of a page.</h1>
<h2>Some information goes here.</h2>
</body>
</html>
Example of an XML Document
<?xml version="1.0"?>
<address>
<name>Alice Lee</name>
<email>alee@aol.com</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>
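Because every value sits inside a named tag, a program can look values up by name. The following is a minimal sketch, assuming Python and its standard `xml.etree.ElementTree` module; the slides do not prescribe a language, and `read_address` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The address document from the slide (the version value must be quoted).
ADDRESS_XML = """<?xml version="1.0"?>
<address>
<name>Alice Lee</name>
<email>alee@aol.com</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>"""


def read_address(xml_text):
    """Return the children of the root element as a tag -> text dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


record = read_address(ADDRESS_XML)
# The tag name tells the program what each value is for.
print(record["birthday"])
```

The dictionary keeps the children in document order, so the same sketch also shows that the order of the elements survives parsing.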
Difference Between HTML and XML
- HTML tags have a fixed meaning, and browsers know what it is.
- XML tags are different for different applications, and the users of each application know what they mean.
- HTML tags are used for display.
- XML tags are used to describe documents and data.
XML Rules
- Tags are enclosed in angle brackets.
- Tags come in pairs with start-tags and end-tags.
- Tags must be properly nested.
  - <name><email></name></email> is not allowed.
  - <name><email></email></name> is.
- Tags that do not have end-tags (empty elements) must be terminated by a /.
  - <br /> is an example from XHTML.
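More XML Rules
- Tags are case sensitive.
  - <address> is not the same as <Address>.
- Names that begin with "xml", in any combination of upper and lower case, are reserved and may not be used as tag names.
- Tag names may not contain < or &. In text content these two characters must be written as &lt; and &amp;.
- Tag names follow rules similar to Java identifiers, except that hyphens, periods, and colons are also allowed (the colon is reserved for namespaces). They must begin with a letter or underscore and may not contain white space.
- A document must have a single root element that contains all the other elements.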
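The rules on this slide and the previous one can be checked mechanically by handing text to an XML parser and seeing whether it reports an error. The sketch below assumes Python's standard `xml.etree.ElementTree` module; `is_well_formed` is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


checks = {
    "properly nested": "<name><email></email></name>",
    "overlapping tags": "<name><email></name></email>",
    "case mismatch": "<address></Address>",
    "two root elements": "<a/><b/>",
    "unescaped ampersand": "<note>AT&T</note>",
    "escaped ampersand": "<note>AT&amp;T</note>",
}

for label, text in checks.items():
    print(label, is_well_formed(text))
```

Only the first and last samples are accepted; each of the others breaks one of the rules listed above.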
Encoding
- XML (like Java) uses Unicode to encode characters.
- Unicode has several encoding forms. The most common one used in the West is UTF-8.
- UTF-8 is a variable-length code. Characters are encoded in 1, 2, 3, or 4 bytes.
- The first 128 characters in Unicode are ASCII. UTF-8 encodes each of them in one byte.
- The code points between 128 and 255 cover some of the more common characters used in western Europe, such as ã, á, å, or ç. UTF-8 encodes them in two bytes.
- Two-byte codes also cover Greek, Cyrillic, Hebrew, and Arabic letters.
- Three-byte codes cover the rest of the Basic Multilingual Plane, including most Asian ideographs.
- Four-byte codes handle the characters that are left, such as rarer ideographs.
- Those who work mainly with non-western languages may want to investigate other encodings of Unicode, such as UTF-16.
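The variable length of UTF-8 is easy to observe directly: it uses between one and four bytes per character. The sketch below assumes Python, whose built-in `str.encode` produces the UTF-8 bytes; the four sample characters are chosen only for illustration.

```python
# One sample character for each UTF-8 length, written as escapes so the
# source file itself stays plain ASCII.
samples = [
    "A",           # U+0041, ASCII letter
    "\u00e7",      # U+00E7, c with cedilla (western European)
    "\u4e2d",      # U+4E2D, a common CJK ideograph
    "\U0001F600",  # U+1F600, a character outside the Basic Multilingual Plane
]

# Number of UTF-8 bytes needed for each sample.
lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} takes {n} byte(s): {ch.encode('utf-8').hex()}")
```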
Well-Formed Documents
- An XML document is said to be well-formed if it follows all the rules.
- An XML parser is used to check that all the rules have been obeyed.
- Browsers have shipped with XML parsers since versions such as Internet Explorer 5 and Netscape 7.
- Parsers are also available for free download over the Internet. One is Xerces, from the Apache open-source project.
- Java has included XML parsing support in its standard library since version 1.4.
XML Example Revisited
<?xml version="1.0"?>
<address>
<name>Alice Lee</name>
<email>alee@aol.com</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>
- Markup for the data aids understanding of its purpose.
- A flat text file is not nearly so clear.
Alice Lee
alee@aol.com
212-346-1234
1985-03-22
- The last line looks like a date, but what is it for?
Expanded Example
<?xml version="1.0"?>
<address>
  <name>
    <first>Alice</first>
    <last>Lee</last>
  </name>
  <email>alee@aol.com</email>
  <phone>123-45-6789</phone>
  <birthday>
    <year>1983</year>
    <month>07</month>
    <day>15</day>
  </birthday>
</address>
XML Files are Trees
address
  name
    first
    last
  email
  phone
  birthday
    year
    month
    day
XML Trees
- An XML document has a single root node.
- The tree is a general ordered tree.
  - A parent node may have any number of children.
  - Child nodes are ordered, and may have siblings.
- Preorder traversals are usually used for getting information out of the tree.
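A preorder traversal visits each parent before its children and takes the children from left to right. The sketch below applies one to the expanded address example, assuming Python's standard `xml.etree.ElementTree` module; `preorder` is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

# The expanded address example, written compactly.
EXPANDED_XML = """<address>
  <name><first>Alice</first><last>Lee</last></name>
  <email>alee@aol.com</email>
  <phone>123-45-6789</phone>
  <birthday><year>1983</year><month>07</month><day>15</day></birthday>
</address>"""


def preorder(node, depth=0):
    """Yield (depth, tag) pairs, visiting each parent before its children."""
    yield depth, node.tag
    for child in node:  # iterating an element gives its children in document order
        yield from preorder(child, depth + 1)


root = ET.fromstring(EXPANDED_XML)
visited = list(preorder(root))

# Indent each tag by its depth to print the tree outline.
for depth, tag in visited:
    print("  " * depth + tag)
```

The order of the visited tags is the same as the order in which the start-tags appear in the document.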
Validity
- A well-formed document has a tree structure and obeys all the XML rules.
- A valid document is well-formed and also obeys the additional rules that a particular application sets out in either a DTD (document type definition) or a schema.
- Many specialized DTDs and schemas have been created to describe particular areas.
- These range from disseminating news bulletins (RSS) to chemical formulas.
- DTDs were developed first, so they are not as expressive as schemas.
Document Type Definitions
- A DTD describes the tree structure of a document and something about its data.
- There are two data types, PCDATA and CDATA.
  - PCDATA is parsed character data. It is used for the text content of elements.
  - CDATA is plain character data in which tags are not recognized. In a DTD it is the type given to attribute values.
- A DTD determines how many times a node may appear, and how child nodes are ordered.
DTD for address Example
<!ELEMENT address (name, email, phone, birthday)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT birthday (year, month, day)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)>
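To make the content models concrete, the sketch below checks a parsed document against a hand-copied table of the declarations above. It assumes Python; the parsers in Python's standard library do not validate against DTDs, so this is only an illustrative stand-in for a validating parser. `CONTENT_MODEL` and `conforms` are hypothetical names, and the check ignores stray text between child elements.

```python
import xml.etree.ElementTree as ET

# Hand-copied from the DTD: a tuple lists the required children in order;
# None stands for (#PCDATA), that is, text with no child elements.
CONTENT_MODEL = {
    "address": ("name", "email", "phone", "birthday"),
    "name": ("first", "last"),
    "birthday": ("year", "month", "day"),
    "first": None, "last": None, "email": None, "phone": None,
    "year": None, "month": None, "day": None,
}


def conforms(element):
    """Check one element, and recursively its children, against CONTENT_MODEL."""
    if element.tag not in CONTENT_MODEL:
        return False                       # element was never declared
    expected = CONTENT_MODEL[element.tag]
    children = list(element)
    if expected is None:
        return not children                # text only, no child elements
    if tuple(child.tag for child in children) != expected:
        return False                       # missing, extra, or misordered child
    return all(conforms(child) for child in children)


good = ET.fromstring(
    "<address><name><first>Alice</first><last>Lee</last></name>"
    "<email>alee@aol.com</email><phone>123-45-6789</phone>"
    "<birthday><year>1983</year><month>07</month><day>15</day></birthday>"
    "</address>")
# Same data, but email comes before name, which the DTD does not allow.
bad = ET.fromstring(
    "<address><email>alee@aol.com</email>"
    "<name><first>Alice</first><last>Lee</last></name>"
    "<phone>123-45-6789</phone>"
    "<birthday><year>1983</year><month>07</month><day>15</day></birthday>"
    "</address>")

print(conforms(good), conforms(bad))
```

Both documents are well-formed; only the first also follows the order that the DTD requires.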
Schemas
- Schemas are themselves XML documents.
- They were standardized after DTDs and provide more information about the document.
- They have a number of data types, including string, decimal, integer, boolean, date, and time.
- They divide elements into simple and complex types.
- They also determine the tree structure and how many children a node may have.
Schema for First address Example
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
        <xs:element name="phone" type="xs:string"/>
        <xs:element name="birthday" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Explanation of Example Schema
<?xml version="1.0" encoding="ISO-8859-1" ?>
- ISO-8859-1 (Latin-1) is the same as UTF-8 in the first 128 characters.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
- http://www.w3.org/2001/XMLSchema is the namespace that identifies the XML Schema standard. The xs prefix is bound to it.
<xs:element name="address">
<xs:complexType>
- This states that address is a complex type element.
<xs:sequence>
- This states that the following elements form a sequence and must come in the order shown.
<xs:element name="name" type="xs:string"/>
- This says that the element, name, must be a string.
<xs:element name="birthday" type="xs:date"/>
- This states that the element, birthday, is a date. Dates have the form yyyy-mm-dd.
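The yyyy-mm-dd requirement is stricter than it looks: the month and day need two digits each, and the date has to exist. The sketch below assumes Python and a hypothetical helper `is_xs_date`; it is a simplification that ignores the optional time zone suffix that xs:date also permits.

```python
import re
from datetime import date


def is_xs_date(text):
    """Return True for a yyyy-mm-dd string that names a real calendar day."""
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", text)
    if match is None:
        return False                      # wrong shape, e.g. missing zero padding
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)            # raises ValueError for impossible dates
        return True
    except ValueError:
        return False


for candidate in ["1985-03-22", "1983-7-15", "1985-02-30", "22-03-1985"]:
    print(candidate, is_xs_date(candidate))
```

Of the four candidates, only 1985-03-22 passes. The string 1983-7-15 fails because the month is not zero padded.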
XSLT
Extensible Stylesheet Language Transformations
- XSLT is used to transform one XML document into another, often an HTML document.
- The transformation classes (the javax.xml.transform package) have been part of Java since version 1.4.
- An XSLT processor is a program that takes one XML document as input and produces another as output.
- If the resulting document is in HTML, it can be viewed by a web browser.
- This is a good way to display XML data.
A Style Sheet to Transform address.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="address">
    <html><head><title>Address Book</title></head>
      <body>
        <xsl:value-of select="name"/>
        <br/><xsl:value-of select="email"/>
        <br/><xsl:value-of select="phone"/>
        <br/><xsl:value-of select="birthday"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
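To show what the template does, the sketch below hand-codes the same mapping in Python. Python's standard library has no XSLT processor, so this does not run the style sheet; it only imitates its effect with `xml.etree.ElementTree`. It assumes that address.xml is the flat first example, and `address_to_html` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed content of address.xml: the flat first example.
ADDRESS_XML = """<address>
<name>Alice Lee</name>
<email>alee@aol.com</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>"""


def address_to_html(xml_text):
    """Build the page the template describes: four values separated by <br/>."""
    address = ET.fromstring(xml_text)
    fields = ["name", "email", "phone", "birthday"]
    # findtext returns the text of the first matching child, which is what
    # xsl:value-of produces for an element that contains only text.
    values = [escape(address.findtext(field, default="")) for field in fields]
    body = "\n<br/>".join(values)
    return ("<html><head><title>Address Book</title></head>\n"
            "<body>\n" + body + "\n</body>\n</html>")


page = address_to_html(ADDRESS_XML)
print(page)
```

The XML tags themselves disappear from the output; only their text values are copied into the HTML page.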
The Result of the Transformation
Alice Lee
alee@aol.com
123-45-6789
1983-7-15
Parsers
- There are two principal models for parsers.
  - SAX: Simple API for XML
    - Uses a call-back method
    - Similar to Java event listeners
  - DOM: Document Object Model
    - Creates a parse tree
    - Requires a tree traversal
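The two models can be compared on the same document. The sketch below assumes Python's standard `xml.sax` and `xml.dom.minidom` modules as stand-ins for the Java APIs the slides have in mind; `TagCollector` is a hypothetical handler class written for this illustration.

```python
import xml.sax
from xml.dom import minidom

ADDRESS_XML = b"""<address>
<name>Alice Lee</name>
<email>alee@aol.com</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>"""


class TagCollector(xml.sax.ContentHandler):
    """SAX: the parser calls startElement each time it meets a start-tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


# SAX reads the document once and reports events; no tree is kept in memory.
collector = TagCollector()
xml.sax.parseString(ADDRESS_XML, collector)
print(collector.tags)

# DOM builds the whole tree first; the program then navigates it.
document = minidom.parseString(ADDRESS_XML)
email_node = document.getElementsByTagName("email")[0]
email = email_node.firstChild.data
print(email)
```

SAX suits large documents that are read once from start to finish, while DOM suits programs that need to move around the tree or modify it.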
