Comp150CPA: Clouds and Power-Aware Computing
Classroom Exercise 17
XML, Xschemas, XPATH
Spring 2011

group member 1: ______________ login:

group member 2: ______________ login:

group member 3: ______________ login:

group member 4: ______________ login:

group member 5: ______________ login:

In class we have described XML, XSchemas, and XPATH. Let's explore these in a bit more detail.

Consider the code:

 
<pets> 
 <cat name=Fred><food amount='1' brand='Friskies'>Tuna</food></cat> 
 <dog name=George><food amount='1' brand='Purina'>Chow</food> 
</pets>

This is not legal XML! Why? Correct it "in place" so that it becomes compliant XML.
Answer: attribute values must be quoted, and every opening tag needs a matching closing tag (the dog element is never closed), e.g.:

 
<pets> 
 <cat name="Fred"><food amount='1' brand='Friskies'>Tuna</food></cat> 
 <dog name="George"><food amount='1' brand='Purina'>Chow</food> </dog> 
</pets>
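One quick way to check this answer is to hand both versions to an XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the names BROKEN, FIXED, and is_well_formed are illustrative and not part of the exercise.

```python
import xml.etree.ElementTree as ET

# The original text from the exercise: unquoted attribute values, unclosed <dog>.
BROKEN = """<pets>
 <cat name=Fred><food amount='1' brand='Friskies'>Tuna</food></cat>
 <dog name=George><food amount='1' brand='Purina'>Chow</food>
</pets>"""

# The corrected text from the answer.
FIXED = """<pets>
 <cat name="Fred"><food amount='1' brand='Friskies'>Tuna</food></cat>
 <dog name="George"><food amount='1' brand='Purina'>Chow</food> </dog>
</pets>"""


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(BROKEN))
print(is_well_formed(FIXED))
```

The parser only checks well-formedness here; it says nothing about whether the document conforms to any schema.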

What are the outputs of the following XPATHs when applied to the (corrected) XML above (as nodesets)?
1. dog//food
2. food[@brand='Purina']
3. dog/food[@amount > 1]
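Answer: (paths 1 and 3 are evaluated with the pets element as the context node)
1. ```
<food amount='1' brand='Purina'>Chow</food>
```
2. With the dog element as the context node (or with the path written as //food[@brand='Purina']):
```
<food amount='1' brand='Purina'>Chow</food>
```
Evaluated from pets, the nodeset is empty, because food is not a direct child of pets.
3. Empty nodeset: the only food under dog has amount 1, which is not greater than 1.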
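These answers can be reproduced approximately with Python's standard xml.etree.ElementTree, which implements only a subset of XPath. This is a sketch under two assumptions: the pets element is the context node, and the numeric test in path 3 is applied in Python because ElementTree predicates do not support comparisons. The names PETS, q1, q2, and q3 are illustrative.

```python
import xml.etree.ElementTree as ET

PETS = ET.fromstring("""<pets>
 <cat name="Fred"><food amount='1' brand='Friskies'>Tuna</food></cat>
 <dog name="George"><food amount='1' brand='Purina'>Chow</food> </dog>
</pets>""")

# 1. dog//food, evaluated with <pets> as the context node.
q1 = PETS.findall("dog//food")

# 2. food[@brand='Purina'], written here as .//food[...] so that it searches
#    all descendants of <pets> rather than only its direct children.
q2 = PETS.findall(".//food[@brand='Purina']")

# 3. dog/food[@amount > 1]: ElementTree has no numeric comparisons in
#    predicates, so the test is applied in Python after selecting dog/food.
q3 = [f for f in PETS.findall("dog/food") if float(f.get("amount")) > 1]

for label, nodes in (("1", q1), ("2", q2), ("3", q3)):
    print(label, [(n.tag, n.attrib, n.text) for n in nodes])
```

A full XPath 1.0 engine would accept all three paths as written; the workarounds above are specific to ElementTree.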

Consider the partial XML schema:

 
<xs:element name="food">
  <xs:complexType> 
    <xs:simpleContent> 
      <xs:extension base="xs:string">
        <xs:attribute name="amount" type="xs:decimal"/>
        <xs:attribute name="brand" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType> 
</xs:element>

Does this describe the (corrected) version of the "food" elements in the above? Why or why not?
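Answer: Yes. Each corrected food element has text content (Tuna, Chow), which the extension of xs:string allows, plus exactly the two declared attributes: amount, whose value 1 is a valid number, and brand, whose value is a string.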
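To make the constraints concrete, here is a hand-written approximation of what the schema fragment asserts about a single food element. It is not real XSD validation (the Python standard library has no schema validator), and it assumes that amount is meant to be a plain decimal numeral; the names DECIMAL, ALLOWED_ATTRIBUTES, and food_matches_schema are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed lexical form for the numeric "amount" attribute: an optional sign
# followed by digits with an optional fractional part.
DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")
ALLOWED_ATTRIBUTES = {"amount", "brand"}


def food_matches_schema(food):
    """Approximate the constraints the schema fragment places on one <food>."""
    if food.tag != "food":
        return False
    if len(food) != 0:  # simpleContent: text only, no child elements
        return False
    if not set(food.attrib) <= ALLOWED_ATTRIBUTES:  # no undeclared attributes
        return False
    amount = food.get("amount")
    # Attributes are optional unless declared with use="required".
    if amount is not None and not DECIMAL.fullmatch(amount.strip()):
        return False
    return True


pets = ET.fromstring("""<pets>
 <cat name="Fred"><food amount='1' brand='Friskies'>Tuna</food></cat>
 <dog name="George"><food amount='1' brand='Purina'>Chow</food> </dog>
</pets>""")
print(all(food_matches_schema(f) for f in pets.iter("food")))
```

A food element with a non-numeric amount, an extra attribute, or a child element would be rejected by this check, as it would be by the schema.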

One hidden "point" of today's lecture is that an XPATH only makes sense if the XML it is applied to conforms to some known schema. Give an XML element and an XPATH that searches for something reasonable within it, and then give an example of a different XML element for which the same XPATH does not function as expected.
Answer: Consider an attribute that can have two meanings depending on where it appears. Suppose a conforming instance is:
```
<pets>
  <dog name='Fred' foods='2'/> 
</pets> 
```
and a non-conforming instance is:
```
<pets foods='3'>
  <dog name='Fred'/> 
</pets> 
```
Now consider what the XPATH sum(@foods)>2 might mean. In the intended case, it sums the food counts recorded on the individual animals; in the strange case, it reads a foods attribute that sits on a different element and has a different meaning. There are infinitely many examples like this.
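The same effect can be seen by running one query over both documents. The sketch below uses Python's standard xml.etree.ElementTree, which has no sum() function, so the helper total_foods (a hypothetical name) adds up every foods attribute in the tree, roughly what sum(//@foods) would do in a full XPath engine.

```python
import xml.etree.ElementTree as ET

INTENDED = ET.fromstring("<pets><dog name='Fred' foods='2'/></pets>")
STRANGE = ET.fromstring("<pets foods='3'><dog name='Fred'/></pets>")


def total_foods(root):
    """Sum every foods attribute in the tree, roughly sum(//@foods)."""
    return sum(int(e.get("foods")) for e in root.iter() if "foods" in e.attrib)


# Neither query raises an error, but only the first measures what was intended.
print(total_foods(INTENDED) > 2)
print(total_foods(STRANGE) > 2)
```

Nothing in the query itself signals that the second result comes from an attribute with a different meaning; only validation against the expected schema would catch that.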
(Advanced) Note that string facets are all defined in terms of regular expressions. Why is this computationally a good idea? What can't you do in a facet because of this? Hint: consider the language hierarchy in basic computation theory.
Answer: By Kleene's theorem, every regular expression can be recognized by a deterministic finite automaton (DFA). A DFA checks a string in time linear in the string's length and with constant working memory, so matching is very fast; the price is paid up front, because the DFA can have exponentially many states in the size of the expression. The limitation is that facets cannot match arbitrarily nested parenthetical expressions (e.g., "((X (Y)))"), because balanced nesting is not a regular language: it requires a pushdown automaton and a context-free (rather than a regular) grammar.
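The contrast can be illustrated in a few lines. In this sketch, Python's re module stands in for a facet pattern purely for illustration (XML Schema has its own regular-expression dialect), and the names FLAT and is_balanced are hypothetical. A pattern can describe one fixed level of parentheses, while arbitrary nesting needs a counter, which is the simplest form of a pushdown stack.

```python
import re

# One pair of parentheses with nothing nested inside: a regular language.
FLAT = re.compile(r"\([^()]*\)")


def is_balanced(text):
    """Check parenthesis nesting with a depth counter (a one-symbol stack)."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing parenthesis with no matching opener
                return False
    return depth == 0


print(bool(FLAT.fullmatch("(X)")))
print(bool(FLAT.fullmatch("((X (Y)))")))
print(is_balanced("((X (Y)))"))
print(is_balanced("((X (Y))"))
```

The counter can grow without bound as nesting deepens, which is exactly the unbounded memory a finite automaton does not have.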
