Student name: Lizzie DeYoung

Tufts COMP 150-IDS (Spring 2015):
Internet-scale Distributed Systems

An Analysis of XML and JSON

Background

The goal of data interchange formats is to enable data from one machine to be stored or processed on another machine. Initially, data interchange formats were mostly ad-hoc, pertaining to a specific set of data shared between a specific set of applications [FF]. This did not lend itself to interoperability or extensibility: it would be difficult for other applications to interpret the data as the data was not stored in a well-known format, and adding more information to the data such as additional fields would often break the existing implementations.

XML

The eXtensible Markup Language (XML) is both a document markup language and a data interchange format. XML is a subset of Standard Generalized Markup Language (SGML), a well-known, widely used document markup language on which HTML is also based. A markup language is a system “for marking or tagging a document that indicates its logical structure (as paragraphs) and gives instructions for its layout on the page.”[MW] XML’s document markup also enables it to define pieces of information in a structured format. An example of this is below.

<messages>
  <note id="501">
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note id="502">
    <to>Jani</to>
    <from>Tove</from>
    <heading>Re: Reminder</heading>
    <body>I will not</body>
  </note>
</messages>
[W3S]

In this example, information is separated into elements using start and end tags (<to>Jani</to>). Attributes can be used to store metadata or additional information about an element (id="502").
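To make the distinction between elements and attributes concrete, the following minimal sketch reads the example document with Python's standard xml.etree.ElementTree module. The function name read_notes and the choice to collect each note into a dictionary are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# The two-note document from the example above.
XML_DOC = """<messages>
  <note id="501">
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note id="502">
    <to>Jani</to>
    <from>Tove</from>
    <heading>Re: Reminder</heading>
    <body>I will not</body>
  </note>
</messages>"""


def read_notes(xml_text):
    """Return one dict per <note>, combining the id attribute with child element text."""
    root = ET.fromstring(xml_text)
    notes = []
    for note in root.findall("note"):
        notes.append({
            "id": note.get("id"),            # attribute: metadata on the element
            "to": note.findtext("to"),       # child element content
            "from": note.findtext("from"),
            "heading": note.findtext("heading"),
        })
    return notes


notes = read_notes(XML_DOC)
print(notes[1]["id"], notes[1]["to"])
```

Note that the attribute is read with get() while element content is read with findtext(); the two kinds of information live in different places in the parsed tree.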

The design goals of XML were as follows:

  1. “XML shall be straightforwardly usable over the Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs which process XML documents.
  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance.”[XML]

The following goals are of particular note:

Although it is not listed among the goals, extensibility is noted in the name itself: more information can be added to an XML format without necessarily breaking existing applications. This is possible because information in XML is located by element name rather than by byte or character offset, so adding a new element need not disturb the existing ones. Applications can therefore apply the principle of partial understanding, processing the elements they recognize and ignoring the rest, which supports both backwards and forwards compatibility.
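The principle of partial understanding can be sketched in a few lines of Python. In this illustration, an older consumer looks up only the element it knows about, so a newer document that adds an element continues to work. The <priority> element and the function name read_recipient are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_recipient(xml_text):
    """An 'old' consumer: looks up <to> by name and ignores everything else."""
    note = ET.fromstring(xml_text)
    return note.findtext("to")


ORIGINAL = "<note><to>Tove</to><from>Jani</from></note>"
# A later version of the format adds a (hypothetical) <priority> element ahead of <to>.
EXTENDED = "<note><priority>high</priority><to>Tove</to><from>Jani</from></note>"

print(read_recipient(ORIGINAL), read_recipient(EXTENDED))
```

Because the lookup is by name rather than by position, the consumer returns the same recipient for both versions of the document.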

In addition to the XML features described, there is also a robust schema language for XML, XML Schema Definition (XSD). “The purpose of an XSD schema is to define and describe a class of XML documents by using schema components to constrain and document the meaning, usage and relationships of their constituent parts.”[XSC] Existing tools enable XML documents to be automatically validated against XSDs, ensuring that the XML matches the specifications for the data interchange and can be properly parsed by the end point.

XML also includes namespaces, which enable XML formats to be reused. If one format uses foot to describe a human body part and another uses foot to describe a unit of measurement, namespaces allow these two formats to be combined while keeping the meaning of foot unambiguous.
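As a rough illustration of the foot example, the sketch below combines two formats in one document and uses Python's xml.etree.ElementTree to select each foot element by its namespace. The namespace URIs, the prefixes anat and unit, and the <record> wrapper are all invented for this example.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are made up for illustration.
NS = {
    "anat": "http://example.org/anatomy",
    "unit": "http://example.org/units",
}

DOC = """<record xmlns:anat="http://example.org/anatomy"
        xmlns:unit="http://example.org/units">
  <anat:foot side="left">sprained</anat:foot>
  <unit:foot>3</unit:foot>
</record>"""

root = ET.fromstring(DOC)
body_part = root.find("anat:foot", NS)   # foot as a body part
length = root.find("unit:foot", NS)      # foot as a unit of measurement

# The parser stores the full namespace URI in the tag name.
print(body_part.tag)
print(body_part.text, length.text)
```

The two elements share the local name foot, but the parser keeps them distinct because each tag is qualified by its namespace URI.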

JSON

JavaScript Object Notation (JSON) is a language-independent data interchange format derived from JavaScript. JSON consists of the following structures:

  1. Objects: collections of name/value pairs.
  2. Arrays: ordered lists of values.

An example demonstrating this is below:

{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
[JSON]

The ‘{…}’ denotes an object containing name/value pairs. The name is in quotes and precedes the colon; the value follows the colon and can be a simple value (a quoted string in this example, though JSON also allows numbers, true, false, and null), another object, or an array. Name/value pairs are comma separated. The ‘[…]’ denotes an array, whose values can be simple values, objects, or other arrays. Array values are also comma separated.
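The structure described above can be seen by parsing the menu example with Python's standard json module. This is only a sketch; the variable names are chosen for illustration.

```python
import json

# The menu document from the example above.
MENU_JSON = """{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}"""

menu = json.loads(MENU_JSON)["menu"]      # JSON object -> Python dict
items = menu["popup"]["menuitem"]         # JSON array  -> Python list
labels = [item["value"] for item in items]

print(menu["id"], labels)
```

Each ‘{…}’ becomes a dictionary and each ‘[…]’ becomes a list, so the nesting of the parsed result mirrors the nesting of the text.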

The design goals of JSON were to be

These four goals parallel the four goals of note selected from the set of XML goals.

Additionally, like XML and for similar reasons, JSON is extensible. You can easily add new information without breaking existing applications. There are some sources that claim JSON is not as extensible as XML.[JXML][MTJ] These will be explored in the analysis section.

A major difference between XML and JSON is that JSON parallels the object models found in many modern programming languages such as JavaScript, C#, and Java. This parallelism means that the deserialization of JSON encoded data into a particular language’s object model is often a direct mapping, further contributing to the adoption of JSON as a data interchange format.

Also unlike XML, JSON does not yet have an official schema description language. However, one of particular note, JSON Schema, is in the works. JSON Schema is currently an Internet Draft and is intended to eventually become an RFC.

Current state

Both XML and JSON are in widespread use today. Both are used as data interchange formats and both have been adopted by applications as a way to store structured data. In addition, because XML is a document markup language, it is also used by document-based applications as a way to store documents.

XML

As a document markup language, XML is the language of choice for document editors such as Microsoft Word[MW] and Open Office[OO][OASIS]. Microsoft states that it used XML for this purpose to enable document data to be easily moved between applications and to enable the document text to be automatically processed and analyzed.[MW]

Additionally, XML is used as a data storage format for several SQL databases such as

XML is also being used to store data in several NoSQL databases such as

Finally, XML is used to support data interchanges for several major web application APIs such as:

JSON

JSON is widely used in NoSQL databases. Even databases that were originally XML based, such as MarkLogic[JML], are beginning to support JSON in their later versions. Some major NoSQL databases that support JSON include[NOSQL]

Additionally, JSON has been adopted as the data interchange format of choice by major web application APIs such as

Growth

The Programmable Web website has a graphic depicting the growth of both XML and JSON over time. It shows that while XML was still the primary data format in use in 2013, its use was declining while the use of JSON continued to increase.

Graph showing declining growth of XML and increasing growth of JSON

Analysis

This analysis looks at the question, “which is better, XML or JSON?” The answer to this, like many questions, is, “it depends.” To better define “it depends,” this analysis will look at concrete differences between XML and JSON, discuss their advantages and disadvantages, and explore counterarguments. Additionally, this analysis will explore how each language handles extensibility.

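The concrete differences between XML and JSON that will be analyzed include:

  1. XML is a document markup language. JSON is not.
  2. XML has a schema. JSON does not.
  3. XML includes namespaces. JSON does not.
  4. JSON maps directly to object models. XML does not.
  5. JSON is less verbose than XML.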

XML is a document markup language. JSON is not.

As a document markup language, XML is suited to applications dealing with document markup such as Microsoft Word and Open Office. JSON does not lend itself to document markup and is not suited for this type of use. Even JSON proponents, including the author of JSON himself, agree with this: “XML is a better document exchange format.”[JXML] Consequently, XML will likely continue to be the format of choice for document applications such as Microsoft Word and Open Office.

XML has a schema. JSON does not.

XSDs allow XML documents to be easily validated, ensuring that they conform to the specified format. Schemas are particularly important when the creator of a data interchange format won’t be in control of either the sending or receiving ends of the transaction. This is often the case with official data standards that are supposed to be adopted by a variety of third party systems. In order to better ensure interoperability between these third party systems, an exchange format that is easily validated is important.

However, if the creator of the data interchange format is responsible for either sending or receiving the data, the existence of a schema is not as necessary. In this situation, the application can enforce conformance either by rejecting malformed data that it receives or by sending only well-formed data.
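A receiving application that enforces conformance itself might look something like the following sketch. The message format (required string fields to, from, and body) and the function names parse_note and accepts are hypothetical; a real application would apply whatever rules its own format requires.

```python
import json

# Hypothetical message format: three required string fields.
REQUIRED_FIELDS = {"to": str, "from": str, "body": str}


def parse_note(raw):
    """Parse a JSON note and raise ValueError if it does not match the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            raise ValueError(f"field {name!r} is missing or not a {expected_type.__name__}")
    return data


def accepts(raw):
    """Return True if the receiver would accept the message, False if it rejects it."""
    try:
        parse_note(raw)
        return True
    except ValueError:
        return False


print(accepts('{"to": "Tove", "from": "Jani", "body": "Reminder"}'))
print(accepts('{"to": "Tove"}'))
```

The checks here are written by hand and live inside one application, which is workable when that application controls one end of the exchange but gives third parties nothing they can validate against independently.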

Because interoperability and consistency of implementation are important, when third parties will be responsible for both sending and receiving data, the easily validated XML is a good choice. However, a schema for JSON, JSON Schema, is currently in the works and may better enable JSON to be used in similar situations.

XML includes namespaces. JSON does not.

Namespaces enable the same names to be reused when two or more data interchange formats are combined. As described above, two data formats may use the same name to describe two different things, such as foot to describe a unit of measurement or a body part. If those two data formats are reused and combined without using namespaces, the use of foot is ambiguous. However, with namespaces, the two instances of foot are clearly disambiguated.

Because JSON lacks namespaces, reuse of existing JSON data interchange formats can cause conflict if modifications aren’t made. As a result, in a system where reuse of data interchange formats is desired, XML may be the more suitable format.
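The kind of conflict described above can be illustrated with a short Python sketch. The two formats, their field names, and the nesting workaround are all assumptions made for this example; the workaround is a convention that applications must agree on and is not part of JSON itself.

```python
import json

# Two hypothetical formats that both use the name "foot".
anatomy = json.loads('{"patient": "A. Smith", "foot": "sprained"}')
measurement = json.loads('{"foot": 3, "inch": 2}')

# Naive reuse: merging the objects lets the later "foot" overwrite the earlier one.
merged = {**anatomy, **measurement}

# One possible modification: nest each format under its own key.
combined = {"anatomy": anatomy, "measurement": measurement}

print(merged["foot"])
print(combined["anatomy"]["foot"], combined["measurement"]["foot"])
```

In the merged object the body-part meaning of foot is silently lost, whereas the nested form keeps both values at the cost of changing the structure that existing consumers expect.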

JSON maps directly to object models. XML does not.

As the name JavaScript Object Notation indicates, JSON aligns with JavaScript object models. It also aligns with object models for several other popular languages like Java, C#, and C++. This means that for a developer, serialization and deserialization of JSON code into whatever language is being used is simple and direct.

With XML, however, this is not always the case. Although XML can directly map to an object model, it doesn’t always. Several XML characteristics such as having both attributes and elements, allowing “choice” as a collection type, and allowing repeated element names, can cause XML documents to not align with object models. This can make XML serialization difficult and complex.
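The difference in mapping effort can be sketched by loading the same hypothetical note from both formats in Python. The document contents and the function note_from_xml are invented for illustration, and the mapping decisions in it (treating the attribute as a plain field and the repeated element as a list) are choices made by this sketch, not rules imposed by XML.

```python
import json
import xml.etree.ElementTree as ET

JSON_DOC = '{"id": "501", "to": ["Tove", "Ole"], "body": "Reminder"}'
XML_DOC = '<note id="501"><to>Tove</to><to>Ole</to><body>Reminder</body></note>'

# JSON: the parsed result is already a dict holding a string, a list, and a string.
from_json = json.loads(JSON_DOC)


def note_from_xml(xml_text):
    """Hand-written mapping from the XML note to the same dict shape."""
    elem = ET.fromstring(xml_text)
    return {
        "id": elem.get("id"),                        # attribute folded in as a plain field
        "to": [t.text for t in elem.findall("to")],  # repeated element treated as a list
        "body": elem.findtext("body"),
    }


from_xml = note_from_xml(XML_DOC)
print(from_json == from_xml)
```

Both routes arrive at the same object, but the XML route needed a mapping function in which the developer had to decide how attributes and repeated elements should be represented.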

It is often assumed that XML’s impedance mismatch with programming language object models causes overhead not only for the developer, but also for the system at runtime.[SO] However, a study by David Lee from MarkLogic demonstrates otherwise. Lee’s study analyzed performance using JSON and XML across common operating systems and browsers, including mobile browsers. In the conclusion of his study, Lee notes, “The choice of JSON vs XML is nearly indistinguishable from a performance perspective. There are a few outliers (such as Mobile and Desktop Firefox using jQuery) where there is a notable difference…. But for most uses the difference in markup choice will result in little or no user noticeable difference in performance and end user experience.”[LEE]

These results indicate that if quick prototyping is the goal for a particular web application or service, JSON may be the best data format due to the ease of use by developers. However, if performance is a main factor, both XML and JSON perform about equivalently.

JSON is less verbose than XML

XML’s start and end tag structure makes it more verbose than an equivalent JSON representation, so XML messages are larger than JSON messages. Because message size is positively correlated with network transmission time, people sometimes choose JSON over XML for this reason. However, if network transmission time is a major system concern, then gzipping the data or using a binary format may be the better option. As noted in David Lee’s study, JSON and XML gzip to approximately the same size, so if gzipping is an option, XML and JSON are equivalent in this respect. If gzipping is not an option but size matters, a binary format is probably the best option. If binary is not an option either and the choice is between XML and JSON, then JSON would be the better choice.
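Readers who want to check message sizes for their own data could start from a sketch like the one below, which encodes the same records in both formats and measures them before and after gzip compression using Python's standard gzip module. The records and the XML layout are invented, and a small synthetic data set like this one says nothing about how the results in Lee's study were obtained.

```python
import gzip
import json

# 200 hypothetical records, encoded once as JSON and once as XML.
records = [{"id": i, "to": "Tove", "from": "Jani", "body": "Reminder"} for i in range(200)]


def to_xml(record):
    """Render one record as a <note> element with the id stored as an attribute."""
    return '<note id="%d"><to>%s</to><from>%s</from><body>%s</body></note>' % (
        record["id"], record["to"], record["from"], record["body"]
    )


json_bytes = json.dumps(records).encode("utf-8")
xml_bytes = ("<notes>" + "".join(to_xml(r) for r in records) + "</notes>").encode("utf-8")

sizes = {
    "json_raw": len(json_bytes),
    "xml_raw": len(xml_bytes),
    "json_gzip": len(gzip.compress(json_bytes)),
    "xml_gzip": len(gzip.compress(xml_bytes)),
}
print(sizes)
```

For this data the uncompressed XML is larger than the uncompressed JSON because every element name appears twice; how close the compressed sizes end up depends on the data and should be measured rather than assumed.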

Additionally, a common viewpoint, reflected in a blog post on mashery.com, is: “The addition of tags and attributes lends extra weight to the data payload, which can significantly affect the performance of applications in constrained environments like mobile and embedded systems.”[MASH] Again, David Lee’s study contradicts this point.

A look at extensibility

Earlier, this document stated that both XML and JSON support extensibility. This was stated because both can have elements added (as long as they conform to the language's syntax rules) without breaking other systems. However, some claim that JSON can be added to in such a way that processing applications would break. Even the JSON site says that JSON is not extensible: "JSON is not extensible because it does not need to be. JSON is not a document markup language, so it is not necessary to define new tags or attributes to represent data in it."[JXML] Other arguments against extensibility include JSON's inability to support two name/value pairs at the same level with the same name, and that changing a JSON value from a simple value to an array or object (or vice versa) will break processing applications.[MTJ] However, it's important to note that it is still possible to add new unique name/value pairs to JSON without breaking processing applications.
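The three cases discussed above can be sketched with Python's standard json module. The consumer function and the message contents are hypothetical; the duplicate-name behavior shown is that of Python's parser, which keeps only the last pair, and other parsers may behave differently.

```python
import json


def first_recipient_upper(raw):
    """An existing consumer that assumes "to" is a single string."""
    return json.loads(raw)["to"].upper()


V1 = '{"to": "Tove", "body": "Reminder"}'
V2_NEW_FIELD = '{"to": "Tove", "body": "Reminder", "priority": "high"}'  # adds a unique pair
V2_NEW_TYPE = '{"to": ["Tove", "Ole"], "body": "Reminder"}'              # string became an array


def survives(raw):
    """Return True if the existing consumer still works on this message."""
    try:
        first_recipient_upper(raw)
        return True
    except (AttributeError, TypeError):
        return False


# Two pairs with the same name at the same level: only the last one is kept.
DUPLICATE = json.loads('{"to": "Tove", "to": "Ole"}')

print(survives(V1), survives(V2_NEW_FIELD), survives(V2_NEW_TYPE))
print(DUPLICATE)
```

Adding a new uniquely named pair leaves the consumer untouched, while changing the type of an existing value breaks it, and repeating a name loses data rather than extending it.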

Because XML is extensible in more situations than JSON, XML would likely be the language of choice if extensibility is a major factor in an application.

Conclusion

Based on the analysis above, the biggest argument for JSON is its direct mapping to programming language objects. While this may seem minor in comparison to the arguments for XML outlined above, it is in fact a major factor contributing to the choice of JSON over XML. From personal experience and from anecdotes online, the direct mapping to common programming language objects makes JSON significantly easier to work with as a developer. This in turn results in a simpler and more easily maintained code base and reduces development time. If developer ease of use is of primary concern for a system (as it might be for web service APIs such as Google Maps and Twitter, which are interested in gaining many API users), then JSON may be the data interchange format of choice.

However, if a well-defined, easily validated structure is needed as is often the case with data standards, XML is the way to go. Additionally, the use of namespaces makes XML formats easily reusable across a broad array of applications. Finally, as a document markup language, XML will always be chosen over object-based JSON for document-oriented applications such as Microsoft Word and Open Office.

Finally, if performance is a main concern, then JSON is not the obvious choice. David Lee’s study demonstrates that in most cases, XML and JSON are approximately equivalent performance-wise. Other considerations, such as those outlined above, should be taken into account before determining which format to use.

Future

Based on the analysis above, XML’s schema, use of namespaces, and document-based structure will prevent JSON from completely replacing XML as the data interchange format of choice. However, JSON is gaining ground in the areas of schema and namespace use. JSON Schema is an effort in the works that will allow JSON to be validated similarly to how XML currently is. JSON Schema is currently in the Internet Draft stage of development and is getting ready to be submitted as an RFC. Additionally, there’s another effort, JSON-LD, that supports disambiguation of terms similarly to how namespaces disambiguate terms in XML. Each element in JSON-LD has a URI identifier associated with it which uniquely describes that element. As in XML with namespaces, you can now have two elements with the same name (though in different levels of the hierarchy) with different identifiers. These two elements can now be disambiguated in JSON with JSON-LD as they could be in XML with namespaces.
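The idea behind JSON-LD's disambiguation can be sketched as follows. The document below uses the JSON-LD @context keyword to associate each name with an identifier, and the expand function substitutes those identifiers for the names. The example.org identifiers are invented, and expand is a deliberately simplified illustration written for this paper's example, not the expansion algorithm defined by the JSON-LD specification.

```python
import json

# A JSON-LD-style document; the example.org IRIs are invented for illustration.
DOC = json.loads("""{
  "@context": {
    "foot": "http://example.org/anatomy#foot",
    "length": "http://example.org/units#length"
  },
  "foot": "sprained",
  "length": {
    "@context": {"foot": "http://example.org/units#foot"},
    "foot": 3
  }
}""")


def expand(node, inherited=None):
    """Simplified term expansion: replace each name with the IRI its context maps it to.

    Handles only plain string term definitions and nested objects.
    """
    context = dict(inherited or {})
    context.update(node.get("@context", {}))   # an inner context overrides the outer one
    expanded = {}
    for key, value in node.items():
        if key == "@context":
            continue
        iri = context.get(key, key)            # unknown names are left unchanged
        expanded[iri] = expand(value, context) if isinstance(value, dict) else value
    return expanded


expanded = expand(DOC)
print(json.dumps(expanded, indent=2))
```

After expansion, the two uses of foot at different levels of the hierarchy carry different identifiers, which is the same effect that namespace qualification achieves in XML.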

These two efforts will give JSON the schema capabilities and disambiguation capabilities of XML. This will make it easier for JSON to gain adoption over time. Even with these advances, JSON will never completely replace XML because it cannot serve as a document markup language for document-based applications.

Bibliography