Tufts CS 117 (Fall 2024):
Internet-scale Distributed Systems

Tufts CS 117
Final Project

As discussed in class, all CS 117 students are required to do a final project. A choice of projects is offered: you may do one of the written assignments proposed below, or with the permission of the instructor, you may do a report on some other topic. Please email Noah if you have another project you'd like approved.

The overview, detailed requirements and academic integrity sections below give instructions that apply to all projects. The remaining sections provide a preliminary list of specific project proposals; additional suggestions may be added within the next few weeks.

The assumption is that a week to 10 days is a reasonable amount of time to complete a project like this, but to give you flexibility I am posting the assignment very early.

Please see the assignments page for due dates. However, in some cases I can be a little flexible. I read all of these myself, and it takes over a week, so I don't need them all at once. If you have a good reason to be a little after the due date, that will likely be OK, as long as some other students are giving me work that I can start grading on time. If you have scheduling questions, email me.


Overview

So far in CS 117 you have studied many principles of system design, and our programming projects have given you the opportunity to apply some of them in practice. This final project is intended to offer a complementary experience: a first-hand look at how difficult design questions are debated and resolved in the Internet community. Just as in the actual standards development process, your ability to communicate your ideas clearly will be as important as the quality of your technical work.

Many of the project proposals below focus on current areas of controversy; almost all the others are topics which were controversial at one point, but on which we can now look back to see what actually happened. Your goal is to explore the relevant specifications, but just as importantly, to understand and to explain the trade-offs and controversies that are being debated in the community. If you like, you can express your own opinion as to what should be done or should have been done, but your main goal should be to clearly contrast, in a balanced way, the choices that are or were available for resolving the controversy or pursuing the proposed new technology.

A key part of this assignment is to demonstrate your ability to apply the particular principles we study in CS 117. Of course, not all will apply in interesting ways to any particular subject, but you should demonstrate your ability to discuss end-to-end issues, versioning, naming, leaky abstractions, etc., when they apply. Indeed, you should choose a topic that will allow you to explore an interesting set of principles from the course. One of the most common mistakes students make in doing these projects is to do a terrific job explaining some system or controversy, but then to neglect to explore connections to the principles we've studied in the course. You cannot get a strong grade unless you do both, and you must choose a topic that allows you to do both!

Part of the challenge of this assignment is to search the Internet and other published materials for sources of information, and especially to find records of the discussions among those who are working actively on the problems. You are provided with a few hints, but in practice, all of us who do this work need to learn to find our own reference material. The key to success is being diligent in finding multiple sources offering multiple perspectives (please don't just parrot Wikipedia, though it's often a good place to find overviews and links to primary source material). In most cases, there are public e-mail archives in which you can find the debates the experts are having, but there may also be blog postings, research papers, news articles, etc. in which people explore the problems and trade-offs. Almost always you will want to carefully study the applicable specifications.

Please remember that what you submit must be your own original work. Using chatbots or other AI-based tools to suggest ways of explaining things is not acceptable for this project; doing so is a violation of Tufts' rules on academic integrity and can have serious consequences.

A detailed bibliography is required. In any case where you use or adapt the work or explanations of others, explicit credit must be given (see below). As in any professional technical writing, occasional quotation of some other source is often desirable, but it is not acceptable for significant portions of your explanation to be cobbled together from the writing of others. The goal is to demonstrate that you understand the material and can explain it clearly.

Detailed requirements

Please consider the following in preparing your report:

Your report should open with a brief overview of the problem, very briefly summarizing the background and the various choices to be made. Later sections should explore in more detail all the points listed above. The report should conclude with a brief summary stating or restating your conclusion. Be sure to include the bibliography. Web resources should be linked.

Although your report is expected to be much more formal in tone and presentation than a blog posting, you might be interested as a point of reference in reading Jeni Tennison's post from 2011 (http://www.jenitennison.com/blog/node/154) discussing the controversy regarding the use of fragment identifiers to identify application states. I think you'll find that Jeni's post is informed, well reasoned, and balanced, although in an informal response (http://blog.arcanedomain.com/2011/03/jenis-terrific-post-on/) I disagreed with one of her conclusions. Hers is the sort of careful, balanced analysis and writing that you should strive to imitate in your work.

Academic Integrity

Since this is the largest and most important written project you will be doing, it's important that you understand the pertinent rules from the Tufts Policy on Academic Integrity. Almost all of our students in CS 117 are honest and trying to do the right thing, but we have unfortunately had a number of cases in recent years in which students have, often unintentionally, violated the rules. Please don't. We are required to report all such incidents to the responsible authorities in the Deans' office, and the consequences for you can be very, very serious.

The essence of the rules is simple:

These are, more or less, the same rules that would apply if you were writing an academic paper for submission to a scholarly journal.

Project Suggestions

No pre-approval is needed if you choose one of the projects suggested below. You may also feel free to email Noah with a proposal for a different project. Which of these projects is best may depend on your background and your interests. You may wish to briefly investigate a few before making a choice. These are not in any particular order.

NOTE: it is not practical for me to review all of these in detail every year, so it's possible that you will find explanations or references that are a little out of date. If you do, please let me know. I do check each year to ensure that these are still topics that are worth exploring.

JavaScript vs. HTML

As we discussed in class, Tim Berners-Lee designed HTML to be a declarative, non-Turing-complete language. This was done primarily for the reasons discussed in The Rule of Least Power, and also to maximize compatibility with earlier SGML systems. JavaScript was introduced to the Web by Netscape, and originally was used to provide limited "smarts" for HTML pages. For example, JavaScript could dynamically highlight a field when "mousing over" (CSS did not exist then), and JavaScript could validate data input to HTML forms.
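
For example, a minimal sketch of the kind of early form validation JavaScript made possible might look like this (the form and field ids here are invented for illustration):

// Reject non-numeric input in a hypothetical "age" field before the
// browser submits the form to the server.
document.getElementById("signup").addEventListener("submit", function (event) {
    var age = document.getElementById("age").value;
    if (isNaN(parseInt(age, 10))) {
        alert("Please enter a numeric age.");
        event.preventDefault();   // keep the invalid form from being submitted
    }
});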

JavaScript is a Turing-complete programming language, and over time it has become an important aspect of the Web's content model. JavaScript has enabled the deployment of a wide range of very dynamic Web applications and games. For better or worse, even content that could easily be published using declarative HTML is now often encoded using JavaScript logic.

Furthermore, APIs for various JavaScript frameworks are starting to displace HTML tags as the language in which Web content is being written and on which developers are trained. There are numerous competing frameworks, all coexisting.

Write a report exploring the tradeoffs involved in these developments. If you choose this project, the following subtopics might merit attention:

You may also do a report focusing on one or more aspects of the JavaScript-based Web, e.g. you could focus on the rise of frameworks, etc. Be sure you have the opportunity to demonstrate your mastery of principles covered in CS 117.

Web Security and/or Privacy

The case can be made that the problems relating to security and privacy on the Web are getting worse. You may choose one or more specific issues relating to security or privacy and explore them in detail. If you believe that security and privacy are connected, you may also do a report that explores issues relating to both.

Whatever specific aspects of security or privacy you explore, be sure to go into detail and remember the specific requirements of this assignment. There are many directions in which you could take an investigation like this, but we need to see you explore some slice of this problem in detail, and we need to see your creative application of the principles discussed in the course.

Blockchain, Bitcoin, etc.

Bitcoin was the first widely visible development based on blockchain technology, but now we see a variety of developments based on similar distributed ledgers. Some are at Internet-scale and some are not, but almost all are interesting distributed systems.
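
To make the core idea concrete, here is a minimal sketch (in Node.js, with invented entry contents) of how a ledger becomes tamper-evident by chaining cryptographic hashes; real systems add much more, such as signatures and proof-of-work:

const crypto = require('crypto');
const sha256 = (s) => crypto.createHash('sha256').update(s).digest('hex');

// Each block records the hash of its predecessor; altering any earlier
// block changes that hash and breaks every link that follows it.
const genesis = { data: 'first entry', prev: '0'.repeat(64) };
genesis.hash = sha256(genesis.prev + genesis.data);
const second = { data: 'second entry', prev: genesis.hash };
second.hash = sha256(second.prev + second.data);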

You may choose either Bitcoin itself, or perhaps some of the other blockchain-based systems, or maybe just investigate the properties of blockchain distributed systems overall.

Whichever you choose, be sure to dig deeply and explore a broad range of issues, and of course do a careful job of connecting to the topics we discussed in CS 117. Obviously you should explain the technology and associated technical challenges, but you should also look for other opportunities and pitfalls. As one example, just recently (spring of 2018) there was a report that images that are in some jurisdictions considered to be illegal child pornography have been added to some widely used ledgers. It's been asserted that storage of such ledgers (e.g. for mining) may now be illegal.

HTTP/3

In 2022, the IETF issued a new version of HTTP as a standard: HTTP/3. Just as SPDY was an experimental technology that led to development of HTTP/2 (see below), QUIC was (as of spring 2018) a proposal from Google to deploy a much more radical change to the Web protocol stack. Quoting from the project description:

QUIC is a new transport which reduces latency compared to that of TCP. On the surface, QUIC is very similar to TCP+TLS+HTTP/2 implemented on UDP. Because TCP is implemented in operating system kernels, and middlebox firmware, making significant changes to TCP is next to impossible. However, since QUIC is built on top of UDP, it suffers from no such limitations.

The standard carries forward much of the QUIC technology onto the standards track as the next official version of HTTP.

For this report, you should at minimum:

HTTP/2 and SPDY

When this project was first proposed several years ago, the official specification for HTTP had undergone only one significant functional revision, when the original HTTP 1.0 was replaced with HTTP 1.1. Later, several organizations experimented with either replacements for or enhancements to HTTP. The best-known and most widely deployed of these experiments was SPDY, which originated at Google. SPDY was supported natively in the Chrome browser, and later in versions of Firefox.

Although there was controversy about the technology to adopt, the IETF and the Web community came together to adopt a new HTTP standard. In 2015 (if I'm reading the dates right), HTTP/2 was released as an IETF RFC and it was widely deployed.

Write a report that explains the differing views in the Web community as to whether the revision to HTTP was merited, if so why, and the pros and cons of adopting a SPDY-based approach. Explore early deployment experience and current debates about HTTP/2. Is it achieving its goals? Are problems arising, and if so were they anticipated or did they come as a surprise? Are further changes to HTTP being proposed? Why or why not?

Be sure to explain in detail the technical characteristics of SPDY and HTTP/2. Describe the ways that HTTP/2 is and is not compatible with earlier versions of HTTP, and how it operates when only some of the software involved is aware of the new protocol. Be sure to explore the consequences of HTTP/2's use of transport-level security. Also list and briefly describe the more significant alternative technologies that were proposed by other organizations. Do you believe that the choice of SPDY was made for good reasons, or was it driven in part by inappropriate politics? Do you believe there were better alternatives available, or is it too early to tell? Now that a SPDY-based design has been adopted as HTTP/2, what benefits has it brought to the Web and the Internet, and what problems has it caused?

Should you write about HTTP/2 or HTTP/3? HTTP/3 is newer and in many ways a deeper and more interesting change to the architecture of the Internet as well as the Web (if you write about it, you should explain why!). HTTP/2, on the other hand, is an example of a proposal that was radical in its day but that is now very widely deployed. So, writing about that gives an opportunity to explore the evolution of an interesting technology all the way to the point where it becomes a de facto and widely deployed standard. Either HTTP/2 or HTTP/3 is a great topic (and if you choose one, you can always briefly compare it to the other).

Signed HTTP Exchanges

As of 2021, Google is promoting a somewhat new proposal titled Signed HTTP Exchanges, which has been submitted for consideration by the IETF. The abstract reads:

This document specifies how a server can send an HTTP exchange--a request URL, content negotiation information, and a response--with signatures that vouch for that exchange's authenticity. These signatures can be verified against an origin's certificate to establish that the exchange is authoritative for an origin even if it was transferred over a connection that isn't. The signatures can also be used in other ways described in the appendices.

In short, it allows a Web site to bypass the usual transport-based means of authentication (using HTTPS and certificates) so that, in a sense, one site can not only masquerade as another, but actually pass all certification checks. Explained differently, it allows a site to serve, as a package, content that was prepared and signed by another.

Like so many of the technologies mentioned here, this one represents a significant change to the way the Web and the Internet ensure that you are talking to the site that you think you are.

Explore what problem this proposal purports to solve, explain how this technology would work, discuss related technologies (e.g. AMP), and of course, explain how the principles we've discussed and the insights we've gained in CS 117 apply to analyzing this proposal.

Digital rights management and EME

Many commercial computing platforms include protected encryption systems used to implement what is often called "Digital Rights Management", or DRM. With such a system, it is possible for publishers of music, movies, games, books and computer programs to encrypt their content in such a way that only authorized clients will be able to decode it. Major content providers such as Netflix have depended on the DRM support in Flash and Silverlight to protect their content.

The Web has traditionally been a system that promotes open sharing of information. Although it is possible for sites to ask for authentication before providing information, and to use SSL to protect it during transmission, the open standards of the Web have intentionally made it possible for users to access, manipulate, and repurpose the content that they receive through the Web.

In 2013, the W3C took the controversial step of enhancing the scope of the HTML working group's charter to allow for inclusion of DRM-related interfaces in the W3C's HTML standards; at that point, the change merely allowed its working groups to explore the pros and cons. Even this limited step resulted in a firestorm of criticism and debate. Nonetheless, in September of 2017 the Encrypted Media Extensions (EME) Recommendation was published, and the technology is therefore officially endorsed by the W3C.
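
As a very rough sketch of how EME surfaces in page script (the key-system string and configuration below are illustrative, and "video" is an assumed <video> element; the actual decryption happens inside a proprietary Content Decryption Module, not in JavaScript):

// Ask the browser for a DRM key system, then attach the resulting
// MediaKeys to a <video> element so encrypted media can play.
navigator.requestMediaKeySystemAccess('com.widevine.alpha', [{
    initDataTypes: ['cenc'],
    videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.42E01E"' }]
}]).then(function (access) { return access.createMediaKeys(); })
  .then(function (mediaKeys) { return video.setMediaKeys(mediaKeys); });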

Write a report that explains the issues, and gives an overview of the history of DRM and EME. Include a basic explanation of the technology, how keys are managed, etc. Also discuss the ways in which DRM limits the use of content by end-users and by programs such as crawlers. Present in your report details of the debate, with references to respected authorities on both sides. Also explain how the principles we've studied in the class relate to the question of DRM for the Web.

Be sure to explain what has happened since 2013. What further moves has the W3C made? Is the new DRM support being exploited in practice? How and by whom? What if any are the benefits and what are the costs?

Spectre and Meltdown

In 2018 the computing community was shaken by news that the majority of CPUs deployed over a period of perhaps 20 years have serious vulnerabilities which allow untrusted code to access data that should be protected. The details of the so-called Spectre and Meltdown vulnerabilities are complex and subtle, but the net result is that many and perhaps the majority of the computing systems we use are far less secure than even the best experts believed. What's worse, the required fixes are difficult to develop and in many cases very hard to deploy (e.g. among the fixes is new CPU microcode that should be installed on every home computer and laptop).
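
For intuition, the pattern at the heart of the Spectre variant 1 attack can be sketched in a few lines of JavaScript (a conceptual illustration only, not a working exploit; the arrays are invented):

var array1 = new Uint8Array(16);          // small array the code may legally read
var array2 = new Uint8Array(256 * 4096);  // probe array used as a cache side channel
var temp = 0;
function victim(i) {
    // The branch predictor may guess "true" even when i is out of bounds;
    // during misspeculation the CPU can read array1[i] from secret memory,
    // and the dependent load below leaves a secret-indexed footprint in
    // the cache that survives after the misspeculation is rolled back.
    if (i < array1.length) {
        temp = array2[array1[i] * 4096];
    }
}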

Of course, Spectre and Meltdown relate to our CS 117 discussions of security, but a closer look reveals that they involve many of the other principles we have discussed. Write a report that explains how the Spectre and Meltdown vulnerabilities work. Assume that your readers have roughly a COMP 40-level knowledge of CPU architecture (e.g. they know what machine code is and what caches are, and they may have a general idea that modern CPUs work on several instructions at the same time), but no more. Explain in commonsense terms what's going on with these vulnerabilities. Most importantly: explore in as much detail as you can the relationship to a wide variety of principles from CS 117.

Web Architecture and the Edward Snowden revelations

Many around the world have been surprised by the government surveillance programs disclosed by Edward Snowden and others. As a result, suggestions have been made for changes to the architecture of the Internet, the Web, DNS, etc.

Choose one or more of these proposals, and provide a balanced report covering the technical and social issues. Try to present "both sides" of any arguments. Be sure to cover technical issues in some detail (refer to specifications and their characteristics, not just to news reports of issues and proposals).

Content-addressable networking

Van Jacobson is famous for his many contributions to the development of TCP/IP networking. More recently, working at the famed Xerox PARC research lab, Jacobson has been promoting a new model of Internet-scale document networking. Unlike traditional URIs, which include the DNS name of a server hosting the designated content, the names in a content-centric network are a function only of the content itself. Jacobson claims that the Web as we know it won't scale well enough, and that the new content-centric model has advantages.

Your job in this report is to explain how the two models differ, to describe the ways in which a content-centric Web would work, and to explore the possibility that the Web could actually evolve to this new architecture.

You must identify the specifications of the Web that would have to change to support content-centric networking, as well as those that could be retained. You should explain the ways in which the operation and performance of the Web would change. In what ways would things improve, and in what ways would new problems be introduced? Whose code would have to change, and who would have to deploy the new code? Overall, do you believe this is a practical change, and if so in roughly what time frame?

Internationalized Resource Names

This proposal was first made in 2012. I have not followed developments in this space since, so some of the descriptions and references below are out of date. This is still a good topic for anyone who is interested.

The URIs described in RFC 3986 and used in specifications like HTTP (RFC 2616) are ASCII strings. As you know, URIs are the main global naming mechanism for the Web. Although browsers and other tools use various techniques to create the illusion of names that include characters for a broad range of languages such as Arabic, Chinese, and Thai, URIs themselves do not directly convey such names. Supporting names and resource identifiers in their local languages is important to hundreds of millions of people around the world, but we still do not have a good way to do it.

Over the years, a number of efforts have been made to improve the situation. In particular, IDNA introduced a convention for representing so-called "Internationalized Domain Names", and RFC 3987 proposed so-called "Internationalized Resource Identifiers" (IRIs). IRIs were promoted as a unified path for making Web names work around the world, but in practice it's been difficult to get agreement on what should be done. The purpose of your report is to explore the promise and the problems of IRIs, and of other approaches to creating names using Unicode.

IRIs are not a direct replacement for URIs, because as noted above many important specifications call specifically for URIs, and corresponding software is widely deployed. Rather, the proposal is for a two-layer system, in which IRIs are used in browsers and in data fields where Unicode is acceptable, and a translation is done into URIs when necessary. The hope had been that this approach would provide consistency across a range of places where internationalized names can appear, including in the address bar of a browser and as a namespace name in XML. RFC 3987 did not prove entirely acceptable, and so an effort was made to create a so-called RFC 3987bis (successor to RFC 3987) that would be widely adopted (see http://tools.ietf.org/html/draft-ietf-iri-3987bis-13). Recently, it's become clear that the browser implementors among others are unwilling to adopt this approach. As of this week (mid-November 2012), the IETF working group devoted to IRIs is in the process of closing down, and what will happen next isn't entirely clear.
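
You can see the two-layer idea at work in the URL parser that today's browsers ship: a Unicode name is mapped to an all-ASCII form before it is used on the wire. A minimal sketch, with invented names:

// WHATWG URL parsing in a current browser.
var u = new URL('https://bücher.example/päth');
u.hostname;   // "xn--bcher-kva.example" -- IDNA/punycode encoding of the host
u.pathname;   // "/p%C3%A4th"            -- percent-encoded UTF-8 for the path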

Your report should explore the use cases and requirements for internationalized resource naming, and it should provide details of the pertinent technologies, including IDNA and IRIs (but also punycode and other related developments). Be sure to explore the (re)specification of the term URL in the HTML5 specification, and explain the controversy that this is causing. You should explain why getting an agreeable resolution has proven difficult, state what you think is likely to happen in the future, and give any additional insights you may have on the problem. Be sure to explore some of the interesting security problems that arise with internationalized names.

There are many places on the Web to find information about this topic. Of course, you should read the pertinent RFCs and drafts. Another good place to get a sense of the debate is in the archives of the IRI public e-mail discussion (available at http://lists.w3.org/Archives/Public/public-iri/).

Internationalization of the Web Today

With or without IRI's, the Web is used all over the world today. Partly due to Tim Berners-Lee's insistence that the Web be an international system, many capabilities are provided for creating and browsing content in a wide variety of languages. Nevertheless, the Web was first widely deployed in Western countries, and as noted above, many of the building blocks used for the Web such as DNS were originally ASCII only.

Furthermore, supporting content in a wide variety of languages is only one of the challenges of building an international system: different communities differ in their cultural expectations, in the types of devices that they use to access the Internet, and even in the calendar systems that they use to record dates; for example, Hebrew, Buddhist, Tibetan, traditional Chinese, Japanese, and other cultures use lunar or lunisolar calendars. Many languages are traditionally written right-to-left or vertically, and Japanese has ruby (or rubi) markings. Different cultures may have differing legal restrictions.
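
Modern browsers expose some of this variation directly to script; for example, the Intl API can render the same date under different calendar systems (the locale tags below are just examples):

// One date, three calendar systems.
var d = new Date(Date.UTC(2024, 0, 15));
new Intl.DateTimeFormat('en-US').format(d);                // Gregorian: "1/15/2024"
new Intl.DateTimeFormat('th-TH-u-ca-buddhist').format(d);  // Buddhist era (year 2567)
new Intl.DateTimeFormat('he-IL-u-ca-hebrew').format(d);    // Hebrew calendar date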

Prepare a report that explores how Web browsers, HTML, HTTP and other tools meet (or sometimes don't meet) the requirements for people around the world to use the Web in a way that feels natural to them. Also explain some of the challenges that arise when connecting users and systems from around the world.

HTML, XML, XHTML and the need for distributed extensibility

HTML was first built with simple tag names like <p>, <h1>, <title>, and so on. The first extensions were also introduced with simple names like <img>. Names like this are convenient, but many people and organizations around the world experiment all the time with creating new features for the Web and for HTML. There is a risk that the names they choose for these features will conflict. Furthermore, it may be difficult to find, in an automated way, resources, descriptions, and code relating to the new features if all one has to go on is the fact that the tag name is spelled <img>.

In part for these reasons, the inventors of XML built a more elaborate system that was documented in the XML Namespaces Recommendation. With Namespaces, names are managed in groups, and each group is identified by a URI. In an XML document using namespaces, short prefixes (for example, "tufts") can be associated with a URI (http://www.tufts.edu/classassignments), and then the prefixes can be used to create two-part names like <tufts:student>. Namespaces help to avoid the collision problem described above: if two different organizations create a tag "student", it's clear which one any particular document is using. On the other hand, namespaces introduce complexities, both for humans typing a document, and for software that must process structured, two-part names.
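
To see what software actually receives, here is a small sketch using the browser's DOMParser (reusing the example names above): the prefix is only a local shorthand, and what the parser reports is the (namespace URI, local name) pair.

var doc = new DOMParser().parseFromString(
    '<tufts:student xmlns:tufts="http://www.tufts.edu/classassignments"/>',
    'application/xml');
doc.documentElement.namespaceURI;  // "http://www.tufts.edu/classassignments"
doc.documentElement.localName;     // "student"
doc.documentElement.prefix;        // "tufts"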

Early in the 21st century, the W3C embarked on a strategy of unifying HTML and XML. The XHTML Recommendation provided a standard for representing HTML as conforming XML documents (note that conventional HTML is much more forgiving than XML about things like explicitly closing all tags: e.g. HTML allows just a <br> while XHTML requires <br></br> or <br />). XHTML was created in part so that common tools could be used with both HTML and XML documents, but also so that HTML could benefit from the more robust naming and extensibility model provided by XML.

Many implementers of browsers and other users of HTML were unhappy with this decision. The reasons varied, but included concerns about complexity and the feeling that perhaps distributed extensibility was not needed in the first place. Indeed, the current work on HTML5 grew in part out of that rebellion against the use of XML and namespaces for HTML.

Write a detailed report that explains the technological and other trade-offs relating to distributed extensibility in HTML, and XHTML in particular. Explain the pros and cons of different approaches, and the positions taken by various key players in this discussion. Explain what was actually done, and what its consequences are likely to be. Be sure to explain the solution actually adopted, citing the pertinent parts of the HTML5 specification. Also explain, and if possible give pointers to quotes from, people who agree and disagree with the solutions adopted. If you look, you should also be able to find online records of e-mails, IRC logs and other places where the debate actually took place.

There are many important references that you will want to read and cite in doing this work. One that might be helpful as a guide to some of the others is the slides for a presentation that Noah gave on this topic to the W3C Technical Plenary in November of 2009 (ppt, pdf — note that the content is the same, but the PowerPoint is easier to follow, and you should use F5 to run it in slideshow mode, as there are some animations.)

SVG vs. HTML5 Canvas

As we discussed briefly early in the class, SVG is a standard XML-compatible markup language for creating Web graphics. SVG can be used in conjunction with HTML to add graphics to Web pages, and support for SVG is now available in most browsers.

Several years ago, the designers of HTML5 decided to support an additional facility for creating Web graphics using the new HTML <canvas> element. <canvas> and SVG provide overlapping capabilities, yet they are implemented in very different styles, and both are supported in HTML5.
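
To make the stylistic difference concrete (the element id below is invented): with <canvas> the page runs imperative JavaScript drawing commands, while SVG describes the same shape declaratively in markup.

// Imperative: paint a filled circle onto a <canvas id="c"> element.
var ctx = document.getElementById('c').getContext('2d');
ctx.beginPath();
ctx.arc(50, 50, 40, 0, 2 * Math.PI);   // center (50,50), radius 40
ctx.fill();
// The declarative SVG equivalent is roughly:
//   <svg><circle cx="50" cy="50" r="40"/></svg>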

Write a report exploring the decision to support both of these facilities, and the controversy about the decision. Consider principles that we have studied in our course, such as the Rule of Least Power. Explain how SVG and/or canvas can be used with other Web technologies like CSS. The decision to support both of these was made several years ago, and both have been in modern browsers for some time. Report on which is actually being used in practice, and why.

Overall, do you think that a wise decision was made?

CSS Vendor Prefixes

This is another project relating to extensibility. While the HTML and XHTML communities struggled to find the right compromises for making HTML extensible, the CSS working group adopted an approach to CSS extensions that initially seemed more successful. Vendors producing experimental CSS facilities were encouraged to use property names like:

-webkit-border-radius

which is supported by the Webkit rendering engine that's used in browsers on the iPhone, Mac Safari, and Chrome. The idea is that vendor-specific names like this will be used during an experimental period, and if the facility becomes widely adopted, a more appropriate vendor-neutral name will be used:


border-radius

Unfortunately, things aren't quite working out that way. Users have started to build important Web sites using the experimental names; to support those sites, non-Webkit browsers such as Firefox have started supporting some of the Webkit-specific names, creating a likelihood, or at least a risk, that they will emerge as de facto standards over the long term. All this demonstrates yet again why extensibility architectures are difficult to create, and why they aren't always used in the ways intended.
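
One partial defense available to page authors today is feature detection: a script can ask whether the standard property is supported before falling back to a prefixed one. A sketch (the element variable and selector are assumed):

// Prefer the standard property; fall back to the Webkit-prefixed form.
var element = document.querySelector('.rounded');   // hypothetical element
if (CSS.supports('border-radius', '4px')) {
    element.style.borderRadius = '4px';
} else if (CSS.supports('-webkit-border-radius', '4px')) {
    element.style.webkitBorderRadius = '4px';
}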

Write a report that explains in detail the technical characteristics of the vendor-specific prefix names adopted by the CSS working group. Present a careful analysis of what has actually happened, and the various opinions about what should be done now. Contrast CSS with HTML and XML, and explain whether a solution like vendor-specific prefixes might be suitable for creating HTML tags as well. Do vendor-specific prefixes provide a good solution for CSS after all? Are there better alternatives that might have been adopted, or that should be adopted now?

Buffer Bloat

Warning: this topic is probably best suited to those who have studied networking in a course like COMP 112, and who have an interest in low-level networking architectures.

In principle, TCP and UDP provide very flexible facilities that multiple computers and programs can use to run a wide variety of networking applications simultaneously. Most of the time, this works surprisingly well. Unfortunately, as we studied in class, even the best abstractions tend to "leak", and a combination of subtle problems is causing significant failures that affect the operation of everything from massive Internet packet switches to simple home networks.

Jim Gettys, a well-known expert in computer networking (and by the way the original author of the X Window System) has been exploring and documenting a problem that he refers to as "buffer bloat". You can find a variety of presentations by Jim and others on the Web, but probably the best introduction is the ACM Queue article that he wrote with Kathleen Nichols titled Bufferbloat: Dark Buffers in the Internet.

Sub-optimal or even nonconforming buffering strategies allow TCP streams or UDP applications sending large numbers of packets to create unacceptable delays for other applications and computers using the same network paths. This is not just an esoteric network design question: it is very possibly the reason that your Skype connections glitch while you are transferring files. The concern is that the problem will continue to get worse, that more and more important applications will be affected, and that it's not entirely clear whether the problem can be corrected without simultaneously replacing massive amounts of hardware, and redeploying millions of copies of operating systems and other networking software.

Write a report explaining buffer bloat and the problems that it is likely to cause for the Web. Are there changes that should be made to the Internet technologies we have studied, such as HTTP, that would help with buffer bloat? Are modifications needed to Web browsers, Web servers, and/or Web proxies? How does buffer bloat relate to the various principles that we have studied in our course? Do you expect that buffer bloat will significantly impact the performance of the Internet and the Web in coming years, or do you expect that the problem will be addressed before it becomes much more serious?

JSON vs. XML

XML was promoted, starting in the mid-1990s, as a data and document format that would unify the Internet. One of the most important use cases for XML was to provide a common model for publishing data on the Web. A tremendous investment was made, and indeed, XML is very widely used today for many purposes.

Nonetheless, for data publishing to AJAX applications on the Web, and for a growing range of other purposes, JSON has emerged as a more popular alternative to XML (in the case of AJAX, this is particularly ironic since the "X" in "AJAX" refers to XML).
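
To fix ideas, here is the same small record in each format, as a browser script would consume it (the record contents are invented):

// JSON maps directly onto JavaScript values...
var fromJson = JSON.parse('{"student": {"name": "Jumbo", "year": 2024}}');
fromJson.student.name;                       // "Jumbo"

// ...while XML parses into a DOM tree that is then queried.
var fromXml = new DOMParser().parseFromString(
    '<student><name>Jumbo</name><year>2024</year></student>',
    'application/xml');
fromXml.querySelector('name').textContent;   // "Jumbo"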

Write a report that compares the advantages of XML and JSON. Clearly explain what each one is good for, giving concrete examples. Explain the history of each of the technologies and explore the different purposes for which they are used today. You should take some care with this, as it's easy to find conflicting claims about which technologies are actually in use for what. Try to make sure that your assertions are supported by facts (though no answers can be completely accurate, some interesting bounds on the actual adoption rates of these technologies can sometimes be obtained by doing Google searches for resources in particular formats; there are also a variety of studies you can find discussing adoption). Explain which technologies are supported by widely used products such as desktop office software (think Microsoft Word, or OpenOffice). Which are supported by databases like Oracle, IBM DB2, Microsoft SQL Server or MySQL? Does this matter? Which principles of Web architecture should be considered in deciding how to publish data on the Web? What is the role of schema languages in supporting various use cases, and is the availability or lack thereof of schema languages significant to which of these technologies is being adopted? Are both technologies equally useful for documents as well as data?

Overall, do you expect both technologies to be widely used in coming decades, or do you expect that one will come to completely replace the other?

Submitting Your Report for Grading

As noted above, you may create your own HTML format for your report. If the entire report is contained in a single file named FinalReport.html, then just "provide" that HTML file:

provide comp117 finalreport FinalReport.html

If your report needs multiple files, e.g. if there are image files or separate CSS files, then you must put them all in a single directory named FinalReport, and you must make sure they reference each other with relative URIs like <img src="./greatpicture.jpg">. If you create a directory, just provide that:

provide comp117 finalreport FinalReport

As always, you may provide an explain.txt file if there are lateness excuses or other information we need to know for grading.
