Tufts CS 117 (Fall 2024):
Internet-scale Distributed Systems

Tufts CS 117 Reading Assignment:
Naming, URIs, and RFCs

Overview

Naming is one of the most important aspects of computer system design, and we will spend several days diving deeply into issues relating to naming. The purpose of this assignment is to give you some background on naming issues, focusing mostly but not entirely on the Web, and to encourage you to start thinking about design choices relating to naming.

In the second part of the assignment we ask you to take a more detailed look at RFC 3986, which for many years was the normative specification for URIs. There are several reasons for studying RFC 3986:

(By the way, RFC 3986 has been replaced by a newer set of specifications that's broken into more documents and that clarifies some important details. For pedagogic purposes, the older RFC 3986 is more convenient, so we continue to use it.)

Assignment

You will note that there are two sets of questions for this assignment. That is because we will have (at least) two in-class sessions to discuss naming. As usual, we ask you to consider each set of questions twice: you provide each set in time for the corresponding class discussion, but then have the opportunity to update your answers after class discussion is complete. Here's a summary of what's expected when:

Required reading — First set of questions

Please read the following:

For the first set of questions, your focus should be on the design of naming schemes, and the general lessons we can learn from the design of both the telephone number system and URIs. Later, we will go back and explore in more detail issues that relate to URIs in particular.

Regarding the reading on telephone systems: note that prior to the introduction of automatic dialing, telephone exchanges were known and requested by name, such as "Medford", and connections were made manually by operators plugging wires. To maintain a degree of compatibility, the automated system put three letters on each number of the newly invented "dial", so that numbers were listed in the form MED-1234, which we would now quote as 633-1234. Interestingly, the paper claims that the all-numeric 7-digit form would be too difficult for people to remember, yet today we mostly work with 10-digit numbers! Keep in mind the need they had for compatibility and coexistence with an existing system, and also the relationship between the design of the names (phone numbers) and the newly invented switching equipment.
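The exchange-name convention described above is in effect a small encoding scheme: each letter stands for the digit that shares its position on the dial. The following is a minimal Python sketch of that mapping, not anything taken from the reading. It assumes the modern keypad letter layout (older dials omitted Q and Z), and the helper name exchange_to_digits is hypothetical.

```python
# Letter groups as printed on a modern telephone keypad. This layout is an
# assumption for illustration; older rotary dials omitted Q and Z.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

# Invert the table so each letter maps to its digit.
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def exchange_to_digits(number: str) -> str:
    """Replace each letter with its dial digit; keep digits and punctuation."""
    return "".join(LETTER_TO_DIGIT.get(ch.upper(), ch) for ch in number)


print(exchange_to_digits("MED-1234"))  # 633-1234
```

Note that the mapping is many-to-one: several exchange names can share the same three digits, so the letters help people remember a number but carry no extra information for the switching equipment.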

By the way, it's interesting to note the various dated and undated URIs given to the Metadata in URIs finding (see "This version", "Latest version", etc. at the front); this is standard practice at the W3C. See if you can figure out why.

Required reading — Second set of questions

Please read the following sections of RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax:

How carefully should you read? Plan on spending a couple of hours, but no more. We will discuss the important points in class, but you will only understand the classroom discussion if you've first made a serious effort to figure things out on your own. Remember, you have lots of examples of URIs to look at from browsing the Web. One good way to check your knowledge of the spec is to look first at simple URIs (like http://www.tufts.edu/) and convince yourself of how RFC 3986 explains the syntactic components. Then go on to something a bit more complicated, like the section links above, and then to something even messier like the following map link: https://maps.google.com/maps?q=restaurants+near+tufts+university&hq=restaurants&hnear=Tufts+University,+Medford,+Middlesex,+Massachusetts+02155&z=15. Which parts of each URI are fixed in the generic syntax, and which were chosen by the person or organization that created the particular URI?
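One way to check your own reading of the two example URIs is to let a library split them and compare the result with what you worked out by hand. The sketch below uses Python's standard urllib.parse module. It is only an aid, not a substitute for the spec: urlsplit calls the RFC 3986 "authority" component netloc, and parse_qs applies the HTML form convention (name=value pairs joined by "&", with "+" meaning space), which is a choice made by the site and not part of the generic syntax.

```python
from urllib.parse import parse_qs, urlsplit

# The two example URIs from the paragraph above.
EXAMPLES = [
    "http://www.tufts.edu/",
    "https://maps.google.com/maps?q=restaurants+near+tufts+university"
    "&hq=restaurants&hnear=Tufts+University,+Medford,+Middlesex,"
    "+Massachusetts+02155&z=15",
]

for uri in EXAMPLES:
    parts = urlsplit(uri)
    print("scheme:   ", parts.scheme)
    print("authority:", parts.netloc)  # urlsplit's name for the authority
    print("path:     ", parts.path)
    print("query:    ", parts.query)
    print("fragment: ", parts.fragment)
    # Interpreting the query as name=value pairs is a form-encoding
    # convention layered on top of the generic syntax.
    for name, values in parse_qs(parts.query).items():
        print("   ", name, "=", values)
    print()
```

The generic syntax only delimits the query (everything between "?" and "#" or the end); the parameter names q, hq, hnear, and z were chosen by whoever designed the map service.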

Try to understand the general idea of the ABNF grammars, and how URIs like the examples above are accepted by the grammar. Note that the ABNF language itself is specified in RFC 2234, which is the version RFC 3986 cites (it has since been superseded by RFC 5234).
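To get a feel for how an ABNF rule accepts or rejects a string, it can help to translate one small rule into a regular expression. The sketch below does this for the scheme rule from RFC 3986 section 3.1, and also uses the component-splitting regular expression given in Appendix B of the RFC. The function names split_uri and is_valid_scheme are hypothetical, and the code is an illustration only: it does not validate a URI against the full grammar.

```python
import re

# ABNF from RFC 3986, section 3.1:
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# One leading letter, then any number of letters, digits, "+", "-", or ".".
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")

# Regular expression from RFC 3986, Appendix B. It splits any URI reference
# into its five generic components without validating them.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def is_valid_scheme(text: str) -> bool:
    """True if the whole string matches the scheme rule."""
    return SCHEME.fullmatch(text) is not None


def split_uri(uri: str) -> dict:
    """Return the five generic components; absent ones are None."""
    m = URI_REFERENCE.match(uri)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_uri("http://www.tufts.edu/"))
print(is_valid_scheme("http"), is_valid_scheme("1http"))
```

For http://www.tufts.edu/ the scheme is http, the authority is www.tufts.edu, the path is /, and the query and fragment are absent (None), which is different from being present but empty.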

Optional reading

If you find these naming issues interesting, you might want to at least skim the Naming Chapter (Chapter 3) from the book Principles of Computer System Design by Saltzer and Kaashoek (yes, the same Jerry Saltzer who wrote the end-to-end paper). This book is used as the text for a classic systems course taught at MIT (available to you at O'Reilly Books Online ... see instructions on course info page). Unfortunately, the chapter is quite long, and it mixes some very important insights with other points that are less important. So, I recommend you take a look, and at least skim it, as it may prove a useful reference occasionally. In any case, the fact that this important textbook devotes an entire early chapter to naming (actually it devotes two to naming) may convince you that it is indeed a deeply important aspect of system design, and worth careful thought. Again, this is not required reading.

The following additional optional reading is for fun and to stretch your knowledge a bit.

RFC 3986 was not the first specification for Web identifiers. The introduction to RFC 3986 gives links to the earlier ones:

This document obsoletes [RFC2396], which merged "Uniform Resource Locators" [RFC1738] and "Relative Uniform Resource Locators" [RFC1808] in order to define a single, generic syntax for all URIs. It obsoletes [RFC2732], which introduced syntax for an IPv6 address. It excludes portions of RFC 1738 that defined the specific syntax of individual URI schemes; those portions will be updated as separate documents. The process for registration of new URI schemes is defined separately by [BCP35]. Advice for designers of new URI schemes can be found in [RFC2718].

None of those are required reading, but you might enjoy skimming them to get a sense of how things evolved from Tim Berners-Lee's earliest RFC 1630.

Getting the References

All references are available online:

Optional:

Getting the Questions

As usual, questions are provided in HTML files, copies of which you can download. The instructions above tell you which ones to answer and submit when (you can always submit the second set early if you like, and resubmit repeatedly until the final due date; serious grading doesn't begin until then). You must supply your answers by inserting them in the spaces provided in the downloaded HTML file, and when you are done, you must submit your answers using the usual Tufts CS department "provide" command. See instructions below.

For full credit, your file should validate as HTML5 using the official validator for uploaded files. It may not be possible in all cases for the graders to check the validity of every submission, but we reserve the right to do so when we suspect trouble, and to deduct credit for validation failures. You can ignore the warning about "Using experimental feature: HTML5 Conformance Checker"; that's just because the HTML5 validator is still experimental at W3C (because the specification for HTML5 isn't final).

Review questions for this assignment (first set) - Download questions (first set) for this assignment

Review questions for this assignment (second set) - Download questions (second set) for this assignment

Submitting your answers

Download the HTML files with the questions using the links above. Fill in your answers, use your local browser to check formatting, and use the HTML validator to make sure your HTML is correct. You may ignore warnings about character encodings. Then use provide to submit (first time):

provide comp117 naming namingquestions.html 

Later submissions:

provide comp117 naming namingquestions.html namingquestions2.html

Note that comp117 is lowercase; provide will choke if you get that wrong. Again, it is OK to include both sets on the first submission if you happen to have them done. You must include both sets on the final submission, even if the earlier one has not changed. Detailed grading will be done only on your final submission.