Overview
Naming is one of the most important aspects of computer system design, and we will spend several days diving deeply into issues relating to naming. The purpose of this assignment is to give you some background on naming issues, focusing mostly but not entirely on the Web, and to encourage you to start thinking about design choices relating to naming.
In the second part of the assignment we ask you to take a more detailed look at RFC 3986, which for many years was the normative specification for URIs. There are several reasons for studying RFC 3986:
- As the normative specification, RFC 3986 is the official reference for URIs.
- RFCs are the main way that the IETF documents the core protocols of the Internet, and RFC 3986 is a great example of a technical specification in RFC form. Indeed, it's interesting to see the detailed issues that must be addressed in a specification of this sort. RFC 3986 was written with great care by the best experts in the field, and it tries to address ambiguities that had caused confusion in earlier URI specifications.
- RFC 3986 illustrates the use of grammars for specifying the syntax of string-based constructs like URIs.
Assignment
You will note that there are two sets of questions for this assignment. That is because we will have (at least) two in-class sessions to discuss naming. As usual, we ask you to consider each set of questions twice: you provide each set in time for the corresponding class discussion, but then have the opportunity to update your answers after class discussion is complete. Here's a summary of what's expected when:
- Your preliminary answers to the first set of questions are due on Thursday September 26, for our first in-class discussion of naming. Your provide submission should include just that answer file (and an explain.txt if you have any messages for us about late work or other problems).
- Your preliminary answers to both sets of questions are due in time for class on Thursday October 03, when we will have our discussion of URIs and RFC 3986. Please include both your (preliminary) answer sets when you provide, as that makes it easier for us to find them.
- Final answers to both sets of questions are due by 11:59 PM on Saturday October 05. As usual, your grade for both sets will be based primarily on this final submission. You must provide both together with your final submission, even if your answers to the first haven't changed; just provide them again please. If neither of your answer files has changed since the second interim submission deadline, then there is no need to resubmit.
Required reading — First set of questions
Please read the following:
- Machine Telephone Switching System for Large Metropolitan Areas, a paper on the invention of the dial telephone system. Yes, this really is your chance to read a research paper published in 1923!
  - Skim but don't spend much time on the beginning of the paper: pp. 53-60
  - Read carefully the section General Plan of Operation starting on p. 60 and continue through to the end of the section Numbering System at the bottom of p. 64
  - You need not read the rest (though some of it is interesting technology history), but you might look a bit at the "Detailed Plan of Operation" on p. 75
- Tim Berners-Lee's Design notes on URIs
- The TAG's Architecture of the World Wide Web, through the end of Chapter 2.
- The TAG Finding: Metadata in URIs
- Skim RFC 3986, the specification for URIs. Your goal for this first phase is to spend just a few minutes getting a general idea of the URI syntax and how it's presented. You do want to leave some time to do the more detailed RFC 3986 reading we require for the second set of questions, so don't put all of that off for the last two days. For this first set of questions, a quick skim focusing on main features of URIs will do.
For the first set of questions, your focus should be on the design of naming schemes, and the general lessons we can learn from the design of both the telephone number system and URIs. Later, we will go back and explore in more detail issues that relate to URIs in particular.
Regarding the reading on telephone systems: note that prior to the introduction of automatic dialing, telephone exchanges were known and requested by name, such as "Medford", and connections were made manually by operators plugging wires. To maintain a degree of compatibility, the automated system put three letters on each number of the newly invented "dial", so that numbers were listed in the form MED-1234, which we would now quote as 633-1234. Interestingly, the paper claims that the all numeric 7 digit form would be too difficult for people to remember, yet today we mostly work with 10 digit numbers! Keep in mind the need they had for compatibility and coexistence with an existing system, and also the relationship between the design of the names (phone numbers) and the newly invented switching equipment.
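The translation from a listed number such as MED-1234 to the digits actually dialed can be sketched in a few lines of Python. This is only an illustration: the table assumes the classic North American dial layout (three letters on each of the digits 2 through 9, with Q and Z omitted), and the names DIAL_LETTERS, LETTER_TO_DIGIT and to_digits are invented for this example rather than taken from the paper.

```python
# Assumed dial layout: three letters per digit, Q and Z not present.
DIAL_LETTERS = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PRS", "8": "TUV", "9": "WXY",
}

# Invert the table so each letter maps to the digit it shares a hole with.
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in DIAL_LETTERS.items()
    for letter in letters
}


def to_digits(listed_number):
    """Replace each dial letter with its digit; leave everything else alone."""
    return "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in listed_number.upper())


print(to_digits("MED-1234"))  # prints 633-1234
```

Notice that the mapping only works in one direction: 633 could equally stand for "MDF" or "NEE", which is one reason the choice of exchange names and the design of the dial were tied together.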
By the way, it's interesting to note the various dated and undated URIs given to the Metadata in URIs finding (see "This version", "Latest version", etc. at the front); this is standard practice at the W3C. See if you can figure out why.
Required reading — Second set of questions
Please read the following sections of RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax:
- Read carefully: start of document through Section 2 (i.e. up to the beginning of section 2.1).
- Skim: section 2.1 through section 2.5 (there's lots of detail you don't need to learn now...just get a general sense of the sorts of things they're worrying about, e.g. "percent encoding" characters that are either reserved or aren't ASCII)
- Read carefully: section 3 through section 3.2: be very sure you understand the general way the ABNF defines the syntax, and understand the main syntactic components like scheme, hier-part, authority, query and fragment
- Skip section 3.2.1 - User Information (User information is useful for schemes like mailto:. You should know that it's in the syntax, but don't worry about other details)
- For the remaining sections in section 3: read carefully to get a sense of the main purpose and syntax of each part of the syntax, but don't burn time on messy details. For example, you should have a general sense of what a host subcomponent is for, and that there's an option to use IP addresses in place of DNS names. You should definitely understand what path, query and fragment are for, and recognize those in real URIs. You should not worry about the pages of detailed syntax for things like IPv4 and IPv6 addresses.
- Read carefully: all of Section 4, that is going through section 4.5 - Suffix Reference
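To make the "percent encoding" idea from sections 2.1 through 2.5 concrete, here is a minimal sketch using Python's standard urllib.parse module. The sample strings are made up for illustration, and quote() with its default settings is just one convenient encoder, not the only behavior the RFC permits.

```python
from urllib.parse import quote, unquote

# A space may not appear literally in a URI, so it is written as "%20".
encoded_space = quote("Tufts University")

# A non-ASCII character (here HIRAGANA LETTER A, U+3042) is first turned
# into UTF-8 bytes, and then each byte is percent-encoded.
encoded_kana = quote("\u3042")

# "?" is a reserved delimiter, so it is encoded when it is meant as data.
encoded_reserved = quote("what?")

print(encoded_space)     # Tufts%20University
print(encoded_kana)      # %E3%81%82
print(encoded_reserved)  # what%3F

# unquote() reverses the process.
print(unquote(encoded_kana) == "\u3042")  # True
```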
How carefully should you read? Plan on spending a couple of hours, but no more. We will discuss the important points in class, but you will only understand the classroom discussion if you've first made a serious effort to figure things out on your own. Remember, you have lots of examples of URIs to look at from browsing the Web. One good way to check your knowledge of the spec is to look first at simple URIs (like http://www.tufts.edu/) and convince yourself of how RFC 3986 explains the syntactic components. Then go on to something a bit more complicated, like the section links above, and then to something even messier like the following map link: https://maps.google.com/maps?q=restaurants+near+tufts+university&hq=restaurants&hnear=Tufts+University,+Medford,+Middlesex,+Massachusetts+02155&z=15. Which parts of each URI are fixed in the generic syntax, and which were chosen by the person or organization that created the particular URI?
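If you want to check your hand analysis of the example URIs, a rough sketch like the following may help. It uses urlsplit from Python's standard library, which follows the generic syntax closely but is not a validator; note that Python calls the authority component "netloc". The helper name components and the constant MAP_LINK are invented for this example.

```python
from urllib.parse import urlsplit

MAP_LINK = (
    "https://maps.google.com/maps?q=restaurants+near+tufts+university"
    "&hq=restaurants&hnear=Tufts+University,+Medford,+Middlesex,"
    "+Massachusetts+02155&z=15"
)


def components(uri):
    """Return the five generic-syntax components, using RFC 3986 names."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # Python's name for the authority
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


for uri in ("http://www.tufts.edu/", MAP_LINK):
    for name, value in components(uri).items():
        print(f"{name:10} {value!r}")
    print()
```

The parser only splits on the generic delimiters. It does not interpret the inside of the query string, so you still need to decide for yourself who chose that internal structure.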
Try to understand the general idea of the ABNF grammars, and how URIs like the examples above are accepted by the grammar. Note that the ABNF language itself is specified in RFC 2234 (since superseded by RFC 5234).
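As one small illustration of how an ABNF rule accepts or rejects a string, RFC 3986 defines the scheme component as scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ). The sketch below translates that single rule into a Python regular expression. It covers only this one rule, not the full URI grammar, and the names SCHEME_RE and is_scheme are invented for this example.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# One letter, followed by zero or more letters, digits, "+", "-" or ".".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_scheme(text):
    """True if the whole string matches the ABNF rule for scheme."""
    return SCHEME_RE.fullmatch(text) is not None


print(is_scheme("http"))     # True
print(is_scheme("svn+ssh"))  # True
print(is_scheme("3com"))     # False: must start with a letter
```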
Optional reading
If you find these naming issues interesting, you might want to at least skim the Naming Chapter (Chapter 3) from the book Principles of Computer System Design by Saltzer and Kaashoek (yes, the same Jerry Saltzer who wrote the end-to-end paper). This book is used as the text for a classic systems course taught at MIT (available to you at O'Reilly Books Online; see instructions on course info page). Unfortunately, the chapter is quite long, and it mixes some very important insights with other points that are less important. So, I recommend you take a look, and at least skim it, as it may prove a useful reference occasionally. In any case, the fact that this important textbook devotes an entire early chapter to naming (actually it devotes two to naming) may convince you that it is indeed a deeply important aspect of system design, and worth careful thought. Again, this is not required reading.
The following additional optional reading is for fun and to stretch your knowledge a bit.
- Meet the Man Who Invented the Instructions for the Internet
- Requiem for Jon Postel (if you're following any of the political fuss over ICANN and new top level domains, it's worth thinking about how different things were in the early days...many have suggested that if Jon Postel were alive today, we would not be having these troubles)
- RFC 3987 provides for IRIs, which are URIs that allow non-ASCII characters like Japanese Kana. In practice, getting IRIs to coexist with software such as HTTP that depends on URIs has been very difficult. An effort known as RFC 3987bis is making slow progress toward revising RFC 3987.
- RFC 1630 - Universal Resource Identifiers in WWW is Tim BL's first IETF specification for URIs. Consider: RFC 3986 is much longer and more detailed. How much has really changed since Tim's initial proposal?
RFC 3986 was not the first specification for Web identifiers. The introduction to RFC 3986 gives links to the earlier ones:
This document obsoletes [RFC2396], which merged "Uniform Resource Locators" [RFC1738] and "Relative Uniform Resource Locators" [RFC1808] in order to define a single, generic syntax for all URIs. It obsoletes [RFC2732], which introduced syntax for an IPv6 address. It excludes portions of RFC 1738 that defined the specific syntax of individual URI schemes; those portions will be updated as separate documents. The process for registration of new URI schemes is defined separately by [BCP35]. Advice for designers of new URI schemes can be found in [RFC2718].
None of those are required reading, but you might enjoy skimming them to get a sense of how things evolved from Tim BL's earliest RFC 1630.
Getting the References
All references are available online:
- Machine Telephone Switching System for Large Metropolitan Areas, Bell System Technical Journal, 1923, Vol. 2, pp. 53-89 (pdf). Note that only some pages are required reading...see above.
- Axioms of Web Architecture: Universal Resource Identifiers (part of Tim Berners-Lee's Design Issues series)
- Architecture of the World Wide Web, Volume One, W3C Recommendation, December 2004
- The use of Metadata in URIs (this is a "finding" of the W3C Technical Architecture Group)
- RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
Optional:
- Principles of Computer System Design: Chapter 3 (The Design of Naming Schemes) (you will probably need to log on to O'Reilly Books using the Tufts proxy before this link will show you more than a preview)
- Other "for fun" articles are linked above.
Getting the Questions
As usual, questions are provided in HTML files, copies of which you can download. The instructions above tell you which ones to answer and submit when (you can always submit the second set early if you like, and resubmit repeatedly until the final due date...serious grading doesn't begin until then). You must supply your answers by inserting them in the spaces provided in the downloaded HTML files, and when you are done, you must submit your answers using the usual Tufts CS department "provide" command. See instructions below.
For full credit, your file should validate as HTML5 using the official validator for uploaded files. It may not be possible in all cases for the graders to check the validity of every submission, but we reserve the right to do so when we suspect trouble, and to deduct credit for validation failures. You can ignore the warning about "Using experimental feature: HTML5 Conformance Checker"; that's just because the HTML5 validator is still experimental at W3C (because the specification for HTML5 isn't final).
Review questions for this assignment (first set) - Download questions (first set) for this assignment
Review questions for this assignment (second set) - Download questions (second set) for this assignment
Submitting your answers
Download the HTML files with the questions using the links above. Fill in your answers, use your local browser to check formatting, and use the HTML validator to make sure your HTML is correct. You may ignore warnings about character encodings. Then use provide to submit (first time):
provide comp117 naming namingquestions.html
Later submissions:
provide comp117 naming namingquestions.html namingquestions2.html
Note that comp117 is lowercase; provide will choke if you get that wrong. Again, it is OK to include both sets on the first submission if you happen to have them done. You must include both sets on the final submission, even if the earlier one has not changed. Detailed grading will be done only on your final submission.