Tufts CS 117 (Fall 2024):
Internet-scale Distributed Systems

More Course Information & Resources

COVID and Health

We ask you to be very careful in class, and please do not come to class or meet in groups if you even suspect you have been exposed to COVID. Your instructor is not young, and members of his family are at unusually high risk for complications if they get COVID. For both those reasons, and although Tufts no longer requires it, we would be particularly grateful if you would wear a mask while in class, especially if you are coming up to talk to Noah one-on-one. Similarly, if you come to office hours in Noah's office then you must wear a mask, though if you have a problem with that we can try to find a more open and better ventilated place to meet.

If you are sick or suspect you might have been exposed to COVID then do not come to class, but email noah@cs.tufts.edu who will arrange for you to get a link to a recording of the lecture(s) you've missed. Note that lectures are not live streamed: recordings become available the evening after each lecture, once echo360 has processed them.

If you are healthy and do not have some other particular good reason for being absent (e.g. job interviews, weddings, etc.) then you are expected to attend class in person. This is a discussion class, and it doesn't work if everyone stays home to watch recordings of a mostly empty classroom.

Late Homework

Late Homework Overview

As explained in the introductory lecture on grading policy, all penalties for late homework are at the discretion of the instructor. There is no fixed penalty for each day that an assignment is late. Indeed, typically, your numeric grade is not affected at all, but some separate "lateness points" may or may not be noted, depending on how late your work is, whether you have an acceptable reason for being late, whether answers have been discussed in class prior to your submission, etc. At the end of the year, your grade may be reduced if you have a pattern of submitting work late frequently.

Do Not Email the Instructor for Late Homework Approval (Usually)

Except in unusual situations, you should not email the professor or TA to ask for permission to submit homework late. There is one very important exception, which is if you have made a submission you don't want us to grade at the deadline. To summarize:

Submission procedure for late homework

If your homework is late, then along with the other files you provide, you must provide an additional one called explain.txt in which you give the reason for the late submission. This will make it easy for us to find the explanation at the time grading is being done. You must do this whether or not you have approval in advance by e-mail. Indeed, if you have e-mailed the instructor, then you should include the text of your mail and any response in the explain.txt file. If you merely e-mail the instructor, then it's a lot of work for us to find the e-mail that matches your submission. By providing the explanation with your submission, you make it easy to find, and also easy to keep with your work if there are questions at the end of the year. In general, the only excuses or explanations that will be considered when grading will be those included in your explain.txt file.

A few more notes on late homework and explain.txt.

Again: the course policy is that penalties for late work are at the instructor's discretion. If your overall track record is good, you probably will do fine even if you occasionally slip up without a great excuse. Obviously, grades will suffer for those whose work is late more than occasionally without a good reason, and there may be cases where grades will be reduced if answers are explained in class before your (unexcused) late submission is received.

Rationale for the Late Homework Policy

There are several reasons for doing things this way:

If you have questions or concerns about late homework, ask in Piazza so that everyone will see your question and the answer (unless your question involves matters that are private, such as your health).


Recommended Books

There are no books that you are required to buy, but you must get access to a copy of Tim Berners-Lee's book Weaving the Web, which tells the story of the invention of the Web. Note that we have loaner copies, and if you are flexible on when you do the reading, we should be able to lend you one for a couple of weeks. Any edition, paperback or hardcover, is fine; the page numbering appears to be the same in all of them. Used copies are often available inexpensively from online booksellers, sometimes just for the cost of shipping.

No other books are required, but there are several you might want to consider. Michael Kerrisk's book The Linux Programming Interface: A Linux and UNIX System Programming Handbook is a great reference on Unix/Linux programming, and its introductions to topics like TCP/IP and Sockets are among the best I've seen. A few chapters will be assigned as reading. It is available on the online O'Reilly Books system, and you can read it there for free if you prefer.

We'll be doing some advanced C++ Programming. You should be able to learn what you need from online sources, but some students may prefer a good paper reference. The most detailed, if not necessarily the easiest to navigate, is The C++ Programming Language: Special Edition by Bjarne Stroustrup, who is the inventor of C++. It's available for free to Tufts students on O'Reilly Books too.

Similarly, though it will not be assigned reading, the book Unix and Linux System Administration Handbook, 4th Edition by Evi Nemeth, Garth Snyder, Trent Hein, and Ben Whaley is a great source of information about Unix/Linux command level programming. Its stated goal is to teach you to administer a Unix system, but it's got tons of useful information for anybody trying to do serious work with Unix or Linux. To contrast their strengths: Kerrisk will tell you how to use program APIs in your C/C++ code to do things like create files, set permissions, etc.; Nemeth et al. are more likely to tell you how to do the same things from the command prompt.

To re-emphasize: you should not have to buy any of these books to do well in the course. Then again, if you're like me and like to get good books that teach you things you didn't even know you should learn about, all of the above are great options. You'll probably be referring to several using O'Reilly Books; Kerrisk and Weaving the Web are the only ones from which I expect to assign reading.

Using O'Reilly Books Online

O'Reilly books can be accessed from any modern Web browser and access is free for Tufts students. O'Reilly has hundreds of excellent books on computer science, including several to which we will refer in CS 117.

Getting an O'Reilly Account

To get a free account go to the Access Tufts O'Reilly Books Online Learning Page. Once you've done that, you will have an opportunity to enter a password to create an account, if you don't already have one. Use your tufts.edu email, not your CS dept email.

Once that's done, you can use the links provided below or on our assignments to get to O'Reilly material. If you can only get to short selections from the book, you are not signed in properly. Please report any problems with O'Reilly access on Piazza.

Please sign out from O'Reilly when you are done; most years there is a limit on the number of simultaneous users.

O'Reilly links to CS 117 Books

O'Reilly provides URIs for each book (see below), or you can just search for the author's name in the O'Reilly search box. Be sure that either: 1) you are on the campus network -or- 2) you have used the VPN to tunnel into the campus network -or- 3) you have logged onto O'Reilly Books using the Tufts Access Page before attempting to follow these links.

Nemeth et al.: https://learning.oreilly.com/library/view/unix-and-linux/9780134278308/

TAG Recommendations and Findings

The W3C Technical Architecture Group is the senior technical steering committee for the Web. One of its responsibilities is to educate the community on principles of Web Architecture, and also to explore or resolve important architectural problems.

The TAG has written a W3C "Recommendation" titled Architecture of the World Wide Web, Volume One (AWWW — by the way, there is so far no Volume Two). AWWW is probably the best available exposition of the Web's architecture and of its correct use. Sections will be assigned from time to time throughout the term, but you are encouraged to go beyond the assignments too.

AWWW is a formal W3C Recommendation, which means that it was subject to extensive community review before being finalized, and specifically that the entire membership of the W3C agreed to its publication. The TAG also writes smaller "Findings"; these represent the considered opinion of the TAG on specific issues, and in some cases the findings provide either more detail or even corrections to sections presented in AWWW. A list of TAG findings is available, and we will study several findings later in the term.

Research Papers

Several academic research papers will be assigned during the term. All will be provided online, and linked from the pertinent assignment. There is typically a charge for publications of the Association for Computing Machinery (ACM), but Tufts has a paid-up license for unlimited student access. To access the ACM Digital Library from inside tufts.edu, go to http://www.acm.org/dl; from outside, use https://login.ezproxy.library.tufts.edu/login?auth=test&url=http://www.acm.org/dl/. Unfortunately (and ironically given the principles we study in CS 117), links to ACM publications that you find using search engines like Google may not work directly with the ezproxy login; you may have to log in through the proxy, then use the ACM digital library search facilities to find the same paper. Once you do, access (typically to a .pdf) should be free. In any case, the ACM digital library is probably the most important resource for scholarly publications in computing. The free access that you have is a terrific asset!

Standards documents and other Online Resources

As we will discuss in detail, Internet-scale systems like the Web typically interoperate not by requiring identical code at all nodes, but by requiring agreement on data formats and protocols among multiple implementations that are built to meet different needs. For example: the default Web server in Windows (IIS) is a different code base than the Apache server that is preferred on many other systems, and both of those are different from the embedded servers found in some small devices. Nonetheless, all servers conform to (more or less) the same HTTP and other standards, and so all should work well with conforming Web clients.

Usually, these standards are formalized under the auspices of non-profit organizations. The two most important such organizations for the Web and Internet are the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF). The W3C produces what they call Recommendations; the IETF produces so-called Requests for Comments (RFCs) which, unlike what you'd guess from the designation, often are the normative documentation for technologies like TCP/IP and HTTP. We will study a number of RFCs and Recommendations; all are available on the Web.


A list of assignments is maintained on the assignments page. In most cases, your submissions will be made using provide, with the class code comp117 (note the lowercase). So, a typical submission will be made like this:

provide comp117 <assignmentname> file [more files...]

Our grading framework depends on having submissions made using provide; if you are preparing written work on your personal machine, you must copy it to one of the Halligan servers and use provide to make your submission.
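Because a late submission must also include explain.txt (see Late Homework above), it can help to check for that file before you submit. The following is a minimal bash sketch, not part of the course tooling: the function name late_provide_cmd and the assignment name hw1 are hypothetical, and the function only prints the provide command rather than running it.

```shell
# Hypothetical helper: print (dry run) the provide command for a late
# submission. Fails if explain.txt is not among the listed files.
late_provide_cmd() {
    local assignment="${1:-}"
    if [ -z "$assignment" ]; then
        echo "usage: late_provide_cmd <assignmentname> file [more files...]" >&2
        return 2
    fi
    shift

    # Look for explain.txt in the list of files to be submitted.
    local f has_explain=no
    for f in "$@"; do
        if [ "$f" = "explain.txt" ]; then
            has_explain=yes
        fi
    done

    if [ "$has_explain" = no ]; then
        echo "late submission needs explain.txt" >&2
        return 1
    fi

    # Print the command instead of running it, so you can review it first.
    echo "provide comp117 $assignment $*"
}
```

For example, late_provide_cmd hw1 answers.html explain.txt prints the provide command to run, while leaving out explain.txt produces an error message instead.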

Most of our distributed programming assignments depend on code libraries and networking infrastructure that are available only on our Halligan servers, and testing them requires use of multiple virtual machines at the same time. While it's probably possible in principle to copy the necessary support frameworks to your own machines, in practice you will want to do your debugging using the Halligan servers, logging in remotely if necessary. See the section Access From Off Campus below for hints on remote access.

In many cases, you will be asked to respond to questions provided in an HTML form. Two links will be provided, one to allow you to browse the questions in advance, and one to download the HTML for you to edit. You will modify the HTML to include your answers, and use provide to submit it. Please do not rename the HTML file before submitting it.

Access From Off Campus

You will typically need to compile and test your distributed programs on the Halligan Linux servers. The easiest way to access these is by using one of the lab machines in Halligan, or by connecting your own computer to the Halligan network.

Although we do not officially support it in CS 117 (i.e., if you have trouble, we don't have TA resources to investigate), it's often possible to use the VPN software provided by the CS department to access the Halligan network from home or work using your personal machine. This is done by installing on your machine a trusted VPN program that routes your network traffic to the Halligan network and that resolves host names using the Halligan DNS.

Getting the VPN software

Use your Web browser to go to the CS Dept Introduction to the CEAS VPN. There are instructions there for downloading the VPN client and installing it on your own computer. You will need your CS Dept login and password (not your Tufts password!) to do the download, and again to authenticate your connection to the Halligan network.

What you can do with the VPN

Programming environment

Environment Variable

The programs and support frameworks we use for programming assignments will sometimes need to locate configuration files and other information. Makefiles will need to find shared library code. To make all this easier, most of these programs look in the Linux environment variable COMP117 to find the area in the filesystem where all this is stored. Therefore, for most of these programs and build scripts to work, it is essential that the environment variable be set.

Since you'll want the variable set all the time, the best way to do this is to add the necessary code to your .cshrc file, which is processed at login (if you use the standard tcsh shell). Use an editor to add the following line to your ~/.cshrc (which lives in your home directory):

setenv COMP117 /comp/117

The command must be specified exactly that way. If you put it in the .cshrc, then it usually won't take effect until you log off and log on again. Log off then log on (or if you know how to use "source .cshrc" that's OK too), and use this command to check if it worked:

echo $COMP117
This should respond with: /comp/117

If it does, you're all set. If not, check everything, and if you need help, ask our TA.

If your login is set up to use bash as your shell, then the details are a little different. Instead of ~/.cshrc you edit ~/.bashrc and add the line:

export COMP117=/comp/117

Typically, regardless of which shell you use the changes won't take effect until you log off and log in again. To avoid doing that when you first update the script you can also do:

source ~/.cshrc   <— (or ~/.bashrc if using bash)

Doing that will rerun your startup script and should set the environment variable.

If our COMP117 sample programs won't build or won't run, failing to set the environment variable is a likely cause.
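As a quick sanity check before building, bash users could wrap the echo test in a small function. This is only a sketch under the assumption that /comp/117 is the expected value, as given above; the function name check_comp117 is hypothetical and not part of the course framework.

```shell
# Hypothetical helper: verify that COMP117 is set to the expected directory.
check_comp117() {
    # ${COMP117:-} expands to an empty string when the variable is unset.
    if [ -z "${COMP117:-}" ]; then
        echo "COMP117 is not set; add 'export COMP117=/comp/117' to ~/.bashrc" >&2
        return 1
    fi
    if [ "$COMP117" != "/comp/117" ]; then
        echo "COMP117 is '$COMP117', expected /comp/117" >&2
        return 1
    fi
    echo "COMP117 OK: $COMP117"
}
```

Running check_comp117 on homework.cs.tufts.edu or on one of the virtual servers before make gives a clearer message than a failed build would.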

Virtual servers

You will be writing programs that communicate using TCP/IP. Running such programs on a Linux system can interfere with normal operation of the system, especially if such programs are untested and may behave erroneously. Therefore, you are not to run your distributed programming class projects on homework.cs.tufts.edu, or on any other ordinary system connected to the campus network! Instead we have two virtual servers that you will use, comp117-01 and comp117-02. These are not visible from the public Internet, or necessarily from all parts of campus. You can ssh into them from machines on the Halligan network. When you do, you use your usual login id and password, and you will share your usual home directories. Also: be sure that the COMP117 variable is set when you log onto the virtual servers. To check, use the echo command as described above. Note that the full hostnames are comp117-01.eecs.tufts.edu and comp117-02.eecs.tufts.edu so if the short forms don't work, try those.

The first time you ssh you might see a warning like (details will likely be different):

The authenticity of host 'comp117-02.eecs.tufts.edu (' can't be established.
ECDSA key fingerprint is SHA256:EgqOaXgBQ+svbW6RvmnDvj9RU5k+SvpImSgzRRm9TuY.
ECDSA key fingerprint is MD5:14:49:b7:1a:59:84:05:12:a8:e3:81:da:de:ad:e2:14.
Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'comp117-02.eecs.tufts.edu' (ECDSA) to the list of known hosts.
Warning: the ECDSA host key for 'comp117-02.eecs.tufts.edu' differs from the key for the IP address ''
Offending key for IP in /h/noah/.ssh/known_hosts:27
Are you sure you want to continue connecting (yes/no)? yes

As shown above, you can typically answer "yes" to the prompts to continue connecting.

If you don't use XWindows, then an alias like this may be helpful. See the man page for ssh:

alias 117-01 ssh yourusername@comp117-01.eecs.tufts.edu

You can define that in your ~/.cshrc file, which gets processed at login (log off and log in to make sure it's reprocessed, or do source ~/.cshrc to kick it without logging off). If you've done this right, the command 117-01 should take you to COMP117-01. You'll have to find a way to run multiple command windows on your desktop, so you can switch among homework.cs.tufts.edu and the virtual servers.
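The alias above uses tcsh syntax. For those whose login shell is bash, a pair of small functions in ~/.bashrc can play the same role. This is a sketch only: the names vm117_host and vm117 are hypothetical, and it assumes your login name on the virtual servers matches the one you are already logged in with, so ssh needs no explicit username.

```shell
# Hypothetical helper: print the full hostname of virtual server 01 or 02.
vm117_host() {
    case "${1:-}" in
        01|02) echo "comp117-$1.eecs.tufts.edu" ;;
        *) echo "usage: expected 01 or 02" >&2; return 2 ;;
    esac
}

# Hypothetical helper: ssh to the chosen virtual server.
# ssh defaults to your current login name when none is given.
vm117() {
    local host
    host="$(vm117_host "${1:-}")" || return
    ssh "$host"
}
```

With these defined, vm117 01 and vm117 02 open sessions on the two virtual servers.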

A few hints on logging in remotely to virtual servers using X Windows

X Windows is widely used with Unix systems to provide a graphical environment for running applications. You do not need to use X Windows; if you prefer, just open two extra shells as described above and use your local windowing environment to switch and arrange them. This is the easiest way, and the only one for which we will provide formal TA support.

X Windows is very useful, but tricky. A basic tutorial on X Windows is beyond the scope of this info page; if you're using X, we'll assume here that you know how to get as far as running an xterm window remotely from homework.cs.tufts.edu to your local machine. To do that, you'll need to have a local X Server installed and running (instructions are different for Windows native, Cygwin under Windows, and for Macs; Macs come with X built in, but details are different depending on how old your OSX is). FWIW, Noah uses Cygwin under Windows. If you aren't already comfortable with Cygwin, you'll have some learning to do to figure out how to install it, and especially how to set up X Windows. That may be more trouble than it's worth unless you can find help and, unfortunately, we do not have the resources to help you with this, but there's a lot on the Web. Be sure to search for help on installing X Windows under Cygwin, and how to use startxwin to get it going when you log in. If that's too hard, just use the character mode options described in the sections above. On Cygwin's X, and perhaps on others, you'll have to be sure your DISPLAY environment variable is set (in .bashrc it's export DISPLAY=":0.0" and in .cshrc it's setenv DISPLAY :0.0). So, you'll have to figure out, perhaps with help from friends, how to get as far as setting all that up so you can get xterm windows open on the homework.cs.tufts.edu servers.

You'll eventually want to open three xterms when you work. One will be used in the obvious way to run compiles and other commands on the main homework.cs.tufts.edu servers. Open the first one in the obvious way, and then use the following technique to create the other two, which will be ssh sessions making one more hop into COMP117-01 and COMP117-02 respectively. You may find ways that are more convenient for you, but as an example you could make an alias in your .cshrc file like this:

alias ids01 ssh -fY yourusername@comp117-01 xterm -T "COMP117-01"

When you log in to homework.cs.tufts.edu this will define a command ids01 that will take you to the comp117-01 virtual machine, log you in, and then open a shell in an XWindow back on your client. You can do the same for "ids02". If you open a window on each, then you can do things like testing a client program on one, and a server program on the other, all controlled from your one display. After doing this, and perhaps logging in again to make sure your alias got defined at startup, issue the command ids01. If all goes well, a new window should pop up, with COMP117-01 in the title bar. Type the hostname command to make sure you're on the server you think you are. Since the filesystem is shared, you can do your compiles and edits on homework.cs.tufts.edu, or on the lab machines in Halligan.

X-Window hints for Mac users

(Thanks to Tyler Heck for providing these)

Ensure that either the XQuartz or the X11 application is installed on your Mac. If neither is, you can install XQuartz through an Apple-supported package at: xquartz.macosforge.org.

To connect to a remote server via xterm, open a Terminal window on the Mac and enter:

$ ssh -fY [username]@[servername] xterm [-T [titlename]]

This will generate a new remote session for the user at the server in a new X window, with the ability to interact with remote GUI programs.

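A breakdown of the connection command follows:

-f puts ssh in the background just before it runs the remote command, so your Terminal window stays free.
-Y enables trusted X11 forwarding, so remote GUI programs can display on your Mac.
[username]@[servername] is your login name and the host to connect to.
xterm is the command run on the remote host; the optional -T [titlename] sets the title of the new window.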

An example of a full command for the user jdoe01, attempting to connect to the sunfire servers (homework.cs.tufts.edu) with the title "SUNFIRE SERVER":

ssh -fY jdoe01@homework.cs.tufts.edu xterm -T "SUNFIRE SERVER"

Editing and compiling of CS 117 code can be done on the usual Sunfire servers, but testing must be done on virtual servers COMP117-01 and COMP117-02. You will likely want windows open on all three. First create a window on Sunfire. An example connection from a remote location may look like this (on John Doe's local machine):

jdoe@jdoesbox $ ssh -fY jdoe01@homework.cs.tufts.edu xterm -T "SUNFIRE SERVER"
(A new xterm window appears, prompting for login credentials)

Then, do the following twice to get windows on the virtual servers:

Within new window:
jdoe01@sunfire32 $ ssh -fY jdoe01@comp117-01 xterm -T "COMP117-01"
jdoe01@sunfire32 $ ssh -fY jdoe01@comp117-02 xterm -T "COMP117-02"

Each command creates an additional window, one for each of the virtual servers. You will likely have to enter your password twice. Adding an alias for the login commands to your local machine and the sunfire servers can be helpful in creating shorthand versions of the above commands. Make sure to do both of the above commands from Sunfire. If you do the ssh from one of the virtual servers to the other it will likely work, but your Window traffic will be making hops through many machines on its way to and from your Mac. It will be slow, and will add load to the Halligan network (X is a high overhead protocol).

TCP/UDP port assignments

TCP and UDP servers typically listen for connections or incoming data on what are called "ports", each of which is identified by an integer. Unfortunately, allocation of these ports is a problem. Some ports are reserved, e.g. port 80 is the standard for Web servers. If you managed to start a server listening on port 80 on homework.cs.tufts.edu, yours would be the Web server that responded to requests for pages at http://homework.cs.tufts.edu, something we don't want our student software to be doing. In fact, that port is protected, and an attempt to listen on it would fail. What's a bigger problem for us is that all of you will be trying your servers at the same time, so each of you needs to use a different port number. One of the reasons we're making sure on day 1 that we have your CS Dept login right is so we can associate a port number with it. More details on this will be provided with your programming assignments, but be sure you are following the rules we publicize to have your code listening and talking to the right port number. If you have any doubt, check before running your code!
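The port you should actually use comes from the rules published with the assignments. The bash sketch below only checks that a candidate number is a well-formed, non-privileged port, assuming the usual Linux default in which ports below 1024 are protected and 65535 is the largest port number. The function name valid_user_port is hypothetical.

```shell
# Hypothetical helper: succeed only if the argument is a decimal integer
# in the non-privileged port range 1024..65535.
valid_user_port() {
    local port="${1:-}"

    # Reject empty strings and anything containing a non-digit.
    case "$port" in
        ''|*[!0-9]*) return 1 ;;
    esac

    # Ports below 1024 (such as 80) are protected; 65535 is the maximum.
    [ "$port" -ge 1024 ] && [ "$port" -le 65535 ]
}
```

A script that starts your server could call valid_user_port "$port" first and refuse to run if the check fails. Passing this check does not mean the port is the one assigned to you.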