Tufts CS 117 (Fall 2023):
Internet-scale Distributed Systems
We ask you to be very careful in class, and please do not come to class or meet in groups if you even suspect you have been exposed to COVID. Your instructor is not young, and members of his family are at unusually high risk for complications if they get COVID. For both those reasons we would be particularly grateful if you would wear a mask while in class. If you come to office hours in Noah's office then you must wear a mask, though if you have a problem with that we can try to find a more open and better ventilated place to meet.
If you are sick or suspect you might have been exposed to COVID then do not come to class, but email firstname.lastname@example.org who will arrange for you to get a link to a recording of the lecture(s) you've missed. Note that lectures are not live streamed: recordings become available the evening after each lecture, once echo360 has processed them.
If you are healthy and do not have some other particular good reason for being absent (e.g. job interviews, weddings, etc.) then you are expected to attend class in person. This is a discussion class, and it doesn't work if everyone stays home to watch recordings of a mostly empty classroom.
There are no books that you are required to buy, but you must get access to a copy of Tim Berners-Lee's book Weaving the Web, which tells the story of the invention of the Web. Note that we have loaner copies, and if you are flexible on when you do the reading, we should be able to lend you one for a couple of weeks. Any edition, paperback or hardcover, is fine; the page numbering appears to be the same in all of them. Used copies are often available inexpensively from online booksellers, sometimes just for the cost of shipping.
No other books are required, but there are several you might want to consider. Michael Kerrisk's book The Linux Programming Interface: A Linux and UNIX System Programming Handbook is a great reference on Unix/Linux programming, and its introductions to topics like TCP/IP and Sockets are among the best I've seen. A few chapters will be assigned as reading. It is available on the online O'Reilly Books system, and you can read it there for free if you prefer.
We'll be doing some advanced C++ Programming. You should be able to learn what you need from online sources, but some students may prefer a good paper reference. The most detailed, if not necessarily the easiest to navigate, is The C++ Programming Language: Special Edition by Bjarne Stroustrup, who is the inventor of C++. It's available for free to Tufts students on O'Reilly Books too.
Similarly, though it will not be assigned reading, the book Unix and Linux System Administration Handbook, 4th Edition by Evi Nemeth, Garth Snyder, Trent Hein, and Ben Whaley is a great source of information about Unix/Linux command level programming. Its stated goal is to teach you to administer a Unix system, but it's got tons of useful information for anybody trying to do serious work with Unix or Linux. To contrast their strengths: Kerrisk will tell you how to use program APIs in your C/C++ code to do things like create files, set permissions, etc.; Nemeth et al. are more likely to tell you how to do the same things from the command prompt.
To re-emphasize: you should not have to buy any of these books to do well in the course. Then again, if you're like me and like to get good books that teach you things you didn't even know you should learn about, all of the above are great options. You'll probably be referring to several using O'Reilly Books; Kerrisk and Weaving the Web are the only ones from which I expect to assign reading.
O'Reilly books can be accessed from any modern Web browser and access is free for Tufts students. O'Reilly has hundreds of excellent books on computer science, including several to which we will refer in CS 117.
To get a free account go to the Access Tufts O'Reilly Books Online Learning Page. Once you've done that, you will have an opportunity to enter a password to create an account, if you don't already have one. Use your tufts.edu email, not your CS dept email.
Once that's done, you can use the links provided below or on our assignments to get to O'Reilly material. If you can only get to short selections from the book, you are not signed in properly. Please report any problems with O'Reilly access on Piazza.
Please sign out from O'Reilly when you are done; most years there is a limit on the number of simultaneous users.
O'Reilly provides URIs for each book (see below), or you can just search for the author's name in the O'Reilly search box. Be sure that either: 1) you are on the campus network -or- 2) you have used the VPN to tunnel into the campus network -or- 3) you have logged onto O'Reilly Books using the Tufts Access Page before attempting to follow these links.
Nemeth et al.: https://learning.oreilly.com/library/view/unix-and-linux/9780134278308/
The W3C Technical Architecture Group is the senior technical steering committee for the Web. One of its responsibilities is to educate the community on principles of Web Architecture, and also to explore or resolve important architectural problems.
The TAG has written a W3C "Recommendation" titled Architecture of the World Wide Web, Volume One (AWWW — by the way, there is so far no Volume Two). AWWW is probably the best available exposition of the Web's architecture and of its correct use. Sections will be assigned from time to time throughout the term, but you are encouraged to go beyond the assignments too.
AWWW is a formal W3C Recommendation, which means that it was subject to extensive community review before being finalized, and specifically that the entire membership of the W3C agreed to its publication. The TAG also writes smaller "Findings"; these represent the considered opinion of the TAG on specific issues, and in some cases the findings provide either more detail or even corrections to sections presented in AWWW. A list of TAG findings is available, and we will study several findings later in the term.
Several academic research papers will be assigned during the term. All will be provided online, and linked from the pertinent assignment. There is typically a charge for publications of the Association for Computing Machinery (ACM), but Tufts has a paid-up license for unlimited student access. To access the ACM Digital Library from inside tufts.edu, go to http://www.acm.org/dl; from outside, use https://login.ezproxy.library.tufts.edu/login?auth=test&url=http://www.acm.org/dl/. Unfortunately (and ironically given the principles we study in CS 117), links to ACM publications that you find using search engines like Google may not work directly with the ezproxy login; you may have to log in through the proxy, then use the ACM digital library search facilities to find the same paper. Once you do, access (typically to a .pdf) should be free. In any case, the ACM digital library is probably the most important resource for scholarly publications in computing. The free access that you have is a terrific asset!
As we will discuss in detail, Internet-scale systems like the Web typically interoperate not by requiring identical code at all nodes, but by requiring agreement on data formats and protocols among multiple implementations that are built to meet different needs. For example: the default Web server in Windows (IIS) is a different code base than the Apache server that is preferred on many other systems, and both of those are different from the embedded servers found in some small devices. Nonetheless, all servers conform to (more or less) the same HTTP and other standards, and so all should work well with conforming Web clients.
Usually, these standards are formalized under the auspices of non-profit organizations. The two most important such organizations for the Web and Internet are the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF). The W3C produces what they call Recommendations; the IETF produces so-called Requests for Comments (RFCs) which, unlike what you'd guess from the designation, often are the normative documentation for technologies like TCP/IP and HTTP. We will study a number of RFCs and Recommendations; all are available on the Web.
A list of assignments is maintained on the assignments page. In most cases, your submissions will be made using provide, with the class code comp117 (note the lowercase). So, a typical submission will be made like this:
provide comp117 <assignmentname> file [more files...]
Our grading framework depends on having submissions made using provide; if you are preparing written work on your personal machine, you must copy it to one of the Halligan servers and use provide to make your submission.
Most of our distributed programming assignments depend on code libraries and networking infrastructure that are available only on our Halligan servers, and testing them requires use of multiple virtual machines at the same time. While it's probably possible in principle to copy the necessary support frameworks to your own machines, in practice you will want to do your debugging using the Halligan servers, logging in remotely if necessary. See the section below, Access from Off Campus, for hints on remote access.
In many cases, you will be asked to respond to questions provided in an HTML form. Two links will be provided, one to allow you to browse the questions in advance, and one to download the HTML for you to edit. You will modify the HTML to include your answers, and use provide to submit it. Please do not rename the HTML file before submitting it.
As explained in the introductory lecture on grading policy, all penalties for late homework are at the discretion of the instructor. There is no fixed penalty for each day that an assignment is late. You may lose some credit or you may not, depending on how late your work is, whether you have an acceptable reason for being late, whether answers have been discussed in class prior to your submission, etc. At the end of the year, your grade may be reduced if you have a pattern of submitting work late frequently.
If you have questions about submitting late work, or want to get an excuse approved in advance, then e-mail the instructor. In other cases e-mail just wastes your time and ours. In such cases skip the e-mail, follow the procedure below and trust us to do something reasonable.
Except in unusual situations, you should not email the professor or TA to ask for permission to submit homework late. There is one very important exception, which is if you have made a submission you don't want us to grade at the deadline. To summarize:
explain.txt file as described below.
If your homework is late, then along with the other files you provide, you must provide an additional file named explain.txt in which you give the reason for the late submission. This will make it easy for us to find the explanation at the time grading is being done. You must do this whether or not you have approval in advance by e-mail. Indeed, if you have e-mailed the instructor, then you should include the text of your mail and any response in the explain.txt file.
If you merely e-mail the instructor, then it's a lot of work for us to find the e-mail that matches your submission. By providing the explanation with your submission, you make it easy to find, and also easy to keep with your work if there are questions at the end of the year.
In general, the only excuses or explanations that will be considered when grading will be those included in your explain.txt.
A few more notes on late homework and explain.txt:
Do not provide an explain.txt if your work is on time, unless there's something you need to tell us about grading it.
Again: the course policy is that penalties for late work are at the instructor's discretion. If your overall track record is good, you probably will do fine even if you occasionally slip up without a great excuse. Obviously, grades will suffer for those whose work is late more than occasionally without a good reason, and there may be cases where grades will be reduced if answers are explained in class before your (unexcused) late submission is received.
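As a rough sketch of the procedure above, a small shell function could assemble the provide command and refuse to build a late submission until explain.txt exists. The function name submit_cmd, the yes/no "late" argument, and the example names hw1 and answers.html are hypothetical and not part of the course tooling; the function only prints the command so you can inspect it before running it yourself.

```shell
# Hypothetical helper: prints (does not run) the provide command for a submission.
# Usage: submit_cmd <assignmentname> <late: yes|no> file [more files...]
submit_cmd() {
    if [ "$#" -lt 3 ]; then
        echo "usage: submit_cmd <assignmentname> <yes|no> file [more files...]" >&2
        return 2
    fi
    assignment=$1
    late=$2
    shift 2                       # "$@" now holds only the files to submit
    if [ "$late" = yes ]; then
        # Late work must come with a non-empty explain.txt in the current directory.
        if [ ! -s explain.txt ]; then
            echo "late submission: write your reason in explain.txt first" >&2
            return 1
        fi
        set -- "$@" explain.txt   # append explain.txt to the file list
    fi
    echo provide comp117 "$assignment" "$@"
}
```

For example, with an explain.txt in the current directory, submit_cmd hw1 yes answers.html prints: provide comp117 hw1 answers.html explain.txt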
You will typically need to compile and test your distributed programs on the Halligan Linux servers. The easiest way to access these is by using one of the lab machines in Halligan, or by connecting your own computer to the Halligan network.
Although we do not officially support it in CS 117 (i.e., if you have trouble we don't have TA resources to investigate), it's often possible to use the VPN software provided by the CS department to access the Halligan network from home or work using your personal machine. This is done by installing on your machine a trusted VPN program that routes your network traffic to the Halligan network and that resolves host names using the Halligan DNS.
Use your Web browser to go to the CS Dept Introduction to the CEAS VPN. There are instructions there for downloading the VPN client and installing it on your own computer. You will need your CS Dept login and password (not your Tufts password!) to do the download, and again to authenticate your connection to the Halligan network.
The programs and support frameworks we use for programming assignments will sometimes need to locate configuration files and other information. Makefiles will need to find shared library code. To make all this easier, most of these programs look in the Linux environment variable COMP117 to find the area in the filesystem where all this is stored. Therefore, for most of these programs and build scripts to work, it is essential that the environment variable be set.
Since you'll want the variable set all the time, the best way to do this is to add the necessary code to your .cshrc file (if you use the standard tcsh shell), which runs each time a shell starts. Use an editor to add the following line to your ~/.cshrc (which lives in your home directory):
setenv COMP117 /comp/117
The command must be specified exactly that way. If you put it in the .cshrc, then it usually won't take effect until you log off and log on again. Log off then log on (or if you know how to use "source .cshrc" that's OK too), and use this command to check if it worked:
echo $COMP117
This should respond with:
/comp/117
If it does, you're all set. If not, check everything, and if you need help, ask our TA.
If your login is set up to use bash as your shell, then the details are a little different. Instead of ~/.cshrc you edit ~/.bashrc and add the line:
export COMP117=/comp/117
Typically, regardless of which shell you use, the changes won't take effect until you log off and log in again. To avoid doing that when you first update the script you can also do:
source ~/.cshrc (or source ~/.bashrc if using bash)
Doing that will rerun your startup script and should set the environment variable.
If our COMP117 sample programs won't build or won't run, failing to set the environment variable is a likely cause.
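When a build fails, a quick check along these lines can tell you whether the variable is the problem. This is a minimal sketch in POSIX shell syntax (it works in bash, not in tcsh); the function name check_comp117 and its return codes are made up for illustration and are not part of the course frameworks.

```shell
# Hypothetical helper: reports whether COMP117 is set and names a real directory.
# Returns 0 if it looks usable, 1 if unset or empty, 2 if it is not a directory.
check_comp117() {
    if [ -z "${COMP117:-}" ]; then
        echo "COMP117 is not set" >&2
        return 1
    fi
    if [ ! -d "$COMP117" ]; then
        echo "COMP117 is set to '$COMP117', which is not a directory" >&2
        return 2
    fi
    echo "COMP117 is set to $COMP117"
}
```

You could paste the function into your ~/.bashrc and run check_comp117 before building; on the course servers it should report /comp/117.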
You will be writing programs that communicate using TCP/IP. Running such programs on a Linux system can interfere with normal operation of the system, especially if such programs are untested and may behave erroneously.
Therefore, you are not to run your distributed programming class projects on homework.cs.tufts.edu, or on any other ordinary system connected to the campus network!
Instead we have two virtual servers that you will use, comp117-01 and comp117-02.
These are not visible from the public Internet, or necessarily from all parts of campus.
You can ssh into them from machines on the Halligan network.
When you do, you use your usual login id and password, and you will share your usual home directories.
Also: be sure that the COMP117 variable is set when you log onto the virtual servers. To check, use the echo command as described above.
Note that the full hostnames are comp117-01.eecs.tufts.edu and comp117-02.eecs.tufts.edu, so if the short forms don't work, try those.
The first time you ssh you might see a warning like (details will likely be different):
The authenticity of host 'comp117-02.eecs.tufts.edu (10.4.2.2)' can't be established.
ECDSA key fingerprint is SHA256:EgqOaXgBQ+svbW6RvmnDvj9RU5k+SvpImSgzRRm9TuY.
ECDSA key fingerprint is MD5:14:49:b7:1a:59:84:05:12:a8:e3:81:da:de:ad:e2:14.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'comp117-02.eecs.tufts.edu' (ECDSA) to the list of known hosts.
Warning: the ECDSA host key for 'comp117-02.eecs.tufts.edu' differs from the key for the IP address '10.4.2.2'
Offending key for IP in /h/noah/.ssh/known_hosts:27
Are you sure you want to continue connecting (yes/no)? yes
As shown above, you can typically answer "yes" to the prompts to continue connecting.
If you don't use XWindows, then an alias like this may be helpful. See the man page for ssh:
alias 117-01 ssh email@example.com
You can define that in your ~/.cshrc file, which gets processed at login (log off and log in to make sure it's reprocessed, or do source ~/.cshrc to kick it without logging off). If you've done this right, the command 117-01 should take you to COMP117-01.
You'll have to find a way to run multiple command windows on your desktop, so you can switch among homework.cs.tufts.edu and the virtual servers.
X Windows is widely used with Unix systems to provide a graphical environment for running applications. You do not need to use X Windows; if you prefer, just open two extra shells as described above and use your local windowing environment to switch and arrange them. This is the easiest way, and the only one for which we will provide formal TA support.
X Windows is very useful, but tricky. A basic tutorial on XWindows is beyond the scope of this info page; if you're using X, we'll assume here that you know how to get as far as running an xterm window remotely from homework.cs.tufts.edu to your local machine. To do that, you'll need to have a local X Server installed and running (instructions are different for Windows native, Cygwin under Windows, and for Macs; Macs come with X built in, but details are different depending on how old your OSX is). FWIW, Noah uses Cygwin under Windows. If you aren't already comfortable with Cygwin you'll have some learning to do to figure out how to install it, and especially how to set up XWindows. That may be more trouble than it's worth unless you can find help and, unfortunately, we do not have the resources to help you with this, but there's a lot on the Web. Be sure to search for help on installing XWindows under Cygwin, and how to use startxwin to get it going when you log in. If that's too hard, just use the character mode options described in the sections above.
On Cygwin's X, and perhaps on others, you'll have to be sure your DISPLAY environment variable is set (in .bashrc it's export DISPLAY=":0.0" and in .cshrc it's setenv DISPLAY :0.0). So, you'll have to figure out, perhaps with help from friends, how to get as far as setting all that up so you can get xterm windows open on the homework.cs.tufts.edu servers.
You'll eventually want to open three xterms when you work. One will be used in the obvious way to run compiles and other commands on the main homework.cs.tufts.edu servers. Open the first one in the obvious way, and then use the following technique to create the other two, which will be ssh sessions making one more hop into COMP117-01 and COMP117-02 respectively. You may find ways that are more convenient for you, but as an example you could make an alias in your .cshrc file like this:
alias ids01 ssh -fY yourusername@comp117-01 xterm -T "COMP117-01"
When you log in to homework.cs.tufts.edu this will define a command ids01 that will take you to the comp117-01 virtual machine, log you in, and then open a shell in an XWindow back on your client.
You can do the same for "ids02". If you open a window on each, then you can do things like testing a client program on one, and a server program on the other, all controlled from your one display.
After doing this, and perhaps logging in again to make sure your alias got defined at startup, issue the command ids01. If all goes well, a new window should pop up, with COMP117-01 in the title bar. Type the command to make sure you're on the server you think you are.
Since the filesystem is shared, you can do your compiles and edits on homework.cs.tufts.edu, or on the lab machines in Halligan.
(Thanks to Tyler Heck for providing these)
Ensure that either the XQuartz or the X11 application is installed on your Mac. If neither is, you can install XQuartz through an Apple-supported package at: xquartz.macosforge.org.
To connect to a remote server via xterm, open a Terminal window on the Mac and enter:
$ ssh -fY [username]@[servername] xterm [-T [titlename]]
This will generate a new remote session for the user at the server in a new X window, with the ability to interact with remote GUI programs.
A breakdown of the connection command follows:
ssh -fY: Starts the ssh command with flags f, which tells ssh to go to the background just before it runs the remote command (here, xterm), and Y, which enables X windows forwarding in trusted mode. Hopefully you trust the Tufts servers.
[username]@[servername]: Your specific username and servername will vary depending on who you are and the server you are connecting to.
xterm [-T [titlename]]: Creates a new xterm window for your remote session. The -T [titlename] portion is optional; it gives your xterm window the title specified by [titlename], which may make distinguishing multiple xterm windows easier.
An example of a full command for the user jdoe01, attempting to connect to the sunfire servers (homework.cs.tufts.edu) with the title "SUNFIRE SERVER":
ssh -fY firstname.lastname@example.org xterm -T "SUNFIRE SERVER"
Editing and compiling of CS 117 code can be done on the Sunfire servers, but testing must be done on the virtual servers comp117-01 and comp117-02. You will likely want windows open on all three.
First create a window on Sunfire. An example connection from a remote location may look like this (on John Doe's local machine):
jdoe@jdoesbox $ ssh -fY email@example.com xterm -T "SUNFIRE SERVER"
(A new xterm window appears, prompting for login credentials)
Then, do the following twice to get windows on the virtual servers:
Within new window:
jdoe01@sunfire32 $ ssh -fY jdoe01@comp117-01 xterm -T "COMP117-01"
jdoe01@sunfire32 $ ssh -fY jdoe01@comp117-02 xterm -T "COMP117-02"
Each command creates an additional window, one for each of the virtual servers. You will likely have to enter your password twice. Adding an alias for the login commands to your local machine and the sunfire servers can be helpful in creating shorthand versions of the above commands. Make sure to do both of the above commands from Sunfire. If you do the ssh from one of the virtual servers to the other it will likely work, but your Window traffic will be making hops through many machines on its way to and from your Mac. It will be slow, and will add load to the Halligan network (X is a high overhead protocol).
TCP and UDP servers typically listen for connections or incoming data on what are called "ports", each of which is identified by an integer. Unfortunately, allocation of these ports is a problem. Some ports are reserved, e.g. port 80 is the standard for Web servers. If you managed to start a server listening on port 80 on homework.cs.tufts.edu, yours would be the Web server that responded to requests for pages at http://homework.cs.tufts.edu, something we don't want our student software to be doing. In fact, that port is protected, and an attempt to listen on it would fail. What's a bigger problem for us is that all of you will be trying your servers at the same time, so each of you needs to use a different port number. One of the reasons we're making sure on day 1 that we have your CS Dept login right is so we can associate a port number with it. More details on this will be provided with your programming assignments, but be sure you are following the rules we publicize to have your code listening and talking to the right port number. If you have any doubt, check before running your code!
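To illustrate the point about protected ports: on Linux, ports below 1024 are by default reserved for privileged processes, and the largest valid port number is 65535. The sketch below is a simple sanity check a script might run before starting a server. The function name is_usable_port is hypothetical, and passing this check does not mean a port is yours to use; the port assigned to you for the course is still the one that matters.

```shell
# Hypothetical helper: succeeds only if $1 is a whole number in 1024-65535,
# i.e. a syntactically valid port outside the privileged range.
is_usable_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    [ "${#1}" -le 5 ] || return 1  # more than 5 digits cannot be a valid port
    [ "$1" -ge 1024 ] && [ "$1" -le 65535 ]
}
```

For example, is_usable_port 80 fails because port 80 is in the privileged range, while a number such as 9117 (an arbitrary example, not an assigned course port) passes.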