Tufts COMP 117 (Spring 2019):
Internet-scale Distributed Systems

More Course Information & Resources

Reading

Recommended Books

There are no books that you are required to buy, but several you might want to consider. Michael Kerrisk's book The Linux Programming Interface: A Linux and UNIX System Programming Handbook is a great reference on Unix/Linux programming, and its introductions to topics like TCP/IP and Sockets are among the best I've seen. A few chapters will be assigned as reading. It is available on Safari books, and you can read it there for free if you prefer.

We'll be doing some advanced C++ programming. You should be able to learn what you need from online sources, but some students may prefer a good paper reference. The most detailed, if not necessarily the easiest to navigate, is The C++ Programming Language: Special Edition by Bjarne Stroustrup, the inventor of C++. It's also available for free to Tufts students on Safari.

Similarly, though it will not be assigned reading, the book Unix and Linux System Administration Handbook, 4th Edition by Evi Nemeth, Garth Snyder, Trent Hein, and Ben Whaley is a great source of information about Unix/Linux command-level programming. Its stated goal is to teach you to administer a Unix system, but it's got tons of useful information for anybody trying to do serious work with Unix or Linux. To contrast their strengths: Kerrisk will tell you how to use the system APIs in your C/C++ code to do things like create files, set permissions, etc.; Nemeth et al. are more likely to tell you how to do the same things from the command prompt.

To re-emphasize: you should not have to buy any of these books to do well in the course. Then again, if you're like me and like to get good books that teach you things you didn't even know you should learn about, all of the above are great options. You'll probably be referring to several using Safari; Kerrisk is the only one from which I expect to assign reading.

Using Safari books online

Safari books can be accessed from any modern Web browser with Flash support, and access is free for Tufts students. Safari has hundreds of excellent books on computer science, including several to which we will refer in COMP 117.

To get free access to Safari, whether from home or on campus, you can go to the Access Page that's been set up by the Tufts libraries. If you're off campus, you'll likely have to log in with your UTLN and Tufts Tools passwords; from on campus you will likely be taken either directly into Safari, or to a page where you are given a button you press to activate your free academic access.

Safari provides URIs for each book (see below), or you can just search for the author's name in the Safari search box. Before following these links, be sure that either: 1) you are on the campus network, or 2) you have used the VPN to tunnel into the campus network, or 3) you have logged onto Safari using the Tufts Access Page.

Kerrisk: http://proquest.safaribooksonline.com.ezproxy.library.tufts.edu/book/programming/linux/9781593272203
Stroustrup: http://proquest.safaribooksonline.com.ezproxy.library.tufts.edu/book/programming/cplusplus/0201700735
Nemeth et al.: http://proquest.safaribooksonline.com.ezproxy.library.tufts.edu/book/operating-systems-and-server-administration/linux/9780132117371

WARNING/HINT: In past years, some of us had intermittent trouble getting into Safari. Lately it seems OK. I think the tricks for reliable access are:

If you discover any more about beating the Safari bugs, or have more problems to report that are different from those already noted above, please e-mail me (Noah).

TAG Recommendations and Findings

The W3C Technical Architecture Group is the senior technical steering committee for the Web. One of its responsibilities is to educate the community on principles of Web Architecture, and also to explore or resolve important architectural problems.

The TAG has written a W3C "Recommendation" titled Architecture of the World Wide Web, Volume One (AWWW — by the way, there is so far no Volume Two). AWWW is probably the best available exposition of the Web's architecture and of its correct use. Sections will be assigned from time to time throughout the term, but you are encouraged to go beyond the assignments too.

AWWW is a formal W3C Recommendation, which means that it was subject to extensive community review before being finalized, and specifically that the entire membership of the W3C agreed to its publication. The TAG also writes smaller "Findings"; these represent the considered opinion of the TAG on specific issues, and in some cases the findings provide either more detail or even corrections to sections presented in AWWW. A list of TAG findings is available, and we will study several findings later in the term.

Research Papers

Several academic research papers will be assigned during the term. All will be provided online, and linked from the pertinent assignment. There is typically a charge for publications of the Association for Computing Machinery (ACM), but Tufts has a paid-up license for unlimited student access. To access the ACM Digital Library from inside tufts.edu, go to http://www.acm.org/dl; from outside, use https://login.ezproxy.library.tufts.edu/login?auth=test&url=http://www.acm.org/dl/. Unfortunately (and ironically, given the principles we study in COMP 117), links to ACM publications that you find using search engines like Google may not work directly with the ezproxy login; you may have to log in through the proxy, then use the ACM Digital Library search facilities to find the same paper. Once you do, access (typically to a .pdf) should be free. In any case, the ACM Digital Library is probably the most important resource for scholarly publications in computing. The free access that you have is a terrific asset!

Standards documents and other Online Resources

As we will discuss in detail, Internet-scale systems like the Web typically interoperate not by requiring identical code at all nodes, but by requiring agreement on data formats and protocols among multiple implementations that are built to meet different needs. For example: the default Web server in Windows (IIS) is a different code base than the Apache server that is preferred on many other systems, and both of those are different from the embedded servers found in some small devices. Nonetheless, all of these servers conform (more or less) to the same HTTP and other standards, and so all should work well with conforming Web clients.

Usually, these standards are formalized under the auspices of non-profit organizations. The two most important such organizations for the Web and Internet are the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF). The W3C produces what it calls Recommendations; the IETF produces so-called Requests for Comments (RFCs), which, despite what the name suggests, are often the normative specifications for technologies like TCP/IP and HTTP. We will study a number of RFCs and Recommendations; all are available on the Web.
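To make the interoperability point concrete, here is a small C++ sketch that formats a minimal HTTP/1.1 GET request the way the HTTP specification lays it out: a request line, header fields each terminated by CRLF, and an empty line ending the header section. The function name make_get_request is hypothetical, not part of any course framework. Because the byte format is fixed by the standard rather than by any one code base, the same request should be understood by IIS, Apache, or an embedded server alike.

```cpp
#include <string>

// Builds the text of a minimal HTTP/1.1 GET request for "path" on "host".
// The format comes from the HTTP/1.1 specification, so any conforming
// server should parse it regardless of which implementation it is.
std::string make_get_request(const std::string& host, const std::string& path) {
    // HTTP/1.1 requires CRLF line endings and a Host header field.
    std::string request = "GET " + path + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";
    request += "Connection: close\r\n";  // ask the server to close when done
    request += "\r\n";                    // empty line ends the header section
    return request;
}
```

For example, make_get_request("example.org", "/index.html") yields the four lines GET /index.html HTTP/1.1, Host: example.org, Connection: close, and an empty line, each ended by CRLF. Sending those bytes over a TCP connection to a Web server is, in outline, what a browser does when it fetches a page.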

Assignments

A list of assignments is maintained on the assignments page. In most cases, you will submit your work using the provide command with the class code comp117 (note the lowercase). So, a typical submission will be made like this:

provide comp117 <assignmentname> file [more files...]

In many cases, you will be asked to respond to questions provided in an HTML form. Two links will be provided: one to let you browse the questions in advance, and one to download the HTML for you to edit. You will modify the HTML to include your answers, and use provide to submit it. Please do not rename the HTML file before submitting it.

Late Homework

Late Homework Overview

As explained in the introductory lecture on grading policy, all penalties for late homework are at the discretion of the instructor. There is no fixed penalty for each day that an assignment is late. You may or may not lose some credit, depending on how late your work is, whether you have an acceptable reason for being late, whether answers have been discussed in class prior to your submission, etc. At the end of the term, your grade may be reduced if you have a pattern of frequently submitting work late.

If you have questions about submitting late work, or want to get an excuse approved in advance, then e-mail the instructor. Otherwise, e-mail just wastes your time and ours: skip it, follow the procedure below, and trust us to do something reasonable.

Submission procedure for late homework

If your homework is late, then along with the other files you provide, you must provide an additional one called explain.txt in which you give the reason for the late submission. This will make it easy for us to find the explanation at the time grading is being done. You must do this whether or not you have approval in advance by e-mail. Indeed, if you have e-mailed the instructor, then you should include the text of your mail and any response in the explain.txt file. If you merely e-mail the instructor, then it's a lot of work for us to find the e-mail that matches your submission. By providing the explanation with your submission, you make it easy to find, and also easy to keep with your work if there are questions at the end of the year. In general, the only excuses or explanations that will be considered when grading will be those included in your explain.txt file.

A few more notes on late homework and explain.txt.

Again: the course policy is that penalties for late work are at the instructor's discretion. If your overall track record is good, you probably will do fine even if you occasionally slip up without a great excuse. Obviously, grades will suffer for those whose work is late more than occasionally without a good reason, and there may be cases where grades will be reduced if answers are explained in class before your (unexcused) late submission is received.

Access From Off Campus

You will typically need to compile and test your distributed programs on the Halligan Linux servers. The easiest way to access these is by using one of the lab machines in Halligan, or by connecting your own computer to the Halligan network.

Although we do not officially support it in COMP 117 (i.e., if you have trouble, we don't have TA resources to investigate), it's often possible to use the VPN software provided by the CS department to access the Halligan network from home or work using your personal machine. This is done by installing on your machine a trusted VPN program that routes your network traffic to the Halligan network and that resolves host names using the Halligan DNS.

Getting the VPN software

Use your Web browser to go to the CS Dept Introduction to the CEAS VPN. There are instructions there for downloading the VPN client and installing it on your own computer. You will need your CS Dept login and password (not your Tufts password!) to do the download, and again to authenticate your connection to the Halligan network.

What you can do with the VPN

Programming environment

Environment Variable

The programs and support frameworks we use for programming assignments will sometimes need to locate configuration files and other information, and Makefiles will need to find shared library code. To make all this easier, most of these programs look in the Linux environment variable COMP117 to find the area of the filesystem where all this is stored. Therefore, for most of these programs and build scripts to work, it is essential that the variable be set.

Since you'll want the variable set all the time, the best way to do this is to add the necessary command to your .cshrc file, which the standard tcsh shell runs each time it starts. If you use tcsh, use an editor to add the following line to your ~/.cshrc (which lives in your home directory):

setenv COMP117 /comp/117

The command must be typed exactly that way. A change to your .cshrc usually won't take effect until you log off and log on again. Do that (or, if you know how, run "source ~/.cshrc" instead), then use this command to check whether it worked:

echo $COMP117
This should respond with: /comp/117

If it does, you're all set. If not, check everything, and if you need help, ask our TA.

If your login is set up to use bash as your shell, then the details are a little different. Instead of ~/.cshrc, you edit ~/.bashrc and add the line:

export COMP117=/comp/117

Typically, regardless of which shell you use, the changes won't take effect until you log off and log in again. To avoid doing that right after you first update the script, you can instead run:

source ~/.cshrc      (or source ~/.bashrc if you use bash)

Doing that will rerun your startup script and should set the environment variable.

If our COMP117 sample programs won't build or won't run, failing to set the environment variable is a likely cause.
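Inside a C++ program, an environment variable like COMP117 is read with the standard getenv function. The sketch below is a hypothetical helper, not code from the course frameworks: it shows how a program might turn a file name relative to $COMP117 into a full path, and how it can detect a missing variable up front instead of failing obscurely later.

```cpp
#include <cstdlib>
#include <string>

// Returns the full path of a file stored under the directory named by the
// COMP117 environment variable, or "" if COMP117 is unset or empty.
std::string comp117_path(const std::string& relative) {
    const char* root = std::getenv("COMP117");
    if (root == nullptr || *root == '\0') {
        return "";  // variable missing: the caller should report this clearly
    }
    std::string path(root);
    if (path.back() != '/') {
        path += '/';  // avoid gluing "/comp/117" directly onto the file name
    }
    return path + relative;
}
```

With COMP117 set to /comp/117, comp117_path("config.txt") returns /comp/117/config.txt (the file name here is only an illustration). A caller that gets an empty string back can print a message such as "COMP117 is not set" and exit, which is much easier to diagnose than a failed attempt to open a nonexistent path.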

Virtual servers

You will be writing programs that communicate using TCP/IP. Running such programs on a Linux system can interfere with normal operation of the system, especially if the programs are untested and may behave erroneously. Therefore, you are not to run your distributed programming class projects on linux.eecs.tufts.edu, or on any other ordinary system connected to the campus network! Instead, we have two virtual servers that you will use: comp117-01 and comp117-02. These are not visible from the public Internet, nor necessarily from all parts of campus. You can ssh into them from machines on the Halligan network. When you do, you use your usual login ID and password, and you will see your usual home directory. Also, be sure that the COMP117 variable is set when you log onto the virtual servers; to check, use the echo command described above. Note that the full hostnames are comp117-01.eecs.tufts.edu and comp117-02.eecs.tufts.edu, so if the short forms don't work, try those.
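The same short-name versus full-name issue can show up inside your own programs, which will usually be given a server's host name. Whether a short name like comp117-02 resolves typically depends on the DNS search settings of the machine you are running on. A minimal sketch, assuming you want to try candidate names in order using the standard POSIX getaddrinfo call; the helper name first_resolvable is hypothetical.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <string>
#include <vector>

// Returns the first name in "names" that the system resolver can map to an
// address usable for TCP (IPv4 or IPv6), or "" if none of them resolves.
std::string first_resolvable(const std::vector<std::string>& names) {
    for (const std::string& name : names) {
        addrinfo hints{};
        hints.ai_family = AF_UNSPEC;      // accept IPv4 or IPv6 results
        hints.ai_socktype = SOCK_STREAM;  // only addresses suitable for TCP
        addrinfo* results = nullptr;
        if (getaddrinfo(name.c_str(), nullptr, &hints, &results) == 0) {
            freeaddrinfo(results);  // we only needed to know that it resolves
            return name;
        }
    }
    return "";
}
```

For example, first_resolvable({"comp117-02", "comp117-02.eecs.tufts.edu"}) returns whichever of the two names the resolver on the current machine accepts first, or an empty string if neither resolves.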

The first time you ssh, you might see a warning like the following (details will likely differ):

The authenticity of host 'comp117-02.eecs.tufts.edu (10.4.2.2)' can't be established.
ECDSA key fingerprint is SHA256:EgqOaXgBQ+svbW6RvmnDvj9RU5k+SvpImSgzRRm9TuY.
ECDSA key fingerprint is MD5:14:49:b7:1a:59:84:05:12:a8:e3:81:da:de:ad:e2:14.
Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'comp117-02.eecs.tufts.edu' (ECDSA) to the list of known hosts.
Warning: the ECDSA host key for 'comp117-02.eecs.tufts.edu' differs from the key for the IP address '10.4.2.2'
Offending key for IP in /h/noah/.ssh/known_hosts:27
Are you sure you want to continue connecting (yes/no)? yes

As shown above, you can typically answer "yes" to the prompts to continue connecting.

If you don't use XWindows, then an alias like this may be helpful. See the man page for ssh:

alias 117-01 ssh yourusername@comp117-01.eecs.tufts.edu

You can define that in your ~/.cshrc file, which gets processed at login (log off and log in again to make sure it's reprocessed, or run source ~/.cshrc to pick it up without logging off). If you've done this right, the command 117-01 should take you to COMP117-01. You'll have to find a way to run multiple command windows on your desktop, so you can switch among linux.eecs.tufts.edu and the virtual servers.

A few hints on logging in remotely to virtual servers using X Windows

X Windows is widely used with Unix systems to provide a graphical environment for running applications. You do not need to use X Windows; if you prefer, just open two extra shells as described above and use your local windowing environment to switch and arrange them. This is the easiest way, and the only one for which we will provide formal TA support.

X Windows is very useful, but tricky. A basic tutorial on X Windows is beyond the scope of this info page; if you're using X, we'll assume here that you know how to get as far as running an xterm window remotely from linux.eecs.tufts.edu to your local machine. To do that, you'll need to have a local X server installed and running. Instructions are different for native Windows, for Cygwin under Windows, and for Macs; older versions of OS X come with X built in, while newer ones need XQuartz (see the Mac hints below).

FWIW, Noah uses Cygwin under Windows. If you aren't already comfortable with Cygwin, you'll have some learning to do to figure out how to install it, and especially how to set up X Windows. That may be more trouble than it's worth unless you can find help; unfortunately, we do not have the resources to help you with this, but there's a lot on the Web. Be sure to search for help on installing X Windows under Cygwin, and on how to use startxwin to get it going when you log in. If that's too hard, just use the character-mode options described in the sections above.

On Cygwin's X, and perhaps on others, you'll have to be sure your DISPLAY environment variable is set: in .bashrc it's export DISPLAY=":0.0" and in .cshrc it's setenv DISPLAY :0.0. So, you'll have to figure out, perhaps with help from friends, how to get as far as setting all that up so you can get xterm windows open on the linux.eecs.tufts.edu servers.

You'll eventually want to open three xterms when you work. The first, opened in the usual way, will be used to run compiles and other commands on the main linux.eecs.tufts.edu servers. Then use the following technique to create the other two, which will be ssh sessions making one more hop into COMP117-01 and COMP117-02 respectively. You may find ways that are more convenient for you, but as an example you could make an alias in your .cshrc file like this:

alias ids01 ssh -fY yourusername@comp117-01 xterm -T "COMP117-01"

When you log in to linux.eecs.tufts.edu this will define a command ids01 that will take you to the comp117-01 virtual machine, log you in, and then open a shell in an XWindow back on your client. You can do the same for "ids02". If you open a window on each, then you can do things like testing a client program on one, and a server program on the other, all controlled from your one display. After doing this, and perhaps logging in again to make sure your alias got defined at startup, issue the command ids01. If all goes well, a new window should pop up, with COMP117-01 in the title bar. Type the hostname command to make sure you're on the server you think you are. Since the filesystem is shared, you can do your compiles and edits on linux.eecs.tufts.edu, or on the lab machines in Halligan.

X-Window hints for Mac users

(Thanks to Tyler Heck for providing these)

Ensure that either the XQuartz or the X11 application is installed on your Mac. If neither is, you can install XQuartz, an Apple-supported package, from xquartz.macosforge.org.

To connect to a remote server via xterm, open a Terminal window on the Mac and enter:

$ ssh -fY [username]@[servername] xterm [-T [titlename]]

This will generate a new remote session for the user at the server in a new X window, with the ability to interact with remote GUI programs.

A breakdown of the connection command follows:

An example of a full command for the user jdoe01, attempting to connect to the sunfire servers (linux.cs.tufts.edu) with the title "SUNFIRE SERVER":

ssh -fY jdoe01@linux.cs.tufts.edu xterm -T "SUNFIRE SERVER"

Editing and compiling of COMP 117 code can be done on the usual Sunfire servers, but testing must be done on the virtual servers COMP117-01 and COMP117-02. You will likely want windows open on all three. First create a window on Sunfire. An example connection from a remote location might look like this (on John Doe's local machine):

jdoe@jdoesbox $ ssh -fY jdoe01@linux.cs.tufts.edu xterm -T "SUNFIRE SERVER"
(A new xterm window appears, prompting for login credentials)

Then, do the following twice to get windows on the virtual servers:

Within new window:
jdoe01@sunfire32 $ ssh -fY jdoe01@comp117-01 xterm -T "COMP117-01"
jdoe01@sunfire32 $ ssh -fY jdoe01@comp117-02 xterm -T "COMP117-02"

Each command creates an additional window, one for each of the virtual servers. You will likely have to enter your password twice. Adding aliases for the login commands on your local machine and on the Sunfire servers can give you shorthand versions of the above commands. Make sure to run both of the above commands from Sunfire. If you do the ssh from one of the virtual servers to the other, it will likely work, but your X Window traffic will be making hops through many machines on its way to and from your Mac. It will be slow, and will add load to the Halligan network (X is a high-overhead protocol).

TCP/UDP port assignments

TCP and UDP servers typically listen for connections or incoming data on what are called "ports", each of which is identified by an integer. Unfortunately, allocating these ports is a problem. Some ports are reserved; e.g., port 80 is the standard for Web servers. If you managed to start a server listening on port 80 on linux.eecs.tufts.edu, yours would be the Web server that responded to requests for pages at http://linux.eecs.tufts.edu, something we don't want our student software to be doing. In fact, that port is protected (on Linux, ports below 1024 can normally be used only by privileged processes), and an attempt to listen on it would fail.

A bigger problem for us is that all of you will be running your servers at the same time, so each of you needs to use a different port number. One of the reasons we're making sure on day 1 that we have your CS Dept login right is so we can associate a port number with it. More details on this will be provided with your programming assignments, but be sure you follow the rules we publicize so that your code listens and talks on the right port number. If you have any doubt, check before running your code!
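For reference, here is a minimal C++ sketch of a TCP server socket being bound to a port with the standard sockets API (the kind covered in Kerrisk). The function name listen_on_port is hypothetical, not from the course frameworks. It reports the two failures described above distinctly: EACCES when an unprivileged user asks for a protected port such as 80 (on a typical Linux configuration), and EADDRINUSE when another server is already listening on that port. The port is a parameter so that you pass in the number assigned to you rather than hard-coding one.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <string>

// Opens a TCP socket listening on "port" on all local interfaces.
// Returns the socket descriptor, or -1 with a description in "error".
int listen_on_port(uint16_t port, std::string& error) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        error = std::strerror(errno);
        return -1;
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  // accept on any local address
    addr.sin_port = htons(port);               // ports use network byte order

    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0) {
        int saved = errno;  // save before close() can overwrite errno
        close(fd);
        if (saved == EACCES)
            error = "port " + std::to_string(port) + " is protected";
        else if (saved == EADDRINUSE)
            error = "port " + std::to_string(port) + " is already in use";
        else
            error = std::strerror(saved);
        return -1;
    }
    if (listen(fd, 16) < 0) {  // 16 = max queued, not-yet-accepted connections
        error = std::strerror(errno);
        close(fd);
        return -1;
    }
    return fd;
}
```

If a classmate's server is already running on the port you pass in, this sketch fails with "already in use" rather than silently talking to the wrong program, which is exactly the collision that per-student port numbers are meant to prevent. Passing 0 asks the kernel to pick any free port, which can be handy for a quick local experiment, but your assignment code should use the port the course gives you.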