Tufts CS 117 (Fall 2024):
Internet-scale Distributed Systems

Should I Take
CS 117: Internet-scale Distributed Systems?

This page is intended to provide information for students who are considering taking CS 117: Internet-scale Distributed Systems in the fall of 2024. Note that there is detailed information below about prerequisites, which include an ability to program in C++ (or C) at roughly the level we teach in CS 15 and CS 40. We also require the ability to read articles and papers, and to write fluently in English about technical matters.

Course Content

Q. What is CS 117 about?

A. CS 117 explores the most important principles and rules of thumb for large-scale software system design. Many of these are core principles that every system designer should know. Understanding these principles will help you identify the key design points and architectural structures that will be most important to the success of the systems that you build.

The course covers a combination of some of the most famous principles (e.g. the End-to-End Principle, Postel's law, etc.), and also some less well known challenges that have proven important in practice (e.g. Leaky Abstractions). Several programming assignments provide experience with these principles, and with implementation of distributed systems.

The course focuses especially on the World Wide Web, which embodies in particularly clean, comprehensible form many of the most important architectural principles we explore. The Web is an extraordinarily large and successful system, but its core constructs are surprisingly simple, powerful and scalable. In addition to the Web itself, we explore principles that enabled the success of the Internet (on which the Web is built), and other important systems such as the Unix/Linux operating system.

Q. Can you give me a more detailed example of a topic discussed in the course?

A. Sure. Idempotence is a somewhat forbidding name for a simple concept. Operations that are idempotent do the same thing regardless of whether you try them just once or many times. Setting your bank balance to $100 is idempotent (do it twice and your balance is still $100); adding $10 to your balance is not. Idempotent operations are typically easier to implement, to optimize, and to reason about. Not everything can conveniently be done in an idempotent way, but when designing a protocol or interface it's often worth asking which operations can or should be idempotent. Other important course topics include naming, designing for evolution (versioning), and the end-to-end principle.
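The bank-balance example can be sketched in a few lines of C++. This is only an illustration: the `Account` type and the function names are hypothetical and do not come from any course assignment. The helper `balance_after_retries` imitates a client that repeats a request, for instance because an acknowledgement was lost.

```cpp
#include <cassert>

// Hypothetical account type, used only for illustration.
struct Account {
    long balance_cents = 0;
};

// Idempotent: applying this once or many times leaves the same balance.
void set_balance(Account& acct, long cents) {
    acct.balance_cents = cents;
}

// Not idempotent: every repeated call changes the balance again.
void add_to_balance(Account& acct, long cents) {
    acct.balance_cents += cents;
}

// Applies `op` to a fresh account `times` times, as a retrying client
// might, and returns the balance that results.
template <typename Op>
long balance_after_retries(Op op, long start_cents, long amount_cents, int times) {
    assert(times >= 1);
    Account acct;
    acct.balance_cents = start_cents;
    for (int i = 0; i < times; ++i) {
        op(acct, amount_cents);
    }
    return acct.balance_cents;
}
```

With these definitions, retrying `set_balance` three times with 10000 cents still leaves $100, while retrying `add_to_balance` three times with 1000 cents adds $30 rather than $10. That difference is why a retry is harmless for the first operation and a bug for the second.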

Q. What are distributed systems?

A. Distributed systems use multiple computers working together to solve a problem or implement an application. We focus mainly on Internet-scale systems, such as the Web, e-mail, etc., but the principles we study apply to many smaller systems and to non-distributed systems as well.

Q. Is this a new course?

A. No. The course was first offered in fall of 2012 and has been taught regularly since then.

Who should take the course?

This course traditionally attracts a roughly equal mix of undergraduates, regular master's students, and part-time evening students (the latter are allowed to do programming projects individually). Most students who took the course reported that they liked it, and success did not correlate highly with graduate/undergraduate status. We do occasionally admit interested PhD candidates.

Q. What are the formal prerequisites?

A. CS 40 or permission of the instructor (please email noah@cs.tufts.edu if you have not taken CS 40, or if you have questions about this prerequisite). The conceptual material should be accessible to anyone who has taken CS 40 and who is interested in principles of system design. The programming projects are challenging, but most students last year felt they were worthwhile; if you enjoyed and did reasonably well with the harder CS 40 assignments, you should do fine. If you still feel uncertain programming at the CS 40 level, you may have trouble.

As discussed in more detail below, students are expected to read technical papers and to do their own technical writing in English, so fluency in English is required. Anyone who grew up speaking English should be well qualified; if your English is not strong, then see the section below on reading and writing assignments, and contact the professor in advance if you have doubts.

Q. Does it matter if I took CS 112: Networking?

A. There's a little bit of overlap, but the emphasis in the two courses is quite different. CS 112 mainly teaches the multi-layer stack of network protocols. CS 117 teaches principles of system design. CS 112 is definitely not required for CS 117; conversely, if you have taken 112, then CS 117 should still be very worthwhile. Last year, perhaps 1/4 of students in CS 117 had already completed 112. Occasionally, alternate versions of assignments will be offered to those who have already taken 112.

Q. Do I really need to know how to program in C++ or C?

A. Yes, at the level we teach in CS 15 and CS 40; if you did reasonably well in those courses you should be just fine. If you did not do your undergrad work at Tufts then it's essential that you know the basics of programming in C or C++. No, that does not mean that programming in Java or C# or some similar language qualifies you; the languages are quite different.

In CS 117 you will be working with binary packet data at the byte level. You must be comfortable with pointers, C-style character arrays, null-terminated strings, what data of various types looks like in hexadecimal when memory is inspected, and the basics of declaring and using classes in C++. It's OK if you are a little rusty on these things and if you're not otherwise a C++ expert, but you should have experience with those basics. You do not need to know about inheritance or interfaces (we teach you a little of that), or many of the other advanced features of C++.
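To give a feel for what "byte level" means here, the following is a minimal sketch and not taken from any course assignment. The packet layout (a 4-byte big-endian sequence number followed by a null-terminated name) and all function names are assumptions made up for illustration. If code like this reads comfortably, you probably have the background described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Writes a 32-bit value into buf in big-endian (network) byte order.
void put_u32_be(unsigned char* buf, std::uint32_t value) {
    buf[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    buf[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    buf[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    buf[3] = static_cast<unsigned char>(value & 0xFF);
}

// Reads a 32-bit big-endian value back out of buf.
std::uint32_t get_u32_be(const unsigned char* buf) {
    return (static_cast<std::uint32_t>(buf[0]) << 24) |
           (static_cast<std::uint32_t>(buf[1]) << 16) |
           (static_cast<std::uint32_t>(buf[2]) << 8) |
           static_cast<std::uint32_t>(buf[3]);
}

// Formats len bytes as space-separated two-digit hex, e.g. "01 02 03".
std::string to_hex(const unsigned char* data, std::size_t len) {
    std::string out;
    char piece[4];
    for (std::size_t i = 0; i < len; ++i) {
        std::snprintf(piece, sizeof piece, "%02x", static_cast<unsigned>(data[i]));
        if (i > 0) out += ' ';
        out += piece;
    }
    return out;
}

// Hypothetical packet layout: 4-byte sequence number, then the name
// including its terminating '\0'. Returns the number of bytes written,
// or 0 if the buffer is too small.
std::size_t build_packet(unsigned char* buf, std::size_t cap,
                         std::uint32_t seq, const char* name) {
    std::size_t name_len = std::strlen(name) + 1;  // count the '\0'
    if (cap < 4 + name_len) return 0;
    put_u32_be(buf, seq);
    std::memcpy(buf + 4, name, name_len);
    return 4 + name_len;
}
```

For example, building a packet with sequence number `0x01020304` and the name `"hi"` fills seven bytes, which `to_hex` renders as `01 02 03 04 68 69 00`: four bytes of sequence number, the ASCII codes for `h` and `i`, and the terminating null.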

Why not just let students who learned other languages take their chances? As in CS 40, programming assignments are done by pairs of students. Both students in a pair are required to contribute equally to all phases of the project, including design, coding, debugging, testing, etc., and both students get the same grade. If you don't have the necessary experience, then that's not fair to your partner, who can reasonably expect to work with someone who does have the course prerequisites. We don't care where or how you learned C or C++, but it's only fair to your partners that you arrive prepared.

Again, if you are unsure whether you are qualified, please just email noah@cs.tufts.edu and I will be glad to discuss it with you.

Instructor

The course is taught by Professor of the Practice Noah Mendelsohn. Noah has been doing research and development on distributed systems since the 1970s. He helped design the XML stack of document technologies, and for several years he co-chaired (with Tim Berners-Lee) the W3C Technical Architecture Group, which is the senior technical steering committee for the World Wide Web. This course is, to a significant degree, designed to share insights gained from working with the designers of the Web on the most challenging technical problems facing the Web and the Internet today. The lessons should be valuable for anyone designing large software systems.

Assignments and workload

Q. What are the assignments and tests like?

A. Although there are some challenging programming assignments, we also read several classic papers in computer science, selected chapters from textbooks (all available from Safari), and part of the autobiography of Tim Berners-Lee. Most weeks there is a short assignment asking you to provide written answers to questions about the reading. A rough estimate for this work is 2-4 hours/week, including reading and writing.

There are a few (2-4) team programming assignments. For many students, these will be a first opportunity to write distributed systems. Doing that is hard, but very exciting! Writing two programs that communicate with each other is an important and rewarding experience. The programming assignments are designed to illustrate the principles and rules of thumb that are the subject of the course.

Students in previous terms have reported that the larger programming assignments are similar in complexity and challenge to the harder CS 40 assignments, but more time is given for each assignment.

Several exams will be given in class, typically including one in the last week, but there is no formal final exam during finals period. However, there is a final paper that's assigned a few weeks before the end of the term, and many students choose to finish that during reading period or early in finals week (other students finish it before reading period begins; it's up to you).

Q. Overall, how much time will the course take?

A. Much less than CS 40, but it's still a significant course. A few times during the term, you will be very busy for a week or two with team programming, and you will also likely spend a few days during reading/finals period on the final paper.

Q. Are there labs?

A. No. Occasionally, the instructor may suggest an optional lab-scale exercise to help you learn some topic, but doing those is up to you.

Q. What programming language is used?

A. As noted above, mostly C++. No, it's not a particularly beautiful language, and it's much less handy than Python, Ruby, JavaScript, etc. Nonetheless, most scalable networking systems and most of the browsers you use are written in C, C++ or a variant. Just as CS 40 gives you experience with machine-level programming, programming in CS 117 gives you a sense of how many large-scale distributed systems are built today, and of how core networking APIs like sockets are used. Actually, for part of one project we allow you to use Python or Ruby if you prefer, but knowledge of those languages is not required.

Q. Are there reading and writing assignments as well as programming?

A. Yes. Compared to the introductory sequence of CS courses at Tufts, CS 117 involves more reading and writing about concepts and principles. There are several reasons for this. First of all, for some students this will be their first opportunity to read and analyze important research papers that were influential in setting out fundamental concepts of computer science. Furthermore, the Internet and the Web are designed and managed by a worldwide community of programmers and system designers who exchange ideas about how to evolve the system. The reading and writing in CS 117 are designed to help you build the skills to participate effectively in communities like this. As noted above, there are also some very interesting and rewarding programming assignments.

Note: Students who aren't fluent in English occasionally find some of the reading and writing to be beyond what they can do well. The professor will try to help you succeed, and occasional lapses in grammar or usage won't reduce your grade, but it's essential that you be able to learn and to clearly explain the concepts covered in the readings. If you have any concerns about your ability to do the reading and writing in this course, please check in advance with the instructor, who can show you some of the reading materials and assignments from last year.