Tufts COMP 117 (Spring 2019):
Internet-scale Distributed Systems
The fundamental goal of COMP 117 is to uncover and explore a variety of principles, guidelines, and other useful insights relating to computer systems in general, and to large-scale distributed systems in particular. As we encounter each one, a brief entry will be recorded here. The entries below will be grouped into major themes, but in many cases the most important principles are indeed cross-cutting, in that they impact many facets of system architecture simultaneously.
Some of the items listed below are well understood fundamentals of good system design. Others are controversial and remain the subject of debate in the system design community.
* The term cross-cutting issues is lifted from one of the best textbooks ever written on computer hardware architecture: Computer Architecture: A Quantitative Approach, by John Hennessy and David Patterson. That book has sections throughout discussing cross-cutting issues relating to hardware design. For anyone who wants to do some self-study on hardware architecture, the book is highly recommended!
When in doubt, choose the simplest approach to solving a problem. Keep interfaces simple and straightforward. As the designers of UNIX advised: have each component of the system do one thing well.
Adding features or options to a system tends to create complexity that is out of proportion to the individual function that's added. Such complexity makes systems slower to develop and harder to maintain, and it often results in worse runtime performance as well. Even small additions to a system can have far-reaching ramifications.
Unicode allows the Ö (O with Umlaut) character to be represented as either the single value
U+00D6 or the two combining
characters U+004F (O) and U+0308 ( ̈ ).
Allowing such alternative representations in documents can facilitate integration with certain systems (on some old typewriters, the
two were typed separately), but it greatly complicates the comparison of otherwise similar strings from two documents.
This has been a major headache in many Unicode-based systems, including the Web.
Extra code is needed in the comparison routines, and the pathlengths when doing comparisons may be longer.
Such overhead is typically incurred for all comparisons, as it's usually
hard to tell in advance which strings actually contain combining characters.
Furthermore, the added comparison
code can fill up processor caches, causing other parts of the system to run slower too. Similar complications arise if
combining characters are allowed in filenames or variable names.
History: The KISS principle was first articulated (according to Wikipedia) by the legendary aircraft designer Kelly Johnson. According to the same source, the word "stupid" was not referring to those making the design decisions. Rather, the goal was to build a system that you didn't have to be very smart to understand and maintain. So, the principle is "Keep it simple (and) stupid", not "Keep it simple, stupid".
Tradeoffs: Sometimes a complex solution is needed to achieve the required result. In such situations, trying to make things overly simple can actually result in unnecessary complication or a poor design (as an example, the optimizers in programming language compilers often perform quite tricky transformations that result in code that is difficult to understand and debug, but that runs very fast.) There is a related principle attributed to Albert Einstein that "everything should be as simple as it can be, but not simpler".
Almost every decision in the design of a system involves compromises. Even when applying the most fundamental of good practices, there are usually tradeoffs. A well layered system maybe easier to understand, evolve and maintain, but it might be slower than one in which layers are integrated. Clean abstractions are essential to good system design, but such abstraction often prevents control of specifics that may be important. Be alert for tradeoffs and compromises when applying the principles and guidelines described here.
Example: A filesystem provides many useful facilities for storing, organizing and retrieving information using hard disk, but access through a filesystem prevents the use of certain low-level optimizations that may be important in unusual cases (that's why many high-performance databases directly control the hard disk, rather than working through a filesystem).
The term idempotent describes an operation that produces the same results regardless of how many times it is executed. Setting someone's bank account balance to $100 is an idempotent operation; adding $10 to their account is not. The second deposit will add to the first.
In many cases, idempotent operations can be supported more efficiently, and by simpler protocols, than otherwise similar operations that are not idempotent. For example, if the (idempotent) balance setting operation above fails, it can be retried until the desired result is retrieved. If the (non-idempotent) bank deposit operation is sent through a network and clear confirmation is not received, then any retry must be labeled as a duplicate so the receiving system will know to perform the update exactly once. In order to do this, the receiving system will normally have to keep a record of operations that have already been performed, and all requests must be checked against that record.
Not all operations can be designed to be idempotent, but when practical, idempotent operations are usually simpler to implement and incur less overhead than others. Not always, but sometimes, a choice is available: if we know that the bank account has $90 before the deposit, then either an "add $10" or a "set balance to $100" operation will yield the same final result. The latter is idempotent, but the former is not.
Metcalfe's Law states that for many networks, there is value in each connection that can be made. The number of such connections grows as the square of the number of things that can be connected, so the value of the network grows quickly as nodes or resources are added. Because growth is much faster than linear, a single large network tends to be much more valuable then multiple smaller networks, even though the smaller networks are often easier to build and maintain. The term network effects is often used to refer to the super-linear gain in value from integrating a node or resource into an existing network.
Example: the telephone system became more valuable when international direct dialing became possible. Phones in Europe could be used to directly call the United States, something that had not been possible when the networks were separate.
Example: Metcalfe's law is often used to explain why there "can be only one World Wide Web". A new page added to the Web potentially adds value to the many other pages that can now link it. Similarly, the new page itself may be more valuable as part of a Web in which it too can link to many other things. For example, an airline page can link to weather reports for the cities to which it flies, and the weather site can offer links to flights. Neither could be done if the weather reports and flights were in separate networks.
Implications: Metcalfe's law gives useful perspective on a surprising range of issues relating to the Web, including:
Tradeoffs: building and maintaining large networks may be more difficult than building multiple smaller ones.
History and references: this law is attributed to Ethernet inventor Bob Metcalfe, who supposedly made the point in a talk in 1980. His focus at the time was more on the value of a network of compatibly connected devices than on users (of a phone system) or Web pages, but the essence of the argument applies broadly. George Guilder has also apparently contributed to the formulation of the law, but everone calls it Metcalfe's law. Boston-area entrepreneur Sim Simeonov offers some clarification of the history in a 2006 blog posting.
Having noted Metcalfe's law, we must observe that building very large systems is inherently difficult. Many of the challenges are obvious: it's hard to design algorithms that work at global scale, hard to manage a large system running under control of many separate organizations, hard to keep the software running as updates are incrementally installed in different locations, etc., etc.
One might therefore be tempted to assume that the largest systems are the most complex in their design and interfaces, but often the opposite is true. If the interface required for a system to join a network is simple, then the barriers to connecting are much lower. Thus, among the reasons that the Web's key architectural elements (URIs, HTTP, and HTML) are quite simple to learn and implement is to reduce the barrier to connecting a wide range of systems to the Web. We all know that many large organizations run Web servers, but simple Web servers are also now found in printers, home appliances, and even traffic lights! Building a simple but completely standards-compliant Web server is quite easy to do.
Tradeoffs: as noted above under KISS, some things just can't be properly implemented using simple techniques. One could argue, for example, that public key cryptography is conceptually complex and also difficult to implement properly. Nonetheless, this form of crypto is the best way we've found to implement (relatively) secure communication and identification checking on a global scale.
One of the most famous and influential papers relating to the design of distributed systems is End to End Arguments in System Design by Jerry Salzer, David Reed, and David Clark. The Association for Computing Machinery's (ACMs) SIGOPS group voted this paper to its hall of fame, stating:
"This paper gave system designers, and especially Internet designers, an elegant framework for making sound decisions. A paper that launched a revolution and, ultimately, a religion."
Maybe not a religion, but this paper has helped a lot of designers to think clearly about tradeoffs in organizing distributed systems, and it neatly explains some of the tradeoffs embodied in the architecture of the Internet. This is a surprisingly easy paper to read and understand.
From the paper:
This paper discusses one class of function placement argument that has been used for many years with neither explicit recognition nor much conviction. However, the emergence of the data communication network as a computer system component has sharpened this line of function placement argument by making more apparent the situations in which and reasons why it applies. This paper articulates the argument explicitly, so as to examine its nature and to see how general it really is. The argument appeals to application requirements, and provides a rationale for moving function upward in a layered system, closer to the application that uses the function.
In short, the principle encourages clear thinking as to which functions or guarantees must be provided at the endpoints of a system, and which might be handled by interior layers or communication nodes. It argues that, in many cases, it is most effective to enforce correctness or other essential characteristics at the endpoints. Indeed, there are many cases in which end-to-end checks are the only ones that can guarantee correctness. Furthermore, once end-to-end correctness is guaranteed, any processing internal to the network can be viewed as an optimization; such internal processing or recovery should be done if necessary to ensure that the chances of end-to-end success are reasonably good (e.g. it's generally not worthy doing error recovery on an individual network link if the link is already highly reliable; such per-hop recovery may be very important on a link that encounters frequent errors).
Not only is this a wonderfully visionary paper on how to think about reliability and error recovery in a wide variety of computer data manipulation systems, it forms the key philosophical underpinning for the Internet as we know it. These were the "arguments" used to justify building a reliable system out of unreliable datagrams, and crucially, to justify putting the smarts at the periphery of the Internet vs. in the network itself (traditional AT&T phone systems). Thus, indirectly, this paper justifies building networks in which you can innovate by providing new functions (e.g. the Web, BitTorrent, VOIP, etc.) without necessarily rearchitecting or redeploying the infrastructure that implements the network fabric.
Thus, end-to-end arguments are closely related to the principle of network neutrality, I.e. that providers of network infrastructure should carry packets indiscriminately for any application, and should provide qualities of service based only on what's paid for and the underlying capabilities of the system. Tim Berners-Lee cites network neutrality as key to his ability to create and deploy the Web at a time when few others believed such a system was necessary or even possible (video)
It's often necessary to build protocols in which each message is interpreted in the context of the ones that came before: for example, an initial message might authenticate a customer of a bank so that a subsequent "getBalance" message will implicitly return the balance for the intended user. To implement such protocols, runtimes must maintain state between the processing of successive messages. In the example, the system must maintain for each connection the identify of an authenticated user. If the connection is lost and re-established, then either some way must be found to correlate the new connection with the old, or the state must be re-created.
A different class of so-called stateless protocols avoids such complexity by transmitting in each message the information necessary for its independent processing. One way to adapt the example above would be to include user authentication information with all balance requests and other such operations. Each message can then be processed completely independently.
Stateless protocols are not always practical, but they when they are, they tend to have many benefits. They tend to be easier to implement and test. Although performance of individual requests is not necessarily improved, the flexibility in processing requests often makes it easier to gain performance in other ways, e.g. by dynamically allocating workload across multiple servers (that's harder to do if session state has to be moved.) In the example above, the individual stateless messages are larger and more complex to process, but fewer round trips are involved if only a single query is to be done.
Examples: The Internet and the Web use a mix of stateless and stateful protocols. UDP itself can be implemented in a stateless manner (if IP-level fragmentation is ignored); each UDP packet can be routed and delivered completely independently of others. TCP is a highly stateful protocol: as a TCP session operates, the two endpoints maintain coordinated information about how many packets are "in flight", which packets have been successfully received, etc. If a connection is "lost", then equivalent state must be established for a new session.
Stateless protocols are often used to support stateful protocols and vice versa. IP (which is mostly stateless) is used to support UDP and TCP. TCP, which is stateful, is used to support HTTP (stateless, if we ignore cookies, etc.), which in turn is often used to build stateful applications (Amazon remembers what you've put in your cart between HTTP requests.)
A global name is one that resolves the same way everywhere. Although there are exceptions, most URIs are global in this sense, and this location independence is fundamental to the success of the Web: URI's can be passed from person to person, copied from e-mails or even billboards into HTML links, all with reasonable confidence that the right resource will be located if the link is dereferenced.
Examples: In addition to URIs, widely used examples of global names include telephone numbers (assuming country code is included), postal addresses (if the country is included), and ISBN numbers for books.
Counterexamples: The file: URI scheme is not global in this sense: the same URI resolves differently according to the machine on which it is used. Unix filenames are not global, because they resolve to different files (typically) on each Unix system. Relative URIs are not themselves global, but once resolved to an absolute URI, the result is typically a global name.
Things with names tend to be usable in more powerful ways than things that can be
found only indirectly.
Thus, in this document, it's easy to create references to
this section, but much harder to reference the 9th character of this
paragraph. The section has an
id attribute, which results in it being
identified by a URI, and that URI can be used in hyperlinks, in Semantic Web RDF statements,
or even for reference in textual documents such as books. As Tim Berners-Lee states in his Axioms of Web Architecture
An information object is "on the web" if it has a URI. Objects which have URIs are sometimes known as "First Class Objects" (FCOs). The Web works best when any information object of value and identity is a first class object. If something does not have a URI, you can't refer to it, and the power of the Web is the less for that.
Related to: Metcalfe's Law
When possible, give each thing one name, not more. When a system allows for a given thing to have more than one name, complications result. For example, cache management becomes more difficult: an update made using one name may require invalidation of the cache for entries under all names. Even the possibility of such aliasing tends to complicate code and to slow systems down.
Even in systems where aliases are allowed, their use should generally be minimized. If a particular thing is given only one name, then it's always straightforward to determine that multiple references are indeed to that same thing.
In practice, aliasing can be difficult to avoid. For example, some systems have a history of treating names in a case-independent manner. Partly to facilitate interoperation with such systems, certain components of a URI are compared independent of case.
HTTP://example.com/ always identify the same Web resource, because scheme names are case-independent (though lowercase is preferred; having a preferred form increases the chance that the same URI will in fact be used in all cases.)
Uniform naming systems support uniform manipulation of a broad range of resources. For example: Unix and Linux use uniform, slash-deliminated names (/a/b/c/d) not just for files, but also for devices, pipes, processes and other resources. In many cases, the same operations (e.g. sort) can be applied to all of them, using common code.
URIs provide uniform naming not just for Web documents, but also for mailboxes (mailto scheme), telephone numbers (tel scheme) etc.
Links to any of these can be created using the same
<a href=...> construct in HTML.
Though details vary for accessing such resources, most modern browsers will follow links to any of them, I.e. by retrieving the Web page, preparing mail to the mailbox, or dialing the specified phone number.
Furthermore, the same semantic Web constructs allow statements to be made about all these
classes of resource (for example, one could state that a phone number, a mailbox, or a document is "owned by" Mary Smith).
Related to: Metcalfe's Law
Many names are defined as opaque: all that is significant is that each name can be distinguished from every other name, and that each name has (or might have) a referent.
Other naming systems convey additional information in the name itself. For example, on some operating systems a filename
someprog.exe is presumed or even required to be an executable file,
because the filename extension says it is.
All absolute URIs convey at least the name of a scheme, and many schemes define URIs that provide
additional information, such as hierarchical paths and query parameters.
In some cases, users will document URI assignment policies that convey yet more information, such as that:
is in all cases the weather for the named city.
Providing such "metadata" in URIs or other names can be very handy, but it has drawbacks. Let's say you are building a system about historical figures, and you get the idea to use three letter name suffixes derived from the subjects' initials. You therefore, perhaps without thinking, try to use the extension "jpg" for files relating to historical figure J. Paul Getty. On many operating systems, those files will mistakenly be assumed to be .jpg images. Opaque names can typically be allocated and processed in a uniform way. Furthermore, if one encodes into a name a characteristic that may change (e.g. an author's name for a document that later gets amended by someone else), then it may be necessary to rename an item, and thus to break references to its original name.
Related to: KISS
Reference: W3C TAG Finding Metadata in URIs
One of the most important tradeoffs in designing a data or message format is the choice of text or binary representation. By text representation we mean one in which the bits of the format are interpreted as encodings of characters that are suitable for presentation to users: ASCII or Unicode are the most common choices. Any information to be conveyed is first represented as characters, and then those are encoded as bits for storage or transmission. For example, the number 321 would be represented either as:
Text: 321 (e.g. the 3 ascii characters 0x33 0x32 0x31) Binary: 0x0141 (i.e. the two bytes with the decimal values 1 and 65)
Actually, in the case of binary, the order of the bytes may vary according to whether the machine is
big-endian or little-endian. I.e. they may
be stored in memory or transmitted as either
Indeed, this incompatibility results in one of the important tradeoffs favoring the
use of text: widely used text encodings such as ASCII and Unicode tend to be stored
and transmitted the same way by all systems; binary formats have multiple incompatible representations.
In the early days of computing, when memory was scarce, binary formats were almost always preferred when practical, as they tend to be more compact. To a striking degree, the application-level protocols used on the Internet, including those for e-mail (RFC 2822), Web HTTP (RFC 2616) and Web HTML (HTML) tend to be text-based.
History: This tradeoff has existed since the early days of computing. As noted above, the rapid drop in the price of computer memory and processing power has facilitated the widespread use of text-based formats.
Tradeoffs: As noted above, binary formats tend to be more compact. When all systems involved use the same binary formats (word size, endianness, etc.), then binary can be much faster as well. In such cases, in-memory data structures can often be written and read directly to or from a disk or network.
Text formats have many important advantages, including:
One interesting characteristic of a document, data format, or programming language is its ability or lack of ability to easily handle what SGML and XML call mixed content, I.e. text strings that contain structured markup. The most familiar example for most users is HTML, which allows markup like this:
<p>This is a <em>really</em> important point!</p>
In the example above, the
<em>really</em> markup turns the string into mixed content,
because there is markup within the text.
Mixed content tends to be important for the machine-processable representation of highly structured documents. Systems that lack mixed content are fine for simple text documents (e.g. plain text e-mail), but tend not to represent with fidelity the logical structure of a more complex document. For example, both XML and JSON are useful for storing simple typed key/value pairs, but XML (which supports mixed content) is better able to convey the structure of, an insurance policy document that has many nested sections, specific fields with the customer's name and amount of coverage, etc. JSON is, however, simpler to implement and use for the cases it does handle well.
History: as far as I know, mixed content was first popularized in the (precursors of) SGML. Specifically: Charles Goldfarb developed a series of document formats at IBM in the 1970s; these evolved into SGML, which was widely used outside of IBM, and is indeed the precursor of XML. Tim Berners-Lee chose to make HTML a variant of SGML, in part to leverage the skills, expertise, and expectations of those who already knew SGML, and to avoid "reinventing the wheel"
Tradeoffs: Mixed content is powerful and useful, but it greatly complicates the handling of text strings in a programming language. Almost all popular languages have built in support for ASCII or Unicode strings; the only ones that handle mixed content really well tend to be languages like XQuery and XSLT that were designed with systems like XML in mind.
References: mixed content in the XML Recommendation
When designing a document or data representation format, the content (sometimes referred to as the model) should, to the extent practical, be stored separately from the presentation formatting instructions (the view). A number of benefits result from this approach, including:
Many systems include not just model and presentation information, but also logic for managing interactions with a user and transitions between program states. There are benefits to separating such control logic from both the model and the view. The resulting architecture is known as Model/View/Controller (MVC). For example, in a user interface framework that's used to fill out data forms, the stored data (the model) should be separate from the form layout (the view), and both should be separate from the logic that decides which form to present next after the first is filled in.
Examples: The separation of CSS stylesheets from HTML is the most obvious example on the Web. It should be noted that this separation was incompletely realized for many years. For a long time, the preferred way of styling HTML content was with attributes in the HTML; only over time did CSS emerge and become sufficiently capable to cover most styling needs. Other examples include XSLT for styling XML (though in practice, XSLT is often used to produce HTML, that in turn uses CSS!). To see great examples of Model/View separation with HTML and CSS, check out the CSS Zen Garden.
Many presentation tools such as PowerPoint provide ways of changing the "Design" of a document after it has been created. As noted above, the User Interface frameworks for many languages including Smalltalk embody an MV or MVC layering.
Tradeoffs: Tradeoffs in supporting MV or MVC include
Copyright 2012, 2013, 2014, 2015, 2016, 2017, & 2018 — Noah Mendelsohn & Tufts University