Tufts CS 117 (Fall 2024):
Internet-scale Distributed Systems

Tufts CS 117 Programming Assignment
Remote Procedure Call

Overview

One of the most important achievements of distributed systems research is the ability to provide transparency: the network "disappears", so things that are remote appear to be local. In this assignment, you will implement your own remote procedure call system, which will provide transparency for the invocation of simple C++ functions, including those that accept structured values as arguments and return them as results.

You will do this in phases: first, after studying extensive sample code that is provided for you, you will learn to build the necessary proxies and stubs by hand, and you will use those to demonstrate some simple programs transparently invoking functions remotely. Later, you will implement your own rpcgenerate program, which will take as input an Interface Definition Language file (IDL) in the form of a very simple C++ compatible header (.h), and which will output the C++ source code for the proxy and the stub automatically.

This assignment should be interesting for several reasons. First of all, there is something almost magical about invoking an ordinary function and finding that it's actually executed on another computer at the other side of the network. Second, RPC illustrates the power of providing simple abstractions to wrap complex capabilities, and you will find out that what goes on under the covers to achieve even a simple remote procedure call is both complex and subtle. This is true of many distributed systems, including the Web. Finally, this will be an opportunity to do meta-programming, which involves having one program (your RPC generator) create or manipulate the code for other programs. Going back to the time of Alan Turing, computers have blurred in fascinating ways the line between programs and data, and it's when we do meta-programming that the power of this integration becomes clear.

A note on terminology

Many references use terms like client stub and server stub for what we're calling the proxy (client stub) and the stub (server stub).

Language Choices: C++, Python, Ruby

The functions you call remotely and the code that invokes them are written in C++, but we give you a choice of programming languages to use when building your rpcgenerate proxy and stub generator.

Either way, you will modify the provided Makefile to automatically generate proxies and stubs using your rpcgenerate program. If everything works right, you'll be able to write simple C++ programs that completely transparently and automatically invoke functions on remote computers!

Note: in your generated C++ proxies and stubs you may use C/C++ features like sscanf, stringstreams and << if they are helpful for your formatting and parsing, and you can use STL constructs like vector and map. You must not use things like Boost regexp libraries, libraries that generate or serialize structured data types for you, etc. In particular, you must not use libraries that generate or parse JSON in your C++ proxies or stubs. (You will want to use the built-in JSON parsing facilities in Python or Ruby if you use those languages to build your rpcgenerate program, but your proxies and stubs must do their own message formatting and parsing.) If you are not sure whether a particular library or feature may be used, please ask.
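
As a small illustration of the kind of hand-written formatting and parsing that is allowed, here is a minimal sketch that uses only stringstreams. The helper names formatInt and parseInt are hypothetical, and encoding integers as decimal text is just one possible choice, not a required message format.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: format an int as decimal text using <<.
std::string formatInt(int value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Hypothetical helper: parse decimal text back into an int.
// Throws if the text is not entirely a valid integer.
int parseInt(const std::string &text) {
    std::istringstream in(text);
    int value = 0;
    in >> value;
    // Reject empty input, non-numeric input, and trailing garbage.
    if (in.fail() || in.peek() != std::char_traits<char>::eof()) {
        throw std::invalid_argument("not an integer: " + text);
    }
    return value;
}
```

A proxy could call something like formatInt on each argument before sending, and the stub could call parseInt on the received text. Floating-point values and structured types would need their own helpers.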

JSON

You will want to spend just a little time learning about JSON, which is a widely used data format for the Web and elsewhere. For the C++ option, the idl_to_json.cpp program we give you will be your model for learning to use the C++ interfaces to our IDL parser; if you are using Python or Ruby, then you will be using the JSON output of that program. Many of you who have done Web programming will already know JSON; for those who haven't seen it, learning it will take about 10 minutes, and JSON is one of those file format technologies that every 21st century programmer should know.

Project Success criteria

To succeed in this project you must:

You must adapt the supplied Makefile to successfully build all the samples you want us to test.

Your generator program must be called rpcgenerate and it must open filenames passed on the command line; we may try it with our own test IDL when grading! The proxies and stubs you generate must follow the naming convention <idlname>.proxy.cpp and <idlname>.stub.cpp, as we did for simplefunction.idl, and these must compile to create the corresponding .o files using your Makefile, regardless of the IDL file name. You may find it helpful to uncomment the lines invoking $(RPCGEN) in the Makefile we provide you; this will cause make to call your rpcgenerate program on IDL files when you ask to build things like xxxx.proxy.o. Be careful: you might also wind up calling your generator on simplefunction.idl, and thereby overwriting the sample proxies and stubs we give you. You might want to make spare copies just in case (or you can always retrieve them from git). If we test with our own IDL, we will link your proxies and stubs with our own test application framework.

Note: although we provide a few IDL files and functions to start you on your testing, your goal is to support any legal IDL interface! Therefore, you should test your code not just with the provided samples, but with others that you create as unit tests for your code. To do that, you will have to write some IDL, and some test functions and clients to play with. (You can create yours by adapting the ones provided.)

Getting the Code

The code for this project is in three distribution directories, but most likely you'll only need to copy one of them; the others are referenced automatically from the make files.

Note that the directory that you ultimately submit with your implementation must be named RPC, but you will likely create it by adapting what's in RPC.samples.

Strategy

This project will be much easier if you approach it in an orderly way, making progress every week. We suggest you do the following steps in roughly the following order. This section provides a quick overview of the steps you will take; many are explored in much more detail later in this writeup.

Experiment with our code

Start by looking over and understanding our sample code. A detailed guide is provided below, and you should also look again at the slides from our in-class introduction.

In practice, you will almost surely want to learn to handle each case manually first. So, you will probably want to start by manually implementing some proxies and stubs, first for the very simple cases like arithmetic, and then for an example using structs or arrays.

To help you learn how things work, we provide a fully worked example of an idl file called simplefunction.idl and just for that one we give you simplefunction.proxy.cpp and simplefunction.stub.cpp, as well as simplefunctionclient.cpp. We also give you a Makefile that builds runnable executables simplefunctionclient and simplefunctionserver. In short, we give you a completely worked example of runnable code that creates a network connection, and remotely invokes a trivial function.

If you understand how the code works, you will notice that we build a simplefunctionserver executable even though there is no simplefunctionserver.cpp source file. Figuring that out will show you why all servers are built from the same source file for the main function. You will also see how our code uses the c150streamsocket class, which is the TCP equivalent of the UDP support you used for filecopy.

Manually build some proxies and stubs

The next step is for you to manually build some proxies and stubs. To get you started on that we've provided arithmetic.idl, arithmetic.cpp (which implements the functions), and arithmeticclient.cpp. Floating-point equivalents with names like floatarithmetic.idl are also included. We have not provided proxies and stubs for these; that's your job! Doing that much should be simple, since it's a very small change beyond what we show you how to do for simplefunction.idl.

Then go on to generate some proxies and stubs to handle structures and arrays. You can either adapt some of the sample IDL that we provide, or create your own test cases. You will have to provide your own unit tests similar to the ones we have created for arithmetic and floatarithmetic (i.e., we aren't implementing the functions or the client for you at this point).

Learn about types, our IDL parser and the JSON generator.

This is a good time to learn about the type system and the IDL parser that we provide for you (you will want to carefully study https://www.cs.tufts.edu/comp/117/assts/rpc_typesystem). The parser is a C++ framework that we have written to process IDL files. Remember that for this project, IDL files are just a very restricted subset of C++ .h files, so the parser is basically parsing .h files.

Whether you are planning to use Ruby, Python or C++, you should become familiar with the program we supply called idl_to_json. That program calls the parser on the IDL file(s) you supply and prints the results of the parse as a JSON file. This program is valuable for at least three reasons:

Start planning your proxy and stub generator

As you better understand the type system, start thinking about how you will structure an rpcgenerate implementation to automatically generate the proxies and stubs you will need. Don't try to implement everything at once, but try from the start to build a structure that will extend to handle the more complex cases.

Think hard about invariants, pre-conditions and post-conditions:

Once you've got a strategy, and verified on paper that it's likely to work, start implementing the simple cases in rpcgenerate, demonstrating that it can automatically generate the same sorts of proxies and stubs you did manually. As noted above, you'll get partial credit for that.

The key to this project is thinking clearly: if you just start coding rpcgenerate you're likely to make a mess that won't scale; if you have a clear design in mind for how the pieces will work together, and how the message format is related to the function interfaces and your generated code structures, this assignment can be done quite cleanly and without terrible complexity!
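
To make the meta-programming idea concrete, here is a minimal sketch of a generator step that turns a description of one function into the C++ text of a proxy skeleton. Everything in it is an assumption made for illustration: the FunctionDecl struct and emitProxySkeleton are hypothetical names, and a real rpcgenerate would get its function descriptions from the provided IDL parser or from the idl_to_json output rather than from a hand-built struct.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory description of one IDL function.
struct FunctionDecl {
    std::string returnType;
    std::string name;
    std::vector<std::pair<std::string, std::string>> args; // (type, name)
};

// Emit the C++ text of a proxy function skeleton for one declaration.
// The generated body is only a placeholder comment.
std::string emitProxySkeleton(const FunctionDecl &fn) {
    std::ostringstream out;
    out << fn.returnType << " " << fn.name << "(";
    for (std::size_t i = 0; i < fn.args.size(); ++i) {
        if (i > 0) out << ", ";
        out << fn.args[i].first << " " << fn.args[i].second;
    }
    out << ") {\n";
    out << "    // TODO: marshal arguments, send request, read reply\n";
    out << "}\n";
    return out.str();
}
```

This sketch only handles arguments whose declaration is "type name". Array arguments need more care, because in C++ the array bounds follow the parameter name, and that is exactly the kind of case worth planning for before writing much generator code.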

Update your Makefiles

Assuming you've built an rpcgenerate you must update your Makefile so that it will correctly build xxx.proxy.cpp and xxx.stub.cpp given xxx.idl. Some of the commented code already in the Makefile may be helpful if uncommented (and if necessary adapted). There's already a rule that builds a .o from any .cpp, so those rules together automatically give you .o from .idl.

With all that in place, your Makefile can have client applications depend on xxx.proxy.o and servers depend on xxx.stub.o. Any source files that include xxx.idl should also have dependencies on it, of course. Whenever your xxx.idl changes and you redo your make, all of the following should happen automatically:

This is very typical of how the tooling in a "real" RPC system works. Almost completely automatically, the remoting code needed for your functions is generated, compiled and linked. The client and server source code has almost nothing in it to suggest that remoting is happening, except for the one call we add in the client to make a connection to the server!

Suggested schedule

Intermediate work will not be collected and graded, but this assignment is way too big to tackle in one big push at the end. We strongly suggest that you stick to the following schedule:

If there is anything you don't understand when reviewing the samples, please ask! It will be very difficult for you to complete your work successfully if you don't understand them. You do not have to understand the internals of the IDL parser framework, but for the RPC generator phase, you do have to know how to use it, either by calling the parser from C++, or by using idl_to_json to generate JSON that will be used by your Python or Ruby program.

Studying the sample code

As noted above, we give you some completely worked samples and of course the parsing code and JSON generator.

So, what's missing in the samples? Several things. Keep in mind the two main goals you will be trying to achieve: first to learn how to write proxies and stubs by hand, and second to write a program that will generate them automatically. Anyway, among the crucial things missing in the samples that you will eventually have to provide are:

Suggested order of study for sample code:

It's suggested that you work in the following order to learn what's been provided to you and plan your project:

Studying the sample remote application

Before reading this section, you might want to review the slides presented in class that explain what's discussed below.

It's inherent in the nature of RPC that multiple separately compiled files have to be brought together in ways that are sometimes subtle. This is because the client and the server need to agree on the interface to each function that is to be available remotely, yet they need different implementations. Specifically, for each IDL file to be prepared for remote use:

For our project, the IDL files are a very restricted subset of C++ .h file syntax, so the IDL files can be #include'd directly into the applications, proxies, and stubs to ensure that all have the correct interface. More information on the IDL files is provided below.

More details on the frameworks and sample code

The following sections briefly discuss the frameworks and samples that are new for this assignment:

c150streamsocket

For the file copy assignment you used a class named c150dgmsocket to provide convenient UDP services. For this assignment, a similar class named c150streamsocket is available and is used by the samples. It implements a simple TCP stream-based client/server. To help you learn it, slightly modified versions of the ping samples are provided: these are called pingstreamclient and pingstreamserver. If you study these, it should be trivial to figure out how to use c150streamsocket in the intended manner. Please make sure right away that the samples work for you, so we can debug problems early.

A couple of details may not be quite obvious about the stream classes: first of all, as is the case with all TCP streams, message boundaries are not preserved. If you do two writes of 50 bytes each it's quite possible that a read of 75 bytes will succeed, leaving 25 bytes to be read the next time. Also: UNIX and Linux sockets provide very flexible facilities for writing servers that deal with multiple clients at a time. The framework provided to you does not try: it implements a model in which you accept a connection, work with it until there is no more incoming data (read eof), then close the connection and accept another. Those of you who are familiar in detail with socket programming will know that doing all this usually requires an extra socket for accepting connections, plus one for each actual client connection. That is hidden from you by the c150streamsocket class: you do the accepts and the reads/writes on the same instance, and there can be at most one connected client at a time. Most of the other details should be familiar from the UDP framework, or else should be obvious from the provided samples.
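
Because message boundaries are not preserved, code that expects a message of a known size generally has to loop until all of the bytes have arrived. The following is a minimal sketch of that loop. It does not use the real c150streamsocket interface: ReadFn is a hypothetical stand-in for whatever read call you use, assumed here to return the number of bytes read, or 0 at end of stream, so you will need to adapt it to the actual class.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a stream read call: reads up to `len` bytes
// into `buf` and returns the number of bytes read, or 0 at end of stream.
using ReadFn = std::function<std::size_t(char *buf, std::size_t len)>;

// Keep reading until exactly `count` bytes have arrived.
// Throws if the stream ends before the full message is available.
std::string readExactly(const ReadFn &readSome, std::size_t count) {
    std::string result(count, '\0');
    std::size_t have = 0;
    while (have < count) {
        // A single read may return fewer bytes than requested.
        std::size_t got = readSome(&result[have], count - have);
        if (got == 0) {
            throw std::runtime_error("stream closed before full message arrived");
        }
        have += got;
    }
    return result;
}
```

With a helper like this, the two 50-byte writes in the example above can be recovered as two 50-byte messages no matter how the bytes happen to be split across individual reads.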

Note that the sample proxies and stubs provided for you (discussed below) also show how c150streamsocket is used.

Sample remote application: proxies, stubs and IDL

As noted above, a complete pair of client and server, with proxies and stubs is provided to illustrate the remoting of three trivial functions. The applications are simplefunctionclient and simplefunctionserver. The client takes a server name as argument; the server takes no arguments.

The remoted functions are declared in the file named simplefunction.idl. Note that this is syntactically a C++ header file and is included several places in the sample. Provided for you are handwritten proxies and stubs corresponding to this IDL: simplefunction.proxy.cpp and simplefunction.stub.cpp.

Study these! Look at the way they are linked into the client and server applications. Look at how they are handled in the Makefile. Understand how the proxy appears to the client application so that it can make func1, func2 and func3 appear to be local. Look in simplefunction.stub.cpp and be sure you understand why it has routines with names like __func1. Why can't we just name those func1?

By the way, the simple protocol illustrated in these proxies and stubs isn't very robust. You will be modifying or replacing it anyway when you design your own remoting protocol to support arguments and return values. (The provided one just sends the function name as a null-terminated string. The server responds with null-terminated strings "DONE" or "BAD" depending on whether the function name was recognized. I suspect that, for reasons discussed in class, you may want to send explicit lengths rather than using nulls to delimit strings in your protocol.)
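
To illustrate what sending explicit lengths might look like, here is a minimal sketch of one possible framing: a decimal length, a ':' separator, and then the raw bytes. This format and the names encodeString and decodeString are assumptions made for illustration only; they are not part of the provided samples, and you are free to design something different.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical framing: decimal length, ':' separator, then the raw bytes.
std::string encodeString(const std::string &payload) {
    return std::to_string(payload.size()) + ":" + payload;
}

// Decode one framed string starting at `pos` in `buffer`,
// advancing `pos` past it. Throws on malformed or truncated input.
std::string decodeString(const std::string &buffer, std::size_t &pos) {
    std::size_t colon = buffer.find(':', pos);
    if (colon == std::string::npos || colon == pos) {
        throw std::runtime_error("missing length prefix");
    }
    std::size_t length = 0;
    for (std::size_t i = pos; i < colon; ++i) {
        if (buffer[i] < '0' || buffer[i] > '9') {
            throw std::runtime_error("length prefix is not a number");
        }
        length = length * 10 + static_cast<std::size_t>(buffer[i] - '0');
    }
    if (buffer.size() - (colon + 1) < length) {
        throw std::runtime_error("payload is truncated");
    }
    std::string payload = buffer.substr(colon + 1, length);
    pos = colon + 1 + length;
    return payload;
}
```

Because the receiver knows the payload length before reading the payload, the string itself may contain null bytes or ':' characters without confusing the parser, which is not true of a null-delimited format.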

If you do a make all you will find two applications built using the proxy and the stub: simplefunctionclient and simplefunctionserver. Try these on our virtual servers, being sure to start the server end first. Note that the usual debug framework is available and the log files should show you quite a bit about what the proxies and stubs are doing.

You must write grading logs

To make it easier for us to grade your project, we are asking that you put useful information into grading log files just as you did with the file copy assignment. The general instructions are exactly the same.

Please record all significant events with data. Specifically, record at the server and the client when you are handling a request to invoke a function or return value, and provide useful information about the messages that you send using TCP.

Hints and Warnings

Here are a few additional hints and warnings regarding this project:

Hints and Warnings For Python and Ruby Users

We will be adding hints and warnings here as we get more experience using Python and Ruby. For now:

Teams

The rules regarding programming teams will be the same as for the file copy assignment. You may work with the same partner as last time if you prefer, but where practical you are encouraged to switch as you will probably learn more by working with different partners on different assignments. Remember: you and your partner must not split the work. You must do all design, coding, and debugging in person, together.

WARNING: this is a very substantial assignment and you will likely find it much easier to complete if you have a partner. As always, we expect that both partners will be substantially involved in all significant design work, but this RPC project can involve a fair amount of detail work, such as preparing the boilerplate text for proxies and stubs once your basic design has been established. It is acceptable to divide up such tasks as long as both students are fully aware of what is being done, and as long as the core architectural decisions, code generation architecture, and protocol design are done jointly. Where students work alone for good reason, e.g. if they are based off-campus, expectations will be adjusted accordingly. Students who choose to work alone in spite of partners being available will not benefit from such accommodation.

Preparing your report

As with the file copy assignment, the report you submit with your project will be a significant part of the grading process, but unless you've done something unusual, the necessary report will probably be much briefer. The template includes a few standard questions, and has a section where you can add any additional details we might need.

Template for report - Download template for report

Submitting your work for grading

Before you submit, we urge you to reread the section on Hints and Warnings. There might be a reminder there of something you need to fix before submitting.

The following are the steps you should take to prepare and submit your work. If you are on a team, one student should follow these steps, and the other should follow the instructions under team submission below (the same as for the file copy assignment).

Preparing your work for submission

Please consider the following checklist before submitting your work:

Submitting your work

One team member from each team should submit the code and report described above:

cd <parent-directory>
provide comp117 rpc RPC rpcreport.html <explain.txt>

RPC must be the name of the directory in which your code is built. rpcreport.html is your report. In most cases explain.txt need not be provided, but this is the place for the submitting team member to give explanations of any personal issues that might need attention (explanations for lateness, illness, etc.) Information related to actually running and grading the submission should be in the report itself.

Instructions for team members

If you are a member of a team, then one of you should submit your complete project as described above. Immediately after that's done, the other should:

provide comp117 rpc teamreport.txt <explain.txt>

teamreport.txt should be a short text file indicating your name, and your team member's name. It should indicate: "I hereby certify that the submission by <partner's name> on <date> and <time> is our joint submission. Both the code and the report included with that submission are our joint work, and should be the basis for my grade for the rpc assignment."

If there is any additional information, e.g. relating to personal issues (illness etc.), then either partner can provide that in an explain.txt file, as usual. As noted above, information related to actually running and grading the submission should be in the report itself.

Commenting and code quality

Please be sure to follow the CS 117 coding standards. In programs of this complexity it's particularly important that you organize and comment your code so that a grader can figure out how it works. If your code is not pleasant to read, well organized, and reasonably well commented, you will lose credit. Even code that works may not be judged well if we can't easily figure out why it works.

This assignment is a little unusual, because there is C++ code you will write by hand in the obvious way, and C++ that will be written by your RPC generator. The former must conform to the course coding standards. We understand that things like indenting for generated code can be tricky, and we don't expect you to burn lots of time on that. We do suggest, for your benefit and ours, that you make the generated code as easy to read and debug as you conveniently can.

WARNING: in past years, we were disappointed by the code quality of many of the file copy submissions (we are still working on this year's file copy grading). Deductions for that assignment were modest, on the assumption that expectations might not have been clear.