Tufts CS 117 (Fall 2024):
Internet-scale Distributed Systems

Tufts CS 117:
Type System and Tools
for the RPC Programming Assignment

Overview

The RPC assignment uses a type system that is a very restricted subset of C++ types. Interface definitions are C++ type and function declarations, packaged into includable files; these have the extension .idl but otherwise resemble and are usable like C++ .h files.

This document describes:

Types and IDL

This section describes the types that must be supported for the arguments and return values of functions.

Built-in primitive types

The following atomic types are built in, and cannot be re-declared in IDL: void, float, int, and string (the PREDEFINEDTYPE alternatives in the grammar below).

Structured types (structs & arrays)

IDL is used to create struct and array types. Members of structs and arrays may be of the primitive types listed above, or may be other structures or arrays. In general, forward references to types declared later in the IDL are not permitted.

Detailed syntax is provided below, but informally, structs are declared using a limited version of C++ syntax, for example:

struct Person {
  string firstName;
  string lastName;
  int age;
};

Note that semicolons are required after each member declaration, as well as following the curly brace that completes the entire structure. The isStruct() method on the corresponding TypeDeclaration object created by the IDL parser (described below) will return true, and getStructMembers() will return a C++ vector with three members: one for firstName, one for lastName, and one for age.

Array types have fixed bounds and are created as a byproduct of their use in structures or function signatures. For example:

struct takesTwoArrays {
  int x[10];
  int y[20];
};

This declares a structure with two arrays of integers, one with 10 elements and one with 20. When this declaration is encountered, the IDL parser will create not just the type declaration for the structure, but also declarations for two types named __int[10] and __int[20]. (The names are mostly not important, but may show up in debugging output if you ask the type for its name.) The isArray() method on the corresponding TypeDeclaration objects returns true. The getArrayMemberType() method returns a pointer to the TypeDeclaration object for the member type (int in both examples above), and the getArrayBound() method returns the number of elements (10 or 20 respectively).

Note that duplicate types are not created; the following struct would result in a single type __int[10] shared by both the members:

struct takesTwoArrays {
  int x[10];
  int y[10];
};

All of this is handled for you by the IDL parser. The idl_to_json program offered for use with Ruby or Python versions of rpcgenerate uses the same IDL parser, so the JSON will contain information for the same type names described above.

Note that arrays of arrays are supported, just as in C and C++:

struct takesTwoArrays {
  int x[10][20];
};

This results in a type named __int[10][20]; note that the underscores (_) are not doubled for the second dimension. Again, you will mainly see these type names in debugging output, since they are maintained for you by the parser. Multidimensional arrays are modeled as arrays of arrays, so the above struct implicitly defines two array types:

  1. Typename: __int[20] Membertype: int Bound: 20
  2. Typename: __int[10][20] Membertype: __int[20] Bound: 10
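
The naming pattern in the list above can be sketched in a few lines of Python. This is only an illustration of the pattern as described here, not the parser's actual code; the function name implied_array_types is hypothetical, and the assumption that the same rule extends to three or more dimensions is an extrapolation from the two-dimensional example.

```python
def implied_array_types(base_type, bounds):
    """Return (type_name, member_type, bound) tuples, innermost array first.

    For `int x[10][20]` call implied_array_types("int", [10, 20]).
    Hypothetical helper: it mirrors the naming shown in the text only.
    """
    result = []
    member = base_type      # member type of the innermost array
    suffix = ""             # accumulated "[n]..." part of the type name
    for bound in reversed(bounds):
        suffix = f"[{bound}]" + suffix
        name = f"__{base_type}{suffix}"
        result.append((name, member, bound))
        member = name       # the next array outward holds this array type
    return result


for name, member, bound in implied_array_types("int", [10, 20]):
    print(f"Typename: {name} Membertype: {member} Bound: {bound}")
```

Walking the bounds from right to left matches the arrays-of-arrays model: the rightmost bound belongs to the innermost array, whose members are the base type.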

There are no pointers or pointer types supported by this IDL or RPC framework; all you need to handle are the built-in types, structures, and arrays. As shown, though, types can be composed to arbitrary depth: arrays of structures and structures containing array members should be supported.

Functions and IDL

Functions in IDL are declared using traditional C/C++ prototype syntax:

int multiply(int x, int y);           // accept two ints, return an int
void addEmployee(Person newEmployee); // Uses Person struct from above
Person getEmployee(string lastName);  // Return  values can be structs
int max(int numbers[100]);            // Array types can be declared

Functions can return structures, as shown above, but not arrays. All arguments to functions, including arrays, are passed by value, not by reference; if a routine like max were for some reason to update its input array, that update would not be sent back to the client. Function arguments have names as well as types, but as with C and C++, parameter passing is by position. No means is provided for setting parameters by name on a function call. Pointer types are not supported. Neither are C++ references (like int&).

At most one function with a given name is allowed in each IDL file; overloaded functions are not supported.

IDL File Syntax

The IDL file syntax is:


TOKEN = ...see C++ rules for identifier names
NUMBER = 1*DIGITS      ; NUMBER is one or more digits

;
;   Type declarations
;
PREDEFINEDTYPE = "void" / "float" / "int" / "string"
USERDEFINEDTYPE = TOKEN   ; must be declared in the IDL to be accepted   
TYPE = PREDEFINEDTYPE / USERDEFINEDTYPE

;
;   Function declarations
;
ARGUMENTNAME = TOKEN
ARGUMENT = TYPE ARGUMENTNAME *( "[" NUMBER "]" )
RETURNTYPE = TYPE
FUNCTIONNAME = TOKEN
FUNCTIONDECL =  RETURNTYPE FUNCTIONNAME "("  ")" ";" 
             /  RETURNTYPE FUNCTIONNAME "(" ARGUMENT  *("," ARGUMENT ) ")" ";" 

;
;   Structure declarations
;
MEMBERNAME = TOKEN
MEMBER = TYPE MEMBERNAME *( "[" NUMBER "]" ) ";" 
STRUCTNAME = TOKEN
STRUCTDECL = "struct" STRUCTNAME "{" *MEMBER "}" ";"

;
;   Whole IDL file
;

IDLFILE = *(STRUCTDECL / FUNCTIONDECL)

The grammar does not illustrate it, but whitespace may be used freely between separately named non-terminals and/or separately quoted terminals. Except for whitespace, absolutely nothing else can be in an IDL file. In particular, NO COMMENT SYNTAX IS RECOGNIZED, so the // comments in the function examples above are for exposition only and must not appear in a real IDL file. The only reason for this is that there has not yet been time to upgrade the parser to skip comments. DO NOT EXPECT THE PARSER TO ACCEPT OTHER CONSTRUCTIONS EVEN IF THEY ARE PERFECTLY LEGAL C OR C++!
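
To make the grammar concrete, here is a minimal Python sketch of a recognizer for it. It is not the provided IDL parser and does not share its output format: the names tokenize and parse_idl are hypothetical, it does not check that user-defined types were declared earlier, and it does not build the implied array types. It does show how little the grammar allows, including the fact that a comment is rejected.

```python
import re

# One token: an identifier, a number, or a single punctuation character.
TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[0-9]+|[{}()\[\];,])")
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def tokenize(text):
    """Split IDL text into tokens; anything else (e.g. a comment) is an error."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character near offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_idl(text):
    """Return (structs, functions) dictionaries for text matching IDLFILE."""
    toks = tokenize(text) + [None]          # None marks end of input
    pos = 0

    def take(expected=None):
        nonlocal pos
        tok = toks[pos]
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def ident():
        tok = take()
        if not IDENT_RE.match(tok):
            raise ValueError(f"expected a name, got {tok!r}")
        return tok

    def typed_name():
        # TYPE NAME *( "[" NUMBER "]" )  ->  (type, name, [bounds])
        type_name, name, bounds = ident(), ident(), []
        while toks[pos] == "[":
            take("[")
            bounds.append(int(take()))      # int() rejects a non-number
            take("]")
        return type_name, name, bounds

    structs, functions = {}, {}
    while toks[pos] is not None:
        if toks[pos] == "struct":           # STRUCTDECL
            take("struct")
            name = ident()
            take("{")
            members = []
            while toks[pos] != "}":
                members.append(typed_name())
                take(";")
            take("}")
            take(";")
            structs[name] = members
        else:                               # FUNCTIONDECL
            return_type, name = ident(), ident()
            take("(")
            args = []
            if toks[pos] != ")":
                args.append(typed_name())
                while toks[pos] == ",":
                    take(",")
                    args.append(typed_name())
            take(")")
            take(";")
            functions[name] = (return_type, args)
    return structs, functions
```

For example, parse_idl("int max(int numbers[100]);") yields a functions dictionary with one entry for max, while the same text followed by a // comment raises ValueError.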

Type implementation details

IMPORTANT: For this assignment, you MUST make the following assumptions:

Again, the main implication of all of the above is that you should plan on serializing each field individually, and in a machine-independent form.
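
As an illustration of field-by-field, machine-independent serialization, here is a minimal Python sketch for the Person struct declared earlier. The wire format is an assumption made for this example only: a 4-byte big-endian signed integer for int, and a 4-byte length followed by UTF-8 bytes for string. Nothing here prescribes the format your generated C++ proxies and stubs must use, and all function names are hypothetical.

```python
import struct


def pack_int(value):
    """Assumed wire form: 4-byte signed integer in network (big-endian) order."""
    return struct.pack("!i", value)


def pack_string(value):
    """Assumed wire form: 4-byte unsigned length, then the UTF-8 bytes."""
    data = value.encode("utf-8")
    return struct.pack("!I", len(data)) + data


def pack_person(person):
    """Serialize each field of Person individually, in declaration order."""
    return (pack_string(person["firstName"])
            + pack_string(person["lastName"])
            + pack_int(person["age"]))


def unpack_int(buf, offset):
    """Return (value, offset just past the value)."""
    (value,) = struct.unpack_from("!i", buf, offset)
    return value, offset + 4


def unpack_string(buf, offset):
    """Return (string, offset just past the string)."""
    (length,) = struct.unpack_from("!I", buf, offset)
    start = offset + 4
    return buf[start:start + length].decode("utf-8"), start + length


def unpack_person(buf, offset=0):
    """Read the fields back in the same order they were written."""
    first, offset = unpack_string(buf, offset)
    last, offset = unpack_string(buf, offset)
    age, offset = unpack_int(buf, offset)
    return {"firstName": first, "lastName": last, "age": age}, offset


person = {"firstName": "Ada", "lastName": "Lovelace", "age": 36}
wire = pack_person(person)
decoded, consumed = unpack_person(wire)
```

Because every field is written explicitly, the bytes on the wire do not depend on the sender's struct padding or byte order, which is the point of serializing fields one at a time rather than copying a struct's memory.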

Exceptions Raised by Functions

In C++ the possibility that a function will throw an exception is considered part of its signature. See, for example, this informal tutorial. You should assume that the functions described by IDL are implicitly declared as noexcept(true), which means that they will not throw exceptions. Note that this is different from the C++ default, which is that unless otherwise declared a function may raise an exception.

An obvious consequence of the fact that functions must not raise exceptions is that implementations of RPC for CS 117 need not handle functions that raise exceptions. The behavior of your system when a function does raise an exception (e.g. if a remote call is made to a function that divides by zero, or that explicitly throws an exception) is undefined; you may do whatever you like in such a case.

The IDL Parser

This section of the instructions will mainly be of interest to those building rpcgenerate using C++. If you are using Python or Ruby, then you can just use the JSON produced by idl_to_json.

Although no RPC generator is provided for you, a moderately sophisticated IDL parser is provided. This parser is used by idl_to_json.cpp, so reading that source file is the best way to learn to call the parser from C++. You give this parser any C++ input stream containing IDL (typically from opening a file in the obvious way — idl_to_json.cpp has what you need), and it constructs an object of class Declarations.

  #include <fstream>
  #include "declarations.h"

  //
  // Open the file
  //
  std::ifstream idlFile(fileName);   // open the IDL file for reading

  if (!idlFile.is_open()) {
        // ... Error handling code here ...
  }
  // The following line does all the work parsing the file into
  // the variable parseTree
  Declarations parseTree(idlFile);

This object is the root of a parse tree that contains the following:

Each of the declaration objects mentioned supports a getName() method giving the name of the type, function, or argument respectively. As described above, methods are provided for struct and array types that allow you to find the array bounds, struct member types, array member types, etc.

Remember, your job will be to write an RPC generator program that reads in an IDL file and automatically produces proxies and stubs similar to the handwritten ones in the samples (you don't have to make them look the same, you have to make them work!). To do that, you will almost surely want to work through the parse tree for functions in very much the same way that idl_to_json.cpp does. You may adapt idl_to_json.cpp to become the main program for your rpcgenerate, but you will lose credit if you leave lots of old misleading comments in the source!

General Hints and Warnings

Here are a few additional hints and warnings regarding this project:

Hints and Warnings For Python and Ruby Users

The following sections discuss the sample Ruby and Python code we provide for accessing IDL type information. If you are using those languages, you will likely want to adapt the samples for use in your rpcgenerate program.

A Ruby Example

The following demonstration code is in the RPC.samples directory for you to play with. It shows how a Ruby program can easily invoke idl_to_json and use the output. The program is named print_functions.rb.

#!/bin/env ruby
#
#          print signatures of all the functions named in supplied IDL file
#

require 'json'

IDL_TO_JSON_EXECUTABLE = './idl_to_json'

#
#     Make sure invoked properly
#

abort "Usage: #{$PROGRAM_NAME} <idlfilename>" if ARGV.length != 1


#
#     Make sure file exists and is readable
#
filename = ARGV[0]
abort "#{$PROGRAM_NAME}: no file named #{filename}" if not File.file? filename
abort "#{$PROGRAM_NAME}: #{filename} not readable" if not File.readable? filename

#
#     Parse declarations into Ruby hash
#
if !File.executable?(IDL_TO_JSON_EXECUTABLE)
  abort "#{IDL_TO_JSON_EXECUTABLE} does not exist or is not executable..."
end
json_string = `#{IDL_TO_JSON_EXECUTABLE} #{filename}`
abort "#{$PROGRAM_NAME}: Failed to parse IDL file #{filename}" if $? != 0
decls =  JSON.parse(json_string)

#
#     Print the function signatures
#
decls["functions"].each do |name, sig|

  # Ruby Array of all args (each is a hash with keys "name" and "type")
  args = sig["arguments"]

  # Make a string of form:  "type1 arg1, type2 arg2" for use in function sig
  argstring = args.map{|a| "#{a["type"]} #{a["name"]}"}.join(', ')

  # print the function signature
  puts "#{sig["return_type"]} #{name}(#{argstring})"

end

Most of this should be straightforward if you know Ruby. One possible exception is the line:

argstring = args.map{|a| "#{a["type"]} #{a["name"]}"}.join(', ')

The map call, as you might expect, maps over the items in list args creating a new list. Each item in that list is of the form "type argname". On that new list a call is made to join(', '), which joins all the strings into a single string, using ", " as the glue. The resulting string might look like:

int width, float distance, string s

A Python Example

Here is a version in Python, named print_functions.py. The logic in this is intentionally as similar as possible to the Ruby, so you can compare them, and also use one as a guide to learning the other.

#!/bin/env python3
#
#          print signatures of all the functions named in supplied IDL file
#

import subprocess
import json
import sys
import os

IDL_TO_JSON_EXECUTABLE = './idl_to_json'

try:
    #
    #     Make sure invoked properly
    #
    assert len(sys.argv) == 2, "Wrong number of arguments"

    #
    #     Make sure IDL file exists and is readable
    #
    filename = sys.argv[1]
    assert os.path.isfile(filename), f"Path {filename} does not designate a file"
    assert os.access(filename, os.R_OK), f"File {filename} is not readable" 

    #
    #     Make sure idl_to_json exists and is executable
    #
    assert os.path.isfile(IDL_TO_JSON_EXECUTABLE), f"Path {IDL_TO_JSON_EXECUTABLE} does not designate a file...run \"make\" to create it" 
    assert os.access(IDL_TO_JSON_EXECUTABLE, os.X_OK), f"File {IDL_TO_JSON_EXECUTABLE} exists but is not executable"

    #
    #     Parse declarations into a Python dictionary
    #
    decls = json.loads(subprocess.check_output([IDL_TO_JSON_EXECUTABLE, filename]))

    #
    # Loop printing each function signature
    #
    for  name, sig in decls["functions"].items():

        # Python List of all args (each is a dictionary with keys "name" and "type")
        args = sig["arguments"]

        # Make a string of form:  "type1 arg1, type2 arg2" for use in function sig
        argstring = ', '.join([a["type"] + ' ' + a["name"] for a in args])

        # print the function signature
        print(f"{sig['return_type']} {name}({argstring})")

except Exception as e:
    print(str(e), file=sys.stderr)
    print(f"Usage: {sys.argv[0]} <idlfilename>", file=sys.stderr)

If you are new to Python, a few of the constructions here may seem a bit tricky:

decls = json.loads(subprocess.check_output([IDL_TO_JSON_EXECUTABLE, filename]))

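subprocess.check_output() runs the supplied command and returns its standard output. In Python 3 the result is a bytes object by default rather than a str, which json.loads() accepts directly. If the command exits with a non-zero status, check_output() raises subprocess.CalledProcessError. So, here we are running the idl_to_json program on the named file, and json.loads() interprets the resulting output as JSON, creating the dictionary we need.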
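
The behavior of check_output() can be tried without building idl_to_json. The following minimal sketch uses a child Python process as a stand-in for idl_to_json; the stand-in command and the load_declarations helper are hypothetical and exist only for this demonstration.

```python
import json
import subprocess
import sys

# Stand-in for idl_to_json: a child Python process that prints a JSON object.
STAND_IN = [sys.executable, "-c",
            "import json; print(json.dumps({'functions': {}}))"]


def load_declarations(command):
    """Run command and parse its standard output as JSON."""
    try:
        raw = subprocess.check_output(command)    # bytes, not str
    except subprocess.CalledProcessError as err:
        # Raised when the command exits with a non-zero status.
        raise RuntimeError(
            f"command failed with exit status {err.returncode}") from err
    return json.loads(raw)                        # json.loads accepts bytes


decls = load_declarations(STAND_IN)
```

Catching CalledProcessError separately lets a generator report a failed parse distinctly from a usage error, rather than folding every failure into one message.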

argstring = ', '.join([a["type"] + ' ' + a["name"] for a in args])

Start with the inner [a["type"] + ' ' + a["name"] for a in args]. This is a great example of what Python calls a list comprehension. What it does is to construct a new list by looping through (or mapping if you prefer) the items in list args. For each such item a in args, it computes the string a["type"] + ' ' + a["name"], which is the argument type and the argument name separated by a space.

Then on the resulting list a call is made to ', '.join(), which joins all the items in the list using the string ', ' as the glue between the items. The resulting string might look like:

int width, float distance, string s
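
One wrinkle when adapting this for code generation: if an argument's type is one of the parser-created array types, joining type and name directly would give something like "__int[100] numbers", which is not a valid C++ declaration. The sketch below turns such a pair back into C++ syntax. It assumes that array arguments appear in the JSON under the type names described earlier (for example __int[100]); check the actual output of idl_to_json before relying on that. The function names cpp_declaration and cpp_signature are hypothetical.

```python
def cpp_declaration(type_name, var_name):
    """Return a C++ declaration for var_name, e.g. 'int numbers[100]'.

    Assumes array type names look like '__int[10][20]', as described
    in the section on structured types.
    """
    if type_name.startswith("__") and "[" in type_name:
        # Split '__int[10][20]' into 'int' and '10][20]'.
        base, _, dims = type_name[2:].partition("[")
        return f"{base} {var_name}[{dims}"
    return f"{type_name} {var_name}"


def cpp_signature(name, sig):
    """Build a C++ prototype string from one entry of decls['functions']."""
    args = ", ".join(cpp_declaration(a["type"], a["name"])
                     for a in sig["arguments"])
    return f"{sig['return_type']} {name}({args})"


example = {"return_type": "int",
           "arguments": [{"name": "numbers", "type": "__int[100]"}]}
print(cpp_signature("max", example))
```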