Tufts CS 117: Type System and Tools for the RPC Programming Assignment

Overview
Types and IDL

Built-in primitive types
Structured types (structs & arrays)

Functions and IDL
IDL File Syntax
Type implementation details
Exceptions Raised by Functions
The IDL Parser
General Hints and Warnings
Hints and Warnings For Python and Ruby Users
- A Ruby Example
- A Python Example

Overview

The RPC assignment uses a type system that is a very restricted subset of C++ types. Interface definitions are C++ type and function declarations, packaged into includable files; these have the extension .idl but otherwise resemble and are usable like C++ .h files.

This document describes:

The type system itself
.idl files
The tools provided for parsing .idl files. Specifically:
- A C++ parsing framework that creates a tree of C++ objects corresponding to the types and functions in an idl file
- A C++ program called idl_to_json, which serves as a demonstration sample for that framework, but which is also very useful for writing the type information into a JSON file. (Those who build their RPC generators in Python or Ruby are likely to use only the JSON, and may have no need to learn details of the C++ framework.)
Some related code that's included in the RPC.samples distribution provided for the RPC assignment

Types and IDL

This section describes the types that must be supported for the arguments and return values of functions.

Built-in primitive types

The following atomic types are built in, and cannot be re-declared in IDL:

void
float
int
string

Structured types (structs & arrays)

IDL is used to create struct and array types. Members of structs and arrays may be of the primitive types listed above, or may be other structures or arrays. In general, forward references to types declared later in the IDL are not permitted.

Detailed syntax is provided below, but informally, structs are declared using a limited version of C++ syntax, for example:

struct Person {
  string firstName;
  string lastName;
  int age;
};

Note that semicolons are required after each member declaration, as well as following the curly brace that completes the entire structure. The isStruct() method on the corresponding TypeDeclaration object created by the IDL parser (described below) will return true, and getStructMembers() will return a C++ vector with three members, one for firstName, one for lastName, and one for age.

Array types have fixed bounds and are created as a byproduct of their use in structures or function signatures. For example:

struct takesTwoArrays {
  int x[10];
  int y[20];
};

This declares a structure with two arrays of integers, one with 10 elements and one with 20. When this declaration is encountered, the IDL parser will create not just the type declaration for the structure, but also declarations for two types named __int[10] and __int[20]. (The names are mostly not important, but may show up in debugging output if you ask the type for its name.) The isArray() method on the corresponding TypeDeclaration objects returns true. The getArrayMemberType() method returns a pointer to the TypeDeclaration object for the member type (int in both examples above), and the getArrayBound() method returns the number of elements (10 or 20 respectively).

Note that duplicate types are not created; the following struct would result in a single type __int[10] shared by both the members:

struct takesTwoArrays {
  int x[10];
  int y[10];
};

All of this is handled for you by the IDL parser. The idl_to_json program offered for use with Ruby or Python versions of rpcgenerate uses the same IDL parser, so the JSON will contain information for the same type names described above.

Note that arrays of arrays are supported, just as in C and C++:

struct takesTwoArrays {
  int x[10][20];
};

This results in a type named __int[10][20]; note that the leading underscores (__) are not repeated for each dimension, but again, you will mainly see these type names in debugging output, since they are maintained for you by the parser. Note that multidimensional arrays are indeed modeled as arrays of arrays. The above struct would implicitly define two array types:

Typename: __int[20] Membertype: int Bound: 20
Typename: __int[10][20] Membertype: __int[20] Bound: 10

There are no pointers or pointer types supported by this IDL or RPC framework; all you need to handle are the built in types, structures, and arrays, though as shown, types can be composed to arbitrary depth: arrays of structures and structures containing array members should be supported.
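As a rough illustration of the arrays-of-arrays model described above, the following Python sketch derives the array types implied by a declaration such as int x[10][20]. The function implied_array_types is hypothetical and is not part of the provided parser or of idl_to_json; the naming scheme for element types other than int is an assumption extrapolated from the __int[...] examples.

```python
def implied_array_types(base_type, bounds):
    """Return (typename, member_type, bound) tuples, innermost array first.

    base_type: element type name, e.g. "int"
    bounds:    array bounds in declaration order, e.g. [10, 20] for x[10][20]
    """
    result = []
    member = base_type   # the innermost array holds the base type
    suffix = ""
    # Walk from the last bound to the first: x[10][20] is an array of 10
    # elements, each of which is an array of 20 ints.
    for bound in reversed(bounds):
        suffix = f"[{bound}]" + suffix
        name = f"__{base_type}{suffix}"   # assumed naming scheme
        result.append((name, member, bound))
        member = name    # the next outer array holds this array type
    return result


for name, member, bound in implied_array_types("int", [10, 20]):
    print(f"Typename: {name} Membertype: {member} Bound: {bound}")
```

For the bounds [10, 20] this reproduces the two entries listed above, with __int[20] as the member type of __int[10][20].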

Functions and IDL

Functions in IDL are declared using traditional C/C++ prototype syntax:

int multiply(int x, int y);           // accept two ints, return an int
void addEmployee(Person newEmployee); // Uses Person struct from above
Person getEmployee(string lastName);  // Return  values can be structs
int max(int numbers[100]);            // Array types can be declared

Functions can return structures, as shown above, but not arrays. All arguments to functions, including arrays, are passed by value, not by reference; if a routine like max were for some reason to update its input array, that update would not be sent back to the client. Function arguments have names as well as types, but as with C and C++, parameter passing is by position. No means is provided for setting parameters by name on a function call. Pointer types are not supported. Neither are C++ references (like int&).

At most one function with a given name is allowed in each IDL file; overloaded functions are not supported.

IDL File Syntax

The IDL file syntax is:


TOKEN = ...see C++ rules for identifier names
NUMBER = 1*DIGITS      ; NUMBER is one or more digits

;
;   Type declarations
;
PREDEFINEDTYPE = "void" / "float" / "int" / "string"
USERDEFINEDTYPE = TOKEN   ; must be declared in the IDL to be accepted   
TYPE = PREDEFINEDTYPE / USERDEFINEDTYPE

;
;   Function declarations
;
ARGUMENTNAME = TOKEN
ARGUMENT = TYPE ARGUMENTNAME *( "[" NUMBER "]" )
RETURNTYPE = TYPE
FUNCTIONNAME = TOKEN
FUNCTIONDECL =  RETURNTYPE FUNCTIONNAME "("  ")" ";" 
             /  RETURNTYPE FUNCTIONNAME "(" ARGUMENT  *("," ARGUMENT ) ")" ";" 

;
;   Structure declarations
;
MEMBERNAME = TOKEN
MEMBER = TYPE MEMBERNAME *( "[" NUMBER "]" ) ";" 
STRUCTNAME = TOKEN
STRUCTDECL = "struct" STRUCTNAME "{" *MEMBER "}" ";"

;
;   Whole IDL file
;

IDLFILE = *(STRUCTDECL / FUNCTIONDECL)

The grammar does not illustrate it, but whitespace may be used freely between separately named non-terminals and/or separately quoted terminals. Except for white space, absolutely nothing else can be in an IDL file. In particular, NO COMMENT SYNTAX IS RECOGNIZED. The only reason for this is that there has not yet been time to upgrade the parser to skip the comments. DO NOT EXPECT THE PARSER TO ACCEPT OTHER CONSTRUCTIONS EVEN IF THEY ARE PERFECTLY LEGAL C OR C++!
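To make the ARGUMENT production concrete, here is a minimal Python sketch that recognizes a single argument with a regular expression. It is not the provided parser: parse_argument is a hypothetical helper, and the TOKEN pattern is a simplified ASCII approximation of the C++ identifier rules.

```python
import re

# Simplified stand-in for "see C++ rules for identifier names" (assumption).
TOKEN = r"[A-Za-z_][A-Za-z0-9_]*"

# ARGUMENT = TYPE ARGUMENTNAME *( "[" NUMBER "]" ), with free whitespace
# between the separately named pieces.
ARGUMENT_RE = re.compile(
    rf"\s*({TOKEN})\s+({TOKEN})\s*((?:\[\s*\d+\s*\]\s*)*)"
)


def parse_argument(text):
    """Return (type_name, argument_name, bounds) or raise ValueError."""
    match = ARGUMENT_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid IDL argument: {text!r}")
    type_name, arg_name, dims = match.groups()
    bounds = [int(n) for n in re.findall(r"\d+", dims)]
    return type_name, arg_name, bounds


print(parse_argument("int numbers[100]"))
print(parse_argument("Person newEmployee"))
```

Anything outside the grammar, such as a reference argument written as int& x, fails to match and raises ValueError, which mirrors the warning that legal C++ is not necessarily legal IDL.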

Type implementation details

IMPORTANT: For this assignment, you MUST make the following assumptions:

String data does not live entirely within the space allocated for a string variable, array element or structure member. To implement a remote call, you must explicitly serialize the actual contents of the string, and preserve its length.
You can assume that the int type represents a signed 32-bit integer, but you don't know whether the byte order will be the same at both ends of the connection. This means that you may either convert values to strings for transmission (probably the easiest way) or handle the byte order explicitly. You can assume we will use compilers that implement ints as 32 bit, regardless of whether we are running on a 32-bit or 64-bit architecture.
You may assume that floats are the usual IEEE 32 bit floating point used by C++, though again, byte order is not specified. As with integers, it's safer to convert to a character string for transmission (you may rely on any of the standard C/C++ formatting libraries... we won't judge you on the finer points of conversion accuracy). If necessary, you may just send the 32-bit binary number, but please indicate in your report which you chose to do.
As noted above, structure packing may in principle be different at two ends of the connection. Even if the two compilers use the identical representation for strings in the example above, there is no guarantee that someString would wind up at the same offset in the structure at the sending and receiving end. Indeed, you can tell that some interesting alignment is going on in the example above, because the int is 4 bytes, and the string (you can check) is 8, yet the struct as a whole is 16 bytes, not 12. Most likely, the compiler is "wasting" 4 bytes after the int, as most compilers want structs to begin on an 8 byte boundary.

Again, the main implication of all of the above is that you should plan on serializing each field individually, and in a machine-independent form.
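The field-by-field idea can be sketched as follows. The text above suggests string conversion as probably the easiest route; this sketch instead shows the other option it mentions, handling byte order explicitly, using Python's struct module. It is only an illustration of one possible wire layout, not the required one: the helper names and the 4-byte length prefix for strings are assumptions, and real proxies and stubs for the assignment would do the equivalent work in C++.

```python
import struct


def pack_int(value):
    """Signed 32-bit int in network (big-endian) byte order."""
    return struct.pack("!i", value)


def pack_float(value):
    """IEEE 32-bit float in network byte order."""
    return struct.pack("!f", value)


def pack_string(text):
    """Length-prefixed string: 4-byte unsigned length, then the bytes."""
    data = text.encode("utf-8")
    return struct.pack("!I", len(data)) + data


def unpack_string(buf, offset):
    """Return (string, offset just past the string)."""
    (length,) = struct.unpack_from("!I", buf, offset)
    start = offset + 4
    return buf[start:start + length].decode("utf-8"), start + length


def pack_person(first_name, last_name, age):
    """Serialize the Person struct one member at a time, in declared order."""
    return pack_string(first_name) + pack_string(last_name) + pack_int(age)


def unpack_person(buf):
    """Inverse of pack_person; returns (first_name, last_name, age)."""
    first_name, offset = unpack_string(buf, 0)
    last_name, offset = unpack_string(buf, offset)
    (age,) = struct.unpack_from("!i", buf, offset)
    return first_name, last_name, age


wire = pack_person("Ada", "Lovelace", 36)
print(unpack_person(wire))
```

Because each member is written separately and in a fixed byte order, the result does not depend on struct padding or on the endianness of either machine.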

Exceptions Raised by Functions

In C++ the possibility that a function will throw an exception is considered part of its signature. See, for example this informal tutorial. You should assume that the functions described by IDL are implicitly declared as noexcept(true), which means that they will not throw exceptions. Note that this is different from the C++ default, which is that unless otherwise declared a function may raise an exception.

An obvious consequence of the fact that functions must not raise exceptions is that implementations of RPC for CS 117 need not handle functions that raise exceptions. The behavior of your system when a function does raise an exception (e.g. if a remote call is made to a function that divides by zero, or that explicitly throws an exception) is undefined; you may do whatever you like in such a case.

The IDL Parser

This section of the instructions will mainly be of interest to those building rpcgenerate using C++. If you are using Python or Ruby, then you can just use the JSON produced by idl_to_json.

Although no RPC generator is provided for you, a moderately sophisticated IDL parser is provided. This parser is used by idl_to_json.cpp, so reading that source file is the best way to learn to call the parser from C++. You give this parser any C++ input stream containing IDL (typically from opening a file in the obvious way — idl_to_json.cpp has what you need), and it constructs an object of class Declarations.

  #include "declarations.h"

  //
  // Open the file
  //
  ifstream idlFile(fileName);        // open 

  if (!idlFile.is_open()) {
        // ... error handling code here ...
  }
  // The following line does all the work parsing the file into
  // the variable parseTree
  Declarations parseTree(idlFile);

This object is the root of a parse tree that contains the following:

A public member types which is a C++ map from type name to a TypeDeclaration object. You can use the map in the standard C++ way to look up a type by name, or if necessary you can iterate through all types. The iteration is demonstrated in the idl_to_json.cpp sample program. You can also index types using the ["typename"] syntax, because the [] operator is overloaded.
typeExists("typeName") and functionExists("functionName") methods that return true iff the named type/function has been declared. It's a good idea to check this before retrieving a type or function from the maps, or you'll have to deal with C++ conventions for missing map entries.
A public member functions which is a C++ map from function name to a FunctionDeclaration object. You can also index functions using the ["functionname"] syntax, because the [] operator is overloaded.
Note that a common C++ object type is used to represent both function arguments and struct member declarations, because the information needed is essentially the same for both. The type of this object is Arg_or_Member_Declaration. So, within each FunctionDeclaration object is a C++ vector of Arg_or_Member_Declaration objects, each of which provides a pointer to the TypeDeclaration for the argument type as well as the argument name. You can get this vector by calling getArgumentVector(). There is a similar method available for struct TypeDeclarations called getStructMembers().

Each of the declaration objects mentioned supports a getName() method giving the name of the type, function, or argument respectively. As described above, methods are provided for struct and array types that allow you to find the array bounds, struct member types, array member types, etc.

Remember, your job will be to write an RPC generator program that reads in an IDL file and automatically produces proxies and stubs similar to the handwritten ones in the samples (you don't have to make them look the same, you have to make them work!). To do that, you will almost surely want to work through the parse tree for functions in very much the same way that idl_to_json.cpp does. As noted above, you may adapt idl_to_json.cpp to become the main program for your rpcgenerate, but you will lose credit if you leave lots of old misleading comments in the source!

General Hints and Warnings

Here are a few additional hints and warnings regarding this project:

In C++, the string type declaration needs to be included, and it lives in the std namespace (the full type name is std::string). Any source file that includes any of our IDL files that uses string should first do:
```
#include <string>
using namespace std;
```
If you don't, the string type will come up as undefined.
The IDL parser was written in some haste when this course was first taught. The code for it and the samples is not as clean as I would like, though it generally seems to work. It will try to produce useful error messages if you give it buggy IDL, but they may not always be as helpful as you would like. I won't be shocked if you find worse problems.
Remember: no comments in the IDL, and only the limited syntax described above.
It's a known bug that the parser framework does not free the structures it allocates; valgrind will complain. Many other shortcomings are noted with NEEDSWORK in the source.

Hints and Warnings For Python and Ruby Users

The following sections discuss the sample Ruby and Python code we provide for accessing IDL type information. If you are using those languages, you will likely want to adapt the samples for use in your RPC generator.

A Ruby Example

The following demonstration code is in the RPC.samples directory for you to play with. It shows how a Ruby program can easily invoke idl_to_json and use the output. The program is named print_functions.rb.

#!/bin/env ruby
#
#          print signatures of all the functions named in supplied IDL file
#

require 'json'

IDL_TO_JSON_EXECUTABLE = './idl_to_json'

#
#     Make sure invoked properly
#

abort "Usage: #{$PROGRAM_NAME} <idlfilename>" if ARGV.length != 1


#
#     Make sure file exists and is readable
#
filename = ARGV[0]
abort "#{$PROGRAM_NAME}: no file named #{filename}" if not File.file? filename
abort "#{$PROGRAM_NAME}: #{filename} not readable" if not File.readable? filename

#
#     Parse declarations into Ruby hash
#
if !File.executable?(IDL_TO_JSON_EXECUTABLE)
  abort "#{IDL_TO_JSON_EXECUTABLE} does not exist or is not executable..."
end
json_string = `#{IDL_TO_JSON_EXECUTABLE} #{filename}`
abort "#{$PROGRAM_NAME}: Failed to parse IDL file #{filename}" if $? != 0
decls =  JSON.parse(json_string)

#
#     Print the function signatures
#
decls["functions"].each do |name, sig|

  # Ruby Array of all args (each is a hash with keys "name" and "type")
  args = sig["arguments"]

  # Make a string of form:  "type1 arg1, type2 arg2" for use in function sig
  argstring = args.map{|a| "#{a["type"]} #{a["name"]}"}.join(', ')

  # print the function signature
  puts "#{sig["return_type"]} #{name}(#{argstring})"

end

Most of this should be straightforward if you know Ruby. One possible exception is the line:

argstring = args.map{|a| "#{a["type"]} #{a["name"]}"}.join(', ')

The map call, as you might expect, maps over the items in list args creating a new list. Each item in that list is of the form "type argname". On that new list a call is made to join(', '), which joins all the strings into a single string, using ", " as the glue. The resulting string might look like:

int width, float distance, string s

A Python Example

Here is a version in Python, named print_functions.py. The logic in this is intentionally as similar as possible to the Ruby, so you can compare them, and also use one as a guide to learning the other.

#!/bin/env python3
#
#          print signatures of all the functions named in supplied IDL file
#

import subprocess
import json
import sys
import os

IDL_TO_JSON_EXECUTABLE = './idl_to_json'

try:
    #
    #     Make sure invoked properly
    #
    assert len(sys.argv) == 2, "Wrong number of arguments"

    #
    #     Make sure IDL file exists and is readable
    #
    filename = sys.argv[1]
    assert os.path.isfile(filename), f"Path {filename} does not designate a file"
    assert os.access(filename, os.R_OK), f"File {filename} is not readable" 

    #
    #     Make sure idl_to_json exists and is executable
    #
    assert os.path.isfile(IDL_TO_JSON_EXECUTABLE), f"Path {IDL_TO_JSON_EXECUTABLE} does not designate a file...run \"make\" to create it" 
    assert os.access(IDL_TO_JSON_EXECUTABLE, os.X_OK), f"File {IDL_TO_JSON_EXECUTABLE} exists but is not executable"

    #
    #     Parse declarations into a Python dictionary
    #
    decls = json.loads(subprocess.check_output([IDL_TO_JSON_EXECUTABLE, filename]))

    #
    # Loop printing each function signature
    #
    for  name, sig in decls["functions"].items():

        # Python List of all args (each is a dictionary with keys "name" and "type")
        args = sig["arguments"]

        # Make a string of form:  "type1 arg1, type2 arg2" for use in function sig
        argstring = ', '.join([a["type"] + ' ' + a["name"] for a in args])

        # print the function signature
        print(f"{sig['return_type']} {name}({argstring})")

except Exception as e:
    print(str(e), file=sys.stderr)
    print(f"Usage: {sys.argv[0]} <idlfilename>", file=sys.stderr)

If you are new to Python, a few of the constructions here may seem a bit tricky:

decls = json.loads(subprocess.check_output([IDL_TO_JSON_EXECUTABLE, filename]))

subprocess.check_output() runs the supplied command and returns its standard output (as a bytes object in Python 3). So, here we are running the idl_to_json program on the named file. json.loads() accepts that output directly and interprets it as JSON, creating the dictionary we need.

argstring = ', '.join([a["type"] + ' ' + a["name"] for a in args])

Start with the inner [a["type"] + ' ' + a["name"] for a in args]. This is a great example of what Python calls a list comprehension. What it does is to construct a new list by looping through (or mapping if you prefer) the items in list args. For each such item a in args, it computes the string a["type"] + ' ' + a["name"], which is the argument type and the argument name separated by a space.

Then on the resulting list a call is made to ', '.join(), which joins all the items in the list using the string ', ' as the glue between the items. The resulting string might look like:

int width, float distance, string s
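If you want clearer failure reporting around the subprocess.check_output() call discussed above, one option is to wrap it in a small helper. The sketch below is an assumption-laden illustration rather than part of the provided samples: run_and_parse is a hypothetical name, and so that the example runs without idl_to_json being built, a child Python process stands in for it and prints JSON using the same "functions", "return_type" and "arguments" keys that the sample programs read.

```python
import json
import subprocess
import sys


def run_and_parse(command):
    """Run command and return its standard output parsed as JSON.

    Raises RuntimeError if the program cannot be started or exits non-zero.
    """
    try:
        raw = subprocess.check_output(command)   # bytes in Python 3
    except (OSError, subprocess.CalledProcessError) as err:
        raise RuntimeError(f"could not run {command[0]}: {err}") from err
    return json.loads(raw)                        # json.loads accepts bytes


# Stand-in for [IDL_TO_JSON_EXECUTABLE, filename]: a child process that
# prints a small JSON document (illustrative data only).
SAMPLE_JSON = json.dumps({
    "functions": {
        "multiply": {
            "return_type": "int",
            "arguments": [{"name": "x", "type": "int"},
                          {"name": "y", "type": "int"}],
        }
    }
})
fake_command = [sys.executable, "-c", f"print({SAMPLE_JSON!r})"]

decls = run_and_parse(fake_command)
for name, sig in decls["functions"].items():
    argstring = ', '.join(a["type"] + ' ' + a["name"] for a in sig["arguments"])
    print(f"{sig['return_type']} {name}({argstring})")
```

In a real generator you would pass [IDL_TO_JSON_EXECUTABLE, filename] instead of fake_command.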
