Table of Contents
- Overview
- Types and IDL
- Functions and IDL
- IDL File Syntax
- Type implementation details
- Exceptions Raised by Functions
- The IDL Parser
- General Hints and Warnings
- Hints and Warnings For Python and Ruby Users
Overview
The RPC assignment uses a type system
that is a very restricted subset of C++ types.
Interface definitions are C++ type and function declarations, packaged into includable files; these have the extension .idl but otherwise resemble and are usable like C++ .h files.
This document describes:
- The type system itself
- .idl files
- The tools provided for parsing .idl files. Specifically:
- A C++ parsing framework that creates a tree of C++ objects corresponding to the types and functions in an idl file
- A C++ program called idl_to_json, which serves as a demonstration sample for that framework, but which is also very useful for writing the type information into a JSON file. (Those who build their RPC generators in Python or Ruby are likely to use only the JSON, and may have no need to learn the details of the C++ framework.)
- Some related code that's included in the RPC.samples distribution provided for the RPC assignment
Types and IDL
This section describes the types that must be supported for the arguments and return values of functions.
Built-in primitive types
The following atomic types are built in, and cannot be re-declared in IDL:
- void
- float
- int
- string
Structured types (structs & arrays)
IDL is used to create struct and array types. Members of structs and arrays may be of the primitive types listed above, or may be other structures or arrays. In general, forward references to types declared later in the IDL are not permitted.
Detailed syntax is provided below, but informally, structs are declared using a limited version of C++ syntax, for example:
struct Person { string firstName; string lastName; int age; };
Note that semicolons are required after each member declaration, as well as following the curly brace that completes the entire structure.
The isStruct() method on the corresponding TypeDeclaration object created by the IDL parser (described below) will return true, and getStructMembers() will return a C++ vector with three members, one for firstName, one for lastName, and one for age.
Array types have fixed bounds and are created as a byproduct of their use in structures or function signatures. For example:
struct takesTwoArrays { int x[10]; int y[20]; };
This declares a structure with two arrays of integers, one with 10 elements and one with 20. When this declaration is encountered, the IDL parser will create not just the type declaration for the structure, but also declarations for two types named __int[10] and __int[20]. (The names are mostly not important, but may show up in debugging output if you ask the type for its name.) The isArray() method on the corresponding TypeDeclaration objects returns true.
The getArrayMemberType() method returns a pointer to the TypeDeclaration object for the member type (int in both examples above), and the getArrayBound() method returns the number of elements (10 or 20 respectively).
Note that duplicate types are not created; the following struct would result in a single type __int[10] shared by both members:
struct takesTwoArrays { int x[10]; int y[10]; };
All of this is handled for you by the IDL parser.
The idl_to_json program offered for use with Ruby or Python versions of rpcgenerate uses the same IDL parser, so the JSON will contain information for the same type names described above.
Note that arrays of arrays are supported, just as in C and C++:
struct takesTwoArrays { int x[10][20]; };
This results in a type named __int[10][20]; note that the leading underscores (__) appear only once, not once per dimension, but again, you will mainly see these type names in debugging output, since they are maintained for you by the parser.
Note that multidimensional arrays are indeed modeled as arrays of arrays. The above struct would implicitly define two array types:
- Typename: __int[20]
  Member type: int
  Bound: 20
- Typename: __int[10][20]
  Member type: __int[20]
  Bound: 10
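The nesting described above can be sketched in a few lines of Python. This is only an illustration of the naming pattern shown in the examples ("__" followed by the base type and one "[N]" per dimension); the function name implied_array_types is hypothetical, and the real names are always produced for you by the IDL parser.

```python
import re

# Assumed naming pattern, inferred from the examples in this section:
# "__" + base type name + one "[N]" per dimension, e.g. "__int[10][20]".
ARRAY_NAME = re.compile(r"^__(?P<base>[A-Za-z_]\w*)(?P<dims>(?:\[\d+\])+)$")


def implied_array_types(type_name):
    """List (typename, member_type, bound) for type_name and for every
    array type nested inside it, innermost first."""
    match = ARRAY_NAME.match(type_name)
    if match is None:
        raise ValueError(f"not an array type name: {type_name!r}")
    base = match.group("base")
    bounds = [int(b) for b in re.findall(r"\[(\d+)\]", match.group("dims"))]

    result = []
    member = base  # the innermost array holds the base type
    for i in range(len(bounds) - 1, -1, -1):
        name = "__" + base + "".join(f"[{b}]" for b in bounds[i:])
        result.append((name, member, bounds[i]))
        member = name  # the next level out holds this array type
    return result


for typename, member_type, bound in implied_array_types("__int[10][20]"):
    print(f"Typename: {typename}  Member type: {member_type}  Bound: {bound}")
```

Walking the dimensions from the last to the first mirrors the way a serializer would typically recurse: an outer loop over 10 elements, each of which is itself an array of 20 ints.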
There are no pointers or pointer types supported by this IDL or RPC framework; all you need to handle are the built in types, structures, and arrays, though as shown, types can be composed to arbitrary depth: arrays of structures and structures containing array members should be supported.
Functions and IDL
Functions in IDL are declared using traditional C/C++ prototype syntax. (The comments in this example are explanatory only; as described under IDL File Syntax, the parser does not accept comments.)
int multiply(int x, int y);           // accept two ints, return an int
void addEmployee(Person newEmployee); // Uses Person struct from above
Person getEmployee(string lastName);  // Return values can be structs
int max(int numbers[100]);            // Array types can be declared
Functions can return structures, as shown above, but not arrays.
All arguments to functions, including arrays, are passed by value, not by reference; if a routine like max were for some reason to update its input array, that update would not be sent back to the client.
Function arguments have names as well as types, but as with C and C++, parameter passing is by position. No means is provided for setting parameters by name on a function call.
Pointer types are not supported. Neither are C++ references (like int&).
At most one function with a given name is allowed in each IDL file; overloaded functions are not supported.
IDL File Syntax
The IDL file syntax is:
TOKEN = ...see C++ rules for identifier names
NUMBER = 1*DIGITS ; NUMBER is one or more digits
;
; Type declarations
;
PREDEFINEDTYPE = "void" / "float" / "int" / "string"
USERDEFINEDTYPE = TOKEN ; must be declared in the IDL to be accepted
TYPE = PREDEFINEDTYPE / USERDEFINEDTYPE
;
; Function declarations
;
ARGUMENTNAME = TOKEN
ARGUMENT = TYPE ARGUMENTNAME *( "[" NUMBER "]" )
RETURNTYPE = TYPE
FUNCTIONNAME = TOKEN
FUNCTIONDECL = RETURNTYPE FUNCTIONNAME "(" ")" ";" /
               RETURNTYPE FUNCTIONNAME "(" ARGUMENT *("," ARGUMENT ) ")" ";"
;
; Structure declarations
;
MEMBERNAME = TOKEN
MEMBER = TYPE MEMBERNAME *( "[" NUMBER "]" ) ";"
STRUCTNAME = TOKEN
STRUCTDECL = "struct" STRUCTNAME "{" *MEMBER "}" ";"
;
; Whole IDL file
;
IDLFILE = *(STRUCTDECL / FUNCTIONDECL)
The grammar does not illustrate it, but whitespace may be used freely between separately named non-terminals and/or separately quoted terminals. Except for white space, absolutely nothing else can be in an IDL file. In particular, NO COMMENT SYNTAX IS RECOGNIZED. The only reason for this is that there has not yet been time to upgrade the parser to skip the comments. DO NOT EXPECT THE PARSER TO ACCEPT OTHER CONSTRUCTIONS EVEN IF THEY ARE PERFECTLY LEGAL C OR C++!
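To make the MEMBER rule concrete, here is a rough Python sketch that recognizes a single struct member declaration using a regular expression. It is not the provided parser and is not needed for the assignment; the names MEMBER and parse_member are hypothetical, and the identifier pattern is a simplified assumption standing in for the C++ identifier rules that TOKEN refers to.

```python
import re

# MEMBER = TYPE MEMBERNAME *( "[" NUMBER "]" ) ";"
# Whitespace is allowed between tokens; nothing else (comments included)
# may follow the semicolon.
MEMBER = re.compile(
    r"^\s*([A-Za-z_]\w*)"          # TYPE
    r"\s+([A-Za-z_]\w*)"           # MEMBERNAME
    r"\s*((?:\[\s*\d+\s*\]\s*)*)"  # zero or more "[" NUMBER "]"
    r";\s*$"                       # terminating semicolon
)


def parse_member(text):
    """Return (type, name, bounds) for one member declaration,
    or raise ValueError if the text does not fit the MEMBER rule."""
    match = MEMBER.match(text)
    if match is None:
        raise ValueError(f"not a valid member declaration: {text!r}")
    type_name, member_name, dims = match.groups()
    bounds = [int(n) for n in re.findall(r"\d+", dims)]
    return type_name, member_name, bounds


print(parse_member("string firstName;"))  # no array bounds
print(parse_member("int x[10][20];"))     # two array dimensions
```

A declaration followed by a comment, such as "int x; // count", is rejected by this sketch, which matches the rule that no comment syntax is recognized.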
Type implementation details
IMPORTANT: For this assignment, you MUST make the following assumptions:
- String data does not live entirely within the space allocated for a string variable, array element or structure member. To implement a remote call, you must explicitly serialize the actual contents of the string, and preserve its length.
- You can assume that the int type represents a signed 32-bit integer, but you don't know whether the byte order will be the same at both ends of the connection. This means that you may either convert values to strings for transmission (probably the easiest way) or handle the byte order explicitly. You can assume we will use compilers that implement ints as 32 bit, regardless of whether we are running on a 32-bit or 64-bit architecture.
- You may assume that floats are the usual IEEE 32 bit floating point used by C++, though again, byte order is not specified. As with integers, it's safer to convert to a character string for transmission (you may rely on any of the standard C/C++ formatting libraries... we won't judge you on the finer points of conversion accuracy). If necessary, you may just send the 32-bit binary number, but please indicate in your report which you chose to do.
- As noted above, structure packing may in principle be different at two ends of the connection. Even if the two compilers use the identical representation for strings in the example above, there is no guarantee that someString would wind up at the same offset in the structure at the sending and receiving end. Indeed, you can tell that some interesting alignment is going on in the example above, because the int is 4 bytes, and the string (you can check) is 8, yet the struct as a whole is 16 bytes, not 12. Most likely, the compiler is "wasting" 4 bytes after the int so that the string member starts on an 8 byte boundary.
Again, the main implication of all of the above is that you should plan on serializing each field individually, and in a machine-independent form.
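As one possible illustration of field-by-field, machine-independent serialization, the Python sketch below encodes ints and floats in big-endian ("network") byte order and strings as a 4-byte length followed by the bytes. This is only one of the options mentioned above (converting values to text is the other); the wire layout, the UTF-8 encoding, and all function names here are assumptions made for the example, not a format required by the assignment. The Person fields come from the struct shown earlier.

```python
import struct


def pack_int(value):
    """Signed 32-bit integer in big-endian (network) byte order."""
    return struct.pack("!i", value)


def pack_float(value):
    """IEEE 32-bit float in big-endian byte order."""
    return struct.pack("!f", value)


def pack_string(text):
    """4-byte unsigned length, then the string contents themselves."""
    data = text.encode("utf-8")
    return struct.pack("!I", len(data)) + data


def unpack_int(buf, offset):
    (value,) = struct.unpack_from("!i", buf, offset)
    return value, offset + 4


def unpack_string(buf, offset):
    (length,) = struct.unpack_from("!I", buf, offset)
    start = offset + 4
    return buf[start:start + length].decode("utf-8"), start + length


def pack_person(first_name, last_name, age):
    """Serialize each member individually, in declaration order."""
    return pack_string(first_name) + pack_string(last_name) + pack_int(age)


def unpack_person(buf, offset=0):
    first_name, offset = unpack_string(buf, offset)
    last_name, offset = unpack_string(buf, offset)
    age, offset = unpack_int(buf, offset)
    return (first_name, last_name, age), offset


wire = pack_person("Ada", "Lovelace", 36)
print(unpack_person(wire))
```

Because every field is written explicitly, the result does not depend on how either compiler pads or aligns the struct in memory.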
Exceptions Raised by Functions
In C++ the possibility that a function will throw an exception is considered part of its signature. See, for example this informal tutorial. You should assume that the functions described by IDL are implicitly declared as noexcept(true), which means that they will not throw exceptions. Note that this is different from the C++ default, which is that unless otherwise declared a function may raise an exception.
An obvious consequence of the fact that functions must not raise exceptions is that implementations of RPC for CS 117 need not handle functions that raise exceptions. The behavior of your system when a function does raise an exception (e.g. if a remote call is made to a function that divides by zero, or that explicitly throws an exception) is undefined; you may do whatever you like in such a case.
The IDL Parser
This section of the instructions will mainly be of interest to those building rpcgenerate using C++. If you are using Python or Ruby, then you can just use the JSON produced by idl_to_json.
Although no RPC generator is provided for you, a moderately sophisticated IDL parser is provided. This parser is used by idl_to_json.cpp, so reading that source file is the best way to learn to call the parser from C++.
You give this parser any C++ input stream containing IDL (typically from opening a file in the obvious way; idl_to_json.cpp has what you need), and it constructs an object of class Declarations.
#include "declarations.h"
//
// Open the file
//
ifstream idlFile(fileName); // open
if (!idlFile.is_open()) {
  ... Error handling code here...
}
// The following line does all the work parsing the file into
// the variable parseTree
Declarations parseTree(idlFile);
This object is the root of a parse tree that contains the following:
- A public member types, which is a C++ map from type name to a TypeDeclaration object. You can use the map in the standard C++ way to look up a type by name, or if necessary you can iterate through all types. The iteration is demonstrated in the idl_to_json.cpp sample program. You can also index types using the ["typename"] syntax, because the [] operator is overloaded.
- typeExists("typeName") and functionExists("functionName") methods that return true iff the named type/function has been declared. It's a good idea to check this before retrieving a type or function from the maps, or you'll have to deal with C++ conventions for missing map entries.
- A public member functions, which is a C++ map from function name to a FunctionDeclaration object. You can also index functions using the ["functionname"] syntax, because the [] operator is overloaded.
- Note that a common C++ object type is used to represent both function arguments and struct member declarations, because the information needed is essentially the same for both. The type of this object is Arg_or_Member_Declaration. So, within each FunctionDeclaration object is a C++ vector of Arg_or_Member_Declaration objects, each of which provides a pointer to the TypeDeclaration for the argument type as well as the argument name. You can get this vector by calling getArgumentVector(). There is a similar method available for struct TypeDeclarations called getStructMembers().
Each of the declaration objects mentioned supports a getName() method giving the name of the type, function, or argument respectively.
As described above, methods are provided for struct and array types that allow you to find the array bounds, struct member types, array member types, etc.
Remember, your job will be to write an RPC generator program that reads in an IDL file and automatically produces proxies and stubs similar to the handwritten ones in the samples (you don't have to make them look the same, you have to make them work!). To do that, you will almost surely want to work through the parse tree for functions in very much the same way that idl_to_json does.
As noted above, you may adapt idl_to_json.cpp to become the main program for your rpcgenerate, but you will lose credit if you leave lots of old misleading comments in the source!
General Hints and Warnings
Here are a few additional hints and warnings regarding this project:
- In C++, the string type declaration needs to be included, and it lives in the std namespace (the full type name is std::string). Any source file that includes any of our IDL files that uses string should first do:
#include <string>
using namespace std;
If you don't, the string type will come up as undefined.
- The IDL parser was written in some haste when this course was first taught. The code for it and the samples is not as clean as I would like, though it generally seems to work. It will try to produce useful error messages if you give it buggy IDL, but they may not always be as helpful as you would like. I won't be shocked if you find worse problems.
- Remember: no comments in the IDL, and only the limited syntax described above.
- It's a known bug that the parser framework does not free the structures it allocates; valgrind will complain. Many other shortcomings are noted with NEEDSWORK in the source.
Hints and Warnings For Python and Ruby Users
The following sections discuss the sample Ruby and Python code we provide for accessing IDL type information. If you are using those languages, you will likely want to adapt the samples for use in your RPC generator.
A Ruby Example
The following demonstration code is in the RPC.samples directory for you to play with. It shows how a Ruby program can easily invoke idl_to_json and use the output. The program is named print_functions.rb.
#!/bin/env ruby
#
# print signatures of all the functions named in supplied IDL file
#
require 'json'

IDL_TO_JSON_EXECUTABLE = './idl_to_json'

#
# Make sure invoked properly
#
abort "Usage: #{$PROGRAM_NAME} <idlfilename>" if ARGV.length != 1

#
# Make sure file exists and is readable
#
filename = ARGV[0]
abort "#{$PROGRAM_NAME}: no file named #{filename}" if not File.file? filename
abort "#{$PROGRAM_NAME}: #{filename} not readable" if not File.readable? filename

#
# Parse declarations into Ruby hash
#
if !File.executable?(IDL_TO_JSON_EXECUTABLE)
  abort "#{IDL_TO_JSON_EXECUTABLE} does not exist or is not executable..."
end
# Run the same executable that was just checked, not whatever is on PATH
json_string = `#{IDL_TO_JSON_EXECUTABLE} #{filename}`
abort "#{$PROGRAM_NAME}: Failed to parse IDL file #{filename}" unless $?.success?
decls = JSON.parse(json_string)

#
# Print the function signatures
#
decls["functions"].each do |name, sig|
  # Ruby Array of all args (each is a hash with keys "name" and "type")
  args = sig["arguments"]

  # Make a string of form: "type1 arg1, type2 arg2" for use in function sig
  argstring = args.map{|a| "#{a["type"]} #{a["name"]}"}.join(', ')

  # print the function signature
  puts "#{sig["return_type"]} #{name}(#{argstring})"
end
Most of this should be straightforward if you know Ruby. One possible exception is the line:
argstring = args.map{|a| "#{a["type"]} #{a["name"]}"}.join(', ')
The map call, as you might expect, maps over the items in the list args, creating a new list. Each item in that list is of the form "type argname". On that new list a call is made to join(', '), which joins all the strings into a single string, using ", " as the glue. The resulting string might look like:
int width, float distance, string s
A Python Example
Here is a version in Python, named print_functions.py. The logic in this is intentionally as similar as possible to the Ruby, so you can compare them, and also use one as a guide to learning the other.
#!/bin/env python3
#
# print signatures of all the functions named in supplied IDL file
#
import subprocess
import json
import sys
import os

IDL_TO_JSON_EXECUTABLE = './idl_to_json'

try:
    #
    # Make sure invoked properly
    #
    assert len(sys.argv) == 2, "Wrong number of arguments"

    #
    # Make sure IDL file exists and is readable
    #
    filename = sys.argv[1]
    assert os.path.isfile(filename), f"Path {filename} does not designate a file"
    assert os.access(filename, os.R_OK), f"File {filename} is not readable"

    #
    # Make sure idl_to_json exists and is executable
    #
    assert os.path.isfile(IDL_TO_JSON_EXECUTABLE), f"Path {IDL_TO_JSON_EXECUTABLE} does not designate a file...run \"make\" to create it"
    assert os.access(IDL_TO_JSON_EXECUTABLE, os.X_OK), f"File {IDL_TO_JSON_EXECUTABLE} exists but is not executable"

    #
    # Parse declarations into a Python dictionary
    #
    decls = json.loads(subprocess.check_output([IDL_TO_JSON_EXECUTABLE, filename]))

    #
    # Loop printing each function signature
    #
    for name, sig in decls["functions"].items():
        # Python List of all args (each is a dictionary with keys "name" and "type")
        args = sig["arguments"]

        # Make a string of form: "type1 arg1, type2 arg2" for use in function sig
        argstring = ', '.join([a["type"] + ' ' + a["name"] for a in args])

        # print the function signature
        print(f"{sig['return_type']} {name}({argstring})")

except Exception as e:
    print(str(e), file=sys.stderr)
    print(f"Usage: {sys.argv[0]} <idlfilename>", file=sys.stderr)
If you are new to Python, a few of the constructions here may seem a bit tricky:
decls = json.loads(subprocess.check_output([IDL_TO_JSON_EXECUTABLE, filename]))
subprocess.check_output() runs the supplied command and returns its standard output (as bytes by default, which json.loads() accepts directly). So, here we are running the idl_to_json program on the named file. json.loads() interprets the resulting output as JSON, creating the dictionary we need.
argstring = ', '.join([a["type"] + ' ' + a["name"] for a in args])
Start with the inner [a["type"] + ' ' + a["name"] for a in args]. This is a great example of what Python calls a list comprehension. What it does is to construct a new list by looping through (or mapping if you prefer) the items in list args. For each such item a in args, it computes the string a["type"] + ' ' + a["name"], which is the argument type and the argument name separated by a space.
Then on the resulting list a call is made to ', '.join(), which joins all the items in the list using the string ', ' as the glue between the items. The resulting string might look like:
int width, float distance, string s
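For array arguments, the "type" reported in the JSON is presumably one of the generated names described earlier (for example __int[100]), which is not valid C++ if printed directly before the argument name. The sketch below shows one way a Python generator might turn such a name back into a C++ declaration. It assumes the JSON keys used by the sample programs above ("functions", "arguments", "name", "type", "return_type") and the "__base[N]..." naming pattern; the helper names and the hand-written sample dictionary are hypothetical, so the sketch runs without idl_to_json.

```python
import re

# Assumed pattern for generated array type names, e.g. "__int[10][20]"
ARRAY_NAME = re.compile(r"^__([A-Za-z_]\w*)((?:\[\d+\])+)$")


def cpp_declaration(type_name, var_name):
    """Turn an IDL type name plus a variable name into a C++ declaration."""
    match = ARRAY_NAME.match(type_name)
    if match:
        base, dims = match.groups()
        return f"{base} {var_name}{dims}"  # "__int[100]", "n" -> "int n[100]"
    return f"{type_name} {var_name}"       # primitives and structs


def cpp_signature(name, sig):
    """Build a C++ prototype string from one entry of decls["functions"]."""
    args = ", ".join(cpp_declaration(a["type"], a["name"])
                     for a in sig["arguments"])
    return f"{sig['return_type']} {name}({args})"


# Hand-written stand-in for the dictionary produced by json.loads()
sample_decls = {
    "functions": {
        "multiply": {"return_type": "int",
                     "arguments": [{"name": "x", "type": "int"},
                                   {"name": "y", "type": "int"}]},
        "max": {"return_type": "int",
                "arguments": [{"name": "numbers", "type": "__int[100]"}]},
    }
}

for name, sig in sample_decls["functions"].items():
    print(cpp_signature(name, sig) + ";")
```

Check the actual idl_to_json output for your IDL file before relying on this, since the exact type strings it emits are an assumption here.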