The C-- Language Reference Manual

Simon Peyton Jones   Thomas Nordin   Dino Oliva   Pablo Nogueira Iglesias

May 23, 1999






1   Introduction

C-- is a portable assembly language designed to be a good backend for high-level languages (particularly those that use garbage collection) and to run fast on a number of today's major computer architectures. It is also designed to have as few dependencies as possible on the underlying hardware, but speed and ease of use have sometimes taken precedence over orthogonality and minimality. C-- should be rich enough to be a viable backend for most mainstream and research compilers.

This document is intended to be self-contained: anyone who knows an imperative language and is acquainted with computers should be able to write her/his own C-- programs after reading it.

2   Syntax definition

The syntax of C-- is given in Figures 1 and 2.

2.1   General

A C-- program file is written in eight-bit ASCII characters. It consists of a sequence of data layout directives (Section 4), procedure definitions (Section 5), import and export declarations (Section 2.6), and global declarations (Section 3.6), interleaved in any order.

A C-- compilation unit is a C-- program file that can be successfully compiled and that is suitable for linking.

C-- does not support input/output. Nevertheless, it can be accomplished using a foreign language call (Section 3.9).

2.2   Comments

Comments start with /* and end with */. They can be nested.

2.3   Names

Names are made up of letters, digits, underscores and dots. A name cannot begin with a digit or with a dot followed by a digit. Upper and lower case are distinct. Imported names should also follow these restrictions.

Names are identifiers for registers or memory addresses (Section 3.8).

The following are examples of legal C-- names:
        x
        foo
        _912
        aname12
        _foo.name_abit_12.long
        Sys.Indicators
These are two illegal C-- names:
        .9Aname
        3illegal
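The lexical rule above can be modelled with a short Python sketch. This is only an illustration of the rule as stated in this section, not part of any C-- implementation: the helper name `is_legal_name` is hypothetical, "letters" is assumed to mean ASCII letters, and the reserved words of Section 2.5 are not checked.

```python
import re

# Letters, digits, underscores and dots; a name must not start with a digit,
# nor with a dot followed by a digit. ASCII letters are an assumption.
_NAME_RE = re.compile(r"(?![0-9])(?!\.[0-9])[A-Za-z0-9_.]+")

def is_legal_name(name: str) -> bool:
    """Return True if `name` satisfies the lexical rule of Section 2.3."""
    return _NAME_RE.fullmatch(name) is not None

legal = ["x", "foo", "_912", "aname12", "_foo.name_abit_12.long", "Sys.Indicators"]
illegal = [".9Aname", "3illegal"]

print(all(is_legal_name(n) for n in legal))    # True
print(any(is_legal_name(n) for n in illegal))  # False
```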

2.4   Name scope

Procedure and label names are always global inside a C-- compilation unit (or program). Local variable names, formal argument names, stack label names and local control labels are in scope only within the procedure body where they are declared. There is no nested scoping of names inside a procedure. A local name shadows a global name (C-- compilers may choose to emit a warning in these cases). Procedure and label names may be used before they are declared.

2.5   Reserved words

The following are reserved words and cannot be used as names for registers or memory addresses:

abs% absf% align C data default else exponentf% export float32
float64 foreign fractionf% fractpartf% global goto if intpartf% import
jump neg% negf% predf% register return roundf% scalef% sign% signf%
stack succf% switch truncf% ulpf% bits8 bits16 bits32 bits64

2.6   The import and export declarations

Names that are to be used outside of the C-- program must be exported with the export declaration. Likewise, names that the C-- program uses and does not declare must be imported with the import declaration. Only procedure and (pointer) label names may be exported.

Imported names should follow the syntactic restrictions mentioned in Section 2.3.

An example where a few C external names are imported and a few C-- names are exported:

  import printf, sqrt; /* C procedures used in this C-- program */
  export foo, bar;     /* To be used outside   this C-- program */
Names that are explicitly exported and imported are guaranteed to be unchanged by the compiler. All other names might be renamed.

The type of a name listed in an import declaration is the native pointer type of the architecture.

An import or an export declaration may appear anywhere in the program where a data layout directive or a procedure definition does.

2.7   Constants

Constants can be (signed) integers, (signed) floating point numbers, characters, strings and names. C-- follows C's syntax for denoting integer, floating point, character, and string constants.

2.7.1   Integer and floating point numbers

Integer constants have type bits. Floating point constants have type float. Their size is architecture-dependent.

2.7.2   Characters and strings

Character and string constants are treated as integers and as pointer labels respectively. Character constants are ASCII characters surrounded by single quotes. String constants are a sequence of ASCII characters surrounded by double quotes.

A character constant is treated as an integer whose value is the character's 8-bit ASCII code. Therefore, character constants have type bits8. C-- uses C's escape sequences to denote special characters, such as \n for the new line and \t for the tab.

For example, character constant 'H' is a bits8 with value 72. String constants are like labels that point to the first bits8 of an array of bits8s stored in static memory. Therefore, they have type bitsn where n is the particular architecture's natural pointer size. String constants are not automatically null-terminated. For example, the string "Hello World" is viewed as a label that points to the first byte of the array of bytes with values 72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, stored in static memory.
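The byte values quoted above can be checked with a few lines of Python. The sketch only reproduces the ASCII encoding; it says nothing about where a C-- compiler places the bytes.

```python
# A character constant is its 8-bit ASCII code.
h = ord("H")

# A string constant denotes an array of bits8 values with no implicit
# terminator, so only the eleven characters themselves are stored.
hello = list("Hello World".encode("ascii"))

print(h)      # 72
print(hello)  # [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100]
```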

It is possible to have UTF-8 integers for single characters and for string characters.1

The syntax to specify a UTF-8 constant is:
unicode(constant)
where constant is a character constant or a string constant. The type of UTF-8 characters is bits16, because two bytes (two ASCII characters) are needed to encode a Unicode character. UTF-8 strings are pointers to the first bits16 of an array of bits16s stored in static memory, and therefore they have type bitsn, where n is the architecture's natural pointer size.

3   Fundamental concepts in C--

3.1   Memory

Memory is an array of bytes from which different sized types (Section 3.4) can be read and written. The size of the addressable memory is implementation dependent (Section 3.7). All addresses and offsets are specified in bytes. No guarantee about endianness is given: a portable program should either not depend on a specific endianness or detect which one is in use.

3.2   Data segment

The data segment is the part of memory where the static (initialised or uninitialised) data is allocated. The data segment is read/write, so the values stored can be changed at runtime. The size and initial content of the data segment are determined at compile time (Section 4). C-- does not provide dynamic memory allocation natively; nonetheless, it can be accomplished with foreign language calls (Section 3.9 and Section 5.4).

3.3   Code segment

The code segment is the part of memory where the executable program code is stored (Section 5). The code segment consists of a series of procedure definitions.

C-- does not currently provide a mechanism for creating code at runtime.

3.4   Types

There are only two kinds of types provided by C--, namely bits and float. These types can have different sizes. A type must be qualified with a size. Thus bits8 and bits16 are different types.

There is no pointer type. The bitsn type can be used for pointers (addresses), where n is the particular architecture's natural pointer size, i.e. the number of bits needed to hold a memory address on that architecture.

For example, a four byte word is specified as bits32, an eight byte float is specified as float64 and so on.

Types are used in
  1. Local variable declarations (Section 5.2.1), to declare the type of local variables.

  2. Memory write statements and memory read expressions (Section 5.3.3 and Section 6.2, respectively) to indicate the type of the value written/read.

  3. Data layout directives (Section 4) to indicate the type of the allocated datum.

3.5   Local variables (or registers)

Any number of local variable names may be declared inside procedure bodies. Local declarations must appear before the statements in the text of the procedure body. Local variables are typed storage locations that do not have an address. The term ``local variable name'' is interchangeable with the term ``register'', since there is an unlimited supply of (virtual) registers: a local variable name is mapped to a machine register if one is available, and otherwise to a memory location (e.g. on the stack). The mapping is transparent, and local variables should be viewed as registers.

3.6   Global variables

Global variables are like local variables except that the scope of a global is all the functions of a C-- compilation unit.

Globals differ from data declarations in that data allocates memory, and a data label refers to the address of a datum. Globals, like locals, should be viewed as registers and don't have addresses.

For example, given the declaration

  global {
    bits64 hp;
  }
the implementation attempts to put variable hp in a register but, like locals and procedure parameters, it may put it in memory. global declarations may name specific registers, like

  global {
    bits64 hplim "%ebx";
  }
Register names are implementation-dependent.

All separately compiled modules must have identical global declarations.

3.7   Addresses

To specify where in memory to read or write we need an address. Any expression that evaluates to a bitsn can be used as an address, where n is the architecture's natural pointer size, i.e. the number of bits needed to hold a memory address on that architecture.

Absolute addresses can be used but what they refer to is implementation dependent. Their type is also bitsn.

3.8   Names

A name denotes either a register or a memory address. Register names are a procedure's local variables. Memory address names are either labels (Section 4.1) or procedure names (Section 5).

3.9   Foreign language interface

The foreign language interface lets C-- programs use other calling conventions in order to inter-operate with foreign code. The interface is portable across architectures except where the sizes of foreign language types differ between platforms (Section 5.4).

4   Data layout directives

Memory in the data segment is allocated using the data directive. A memory block is organised as a sequence of typed data. Each datum is thought of as an array of bytes that may be initialised.

Here is an example that allocates and initialises some memory. In particular, it allocates eight data items of types bits32, bits16, float64, bits32, bits8, bits16, bits8 and bits64 respectively. The example is explained in more detail in the remainder of this section:

  data {
    foo:  bits32[4]{1,2,3,ff};  /* ff is a forward reference */
          bits16[4]{1,2};
    ff:   float64[2]{2.8,3.1}; bits32[2]{ff,foo};
    str:  bits8[]"Hello world\0";
    ustr: bits16[]unicode("Hello world\0");
          bits8;
    xs:   bits64[]{"This is an", "array of", "bits64's"};
  }
There may be any number of data layout directives in a C-- program.

4.1   Labels

Labels are the means to refer to the allocated memory. They should be viewed as pointers and not as memory locations. A label declaration consists of a name followed by a colon. Once declared, a label is a name (and so an expression) that refers (points) to a memory address. Therefore it has bitsn type, where n is the particular architecture's natural pointer size. Labels may be used before they are declared: in the example above, label ff is used in the initialisation of the directive's first datum before it is declared as pointing to the third.

Note that labels do not provide any information about the type of the data pointed to by them.

A label points to the first byte after its declaration. Here is an example in which four labels point to the same datum:

  data { foo: label1: label2: bar: bits32; } /* just allocates */
Memory is always allocated without padding inside a single data layout directive, so it is possible to find any given datum in the data segment by starting from a label and adding the right offset, as in, for example, the read expression bits16[foo+4]. The address foo+4 does not have to point to the beginning of a data element; it may point to any data byte, but C-- assumes that it is 2-byte aligned. To align a label (and hence the datum it points to) to a specific boundary, an alignment directive (Section 4.3) has to be placed before the label. In the following example, foo and bar might or might not be the same address, but bar is guaranteed to be aligned on an eight-byte boundary.
  data { foo: align8;
         bar: bits32{0}; 
       }
It is possible, though pointless, to write a data layout directive with no labels; the memory it allocates is inaccessible.

4.2   Initialisation

Memory is allocated by specifying the type of the datum, the number of datum's elements to allocate, and the initial value for each element. The particular syntax is:
type[n]{constant-list};
where n specifies how many elements of the type type have to be allocated, and constant-list provides the initial value (of type type) for each allocated element, in the form of a comma-separated list of constants or constant expressions (i.e. expressions whose value is known at compile time).

There are a number of possible variants:
  1. If [n] is not provided, only one element is allocated. The {constant-list} may or may not be provided. If provided, it should contain only one constant or constant expression to which the element is initialised. If not provided, no initial value is given. For example:

      data { lb1: bits8; }
       /* Allocates one byte (contains garbage) */
      data { lb2: bits8{17}; }
       /* Allocates one byte and initialises it to (ASCII code) 17 */
      data { lb3: bits32{17}; }
       /* Allocates one 4-byte word and initialises it to integer 17   */
    
  2. If [n] is provided, then n elements are allocated. The {constant-list} may or may not be provided. If not provided, no initial value is given. If provided, it should contain c constants or constant expressions, such that c ≤ n. Element i (i = 0 ... n-1) is initialised to the value of the constant or constant expression j (j = 0 ... c-1) in {constant-list}, such that j = i mod c. For example:

      data { lb1: bits8[17]; }
       /* Allocates 17 bytes (that contain garbage) */
      data { lb2: bits8[17]{0}; }
       /* Allocates 17 bytes and initialises all of them to 0 */
      data { lb3: bits32[6]{1,2,3}; }
       /* Allocates six 4-byte words and initialises them 
        * to 1,2,3,1,2,3 respectively. 
        */
      data { lb4: bits32[6]{1,2,3,1,2,3}; }
       /* Allocates six 4-byte words and initialises them 
        * to 1,2,3,1,2,3 respectively.
        */
      data { lb5: bits32[4]{1,2,3}; }
       /* Allocates four 4-byte words and initialises them
        * to 1,2,3,1 respectively.
        */
    
  3. Abbreviations are available when n = c:

    type[]{constant-list}; is an abbreviation for type[c]{constant-list};
    bits8[]"char1 ... charn"; is an abbreviation for bits8[n]{'char1', ..., 'charn'};
    bits16[]unicode("char1 ... charn"); is an abbreviation for bits16[n]{unicode('char1'), ..., unicode('charn')};

    For example:

      data { s1: bits8[6]{'h','e','l','l','o','\0'}; }
      data { s2: bits8[]"hello\0"; }  
       /* Both directives allocate 6 bytes and initialise
        * them to the same ASCII code integers.
        */
      data { f1: float64[3]{3.5, 4.4, 6.98}; }
      data { f2: float64[] {3.5, 4.4, 6.98}; }
       /* Both directives allocate three 8-byte floats and initialise
        * them to the same floating point numbers.
        */
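
The cyclic rule j = i mod c from variant 2 can be sketched in Python as follows. The function name `initial_values` is hypothetical, and rejecting an empty constant list is an assumption; the sketch only models which constant each element receives.

```python
def initial_values(n, constants):
    """Values given to the n elements of type[n]{constants}.

    Element i receives constant j, where j = i mod c and c = len(constants).
    """
    c = len(constants)
    if not 0 < c <= n:
        raise ValueError("the constant list must hold between 1 and n items")
    return [constants[i % c] for i in range(n)]

print(initial_values(6, [1, 2, 3]))  # [1, 2, 3, 1, 2, 3]
print(initial_values(4, [1, 2, 3]))  # [1, 2, 3, 1]
print(initial_values(3, [0]))        # [0, 0, 0]
```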
    
Since the stored representation of an initialised value may depend on endianness, the only way to guarantee that a memory read (Section 6.2) yields the value that was initialised (or written) is to read the datum or element with the same type with which it was initialised (or written). For example, if a datum was initialised with data {foo: bits16{17};} and is read back with bits8[foo], the value might be 0 or 17 depending on the architecture, but if it is read with bits16[foo] it is guaranteed to be 17.
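
The 0-or-17 outcome can be reproduced in Python by laying out the 16-bit value 17 in both byte orders and looking at the byte stored at the lowest address. The helper name `first_byte_of_bits16` is hypothetical; the sketch models only the byte order, not a C-- implementation.

```python
def first_byte_of_bits16(value, byteorder):
    """Byte found at the lowest address of a stored 16-bit value."""
    return value.to_bytes(2, byteorder)[0]

print(first_byte_of_bits16(17, "little"))  # 17
print(first_byte_of_bits16(17, "big"))     # 0

# Reading the whole datum back with its own type is order-independent.
for order in ("little", "big"):
    stored = (17).to_bytes(2, order)
    print(int.from_bytes(stored, order))   # 17
```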

4.3   Alignment

For performance reasons, and also to comply with some architecture requirements, it is sometimes necessary to specify the alignment of data. The alignn directive inserts padding as needed, ensuring that the next datum is placed on an n-byte boundary. The value of the padding is unspecified. The alignment value n must be a power of 2 (2, 4, 8, 16, ...).

In the following example, foo is aligned to a 4-byte boundary and bar to an 8-byte boundary. In both cases padding may be inserted: for example, between the last byte of the bits32 and the first byte of the float64 (the datum pointed to by bar) there may be padding in order to place the float64 on an 8-byte boundary.

  data { align4;
         foo: bits32{1};
         align8;
         bar: float64{1.7};
       }
In a sequence of several alignn directives, only the one with the highest alignment value takes effect. alignn directives appearing at the end of a data directive are ignored. In the following example, the second bits8 is placed on an 8-byte boundary and the last align4 is ignored.

  data { foo: bits8;
         align4;
         align8;
         align2;
         bits8;
         align4;         
       }
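
The padding rules of this section can be modelled with a small Python sketch that assigns byte offsets within one data directive. Everything here is illustrative: the function `layout`, the tuple encoding of the directive, and the extra label `second` (added so that the second bits8 can be observed) are hypothetical, and the directive is assumed to start at an offset that already satisfies every requested alignment.

```python
def layout(items, base=0):
    """Assign byte offsets to the labels of one data directive.

    items is a list of tuples:
      ("label", name)         a label declaration
      ("align", n)            an alignn directive
      ("datum", size, count)  `count` elements of `size` bytes each
    Returns (label offsets, bytes occupied up to the last datum).
    """
    offset = base
    end = base  # first byte after the last datum seen so far
    labels = {}
    for item in items:
        kind = item[0]
        if kind == "label":
            labels[item[1]] = offset
        elif kind == "align":
            n = item[1]
            if n < 1 or n & (n - 1):
                raise ValueError("alignment must be a power of 2")
            offset = (offset + n - 1) // n * n  # pad up to an n-byte boundary
        elif kind == "datum":
            _, size, count = item
            offset += size * count
            end = offset  # trailing align directives never move `end`
        else:
            raise ValueError(f"unknown item {kind!r}")
    return labels, end - base

example = [
    ("label", "foo"), ("datum", 1, 1),
    ("align", 4), ("align", 8), ("align", 2),
    ("label", "second"), ("datum", 1, 1),
    ("align", 4),
]
print(layout(example))  # ({'foo': 0, 'second': 8}, 9)
```

Applying the directives one after another gives the same result as taking the highest alignment, because every value is a power of 2.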

5   Procedures

Procedures are the means to place information in the code segment. They are very similar to high-level language procedures. Procedures can optionally take arguments, contain static data declarations and return values.

5.1   Procedure definition

A procedure definition has the following syntax:
[conv] proc_name(type arg1,...,type argn) [data] { body }
The return type need not be specified in the definition.

For example, procedure foo is defined as a procedure that expects one bits32 argument. Inside the procedure body, the local variable (or register) x is declared, followed by an assignment statement and a jump statement to procedure bar.

  foo(bits32 y) {
    bits32 x;
    
    x = y + 1;
    jump bar(x);
  }

5.2   Local declarations

5.2.1   Local variables

A local variable declaration has the following syntax:
type name1,...,namen ;
It declares the local variable names name1 ... namen of type type. These names will be mapped to (virtual) machine registers. As names, they are also expressions of type type.

Local variables have to be declared before they are used.

All declarations must appear at the beginning of the procedure body. All the local variable names must be unique. It is not possible to redeclare a name.

5.2.2   Stack directive

To handle high-level variables that can't be represented using C--'s primitive types, C-- can be asked to allocate named areas in the procedure's activation record.

  f (bits32 x) {
    bits32 y;

    stack { p : bits32;
            q : bits32[40];
    }
    /* Here, p and q are the addresses of the relevant chunks of
       data. Their type is the native pointer type of the machine. */
  } 
stack is rather like data; it has (almost) the same syntax between the braces, but it allocates on the stack. The only difference is that stack-allocated memory cannot be initialised. As with data, the names are bound to the (stack) addresses of the relevant locations.

A word of caution here: in the example above, p is bound to an address in f's activation frame. It only makes sense to read from or write to address p while f's frame is still on the stack, i.e. before f returns or jumps to another function. After f has returned, address p will refer to some other function's activation frame or may even be beyond the current stack pointer!

C-- makes no provision for dynamically-sized stack allocation (yet).

5.2.3   Names usage

If a local name (formal argument, local variable or stack label) is the same name as a label (Section 4.1) or a procedure name, the uses of the name within a procedure refer to the local. That is, the local name shadows the global name (Section 2.4).

5.3   Statements

5.3.1   Null statement (;)

This is just the null statement and can be inserted anywhere an ordinary statement can. It does not have any effects.

5.3.2   Assignment

An assignment statement has the following syntax:
name = expr ;
It stores the value of expr in the local variable (or register) name, where expr has the same type as name.

5.3.3   Memory write

A memory write statement has the following syntax:
bitsn[expr1] = expr2 ;
to write bitsn values, or
floatn[expr1] = expr2 ;
to write floatn values.

Expression expr1 has type bitsn, where n is the particular architecture's natural pointer size, and its value is the memory address in which the value of expr2 is written. Expression expr1 will typically contain one or more labels. Expression expr2 should be of type bitsn or floatn respectively, otherwise the value written in memory is unspecified.

The following example stores the ASCII integer code of 'A' in the byte at offset 4 (the fifth byte) of the datum pointed to by label:
bits8[label+4] = 'A';
The address yielded by expr1 is assumed to be aligned to the size of the type being written. A memory write can optionally be qualified with an alignment flag {aligna}, in which case the syntax is:

bitsn{aligna} [expr1] = expr2 ;
floatn{aligna} [expr1] = expr2 ;
A few examples of memory writes with flagged alignment:

5.3.4   if and relational operations

Conditional execution of code is accomplished with the if statement. It has the following syntax:

if expr1 rel expr2 { ...} [ else { ...} ]
The else branch is optional and the statement blocks may be empty, as in if x == 0 {}, but the curly braces are mandatory even for single statements, as in
if x == 0 { x = x + 1;}
The condition test is very simple: it consists of a relational operation, rel, that takes two expressions as arguments. The term ``operation'' is used instead of ``operator'' to avoid confusion with the C-- operators that are used in expressions (Section 6). Relational operations are only used in if condition tests; they cannot be used anywhere else.

This is the set of relational operations:

Name   Relation
==     Equality
!=     Inequality
>      Greater than
>=     Greater than or equal
<      Less than
<=     Less than or equal
They can all be combined with these flags:

Flag    Meaning
(none)  Signed comparison (default)
u       Unsigned comparison
f       Floating point comparison
fo      Floating point unordered comparison, if supported
When the condition test holds, the block of statements immediately following the condition test is executed. Otherwise, if an optional else branch has been specified, its block of statements is executed. After either block has been executed, control resumes at the first statement after the if or if/else.

In the following example, >= is used in the if test condition without a flag (default signed comparison), and != is used combined with flag u to test whether the unsigned integer held in x is non-zero.

  f(bits32 x)
  {
    bits32 y;

    y = 0;
    if y >= bits32[foo+8] {
      y = y + 1;
      return (y);
    } else {
      x = x -u 1;
      if x !=u 0 {
         y = y + 2;
      }
      return (y);
    }
  }           
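
The difference between the default (signed) comparison and the u flag can be modelled in Python on raw bit patterns. The sketch assumes a two's-complement interpretation of signed values, which this section does not state, and the helper names are hypothetical.

```python
def as_signed(pattern, width=32):
    """Interpret a width-bit pattern as a two's-complement integer (assumed)."""
    pattern &= (1 << width) - 1
    if pattern >> (width - 1):
        return pattern - (1 << width)
    return pattern

def as_unsigned(pattern, width=32):
    """Interpret a width-bit pattern as a non-negative integer."""
    return pattern & ((1 << width) - 1)

def greater(a, b, flag="", width=32):
    """Model of `a > b` (flag "") and `a >u b` (flag "u") on bits values."""
    if flag == "u":
        return as_unsigned(a, width) > as_unsigned(b, width)
    if flag == "":
        return as_signed(a, width) > as_signed(b, width)
    raise ValueError(f"flag {flag!r} is not modelled here")

all_ones = 0xFFFFFFFF
print(greater(all_ones, 0))       # False: the pattern reads as -1 when signed
print(greater(all_ones, 0, "u"))  # True: the pattern reads as 4294967295
```

The same bit pattern gives opposite answers, which is why the comparison, rather than the value, carries the signedness.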

5.3.5   switch

The switch statement performs multiway branching depending on the value of a bits expression. The particular syntax is:

switch [sconst1..sconstn] expr {
  sconst11, ..., sconst1i : { ... }
  ...
  sconstm1, ..., sconstmj : { ... }
  default : { ... }
}
In the following example, expression x+23 is assumed to yield a value between 0 and 7. If the value is 1, 2 or 3, then the first branch is taken. If the value is 5, then the second branch is taken. If the value is 0, 4, 6 or 7, then the default branch is taken.

  switch [0..7] x + 23 {
    1,2,3   : { y = y + 1;} 
    5       : { y = x + 1; x = y;} 
    default : { y = f();
                if y == 0 { x = 1;}
              } 
  }
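
The branch selection in this example can be modelled in Python. The function `select_branch` and the branch names are hypothetical. The text does not say what happens when the value falls outside the stated range, so the sketch raises an error there purely to make that assumption visible.

```python
def select_branch(value, value_range, arms):
    """Return the name of the arm that a switch would take.

    value_range is the (low, high) pair written in square brackets;
    arms is a list of (constants, name) pairs in source order.
    """
    low, high = value_range
    if not low <= value <= high:
        # Not specified by the text; treated as an error in this model only.
        raise ValueError("value outside the range given to switch")
    for constants, name in arms:
        if value in constants:
            return name
    return "default"

arms = [((1, 2, 3), "first"), ((5,), "second")]
print([select_branch(v, (0, 7), arms) for v in range(8)])
# ['default', 'first', 'first', 'first', 'default', 'second', 'default', 'default']
```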

5.3.6   Local control labels and goto

Local control labels are used in conjunction with the goto statement to alter the control flow within a procedure body. A local control label declaration consists of a label name followed by a colon. This kind of control label is not a name in the sense of Section 3.8, and so it should not be confused with the pointer labels mentioned so far. The only thing that can be done with a local control label is to provide it as the argument of a goto statement.

In turn, a goto statement transfers control to the label it takes as argument. Only a local control label can be the argument of a goto. In the following example, the goto statement forces the control flow to resume at the very first statement after the label declaration.

  bar()
  {
  label:
    bits64[foo] = 18;
    bits64[foo+4*8] = bits64[bar];    
    goto label;
    return();
  }

5.3.7   Procedure call

A call statement invokes a procedure in the conventional way of function invocation, so all the invoking procedure's local variables are saved across the call. The particular syntax is:
[ conv ] [ name1,...,namen = ] expr( [ expr1,...,exprm ] );
It is unspecified what the effects are if the number and the types of the actual arguments in a call statement do not match the number and the types of the formal arguments of the invoked procedure. It is also unspecified what the effects are if the number and the types of the names in the name list do not match the number and the types of the results returned by the invoked procedure.

The order of evaluation of the arguments can matter, because the evaluation of an expression can raise an exception. Nevertheless, a C-- compiler may evaluate arguments in any order.

Call statements are not expressions and so cannot be used inside expressions. Procedure calls are complete statements. Things such as y = f(g(x)) + 1; are not allowed. Recall, however, that procedure names, as such, are expressions with the procedure address as value.

The following example is self-explanatory:
  foo()
  {
    bits32 x, y;
    x, y = bar(5);
    return (x,y);
  }
  bar(bits32 x)
  {
    return (x, x+1);
  }

5.3.8   jump

The jump statement transfers control to a procedure, carrying parameters with it. Its target is any expression that evaluates to a procedure address, and it can optionally pass arguments to that procedure. The syntax is:

jump expr(expr1,...,exprn);
All local variables die when jumping. It is unspecified what the effects are if the number and the types of the actual arguments in a jump statement do not match the number and the types of the formal arguments of the invoked procedure. An example of an infinite loop with no stack growth:

  bar(bits32 x, bits32 y)
  {
    jump bar(y, x);   /* Loop forever */
  }

5.3.9   return

The return statement transfers control back to the call statement issued by an invoking procedure. Optionally, it can also transfer values back. All the local variables of the procedure issuing the return die when returning. The syntax is:
[ conv ] return (expr1,...,exprn);
where expri are the expressions whose values will be returned. If no values are returned, the expression list should be empty, as in return ();. Note that in C--, a procedure may return multiple values.

The return statement may be qualified with the calling convention to be used (Section 5.4).

It is unspecified what the effects are if the number and the types of the values returned do not match between a return and the call statement.

  bar(bits32 z)
  {
    return (1+z, z/3);
  }
  foo(bits32 z)
  {
    return ();
  }

5.4   Foreign language interface

To use a foreign language calling convention for a procedure, the name of the calling convention should be declared before the procedure name with the foreign keyword. Here, foo uses the standard C calling convention.

  export foo;
  foreign C foo() 
  {
    bits32 x;
    jump bar(x);
  }
The calling convention should also be specified in the same way in call statements and in return statements, if it is not C--'s calling convention.

  import printf, fun;
  goo()
  {
    bits32 i;
    foreign C fun(5);  
        /* fun has type int -> void  */
    foreign C i = printf(str, arg); 
        /* printf() returns an int   */
    return ();
  }
  bar(bits32 a)
  {
    a = a + 1;
    foreign C return (a); /* uses C's convention to return 'a'   */
  }
The supported calling conventions are:
  1. C
  2. Pascal
All foreign language functions/procedures must have been imported with import declarations. All C-- procedures directly invoked from a foreign language must have been exported with export declarations.

When calling a C-- procedure from a foreign program, the types and sizes of the actual arguments should match the types and sizes of the formal arguments on the particular platform, otherwise the effects are unspecified. The same applies to the types and sizes of returned values.

Since the size of a particular foreign language type may differ between platforms, and since C-- types are fixed-size types, it is impossible for C-- to be completely platform independent when inter-operating with foreign languages.

6   Expressions

6.1   Introduction

A C-- expression can be a constant, a name, a memory read, a primitive, or an operator applied to other expressions. C-- makes a distinction between integer and floating point expressions, i.e. expressions that yield bits or float values as their result. The integer and floating point model is based on the LIA-1 standard (ISO/IEC 10967-1:1994(E)); if there are any inconsistencies between this manual and LIA-1, the LIA-1 standard is correct, unless otherwise noted.

Signed and unsigned numbers are not distinguished. Instead, as in any assembly language, it is the operations that are typed.

The type of any subexpression is always known and there are no automatic type casts or type conversions.

The following sections cover all the C-- operators, all the C-- primitives, and the memory read expression.

6.2   Memory read

Memory read expressions have the following syntax:

bitsm[expr]
Type: bitsn → bitsm
to read a bitsm value, and

floatm[expr]
Type: bitsn → floatm
to read a floatm value. Expression expr has type bitsn, where n is the particular architecture's natural pointer size. Its value is the address of the memory location to read from. It will typically contain one or more labels. The size m indicates how many bits to read from that location.

The following example expression reads a 4-byte bits value starting at the second byte of the datum pointed to by label p:
bits32[p+1]
The address yielded by expr is assumed to be aligned to the size of the type being read (m/8 bytes). A memory read can optionally be qualified with an alignment flag {aligna}. The syntax is:

bitsm{aligna} [expr]
floatm{aligna} [expr]
A few examples of memory reads with flagged alignment:

6.3   Operators, precedence, and evaluation order

C-- operators are typed, i.e. there is a different set of operators for each of the two types provided by C--. The following table lists the available operators for signed bits values and for floats. Operators for unsigned bits values are obtained by appending the u flag to the signed bits operators.

Each operator in the table is described in more detail in Section 6.7 and Section 6.8.

Operator  Type                      Flags       Class       Description

*         bitsn × bitsn → bitsn     t, u, h     Arithmetic  Integer multiplication
*f        floatn × floatn → floatn  t, z, n, p  Arithmetic  Floating point multiplication
/         bitsn × bitsn → bitsn     t, u        Arithmetic  Integer division
/f        floatn × floatn → floatn  t, z, n, p  Arithmetic  Floating point division
+         bitsn × bitsn → bitsn     t, u        Arithmetic  Integer addition
+f        floatn × floatn → floatn  t, z, n, p  Arithmetic  Floating point addition
-         bitsn × bitsn → bitsn     t, u        Arithmetic  Integer subtraction
-f        floatn × floatn → floatn  t, z, n, p  Arithmetic  Floating point subtraction
%         bitsn × bitsn → bitsn     t           Arithmetic  Integer modulo

~         bitsn → bitsn                         Bitwise     Complement
&         bitsn × bitsn → bitsn                 Bitwise     AND
|         bitsn × bitsn → bitsn                 Bitwise     OR
^         bitsn × bitsn → bitsn                 Bitwise     XOR
<<        bitsn × bitsn → bitsn                 Bitwise     Left shift
>>        bitsn × bitsn → bitsn     u           Bitwise     Right shift


Operators should have as arguments expressions of the appropriate type, otherwise the result may be unspecified.

The next table lists the C-- operators in decreasing order of precedence. Operators in the same row have the same precedence. The reader can see that C-- operators follow the precedence order and the associativity of their C counterparts.

Operators      Associativity
~              right
* *f / /f %    left
+ +f - -f      left
<< >>          left
&              left
^              left
|              left
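
As an illustration only: Python's integer operators happen to use the same relative precedence and associativity as C for these symbols, so a short Python sketch can show how an unparenthesised expression over bits values is grouped according to the table above. The variable names are arbitrary.

```python
# Python shares C's ordering for ~, * / %, + -, << >>, &, ^ and |,
# so it can model how the table groups an unparenthesised expression.
a, b, c, mask = 5, 3, 4, 0xFF

implicit = a + b * c << 2 & mask | 1
explicit = (((a + (b * c)) << 2) & mask) | 1

# Unary ~ binds tighter than every binary operator.
complement_first = (~a & mask) == ((~a) & mask)

# Binary operators on the same row associate to the left.
left_assoc = (100 - 10 - 1) == ((100 - 10) - 1)

print(implicit, explicit, complement_first, left_assoc)
```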

6.4   Primitives

C-- provides a set of primitive operators besides those described above. The general syntax of a primitive is:
prim_name(expr1,...exprn)
where expri are expressions and prim_name is the primitive's name. A primitive name is not a name in the sense of Section 3.8. Primitive names alone are not expressions that stand for the primitive's entry point address, since primitives are not procedures but built-in operators. Primitives can only be used inside expressions using the syntax given above.

There are bits primitives and float primitives. The following table lists all the C-- primitives. See Sections 6.7 and 6.8 for detailed information on each particular primitive.

                   Primitive     Type

bits primitives    abs%          bitsn → bitsn
                   neg%          bitsn → bitsn
                   sign%         bitsn → bitsn

float primitives   absf%         floatn → floatn
                   exponentf%    floatn → bitsn
                   fractionf%    floatn → floatn
                   fractpartf%   floatn → floatn
                   intpartf%     floatn → floatn
                   negf%         floatn → floatn
                   predf%        floatn → floatn
                   roundf%       floatn × bitsn → floatn
                   scalef%       floatn × bitsn → floatn
                   signf%        floatn → bitsm
                   succf%        floatn → floatn
                   truncf%       floatn × bitsn → floatn
                   ulpf%         floatn → floatn


6.5   Exception handling

Operators may cause system exceptions such as overflow or divide-by-zero. An operator records the exception resulting from its application if the t (trap) flag is appended to it. The exception kind is recorded in the global register Sys.Indicators (see footnote 3), which is a bit vector with one bit for every kind of exception. To find out which exception has occurred, C-- provides some predefined global constants. They are also bit vectors, with the particular bit that encodes the exception set to 1 and all the others set to 0.

System exceptions and constants are listed in the following table:

         
Exceptions   lia1except → Sys.IntegerOverflow
                        | Sys.FloatingOverflow
                        | Sys.Underflow
                        | Sys.Undefined
                        | Sys.Inexact             (IEC 559)
                        | Sys.DivideByZero        (IEC 559)
                        | Sys.Invalid             (IEC 559)

Constants    lia1info   → Sys.bitsn.MaxSigned
                        | Sys.bitsn.MinSigned
                        | Sys.bitsn.MaxUnSigned
                        | Sys.bitsn.MinUnSigned
                        | Sys.floatn.Radix
                        | Sys.floatn.Precision
                        | Sys.floatn.ExpMin
                        | Sys.floatn.ExpMax
                        | Sys.floatn.Denorm
                        | Sys.floatn.IEC559
                        | Sys.floatn.Max
                        | Sys.floatn.Min
                        | Sys.floatn.MinN
                        | Sys.floatn.Epsilon
         

It is easy to find out the kind of exception that has resulted from an operator application using the bitwise operators on Sys.Indicators and the appropriate global constants. For example, to capture, handle, and clean up an overflow exception that resulted from the application of a bits addition operator, one could write:

  foo(bits32 y) {
    bits32 x;

    /* Trapping addition: records overflow in Sys.Indicators */
    x = y +t y;
    if Sys.Indicators & Sys.IntegerOverflow {
        /* Write here code to handle exception */

        /* Clear handled exception */
        Sys.Indicators = Sys.Indicators & ~Sys.IntegerOverflow;
    }
  }
The Sys.Indicators register can be treated as any other register, i.e. it can be cleared, bits can be flipped and so on.
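
The test-and-clear pattern above can be modelled outside C--. The following Python sketch is only an illustration: the bit positions chosen for the constants, the Machine class, and the choice to return the wrapped value from a trapping addition are all assumptions, since this section does not fix any of them.

```python
# Hypothetical bit positions: the manual does not give the real encodings.
SYS_INTEGER_OVERFLOW = 1 << 0
SYS_UNDEFINED = 1 << 1


class Machine:
    """Toy model holding a stand-in for the Sys.Indicators register."""

    def __init__(self):
        self.indicators = 0

    def add_t(self, x, y, n=32):
        """Model of x +t y on signed n-bit values; records overflow."""
        lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
        result = x + y
        if not lo <= result <= hi:
            self.indicators |= SYS_INTEGER_OVERFLOW
            # Assumption: the wrapped value is what the operator yields.
            result = (result - lo) % (1 << n) + lo
        return result

    def handle_overflow(self):
        """Test the overflow bit and clear it, leaving other bits alone."""
        if self.indicators & SYS_INTEGER_OVERFLOW:
            self.indicators &= ~SYS_INTEGER_OVERFLOW
            return True
        return False


m = Machine()
m.indicators |= SYS_UNDEFINED          # pretend an earlier exception is pending
m.add_t(2**31 - 1, 1)                  # overflows a signed 32-bit value
print(m.handle_overflow(), m.indicators == SYS_UNDEFINED)
```

Clearing with AND and the complemented constant touches only the handled bit, which is why the pending SYS_UNDEFINED bit survives in the sketch.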

6.6   Casting

6.6.1   bitsn

Type: bitsm ® bitsn
Accepts flags: u
With t flag it sets: N/A
Description:
    If n<m, it typecasts returning the higher order bits.
If n>m, it typecasts returning a value in which the new higher order bits are either filled with the highest order bit of its argument value (sign extension), or filled with zeroes when used with the u flag.


Example expression: bits32('A') ---sign extension. bits16u(abyte) ---zero fill.
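
The two ways of filling the new high-order bits in a widening cast such as bits32('A') or bits16u(abyte) can be sketched in Python as follows. The function name and parameters are hypothetical, and values are treated as raw bit patterns.

```python
def widen(pattern, src_bits, dst_bits, unsigned=False):
    """Extend a src_bits-wide bit pattern to dst_bits (dst_bits > src_bits).

    unsigned=True models the u flag (zero fill); otherwise the top bit of
    the source is copied into the new high-order bits (sign extension).
    """
    assert dst_bits > src_bits
    pattern &= (1 << src_bits) - 1
    top_bit = pattern >> (src_bits - 1)
    if unsigned or top_bit == 0:
        return pattern
    # Set every new high-order bit to 1.
    fill = ((1 << (dst_bits - src_bits)) - 1) << src_bits
    return pattern | fill


print(hex(widen(ord("A"), 8, 32)))            # top bit clear: unchanged
print(hex(widen(0x80, 8, 16)))                # sign extension
print(hex(widen(0x80, 8, 16, unsigned=True))) # zero fill
```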

6.6.2   floatn

Type: bitsm → floatn
Accepts flags: t
With t flag it sets: Sys.FloatingOverflow
Description:
    Typecasts an integer into a floating point number. If t is used, it raises Sys.FloatingOverflow if the argument is too big to fit.
Example expression: float64(foo)

6.7   bits operators and primitives

All bitsn types are bounded, and the minimum and maximum values are provided as constants: Sys.bitsn.MaxSigned, Sys.bitsn.MinSigned, Sys.bitsn.MaxUnSigned and Sys.bitsn.MinUnSigned. The following definitions will be used to explain the individual operators. I denotes the set of possible signed integer values. N denotes the set of possible unsigned integer values.

I = {x ∈ Z | Sys.bitsn.MinSigned ≤ x ≤ Sys.bitsn.MaxSigned}
N = {x ∈ Z | Sys.bitsn.MinUnSigned ≤ x ≤ Sys.bitsn.MaxUnSigned}

wrapI(x) = x modulo (Sys.bitsn.MaxSigned - Sys.bitsn.MinSigned + 1)
wrapI(x) ∈ I

wrapN(x) = x modulo (Sys.bitsn.MaxUnSigned - Sys.bitsn.MinUnSigned + 1)
wrapN(x) ∈ N
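
A minimal Python sketch of the two wrapping functions, assuming two's complement bounds for an n-bit type (the manual only names the constants and does not state their values). The function names are hypothetical.

```python
def bounds(n):
    """Assumed two's complement limits for an n-bit type."""
    return {
        "MinSigned": -(1 << (n - 1)),
        "MaxSigned": (1 << (n - 1)) - 1,
        "MinUnSigned": 0,
        "MaxUnSigned": (1 << n) - 1,
    }


def wrap_i(x, n):
    """Reduce x modulo the size of I, choosing the representative in I."""
    b = bounds(n)
    span = b["MaxSigned"] - b["MinSigned"] + 1
    return (x - b["MinSigned"]) % span + b["MinSigned"]


def wrap_n(x, n):
    """Reduce x modulo the size of N, choosing the representative in N."""
    b = bounds(n)
    span = b["MaxUnSigned"] - b["MinUnSigned"] + 1
    return (x - b["MinUnSigned"]) % span + b["MinUnSigned"]


print(wrap_i(128, 8), wrap_i(-129, 8), wrap_n(256, 8), wrap_n(-1, 8))
```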

6.7.1   + and -

Type: bitsn × bitsn → bitsn
Accepts flags: t and u
With t flag it sets: Sys.IntegerOverflow
Description:
    x ± y    = x ± y                 if x ± y ∈ I
             = wrapI(x ± y)          if x ± y ∉ I

    x ±t y   = x ± y                 if x ± y ∈ I
             = Sys.IntegerOverflow   if x ± y ∉ I

    x ±u y   = x ± y                 if x ± y ∈ N
             = wrapN(x ± y)          if x ± y ∉ N

    x ±ut y  = x ± y                 if x ± y ∈ N
             = Sys.IntegerOverflow   if x ± y ∉ N

Example expression: foo + 17
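
The four cases can be sketched in Python with a single hypothetical helper that returns both the wrapped value and whether the exact result fell outside the range; a t-flagged operator would record Sys.IntegerOverflow in that case. Two's complement bounds are an assumption.

```python
def add_bits(x, y, n=32, unsigned=False):
    """Return (value, overflowed) for x + y on n-bit operands.

    unsigned=True models the u flag (range N); otherwise the signed
    range I is used. Subtraction is the same with y negated.
    """
    lo = 0 if unsigned else -(1 << (n - 1))
    hi = lo + (1 << n) - 1
    exact = x + y
    if lo <= exact <= hi:
        return exact, False
    return (exact - lo) % (1 << n) + lo, True


print(add_bits(127, 1, n=8))                 # signed wrap-around
print(add_bits(3, -5, n=8, unsigned=True))   # models 3 -u 5
```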

6.7.2   *

Type: bitsn × bitsn → bitsn
Accepts flags: t, u, and h
With t flag it sets: Sys.IntegerOverflow
Description:
    x * y    = x * y                 if x * y ∈ I
             = wrapI(x * y)          if x * y ∉ I

    x *t y   = x * y                 if x * y ∈ I
             = Sys.IntegerOverflow   if x * y ∉ I

    x *u y   = x * y                 if x * y ∈ N
             = wrapN(x * y)          if x * y ∉ N

    x *ut y  = x * y                 if x * y ∈ N
             = Sys.IntegerOverflow   if x * y ∉ N

    x *h y   = high(x * y)           the higher order bits
    x *uh y  = high(x * y)           the higher order bits

Example expression: foo * 17
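
The manual does not define high() further. One plausible reading, assumed here, is that it yields the upper n bits of the full 2n-bit product, which is what the plain operator discards when it wraps. A Python sketch under that assumption:

```python
def mul_high(x, y, n):
    """Upper n bits of the exact product (assumed meaning of high()).

    Operands are given as ordinary integers already in the signed or
    unsigned n-bit range; Python's >> floors, so signed products work too.
    """
    return (x * y) >> n


def mul_low(x, y, n):
    """Lower n bits of the exact product, as an unsigned bit pattern."""
    return (x * y) & ((1 << n) - 1)


# 200 * 200 = 40000 = 0x9C40: high byte 0x9C, low byte 0x40.
print(hex(mul_high(200, 200, 8)), hex(mul_low(200, 200, 8)))
```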

6.7.3   /

Type: bitsn × bitsn → bitsn
Accepts flags: t and u
With t flag it sets: Sys.IntegerOverflow or Sys.Undefined
Description:
    x / y    = ⌊x / y⌋               if ⌊x / y⌋ ∈ I
             = wrapI(⌊x / y⌋)        if ⌊x / y⌋ ∉ I
             = undefined value       if y = 0

    x /t y   = ⌊x / y⌋               if y ≠ 0 and ⌊x / y⌋ ∈ I
             = Sys.IntegerOverflow   if y ≠ 0 and ⌊x / y⌋ ∉ I
             = Sys.Undefined         if y = 0

    x /u y   = ⌊x / y⌋               if ⌊x / y⌋ ∈ N
             = wrapN(⌊x / y⌋)        if ⌊x / y⌋ ∉ N
             = undefined value       if y = 0

    x /ut y  = ⌊x / y⌋               if y ≠ 0 and ⌊x / y⌋ ∈ N
             = Sys.IntegerOverflow   if y ≠ 0 and ⌊x / y⌋ ∉ N
             = Sys.Undefined         if y = 0

Example expression: foo / 17
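
The definition rounds the quotient toward negative infinity (floor), which is also how Python's // operator behaves. The sketch below models signed division with a hypothetical helper that reports which exception a t-flagged operator would record; two's complement bounds are an assumption.

```python
def div_bits(x, y, n=32):
    """Return (value, exception) for signed n-bit x / y.

    exception is None, "IntegerOverflow" or "Undefined"; it names the
    indicator the t flag would set. The value is None when undefined.
    """
    if y == 0:
        return None, "Undefined"
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    q = x // y  # floor division, matching the floor brackets above
    if lo <= q <= hi:
        return q, None
    return (q - lo) % (1 << n) + lo, "IntegerOverflow"


print(div_bits(-7, 2, n=8))     # floor, not truncation
print(div_bits(-128, -1, n=8))  # the only signed overflow case
print(div_bits(1, 0, n=8))
```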

6.7.4   %

Type: bitsn × bitsn → bitsn
Accepts flags: t
With t flag it sets: Sys.Undefined
Description:
    x % y    = x - (⌊x / y⌋ * y)     if y ≠ 0
             = undefined value       if y = 0

    x %t y   = x - (⌊x / y⌋ * y)     if y ≠ 0
             = Sys.Undefined         if y = 0

Example expression: foo % 17
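
Because the formula uses the floor of the quotient, a non-zero result takes the sign of the divisor y. A short Python sketch of the formula (the helper name is hypothetical, and the y = 0 case is modelled by raising an error):

```python
def mod_bits(x, y):
    """x - (floor(x / y) * y), as in the definition above."""
    if y == 0:
        raise ZeroDivisionError("result is undefined when y = 0")
    return x - (x // y) * y


# A non-zero result has the sign of the divisor.
print(mod_bits(-7, 2), mod_bits(7, -2), mod_bits(7, 2))
```

Python's own % operator follows the same floor-based rule, so mod_bits(x, y) equals x % y for every non-zero y.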

6.7.5   neg% and abs%

Type: bitsn → bitsn
Accepts flags: t
With t flag it sets: Sys.IntegerOverflow
Description:
    neg(x)               = wrapI(-x)             if -x ∉ I

    neg(x), with t flag  = Sys.IntegerOverflow   if -x ∉ I

    abs(x)               = wrapI(|x|)            if |x| ∉ I

    abs(x), with t flag  = Sys.IntegerOverflow   if |x| ∉ I

Example expression: neg%(foo)
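
Assuming two's complement bounds, the only operand for which -x or |x| falls outside I is the most negative value. The following Python sketch (hypothetical helper names) shows that case; the boolean says whether a t-flagged primitive would record Sys.IntegerOverflow.

```python
def _fit(exact, n):
    """Return (value, overflowed), wrapping exact into the signed range."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if lo <= exact <= hi:
        return exact, False
    return (exact - lo) % (1 << n) + lo, True


def neg_bits(x, n=32):
    return _fit(-x, n)


def abs_bits(x, n=32):
    return _fit(abs(x), n)


print(neg_bits(5, 8), neg_bits(-128, 8), abs_bits(-128, 8))
```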

6.7.6   sign%

Type: bitsn → bitsn
Accepts flags: N/A
With t flag it sets: N/A
Description:
    sign(x)  = 1     if x > 0
             = 0     if x = 0
             = -1    if x < 0

Example expression: sign%(foo)

6.7.7   &, |, and ^

Type: bitsn × bitsn → bitsn
Accepts flags: N/A
With t flag it sets: N/A
Description:
    x & y  = x AND y    Bitwise AND

    x | y  = x OR y     Bitwise OR

    x ^ y  = x XOR y    Bitwise XOR

Example expression: foo & 17

6.7.8   ~

Type: bitsn → bitsn
Accepts flags: N/A
With t flag it sets: N/A
Description:
    ~ x  = NOT x    Bitwise complement

Example expression: ~ 17

6.7.9   << and >>

Type: bitsn × bitsn → bitsn
Accepts flags: u
With t flag it sets: N/A
Description:
    x << n    Left shift n bits logically

    x >> n    Right shift n bits arithmetically
    x >>u n   Right shift n bits logically

Example expression: foo << 17
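
The difference between a logical and an arithmetic right shift can be sketched in Python on raw bit patterns. The helper names and the width parameter are hypothetical; the sketch only illustrates the two kinds of shift.

```python
def shl(pattern, k, n):
    """Logical left shift: bits shifted past position n-1 are lost."""
    return (pattern << k) & ((1 << n) - 1)


def shr_logical(pattern, k, n):
    """Logical right shift: vacated high-order bits become 0."""
    return (pattern & ((1 << n) - 1)) >> k


def shr_arithmetic(pattern, k, n):
    """Arithmetic right shift: vacated bits copy the old top bit."""
    mask = (1 << n) - 1
    pattern &= mask
    if pattern >> (n - 1):
        pattern -= 1 << n      # reinterpret the pattern as a negative value
    return (pattern >> k) & mask


print(hex(shl(0xF0, 2, 8)),
      hex(shr_logical(0xF0, 2, 8)),
      hex(shr_arithmetic(0xF0, 2, 8)))
```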

6.8   float operators and primitives

The individual operators are given only a short description here. For a more thorough discussion of the different operators, consult LIA-1 or IEC559 as appropriate.

A floating point number is either zero or has the form

x = ± g * r^e = ± 0.f_1 f_2 ... f_p * r^e

where 0.f_1 f_2 ... f_p is the p-digit fraction g (represented in base, or radix, r) and e is the exponent.

The exponent e is an integer in [emin, emax]. The fraction digits are integers in [0, r-1]. If the floating point number is normalized, f_1 is not zero, and hence the minimum value of the fraction g is 1/r and the maximum value is 1 - r^(-p).

This description gives rise to five parameters (r, p, emin, emax and denorm) that completely characterize the values of a floating point type. They are available as bits-valued constants in C--, together with an IEC559 conformance indicator:

Parameter name          Specifies
Sys.floatn.Radix        base (r)
Sys.floatn.Precision    number of radix digits provided (p)
Sys.floatn.ExpMin       smallest exponent value (emin)
Sys.floatn.ExpMax       largest exponent value (emax)
Sys.floatn.Denorm       1 if type has denormalized values, 0 if not
Sys.floatn.IEC559       1 if type conforms to IEC559, 0 if not

If Sys.floatn.IEC559 is equal to 1, most floating point operators support the four different rounding modes defined by IEC559. This is accomplished by adding a flag signifying which rounding mode is desired. The different flags and their rounding modes are:

Flag      Round to
(none)    Nearest
z         Zero
p         Positive infinity
n         Negative infinity
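
The effect of the four rounding modes can be illustrated with Python's decimal module. This is only an analogy: the sketch uses radix 10 and a precision of four digits (both arbitrary assumptions) rather than the radix and precision of any C-- float type, and it maps "nearest" to round-half-even, the IEC559 default. The MODES table and divide helper are hypothetical.

```python
from decimal import (Context, Decimal, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_EVEN)

# Flag letter (as in the table above) -> decimal rounding mode.
MODES = {
    "": ROUND_HALF_EVEN,   # no flag: nearest
    "z": ROUND_DOWN,       # toward zero
    "p": ROUND_CEILING,    # toward positive infinity
    "n": ROUND_FLOOR,      # toward negative infinity
}


def divide(x, y, flag=""):
    """Divide with four significant digits under the given rounding flag."""
    ctx = Context(prec=4, rounding=MODES[flag])
    return ctx.divide(Decimal(x), Decimal(y))


for flag in MODES:
    print(repr(flag), divide(2, 3, flag), divide(-2, 3, flag))
```

The modes only disagree when the exact result is not representable, and z, p and n differ from one another according to the sign of the result.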

A few definitions that we will use for explaining the individual operators:

r ∈ Z                The radix of F
p ∈ Z                The precision of F
emin ∈ Z             The smallest exponent of F
emax ∈ Z             The largest exponent of F
denorm ∈ Boolean     Whether F contains denormalized values

F_N = {0, ± i * r^(e-p) | i, e ∈ Z, r^(p-1) ≤ i ≤ r^p - 1, emin ≤ e ≤ emax}
F_D = {± i * r^(e-p) | i, e ∈ Z, 1 ≤ i ≤ r^(p-1) - 1, e = emin}

F = F_N ∪ F_D    if denorm = True
F = F_N          if denorm = False

fmax    = max {z ∈ F | z > 0}   = (1 - r^(-p)) * r^emax
fminN   = min {z ∈ F_N | z > 0} = r^(emin-1)
fminD   = min {z ∈ F_D | z > 0} = r^(emin-p)
fmin    = min {z ∈ F | z > 0}   = fminD   if denorm = True
                                = fminN   if denorm = False
epsilon = r^(1-p)
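
As a worked check of these formulas, the following Python sketch evaluates them exactly for the host's double precision type and compares them with the values Python reports. It assumes the host float is an IEC559 (IEEE 754) binary64 with denormalized values, and that Python's sys.float_info fields play the role of the Sys.floatn constants; neither assumption comes from the manual.

```python
import sys
from fractions import Fraction

info = sys.float_info
r, p = info.radix, info.mant_dig          # radix and precision
emin, emax = info.min_exp, info.max_exp   # exponent range for 0.f * r^e

# Exact rational evaluation of the formulas above.
fmax = (1 - Fraction(1, r**p)) * Fraction(r) ** emax
fminN = Fraction(r) ** (emin - 1)
fminD = Fraction(r) ** (emin - p)
epsilon = Fraction(r) ** (1 - p)

print(r, p, emin, emax)
print(float(fmax), float(fminN), float(fminD), float(epsilon))
```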

6.8.1   +f, -f, and *f

Type: floatn × floatn → floatn
Accepts flags: t, z, n, and p
With t flag it sets: Sys.FloatingOverflow or Sys.Underflow
Supported rounding modes (if IEC559): all
Description:
    The usual basic arithmetic operators with the proper rounding and trapping notification.
Example expression: foo +f 17.0

6.8.2   /f

Type: floatn × floatn → floatn
Accepts flags: t, z, n, and p
With t flag it sets: Sys.FloatingOverflow or Sys.Underflow or Sys.Undefined
Supported rounding modes (if IEC559): all
Description:
    The usual basic arithmetic division with the proper rounding and trapping notification.
Example expression: foo /f 17.0

6.8.3   signf%

Type: floatn → bitsm
Accepts flags: N/A
With t flag it sets: N/A
Supported rounding modes (if IEC559): N/A
Description:
    signf(x)  = 1     if x > 0.0
              = 0     if x = 0.0
              = -1    if x < 0.0

Example expression: signf%(-12.450)

6.8.4   negf% and absf%

Type: floatn → floatn
Accepts flags: N/A
With t flag it sets: N/A
Supported rounding modes (if IEC559): N/A
Description:
    negf(x)   = - x

    absf(x)   = |x|

    signf(x)  = 1.0     if x > 0.0
              = 0.0     if x = 0.0
              = -1.0    if x < 0.0

Example expression: negf%(foo)

6.8.5   exponentf%

Type: floatn → bitsn
Accepts flags: t
With t flag it sets: Sys.Undefined
Supported rounding modes (if IEC559): N/A
Description:
    exponentf(x)  = Sys.Undefined    if x = 0.0

Example expression: exponentf%(foo)

6.8.6   fractionf%

Type: floatn → floatn
Accepts flags: N/A
With t flag it sets: N/A
Supported rounding modes (if IEC559): N/A
Description:
    fractionf(x)  = 0    if x = 0.0

Example expression: fractionf%(foo)
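
The text above only records the zero cases. Assuming that exponentf and fractionf follow the LIA-1 convention, in which x = fractionf(x) * r^exponentf(x) with the fraction's magnitude in [1/r, 1), Python's math.frexp gives the same decomposition for radix 2. The wrapper names below are hypothetical, and raising an error stands in for the undefined case.

```python
import math


def exponentf(x):
    """Exponent e such that x = g * 2**e with 0.5 <= |g| < 1 (assumed)."""
    if x == 0.0:
        raise ValueError("exponent of zero is undefined")
    return math.frexp(x)[1]


def fractionf(x):
    """Fraction g of x; zero maps to zero, as stated above."""
    return math.frexp(x)[0]


# 17.0 = 0.53125 * 2**5
print(fractionf(17.0), exponentf(17.0), fractionf(0.0))
```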

6.8.7   scalef%

Type: floatn × bitsn → floatn
Accepts flags: t, z, n, and p
With t flag it sets: Sys.FloatingOverflow or Sys.Underflow
Supported rounding modes (if IEC559): all
Description:
    Scales its argument by an integer power of the radix.
Example expression: scalef%(17.0, 3)
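
For a radix-2 type, scaling by an integer power of the radix corresponds to Python's math.ldexp. The sketch below assumes radix 2 and uses a hypothetical wrapper name; it does not model the rounding flags or the t flag.

```python
import math


def scalef(x, k):
    """x * 2**k, computed without forming 2**k separately."""
    return math.ldexp(x, k)


print(scalef(17.0, 3), scalef(17.0, -1))
```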

6.8.8   succf% and predf%

Type: floatn → floatn
Accepts flags: t
With t flag it sets: Sys.FloatingOverflow
Supported rounding modes (if IEC559): N/A
Description:
    succf(x)  = Sys.FloatingOverflow    if x = fmax

    predf(x)  = Sys.FloatingOverflow    if x = -fmax

Example expression: succf%(17.0)
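
Assuming succf and predf return the next representable value above and below their argument (the LIA-1 meaning; the text above only records the overflow cases), they can be sketched with math.nextafter, which requires Python 3.9 or later. The wrapper names are hypothetical, and OverflowError stands in for Sys.FloatingOverflow.

```python
import math
import sys


def succf(x):
    """Smallest representable value greater than x (assumed meaning)."""
    if x == sys.float_info.max:
        raise OverflowError("no finite successor of fmax")
    return math.nextafter(x, math.inf)


def predf(x):
    """Largest representable value less than x (assumed meaning)."""
    if x == -sys.float_info.max:
        raise OverflowError("no finite predecessor of -fmax")
    return math.nextafter(x, -math.inf)


print(succf(17.0), predf(17.0))
```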

6.8.9   ulpf%

Type: floatn → floatn
Accepts flags: t
With t flag it sets: Sys.Underflow or Sys.Undefined
Supported rounding modes (if IEC559): N/A
Description:
    ulpf(x)  = Sys.Underflow    if x ≠ 0 and r^(e_F(x)-p) ∉ F
             = Sys.Undefined    if x = 0

Example expression: ulpf%(17.0)
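
The quantity r^(e_F(x)-p) in the condition above is the unit in the last place of x. A Python sketch for radix 2, valid for normalized x only, computes it from the exponent and compares it with math.ulp (Python 3.9 or later). The wrapper name is hypothetical, and ValueError stands in for Sys.Undefined.

```python
import math
import sys


def ulpf(x):
    """2**(e - p), where e is the exponent of x and p the precision."""
    if x == 0.0:
        raise ValueError("ulp of zero is undefined")
    _, e = math.frexp(x)
    return math.ldexp(1.0, e - sys.float_info.mant_dig)


# 17.0 has exponent 5, so on a binary64 host its ulp is 2**(5 - 53).
print(ulpf(17.0), math.ulp(17.0))
```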

6.8.10   truncf%

Type: floatn × bitsn → floatn
Accepts flags: N/A
With t flag it sets: N/A
Supported rounding modes (if IEC559): N/A
Description:
    truncf(x, n)  = - truncf(-x, n)    if x < 0

Example expression: truncf%(17.0, 3)
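
Only the negative case survives in the description above. Assuming the positive case follows LIA-1, where truncf(x, n) keeps the first n radix digits of the fraction of x and zeroes the rest, a radix-2 sketch in Python looks as follows. The function name and the treatment of zero are assumptions.

```python
import math


def truncf(x, n):
    """Keep the n leading binary digits of x (assumed LIA-1 meaning)."""
    if x == 0.0:
        return 0.0
    if x < 0:
        return -truncf(-x, n)           # the case stated in the manual
    _, e = math.frexp(x)                # x = g * 2**e with 0.5 <= g < 1
    unit = math.ldexp(1.0, e - n)       # weight of the n-th digit
    return math.floor(x / unit) * unit


# 17 is 10001 in binary; keeping three digits leaves 10000, i.e. 16.
print(truncf(17.0, 3), truncf(-17.0, 3))
```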

6.8.11   roundf%

Type: floatn × bitsn → floatn
Accepts flags: t, z, n, and p
With t flag it sets: Sys.FloatingOverflow
Supported rounding modes (if IEC559): all
Description:
    roundf(x, n)  = Sys.FloatingOverflow    if |rn_F(x, n)| > fmax

Example expression: roundf%(17.0, 3)

6.8.12   intpartf% and fractpartf%

Type: floatn → floatn
Accepts flags: N/A
With t flag it sets: N/A
Supported rounding modes (if IEC559): N/A
Description:
    intpartf
    fractpartf

Example expression: intpartf%(17.0)
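
No definition is given for these two primitives. Assuming they split a value into its integral and fractional parts, both carrying the sign of the argument (the LIA-1 meaning), Python's math.modf performs the same split. The wrapper names are hypothetical.

```python
import math


def intpartf(x):
    """Integral part of x, with the sign of x (assumed meaning)."""
    return math.modf(x)[1]


def fractpartf(x):
    """Fractional part of x, with the sign of x (assumed meaning)."""
    return math.modf(x)[0]


print(intpartf(17.25), fractpartf(17.25), intpartf(-3.5), fractpartf(-3.5))
```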

7   Further Work

This is the TO DO list. It lists all the open issues and the stuff that remains to be added to this manual.

DATE: Mon Apr 20 11:58:56 BST 1998


Expressions and constants

Preprocessing

C's preprocessing directives could be used in C-- (this would be easy to do: the C-- compiler just runs the C preprocessor over the C-- program before compiling it). This would help with offset-calculation expressions on architectures where the type of an integer is, say, word32 and the type of a label is word64, so that many typecasts are needed for offsets (since casting is not done automatically), in expressions like:

        word32[label+ word64(3)] = 43;
where the integer is typecast, we could have:

        #ifdef ... /* pointer size == integer size == 8 */
          #define CAST(x) (x)
        #else      /* pointer size != integer size == 4 */
          #define CAST(x) word64((x))
        #endif

        word32[label+CAST(3)] = 43;
to avoid modifying the C-- code every time we port it. For more complicated expressions, rewriting by hand would be tedious.

Statements


1
UTF-8 is an encoding of Unicode characters into 8-bit ASCII characters that does not use any of the ASCII control characters to perform the coding. Unicode is an abbreviation for Universal Multiple-Octet Coded Character Set (UCS), and it is defined in ISO/IEC 10646. It is an international standard for encoding computer character sets that differs from historical ASCII. UTF-8 stands for Universal Transformation Format, 8-Bit form. See http://www.unicode.org/unicode/standard/utf8.html for more information.
2
That is, bits integers or characters. See Figure 1.
3
Sys.Indicators may be viewed as a global variable, but indeed, it is the only possible global variable in a C-- program.

This document was translated from LATEX by HEVEA.