COMP 150-AVS, Fall 2018

Test case (part 4) due: Wed, Oct 10 @ 11:59pm

Main project due: Due Wed, Oct 17 @ 11:59pm

- Oct 9 - Clarified that you only need to implement two analyses (avaiable expressions and very busy expressions).
- Oct 7 - Added key argument to last several
`Bvset`

functions.

In this project, you will implement a general data flow analysis engine for RubeVM bytecode, and use it to build two of the data flow analyses we discussed in class: available expressions and very busy expressions.

To simplify the project, we have made some changes to the RubeVM
bytecode format: (1) Instructions are no longer contained in
functions. Instead, a program consists simply of a single list of
instructions. (2) We have removed several instructions; see
`instr.ml`

for details (the removed instructions are
commented out).

Once you compile the code skeleton, you'll
be left with an executable `main.byte` that takes the input
RubeVM file as an argument, plus an optional analysis name:

`./main/byte r1.rubevm`

- Parse and print out bytecode`./main.byte -analysis ae r1.rubevm`

- Run available expressions on bytecode and display results`./main.byte -analysis vbe r1.rubevm`

- Run very busy expressions on bytecode and display results

In the project, we will represent sets of dataflow facts using
bitvectors. The bitvector representation of sets admits all the usual
operations, but has one slight twist: we need to pre-identify the
*range* of the set, i.e., the possible values that may go in the
set, so that we know how large a bitvector to create.

The file `bvset.mli`

contains a module signature for
bitvector sets (note that none of these functions should have
externally visible side-effects):

type 'a bvset_key (* the key indicating which bits stand for what *) (* Make the key for a bit vector set *) val list_to_key : 'a list -> 'a bvset_key (* Return a list of all members of the set. *) val to_list : 'a bvset_key -> int -> 'a list (* Return an empty set that ranges over the elements in the list. *) val mkempty : 'a bvset_key -> int (* Add an element to a set, or do nothing if elt already present. Raises an exception if elt not in range of set. *) val insert : 'a bvset_key -> 'a -> int -> int (* Remove an element to a set, or do nothing if elt not present. Raises an exception if elt not in range of set. *) val remove : 'a bvset_key -> 'a -> int -> int (* Return true iff element in set. Raises an exception if elt not in range of set. *) val mem : 'a bvset_key -> 'a -> int -> bool (* Set union. *) val union : 'a bvset_key -> int -> int -> int (* Set intersection. *) val inter : 'a bvset_key -> int -> int -> int (* diff a b returns a - b. *) val diff : 'a bvset_key -> int -> int -> int (* negate a returns 1 - a, where 1 is the set of all elements. Thus, negate (mkempty l) returns top. *) val negate : 'a bvset_key -> int -> int (* Return true iff sets are equal. *) val equal : 'a bvset_key -> int -> int -> bool

For this part of the problem, you must implement this interface in
`bvset.ml`

. Recall that in bitvectors, each bit of an
integer represents the presence or absence of a set member. Thus,
somehow we need to keep track of which bits map to what. Inside of
`bvset.ml`

, we've given you a definition ```
type 'a
bvset_key = ('a, int) Hashtbl.t * (int, 'a) Hashtbl.t
```

, which
represents this mapping (You may not change this type.) Thus, a
*key* for a bitvector is a pair *(elt-to-int, int-to-elt)*,
where *elt-to-int* is a hash table mapping each possible set
member to its index in the bitvector, and *int-to-elt* is the
inverse mapping.

For purposes of the project, bit-vectors are single integers, and you can assume that 63 bits is enough for anyone.

For example, if `m`

is a bitvector key `'a'`

to position `0`

, `'b'`

to `1`

, and
`'c'`

to `2`

, then under this key,
`1`

represents the set {`'a'`

} and
`5`

represents the set {`'a'`

,
`'c'`

}.

Note that for the last several functions (`union`

through `equal`

, you may or may not need to use the key
argument; but you probably need to use it for at least one of
{`negate`

, `equal`

}.

In class, we learned about several gen/kill dataflow problems. For this part of the project, you will implement gen/kill set computations for two of these analysis: available expressions and very busy expressions.

For these analyses, we will use the following representation of an expression, which is essentially the "right-hand side" of the bytecode instructions that compute results and store them in registers:

type kind_uop = U_is_int | U_is_str type kind_binop = B_add | B_sub | B_mul | B_div | B_eq | B_lt | B_leq type expr = | Uop of kind_uop * reg | Binop of kind_binop * reg * reg (* op, src, src *)

To get started, we'll need to compute the expressions that occur in
a RubeVM program. Implement the following function in
`dfa.ml`

:

`exprs_of_prog (instrs:instr array):expr list`

- given a program, return all the expressions in the program.

Once we know all the expressions in the program, we can use
`Bvset.list_to_key`

to turn the list into a key for bitvector
sets.Then given that key, we can compute the gen/kill sets of every
statement in a RubeVM program. We'll define a type to represent the
gen/kill bitvector sets:

type gen_kill = { gen : int; kill : int }

(Of course, to interpret this type, we'll need to know what the key
is; but we'll assume we always have the key hanging around somewhere.)
Now, you should implement the following two functions in
`dfa.ml`

:

`avail_exprs (instrs:instr array) (k:expr bvset_key):gen_kill array`

- Return an array`a`

the same size as`instrs`

such that index`a.(i)`

contains the gen/kill set for`instrs.(i)`

for available expressions`very_busy_exprs`

- same as above, but for very busy expressions.

Hint: You'll probably need a list or set of all expressions in the
program. You can either get that by running `exprs_of_prog`

again or by using `Bvset.to_list`

on `max_int`

.

Finally, you will implement the dataflow analysis algorithm. Recall that a dataflow analysis is defined by its direction (forward or backward); whether it uses may (union) or must (intersection) at join points; its gen/kill sets; the sets of dataflow facts assumed at the program entry (for forward analysis) or program exit (for backward analysis); and "top," the initial sets of dataflow facts at all other program points. We define the following types to represent the direction and may/must choices:

type dir_type = | D_Forward | D_Backward type may_must_type = | K_May | K_Must

Then, for this part of the project, implement the following functions:

`pred : (instrs:instr array) -> (i:int) -> int list`

- return the predecessors of`i`

in the program.`succ : (instrs:instr array) -> (i:int) -> int list`

- return the successors of`i`

in the programm.

and then implement:

dfa (instrs : instr array) (key : 'a bvset_key) (dir : dir_type) (may_must : may_must_type) (gen_kill : instr array -> 'a bvset_key -> gen_kill array) (entry_or_exit_facts : int) (top : int):int array =

The input to `dfa`

is, in order: the program; a key for
bitvectors of dataflow facts for the program (which for this project
will be derived from `exprs_of_prog`

); the direction;
may/must; a function for computing the gen/kill sets for every program
statement (which you implemented in part 2); the facts that hold at
the entry (for available expressions) or exit (for very busy
expressions); and top.

The return value of `dfa`

is an array `r`

that is the same
size as `instrs`

such that `r.(i)`

is the set of
dataflow facts at the *out* of statement `i`

for a
forward analysis, and at the *in* of statement `i`

for
a backward analysis.

We've already given you code in `main.ml`

that calls
`dfa`

, so once you've written the function you can run it
as described at the very top of the project description.

One of the hardest parts of this project is figuring out if the answers you're getting from the dataflow analysis are correct. The challenge is that the dataflow analysis is making various approximations (e.g., at a join points), so you really have to walk through the algorithm somewhat painstakingly to check your answers. We strongly suggest that you write a series of test cases that exercise each step: Try a program with just a single statement; try one with a couple of statements; try one with a couple of statements and a join point; try one with a couple of statements and loop; etc.

To help make things easier on everyone, you must develop at least one test case by the date specified at the top of the project description, and post it on Piazza. Your test case should consist of (a) RubeVM code, written in RubeVM assembly language accepted by the parser; (b) The output of available expressions analysis for the code, and (c) The output of very busy expressions analysis for the code.

Note: You don't need to have a working implementation to write the test case. In fact, we want you to do this on paper so you can use it to double-check your implementation.

Don't worry about the exact format of expressions for your test
case, but if you want to get the details exactly right, you can take a
look at `output_expr`

in `dfa.ml`

.