COMP 150-AVS, Fall 2018
Test case (part 4) due: Wed, Oct 10 @ 11:59pm
Main project due: Due Wed, Oct 17 @ 11:59pm
Bvset
functions.In this project, you will implement a general data flow analysis engine for RubeVM bytecode, and use it to build two of the data flow analyses we discussed in class: available expressions and very busy expressions.
To simplify the project, we have made some changes to the RubeVM
bytecode format: (1) Instructions are no longer contained in
functions. Instead, a program consists simply of a single list of
instructions. (2) We have removed several instructions; see
instr.ml
for details (the removed instructions are
commented out).
Once you compile the code skeleton, you'll be left with an executable main.byte that takes the input RubeVM file as an argument, plus an optional analysis name:
./main/byte r1.rubevm
- Parse and print out
bytecode./main.byte -analysis ae r1.rubevm
- Run available
expressions on bytecode and display results./main.byte -analysis vbe r1.rubevm
- Run very busy
expressions on bytecode and display resultsIn the project, we will represent sets of dataflow facts using bitvectors. The bitvector representation of sets admits all the usual operations, but has one slight twist: we need to pre-identify the range of the set, i.e., the possible values that may go in the set, so that we know how large a bitvector to create.
The file bvset.mli
contains a module signature for
bitvector sets (note that none of these functions should have
externally visible side-effects):
type 'a bvset_key (* the key indicating which bits stand for what *) (* Make the key for a bit vector set *) val list_to_key : 'a list -> 'a bvset_key (* Return a list of all members of the set. *) val to_list : 'a bvset_key -> int -> 'a list (* Return an empty set that ranges over the elements in the list. *) val mkempty : 'a bvset_key -> int (* Add an element to a set, or do nothing if elt already present. Raises an exception if elt not in range of set. *) val insert : 'a bvset_key -> 'a -> int -> int (* Remove an element to a set, or do nothing if elt not present. Raises an exception if elt not in range of set. *) val remove : 'a bvset_key -> 'a -> int -> int (* Return true iff element in set. Raises an exception if elt not in range of set. *) val mem : 'a bvset_key -> 'a -> int -> bool (* Set union. *) val union : 'a bvset_key -> int -> int -> int (* Set intersection. *) val inter : 'a bvset_key -> int -> int -> int (* diff a b returns a - b. *) val diff : 'a bvset_key -> int -> int -> int (* negate a returns 1 - a, where 1 is the set of all elements. Thus, negate (mkempty l) returns top. *) val negate : 'a bvset_key -> int -> int (* Return true iff sets are equal. *) val equal : 'a bvset_key -> int -> int -> bool
For this part of the problem, you must implement this interface in
bvset.ml
. Recall that in bitvectors, each bit of an
integer represents the presence or absence of a set member. Thus,
somehow we need to keep track of which bits map to what. Inside of
bvset.ml
, we've given you a definition type 'a
bvset_key = ('a, int) Hashtbl.t * (int, 'a) Hashtbl.t
, which
represents this mapping (You may not change this type.) Thus, a
key for a bitvector is a pair (elt-to-int, int-to-elt),
where elt-to-int is a hash table mapping each possible set
member to its index in the bitvector, and int-to-elt is the
inverse mapping.
For purposes of the project, bit-vectors are single integers, and you can assume that 63 bits is enough for anyone.
For example, if m
is a bitvector key 'a'
to position 0
, 'b'
to 1
, and
'c'
to 2
, then under this key,
1
represents the set {'a'
} and
5
represents the set {'a'
,
'c'
}.
Note that for the last several functions (union
through equal
, you may or may not need to use the key
argument; but you probably need to use it for at least one of
{negate
, equal
}.
In class, we learned about several gen/kill dataflow problems. For this part of the project, you will implement gen/kill set computations for two of these analysis: available expressions and very busy expressions.
For these analyses, we will use the following representation of an expression, which is essentially the "right-hand side" of the bytecode instructions that compute results and store them in registers:
type kind_uop = U_is_int | U_is_str type kind_binop = B_add | B_sub | B_mul | B_div | B_eq | B_lt | B_leq type expr = | Uop of kind_uop * reg | Binop of kind_binop * reg * reg (* op, src, src *)
To get started, we'll need to compute the expressions that occur in
a RubeVM program. Implement the following function in
dfa.ml
:
exprs_of_prog (instrs:instr array):expr list
-
given a program, return all the expressions in the program.Once we know all the expressions in the program, we can use
Bvset.list_to_key
to turn the list into a key for bitvector
sets.Then given that key, we can compute the gen/kill sets of every
statement in a RubeVM program. We'll define a type to represent the
gen/kill bitvector sets:
type gen_kill = { gen : int; kill : int }
(Of course, to interpret this type, we'll need to know what the key
is; but we'll assume we always have the key hanging around somewhere.)
Now, you should implement the following two functions in
dfa.ml
:
avail_exprs (instrs:instr array) (k:expr
bvset_key):gen_kill array
- Return an array a
the same size as instrs
such that index
a.(i)
contains the gen/kill set for
instrs.(i)
for available expressionsvery_busy_exprs
- same as above, but for very busy
expressions.Hint: You'll probably need a list or set of all expressions in the
program. You can either get that by running exprs_of_prog
again or by using Bvset.to_list
on max_int
.
Finally, you will implement the dataflow analysis algorithm. Recall that a dataflow analysis is defined by its direction (forward or backward); whether it uses may (union) or must (intersection) at join points; its gen/kill sets; the sets of dataflow facts assumed at the program entry (for forward analysis) or program exit (for backward analysis); and "top," the initial sets of dataflow facts at all other program points. We define the following types to represent the direction and may/must choices:
type dir_type = | D_Forward | D_Backward type may_must_type = | K_May | K_Must
Then, for this part of the project, implement the following functions:
pred : (instrs:instr array) -> (i:int) -> int list
- return the
predecessors of i
in the program.succ : (instrs:instr array) -> (i:int) -> int list
- return the successors of i
in the programm.and then implement:
dfa (instrs : instr array) (key : 'a bvset_key) (dir : dir_type) (may_must : may_must_type) (gen_kill : instr array -> 'a bvset_key -> gen_kill array) (entry_or_exit_facts : int) (top : int):int array =
The input to dfa
is, in order: the program; a key for
bitvectors of dataflow facts for the program (which for this project
will be derived from exprs_of_prog
); the direction;
may/must; a function for computing the gen/kill sets for every program
statement (which you implemented in part 2); the facts that hold at
the entry (for available expressions) or exit (for very busy
expressions); and top.
The return value of dfa
is an array r
that is the same
size as instrs
such that r.(i)
is the set of
dataflow facts at the out of statement i
for a
forward analysis, and at the in of statement i
for
a backward analysis.
We've already given you code in main.ml
that calls
dfa
, so once you've written the function you can run it
as described at the very top of the project description.
One of the hardest parts of this project is figuring out if the answers you're getting from the dataflow analysis are correct. The challenge is that the dataflow analysis is making various approximations (e.g., at a join points), so you really have to walk through the algorithm somewhat painstakingly to check your answers. We strongly suggest that you write a series of test cases that exercise each step: Try a program with just a single statement; try one with a couple of statements; try one with a couple of statements and a join point; try one with a couple of statements and loop; etc.
To help make things easier on everyone, you must develop at least one test case by the date specified at the top of the project description, and post it on Piazza. Your test case should consist of (a) RubeVM code, written in RubeVM assembly language accepted by the parser; (b) The output of available expressions analysis for the code, and (c) The output of very busy expressions analysis for the code.
Note: You don't need to have a working implementation to write the test case. In fact, we want you to do this on paper so you can use it to double-check your implementation.
Don't worry about the exact format of expressions for your test
case, but if you want to get the details exactly right, you can take a
look at output_expr
in dfa.ml
.