COMP 150-AVS, Fall 2018
Test case due: Mon, Nov 5 @ 11:59pm
Main project due: Thu, Nov 15 @ 11:59pm
add
bvec_of_int calls should be followed by truncating to 3 bits.
`Reg
to `Full
for
prog_ret
and `Normal
for
inst_ret
.In this project, you will implement a symbolic executor for RubyVM bytecode. You will use your SAT sovler implementation from Project 1.
To simplify the project, we've made some changes to the RubyVM
bytecode format. Specifically, we've moved several instructions to
simplify the project, and we've added two new instructions to support
symbolic execution. You can find the changes in instr.ml
,
and we'll talk about them more below. One major change is that we've
eliminated the heap.
Once you compile the code skeleton, you'll be left with an executable main.byte that takes the input RubeVM file as an argument, plus two other arguments: the scheduler that decides the order of path exploration, and the maximum number of steps to take during symbolic execution. For example:
./main/byte r1.rubevm
- Parse and print out
bytecode./main.byte -sch bfs -steps 42 r1.rubevm
- Run
symbolic execution using breadth-first search for 42 steps total and
display results../main.byte -sch dfs -steps 42 r1.rubevm
- As
above, but using depth-first search.
The result of symbolic execution is a series of path
conditions, one for each path explored in the program. The path
conditions either describe a full path, from the start of the
main
function until it returns, or a path that starts at
main
and ends in an assertion violation.
Since symbolic execution typically could run for longer than we have time for, we've added a step count, which is the maximum number of instructions to be executed summed up across all paths.
Unlike the other projects, this one doesn't break down into sub-parts very well. Here is what you'll need to do.
This project will build on project 1, so you'll need to put your
solutions to boolean.ml
and bvec.ml
into
this project.
Like in Project 2, as
we execute programs we'll keep track of the stack. However, the stack
might now include symbolic expressions that range over integers, i.e.,
symbolic expressions that are bitvectors. Thus, we've extended the
type value
in instr.ml
so that it now
includes `L_Sym of Bvec.bvec
. This means that the
register file regs
could map registers to bitvectors.
We've introduced a new instruction, int_sym r, s
,
that creates a fresh, new integer bitvector named after the string
s
and stores the bitvector in register
r
. For purposes of this project, all bitvectors will
have length 3. The names of bitvector variables come from the name
given in int_sym
appended with, (1) an underscore, then
(2) an integer that starts with 0 and increments for each time that
name is used, then (3) another underscore, then (4) the bit position.
For example, the comments show the bitvectors stored in the registers
for the following sequence of instructions:
int_sym r0, foo # r0 contains `L_Sym [EVar "foo_0_0"; EVar "foo_0_1"; EVar "foo_0_2"] int_sym r1, bar # r1 contains `L_Sym [EVar "bar_0_0"; EVar "bar_0_1"; EVar "bar_0_2"] int_sym r2, foo # r2 contains `L_Sym [EVar "foo_1_0"; EVar "foo_1_1"; EVar "foo_1_2"]Then "computing" with bitvectors will use the functions you implemented in Project 1:
add r3, r0, r1 # r3 contains `L_Sym [something, something, something]where the
something
s are the boolean formulae
representing the corresponding bits computed
by Bvec.add
. Note that to keep everything to 3 bits, you
will need to truncate the result of addition. (This means you can also
effectively perform subtraction by overflowing 3 bits during addition!)
Note that the only way of introducing symbolic variables into the
program is through the int_sym
instruction.
Important: For "performance," regular integers and bitvector
integers will co-exist in this project. For example:
int_sym r0, foo # r0 contains `L_Sym [EVar "foo_0_0"; EVar "foo_0_1"; EVar "foo_0_2"] const r1, 3 # r1 contains `L_Int 3 const r2, "bar" # r2 contains `L_Str "bar"
This means that some instructions will be executed just like in
Project 2, i.e., using concrete values, and other instructions will be
executed using Boolean
and Bvec
. In some
cases, there will be an instruction where one operand is symmbolic and
one is concrete, e.g.,
int_sym r0, foo const r1, 3 add r2, r0, r1 i_is_int r3, r2 # stores 1 in r3In this case, your symbolic executor will need to encode
3
as a bitvector
(using Bvec.bvec_of_int
and truncating to 3 bits as needed) and then
use Bvec.add
to perform the addition.
In some cases, your symbolic executor will encounter an operation on symbolic expressions for which we have no support in Project 1, such as
int_sym r0, foo int_sym r1, bar sub r2, r0, r1 # error!In this case, your symbolic executor should raise an exception (any exception) and abort. We won't test this case in grading.
You can also assume that foreign functions will only be called with
concrete values (i.e., no bvec
argumments).
Recall from Project 2 that as we executed a RubeVM program in the
interpreter, we maintained a config
keeping track of the
program state. In symbolic execution, we also need to keep track of
the path condition, which records the branches taken so
far. We'll use Boolean.bexpr
to represent path
conditions. Thus, the type config
in this project is now
stack * Boolean.bexpr
.
Path conditions will be used in three places in this project:
if_zero
instruction, use
Boolean.sat
to see if the branch might be taken, not
taken, or both.assert_zero r
, check if
register r
is guaranteed to always be zero under the
current path condition.
Similarly to project 2, we'll implement a function run_instr :
prog -> config -> inst_ret
that takes a single step. In the
normal case, it will return `Normal config
. In other words,
given a program and a (stack, path condition) pair, run_instr
executes the current instruction and returns a new (stack, path
condition) pair. The path condition will only be changed at
if_zero
instructions, following the symbolic execution
logic we saw in class.
If the instruction is an assert_zero
, and that
instruction fails (i.e., the assertion is not implied by the path
condition), then run_instr
returns a
`Failure
result containing the current path condition.
If the instruction does anything else bad, like try to add an integer
and a string, run_instr
should throw an exception (any
exception).
Hint: One of the slightly tedious parts of
run_instr
will be handling the case when a regular
integer and a symbolic expression are used in the same instruction.
As we saw in class, symbolic execution might potentially fork when reaching a branch that could go either way. Once there is more than one possible execution, the policy for which one to execute next is a key design decision in symbolic execution.
In this project, we will represent this choice by implementing two schedulers. We will maintain the set of possible configurations as a list, with the invariant that the next configuration to execute is at the front of the list.
A scheduler has type config * config option * config list ->
config list
. Its input is one or two configurations to add to
the given configuration list. Its output is a new configuration list
ordered appropriately so that visiting the first configuration on the
list should happen next.
For this project, you should implement two schedulers. Imagine that the execution tree (as we saw in class) in a complete tree, e.g.:
a / \ b c /\ /\ d e f gThen the scheduler
sch_bfs
should perform
breadth-first traversal, i.e., it should visit the nodes in the order
a, b, c, d, e, f, g
. The schedule
sch_dfs
. should perform depth-first traversal, i.e., it
should visit the nodes in the order a, b, d, e, c, f, g
.
We won't use automated tests for this part of the project. We'll grade
it by hand. So don't worry about the exact order, e.g., it's okay if
sch_bfs
visits c
before b
(especially because we haven't said which is the true branch and which
is the false branch).
Finally, you'll implement function run_prog (p:prog)
(sch:scheduler) (max_steps:int):prog_ret
to symbolically
execute a program. The first two arguments are obvious. The last
argument is the maximum number of instructions that should be
executed, i.e., the maximum number of times run_instr
should be called. Once this count has been reached,
run_prog
will return even if symbolic execution hasn't
explored all paths. Of course, it's also possible for symbolic
execution to finish before reaching this count.
The run_prog
function returns a list of results for each
explored path. Results consist of the following:
`Full (v, path)
indicates that execution reached the
return statement of main
, which returned value
v
and had path condition path
.`Partial path
indicates that symbolic execution was
stopped (because max step count was reached) before this path was
finished, and the path condition accumulated so far was
path
.`Failure path
indicates that symbolic execution
reached a failing assertion, and the path condition was
path
.The exact ordering of these results is undefined.
Hint: Be careful with imperative data structures, particularly the register file. If the symbolic exector forks execution, be sure not to share register files across different paths, otherwise surprising and incorrect things will happen.
As with previous projects, you must develop at least one test case by the date specified at the top of the project description, and post it on Piazza. Your test case should consist of (a) RubeVM code, written in RubeVM assembly language accepted by the parser, and (b) the output of symbolic execution on that code.
Note: You don't need to have a working implementation to write the test case. In fact, we want you to do this on paper so you can use it to double-check your implementation.
Don't worry about the exact output format for your test
case, but if you want to get the details exactly right, you can take a
look in main.ml
.