COMP 150-AVS, Fall 2018
Test case due: Mon, Nov 5 @ 11:59pm
Main project due: Thu, Nov 15 @ 11:59pm
addbvec_of_int calls should be followed by truncating to 3 bits.
In this project, you will implement a symbolic executor for RubyVM bytecode. You will use your SAT sovler implementation from Project 1.
To simplify the project, we've made some changes to the RubyVM
bytecode format. Specifically, we've moved several instructions to
simplify the project, and we've added two new instructions to support
symbolic execution. You can find the changes in
and we'll talk about them more below. One major change is that we've
eliminated the heap.
Once you compile the code skeleton, you'll be left with an executable main.byte that takes the input RubeVM file as an argument, plus two other arguments: the scheduler that decides the order of path exploration, and the maximum number of steps to take during symbolic execution. For example:
./main/byte r1.rubevm- Parse and print out bytecode
./main.byte -sch bfs -steps 42 r1.rubevm- Run symbolic execution using breadth-first search for 42 steps total and display results.
./main.byte -sch dfs -steps 42 r1.rubevm- As above, but using depth-first search.
The result of symbolic execution is a series of path
conditions, one for each path explored in the program. The path
conditions either describe a full path, from the start of the
main function until it returns, or a path that starts at
main and ends in an assertion violation.
Since symbolic execution typically could run for longer than we have time for, we've added a step count, which is the maximum number of instructions to be executed summed up across all paths.
Unlike the other projects, this one doesn't break down into sub-parts very well. Here is what you'll need to do.
This project will build on project 1, so you'll need to put your
Like in Project 2, as
we execute programs we'll keep track of the stack. However, the stack
might now include symbolic expressions that range over integers, i.e.,
symbolic expressions that are bitvectors. Thus, we've extended the
instr.ml so that it now
`L_Sym of Bvec.bvec. This means that the
regs could map registers to bitvectors.
We've introduced a new instruction,
int_sym r, s,
that creates a fresh, new integer bitvector named after the string
s and stores the bitvector in register
r. For purposes of this project, all bitvectors will
have length 3. The names of bitvector variables come from the name
int_sym appended with, (1) an underscore, then
(2) an integer that starts with 0 and increments for each time that
name is used, then (3) another underscore, then (4) the bit position.
For example, the comments show the bitvectors stored in the registers
for the following sequence of instructions:
int_sym r0, foo # r0 contains `L_Sym [EVar "foo_0_0"; EVar "foo_0_1"; EVar "foo_0_2"] int_sym r1, bar # r1 contains `L_Sym [EVar "bar_0_0"; EVar "bar_0_1"; EVar "bar_0_2"] int_sym r2, foo # r2 contains `L_Sym [EVar "foo_1_0"; EVar "foo_1_1"; EVar "foo_1_2"]Then "computing" with bitvectors will use the functions you implemented in Project 1:
add r3, r0, r1 # r3 contains `L_Sym [something, something, something]where the
somethings are the boolean formulae representing the corresponding bits computed by
Bvec.add. Note that to keep everything to 3 bits, you will need to truncate the result of addition. (This means you can also effectively perform subtraction by overflowing 3 bits during addition!)
Note that the only way of introducing symbolic variables into the
program is through the
Important: For "performance," regular integers and bitvector
integers will co-exist in this project. For example:
int_sym r0, foo # r0 contains `L_Sym [EVar "foo_0_0"; EVar "foo_0_1"; EVar "foo_0_2"] const r1, 3 # r1 contains `L_Int 3 const r2, "bar" # r2 contains `L_Str "bar"
This means that some instructions will be executed just like in
Project 2, i.e., using concrete values, and other instructions will be
Bvec. In some
cases, there will be an instruction where one operand is symmbolic and
one is concrete, e.g.,
int_sym r0, foo const r1, 3 add r2, r0, r1 i_is_int r3, r2 # stores 1 in r3In this case, your symbolic executor will need to encode
3as a bitvector (using
Bvec.bvec_of_intand truncating to 3 bits as needed) and then use
Bvec.addto perform the addition.
In some cases, your symbolic executor will encounter an operation on symbolic expressions for which we have no support in Project 1, such as
int_sym r0, foo int_sym r1, bar sub r2, r0, r1 # error!In this case, your symbolic executor should raise an exception (any exception) and abort. We won't test this case in grading.
You can also assume that foreign functions will only be called with
concrete values (i.e., no
Recall from Project 2 that as we executed a RubeVM program in the
interpreter, we maintained a
config keeping track of the
program state. In symbolic execution, we also need to keep track of
the path condition, which records the branches taken so
far. We'll use
Boolean.bexpr to represent path
conditions. Thus, the type
config in this project is now
stack * Boolean.bexpr.
Path conditions will be used in three places in this project:
Boolean.satto see if the branch might be taken, not taken, or both.
assert_zero r, check if register
ris guaranteed to always be zero under the current path condition.
Similarly to project 2, we'll implement a function
prog -> config -> inst_ret that takes a single step. In the
normal case, it will return
`Normal config. In other words,
given a program and a (stack, path condition) pair,
executes the current instruction and returns a new (stack, path
condition) pair. The path condition will only be changed at
if_zero instructions, following the symbolic execution
logic we saw in class.
If the instruction is an
assert_zero, and that
instruction fails (i.e., the assertion is not implied by the path
run_instr returns a
`Failure result containing the current path condition.
If the instruction does anything else bad, like try to add an integer
and a string,
run_instr should throw an exception (any
Hint: One of the slightly tedious parts of
run_instr will be handling the case when a regular
integer and a symbolic expression are used in the same instruction.
As we saw in class, symbolic execution might potentially fork when reaching a branch that could go either way. Once there is more than one possible execution, the policy for which one to execute next is a key design decision in symbolic execution.
In this project, we will represent this choice by implementing two schedulers. We will maintain the set of possible configurations as a list, with the invariant that the next configuration to execute is at the front of the list.
A scheduler has type
config * config option * config list ->
config list. Its input is one or two configurations to add to
the given configuration list. Its output is a new configuration list
ordered appropriately so that visiting the first configuration on the
list should happen next.
For this project, you should implement two schedulers. Imagine that the execution tree (as we saw in class) in a complete tree, e.g.:
a / \ b c /\ /\ d e f gThen the scheduler
sch_bfsshould perform breadth-first traversal, i.e., it should visit the nodes in the order
a, b, c, d, e, f, g. The schedule
sch_dfs. should perform depth-first traversal, i.e., it should visit the nodes in the order
a, b, d, e, c, f, g.
We won't use automated tests for this part of the project. We'll grade
it by hand. So don't worry about the exact order, e.g., it's okay if
(especially because we haven't said which is the true branch and which
is the false branch).
Finally, you'll implement function
(sch:scheduler) (max_steps:int):prog_ret to symbolically
execute a program. The first two arguments are obvious. The last
argument is the maximum number of instructions that should be
executed, i.e., the maximum number of times
should be called. Once this count has been reached,
run_prog will return even if symbolic execution hasn't
explored all paths. Of course, it's also possible for symbolic
execution to finish before reaching this count.
run_prog function returns a list of results for each
explored path. Results consist of the following:
`Full (v, path)indicates that execution reached the return statement of
main, which returned value
vand had path condition
`Partial pathindicates that symbolic execution was stopped (because max step count was reached) before this path was finished, and the path condition accumulated so far was
`Failure pathindicates that symbolic execution reached a failing assertion, and the path condition was
The exact ordering of these results is undefined.
Hint: Be careful with imperative data structures, particularly the register file. If the symbolic exector forks execution, be sure not to share register files across different paths, otherwise surprising and incorrect things will happen.
As with previous projects, you must develop at least one test case by the date specified at the top of the project description, and post it on Piazza. Your test case should consist of (a) RubeVM code, written in RubeVM assembly language accepted by the parser, and (b) the output of symbolic execution on that code.
Note: You don't need to have a working implementation to write the test case. In fact, we want you to do this on paper so you can use it to double-check your implementation.
Don't worry about the exact output format for your test
case, but if you want to get the details exactly right, you can take a