COMP 150-AVS, Fall 2018
Main project due: Mon, Dec 10 @ 11:59pm
In this project, you will implement the CEGIS (counterexample-guided inductive synthesis) algorithm for RubeVM bytecode. You will use your (or a) SAT sovler implementation from Project 1.
The project is divided into two parts. The first part is to implement a verifier, based on your symbolic executor from the previous project. The verifier can be used independently, and we'll also use it as part of synthesis in the second part of the project.
More specifically, once you compile the code
skeleton, you'll be left with an executable main.byte
that takes the input RubeVM file as an argument. That file must
include exactly two function definitions: a definition of a
function harness
, which is the specification, and a
definition of a function main
. (Currently, other
functions can be included without raising an error, but they are
ignored.) There are two modes to run the executable in:
./main/byte -m verif r1.rubevm
- Verify that
harness
and main
are equivalent programs../main.byte -m synth r1.rubevm
-
Synthesize a solution for the holes in main
so the
resulting function is equivalent to harness
.
As usual, we've made simplifications to the RubeVM bytecode format, which we'll discuss below. The most significant changes are that there is no heap and there are no function calls.
The first part of the project is to develop a function
verify
, in synthrube.ml
, with the following
type signature:
type verif_ret = [ `Eq | `Neq of int * int * int ] let verif (h:fn) (m:fn):verify_retThe input to
verif
is the harness h
and the
main function m
. The function verif
checks
that for all inputs, the two functions are equivalent.
To avoid the need for too much bookkeeping, we will define the inputs
to both functions as the global variables a
,
b
, and c
. The functions access their inputs
using the I_rd_glob
instruction:
I_rd_glob rn, `L_Id "a" # load input a into rn I_rd_glob rn, `L_Id "b" # load input b into rn I_rd_glob rn, `L_Id "c" # load input c into rn
These instructions don't necessarily occur in sequence; they can
occur anywhere in the program. And not every input needs to be
referend, e.g., a function could choose to only access input
b
. Note that there is no writing to global
variables, so the values of these variables are fixed for all
time. You may assume that no other global variables are accessed.
The verify
function either returns `Eq
if
the two functions are equivalent for all values of a
,
b
, and c
, or it returns `Neq (na, nb,
nc)
, where na
, nb
, and
nc
are assignments to a
, b
, and
c
, respectively, for which the two functions are not
equivalent. You can assume that the two functions return a value of
type integer (which could, of course, be a symbolic
expression).e
As with the previous project, all integers will be 3 bits,
including the integers represented by a
, b
, and
c
, and arithmetic is modular, i.e., overflow is
possible.
You should most likely implement this part of the project by reusing the code from the previous project, or by using the TA's code. As with that project, in some cases operations will not be available on symbolic variables, in which case your implementation can do anything.
Most likely, you'll change the previous code so that, on startup,
it creates 3-bit bitvectors of unknowns for a
,
b
, and c
. Then when the user calls
rd_glob
, those bitvectors will be stored in the indicated
register.
The second part of the project is to develop a function
synth
, also in synthrube.ml
, with the following
type signature:
type synth_ret = [ `Unsat | `Sat of int * int * int ] let synth (h:fn) (m:fn):synth_ret = `UnsatThe input to
synth
is the harness h
and the
main function m
. The function synth
finds
values for the holes in m
such that h
and
m
are equivalent for all inputs, meaning again, for all
choices of a
, b
, and c
.
To keep things simpler, there are exactly three holes,
h1
, h2
, and h3
, each of which
represent unknown, 3-bit integers. The hole h1
is introduced into the
program through the instruction I_const rn h1
, and
similarly for the other two holes.
The return value of synth
is either `Unsat
,
meaning there is no choice of holes that will make the two programs
equivalent, or `Sat (n1, n2, n3)
, meaning that setting
holes hi
equal to ni
is a valid solution.
A function need not use all three holes; it could use a subset. Your implementation may do anything if the programmer uses more than the specified three holes. Each time a hole is mentioned, it should be treated as having the same value.
Your synthesizer must use the CEGIS loop. That is, your synthesis should operate as follows:
a
, b
, and
c
.main
is equivalent to
harness
Note: Philosophically, it's not clear whether it makes sense to use CEGIS with a brute force solver. It may be possible that the problem with all three holes and all three globals is small enough to solve on its own, at least with an efficient SAT solver. On the other hand, notice that the verification and synthesis problems under CEGIS have only 9 bits of unknowns, whereas the whole problem has 18 bits of unknowns, which might take a while to solve.
There is no requirement to share test cases for this project, but you are welcome to do so on Piazza.