COMP 150-AVS, Fall 2018

Main project due: Mon, Dec 10 @ 11:59pm

- Dec 3 - add assumption that the function returns a value of type integer

In this project, you will implement the CEGIS (counterexample-guided inductive synthesis) algorithm for RubeVM bytecode. You will use your (or a) SAT sovler implementation from Project 1.

The project is divided into two parts. The first part is to implement a verifier, based on your symbolic executor from the previous project. The verifier can be used independently, and we'll also use it as part of synthesis in the second part of the project.

More specifically, once you compile the code
skeleton, you'll be left with an executable `main.byte`
that takes the input RubeVM file as an argument. That file must
include **exactly two function definitions**: a definition of a
function `harness`

, which is the specification, and a
definition of a function `main`

. (Currently, other
functions can be included without raising an error, but they are
ignored.) There are two modes to run the executable in:

`./main/byte -m verif r1.rubevm`

- Verify that`harness`

and`main`

are equivalent programs.`./main.byte -m synth r1.rubevm`

- Synthesize a solution for the holes in`main`

so the resulting function is equivalent to`harness`

.

As usual, we've made simplifications to the RubeVM bytecode format, which we'll discuss below. The most significant changes are that there is no heap and there are no function calls.

The first part of the project is to develop a function
`verify`

, in `synthrube.ml`

, with the following
type signature:

type verif_ret = [ `Eq | `Neq of int * int * int ] let verif (h:fn) (m:fn):verify_retThe input to

`verif`

is the harness `h`

and the
main function `m`

. The function `verif`

checks
that for all inputs, the two functions are equivalent.
To avoid the need for too much bookkeeping, we will define the inputs
to both functions as the global variables `a`

,
`b`

, and `c`

. The functions access their inputs
using the `I_rd_glob`

instruction:

I_rd_glob rn, `L_Id "a" # load input a into rn I_rd_glob rn, `L_Id "b" # load input b into rn I_rd_glob rn, `L_Id "c" # load input c into rn

These instructions don't necessarily occur in sequence; they can
occur anywhere in the program. And not every input needs to be
referend, e.g., a function could choose to only access input
`b`

. Note that there is no *writing* to global
variables, so the values of these variables are fixed for all
time. You may assume that no other global variables are accessed.

The `verify`

function either returns ``Eq`

if
the two functions are equivalent for all values of `a`

,
`b`

, and `c`

, or it returns ```
`Neq (na, nb,
nc)
```

, where `na`

, `nb`

, and
`nc`

are assignments to `a`

, `b`

, and
`c`

, respectively, for which the two functions are not
equivalent. **You can assume that the two functions return a value of
type integer** (which could, of course, be a symbolic
expression).e

As with the previous project, **all integers will be 3 bits**,
including the integers represented by `a`

, `b`

, and
`c`

, and arithmetic is modular, i.e., overflow is
possible.

You should most likely implement this part of the project by reusing the code from the previous project, or by using the TA's code. As with that project, in some cases operations will not be available on symbolic variables, in which case your implementation can do anything.

Most likely, you'll change the previous code so that, on startup,
it creates 3-bit bitvectors of unknowns for `a`

,
`b`

, and `c`

. Then when the user calls
`rd_glob`

, those bitvectors will be stored in the indicated
register.

The second part of the project is to develop a function
`synth`

, also in `synthrube.ml`

, with the following
type signature:

type synth_ret = [ `Unsat | `Sat of int * int * int ] let synth (h:fn) (m:fn):synth_ret = `UnsatThe input to

`synth`

is the harness `h`

and the
main function `m`

. The function `synth`

finds
values for the holes in `m`

such that `h`

and
`m`

are equivalent for all inputs, meaning again, for all
choices of `a`

, `b`

, and `c`

.
To keep things simpler, there are exactly three holes,
`h1`

, `h2`

, and `h3`

, each of which
represent unknown, 3-bit integers. The hole `h1`

is introduced into the
program through the instruction `I_const rn h1`

, and
similarly for the other two holes.

The return value of `synth`

is either ``Unsat`

,
meaning there is no choice of holes that will make the two programs
equivalent, or ``Sat (n1, n2, n3)`

, meaning that setting
holes `hi`

equal to `ni`

is a valid solution.

A function need not use all three holes; it could use a subset. Your implementation may do anything if the programmer uses more than the specified three holes. Each time a hole is mentioned, it should be treated as having the same value.

**Your synthesizer must use the CEGIS loop.** That is, your
synthesis should operate as follows:

- Pick an initial choice of
`a`

,`b`

, and`c`

. - Synthesize holes that produce equivalent programs for that choice.
- Check if the resulting
`main`

is equivalent to`harness`

- If so, exit. If not, use the counterexample as another input to the synthesis process, and repeat.

Note: Philosophically, it's not clear whether it makes sense to use CEGIS with a brute force solver. It may be possible that the problem with all three holes and all three globals is small enough to solve on its own, at least with an efficient SAT solver. On the other hand, notice that the verification and synthesis problems under CEGIS have only 9 bits of unknowns, whereas the whole problem has 18 bits of unknowns, which might take a while to solve.

There is no requirement to share test cases for this project, but you are welcome to do so on Piazza.