Project 5: Program Synthesizer

COMP 150-AVS, Fall 2018

Main project due: Mon, Dec 10 @ 11:59pm

Dec 3 - add assumption that the function returns a value of type integer

In this project, you will implement the CEGIS (counterexample-guided inductive synthesis) algorithm for RubeVM bytecode. You will use your (or a) SAT sovler implementation from Project 1.

The project is divided into two parts. The first part is to implement a verifier, based on your symbolic executor from the previous project. The verifier can be used independently, and we'll also use it as part of synthesis in the second part of the project.

More specifically, once you compile the code skeleton, you'll be left with an executable main.byte that takes the input RubeVM file as an argument. That file must include exactly two function definitions: a definition of a function harness, which is the specification, and a definition of a function main. (Currently, other functions can be included without raising an error, but they are ignored.) There are two modes to run the executable in:

./main/byte -m verif r1.rubevm - Verify that harness and main are equivalent programs.
./main.byte -m synth r1.rubevm - Synthesize a solution for the holes in main so the resulting function is equivalent to harness.

As usual, we've made simplifications to the RubeVM bytecode format, which we'll discuss below. The most significant changes are that there is no heap and there are no function calls.

The first part of the project is to develop a function verify, in synthrube.ml, with the following type signature:

type verif_ret = [ `Eq | `Neq of int * int * int ]
let verif (h:fn) (m:fn):verify_ret

The input to verif is the harness h and the main function m. The function verif checks that for all inputs, the two functions are equivalent.

To avoid the need for too much bookkeeping, we will define the inputs to both functions as the global variables a, b, and c. The functions access their inputs using the I_rd_glob instruction:

I_rd_glob rn, `L_Id "a"  # load input a into rn
I_rd_glob rn, `L_Id "b"  # load input b into rn
I_rd_glob rn, `L_Id "c"  # load input c into rn

These instructions don't necessarily occur in sequence; they can occur anywhere in the program. And not every input needs to be referend, e.g., a function could choose to only access input b. Note that there is no writing to global variables, so the values of these variables are fixed for all time. You may assume that no other global variables are accessed.

The verify function either returns `Eq if the two functions are equivalent for all values of a, b, and c, or it returns `Neq (na, nb, nc), where na, nb, and nc are assignments to a, b, and c, respectively, for which the two functions are not equivalent. You can assume that the two functions return a value of type integer (which could, of course, be a symbolic expression).e

As with the previous project, all integers will be 3 bits, including the integers represented by a, b, and c, and arithmetic is modular, i.e., overflow is possible.

You should most likely implement this part of the project by reusing the code from the previous project, or by using the TA's code. As with that project, in some cases operations will not be available on symbolic variables, in which case your implementation can do anything.

Most likely, you'll change the previous code so that, on startup, it creates 3-bit bitvectors of unknowns for a, b, and c. Then when the user calls rd_glob, those bitvectors will be stored in the indicated register.

The second part of the project is to develop a function synth, also in synthrube.ml, with the following type signature:

type synth_ret = [ `Unsat | `Sat of int * int * int ]
let synth (h:fn) (m:fn):synth_ret = `Unsat

The input to synth is the harness h and the main function m. The function synth finds values for the holes in m such that h and m are equivalent for all inputs, meaning again, for all choices of a, b, and c.

To keep things simpler, there are exactly three holes, h1, h2, and h3, each of which represent unknown, 3-bit integers. The hole h1 is introduced into the program through the instruction I_const rn h1, and similarly for the other two holes.

The return value of synth is either `Unsat, meaning there is no choice of holes that will make the two programs equivalent, or `Sat (n1, n2, n3), meaning that setting holes hi equal to ni is a valid solution.

A function need not use all three holes; it could use a subset. Your implementation may do anything if the programmer uses more than the specified three holes. Each time a hole is mentioned, it should be treated as having the same value.

Your synthesizer must use the CEGIS loop. That is, your synthesis should operate as follows:

Pick an initial choice of a, b, and c.
Synthesize holes that produce equivalent programs for that choice.
Check if the resulting main is equivalent to harness
If so, exit. If not, use the counterexample as another input to the synthesis process, and repeat.

Note: Philosophically, it's not clear whether it makes sense to use CEGIS with a brute force solver. It may be possible that the problem with all three holes and all three globals is small enough to solve on its own, at least with an efficient SAT solver. On the other hand, notice that the verification and synthesis problems under CEGIS have only 9 bits of unknowns, whereas the whole problem has 18 bits of unknowns, which might take a while to solve.

There is no requirement to share test cases for this project, but you are welcome to do so on Piazza.

Project 5: Program Synthesizer

Updates

Introduction

Part 1: Verification

Part 2: Synthesis

Sharing Test Cases