Automatically Proving the Correctness of Compiler Optimizations

Sorin Lerner, Todd Millstein, Craig Chambers
Department of Computer Science and Engineering
University of Washington
{lerns,todd,chambers}@cs.washington.edu

Technical Report UW-CSE-02-11-02

Abstract

We describe a technique for automatically proving compiler optimizations sound, meaning that their transformations are always semantics-preserving. We first present a domain-specific language for implementing optimizations as guarded rewrite rules. Optimizations operate over a C-like intermediate representation including unstructured control flow, pointers to local variables and dynamically allocated memory, and recursive procedures. Then we describe a technique for automatically proving the soundness of optimizations implemented in this language. Our technique requires only a small set of optimization-specific proof obligations to be discharged for each optimization by an automatic theorem prover. We have written a variety of forward and backward intraprocedural dataflow optimizations in our language, including constant propagation and folding, branch folding, (partial) redundancy elimination, (partial) dead assignment elimination, and simple forms of points-to analysis. We have implemented our proof strategy with the Simplify automatic theorem prover, and we have used this implementation to automatically prove our optimizations correct. Our system also found many subtle bugs during the course of developing our optimizations.

1 Introduction

Compilers are an important part of the infrastructure relied upon by programmers. If a compiler is faulty, then so are potentially all programs compiled with it. Unfortunately, compiler errors can be difficult for programmers to detect and debug. First, because the compiler's output cannot be easily inspected, problems can often be found only by running a compiled program. Second, the compiler may appear to be correct over many runs, with a problem only manifesting itself when a particular compiled program is run with a particular input. Finally, when a problem does appear, it can be difficult to determine whether it is an error in the compiler or in the source program that was compiled. For these and other reasons, it is very useful to develop tools and techniques that give compiler developers and programmers confidence in their compilers.

One way to gain confidence in the correctness of a compiler is to run it on various programs and check that the optimized version of each input program produces correct results on various inputs. While this method can increase confidence, it cannot provide any guarantees: it does not guarantee the absence of bugs in the compiler, nor does it even guarantee that any one particular optimized program is correct on all inputs. It also can be tedious to assemble an extensive test suite of programs and program inputs.

Credible compilers [24, 23] and translation validation [17] improve on this testing approach by having the compiler automatically check whether or not the optimized version of an input program is semantically equivalent to the original program. The compiler can therefore guarantee the correctness of certain optimized programs, but the compiler itself is still not guaranteed to be bug-free: there may exist programs for which the compiler produces incorrect output. There is little recourse for a programmer if the compiler reports that it cannot validate the programmer's compiled program.
Furthermore, these approaches can have a substantial impact on the time to run an optimization. For example, Necula mentions that translation validation of an optimization pass takes about four times longer than the optimization pass itself [17].

The best solution would be to prove the compiler sound, meaning that for any input program, the compiler always produces an equivalent output program. Optimizations, and sometimes even complete compilers, have been proven sound by hand [1, 2, 13, 11, 6, 21, 3, 9]. However, manually proving large parts of a compiler sound requires a lot of effort and theoretical skill on the part of the compiler writer. In addition, these proofs are usually done for optimizations as written on paper, and bugs may still arise when the algorithms are implemented from the paper specification.

We present a new technique for proving the soundness of compiler optimizations that combines the benefits of the last two approaches: our approach is fully automated, as in credible compilers and translation validation, but it also proves optimizations correct once and for all, for any input program. We achieve this goal by providing the compiler writer with a domain-specific language for implementing optimizations that is both flexible enough to express a variety of optimizations and amenable to automated correctness reasoning. The main contributions of this paper are as follows:

- We present a language for defining optimizations over programs expressed in a C-like intermediate language including unstructured control flow, pointers to local variables and dynamically allocated memory, and recursive procedures. To implement an optimization (i.e., an analysis plus a code transformation), users provide a rewrite rule along with a guard describing the conditions that must hold for the rule to be triggered at some node of an input program's control-flow graph (CFG). The optimization also includes a small predicate over program states, which captures the key "insight" behind the optimization that justifies its correctness. Our language also allows users to express pure analyses, such as pointer analysis. Pure analyses can be used both to verify properties of interest about a program and to provide information to be consumed by later transformations. Optimizations and pure analyses written in our language are directly executable by a special dataflow analysis engine written for this purpose; they do not need to be reimplemented in a different language to be run.

- We have used our language to express a variety of intraprocedural forward and backward dataflow optimizations, including constant propagation and folding, copy propagation, common subexpression elimination, dead assignment elimination, branch folding, partial redundancy elimination, partial dead code elimination, and loop-invariant code motion. We have also used our language to express several simple intraprocedural pointer analyses, whose results we have exploited in the above optimizations.

- We present a strategy for automatically proving the soundness of optimizations and analyses expressed in our language. The strategy requires an automatic theorem prover to discharge a small set of proof obligations for each optimization. We have manually proven that if these obligations hold for any particular optimization, then that optimization is sound. The manual proof takes care of the necessary induction over program execution traces, which is difficult to automate.
As a result, the automatic theorem prover is given only non-inductive theorems to prove about individual program states.

- We have implemented our correctness checking strategy using Simplify [25, 20], the automatic theorem prover used in the Extended Static Checker for Java [5]. We have written a general set of axioms that are used by Simplify to automatically discharge the optimization-specific proof obligations generated by our strategy. The axioms simply encode the semantics of programs in our intermediate language. New optimization programs can be written and proven sound without requiring any modifications to Simplify's axiom set.

- We have used our correctness checker to automatically prove correct all of the optimizations and pure analyses listed above. The correctness checker uncovered a number of subtle problems with earlier versions of our optimizations that might have eluded manual testing for a long time.

By providing greater confidence in the correctness of compiler optimizations, we hope to provide a foundation for extensible compilers. An extensible compiler would allow users to include new optimizations tailored to their applications or domains of interest. The extensible compiler can protect itself from buggy user optimizations by verifying their correctness using our strategy; any bugs in the resulting extended compiler can be blamed on other aspects of the compiler's implementation, not on the user's optimizations. Extensible compilers could also be a good vehicle for research into new compiler optimizations.

The next section introduces our language for expressing optimizations by example and sketches our strategy for automatically proving soundness of such optimizations. Sections 3 and 4 formally define our optimization language and automatic proof strategy, respectively. Section 5 evaluates our work. Section 6 discusses our current and future work, including an extension to support interprocedural optimizations. Section 7 discusses related work, and Section 8 offers our conclusions. The optional appendices contain definitions of all the optimizations and analyses we have written in our language.

2 Overview

In this section, we informally describe our language for defining optimizations and our technique for proving those optimizations sound.

2.1 Forward Transformation Patterns

2.1.1 Semantics

The heart of an optimization program is its transformation pattern. For a forward optimization, a transformation pattern has the following form:

    ψ1 followed by ψ2 until s ⇒ s′ with witness P

A transformation pattern describes the conditions under which a statement s may be transformed to s′. The formulas ψ1 and ψ2, which are properties of a statement such as "x is defined and y is not used," together act as the guard indicating when it is legal to perform this transformation: s can be transformed to s′ if on all paths in the CFG from the start of the procedure being optimized to s, there exists a statement satisfying ψ1, followed by zero or more statements satisfying ψ2, followed by s. Figure 1 shows this scenario pictorially.

Forward transformation patterns codify a scenario common to many forward dataflow analyses: an enabling statement establishes the conditions necessary for a transformation to be performed downstream, and any intervening statements are innocuous, i.e., do not invalidate the conditions. The formula ψ1 captures the properties that make a statement enabling, and ψ2 captures the properties that make a statement innocuous.
The witness P captures the conditions established by the enabling statement that allow the transformation to be safely performed. Witnesses have no effect on the semantics of an optimization; they will be discussed more below in the context of our strategy for automatically proving optimizations sound.

Example 1. A simple form of constant propagation replaces statements of the form X := Y with X := C if there is an earlier (enabling) statement of the form Y := C and no intervening (innocuous) statement modifies Y. The enabling statement ensures that variable Y holds the value C, and this condition is not invalidated by the innocuous statements, thereby allowing the transformation to be safely performed downstream. The "pattern variables" X and Y may be instantiated with any variables of the procedure being optimized, while the pattern variable C may be instantiated with constants in the procedure. This sequence of events is expressed by the following transformation pattern (the witness is discussed in more detail in section 2.1.2):

    stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness η(Y) = C

[Figure 1 (diagram not reproduced). Caption: CFG paths leading to a statement s which can be transformed to s′ by the transformation pattern ψ1 followed by ψ2 until s ⇒ s′ with witness P. The shaded region can only be entered through a statement satisfying ψ1, and all statements within the region satisfy ψ2. The statement s can only be reached by first passing through this shaded region.]

2.1.2 Soundness

A transformation pattern is sound, i.e., correct, if all the transformations it allows are semantics-preserving. Forward transformation patterns have a natural approach for understanding their soundness. Consider a statement s transformed to s′. Then any execution trace of the procedure that contains s′ will at some point execute an enabling statement, then zero or more innocuous statements, before reaching s′. As mentioned earlier, executing the enabling statement establishes some conditions at the subsequent state of execution. These conditions are then preserved by the innocuous statements. Finally, the conditions imply that s and s′ have the same effect at the point where s′ is executed. As a result, the original program and the transformed program have the same semantics.

Our automatic strategy for proving optimizations sound is based on the above intuition. As part of the code for a forward transformation pattern, optimization writers provide a forward witness P, which is a (possibly first-order) predicate over an execution state, denoted η. The witness plays the role of the conditions mentioned in the previous paragraph and is the intuitive reason why the transformation pattern is correct. Our strategy attempts to prove that the witness is established by the enabling statement and preserved by the innocuous statements, and that it implies that s and s′ have the same effect. (The correctness of our approach does not depend on the correctness of the witness, since our approach independently verifies that the witness has the required properties.) We call the region of an execution trace between the enabling statement and the transformed statement the witnessing region. In Figure 1, the part of a trace that is inside the shaded area is its witnessing region.

In example 1, the forward witness η(Y) = C denotes the fact that the value of Y in execution state η is C. Our implementation proves automatically that the witness η(Y) = C is established by the statement Y := C, preserved by statements that do not modify the contents of Y, and implies that X := Y and X := C have the same effect. Therefore, the constant propagation transformation pattern is automatically proven to be sound.
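To make the pieces of such a pattern concrete, the following sketch encodes the constant-propagation pattern of example 1 as plain data in Python. The encoding (the ForwardPattern class, the string form of the guards, and the dictionary-valued execution state) is our own illustration for exposition only; it is not the concrete syntax accepted by the system.

    # Hedged sketch (not the system's syntax): the constant-propagation pattern
    # from example 1 as data.  Guard formulas are kept as strings over labels,
    # the rewrite rule as a pair of statement templates, and the witness as a
    # predicate over an execution state under a substitution theta.
    from dataclasses import dataclass
    from typing import Callable, Dict

    State = Dict[str, object]          # toy execution state: variable -> value
    Subst = Dict[str, object]          # pattern variable -> program fragment

    @dataclass
    class ForwardPattern:
        psi1: str                      # enabling condition
        psi2: str                      # innocuous condition
        lhs: str                       # statement s to be rewritten
        rhs: str                       # replacement statement s'
        witness: Callable[[Subst, State], bool]

    const_prop = ForwardPattern(
        psi1="stmt(Y := C)",
        psi2="not mayDef(Y)",
        lhs="X := Y",
        rhs="X := C",
        # witness  eta(Y) = C  under substitution theta
        witness=lambda theta, eta: eta.get(theta["Y"]) == theta["C"],
    )

    # Under theta = [X -> x, Y -> y, C -> 7] the witness holds exactly in
    # states where y currently contains 7.
    theta = {"X": "x", "Y": "y", "C": 7}
    assert const_prop.witness(theta, {"y": 7, "x": 0})
    assert not const_prop.witness(theta, {"y": 3, "x": 0})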
2.1.3 Labels

Each node (i.e., statement) in a procedure's CFG is labeled with properties that are true at that node, such as mayDef(y) or stmt(x := 5). The formulas ψ1 and ψ2 in an optimization are propositional boolean expressions over these labels, which may reference pattern variables like Y and C. Our framework provides a single pre-defined label, stmt(s), which is true at a node if and only if that node is the statement s. Users can then define their own labels in two ways: syntactic labels and semantic labels.

A syntactic label is defined only in terms of the stmt label and other syntactic labels of the current statement. For example, synDef(Y), which stands for syntactic definition of Y, can be defined as:

    synDef(Y) ≜ stmt(Y := ...)

Then the syntactic (and conservative) version of the mayDef(Y) label from example 1 can be defined as:

    mayDef(Y) ≜ synDef(Y) ∨ stmt(*X := ...) ∨ stmt(... := P(...))

In other words, a statement may define variable Y if the statement is either a syntactic definition of Y, a pointer store (since our language allows taking the address of a local variable), or a procedure call (since the procedure may be passed pointers from which the address of Y is reachable).

In contrast to syntactic labels, a semantic label of a node can incorporate information about the node's surrounding context in the CFG. For example, a doesNotPointTo(X, Y) label, which says that the contents of X is definitely not the address of Y, is a semantic label: its truth value at a node depends on the execution paths leading to the node. Section 2.4 shows how semantic labels are defined and how they can be used to make the mayDef label less conservative in the face of pointers.

2.2 Backward Transformation Patterns

A backward transformation pattern is similar to a forward one, except that the direction of the flow of analysis is reversed:

    ψ1 preceded by ψ2 until s ⇒ s′ with witness P

The backward transformation pattern above says that s may be transformed to s′ if on all paths in the CFG from s to the end of the procedure, there exists a statement satisfying ψ1, preceded by zero or more statements satisfying ψ2, preceded by s. The witnessing region of a program execution trace consists of the states between the transformed statement and the statement satisfying ψ1; P is called a backward witness.

As with forward transformation patterns, the backward witness plays the role of an invariant in the witnessing region. However, in a backward transformation the witnessing region occurs after, rather than before, the point where the transformed statement has been executed. Therefore, in general a backward witness must be a predicate that relates two execution states η_old and η_new, representing corresponding execution states in the witnessing region of traces in the original and transformed programs. Our automatic proof strategy attempts to prove that the backward witness is established by the transformation and preserved by the innocuous statements. Finally, we prove that after the enabling statement is executed, the witness implies that the original and transformed execution states become identical, implying that the transformation is semantics-preserving.
Example 2. Dead assignment elimination may be implemented in our language by the following backward transformation pattern:

    (synDef(X) ∨ stmt(return ...)) ∧ ¬mayUse(X)
    preceded by ¬mayUse(X)
    until X := E ⇒ skip
    with witness η_old/X = η_new/X

We express statement removal by replacement with a skip statement. (An execution engine would not actually insert such skips.) The removal of X := E is enabled by either a later assignment to X, indicated by synDef(X), or a return statement, which signals the end of the procedure. Preceding statements are innocuous if they don't use the contents of X. The backward witness η_old/X = η_new/X says that η_old and η_new are equal "up to" X: corresponding states in the witnessing region of the original and transformed programs are identical except for the contents of variable X. This invariant is established by the removal of X := E and preserved throughout the region because X is not used. The witness implies that a re-definition of X or a return statement causes the execution states of the two traces to become identical.

2.3 Profitability Heuristics

If an optimization's transformation pattern is proven sound, then all matching occurrences of that pattern are legal to be transformed. For some optimizations, including our two examples above, all legal transformations are also profitable. However, in more complex optimizations, such as code motion and optimizations that trade off time and space, many transformations may preserve program behavior while only a small subset of them improve the code. To address this distinction between legality and profitability, an optimization is written in two pieces. The transformation pattern defines only which transformations are legal. An optimization separately describes which of the legal transformations are also profitable and should be performed; we call this second piece of an optimization its profitability heuristic.

An optimization's profitability heuristic is expressed via a choose function, which can be arbitrarily complex and written in a language of the user's choice. Given the set Θ of the legal transformations determined by the transformation pattern and the procedure being optimized, choose returns the subset of the transformations in Θ that should actually be performed. A complete optimization in our language therefore has the following form, where O_pat is a transformation pattern:

    O_pat filtered through choose

This way of factoring optimizations into a transformation pattern and a profitability heuristic is critical to our ability to prove optimizations sound automatically, since only an optimization's transformation pattern affects soundness. Transformation patterns tend to be simple even for complicated optimizations, with the bulk of an optimization's complexity pertaining to profitability. Profitability heuristics can be written in any language, thereby removing any limitations on their expressiveness. Without profitability heuristics, the extra complexity added to guards to express profitability information would prevent automated correctness reasoning.

For the constant propagation and dead assignment elimination optimizations shown earlier, the choose function returns all instances: choose_all(Θ, p) = Θ. This profitability heuristic is the default if none is specified explicitly.
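Because the choose function lies outside the part of an optimization that must be verified, it can be written directly in a general-purpose language. The sketch below illustrates its shape in Python; the encoding of Θ as a set of (node index, substitution) pairs and the loop_nodes helper are our own assumptions, not the system's API.

    # Hedged sketch of the choose interface: given the set Theta of legal
    # transformations and the procedure being optimized, return the subset of
    # transformations that should actually be performed.
    from typing import FrozenSet, Set, Tuple

    Transform = Tuple[int, FrozenSet]     # (CFG node index, substitution)
    Theta = Set[Transform]

    def choose_all(theta: Theta, proc) -> Theta:
        """Default heuristic: perform every legal transformation."""
        return theta

    def choose_outside_loops(theta: Theta, proc) -> Theta:
        """Toy non-default heuristic: only transform nodes outside loops.
        proc.loop_nodes is an assumed helper, standing in for whatever CFG
        information a real heuristic would consult."""
        return {(idx, subst) for (idx, subst) in theta
                if idx not in proc.loop_nodes}

The soundness proof never inspects these functions: any subset they return is covered by the proof of the transformation pattern itself.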
Below we give an example of an optimization with a non-trivial choose function.

Example 3. Consider the implementation of partial redundancy elimination (PRE) [12, 8] in our optimization language. One way to perform PRE is to first insert copies of statements in well-chosen places in order to convert partial redundancies into full redundancies, and then to eliminate the full redundancies by running a standard common subexpression elimination (CSE) optimization expressible in our language. For example, in the following code fragment, the computation x := a + b at the end is partially redundant, since it is redundant only when the true leg of the branch is executed:

    b := ...;
    if (...) {
      a := ...;
      x := a + b;
    } else {
      ... // don't define a, b, or x, and don't use x.
    }
    x := a + b;

This partial redundancy can be eliminated by making a copy of the assignment x := a + b in the false leg of the branch. Now the assignment after the branch is fully redundant and can be removed by running CSE followed by self-assignment removal (removing assignments of the form x := x).

The criterion that determines when it is legal to duplicate a statement is relatively simple. Most of the complexity in PRE involves determining which of the many legal duplications are profitable, so that partial redundancies will be converted to full redundancies at minimum cost. The first, "code duplication" pass of PRE can be expressed in our language as the following backward optimization:

    stmt(X := E) ∧ unchanged(E)
    preceded by unchanged(E) ∧ ¬mayDef(X) ∧ ¬mayUse(X)
    until skip ⇒ X := E
    with witness η_old/X = η_new/X
    filtered through ...

Analogous to statement removal, we express statement insertion as replacement of a skip statement. (An execution engine for optimizations would conceptually insert skips dynamically as needed to perform insertions.) The label unchanged(E) is defined (by the optimization writer, as described in section 2.1.3) to be true at a statement s if s does not redefine the contents of any variable mentioned in E. The transformation pattern for code duplication allows the insertion if, on all paths in the CFG from the skip, X := E is preceded by statements that do not modify E and X and do not use X, which are preceded by the skip. In the code fragment above, the transformation pattern allows x := a + b to be duplicated in the else branch, as well as other (unprofitable) duplications.

This optimization's choose function is responsible for selecting those legal code insertions that also are the latest ones that turn all partial redundancies into full redundancies and do not introduce any partially dead computations. This condition is rather complicated, but it can be implemented in a language of the user's choice and can be ignored when verifying the soundness of code duplication. A sample choose function for PRE is shown in appendix A.

2.4 Pure Analyses

In addition to optimizations, our language allows users to write pure analyses that do not perform transformations. These analyses can be used to compute or verify properties of interest about a procedure and to provide information to be consumed by later transformations. A pure analysis defines a new semantic label, and the result of the analysis is a labeling of the given CFG. For instance, the does-not-point-to analysis (a definition of which is shown in appendix A) results in nodes of the CFG being annotated with labels of the form doesNotPointTo(X, Y). These labels can then be used by other optimizations in their guards.

A pure analysis is similar to a forward optimization, except that it does not contain a rewrite rule or a profitability heuristic. (Our language currently has no notion of backward analyses. In addition, we currently only allow the results of a forward analysis to be used in a forward optimization, or in another forward analysis.) Instead, it has a defines clause that gives a name to the new semantic label.
A pure analysis has the form

    ψ1 followed by ψ2 defines label with witness P

The new label can be added to a statement s if on all paths to s, there exists an (enabling) statement satisfying ψ1, followed by zero or more (innocuous) statements satisfying ψ2, followed by s. The given forward witness should be established by the enabling statement and preserved by the innocuous statements. If so, the witness provides the new label's meaning: if a statement s has semantic label label, then the corresponding witness P is true of the program state just before execution of s. The following example shows how a pure analysis can be used to compute a simple form of pointer information.

Example 4. We say that a variable is tainted at a program point if its address may have been taken prior to that program point. The following analysis defines the notTainted label:

    stmt(decl X) followed by ¬stmt(... := &X) defines notTainted(X) with witness notPointedTo(X, η)

The analysis says that a variable is not tainted at a statement if on all paths leading to that statement, the variable was declared, and then its address was never taken. The witness notPointedTo(X, η) is a first-order predicate defined by the user (and shown in appendix A) that ensures that no memory location in η contains a pointer to X. The notTainted label can be used to define a more precise version of the mayDef label from earlier examples, which incorporates the fact that neither pointer stores nor procedure calls can affect variables that are untainted:

    mayDef(Y) ≜ synDef(Y) ∨ (stmt(*X := ...) ∧ ¬notTainted(Y)) ∨ (stmt(... := P(...)) ∧ ¬notTainted(Y))

With this new definition, optimizations using mayDef become less conservative in the face of pointer stores and calls.

3 Language for Defining Optimizations

This section provides a formal definition of our optimization language and the intermediate language that optimizations manipulate. The full formal details can be found in appendix B.

3.1 Intermediate Language

A program π in our (untyped) intermediate language is described by the following grammar:

    Progs        π    ::= pr ... pr
    Procs        pr   ::= p(x) { s; ...; s; }
    Stmts        s    ::= decl x | skip | lhs := e | x := new | x := p(b)
                          | if b goto ι else ι | return x
    Exprs        e    ::= b | *x | &x | op b ... b
    Locatables   lhs  ::= x | *x
    Base Exprs   b    ::= x | c
    Ops          op   ::= various operators with arity ≥ 1
    Vars         x    ::= x | y | z | ...
    Proc Names   p    ::= p | q | r | ...
    Consts       c    ::= constants
    Indices      ι    ::= 0 | 1 | 2 | ...

A program π is a sequence of procedures, and each procedure is a sequence of statements. We assume a distinguished procedure named main. Statements include local variable declarations, assignments to local variables and through pointers, heap memory allocation, procedure calls and returns, and conditional branches (unconditional branches can be simulated with conditional branches). We assume that each procedure ends with a return statement. Statements are indexed consecutively from 0, and stmtAt(π, ι) returns the statement with index ι in π. Expressions include constants, local variable references, pointer dereferences, taking the addresses of local variables, and n-ary operators over non-pointer values.
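For concreteness, the grammar can be transcribed directly into an abstract-syntax representation. The Python rendering below is our own (the constructor names are ours and not part of the system); it is included only to fix notation for the constructs listed above.

    # Hedged sketch: abstract syntax for the intermediate language of section 3.1.
    from dataclasses import dataclass
    from typing import List, Union

    @dataclass
    class Var:                      # base expression: variable reference
        name: str

    @dataclass
    class Const:                    # base expression: constant
        value: object

    Base = Union[Var, Const]

    @dataclass
    class Deref:                    # *x
        var: Var

    @dataclass
    class AddrOf:                   # &x
        var: Var

    @dataclass
    class Op:                       # op b ... b  (n-ary operator, arity >= 1)
        op: str
        args: List[Base]

    Expr = Union[Var, Const, Deref, AddrOf, Op]
    Lhs = Union[Var, Deref]         # locatables: x | *x

    @dataclass
    class Decl:                     # decl x
        var: Var

    @dataclass
    class Skip:                     # skip
        pass

    @dataclass
    class Assign:                   # lhs := e
        lhs: Lhs
        rhs: Expr

    @dataclass
    class New:                      # x := new
        var: Var

    @dataclass
    class Call:                     # x := p(b)
        var: Var
        proc: str
        arg: Base

    @dataclass
    class CondGoto:                 # if b goto i1 else i2 (statement indices)
        cond: Base
        then_index: int
        else_index: int

    @dataclass
    class Return:                   # return x
        var: Var

    Stmt = Union[Decl, Skip, Assign, New, Call, CondGoto, Return]

    @dataclass
    class Proc:                     # p(x) { s; ...; s; }
        name: str
        param: Var
        body: List[Stmt]

    @dataclass
    class Program:                  # a sequence of procedures, one named main
        procs: List[Proc]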
A state of execution of a program is a tuple η = (ι, ρ, σ, κ, M). The index ι indicates which statement is about to be executed. The environment ρ is a map from variables in scope to their locations in memory, and the store σ describes the contents of memory by mapping locations to values (constants and locations). The dynamic call chain is represented by a stack κ, and M is the memory allocator, which returns fresh locations as needed.

The states of a program π transition according to the state transition function →_π. We denote by η →_π η′ the fact that η′ is the program state that is "stepped to" when execution proceeds from state η. The definition of →_π is standard and is given in appendix B. We also define an intraprocedural state transition function ↪_π. This function acts like →_π except when the statement to be executed is a procedure call. In that case, ↪_π steps "over" the call, returning the program state that will eventually be reached when control returns to the calling procedure. We model run-time errors through the absence of state transitions: if in some state η program execution would fail with a run-time error, there won't be any η′ such that η →_π η′ is true. Likewise, if a procedure call does not return successfully, e.g., because of infinite recursion, there won't be any η′ such that η ↪_π η′ is true.

3.2 Optimization Language

In this section, we first specify the syntax of a rewrite rule's original and transformed statements s and s′. Then we define the language used for expressing ψ1 and ψ2. Finally, we provide the semantics of optimizations. The witness P does not affect the (dynamic) semantics of optimizations.

3.2.1 Syntax of s and s′

Statements s and s′ are defined in the syntax of the extended intermediate language, which augments the intermediate language with a form of free variables called pattern variables. Each production in the grammar of the original intermediate language is extended with a case for a pattern variable. A few examples are shown below:

    Exprs   e ::= ... | E
    Vars    x ::= ... | X | Y | Z | ...
    Consts  c ::= ... | C

Statements in the extended intermediate language are instantiated by substituting for each pattern variable a program fragment of the appropriate kind from the intermediate-language program being optimized. For example, the statement X := E in the extended intermediate language contains two pattern variables X and E, and this statement can be instantiated to form an intermediate-language statement assigning any expression occurring in the intermediate program to any variable occurring in the intermediate program.

3.2.2 Syntax and Semantics of ψ1 and ψ2

The syntax for ψ is described by the following grammar:

    ψ ::= true | false | pr | ¬ψ | ψ1 ∨ ψ2 | ψ1 ∧ ψ2

In this grammar, pr stands for atomic predicates, each of which is formed from the pre-defined stmt label or from any user-defined label, with parameters drawn from the extended intermediate language. For example, pr includes predicates such as mayDef(X) and unchanged(E), where X and E are pattern variables.

The semantics of a formula ψ is defined with respect to a labeled CFG. Each node n in the CFG for procedure p is labeled with a finite set L_p(ι), where ι is n's index. L_p(ι) includes atomic predicates pr that do not contain pattern variables. For example, a node could be labeled with stmt(x := 3) and mayDef(x). The meaning of a formula ψ at a node depends on a substitution θ mapping the pattern variables in ψ to fragments of p. We extend substitutions to formulas and program fragments containing pattern variables in the usual way.
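As a small illustration of instantiation, the sketch below applies a substitution θ to the statement template X := E; the nested-tuple encoding of statements is our own stand-in for the system's representation.

    # Hedged sketch: applying a substitution to a statement template.  Pattern
    # variables are single upper-case letters; concrete fragments are lower-case
    # strings or nested tuples.  This encoding is ours, not the system's.
    def is_pattern_var(t):
        return isinstance(t, str) and len(t) == 1 and t.isupper()

    def substitute(template, theta):
        """Replace every pattern variable occurring in 'template' by theta[var]."""
        if is_pattern_var(template):
            return theta[template]
        if isinstance(template, tuple):
            return tuple(substitute(t, theta) for t in template)
        return template                   # concrete fragment: left unchanged

    # The template X := E under theta = [X -> x, E -> y + 1] becomes x := y + 1.
    template = ("assign", "X", "E")
    theta = {"X": "x", "E": ("op+", "y", 1)}
    assert substitute(template, theta) == ("assign", "x", ("op+", "y", 1))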
We write ι ⊨_{p,θ} ψ to indicate that the node with index ι satisfies ψ in the labeled CFG of p under substitution θ. The definition of ι ⊨_{p,θ} ψ is straightforward, with the base case being ι ⊨_{p,θ} pr ⟺ θ(pr) ∈ L_p(ι). The complete definition of ⊨_{p,θ} is in appendix B.

3.2.3 Semantics of Optimizations

We define the semantics of analyses and optimizations in several pieces. First, the meaning of a forward guard ψ1 followed by ψ2 is a function that takes a procedure and returns a set of matching indices with their corresponding substitutions:

Definition 1. The meaning of a forward guard O_guard of the form ψ1 followed by ψ2 is as follows:

    ⟦O_guard⟧(p) = { (ι, θ) | for all paths ι1, ..., ιj, ι in p's CFG such that ι1 is the index of p's entry node,
                              ∃k. (1 ≤ k ≤ j ∧ ιk ⊨_{p,θ} ψ1 ∧ ∀i. (k < i ≤ j ⟹ ιi ⊨_{p,θ} ψ2)) }

The above definition formalizes the description of forward guards from section 2. The meaning of a backward guard ψ1 preceded by ψ2 is identical, except that the guard is evaluated on CFG paths ι, ιj, ..., ι1 that start, rather than end, at ι, where ι1 is the index of the procedure's exit node. Guards can be seen as a restricted form of temporal logic formula, expressible in variants of both LTL and CTL.

Next we define the semantics of transformation patterns. A (forward or backward) transformation pattern O_pat = O_guard until s ⇒ s′ with witness P simply filters the set of nodes matching its guard to include only those nodes of the form s:

    ⟦O_pat⟧(p) = { (ι, θ) | (ι, θ) ∈ ⟦O_guard⟧(p) and ι ⊨_{p,θ} stmt(s) }

The meaning of an optimization is a function that takes a procedure p and returns the procedure produced by applying to p all transformations chosen by the choose function.

Definition 2. Given an optimization O of the form O_pat filtered through choose, where O_pat has rewrite rule s ⇒ s′, the meaning of O is as follows:

    ⟦O⟧(p) = let Θ := ⟦O_pat⟧(p) in app(s′, p, choose(Θ, p) ∩ Θ)

where app(s′, p, Θ′) returns the procedure identical to p but with the node with index ι transformed to θ(s′), for each (ι, θ) in Θ′.

Finally, the meaning of a pure analysis O_guard defines label with witness P applied to a procedure p is a new version of p's CFG where for each pair (ι, θ) in ⟦O_guard⟧(p), the node with index ι is additionally labeled with θ(label).
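Definition 1 can be read as a path property over the labeled CFG. The sketch below evaluates a forward guard for one fixed substitution θ by a greatest-fixpoint computation over the CFG, purely to mirror the quantifier structure of the definition; the CFG and label encodings are our own, and the real system evaluates guards with the dataflow engine described in section 6.

    # Hedged sketch: for a fixed substitution theta, compute the node indices
    # at which the forward guard "psi1 followed by psi2" holds (Definition 1).
    def satisfies(labels, node, formula):
        """'formula' is a set of atomic labels that must all be present at the
        node (a simplified stand-in for the propositional formulas of 3.2.2)."""
        return formula <= labels[node]

    def forward_guard(entry, preds, labels, psi1, psi2):
        """Greatest-fixpoint computation of the nodes n such that every CFG
        path from 'entry' to n passes through a psi1 node followed only by
        psi2 nodes before reaching n.  The guard never holds at the entry node
        itself, since the empty path contains no enabling statement."""
        nodes = set(labels)
        holds = {n: n != entry for n in nodes}     # optimistic start
        changed = True
        while changed:
            changed = False
            for n in nodes:
                if n == entry:
                    continue
                new = all(satisfies(labels, m, psi1) or
                          (satisfies(labels, m, psi2) and holds[m])
                          for m in preds[n])
                if new != holds[n]:
                    holds[n] = new
                    changed = True
        return {n for n in nodes if holds[n]}

    # Toy CFG for "y := 7; x := y" under theta = [Y -> y, C -> 7, X -> x]:
    labels = {0: {"stmt(y := 7)"}, 1: {"stmt(x := y)", "notMayDef(y)"}}
    preds = {0: [], 1: [0]}
    psi1 = {"stmt(y := 7)"}          # theta applied to stmt(Y := C)
    psi2 = {"notMayDef(y)"}          # theta applied to not mayDef(Y)
    assert forward_guard(0, preds, labels, psi1, psi2) == {1}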
4 Proving Soundness Automatically

In this section we describe in detail our technique for automatically proving soundness of optimizations. We say that an intermediate-language program π′ is a semantically equivalent transformation of π if, whenever main(v1) returns v2 in π, for some values v1 and v2, then it also does in π′. Let π[p ↦ p′] denote the program identical to π but with procedure p replaced by p′. An optimization O is sound if for all intermediate-language programs π and procedures p in π, π[p ↦ ⟦O⟧(p)] is a semantically equivalent transformation of π.

The first subsection describes our technique for proving optimizations sound, which requires an automatic theorem prover to discharge only a small set of simple proof obligations. The second subsection describes how these obligations are implemented and automatically discharged with the Simplify theorem prover.

4.1 Soundness of Optimizations

We say that a transformation pattern O_pat with rewrite rule s ⇒ s′ is sound if, for all intermediate-language programs π and procedures p in π, for all subsets Θ ⊆ ⟦O_pat⟧(p), π[p ↦ app(s′, p, Θ)] is a semantically equivalent transformation of π. If a transformation pattern is sound, then any optimization O with that transformation pattern is sound, since the optimization will select some subset of the transformation pattern's suggested transformations, and each of these is known to be a semantically equivalent transformation of π. Therefore, we need not reason at all about an optimization's profitability heuristic in order to prove that the optimization is sound.

4.1.1 Forward Transformation Patterns

Consider a forward transformation pattern of the following form:

    ψ1 followed by ψ2 until s ⇒ s′ with witness P

As discussed in section 2, our proof strategy entails showing that the forward witness P holds throughout the witnessing region and that the witness implies s and s′ have the same semantics in this context. This can naturally be shown by induction over the states in the witnessing region of an execution trace leading to a transformed statement. In general, it is difficult for an automatic theorem prover to determine when proof by induction is necessary and to perform such a proof with a strong enough inductive hypothesis. Therefore we instead require an automatic theorem prover to discharge only non-inductive obligations, which pertain to individual execution states rather than entire execution traces. We have proven (see Theorems 1 and 2 below) that if these obligations hold for any particular optimization, then that optimization is sound.

We use index as an accessor on states: index((ι, ρ, σ, κ, M)) = ι. The optimization-specific obligations, to be discharged by an automatic theorem prover, are as follows, where θ(P) is the predicate formed by applying θ to each pattern variable in the definition of P, and π′ denotes the transformed program:

F1. If η ↪_π η′ and index(η) ⊨_{p,θ} ψ1, then θ(P)(η′).

F2. If θ(P)(η) and η ↪_π η′ and index(η) ⊨_{p,θ} ψ2, then θ(P)(η′).

F3. If θ(P)(η) and η ↪_π η′ and ι = index(η) and stmtAt(π, ι) = θ(s) and stmtAt(π′, ι) = θ(s′), then η ↪_{π′} η′.

Condition F1 ensures that the witness holds at any state following the execution of an enabling statement (one satisfying ψ1). Condition F2 ensures that the witness is preserved by any innocuous statement (one satisfying ψ2). Finally, condition F3 ensures that s and s′ have the same semantics when executed from a state satisfying the witness.

As an example, consider condition F1 for the constant propagation optimization from example 1. The condition looks as follows:

    If η ↪_π η′ and index(η) ⊨_{p,θ} stmt(Y := C), then θ(η′(Y) = C).

The condition is easily proven automatically from the semantics of assignments and the stmt label. The following theorem validates the optimization-specific proof obligations.

Theorem 1. If O is a forward optimization satisfying conditions F1, F2, and F3, then O is sound.

The proof of this theorem, which is given in appendix B, uses conditions F1 and F2 as part of the base case and the inductive case, respectively, in an inductive argument that the witness holds throughout a witnessing region. Condition F3 is then used to show that s and s′ have the same semantics in this context. Our proof also handles the case when multiple transformations, with possibly overlapping witnessing regions, are performed.
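To make these obligations concrete, the sketch below instantiates F1-F3 for the constant-propagation pattern of example 1 under the substitution θ = [X ↦ x, Y ↦ y, C ↦ 7] and spot-checks them on a toy one-statement interpreter with dictionary states. This is only an illustration of the shape of the obligations (the interpreter and state encoding are ours); Simplify must prove the corresponding facts for all states, not test a few.

    # Hedged sketch: F1-F3 for constant propagation, spot-checked on toy states.
    def step(state, stmt):
        """Toy intraprocedural step for one assignment 'var := rhs', where rhs
        is either a constant or a variable name."""
        var, rhs = stmt
        value = state[rhs] if isinstance(rhs, str) else rhs
        new_state = dict(state)
        new_state[var] = value
        return new_state

    def witness(state):                 # theta(P): eta(y) = 7
        return state.get("y") == 7

    some_states = [{"x": 0, "y": 0}, {"x": 3, "y": 7}, {"y": 9, "z": 1}]

    # F1: executing the enabling statement  y := 7  establishes the witness.
    assert all(witness(step(s, ("y", 7))) for s in some_states)

    # F2: an innocuous statement (one not defining y, e.g. z := x) preserves it.
    assert all(witness(step(s, ("z", "x")))
               for s in some_states if witness(s) and "x" in s)

    # F3: from a state satisfying the witness, x := y and x := 7 step to the
    # same successor state.
    assert all(step(s, ("x", "y")) == step(s, ("x", 7))
               for s in some_states if witness(s))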
4.1.2 Backward Transformation Patterns

Consider a backward transformation pattern of the following form:

    ψ1 preceded by ψ2 until s ⇒ s′ with witness P

The optimization-specific obligations are similar to those for a forward transformation pattern, except that the ordering of events in the witnessing region is reversed:

B1. If η ↪_π η_old and η ↪_{π′} η_new and ι = index(η) and stmtAt(π, ι) = θ(s) and stmtAt(π′, ι) = θ(s′), then θ(P)(η_old, η_new).

B2. If θ(P)(η_old, η_new) and η_old ↪_π η′_old and ι_old = index(η_old) and ι_new = index(η_new) and ι_old ⊨_{p,θ} ψ2 and stmtAt(π, ι_old) = stmtAt(π′, ι_new), then there exists some η′_new such that η_new ↪_{π′} η′_new and θ(P)(η′_old, η′_new).

B3. If θ(P)(η_old, η_new) and η_old ↪_π η and ι_old = index(η_old) and ι_new = index(η_new) and ι_old ⊨_{p,θ} ψ1 and stmtAt(π, ι_old) = stmtAt(π′, ι_new), then η_new ↪_{π′} η.

Condition B1 ensures that the backward witness holds between the original and transformed programs, after s and s′ are respectively executed. (This condition assumes that s′ does not get "stuck" by causing a run-time error. That assumption must actually be proven, but for simplicity we elide this issue here. It is addressed by requiring a few additional obligations to be discharged that imply that s′ cannot get stuck if the original program does not get stuck. Details are in appendix B.) Condition B2 ensures that the backward witness is preserved through the innocuous statements. Condition B3 ensures that the two traces become identical again after executing the enabling statement (and exiting the witnessing region).

Analogous to the forward case, the following theorem validates the optimization-specific proof obligations for backward optimizations.

Theorem 2. If O is a backward optimization satisfying conditions B1, B2, and B3, then O is sound.

4.2 Implementation with Simplify

We have implemented our proof strategy with the Simplify automatic theorem prover. For each optimization, we ask Simplify to prove the three associated optimization-specific obligations. To do so, Simplify requires background information in the form of a set of axioms. These axioms, which simply encode the semantics of our intermediate language and of the stmt label, are optimization-independent: they need not be modified in order to prove new optimizations sound.

We introduce function symbols to represent term constructors for each kind of expression and statement. For example, the term assgn(var(x), deref(var(y))) represents the statement x := *y. Next we formalize the representation of program states. Simplify has built-in axioms about a map data structure, with associated functions select and update to access elements and (functionally) update the map. This is useful for representing many components of a state. For example, an environment is a map from variables to locations, and a store is a map from locations to values. Given our representation for states, we define axioms for a function symbol evalExpr, which evaluates an expression in a given state. The evalExpr function represents the function η(·) used in section 2. We also define axioms for a function evalLExpr which computes the location of an lhs expression given a program state. Finally, we provide axioms for the stepIndex, stepEnv, stepStore, stepStack, and stepMem functions, which together define the state transition function →_π from section 3.1. These functions take a state and a program and return the new value of the state component being "stepped." As an example, the axioms for stepping an index and a store through an assignment lhs := e are as follows:

    ∀π, η, lhs, e. stmtAt(π, index(η)) = assgn(lhs, e) ⟹ stepIndex(η, π) = index(η) + 1

    ∀π, η, lhs, e. stmtAt(π, index(η)) = assgn(lhs, e) ⟹ stepStore(η, π) = update(store(η), evalLExpr(η, lhs), evalExpr(η, e))

The first axiom says that the new index is the current index incremented by one. The second axiom says that the new store is the same as the old one, but with the location of lhs updated to the value of e.
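Read operationally, the two axioms describe a simple state update. The sketch below is our own Python transcription of that reading, not the axioms as given to Simplify; only the index and store components are modeled, since those are the two the axioms above constrain, and the eval helpers handle only plain variables and constants.

    # Hedged sketch: the operational content of the two assignment axioms.
    # A state is (index, env, store); env maps variables to locations and
    # store maps locations to values.
    def eval_lexpr(state, lhs):
        """Location of an assignable expression; here only a plain variable."""
        _, env, _ = state
        return env[lhs]

    def eval_expr(state, e):
        """Value of an expression; here a constant or a variable read."""
        _, env, store = state
        return store[env[e]] if isinstance(e, str) else e

    def step_assign(state, lhs, e):
        """stepIndex: index + 1; stepStore: store updated at the location of
        lhs with the value of e (a functional update, as in the axioms)."""
        index, env, store = state
        new_store = dict(store)
        new_store[eval_lexpr(state, lhs)] = eval_expr(state, e)
        return (index + 1, env, new_store)

    # x := y in a state where x lives at location 0 and y at location 1:
    state = (5, {"x": 0, "y": 1}, {0: 99, 1: 7})
    assert step_assign(state, "x", "y") == (6, {"x": 0, "y": 1}, {0: 7, 1: 7})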
The ↪_π function is then defined in terms of the →_π function.

We have implemented and automatically proven sound a dozen optimizations and analyses in our language (which are given in appendix A). On a modern workstation, the time taken by Simplify to discharge the optimization-specific obligations for these optimizations ranges from 2 to 89 seconds, with an average of 22 seconds.

5 Discussion

In this section, we evaluate our system along three dimensions: expressiveness of our language, debugging value, and reduced trusted computing base.

Expressiveness. One of the key choices in our approach is to restrict the language in which optimizations can be written, in order to gain automatic reasoning about soundness. However, the restrictions of our optimization language are not as onerous as they may first appear. First, much of the complexity of an optimization can be factored out into the profitability heuristic, which is unrestricted. Second, the pattern of a witnessing region (beginning with a single enabling statement and passing through zero or more innocuous statements before reaching the statement to be transformed) is common to many forward intraprocedural dataflow analyses, and similarly for backward intraprocedural dataflow analyses. Third, optimizations that traditionally are expressed as having effects at multiple points in the program, such as various sorts of code motion and partial redundancy elimination, can in fact be decomposed into several simpler transformations, each of which fits our constraint of making a single transformation at the end of a witnessing region.

The PRE example in section 2.3 illustrates all three of these points. PRE is a complex code-motion optimization [12, 8], and yet it can be expressed in our language using simple forward and backward passes with appropriate profitability heuristics. Our way of factoring complicated optimizations into smaller pieces, and separating the part that affects soundness from the part that doesn't, allows users to write optimizations that are intricate and expressive yet still amenable to automated correctness reasoning.

Even so, our current language does have limitations. For example, it cannot express interprocedural optimizations or one-to-many transformations. As mentioned in section 6, our ongoing work is addressing these limitations. Also, optimizations and analyses that build complex data structures to represent their dataflow facts may be difficult to express. Finally, it is possible for limitations in either our proof strategy or in the automatic theorem prover to cause a sound optimization expressible in our language to be rejected. In these cases, optimizations can be written outside of our framework, perhaps verified using translation validation. Optimizations written in our optimization language and proven correct can peacefully co-exist with optimizations written "the normal way."

Debugging benefit. Writing correct optimizations is difficult because there are many corner cases to consider, and it is easy to miss one. Our system in fact found several subtle problems in previous versions of our optimizations. For example, we have implemented a form of common subexpression elimination (CSE) that eliminates not only redundant arithmetic expressions, but also redundant loads.
In particular, this optimization tries to eliminate a computation of *X if the result is already available from a previous load. Our initial version of the optimization precluded pointer stores from the witnessing region, to ensure that the value of *X was not modified. However, a failed soundness proof made us realize that even a direct assignment Y := ... can change the value of *X, because X could point to Y. Once we incorporated pointer information to make sure that direct assignments in the witnessing region were not changing the value of *X, our implementation was able to automatically prove the optimization sound. Without the static checks to find the bug, it could have gone undetected for a long time, because that particular corner case may not occur in many programs.

Reduced trusted computing base. The trusted computing base (TCB) ordinarily includes the entire compiler. In our system we have moved the compiler's optimization phase, one of the most intricate and error-prone portions, outside of the TCB. Instead, we have shifted the trust in this phase to three components: the automatic theorem prover, the manual proofs done as part of our framework, and the run-time engine that executes optimizations. Because all of these components are optimization-independent, new optimizations can be incorporated into the compiler without enlarging the TCB. Furthermore, as discussed in section 6, the run-time engine can be implemented as a single dataflow analysis common to all user-defined optimizations. This means that the trustworthiness of the run-time engine is akin to the trustworthiness of a single optimization pass in a traditional compiler.

Trust can be further enhanced in several ways. First, we could use an automatic theorem prover that generates proofs, such as the prover in the Touchstone compiler [19]. This would allow trust to be shifted from the theorem prover to a simpler proof checker. The manual proofs of our framework are made public for peer review in appendix B to increase confidence. We could also use an interactive theorem prover such as PVS [22] to validate these proofs.

6 Current and Future Work

Our current work is focused in two directions. First, we are implementing the dynamic semantics of our optimization language as an analysis in the Whirlwind compiler, a successor to Vortex [4]. This analysis stores at every program point a set of substitutions, each substitution representing a potential witnessing region. Consider a forward optimization:

    ψ1 followed by ψ2 until s ⇒ s′ with witness P filtered through choose

The flow function for our analysis works as follows. First, if the statement being processed satisfies ψ1, then the flow function adds to the outgoing dataflow fact the substitution that caused ψ1 to be true. Also, for each substitution θ in the incoming dataflow fact, the flow function checks if θ(ψ2) is true at the current statement. If it is, then θ is propagated to the outgoing dataflow fact, otherwise it is dropped. Finally, merge nodes simply take the intersection of the incoming dataflow facts. After the analysis has reached a fixed point, if a statement has a substitution θ in its incoming dataflow fact that makes θ(stmt(s)) true, and the choose function selects this statement, then the statement is transformed to θ(s′).

For example, in constant propagation we have ψ1 = stmt(Y := C) and ψ2 = ¬mayDef(Y). The following program fragment shows the dataflow facts propagated after each statement:

    S1: a := 2;   [Y ↦ a, C ↦ 2]
    S2: b := 3;   [Y ↦ a, C ↦ 2], [Y ↦ b, C ↦ 3]
    S3: c := a;
S1 satisfies ψ1, and so its outgoing dataflow fact contains the substitution [Y ↦ a, C ↦ 2]. S2 satisfies ψ2 under this substitution, and so the substitution is propagated; S2 also satisfies ψ1 and so [Y ↦ b, C ↦ 3] is added to the outgoing dataflow fact. In fact, the dataflow information after S2 is very similar to the regular constant propagation dataflow fact {a ↦ 2, b ↦ 3}. At fixed point, the statement c := a can be transformed to c := 2 because the incoming dataflow fact contains the map [Y ↦ a, C ↦ 2]. Note that this implementation evaluates all "instances" of the constant propagation optimization pattern simultaneously. (We also plan to explore potentially more efficient implementation techniques, such as generating specialized code to run each optimization [26].)
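The flow and merge functions just described can be sketched as follows. The encoding of dataflow facts as sets of frozensets and the matches_psi1/matches_psi2 callbacks (standing in for label evaluation under a substitution) are our own assumptions; the transformation and choose steps performed at fixed point are omitted.

    # Hedged sketch of the flow and merge functions described above.  A dataflow
    # fact is a set of substitutions, each a frozenset of (pattern-variable,
    # fragment) pairs so it can be stored in a set.
    def flow(fact, stmt, matches_psi1, matches_psi2):
        """Propagate substitutions across one statement.  matches_psi1(stmt)
        returns the substitutions under which the statement satisfies psi1;
        matches_psi2(stmt, theta) says whether it satisfies theta(psi2)."""
        out = set(matches_psi1(stmt))                  # new witnessing regions
        out |= {theta for theta in fact if matches_psi2(stmt, theta)}
        return out

    def merge(facts):
        """Merge nodes keep only substitutions valid along every incoming edge."""
        facts = list(facts)
        return set.intersection(*facts) if facts else set()

    # Constant propagation on the fragment  a := 2; b := 3 :
    def matches_psi1(stmt):                            # stmt(Y := C)
        var, rhs = stmt
        return [frozenset({("Y", var), ("C", rhs)})] if isinstance(rhs, int) else []

    def matches_psi2(stmt, theta):                     # not mayDef(Y)
        var, _ = stmt
        return ("Y", var) not in theta

    fact = set()
    for stmt in [("a", 2), ("b", 3)]:
        fact = flow(fact, stmt, matches_psi1, matches_psi2)
    # After b := 3 the fact contains [Y -> a, C -> 2] and [Y -> b, C -> 3].
    assert fact == {frozenset({("Y", "a"), ("C", 2)}),
                    frozenset({("Y", "b"), ("C", 3)})}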
Our analysis is being implemented using our earlier framework for composing optimizations in Whirlwind [10]. This framework allows optimizations to be defined modularly and then automatically combines all-forward or all-backward optimizations in order to gain mutually beneficial interactions. Analyses and optimizations written in our language will therefore also be composable in this way. Furthermore, Whirlwind's framework automatically composes an optimization with itself, allowing a recursively defined optimization to be solved in an optimistic, iterative manner; this property will likewise be conferred on optimizations written in our language. For example, a recursive version of dead-assignment elimination would allow X := E to be removed even if X is used before being re-defined, as long as it is only used by other dead assignments (including itself).

The other direction of our current work is in extending the language to handle interprocedural optimizations. One approach would extend the scope of analysis from a single procedure to the whole program's control-flow supergraph. One technical challenge here is the need to express the witness P in a way that makes sense across procedure calls. For example, the predicate η(Y) = C does not make sense once a call is stepped into, because Y has gone out of scope. We intend to extend the syntax for the witness to be more precise about which variable is being talked about. A different approach to interprocedural analysis would use pure analyses to define summaries of procedures, which could be used in intraprocedural optimizations of callers.

There are also many directions for future work. First, the optimization language currently supports only transformations that replace a single statement with a single statement. It should be relatively straightforward to generalize the framework to handle one-to-many statement transformations, allowing optimizations like inlining to be expressed. Supporting many-to-many statement transformations would also be interesting. We plan to try inferring the witnesses, which are currently provided by the user. It may be possible to use some simple heuristics to guess a witness from the given transformation pattern. As a simple example, in the constant propagation example of section 2, the appropriate witness, that Y has the value C, is simply the strongest postcondition of the enabling statement Y := C. Many of the other forward optimizations that we have written also have this property.

Finally, an important consideration that we have not addressed is the interface between the optimization writer and our automatic soundness prover. It will be critical to provide useful error messages when an optimization is unable to be proven sound. When Simplify cannot prove a given proposition, it returns a counterexample context, which is a state of the world that violates the proposition. An interesting approach would be to use this counterexample context to synthesize a small intermediate-language program that illustrates a potential unsoundness of the given optimization.

7 Related Work

Our work is inspired by that of Lacey et al. [9]. Lacey describes a language for writing optimizations as guarded rewrite rules evaluated over a labeled CFG, and our transformation patterns are modeled on this language. Lacey's intermediate language lacks several constructs found in realistic languages, including pointers, dynamic memory allocation, and procedures. Lacey describes a general strategy, based on relating execution traces of the original and transformed programs, for manually proving the soundness of optimizations in his language. Three example optimizations are shown and proven sound by hand using this strategy. Unfortunately, the generality of this strategy makes it difficult to automate. Lacey's guards may be arbitrary CTL formulas, while our guard language can be viewed as a strict subset of CTL that codifies a particularly common idiom. However, we are still able to express more precise versions of Lacey's three example optimizations (as well as many others) and to prove them sound automatically. Further, Lacey's optimization language has no notion of semantic labels nor of profitability heuristics. Therefore, expressing optimizations that employ pointer information (assuming Lacey's language were augmented with pointers) or optimizations like PRE would instead require writing more complicated guards, and some optimizations that we support may not be expressible by Lacey.

As mentioned in the introduction, much other work has been done on manually proving optimizations correct [11, 13, 1, 2, 6, 21, 3]. Transformations have also been proven correct mechanically, but not automatically: the transformation is proven sound using an interactive theorem prover, which increases one's confidence but requires user interaction. For example, Young [28] has proven a code generator correct using the Boyer-Moore theorem prover enhanced with an interactive interface [7].

Instead of proving that the compiler is always correct, credible compilation [24, 23] and translation validation [17] both attack the problem of checking the correctness of a given compilation run. Therefore, a bug in an optimization will only appear when the compiler is run on a program that triggers the bug. Our work allows optimizations to be proven correct before the compiler is even run once. However, to do so we require optimizations to be written in a special-purpose language, while credible compilation and translation validation typically do not.

Proof-carrying code [16], certified compilation [18], typed intermediate languages [27], and typed assembly languages [14, 15] have all been used to prove properties of programs generated by a compiler. However, the kind of properties that these approaches have typically guaranteed are type safety and memory safety. In our work, we prove the stronger property of semantic equivalence between the original and resulting programs.

8 Conclusion

We have presented an approach for automatically proving the correctness of compiler optimizations.
Our technique provides the optimization writer with a domain-specific language for writing optimizations that is both reasonably expressive and amenable to automated correctness reasoning. Using our technique we have proven correct our implementations of several optimizations over a realistic intermediate language. We believe our approach is a promising step toward the goal of reliable and user-extensible compilers.

References

[1] Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Conference Record of the Fourth ACM Symposium on Principles of Programming Languages, pages 238-252, January 1977.

[2] Patrick Cousot and Radhia Cousot. Systematic design of program analysis frameworks. In Conference Record of the Sixth ACM Symposium on Principles of Programming Languages, pages 269-282, January 1979.

[3] Patrick Cousot and Radhia Cousot. Systematic design of program transformation frameworks by abstract interpretation. In Conference Record of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 2002.

[4] Jeffrey Dean, Greg DeFouw, Dave Grove, Vassily Litvinov, and Craig Chambers. Vortex: An optimizing compiler for object-oriented languages. In Proceedings of the 1996 ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 83-100, San Jose, CA, October 1996.

[5] Cormac Flanagan, K. Rustan M. Leino, Mark Lillibridge, Greg Nelson, James B. Saxe, and Raymie Stata. Extended static checking for Java. In Proceedings of the ACM SIGPLAN '02 Conference on Programming Language Design and Implementation, June 2002.

[6] J. Guttman, J. Ramsdell, and M. Wand. VLISP: a verified implementation of Scheme. Lisp and Symbolic Computation, 8(1-2):33-110, 1995.

[7] M. Kauffmann and R.S. Boyer. The Boyer-Moore theorem prover and its interactive enhancement. Computers and Mathematics with Applications, 29(2):27-62, 1995.

[8] Jens Knoop, Oliver Ruthing, and Bernhard Steffen. Optimal code motion: Theory and practice. ACM Transactions on Programming Languages and Systems, 16(4):1117-1155, July 1994.

[9] David Lacey, Neil D. Jones, Eric Van Wyk, and Carl Christian Frederiksen. Proving correctness of compiler optimizations by temporal logic. In Conference Record of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 2002.

[10] Sorin Lerner, David Grove, and Craig Chambers. Composing dataflow analyses and transformations. In Conference Record of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 2002.

[11] J. McCarthy and J. Painter. Correctness of a compiler for arithmetic expressions. In J. T. Schwartz, editor, Proceedings of Symposia in Applied Mathematics, January 1967.

[12] E. Morel and C. Renvoise. Global optimization by suppression of partial redundancies. Communications of the ACM, 22(2):96-103, February 1979.

[13] F. Lockwood Morris. Advice on structuring compilers and proving them correct. In Conference Record of the 1st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 1973.

[14] Greg Morrisett, Karl Crary, Neal Glew, Dan Grossman, Richard Samuels, Frederick Smith, David Walker, Stephanie Weirich, and Steve Zdancewic. TALx86: A realistic typed assembly language. In 1999 ACM SIGPLAN Workshop on Compiler Support for System Software, pages 25-35, Atlanta, GA, USA, May 1999.

[15] Greg Morrisett, David Walker, Karl Crary, and Neal Glew.
From System F to Typed Assembly Language. ACM Transactions on Programming Languages and Systems, 21(3):528–569, May 1999.
[16] George C. Necula. Proof-carrying code. In Conference Record of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 1997.
[17] George C. Necula. Translation validation for an optimizing compiler. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 83–95, Vancouver, Canada, June 2000.
[18] George C. Necula and Peter Lee. The design and implementation of a certifying compiler. In Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation, June 1998.
[19] George C. Necula and Peter Lee. Proof generation in the Touchstone theorem prover. In Proceedings of the International Conference on Automated Deduction, pages 25–44, Pittsburgh, Pennsylvania, June 2000. Springer-Verlag LNAI 1831.
[20] Greg Nelson and Derek C. Oppen. Simplification by cooperating decision procedures. ACM Transactions on Programming Languages and Systems, 1(2):245–257, October 1979.
[21] D. P. Oliva, J. Ramsdell, and M. Wand. The VLISP verified PreScheme compiler. Lisp and Symbolic Computation, 8(1-2):111–182, 1995.
[22] S. Owre, S. Rajan, J. M. Rushby, N. Shankar, and M. K. Srivas. PVS: Combining specification, proof checking, and model checking. In Computer-Aided Verification, CAV '96, volume 1102 of Lecture Notes in Computer Science, pages 411–414, New Brunswick, NJ, July/August 1996. Springer-Verlag.
[23] Martin Rinard. Credible compilation. Technical Report MIT-LCS-TR-776, Massachusetts Institute of Technology, March 1999.
[24] Martin Rinard and Darko Marinov. Credible compilation. In Proceedings of the FLoC Workshop on Run-Time Result Verification, July 1999.
[25] Simplify automatic theorem prover home page. http://research.compaq.com/SRC/esc/Simplify.html.
[26] Bernhard Steffen. Data flow analysis as model checking. In T. Ito and A. R. Meyer, editors, Theoretical Aspects of Computer Software (TACS '91), Sendai, Japan, volume 526 of Lecture Notes in Computer Science, pages 346–364, Heidelberg, Germany, September 1991. Springer-Verlag.
[27] David Tarditi, Greg Morrisett, Perry Cheng, Chris Stone, Robert Harper, and Peter Lee. TIL: A type-directed optimizing compiler for ML. In Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, May 1996.
[28] William D. Young. A mechanically verified code generator. Journal of Automated Reasoning, 5(4):493–518, December 1989.

A Additional Optimizations

A.1 Optimizations

Copy propagation

stmt(Y := Z)
followed by ¬mayDef(Z) ∧ ¬mayDef(Y)
until X := Y ⇒ X := Z
with witness η(Y) = η(Z)

Constant propagation

stmt(Y := C)
followed by ¬mayDef(Y)
until X := Y ⇒ X := C
with witness η(Y) = C

stmt(X := C)
followed by ¬mayDef(X)
until if X goto P1 else P2 ⇒ if C goto P1 else P2
with witness η(X) = C

stmt(Y := C)
followed by ¬mayDef(Y)
until X := op B Y ⇒ X := op B C
with witness η(Y) = C

stmt(Y := C)
followed by ¬mayDef(Y)
until X := op Y B ⇒ X := op C B
with witness η(Y) = C

Constant folding

true
followed by true
until X := op C1 C2 ⇒ X := C    (C = op(C1, C2))
with witness true

The above guard, true followed by true, holds at all nodes.
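To make the effect of these rules concrete, the following is a minimal Python sketch of our own (it is not the paper's dataflow engine or optimization language) showing what constant propagation and constant folding do to a straight-line block. The statement encoding and the function name propagate_and_fold are ours; because the block has no branches, pointers, or calls, the "followed by ¬mayDef(Y)" guard degenerates to "Y has not been reassigned since Y := C".

import operator

OPS = {"+": operator.add, "*": operator.mul}           # interpretation of op symbols

def propagate_and_fold(block):
    """block: list of ('assign', x, rhs) where rhs is a constant, a variable name,
    or ('op', op_symbol, a, b).  Returns the rewritten block."""
    known = {}                                          # var -> constant it must hold here
    out = []
    for (_, x, rhs) in block:
        if isinstance(rhs, str) and rhs in known:       # X := Y  with  eta(Y) = C
            rhs = known[rhs]                            # rewrite to X := C
        if isinstance(rhs, tuple) and rhs[0] == "op":
            _, op, a, b = rhs
            a = known.get(a, a) if isinstance(a, str) else a
            b = known.get(b, b) if isinstance(b, str) else b
            if isinstance(a, int) and isinstance(b, int):
                rhs = OPS[op](a, b)                     # X := op C1 C2  =>  X := C
            else:
                rhs = ("op", op, a, b)
        known[x] = rhs if isinstance(rhs, int) else None
        if known[x] is None:
            known.pop(x)                                # assignment kills the known-constant fact
        out.append(("assign", x, rhs))
    return out

if __name__ == "__main__":
    block = [("assign", "y", 3),
             ("assign", "x", "y"),                      # becomes x := 3
             ("assign", "z", ("op", "+", "x", 4))]      # becomes z := 7
    print(propagate_and_fold(block))

Running the example rewrites x := y to x := 3 and folds z := x + 4 to z := 7, mirroring the first constant-propagation rule and the constant-folding rule above.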
Branch folding

In the following optimizations, we use goto P1 as sugar for if true goto P1 else P1.

true
followed by true
until if true goto P1 else P2 ⇒ goto P1
with witness true

true
followed by true
until if C goto P1 else P2 ⇒ goto P2    (C ≠ true)
with witness true

Common subexpression elimination

unchanged(E) ∧ stmt(Z := E)
followed by ¬mayDef(Z) ∧ unchanged(E)
until X := E ⇒ X := Z
with witness η(Z) = η(E)

Load removal

stmt(Y := &Z)
followed by ¬mayDef(Y) ∧ ¬stmt(decl Z)
until X := *Y ⇒ X := Z
with witness η(Y) = η(&Z)

Dead assignment elimination

(synDef(X) ∨ stmt(return ...)) ∧ ¬mayUse(X)
preceded by ¬mayUse(X)
until X := E ⇒ skip
with witness η_old/X = η_new/X

Code hoisting (PRE)

This is the PRE optimization from Section 2.3, with pseudocode for a sample choose function. The choose function shown here is Steffen's optimal computation placement condition [26], which intuitively tries to place duplicates as early as possible. A duplicate is placed at node n if placing the duplicate any earlier would either not be legal or would cover different redundancies from those covered by the placement at n. This is just one possible choose function, given here for illustrative purposes.

stmt(X := E) ∧ unchanged(E)
preceded by unchanged(E) ∧ ¬mayDef(X) ∧ ¬mayUse(X)
until skip ⇒ X := E
with witness η_old/X = η_new/X
filtered through
    let φ1 = stmt(X := E) ∧ unchanged(E)
    let φ2 = unchanged(E) ∧ ¬mayDef(X) ∧ ¬mayUse(X)
    let φ3 = ¬unchanged(E) ∨ mayUse(X) ∨ (mayDef(X) ∧ ¬stmt(X := E))
    in { (ι, θ) | for all paths in the CFG leading to ι: there exists a node at which φ3 holds, followed by nodes at which ¬legal holds, followed by the node with index ι }
    where legal holds at a node with index ι' if for all paths in the CFG from ι': there exists a node at which φ1 holds, preceded by nodes at which φ2 holds, preceded by the node with index ι'

Code sinking (PDE)

We show a version of code sinking that performs partial dead code elimination. We perform code sinking by inserting copies of certain computations downstream of where they occur in the CFG, and then running dead-assignment elimination to remove the original computations. Our code sinking optimization is symmetric to the code hoisting optimization described above, and the intuition for how it works is the same.

stmt(X := E) ∧ unchanged(E)
followed by unchanged(E) ∧ ¬mayDef(X) ∧ ¬mayUse(X)
until skip ⇒ X := E
with witness η(X) = η(E)
filtered through
    let φ1 = stmt(X := E) ∧ unchanged(E)
    let φ2 = unchanged(E) ∧ ¬mayDef(X) ∧ ¬mayUse(X)
    let φ3 = ¬unchanged(E) ∨ mayUse(X) ∨ (mayDef(X) ∧ ¬stmt(X := E))
    in { (ι, θ) | for all paths in the CFG from ι: there exists a node at which φ3 holds, preceded by nodes at which ¬legal holds, preceded by the node with index ι }
    where legal holds at a node with index ι' if for all paths in the CFG leading to ι': there exists a node at which φ1 holds, followed by nodes at which φ2 holds, followed by the node with index ι'

A.2 Analyses

Tainted-variable analysis

stmt(decl X)
followed by ¬stmt(... := &X)
defines notTainted(X)
with witness notPointedTo(X, η)

where notPointedTo is defined as:

notPointedTo(X, η) ≜ ∀l ∈ domain(store(η)). store(η)(l) ≠ η(&X)
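The intuition behind notTainted is that a variable whose address is never taken cannot be reached through the store, so neither indirect stores nor callees can modify it. The following Python sketch is our own flow-insensitive illustration of that intuition over a toy statement encoding; the analysis above is flow-sensitive, starting at decl X and checking the "followed by" condition along CFG paths, so this sketch is only an approximation of its spirit.

def address_taken(stmts):
    """stmts: list of (lhs, rhs) pairs over a toy IR in which an rhs of the form
    ('&', x) takes the address of x.  Returns the set of possibly tainted variables."""
    tainted = set()
    for _lhs, rhs in stmts:
        if isinstance(rhs, tuple) and rhs[0] == "&":
            tainted.add(rhs[1])
    return tainted

def not_tainted(x, stmts):
    # x is never pointed to if its address is never taken anywhere in the procedure
    return x not in address_taken(stmts)

if __name__ == "__main__":
    prog = [("p", ("&", "a")),     # p := &a   -> a is possibly tainted
            ("b", 1),              # b := 1
            ("*p", 5)]             # *p := 5   may write a, but can never write b
    assert not not_tainted("a", prog)
    assert not_tainted("b", prog)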
Simple points-to analysis

The following is a simple points-to analysis. In Section A.3, we show how this label is combined with the notTainted label to define the doesNotPointTo label, which in turn is used to define labels such as mayDef, mayUse, and unchanged. This optimization uses npMayDef, which is a version of mayDef that does not use pointer information.

stmt(X := &Z) ∧ ¬synUse(Y) ∧ hasBeenDeclared(Y)
followed by ¬npMayDef(X)
defines simpleNotPntTo(X, Y)
with witness η(X) ≠ η(&Y)

Has-been-declared analysis

stmt(decl X)
followed by true
defines hasBeenDeclared(X)
with witness X ∈ domain(env(η))

A.3 Label definitions

We now define the labels used in this paper. Labels that start with np (e.g., npMayDef, npMayUse, npUnchanged) are versions of the labels that do not use pointer information (and thus are more conservative).

synDef(Y) ≜ stmt(Y := ...)
synUse(Y) ≜ stmt(... := ... Y ...) ∨ stmt(*Y := ...) ∨ stmt(if Y ...)
npMayDef(Y) ≜ synDef(Y) ∨ stmt(*X := ...) ∨ stmt(... := P(...))
npMayUse(Y) ≜ synUse(Y) ∨ stmt(... := *X) ∨ stmt(... := P(...))
doesNotPointTo(X, Y) ≜ simpleNotPntTo(X, Y) ∨ notTainted(Y)
mayPointTo(X, Y) ≜ ¬doesNotPointTo(X, Y)
mayDef(Y) ≜ synDef(Y) ∨ (stmt(*X := ...) ∧ mayPointTo(X, Y)) ∨ (stmt(... := P(...)) ∧ ¬notTainted(Y))
mayUse(Y) ≜ synUse(Y) ∨ (stmt(... := *X) ∧ mayPointTo(X, Y)) ∨ stmt(... := P(...))

The unchanged label is defined as follows. When E is not *X, unchanged(E) is defined as the conjunction of ¬mayDef for all the variables in E. For *X, unchanged(*X) is defined as follows:

unchanged(*X) ≜ ¬mayDef(X) ∧ ¬stmt(*Y := ...) ∧ ¬stmt(... := P(...)) ∧ (stmt(Y := ...) ⇒ doesNotPointTo(X, Y))

npUnchanged is a version of unchanged that does not use pointer information. For *X, npUnchanged(*X) is false. When E is not *X, it is defined as the conjunction of ¬npMayDef for all the variables in E.

B Formalization

B.1 Semantics of the Intermediate Language

B.1.1 Preliminaries

The set of indices of program π is denoted Indices_π. The set of indices of procedure p is denoted Indices_p. The formal argument name of procedure p in program π is denoted formal_p. The index of the first statement in procedure p of program π is denoted start_p. We assume WLOG that no if statements in a procedure p refer to indices not in Indices_p, nor to the index start_p (footnote 6).

The arity of an operator op is denoted arity(op). We assume a fixed interpretation function for each n-ary operator symbol op: ⟦op⟧ : Consts^n → Consts. We assume an infinite set Locations of memory locations, with metavariable l ranging over the set. We assume that the set Consts is disjoint from Locations and contains the distinguished elements true and uninit. Then the set of values is defined as Values = Locations ∪ Consts.

An environment is a partial function ρ : Vars ⇀ Locations; we denote by Environments the set of all environments. A store is a partial function σ : Locations ⇀ Values; we denote by Stores the set of all stores. The domain of an environment ρ is denoted dom(ρ), and similarly for the domain of a store. The notation ρ[x ↦ l] denotes the environment identical to ρ but with variable x mapping to location l; if x ∈ dom(ρ), the old mapping for x is shadowed by the new one. The notation σ[l ↦ v] is defined similarly. The notation σ/{l1, ..., li} denotes the store identical to σ except that all pairs (l, v) ∈ σ such that l ∈ {l1, ..., li} are removed.

The current dynamic call chain is represented by a stack. A stack frame is a triple f = (ι, l, ρ) ∈ Indices × Locations × Environments. Here ι is the index of the first statement following the call currently being executed, l is the location in which to put the return value from the call, and ρ is the current lexical environment at the point of the call. We denote by Frames the set of all stack frames.
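For readers who find an executable rendering helpful, the following Python sketch gives one possible concrete representation of the state components just defined (environments, stores, and stack frames). The specific Python types and field names are our choice; only the shapes follow the definitions above.

from dataclasses import dataclass
from typing import Dict, Union

Location = int                        # an element of Locations
Const = Union[int, bool, str]         # Consts, including 'uninit' and true
Value = Union[Location, Const]        # Values = Locations ∪ Consts

Environment = Dict[str, Location]     # rho : Vars -> Locations (partial)
Store = Dict[Location, Value]         # sigma : Locations -> Values (partial)

@dataclass
class Frame:                          # f = (iota, l, rho)
    return_index: int                 # index of the first statement after the call
    result_loc: Location              # location in which to put the return value
    saved_env: Environment            # caller's lexical environment at the call

if __name__ == "__main__":
    rho: Environment = {"x": 0}
    sigma: Store = {0: "uninit"}
    print(Frame(return_index=7, result_loc=0, saved_env=rho), rho, sigma)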
A stack S = ⟨f1 ... fn⟩ ∈ Stacks is a sequence of stack frames. The set of all stacks is denoted Stacks. Stacks support two operations, defined as follows:

push : (Frames × Stacks) → Stacks
push(f, ⟨f1 ... fn⟩) = ⟨f f1 ... fn⟩

pop : Stacks ⇀ (Frames × Stacks)
pop(⟨f1 f2 ... fn⟩) = (f1, ⟨f2 ... fn⟩), where n > 0

Finally, a memory allocator M is an infinite stream ⟨l1, l2, ...⟩ of locations. We denote the set of all memory allocators as MemAllocs.

A state of execution of a program π is a quintuple η = (ι, ρ, σ, S, M) where ι ∈ Indices_π, ρ ∈ Environments, σ ∈ Stores, S ∈ Stacks, and M ∈ MemAllocs. We denote the set of program states by States. We refer to the corresponding index of a state η as index(η), and we similarly define accessors env, store, stack, and mem.

Footnote 6: This last restriction ensures that the entry node of p's CFG will have no incoming edges.

B.1.2 State Transition Functions

The evaluation of an expression e in a program state η, where env(η) = ρ and store(η) = σ, is given by the function η(e) : (States × Exprs) → Values defined by:

η(x) = σ(ρ(x))        where x ∈ dom(ρ), ρ(x) ∈ dom(σ)
η(c) = c
η(*x) = σ(σ(ρ(x)))    where x ∈ dom(ρ), ρ(x) ∈ dom(σ), σ(ρ(x)) ∈ dom(σ)
η(&x) = ρ(x)          where x ∈ dom(ρ)
η(op b1 ... bn) = ⟦op⟧(η(b1), ..., η(bn))    where arity(op) = n and for all 1 ≤ j ≤ n, η(bj) ∈ Consts

Similarly, the evaluation of a locatable expression lhs in a state η, where env(η) = ρ and store(η) = σ, is given by the function η_l(lhs) : (States × Locatables) → Locations defined by:

η_l(x) = ρ(x)         where x ∈ dom(ρ)
η_l(*x) = σ(ρ(x))     where x ∈ dom(ρ), ρ(x) ∈ dom(σ), σ(ρ(x)) ∈ Locations

Definition 3 Given a program π, the state transition function →_π ⊆ States × States is defined by:

- If stmtAt(π, ι) = decl x then (ι, ρ, σ, S, ⟨l, l1, l2, ...⟩) →_π (ι+1, ρ[x ↦ l], σ[l ↦ uninit], S, ⟨l1, l2, ...⟩), where l ∉ dom(σ).
- If stmtAt(π, ι) = skip then (ι, ρ, σ, S, M) →_π (ι+1, ρ, σ, S, M).
- If stmtAt(π, ι) = (lhs := e) then (ι, ρ, σ, S, M) →_π (ι+1, ρ, σ[η_l(lhs) ↦ η(e)], S, M), where η = (ι, ρ, σ, S, M).
- If stmtAt(π, ι) = (x := new) then (ι, ρ, σ, S, ⟨l, l1, l2, ...⟩) →_π (ι+1, ρ, σ[ρ(x) ↦ l][l ↦ uninit], S, ⟨l1, l2, ...⟩), where x ∈ dom(ρ), l ∉ dom(σ).
- If stmtAt(π, ι) = (x := p'(b)) then (ι, ρ, σ, S, ⟨l, l1, l2, ...⟩) →_π (ι', {(y, l)}, σ[l ↦ η(b)], push(f, S), ⟨l1, l2, ...⟩), where η = (ι, ρ, σ, S, ⟨l, l1, l2, ...⟩), ι' = start_{p'}, y = formal_{p'}, l ∉ dom(σ), x ∈ dom(ρ), f = (ι+1, ρ(x), ρ).
- If stmtAt(π, ι) = (if b goto ι1 else ι2) then (ι, ρ, σ, S, M) →_π (ι1, ρ, σ, S, M), where (ι, ρ, σ, S, M)(b) = true.
- If stmtAt(π, ι) = (if b goto ι1 else ι2) then (ι, ρ, σ, S, M) →_π (ι2, ρ, σ, S, M), where (ι, ρ, σ, S, M)(b) ≠ true.
- If stmtAt(π, ι) = (return x) then (ι, ρ, σ, S, M) →_π (ι', ρ', σ', S', M), where pop(S) = ((ι', l', ρ'), S'), dom(ρ) = {x1, ..., xi}, and σ' = (σ/{ρ(x1), ..., ρ(xi)})[l' ↦ (ι, ρ, σ, S, M)(x)].

We denote the reflexive, transitive closure of →_π by →*_π.

Definition 4 Given a program π, the intraprocedural state transition function ↪→_π ⊆ States × States is defined by:

- If stmtAt(π, index(η)) is not a procedure call, then η ↪→_π η' where η →_π η'.
- If stmtAt(π, index(η)) is a procedure call, then η ↪→_π η' where η →_π η'' →*_π η' and η' is the first state on that trace after η such that stack(η') = stack(η).

We denote the reflexive, transitive closure of ↪→_π by ↪→*_π. We write η ↛_π if there does not exist some η' such that η ↪→_π η'.
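As an illustration only (not part of the formalization), a few of the transition rules of Definition 3 can be transcribed directly into a small step function. The sketch below uses a statement encoding of our own and covers just skip, assignment to a variable, and the conditional branch; declarations, allocation, calls, returns, and pointer-valued left-hand sides are omitted.

def eval_expr(env, store, e):
    """eta(e) for the expression forms used below: constants and variables."""
    if isinstance(e, str):                    # a variable x:  eta(x) = sigma(rho(x))
        return store[env[e]]
    return e                                  # a constant c:  eta(c) = c

def step(program, state):
    """One ->_pi step.  program: dict index -> statement; state: (i, env, store, stack, mem)."""
    i, env, store, stack, mem = state
    stmt = program[i]
    if stmt == ("skip",):
        return (i + 1, env, store, stack, mem)
    if stmt[0] == "assign":                   # ('assign', x, e): store[rho(x) -> eta(e)]
        _, x, e = stmt
        new_store = dict(store)
        new_store[env[x]] = eval_expr(env, store, e)
        return (i + 1, env, new_store, stack, mem)
    if stmt[0] == "if":                       # ('if', b, i1, i2): branch on eta(b) = true
        _, b, i1, i2 = stmt
        return ((i1 if eval_expr(env, store, b) is True else i2), env, store, stack, mem)
    raise NotImplementedError(stmt)

if __name__ == "__main__":
    prog = {0: ("assign", "x", 1), 1: ("if", True, 3, 2), 2: ("skip",), 3: ("skip",)}
    st = (0, {"x": 0}, {0: "uninit"}, [], [])
    st = step(prog, st)       # executes x := 1
    st = step(prog, st)       # branches on true to index 3
    print(st)                 # (3, {'x': 0}, {0: 1}, [], [])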
B.1.3 Programs and Program Transformations

Definition 5 The semantic function of a program π is the partial function ⟦π⟧ : Consts × MemAllocs ⇀ Values defined by:

⟦π⟧(c, ⟨l, l1, l2, ...⟩) = σ(l)

where (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) →*_π (ι+1, {(x, l)}, σ, ⟨⟩, M) and ι ∉ Indices_π and stmtAt(π, ι) is defined to be x := main(c).

Definition 6 We say that π' is a semantically equivalent transformation of π if for all c, M such that ⟦π⟧(c, M) = v, it is the case that ⟦π'⟧(c, M) = v.

B.2 Semantics of Optimizations

Let stmtAt(p, ι) denote the statement at index ι of procedure p.

Definition 7 The control flow graph (CFG) of a procedure p is a graph (Indices_p, →cfg), where →cfg ⊆ Indices_p × Indices_p and ι1 →cfg ι2 if and only if:

(stmtAt(p, ι1) ∈ {decl x, lhs := e, skip, x := new, x := p'(b), return x} ∧ ι2 = ι1 + 1) ∨
(stmtAt(p, ι1) = (if b goto ι else ι') ∧ (ι2 = ι ∨ ι2 = ι'))

Each node with index ι is annotated with the label stmt(s), where s = stmtAt(p, ι). The definition of ι ⊨_p^θ φ, which evaluates a formula φ at the node for index ι in the CFG of p, is as follows:

ι ⊨_p^θ true ⟺ true
ι ⊨_p^θ false ⟺ false
ι ⊨_p^θ ¬φ ⟺ not ι ⊨_p^θ φ
ι ⊨_p^θ φ1 ∨ φ2 ⟺ ι ⊨_p^θ φ1 or ι ⊨_p^θ φ2
ι ⊨_p^θ φ1 ∧ φ2 ⟺ ι ⊨_p^θ φ1 and ι ⊨_p^θ φ2
ι ⊨_p^θ pr ⟺ θ(pr) ∈ L_p(ι)

B.3 Proof Obligations

B.3.1 Forward Optimizations

Consider a forward transformation pattern of the following form:

φ1 followed by φ2 until s ⇒ s' with witness P

The optimization-specific obligations, to be discharged by an automatic theorem prover, are as follows:

F1. If η ↪→_π η' and index(η) ⊨_p^θ φ1, then θ(P)(η').
F2. If θ(P)(η) and η ↪→_π η' and index(η) ⊨_p^θ φ2, then θ(P)(η').
F3. If θ(P)(η) and η ↪→_π η' and ι = index(η) and stmtAt(π, ι) = θ(s) and stmtAt(π', ι) = θ(s'), then η ↪→_{π'} η'.

B.3.2 Backward Optimizations

Consider a backward transformation pattern of the following form:

φ1 preceded by φ2 until s ⇒ s' with witness P

We require that the witness P match program points in the original and transformed states, or in other words that P ⇒ (index(η_old) = index(η_new)). The optimization-specific obligations, to be discharged by an automatic theorem prover, are as follows:

B1. If η ↪→_π η_old and η ↪→_{π'} η_new and ι = index(η) and stmtAt(π, ι) = θ(s) and stmtAt(π', ι) = θ(s'), then θ(P)(η_old, η_new).
B2. If θ(P)(η_old, η_new) and η_old ↪→_π η'_old and ι_old = index(η_old) and ι_new = index(η_new) and ι_old ⊨_p^θ φ2 and stmtAt(π, ι_old) = stmtAt(π', ι_new), then there exists some η'_new such that η_new ↪→_{π'} η'_new and θ(P)(η'_old, η'_new).
B3. If θ(P)(η_old, η_new) and η_old ↪→_π η and ι_old = index(η_old) and ι_new = index(η_new) and ι_old ⊨_p^θ φ1 and stmtAt(π, ι_old) = stmtAt(π', ι_new), then η_new ↪→_{π'} η.

In rule B1, we assume that both programs can step. However, we in fact need to prove that the transformed program steps if the original one does, in order to show that the transformed program is semantically equivalent to the original one. Unfortunately, it is not possible to prove this for B1 using only local knowledge. Therefore, we allow B1 to assume that the transformed program steps, and we separately prove the property using some additional obligations. We introduce the notion of an error predicate E(η). Intuitively, the error predicate says what the state of the original program must look like, at the point in the trace where a transformation is allowed, if that transformation would get stuck.
We then show that the error predicate would continue to hold on the original program throughout the witnessing region, eventually implying that the original program itself will get stuck. So we will have shown that the transformed program gets stuck only if the original one does. We currently infer the error predicate: it is simply a predicate stating the conditions under which the transformed statement s' is "stuck", i.e., cannot take a step. This inference has been sufficient to automatically prove soundness of all the backward optimizations we have written. However, in the obligations below, we allow an arbitrary error predicate to be specified.

B1'. If η ↪→_π η_old and η ↛_{π'} and ι = index(η) and stmtAt(π, ι) = θ(s) and stmtAt(π', ι) = θ(s'), then θ(E)(η_old).
B2'. If θ(E)(η) and η ↪→_π η' and ι = index(η) and ι ⊨_p^θ φ2, then θ(E)(η').
B3'. If θ(E)(η) and ι = index(η) and ι ⊨_p^θ φ1, then η ↛_π.

B.3.3 Analyses

Consider a pure analysis of the following form:

φ1 followed by φ2 defines label with witness P

The optimization-specific obligations, to be discharged by an automatic theorem prover, are as follows:

A1. If η ↪→_π η' and index(η) ⊨_p^θ φ1, then θ(P)(η').
A2. If θ(P)(η) and η ↪→_π η' and index(η) ⊨_p^θ φ2, then θ(P)(η').

B.4 Metatheory

B.4.1 Forward Optimizations

Theorem 1 If O is a forward optimization with transformation pattern φ1 followed by φ2 until s ⇒ s' with witness P satisfying conditions F1, F2, and F3, then O is sound.

Proof: Let π be an intermediate-language program, let p be a procedure in π, let A ⊆ ⟦O_pat⟧(p), and let π' be the program identical to π but with p replaced by app(s', p, A). It suffices to show that π' is a semantically equivalent transformation of π. Let c be a constant and M be a memory allocator such that ⟦π⟧(c, M) = v. By Definition 6 we must show that also ⟦π'⟧(c, M) = v. Since ⟦π⟧(c, M) = v, by Definition 5 we have that (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) →*_π (ι+1, {(x, l)}, σ, ⟨⟩, M') and stmtAt(π, ι) is defined to be x := main(c) and ι ∉ Indices_π and v = σ(l), where M = ⟨l, l1, l2, ...⟩.

Define ⇝_π to act like →_π ordinarily, but to act like ↪→_π when executing a state at some node with index ι' such that (ι', θ) ∈ A for some θ (and define ⇝_{π'} analogously). Then we have that (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) ⇝_π η1 ⇝_π η2 ⇝_π ... ⇝_π η_{k-1} ⇝_π (ι+1, {(x, l)}, σ, ⟨⟩, M') for some η1, ..., η_{k-1}. To prove that ⟦π'⟧(c, M) = v we will show that also (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) ⇝_{π'} η1 ⇝_{π'} η2 ⇝_{π'} ... ⇝_{π'} η_{k-1} ⇝_{π'} (ι+1, {(x, l)}, σ, ⟨⟩, M'), where stmtAt(π', ι) is defined to be x := main(c). Let η_k = (ι+1, {(x, l)}, σ, ⟨⟩, M'). We show by induction on k that every prefix of the trace in π up to η_j, for all 1 ≤ j ≤ k, is mirrored in π'.

- Case j = 1. Since stmtAt(π, ι) = stmtAt(π', ι) = x := main(c) and (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) ⇝_π η1, by definition of ⇝_π we have (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) →_π η1. Therefore also (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) →_{π'} η1, so (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) ⇝_{π'} η1.

- Case 1 < j ≤ k. By induction we have that (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) ⇝_{π'} η1 ⇝_{π'} ... ⇝_{π'} η_{j-1}. Let index(η_{j-1}) = ι_{j-1}. We have two sub-cases:

  - ¬∃θ. (ι_{j-1}, θ) ∈ A. Then by the definition of π' we have stmtAt(π, ι_{j-1}) = stmtAt(π', ι_{j-1}). Therefore, since η_{j-1} ⇝_π η_j, by definition of ⇝_π we have η_{j-1} →_π η_j. Then also η_{j-1} →_{π'} η_j, so η_{j-1} ⇝_{π'} η_j and the result follows.

  - ∃θ. (ι_{j-1}, θ) ∈ A. Then by the definition of π', there is some θ such that stmtAt(π', ι_{j-1}) = θ(s').
Then also (ι_{j-1}, θ) ∈ ⟦O_pat⟧(p), so we have ι_{j-1} ⊨_p^θ stmt(s), so stmtAt(π, ι_{j-1}) = θ(s). By definition of ⇝_π we know that (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) →*_π η_{j-1}, and we also have stmtAt(π, ι) = x := main(c) and ι_{j-1} ∈ Indices_p. Assume (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) →_π η'_1 →_π ... →_π η'_v, where η'_v = η_{j-1}. Then there must be some t such that 1 ≤ t < v and index(η'_t) = start_p, representing the first statement executed on the same invocation of p as in η_{j-1}. Then let η''_w, ..., η''_1 be identical to the sequence η'_v, ..., η'_t, but with all states that are not in the same invocation of p as in η_{j-1} removed. Let index(η''_x) = ι''_x for all 1 ≤ x ≤ w. Then η''_w = η_{j-1}. It is easy to show that η''_1 ↪→_π ... ↪→_π η''_w. Also, by the definition of an intraprocedural CFG, the nodes with indices ι''_w, ..., ι''_1 represent a backward path in the CFG of p to the entry node. Therefore, since (ι_{j-1}, θ) ∈ ⟦O_pat⟧(p), it follows that there exists some r such that 1 ≤ r < w and ι''_r ⊨_p^θ φ1, and for all q such that r < q < w we have ι''_q ⊨_p^θ φ2.

First we prove that ∀q. (r < q ≤ w) ⇒ θ(P)(η''_q). We prove it by induction on q:

  - (base case) q = r + 1. So we have η''_r ↪→_π η''_q and index(η''_r) ⊨_p^θ φ1, and the result follows from condition F1.
  - (inductive case) q > r + 1. By the inductive hypothesis we have θ(P)(η''_{q-1}). We also know that η''_{q-1} ↪→_π η''_q and, since r < q-1 < w, index(η''_{q-1}) ⊨_p^θ φ2. Then the result follows from condition F2.

So we have shown in particular that θ(P)(η_{j-1}) holds. We saw above that η_{j-1} ⇝_π η_j, and by definition of ⇝_π that means η_{j-1} ↪→_π η_j. We also know that stmtAt(π, ι_{j-1}) = θ(s) and stmtAt(π', ι_{j-1}) = θ(s'). Then by condition F3 we have η_{j-1} ↪→_{π'} η_j, so also η_{j-1} ⇝_{π'} η_j and the result follows.

B.4.2 Backward Optimizations

Lemma 1 Let O be a backward optimization with transformation pattern φ1 preceded by φ2 until s ⇒ s' with witness P and error predicate E such that B1'-B3' hold. Let p be a procedure, let π be a program containing p, let ι ∈ Indices_p, and let stmtAt(π, ι) = θ(s). Let η be a state such that index(η) = ι. Let π' be a program such that stmtAt(π', ι) = θ(s') and η ↛_{π'}. If η ↪→_π η1 ↪→_π ... ↪→_π η_k and for all 1 ≤ j < k we have index(η_j) ⊨_p^θ φ2 and index(η_k) ⊨_p^θ φ1, then η_k ↛_π.

Proof: We will first prove by induction on k that θ(E)(η_j) holds for all 1 ≤ j ≤ k.

- Case j = 1. Since η ↪→_π η1 and η ↛_{π'} and stmtAt(π, ι) = θ(s) and stmtAt(π', ι) = θ(s'), the result follows from B1'.
- Case 1 < j ≤ k. By induction, assume that θ(E)(η_{j-1}) holds. We are given that η_{j-1} ↪→_π η_j, and since j-1 < k we also have that index(η_{j-1}) ⊨_p^θ φ2. Then the result follows from B2'.

So in particular we have shown that θ(E)(η_k) holds. We are given that index(η_k) ⊨_p^θ φ1, so by B3' we have η_k ↛_π.

Theorem 2 If O is a backward optimization with transformation pattern φ1 preceded by φ2 until s ⇒ s' with witness P, with error predicate E, and satisfying conditions B1, B2, B3, B1', B2', and B3', then O is sound.

Proof: Let π be an intermediate-language program, let p be a procedure in π, let A ⊆ ⟦O_pat⟧(p), and let π' be the program identical to π but with p replaced by app(s', p, A). It suffices to show that π' is a semantically equivalent transformation of π.

We define an infinite family of generalized intermediate-language programs as follows. Let π_j denote the program that acts like π' for the first j states but henceforth acts like π. Formally, we define the transition relation of π_j directly as a relation →_{π_j}
on prefixes of execution traces, rather than as a relation on states. Let T = [η1 ... η_r] denote a partial trace of π_j such that index(η1) ∉ Indices_{π_j} and the statement of π_j at index(η1) is a call to main. We say that T →_{π_j} T' if and only if T' = [η1 ... η_{r+1}], where

r ≤ j ⇒ η_r →_{π'} η_{r+1}
r > j ⇒ η_r →_π η_{r+1}

Let →*_{π_j} denote the reflexive, transitive closure of →_{π_j}. We also define an intraprocedural version ↪→_{π_j} in the identical way that ↪→_π is defined from →_π. Finally, we define the semantic function of π_j by the straightforward modification of Definition 5.

We prove that for all j ≥ 1, π_j is a semantically equivalent transformation of π. Since π' = π_∞ and the semantic equivalence relation is transitive, it then follows easily that π' is a semantically equivalent transformation of π. The proof proceeds by induction on j.

For the base case, j = 1. Let c be a constant and M = ⟨l, l1, l2, ...⟩ such that ⟦π⟧(c, M) = v. By Definition 6 we must show that ⟦π_1⟧(c, M) = v. By Definition 5 we have that v = σ(l) and (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) →*_π (ι+1, {(x, l)}, σ, ⟨⟩, M) and stmtAt(π, ι) is defined to be x := main(c) and ι ∉ Indices_π. Therefore assume that

(ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) →_π η2 →_π ... →_π η_k →_π (ι+1, {(x, l)}, σ, ⟨⟩, M)

Let η1 = (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) and η_{k+1} = (ι+1, {(x, l)}, σ, ⟨⟩, M). Also, the statement of π_1 at ι is defined to be x := main(c). Then I claim that

[η1] →_{π_1} [η1, η2] →_{π_1} ... →_{π_1} [η1, ..., η_k, η_{k+1}]

If we can prove this, then the result follows. We prove inductively that each transition in the above sequence of transitions holds.

- Base Case. We must show that [η1] →_{π_1} [η1, η2]. We are given that η1 →_π η2 and stmtAt(π, ι) = stmtAt(π_1, ι) = x := main(c). Then η1 →_{π'} η2, so by the definition of →_{π_1} the result follows.
- Inductive Case. By induction we have [η1] →_{π_1} [η1, η2] →_{π_1} ... →_{π_1} [η1, ..., η_q], for some 1 < q ≤ k. We are given that η_q →_π η_{q+1}. Then by the definition of →_{π_1} we have that [η1, ..., η_q] →_{π_1} [η1, ..., η_{q+1}].

For the inductive case, j > 1 and π_{j-1} is a semantically equivalent transformation of π. We will prove that π_j is a semantically equivalent transformation of π_{j-1}. Let c be a constant and M = ⟨l, l1, l2, ...⟩ such that ⟦π⟧(c, M) = v. It suffices to show that ⟦π_j⟧(c, M) = v. Since π_{j-1} is a semantically equivalent transformation of π, we know that ⟦π_{j-1}⟧(c, M) = v. Then we have that v = σ(l) and [(ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩)] →*_{π_{j-1}} [(ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩), η2, ..., η_k, (ι+1, {(x, l)}, σ, ⟨⟩, M)] and stmtAt(π, ι) = stmtAt(π_{j-1}, ι) is defined to be x := main(c) and ι ∉ Indices_π. Therefore assume that

[η1] →_{π_{j-1}} [η1, η2] →_{π_{j-1}} ... →_{π_{j-1}} [η1, ..., η_{k+1}]

where η1 = (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) and η_{k+1} = (ι+1, {(x, l)}, σ, ⟨⟩, M).

For each 1 ≤ t ≤ k+1, let Ψ(t, θ) denote the following predicate:

t > j ∧ (index(η_j), θ) ∈ A ∧ stmtAt(π, index(η_j)) = θ(s) ∧ stmtAt(π', index(η_j)) = θ(s') ∧ ∀m. ((j < m < t ∧ η_j ↪→*_π η_m) ⇒ index(η_m) ⊭_p^θ φ1)

Define ⇝_{π_{j-1}} as a view on the above execution trace, in the following way: ⇝_{π_{j-1}} acts like →_{π_{j-1}} ordinarily, but it acts like ↪→_{π_{j-1}} when it is either at a state η_t such that Ψ(t, θ) holds for some θ, or at the state η_j, where (index(η_j), θ) ∈ A. Then we have

[η'_1] ⇝_{π_{j-1}} [η'_1, η'_2] ⇝_{π_{j-1}} ... ⇝_{π_{j-1}} [η'_1, ..., η'_z]

where η1 = η'_1 and η_{k+1} = η'_z.
Then I claim that

[η''_1] ⇝_{π_j} [η''_1, η''_2] ⇝_{π_j} ... ⇝_{π_j} [η''_1, η''_2, ..., η''_z]

where ⇝_{π_j} acts like →_{π_j} ordinarily, but acts like ↪→_{π_j} when it is either at a state η''_y such that η'_y = η_t and Ψ(t, θ) holds for some θ, or at a state η''_y such that η'_y = η_j, where (index(η_j), θ) ∈ A. Further, each η''_y is defined as follows. For each 1 ≤ y ≤ z:

- If there exists θ such that Ψ(t, θ), where 1 ≤ t ≤ k+1 and η'_y = η_t, then θ(P)(η'_y, η''_y).
- Else η'_y = η''_y.

If we can prove this, then we have that η''_z = η'_z = η_{k+1}, and the result follows. We prove inductively that each of the partial traces in the sequence above exists.

- For the base case, y = 1. We saw above that η1 = η'_1. Since j > 1, we have 1 ≯ j, so ∀θ. ¬Ψ(1, θ). Therefore we must prove that η'_1 = η''_1. We are given that η1 = (ι, {(x, l)}, {(l, uninit)}, ⟨⟩, ⟨l1, l2, ...⟩) and stmtAt(π, ι) = x := main(c). Therefore [η'_1] is a valid partial trace for π_j.

- For the inductive case, y > 1 and [η''_1, ..., η''_{y-1}] is a valid partial trace for π_j with each component state meeting the definition above. We must show that there exists η''_y meeting the definition above such that [η''_1, ..., η''_y] is a valid partial trace for π_j. Let t be the integer between 1 and k+1 such that η'_{y-1} = η_t. There are several cases.

  - t < j. Since t ≯ j, by definition of Ψ we have that η''_{y-1} = η'_{y-1}. By the definition of ⇝_{π_{j-1}}, η'_{y-1} →_{π'} η'_y. Then by definition of ⇝_{π_j} we have [η''_1, ..., η''_{y-1}] ⇝_{π_j} [η''_1, ..., η''_{y-1}, η'_y]. Since t+1 ≯ j, by definition of Ψ we must show that η''_y = η'_y, so the result follows.

  - t = j. Then by definition of Ψ we have that η''_{y-1} = η'_{y-1}. There are two sub-cases.

    - ¬∃θ. (index(η_j), θ) ∈ A. Then by definition of ⇝_{π_{j-1}}, we have η'_{y-1} →_π η'_y. Also stmtAt(π, index(η_j)) = stmtAt(π', index(η_j)), so we have η'_{y-1} →_{π'} η'_y. Therefore by definition of ⇝_{π_j} we have [η''_1, ..., η''_{y-1}] ⇝_{π_j} [η''_1, ..., η''_{y-1}, η'_y]. Since ¬∃θ. (index(η_j), θ) ∈ A, by definition of Ψ we must show that η'_y = η''_y, so the result follows.

    - ∃θ. (index(η_j), θ) ∈ A. Then by definition of π', there is some θ such that (index(η_j), θ) ∈ A and stmtAt(π', index(η_j)) = θ(s'). Then by definition of ⇝_{π_{j-1}}, we have η'_{y-1} ↪→_π η'_y. Further, since A ⊆ ⟦O_pat⟧(p), we have stmtAt(π, index(η_j)) = θ(s). Let η'_y = η_{t'}, for some j < t' ≤ k+1. Since η'_{y-1} ↪→_π η'_y, we vacuously have that ∀m. ((j < m < t' ∧ η_j ↪→*_π η_m) ⇒ index(η_m) ⊭_p^θ φ1). Therefore we have shown Ψ(t', θ), so by definition of Ψ and ⇝_{π_j} we have to show that there exists η''_y such that η''_{y-1} ↪→_{π'} η''_y and θ(P)(η'_y, η''_y). We have two cases. Suppose there exists η''_y such that η''_{y-1} ↪→_{π'} η''_y. Then by condition B1 the result follows. Now suppose there does not exist η''_y such that η''_{y-1} ↪→_{π'} η''_y, so that η''_{y-1} ↛_{π'}. We are given that (index(η_j), θ) ∈ A. By definition of ⇝_{π_{j-1}} we know that π_{j-1} acts like π from η_j on in the sequence [η1, ..., η_{k+1}]. Further, η_{k+1} = (ι+1, {(x, l)}, σ, ⟨⟩, M), where ι+1 ∉ Indices_π. Therefore one of the states between η_j and η_{k+1} exclusive must represent the return node from the same invocation of p as η_j. Therefore we have that there exists some j < r ≤ k such that index(η_r) ⊨_p^θ φ1 and, for all t ≤ q < r such that η_q is in the same invocation of p as η_t, we have index(η_q) ⊨_p^θ φ2. Then by Lemma 1 we have η_r ↛_π, and we have a contradiction.

  - t > j. There are two sub-cases.

    - ¬∃θ. Ψ(t, θ). By definition of ⇝_{π_{j-1}}, we have η'_{y-1} →_π η'_y.
Therefore η''_{y-1} = η'_{y-1}, and for all θ we have that either (index(η_j), θ) ∉ A, or stmtAt(π, index(η_j)) ≠ θ(s), or stmtAt(π', index(η_j)) ≠ θ(s'), or ∃m. (j < m < t ∧ η_j ↪→*_π η_m ∧ index(η_m) ⊨_p^θ φ1). Then also ¬∃θ. Ψ(t', θ), where η'_y = η_{t'}, so by the definition of ⇝_{π_j} we must show that η''_{y-1} →_π η''_y, where η'_y = η''_y. Since η'_{y-1} →_π η'_y, the result follows.

    - ∃θ. Ψ(t, θ). Therefore θ(P)(η'_{y-1}, η''_{y-1}) and, by definition of ⇝_{π_{j-1}}, we have η'_{y-1} ↪→_π η'_y. We have two sub-cases.

      - index(η_t) ⊭_p^θ φ1. Then since η'_{y-1} ↪→_π η'_y, we have ∀m. ((j < m < t' ∧ η_j ↪→*_π η_m) ⇒ index(η_m) ⊭_p^θ φ1), where η'_y = η_{t'}, so Ψ(t', θ). Then we must show that η''_{y-1} ↪→_π η''_y, where θ(P)(η'_y, η''_y). Since ∃θ. Ψ(t, θ), we have (index(η_j), θ) ∈ A. We know that η_{j+1} →_π ... →_π η_t. Therefore either index(η_w) ⊨_p^θ φ2 for all j+1 ≤ w < t such that η_w is a state in the same invocation of p as η_j, or there exists j+1 ≤ w < t such that index(η_w) ⊨_p^θ φ1 and η_w is a state in the same invocation of p as η_j. Since we saw above that ∀m. ((j < m < t' ∧ η_j ↪→*_π η_m) ⇒ index(η_m) ⊭_p^θ φ1), it must be the case that index(η_t) ⊨_p^θ φ2. Therefore the result follows from condition B2.

      - index(η_t) ⊨_p^θ φ1. Then by the definition of Ψ we have ¬Ψ(t', θ), where η'_y = η_{t'}. Since θ is the unique substitution such that stmtAt(π, index(η_j)) = θ(s) and stmtAt(π', index(η_j)) = θ(s'), we have ¬∃θ. Ψ(t', θ). Therefore by the definition of ⇝_{π_j} we must show that η''_{y-1} ↪→_π η'_y. The result follows from condition B3.

B.4.3 Pure Analyses

Let φ1 followed by φ2 defines label with witness P be a pure analysis. We say that the analysis is sound if for all programs π, all procedures p in π, all indices ι in Indices_p, and all substitutions θ, the following condition holds: if the analysis puts a label of the form θ(label) on the node of p's CFG with index ι, then θ(P) holds at all program states η of all execution traces of π such that index(η) = ι.

Theorem 3 If φ1 followed by φ2 defines label with witness P is a pure analysis satisfying conditions A1 and A2, then the analysis is sound.

Proof: Identical to the argument used in the proof of Theorem 1 to show that θ(P) holds at any execution's program state just before a transformed statement is executed. (Note that conditions A1 and A2 are the same as F1 and F2.)