Lecture notes for COMP 150-DFO (Dataflow Analysis and Optimization)

Table of Contents

25 Jan 2010: Introduction

An unusual kind of course; I haven't done something like this in five or six years.

Spectrum of courses:

I set up this course with the goal of creating new knowledge about dataflow analysis and optimization.

Problem: new knowledge builds on existing knowledge

(You have to get to the frontier before you can extend it.)

By and large, the existing knowledge is not in our curriculum.

And all of you come with different knowledge in your heads.

So the plan is something like this:

  1. Get everybody up to speed. I'm hoping this can be done in four weeks.

  2. Spend the rest of the term on projects.

Each of you can pick your own projects; John and I are here to help you pick good ones. And you can work in teams of any size. At the end of the term you'll give a juried presentation and about a week later you'll turn in a paper.

What I hope to accomplish today:

  1. Talk about logistics

  2. Explain what our obligations are and how this class is different from other classes you may have taken

  3. Tell you why I thought this would be a very cool topic for an exploratory course

  4. Teach a little bit of material

Logistics

I imagine the course is going to proceed in three phases:

Phases Two and Three are the reason this class has a special exemption from the registrar to run for four hours per week instead of two and a half—75 minutes simply isn't enough time to grapple with a research paper or present and discuss a problem you are working on. In Phase One I hope to give an hour of lecture and then follow up with an hour of in-class exercises.

How is this class different from other classes? What are our obligations?

What kind of class is this?

Evaluation

I suspect some of this sounds intimidating. Let me assure you it is way more fun than writing problem sets and exams. We've got a huge playground, and you're invited to find whatever you like best on the playground and do something interesting with it.

What's cool about dataflow?

Preface: why I hated this subject in school

But languages people like things to be compositional, and there is another view:

Note: there is a third kind of transformation, which is not local: a change of representation (e.g., lambda lifting, closure conversion, unboxing). These kinds of transformations are very important for functional languages and for very dynamic object-oriented languages like Self. They are more or less forbidden in C and C++, and they are unlikely in Java.

The key assumption underlying this line of work (and therefore this course) is that

Although the "optimization zoo" in the Dragon book is ugly, we believe we see connections to the semantics of programming languages and to the logic of programs, which is beautiful.

So far, while semanticists have been thinking beautiful thoughts, compiler writers have been writing ugly code. Our goal is twofold:

  1. Create beautiful code that reflects the beautiful thoughts, which can nevertheless be used to write optimizing compilers.

  2. Change the way compiler writers think about their craft.

Quick sketch of project opportunities

Can we really make it easy to implement optimizations? Advanced ones?

What optimizations combine well?

How to ensure soundness?

Is our stuff really better than existing stuff?

Build a compiler to try out ideas

Play around with the Glasgow Haskell Compiler (optimization and code generation)?

Explore whole-program optimization for a language like Lua or JavaScript.

How do you test an optimization?

How do you test an optimization library?

What's your background?

Hear from each student. Tell us

27 Jan 2010: Language semantics: recursion equations and fixed-point theory

(Whiteboard photos: p1273842 p1273843 p1273844 p1273845 p1273846 p1273847)

All a dataflow analysis does is solve recursion equations. But they are useful recursion equations because the solutions tell you facts about your program. We're going to start with something simpler: recursion equations for defining data types. Here's the plan:

  1. Recursive data types
  2. Recursive functions
  3. Recursive dataflow equations

Same methods everywhere!

What is a list of integers? How about a solution to this equation?

List(Int) = 1 + Int * List(Int)

Or informally

A list of integers is either empty or it is an integer followed by a list of integers.

Perhaps surprisingly, this equation does not have a unique solution!

Notation:

1       -- the unit type, which has exactly one element
        -- for this exercise we'll pronounce the element "nil"

A + B   -- either an A or a B, and it's tagged whether the thing
        -- came from the left or the right

Example

Bool = 1 + 1

More notation

A * B = { (x, y) | x in A, y in B }
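
These constructions have direct analogues in Haskell types. A sketch (Either and pairs are standard Prelude types; the synonyms One, Sum, and Prod are names made up for this note):

-- 1     is the unit type (); its single element is also written ()
-- A + B is Either a b, tagged Left or Right
-- A * B is the pair type (a, b)
type One      = ()
type Sum a b  = Either a b
type Prod a b = (a, b)

-- Bool = 1 + 1, witnessed by an isomorphism:
toBool :: Either () () -> Bool
toBool (Left  ()) = False
toBool (Right ()) = True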

I'll show you how to construct the solution used in most programming languages:

L0 = { }  -- the empty set
L1 = 1 + Int * L0
L2 = 1 + Int * L1
 ...
Ln = 1 + Int * L_{n-1}

Now let L = union over n>=0 of L_n.

I claim L solves the recursion equation for List(Int)

Exercise: prove it (right now)

Notice something:

L_n = f(L_{n-1})

where

f = \ S . 1 + Int * S

(who has or has not seen lambda?)

Notice that f has no recursion in it; f is a perfectly well defined mathematical function. (Dana Scott nailed this down in the 1970s.)
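
Here is a tiny executable rendering of the iteration, as a sanity check (a sketch: Int is restricted to {0,1} so that each approximation is finite, and approx is a name invented here):

-- approx n enumerates L_n, with Int restricted to {0,1}
approx :: Int -> [[Int]]
approx 0 = []                                   -- L0 = { }
approx n = [] : [ x : xs | x <- [0,1], xs <- approx (n-1) ]

-- approx 1 == [[]]
-- approx 2 == [[],[0],[1]]
-- approx 3 == [[],[0],[0,0],[0,1],[1],[1,0],[1,1]]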

I claim that if a set H has the property that f(H) == H, then H is a solution to the recursion equation for list of integers.

Exercise: prove it (right now)

Actually I claim this is "if and only if"

Exercise: prove it (right now)

Notice

L0 = { }
L1 = f(L0)
L2 = f(f(L0))
L3 = f(f(f(L0)))

and so on!

L = union over n>=0 of L_n

Question: is the list of all even numbers a member of L?

Exercise: prove your answer!

Least fixed points (exercise)

The technique just shown is constructive, and it computes the least fixed point of a function. To have a least fixed point, the function must be monotonic (in fact, continuous), and its domain must have a least element to start from.

For set-theoretic functions, the least element is the empty set.

The least fixed point is the smallest solution to the equations; it is contained in all the others.

What's "constructive"?

Consider the powerset construction: P(A) is the set of all subsets of A. Assuming that given a set, we can enumerate its elements but not its subsets, the following definition is not constructive:

Power(A) = { S | S subset A }

Let's make it constructive by looping only over elements:

Power(A) = { { } } union { {x} union S | x in A, S in Power(A) }

Exercise: Compute Power(Bool)

Exercise:

Considering this equation:

PB = { { } } union { {x} union S | x in Bool, S in PB }

does it have any solutions other than Power(Bool)?

1 Feb 2010: From semantics to types to programming

(Whiteboard photos: p2013848 p2013849 p2013850)

I have written a handout on the refinement ordering and fixed points.

Problems from last time:

  1. Show that for all S: g(S) = S if and only if S = 1 + Int * S, where g = \ S . 1 + Int * S. Show the same for isomorphism.

    (Proof technique for proving an implication whose conclusion is an equality or other relation: assume the left-hand side as a hypothesis, then use calculation to show the relation.)

    Example: show that if g(S) = S, then S = 1 + Int * S:

     S
       = [by hypothesis]
     g(S)
       = [by definition of `g`]
     1 + Int * S
    

    The opposite direction is similar.

  2. Let e be the list of all even numbers. Is e in Lint, where Lint = union over n >= 0 of L_n?

No.

  3. Prove your answer.

Simplest proof: for all n, every element of L_n is finite.

By the axiom of choice, if l in Lint, then there exists an n such that l in L_n. Therefore l is finite.

Because e is not finite, it can't be in any L_n, and therefore it can't be in Lint.

  4. Consider this recursion equation for power sets of A:

     Power(A) = { {} } union { {x} union S | x in A, S in Power(A) }
    

    Construct the least solution for Power(Bool) using the technique of sequential approximation from class.

  5. Does equation 4 have other solutions for Power(Bool)?

    Solution: start the fixed-point construction with the set { {yellow} }:

     Y0 = { {yellow} }
     Y1 = { {}, {true, yellow}, {false, yellow} }
     Y2 = { {}, {true}, {true, yellow} , {true, false, yellow}, 
                {false}, {false, yellow} }
     Y3 = { {}, {true}, {true, yellow} , {true, false, yellow}, 
                {false}, {false, yellow}, {true, false} }
     Y4 = Y3    
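
The sequential approximation is easy to run mechanically. A sketch in Haskell (step, fixpointFrom, and powerBool are names made up here; Data.Set is from the containers package):

import           Data.Set (Set)
import qualified Data.Set as Set

-- one application of  f(P) = { {} } union { {x} union S | x in A, S in P }
step :: Ord a => [a] -> Set (Set a) -> Set (Set a)
step as p = Set.insert Set.empty
              (Set.fromList [Set.insert x s | x <- as, s <- Set.toList p])

-- iterate from a starting point until nothing changes
fixpointFrom :: Eq a => (a -> a) -> a -> a
fixpointFrom f x = let x' = f x in if x' == x then x else fixpointFrom f x'

powerBool :: Set (Set Bool)
powerBool = fixpointFrom (step [False, True]) Set.empty
-- == fromList [fromList [],fromList [False],fromList [False,True],fromList [True]]

Seeding fixpointFrom with a set containing an extra element in place of yellow reproduces the Y3 solution above.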
    

From set theory to type theory to programming

So far we have zero, one, cross (times), plus, and primitive sets like Int and Bool. This mathematics is analogous to what Richard Bird calls "wholemeal programming"---dealing with big values. Functional programmers like it.

We also have explicit set constructions using set comprehensions and the element-of or in operator. This mathematics is analogous to writing recursive functions that poke at data structures.

Now let's make this analogy more explicit.

Never mind what it means; let's look at syntax.

Math:

Lint = 1 + Int * Lint

Standard ML:

datatype lint = Nil | Cons of int * lint

Haskell:

data Lint = Nil | Cons Int Lint

Digression: curried vs uncurried functions

We haven't talked yet about the arrow constructor. Consider the space of monotonic, continuous functions from A to B:

A -> B

Proposition:

A * B -> C is isomorphic to A -> B -> C

Proof: homework
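
In Haskell, the two directions of this isomorphism are the Prelude functions curry and uncurry; here they are spelled out (with primes, to avoid shadowing the Prelude):

curry' :: ((a, b) -> c) -> (a -> b -> c)
curry' f x y = f (x, y)

uncurry' :: (a -> b -> c) -> ((a, b) -> c)
uncurry' g (x, y) = g x y

-- curry' . uncurry' and uncurry' . curry' are both identity functions
-- (up to extensionality), which is what the isomorphism requires.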

Math:

ML and Haskell:

Is this function guaranteed to terminate?

(* SML *)
fun intlength Nil = 0  
  | intlength (Cons (n, ns)) = 1 + intlength ns

(* Haskell *)
intlength Nil = 0
intlength (Cons n ns) = 1 + intlength ns

Answers: in ML, yes---every value of the inductive type lint is finite, so the recursion terminates. In Haskell, no---the lazy (coinductive) type Lint also contains infinite and partial lists, and intlength diverges on them.

For coinduction, see section 1 of the tutorial paper by Andy Gordon.

Intuition: the largest set satisfying the equations.

The meaning of recursion:

Question: Let Linf be the set of all finite and infinite lists of integers. Does Linf solve

Linf = 1 + Int * Linf ?

Question: Suppose you wanted to define a set containing only the infinite lists of integers (no finite lists). What recursion equation would you write?

Dealing with coinductive types

Worlds of infinite values:

Both semantics and code have a word for this: "bottom"

_|_

One notation, many meanings:

New partial order: refinement

Refinement comes with its own:

Consider this recursion equation, in the value domain of lists of integers:

lintmap f Nil = Nil
lintmap f (Cons n l) = Cons (f n) (lintmap f l)

e = 0 `Cons` lintmap (2+) e

What is e?

Fixed-point construction!!! Starting at _|_.
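
With the definitions above (and data Lint = Nil | Cons Int Lint), laziness lets you observe the fixed point directly. A sketch; lintTake is a helper invented here:

-- observe a finite prefix of a possibly-infinite Lint
lintTake :: Int -> Lint -> [Int]
lintTake 0 _ = []
lintTake _ Nil = []
lintTake k (Cons n l) = n : lintTake (k - 1) l

-- lintTake 5 e == [0,2,4,6,8]: e is the infinite list of even numbers,
-- the least fixed point of the equation (least because we start at _|_).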

3 Feb 2010: Introduction to formal languages --- Kinding judgment and well-kinding rules

(Whiteboard photos: p2033852 p2033853 p2033854 p2033855 p2033856 p2033857 p2033858 p2033859 p2033860)

Prelude: fixed points

Least and greatest fixed points:

Real truths about Haskell:

  1. I went to the experts.

  2. The experts spake with forked tongue.

  3. If you're willing to pull the bottom element like a rabbit out of a hat, and if you're willing to engage in a lot of mystical juju like

     LI = (1 + (Int * LI)_|_)_|_
    

    then you can recover Haskell types using induction rather than coinduction, and moreover (at least in some cases) you can prove that the greatest and least fixed points are the same.

I found this grotesque!

On coinduction, the jury is still out.

Prelude II: syntactic vs semantic theories of values

The handout is gobbledegook.

Do you want another?

Types and Kinds

Parametric polymorphism in math:

List(A) = 1 + A * List(A)

Parametric polymorphism in Haskell:

data List a = Nil | Cons a (List a)

Special notation for built-in lists!

data [a] = []  |  a : [a]

8 Feb 2010: Introduction to type-directed programming

(Whiteboard photos: p2083866 p2083867 p2083868 p2083869 p2083870 p2083871 p2083872 p2083873 p2083874 p2083875 p2083876 p2083877 p2083878)

Context (looking ahead):

Coming attraction: Concept map

Foundations:

Type-based programming and parametric polymorphism

Consider this definition:

data Maybe a = Nothing | Just a

What functions have type

forall a . Maybe a -> Bool

Discover by type-directed programming
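
Up to extensionality, and ignoring bottoms (an assumption of this sketch), there are exactly four; since a is abstract, a function can only ignore its argument or case on the constructor:

alwaysTrue, alwaysFalse, isJust', isNothing' :: Maybe a -> Bool
alwaysTrue  _       = True
alwaysFalse _       = False
isJust' Nothing     = False
isJust' (Just _)    = True
isNothing' Nothing  = True
isNothing' (Just _) = False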

Polymorphic functions on lists:

p :: forall a b . (a -> b) -> [a] -> [b]

q :: forall a . (a -> Bool) -> [a] -> [a]

r :: forall a . (a -> Bool) -> [a] -> Bool

u :: forall a b . b -> (a -> b -> b) -> [a] -> b

Type-directed development:

p :: (a->b) -> ([a]->[b])

Introduction rule for arrow:

Gamma, x : tau |- e : tau'
-----------------------------
Gamma |- \x -> e : tau -> tau'

So to make a value of arrow type, use a lambda!

p = \ (f::a->b) -> hole
       where hole :: [a] -> [b]

Similarly, hole has arrow type

             hole = \as -> hole'

Now what is the type of as?

What is the elimination form for an algebraic datatype?

             hole = \as -> case as of
                              []   -> empty
                              a:as -> nonempty

What's the type of nonempty?

How could we possibly produce a value of that type?

Does it use all the arguments?

OK, what about nonempty? What are the arguments in scope we need to use?

How can we use them?

What do we want in the result?

What is the introduction form for an algebraic data type?

What's the right constructor in this case? At what type should it be used?
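
Carrying the derivation to completion yields the familiar map function:

p :: (a -> b) -> [a] -> [b]
p = \f -> \as -> case as of
                   []    -> []
                   a:as' -> f a : p f as'

-- nonempty must produce a [b]; the b at the head comes from f a, and
-- the recursive call p f as' consumes the tail.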

Principles of type-directed programming:

  1. Driven by

  2. If the required type has a unique introduction form (arrow, product), use it.

  3. If the required type has multiple introduction forms (any algebraic data type with multiple constructors), then look for a form which consumes case-bound or lambda-bound variables in the environment.

  4. To consume a value, look for

Introduction and elimination forms

Arrow: introduced by lambda and eliminated by application

Product (rare): introduced by (x1, x2) and similar; eliminated by let binding

Sum (algebraic data type): introduced by constructor and eliminated by case.

Syntactic sugar!

Instead of

length = \l -> case l of
                 []   -> 0
                 _:xs -> 1 + length xs

Absorb the lambda into the definition:

length l = case l of
             []   -> 0
             _:xs -> 1 + length xs

Convert top-level case into multiple equations (most idiomatic!):

length [] = 0
length (_:xs) = 1 + length xs

More syntactic sugar: a special elimination form for Bool, namely if-then-else:
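
A small made-up example (abs' is illustrative only):

abs' :: Int -> Int
abs' n = if n < 0 then negate n else n

-- desugars to
abs'' :: Int -> Int
abs'' n = case n < 0 of
            True  -> negate n
            False -> n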

So, all we really have in Haskell is lambda, application, constructors, and case.

Exercises

In class: Design a Haskell data type to represent formulas in propositional logic. You have choices.

In class: Design a Haskell data type to represent formulas in each of the following normal forms:

In class: Derivations for q, r, and u. None are uniquely determined, but some have an obvious "best simple" implementation.
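
For reference, the "best simple" candidates are Prelude functions (one defensible choice among several; r could equally well be all):

q :: (a -> Bool) -> [a] -> [a]
q = filter

r :: (a -> Bool) -> [a] -> Bool
r = any            -- all fits the type just as well

u :: b -> (a -> b -> b) -> [a] -> b
u z f = foldr f z  -- foldr with its arguments reordered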

Homework: Haskell finger exercises

10 February 2010: Consolidation by concept mapping

(Whiteboard photos: p2103879 p2103880 p2103881 p2103882 p2103883 p2103884 p2103885 p2103886 p2103887 p2103888 p2103889 p2103890 p2103891 p2103892 p2103893 p2103894 p2103895 p2103896)

Concept map:

You can see Norman and John's concept map in PDF or SVG:

Norman also wrote up a partial glossary of concepts.

17 February 2010: List functions from types; Boolean solvers

(Whiteboard photos: p2173897 p2173898 p2173899 p2173900 p2173901 p2173902 p2173903)

18 February 2010: Boolean solvers and list functions revisited; generalizing to expressions

(Whiteboard photos: p2183904 p2183905 p2183906 p2183907 p2183908 p2183909 p2183910 p2183911)

Important Haskell constructs:

Useful ideas in the space of Boolean solvers:

22 February 2010: Introduction to program logic

(Whiteboard photos: p2223912 p2223913 p2223914 p2223915 p2223916 p2223917)

Paper, Tony Hoare: An Axiomatic Basis for Computer Programming

Rules:

Computational method (Dijkstra): weakest preconditions

Example: insertion sort

ASCII art:

  0                           j               k                n
+-------------------------+     +----------+ - - - - - - - +  
|                         |     |          |               |
+-------------------------+     +----------+ - - - - - - - +

24 February 2010: Type Classes, Monads, and the State Monad

(Whiteboard photos: p2243918 p2243919 p2243920 p2243921 p2243922 p2243923 p2243924 p2243925)

This lecture was delivered impromptu in response to students' questions.

Quick summary

Monads allow one to encapsulate computations and run them in sequence. A computation "does something" and then produces a result.

Every monad must provide return and >>= (bind) operators
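
In the Haskell of this era, those two operators are the core of the Monad class:

class Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b
  -- (the real class also has >> and fail, with default definitions)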

The return and bind operations are not enough to have useful computations. For additional utility, each monad generally provides specific primitives, and usually a "run" operation. (The IO monad is unique in not providing a run operation within the program. Instead, the value of main has type IO (), and the Haskell runtime system runs the IO action.)

Example IO primitives:

getChar :: IO Char
putChar :: Char -> IO ()

Copy a character from standard input to standard output, forcing to upper case:

copyUpper :: IO ()
copyUpper = getChar >>= \c -> putChar (toUpper c)   -- toUpper is from Data.Char

This function can also be written using monadic do notation, which makes the lambda-bound variables much easier to read:

copyUpper = do c <- getChar
               putChar (toUpper c)

We then discussed the state monad:

type StateMonad a = State -> (State, a)

We implemented return and bind operations, and we also defined some of these primitives:

get :: StateMonad State
put :: State -> StateMonad ()
run :: StateMonad a -> State -> a  
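
A minimal implementation along these lines (a sketch: State is fixed to Int for concreteness, and because StateMonad is a type synonym rather than a newtype, return and bind appear as the standalone functions returnS and bindS):

type State = Int
type StateMonad a = State -> (State, a)

returnS :: a -> StateMonad a
returnS x = \s -> (s, x)

bindS :: StateMonad a -> (a -> StateMonad b) -> StateMonad b
bindS m k = \s -> let (s', x) = m s in k x s'

get :: StateMonad State
get = \s -> (s, s)

put :: State -> StateMonad ()
put s' = \_ -> (s', ())

run :: StateMonad a -> State -> a
run m s0 = snd (m s0)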

Also observe

run' :: StateMonad a -> State -> State -- run computation, ignore result,
                                       -- and return final state
run' c = run $ c >>= \_ -> get

Finally, after class, Eddie asked about using this in the simplifier. Norman suggested

simplify' :: Exp -> StateMonad Exp   -- uses 'put' but never 'get'

finish :: a -> StateMonad (Maybe a)
finish answer = do s <- get
                   case s of HasChanged    -> return $ Just answer
                             Hasn'tChanged -> return Nothing

Then we can write

simplify :: Exp -> Maybe Exp
simplify e = run $ simplify' e >>= finish

1 March 2010: Code reviews

No notes, no photos.

3 March 2010: Preparation for type inference

(Whiteboard photos: p3033926 p3033927 p3033928 p3033929 p3033930 p3033931 p3033932 p3033933)

Type inference

Algorithm: given a term e, return an environment Gamma and a type tau such that

Gamma |- e : tau

is a derivable judgement of the type system, and moreover, Gamma and tau are principal.

  1. What does it mean to be principal?

  2. Is a principal type unique?

  3. An example: give principal typing of

     if x == y then 1 else 0
    

    (Converts Bool to Int.)

From inference rules to algorithm

Let's take a closer look at the computation of the principal typing of

x == y

8 March 2010: Return to program logic (Insertion sort: the thrilling conclusion)

(Whiteboard photos: p3083934 p3083935 p3083936 p3083937 p3083938 p3083939 p3083940 p3083941 p3083942 p3083943 p3083944)

Final version of insertion sort (not complete; see photos)

void isort(int *a, int n) {
  // PERM(a, in a) /\ n >= 0   -- precondition
  int k = 0;

  // OUTER INVARIANT:
  //   { PERM(a, in a) /\ SORTED[0..k) /\ 0 <= k <= n }
  while (k != n) {
    k = k + 1;
    int j = k - 1;
    int elem = a[j];

    // INNER INVARIANT:
    //   { PERM(???, in a) /\ SORTED[0..j) /\ SORTED[j+1..k)
    //     /\ BOUNDS => elem <= a[j+1]
    //     /\ ???
    //   }
    while (j != 0 && a[j-1] > elem) {
        //
        // ???
        //
    }
    a[j] = elem;
  }
}

Definition and proof rules for SORTED[x..y)

Definition: SORTED[x..y) means a[i] <= a[i+1] for every i with x <= i < y-1 (vacuously true when y <= x+1).

Proof rules

SORTED[x+1..y)    a[x] <= a[x+1]   0 <= x
-------------------------------------------   Extend-Left
         SORTED[x..y)

SORTED[x..y-1)    a[y-2] <= a[y-1]  y <= n
-------------------------------------------   Extend-Right
         SORTED[x..y)

SORTED[x..w)   SORTED[w..y)  a[w-1] <= a[w]
-------------------------------------------   Split
         SORTED[x..y)

Logic and proof rules for array update

Hoare triple for array assignment:

P[a |--> a with [j] -> e]   { a[j] := e }   P

Also written

P[a |--> update(a, j, e)]   { a[j] := e }   P

Where we have proof rules for indexing:

          x = y
-------------------------
  update(a, x, e)[y] = e

         x != y
-------------------------
update(a, x, e)[y] = a[y]
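
These two rules are exactly the defining equations of a functional model of arrays, which can be rendered directly (a sketch; this update is a made-up model, not part of any library):

-- an array modeled as a function from indices to elements
update :: (Int -> e) -> Int -> e -> (Int -> e)
update a x e = \y -> if y == x then e else a y

-- update a x e y == e    when y == x
-- update a x e y == a y  when y /= x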

Exercise: binary search

10 March 2010: Program logic continued---binary search

(Whiteboard photos: p3103945 p3103946 p3103947 p3103948 p3103949 p3103950 p3103951 p3103952)

Solutions to binary search

Lattice of Boolean formulas

15 March 2010: The method of weakest preconditions

(Whiteboard photos: p3153960 p3153961 p3153962 p3153963 p3153964 p3153965 p3153966 p3153967 p3153968)

Given statement S, there are many triples

P {S} Q

Which one is the best?

Let us take Q, the goal, as given.

Among all P such that P {S} Q, let P' be the weakest, the one implied by all the others. Then P' is the weakest precondition of command S and postcondition Q.

Weakness and strength

Table (as logical formulas)

A formula is a predicate describing a set of states.
Extend the view to sets

Q: In the Venn diagram, what are the weakest conditions?
(helps keep story straight)

Connection #1: our old friend the subset relation. So relative weakness/strength (implication) is a partial order.

Do weakest preconditions exist?

The argument from logical formulas is dubious.

Suppose P1 and P2 are preconditions. That is

P1 {S} Q

P2 {S} Q

what can we conclude?

Q: Given any two formulas P1 and P2, do they have a greatest lower bound? That is, is there a B such that

P1 => B
P2 => B

and such that B => B' for any other B' satisfying both implications?

Yes! Take B = P1 \/ P2.

I think this conclusion can be derived syntactically, without appealing to an underlying model.

So let's conclude

P1 \/ P2 {S} Q

This is much easier to understand if we appeal to the underlying model of machine states!

Lattice structure of Boolean predicates

Let's draw a picture...

What about infinite chains?

Consider the infinite set of Pk such that

Pk {S} Q

Does an infinite set of formulas have a greatest lower bound?

A: No!

x = 3 or x = 5 or x = 7 or x = 11 or ...

What about the model?

Consider this definition:

Q: Does this set exist?

A: Yes! (by appeal to the model)

Q: Is it necessarily representable by a formula?

A: No! Consider a primality test. (And note decidability issues)

Conclusion: wp has to be approximated

How can we use wp to replace the proof system for P {S} Q?

Examples of weakest preconditions

Dataflow analysis is all about mutation and sequencing, so let's start there:

wp(x := e, Q) = Q[x |--> e]

wp((S1; S2), Q) = wp(S1, wp(S2, Q))  -- like continuations

wp(if B then St else Sf, Q) = (B /\ wp(St, Q)) \/ (!B /\ wp(Sf, Q))

or (B => wp(St, Q)) /\ (!B => wp(Sf, Q))

We can check to make sure the two formulas are the same by enumerating the truth table:

{-# LANGUAGE FlexibleInstances #-}

class Predicate a where
  tautology :: a -> Bool

instance Predicate Bool where
  tautology b = b

instance Predicate b => Predicate (Bool -> b) where
  tautology p = tautology (p True) && tautology (p False)


classf b t f = (b && t) || (not b && f)
nrf    b t f = (b `implies` t) && (not b `implies` f)

same = tautology check 
  where check b t f = classf b t f == nrf b t f

implies p q = not p || q

The empty statement is traditionally written skip.

wp(skip, Q) = Q

17 March 2010: Weakest preconditions, loops, fixed points

(Whiteboard photos: p3173969 p3173970 p3173971 p3173972 p3173973 p3173974 p3173975 p3173976)

Review of weakest preconditions: let's curry:

wp :: C -> Formula -> Formula
wp (Gets x e)  = subst x e
wp Skip        = id
wp (Seq s1 s2) = wp s1 . wp s2
wp (If c t f)  = (const c /\ wp t) \/ (const (com c) /\ wp f)

This code compiles.

We have

class Predicate a where
    (/\)      :: a -> a -> a
    (\/)      :: a -> a -> a
    com       :: a -> a

instance Predicate Formula where ...

instance Predicate b => Predicate (a -> b) where
    p /\ q = \f -> p f /\ q f
    p \/ q = \f -> p f \/ q f
    com f = com . f

Homework:

Loops upset the apple cart

Review: reduction of structured control flow to conditional and unconditional goto.

Idea: each label L is associated with a predicate P_L

wp(goto L, Q) = ???

wp(if B then goto Lt else goto Lf, Q) = ???

Consider this loop

while n > 0 do n := n - 1 end

What is

wp(while n > 0 do n := n - 1 end, n = 0)?

Break down to basic blocks

Loop:
  if n > 0 then goto Body else goto End;

Body:
  n := n - 1
  goto Loop;

End:

To compute

wp(while n > 0 do n := n - 1 end, Q), 

each basic block contributes a recursion equation. (Here Q is n = 0.)

At the board:

29 March 2010: Basic blocks and control-flow graphs; strongest postconditions

(Whiteboard photos: p3293978 p3293979 p3293980 p3293981 p3293982 p3293983 p3293984 p3293985)

Review of previous class

We computed

wp(while n > 0 do n := n - 1 end, n = 0), 

in several steps:

  1. Break the loop down into basic blocks

     Loop:
       if n > 0 then goto Body else goto End;
    
    
     Body:
       n := n - 1
       goto Loop;
    
    
     End:
    
  2. Each basic block contributed a recursion equation

  3. By substitution, got a recursion equation for

     P_Loop = (n > 0 /\ P_Loop[n |-> n - 1]) \/ n = 0
    
  4. Solved by successive approximations:

     P0 = true
    
    
     P1 = (n > 0 /\ true) \/ n = 0
        = n >= 0
    
    
     P2 = (n > 0 /\ (n>=0)[n |-> n - 1]) \/ n = 0
        = (n > 0 /\ (n-1>=0)) \/ n = 0
        = (n > 0 /\ (n>0)) \/ n = 0
        = n > 0 \/ n = 0
        = n >= 0

     Since P2 = P1, the approximations have converged: the weakest precondition is n >= 0.
    

Representations of programs for analysis

Special-case algorithms exist for

But it's pretty easy to solve the general case

General case arises out of the nature of the von Neumann machine

control transfers and computation can be observed

branch targets are either given by the compiler, recovered through analysis, or in the case of JIT compilation, garnered through observation.

Program logic hinges on predecessors and successors.

(Many/many is obtained through composition; we rule it out to keep compilers simple---there are no foundational difficulties.)

Exercise at the board: how do weakest preconditions work, in the general case, for each of these three kinds of constructs? (Hint: the result of computing wp is a set of recursion equations.)

Structure of a program

Arbitrary sequence of assignment, control transfer, label

Main step is to replace fall-through with a conditional branch.

Program as regexp (Assignment, Branch, Label)

  (A*(BL))*A*

Not very insightful. But let's refactor:

(A*B)? (LA*B)* (LA*)?

We now have entry, body, exit. Notice

Finally, labels are unique

Assignment at board: reduce insertion sort (printout) to basic blocks.

Strongest postconditions

Weakest preconditions are about goal-directed programming:

  1. Start with some postcondition we are trying to establish (the array is sorted).

  2. Put some command in front of that postcondition to get to a precondition that is easier to establish.

  3. Eventually we will get to a precondition that is so weak that it is trivially established by a procedure's entry condition.

  4. Presto! We have constructed a procedure.

As a method of software construction, this is ideal: we reason backward from a known goal to something easily established.

Problem: the compiler seldom has a goal in mind, and when it does, it has no guarantee the goal can be achieved.

Mathematically unsatisfying:

This is the method of strongest postconditions.

If weakest preconditions are about what we are trying to achieve, then strongest postconditions are about what we can conclude from the code we're given.

Problem: define the predicate transformer sp. (include assertions)

Note: By far the most difficult issue is what to do about an assignment x := e. In particular, what happens to our knowledge about the previous value of x?
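
For reference, the classic answer (one common formulation, due essentially to Floyd) existentially quantifies the old value:

sp(x := e, P) = exists x0 . x = e[x |--> x0] /\ P[x |--> x0]

The previous value of x survives only as the bound variable x0, which is one reason strongest postconditions are more awkward to compute with than weakest preconditions.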

31 March 2010: More dataflow analysis; approximating formulas

(Whiteboard photos: p3313986 p3313987 p3313988 p3313989 p3313990 p3313991 p3313992 p3313993 p3313994 p3313995)

From structured control flow to arbitrary control flow

No analog of P {S} Q. Instead, basic blocks:

     \ | /
       L
       |
     Asst
       |
     Asst
       .
       :
     Asst
       |
    Branch
     / | \
   L1  L2 L3

Method for analysis:

  1. Associate a formula with each label

  2. Compute recursion equations

  3. Solve recursion equations by method of successive approximations

Questions:

More ambitious problem: eliminating array-bounds checks

The strongest postcondition may imply that an assertion always holds.

void isort(int *a, int n) {
  L0:;
    int k = 0;
    goto L1;
  L1:
    if (k != n) goto L2; else goto L3;
  L2:
    k = k + 1;
    int j = k-1;
    assert(0 <= j && j < n);
    int elem = a[j];
    goto L6;
  L6:
    if (j != 0) goto L7; else goto L5;
  L7:
    assert(0 <= j-1 && j-1 < n);
    if (a[j-1] > elem) goto L4; else goto L5;
  L4:
    assert(0 <= j && j < n);
    assert(0 <= j-1 && j-1 < n);
    a[j] = a[j-1];
    j = j - 1;
    goto L6;
  L5:
    assert(0 <= j && j < n);
    a[j] = elem;
    goto L1;
  L3:
    return;
}

Can the compiler tell if array access is in bounds?

5 April 2010: Approximation of formulas as dataflow facts

(Whiteboard photos: p4053996 p4053997 p4053998 p4053999 p4054000 p4054001 p4054002)

Note: Hoopl is not quite ready for use.

Focus on approximation of facts:

Lecture: GADTs and open/closed nodes; algebra of control-flow graphs

Array-bounds checks will have to wait.

7 April 2010: Review of lattice theory; the constant-propagation lattice; facts about paths

(Whiteboard photos: p4074003 p4074004 p4074005 p4074006 p4074007 p4074008 p4074009 p4074010 p4074011 p4074012 p4074013 p4074014 p4074015 p4074016)

Lattices

A lattice is a partial order with two operations (meet and join)

least upper bound == join == sup == vee (corresponds to forward data flow)

greatest lower bound == meet == inf == wedge

I can never remember the meaning of meet, join, vee, wedge. I stick with "lub" and "glb", or if I'm feeling daring, "sup" and "inf".

What makes it a lattice is that inf and sup are always defined.

A bounded lattice also has the two distinguished elements top and bottom, each of which is related to all the elements in the lattice.

Any finite lattice is automatically bounded. If you have an infinite lattice that is not bounded, you can always get a bounded lattice by adjoining a top and a bottom.

The powerset of any set forms a bounded lattice.

Lattice orientation is conventional

Given any lattice L, I can construct a new lattice L' which is L "turned upside down." Thus, what you view as top and bottom is purely a matter of convention. No end of confusion.

What happened Monday

  1. Strongest postconditions.
  2. Method of successive approximations gives a least solution (closest to the bottom).
  3. Therefore the bottom had better be the strongest possible formula (FALSE!)

Compositional approximation of a general formula in the lattice:

A[| TRUE |]       =  T
A[| FALSE |]      = _|_
A[| f /\ f' |]    = A[| f |] `inf` A[| f' |]
A[| f \/ f' |]    = A[| f |] `sup` A[| f' |]
A[| x = k   |]    = [ x |--> Const k ]
A[| k = x   |]    = [ x |--> Const k ]
A[| all others |] = T
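
A small executable model of this approximation (a sketch: Formula, Fact, and all helper names are invented here; a conjunction of conflicting constants drops to bottom):

import           Data.Map (Map)
import qualified Data.Map as Map

type Var = String

data Formula = TT | FF
             | Formula :/\: Formula
             | Formula :\/: Formula
             | Var :==: Int           -- covers both  x = k  and  k = x
             | Other                  -- any formula we cannot interpret

data Fact = Unreachable               -- bottom, the approximation of FALSE
          | Env (Map Var Int)         -- known constants; the empty map is top
          deriving (Eq, Show)

inf, sup :: Fact -> Fact -> Fact
inf Unreachable _ = Unreachable
inf _ Unreachable = Unreachable
inf (Env m1) (Env m2)
  | consistent = Env (m1 `Map.union` m2)  -- conjunction pools the knowledge
  | otherwise  = Unreachable              -- contradictory constants
  where consistent =
          and [ Map.lookup x m2 `elem` [Nothing, Just k] | (x, k) <- Map.toList m1 ]
sup Unreachable f = f
sup f Unreachable = f
sup (Env m1) (Env m2) =                   -- disjunction keeps only agreement
  Env (Map.filterWithKey (\x k -> Map.lookup x m2 == Just k) m1)

approx :: Formula -> Fact
approx TT         = Env Map.empty
approx FF         = Unreachable
approx (f :/\: g) = approx f `inf` approx g
approx (f :\/: g) = approx f `sup` approx g
approx (x :==: k) = Env (Map.singleton x k)
approx Other      = Env Map.empty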

Disjunctive normal form may lend intuition but it is superfluous.

Implementation using the Presburger package, but the package is not entirely helpful...

Show the code

12 April 2010: Control-flow graphs and paths

(Whiteboard photos: p4124019 p4124020 p4124021 p4124022 p4124023 p4124024 p4124025)

Hoare logic ignores history:

Open/closed; algebra of control-flow graphs

Dataflow logic can remember information about paths:

Postorder depth-first numbering

Identifying loop headers

14 April 2010: Reachability and dominance

(Whiteboard photos: p4144026 p4144027 p4144028 p4144029 p4144030 p4144031 p4144032 p4144033 p4144034 p4144035 p4144036)

Work at the board computing reaching sets and dominators.

21 April 2010: Project planning

(Whiteboard photos: p4214041 p4214042 p4214043 p4214044 p4214045 p4214046 p4214047 p4214048 p4214049 p4214050 p4214051 p4214052)

26 April 2010: Project advice

(Whiteboard photos: p4264053 p4264054 p4264055 p4264056 p4264057 p4264058 p4264059 p4264060)

Plan for Weds:

28 April 2010: Project show and tell with external jury

Presentations will be in Halligan 108. Directions to Halligan are online at http://www.cs.tufts.edu/~nr/directions/; visitors, please try to arrive by 1:20, or if you will be driving and need a legal place to park your car, please arrive by 1:10.

The schedule for the afternoon is as follows:

When  Who                              What
1:30  Norman Ramsey                    Introduction to Hoopl
2:00  Chris Rice                       Testing Hoopl clients
2:30  Eddie Aftandilian, Nathan Ricci  Testing optimizations
3:00  Sound Bites                      Break
3:15  John Mazella                     Lazy Code Motion
3:45  Jessie Berlin, John Sotherland   Strength Reduction and Induction Variables
4:15  Everyone                         Mingling
4:30                                   Event ends

Visitors who are able are invited to stay for dinner.

3 May 2010: Wrapup

(Whiteboard photos: p5034061 p5034062 p5034063)

Discussion of how to think about strength reduction and induction variables at a higher level of abstraction than Cocke and Kennedy.

Further discussion of loops and how to identify them.

