Section 3.1 of the July Hoopl paper talks about dataflow facts for constant propagation.

Define a Haskell type, each value of which (except for Haskell

`undefined`

) represents a dataflow fact useful for constant propagation.Define a mapping, using some mix of Haskell and informal notation, from your type from part 1 to a standard logical formula.

Define an

**approximation function**that takes*any*logical formula and computes a "best-effort" approximation that is representible using the restricted logic represented by your solution to part 1.Prove that if you approximate an arbitrary logical formula

`P`

, the approximation is weaker than`P`

(that is, the approximation is implied by`P`

).Prove that for any given program, there are no finite ascending chains of approximations.

Using the representations described in the April Hoopl paper, define a representation of nodes sufficient to represent insertion sort and binary search.

Using your representation, create a Haskell value that represents a graph for insertion sort.

Questions:

In the world of model theory, what is a good way to represent a

*path*within a control-flow graph?Given a general control-flow graph and a label

`L`

, what model-theoretic idea would you use to characterize the set of all possible paths from the entry point to`L`

? In general, is this set finite or infinite?Do paths have a lattice structure? If not, can you extend paths to have a lattice structure? If paths have a native lattice structure, or if they can be extended to have a latice structure, does the lattice structure have any infinite ascending chains?

Using your knowledge of computer science, show that for any program, for any label, the set of all paths from the entry to that label can be represented by a finite term in a domain-specific language called

**Mystery**. What is the language**Mystery**?

(This language is found in many areas of computer science, and it is part of the required curriculum, but it maybe in a course you haven't encountered yet.) What well-known algorithm can be used to construct the appropriate term given a control-flow graph?[Bonus question for outside of class: does the

**Mystery**language have a lattice structure? If so, what is it?]Given a set of recursion equations about paths, can a finite term in the

**Mystery**language be computed by the method of successive approximations?How could you

*approximate*paths in a way that guarantees both a lattice structure and ensures that- There are no infinite ascending chains?
- The method of successive approximations works?

A label `L'`

*reaches* a label `L`

if *some* path from the entry to
`L`

passes through `L'`

. There are efficient special-purpose
algorithms for computing reachability, but we can also compute
reachability through a forward dataflow analysis.

- What is the dataflow fact?
- What assertion (about paths) does it represent?
- What operation corresponds to the disjunction at a join point?
- How would you represent the top and bottom elements of the lattice?
- What should the transfer function look like?

If *every* path from the entry to
label `L`

must pass through another label `D`

,
we say that `D`

*dominates* `L`

.
`D`

is called a *dominator* of `L`

.
There are efficient special-purpose
algorithms for computing dominators, but we can also compute
dominators through a forward dataflow analysis.

- What is the dataflow fact?
- What assertion (about paths) does it represent?
- What operation corresponds to the disjunction at a join point?
- How would you represent the top and bottom elements of the lattice?
- What should the transfer function look like?
- For a control-flow graph of N labels, approximately what is the worst-case space cost?

Can you conclude anything about loops?

Dominator exercises:

Suppose that both

`D`

and`D'`

dominate`L`

. Prove that exactly one of the following three relations holds:`D`

equals`D'`

`D`

dominates`D'`

`D'`

dominates`D`

Hint: proof by contradiction.

Suppose that

`I`

dominates`L`

.`I`

is called an*immediate dominator*of`L`

if whenever`D`

dominates`L`

, either`D = I`

or`D`

dominates`I`

.Prove that if

`L`

has an immediate dominator, the immediate dominator is unique.Prove that if

`L`

has a dominator,`L`

has an immediate dominator.

Prove that the set of labels dominating

`L`

can be represented as a list with the following properties:If the set of dominators is empty, the list is

`[]`

If the set of dominators is nonempty, the list has the form

`d:ds`

, where`d`

is the immediate dominator of`L`

, and`ds`

is the list representing the set of labels that dominate`d`

.

Show that the list representation has a lattice structure and that it is an

*exact*"approximation" of the representation you used above in its inefficient form.Show that the lattice

`inf`

for the list representation**can be computed without allocating any cons cells**.To compute dominators with the list representation, what should the transfer function look like?

What is the total space cost to calculate dominators using the list representation?

The results of a dominator analysis can be represented as a tree:

```
data DominatorNode = Entry | Labelled Label
data DominatorTree = Dominates DominatorNode [DominatorTree]
```

This data structure is a *rose tree* in which each node may have
arbitrarily many children. The root of the tree is the `Entry`

constructor, and each label is a child of its immediate dominator.

Using Hoopl, write a forward dataflow analysis that computes dominators at each node in a control-flow graph, using the list representation

`[Label]`

above.Write code to take the set of all lists and compute a dominator tree:

`dominatorTree :: [[Label]] -> DominatorTree`

Your code need not work on arbitrary lists of

`Label`

s, but it must work on lists that come out of your analysis in part 1. (*Hint*: reverse the lists.)

Dataflow analysis is strictly more powerful than Hoare logic, because
dataflow analysis can reason about *paths*. This kind of reasoning is
needed to identify induction variables.

The simplest form of **basic induction variable** is one whose value describes
an *arithmetic progression* during the course of the loop. As
mathematical formula, we can say that `i`

is an induction variable if
its value at the loop header satisfies

```
i == v + c * k
```

where `v`

is any value, `c`

is the **count** of the loop iterations,
and `k`

is a constant. (More generally, `k`

can be a loop-invariant
expression, but for now, let's not go there.)

Dataflow analysis doesn't know from loops, but perhaps we can
subsituted some *all-paths reasoning*. For `i`

to be an induction
variable at a particular point `L`

, what has to be true?

For all paths from the procedure entry to point

`L`

,*not*including`L`

, what must be true about`i`

?For all paths from

`L`

to itself,*not*including`L`

, what must be true about`i`

?How would you express this property as a logical formula, universally quantified over program paths?

How would you express this property in a forward dataflow analysis?

What would the transfer functions look like?

Back to class home page