Section 3.1 of the July Hoopl paper talks about dataflow facts for constant propagation.
Define a Haskell type, each value of which (except for Haskell
undefined
) represents a dataflow fact useful for constant propagation.
Define a mapping, using some mix of Haskell and informal notation, from your type from part 1 to a standard logical formula.
Define an approximation function that takes any logical formula and computes a "best-effort" approximation that is representible using the restricted logic represented by your solution to part 1.
Prove that if you approximate an arbitrary logical formula P
,
the approximation is weaker than P
(that is, the approximation
is implied by P
).
Prove that for any given program, there are no finite ascending chains of approximations.
Using the representations described in the April Hoopl paper, define a representation of nodes sufficient to represent insertion sort and binary search.
Using your representation, create a Haskell value that represents a graph for insertion sort.
Questions:
In the world of model theory, what is a good way to represent a path within a control-flow graph?
Given a general control-flow graph and a label L
, what
model-theoretic idea would you use to characterize the set of all
possible paths from the entry point to L
? In general, is this
set finite or infinite?
Do paths have a lattice structure? If not, can you extend paths to have a lattice structure? If paths have a native lattice structure, or if they can be extended to have a latice structure, does the lattice structure have any infinite ascending chains?
Using your knowledge of computer science, show that for any
program, for any label, the set of all paths from the entry
to that label can be represented by a finite term in a
domain-specific language called Mystery. What is the language
Mystery?
(This language is found in many areas
of computer science, and it is part of the required curriculum,
but it maybe in a course you haven't encountered yet.)
What well-known algorithm can be used to construct the appropriate
term given a control-flow graph?
[Bonus question for outside of class: does the Mystery language have a lattice structure? If so, what is it?]
Given a set of recursion equations about paths, can a finite term in the Mystery language be computed by the method of successive approximations?
How could you approximate paths in a way that guarantees both a lattice structure and ensures that
A label L'
reaches a label L
if some path from the entry to
L
passes through L'
. There are efficient special-purpose
algorithms for computing reachability, but we can also compute
reachability through a forward dataflow analysis.
If every path from the entry to
label L
must pass through another label D
,
we say that D
dominates L
.
D
is called a dominator of L
.
There are efficient special-purpose
algorithms for computing dominators, but we can also compute
dominators through a forward dataflow analysis.
Can you conclude anything about loops?
Dominator exercises:
Suppose that both D
and D'
dominate L
. Prove that exactly
one of the following three relations holds:
D
equals D'
D
dominates D'
D'
dominates D
Hint: proof by contradiction.
Suppose that I
dominates L
. I
is called an immediate
dominator of L
if whenever D
dominates L
, either D = I
or D
dominates I
.
Prove that if L
has an immediate dominator, the immediate
dominator is unique.
Prove that if L
has a dominator, L
has an immediate
dominator.
Prove that the set of labels dominating L
can be represented as
a list with the following properties:
If the set of dominators is empty, the list is []
If the set of dominators is nonempty, the list has the form
d:ds
, where d
is the immediate dominator of L
, and ds
is the list representing the set of labels that dominate d
.
Show that the list representation has a lattice structure and that it is an exact "approximation" of the representation you used above in its inefficient form.
Show that the lattice inf
for the list representation can be
computed without allocating any cons cells.
To compute dominators with the list representation, what should the transfer function look like?
What is the total space cost to calculate dominators using the list representation?
The results of a dominator analysis can be represented as a tree:
data DominatorNode = Entry | Labelled Label
data DominatorTree = Dominates DominatorNode [DominatorTree]
This data structure is a rose tree in which each node may have
arbitrarily many children. The root of the tree is the Entry
constructor, and each label is a child of its immediate dominator.
Using Hoopl, write a forward dataflow analysis that computes
dominators at each node in a
control-flow graph, using the list representation [Label]
above.
Write code to take the set of all lists and compute a dominator tree:
dominatorTree :: [[Label]] -> DominatorTree
Your code need not work on arbitrary lists of Label
s, but it
must work on lists that come out of your analysis in part 1.
(Hint: reverse the lists.)
Dataflow analysis is strictly more powerful than Hoare logic, because dataflow analysis can reason about paths. This kind of reasoning is needed to identify induction variables.
The simplest form of basic induction variable is one whose value describes
an arithmetic progression during the course of the loop. As
mathematical formula, we can say that i
is an induction variable if
its value at the loop header satisfies
i == v + c * k
where v
is any value, c
is the count of the loop iterations,
and k
is a constant. (More generally, k
can be a loop-invariant
expression, but for now, let's not go there.)
Dataflow analysis doesn't know from loops, but perhaps we can
subsituted some all-paths reasoning. For i
to be an induction
variable at a particular point L
, what has to be true?
For all paths from the procedure entry to point L
, not
including L
, what must be true about i
?
For all paths from L
to itself, not including L
, what must
be true about i
?
How would you express this property as a logical formula, universally quantified over program paths?
How would you express this property in a forward dataflow analysis?
What would the transfer functions look like?