Topic of study: the stuff software is made of
Conclusion: make it easier to write programs that really work
Reading: Paul Graham, especially the "Blub Paradox"
In public: a language should express computations
So, no more uncivilized lowlevel C, and no more C++ that no mortal being can understand
In the pub: professionals are keenly interested in their code
Your language can make you or break you.
New ways of thinking about programming
Become a sophisticated consumer, aware of old wine in new bottles
Double your productivity
Bonus: preparation for advanced study
Intellectual tools you need to understand, evaluate, and use languages effectively
Notations of the trade (source of precision, further study)
Learn by doing:
(Just as intellectually challenging as COMP 40, but in an entirely different direction.
Not "how long until this huge pile of code works?"
Instead "how long until I get the Aha Moment that makes these 10 lines work?"
Whorf's hypothesis applies
There aren't that many great features, and you will see them over and over
You'll choose features, and therefore languages, to fit your needs
Some are more powerful than others
The Blub paradox: habit blinds us to power
You're perfectly happy using Blub.
Would you use Haskell, Lisp, or Icon? No! That's equivalent to Blub, plus weird stuff nobody uses.
Blub looks good enough only if you think in Blub.
In Comp 105,
Only the most powerful features need apply
We explode your brain so you can think differently
You'll know you're doing it right if at first your head hurts terribly, then you have a breakthrough moment and it all seems pleasant afterwards
To say more with less (expressive power):
To promote reliability and reuse:
Describing it all precisely:
Three lectures per week: 50% more cracks at tough ideas
No cadre of teaching assistants: material is new to everyone except me and Professor Guyer
Instead, smallgroup section once a week led by Engineering Fellow:
Syllabus and schedule are works in progress:
You must get my book
Homework will be frequent and challenging:
Both individual and pair work:
For some problems larger in scope, you will work in pairs
Be very careful to separate your pair work and your individual work. (Last term in COMP 40, major violation of academic integrity with severe penalties.)
Other policies and procedures in handout and on web  You are responsible!
This topic marries programming and mathematics.
If you're going to enjoy the course,
Each of these is a program:
Transition: lexer/parser/lowering/optimization/codegen/assembler/linker
There is one true representation (and that's the syntax)
Supporting cast:
During translation, symbol table (name spaces aka environments)
At run time, memory managment, instrumentation (valgrind), debugging
Goal: make comparisons easy by eliminating superficial differences
No new language ideas. Instead, imperative programming with an IMPerative CORE:
Models features found in most languages (loops and assignment)
Trivial syntax (taken from LISP)
Idea of LISP syntax:
Names and numerals are basic.
Other constructs are bracketed with parenthesesidentified by keyword after opening bracket
Two syntactic categories: definitions, expressions (no statements!)
An Impcore program is a sequence of definitions [possible demo here]
(define mod (m n) ( m (* n (/ m n))))
Compare
int mod (int m, int n) {
return m  n * (m / n);
}
Or
(val n 99)
Compare
int n = 99;
Also, expressions at top level (definition of it
)
No statements means expressionoriented:
(if e1 e2 e3)
(while e1 e2)
(set x e)
(begin e1 ... en)
(f e1 ... en)
Each one has a value and may have side effects!
Functions may be primitive (+  * / = < > print
)
or defined with (define f ...)
.
The only type of data is "machine integer" (deliberate oversimplification)
Scopes also called "names spaces"; we will call them "environments" because that's the pointyheaded theory termand if you want to read some of the exciting theory papers, pointyheaded theory has to be second nature.
Impcore has 2 levels of environments for variables: globals, formals
There are no local variables
Functions live in their own environment (not shared with variables)
Abuse of separate name spaces: [online demo if possible]
> (val f 33)
33
> (define f (x) (+ x x))
f
> (f f)
66
def ::= (val x exp)
 exp
 (define f (formals) e)
 (use filename)
exp ::= integerliteral
 variablename
 (set x exp)
 (if exp1 exp2 exp3)
 (while exp1 exp2)
 (begin exp1 ... expn)
 (op exp1 ... expn)
op ::= functionname  primitivename
Exp = LITERAL (Value)
 VAR (Name)
 SET (Name name, Exp exp)
 IFX (Exp cond, Exp true, Exp false)
 WHILEX (Exp cond, Exp exp)
 BEGIN (Explist)
 APPLY (Name name, Explist actuals)
Notice that there is one kind of "application" for both userdefined and primitive functions.
C code:
typedef struct Exp *Exp;
typedef enum {
LITERAL, VAR, SET, IFX, WHILEX, BEGIN, APPLY
} Expalt; /* which alternative is it? */
struct Exp { // only two fields: 'alt' and 'u'!
Expalt alt;
union {
Value literal;
Name var;
struct { Name name; Exp exp; } set;
struct { Exp cond; Exp true; Exp false; } ifx;
struct { Exp cond; Exp exp; } whilex;
Explist begin;
struct { Name name; Explist actuals; } apply;
} u;
};
Example AST for (f x (* y 3))
(using Explist)
Example Ast for (define abs (x) (if (< x 0) ( 0 x) x))
(using Namelist)
Trick question:
What's the value of
(* y 3)
? OK, what's its meaning?
Environment associates each variable with one value
Greek letter rho, mapping { x1 > n1, ..., xk > nk }
Environment is a finite map, aka partial function (theorists love functions):
x in dom rho // x is defined in environment rho rho(x) // the value of x in environment rho
First, an abstract type. (For 40 vets: like a Hanson Table_T
, only simpler because it's monomorphic.)
typedef struct Valenv *Valenv;
Valenv mkValenv(Namelist vars, Valuelist vals);
int isvalbound(Name name, Valenv env);
Value fetchval(Name name, Valenv env);
void bindval(Name name, Value val, Valenv env);
Implementation uses pair of lists. Consider
(val x 1)
(val y 2)
(val z 3)
Global environment xi says
nl > z > y > x
vl > 3 > 2 > 1
Representing environments efficiently a key challenge for language implementor. (You'll tackle Exercise 18.)
Let's simplify to
exp ::= numeral
 (op exp exp)
op :: +    *  /
There's no such thing as a "recursive definition." Instead there are only recursion equations and solutions.
Consider
E = Z union { op  e1 in E, e2 in E, op in { +, , *, / } } / \ e1 e2
What is the intersection of all sets satisfying this equation?
Tricky bit: proving the intersection is not empty
Solution is called an "initial algebra" and we won't go there.
Key fact: in any expression, child expressions are smaller
Therefore, recursive functions on expressions terminate
(provided they make recursive calls on child expressions).
This is called structural recursion.
Structural recursion is the simplest known argument for
a. Writing recursive functions
b. Proving they terminate
An expression is evaluated wrt environment (composition of xi, phi, rho)
In our interpreters, implemented by structural recursion
Base cases:
Numerals stand for numbers
Meaning of a name is given by the environment
Questions
syntactic categories and judgement forms
what are the judgment forms?
what's the mathematical interpretation?
what's the operational interpretation?
what's a derivation?
what's a syntactic proof?
what's a metatheoretic proof?
Example: what changes the environment?
set
cannot change the contents of the environmentTime in class is the tip of the iceberg
It's your job to make connections:
Each data type leads to a coding pattern
Metatheory gives interesting facts with easy proofs.
What is the inductive definition of the natural numbers?
To discover recursive functions, write algebraic laws:
exp x 0 = 1
exp x (n + 1) = x * (exp x n)
Can you find a direction in which something gets smaller?
Another example:
binary 0 = 0
binary (2n + 1) = binary n .. 1
binary (2n + 0) = binary n .. 0
Righthand sides in arithmetic?
binary (2n + 1) = 10 * binary n + 1
binary (2n + 0) = 10 * binary n + 0
Which direction gets smaller? What's the code?
Expression, statement, definition, declaration, type
Lots of environments:
Typical complexity!
About any language, you can ask these questions:
What is the abstract syntax? What are the syntactic categories, and what are the terms in each category?
What are the values? What do expressions/terms evaluate to?
What environments are there? That is, what can names stand for?
How are terms evaluated? What are the judgments? (What are the rules?)
What's in the initial basis? Primitives and otherwise, what is built in?
Two new kinds of data:
Pointer to automatically managed cons cell (mother of civilization)
The function closure: not just a datum, a feature
(The setting is Scheme)
LISP, circa 1960; Scheme, circa 1975
Applicative programming ('define a function' vs 'write a program')
('evaluate a function' vs 'run a program')
Instead of mutating machine state, just compute with values
requires higherquality values
Addition by subtraction:
Easier to develop code by case analysis over structure of data
Easier to reason about code (no set
 environments never change)
Powerful tools for composing functions (coming soon)
Syntax (ultimate simple, no operator precedence)
The standard data type is recursive: SExpressions
The standard control structure is (recursive) function call
Programs as data within the language
Automatic memory management (anything else would be uncivilized)
Values are Sexpressions.
Comforting lies:
An Sexpression is a symbol or a literal integer or a literal Boolean or a list of Sexpressions.
A list of Sexpressions is either  the empty list '()
 an Sexpression followed by a list of Sexpressions
Like any other abstract data type
creators create new values of the type
producers make new values from existing values
observers examine values of the type
mutators change values of the type
N.B. creators + producers = constructors
No mutators in uScheme
Creators and producers: nil and cons
'()
(cons S' S)
More examples:
(cons 'a '())
(cons 'a '(b))
Note the literal '(b)
.
Observers: car
, cdr
, null?
, pair?
(also known as "first" and "rest", "head" and "tail", and many other names)
Algebraic laws of lists:
(null? '()) == ...
(null? (cons v vs)) == ...
(car (cons v vs)) == ...
(cdr (cons v vs)) == ...
Do these laws hold more generally?
(car (cons e1 e2)) = ...
What about
set
(strengthen our metatheorem from Monday: if the program has no set
...)print
error
nontermination
More Sexpression observers:
number?
symbol?
symbol literal  symbol?   
integer literal  number?   
cons  pair?  car, cdr 
'()  null?   
More observers: <
, =
, >
Equality: number, symbol, Boolean, nilnever cons cells
Inductively defined data type: small set satisfying this equation:
Inductive definition of lists:
Any list is therefore constructed with nil or with cons.
Example: length
Polymorphic means "operating on values of many types"
The primitive equal?
is polymorphic but returns false on cons cells
It is useful only for comparing atoms
*** Atoms and polymorphic equality ***
(define atom? (x)
(or (number? x)
(or (symbol? x)
(or (boolean? x)
(null? x)))))
(define equal? (s1 s2)
(if (or (atom? s1) (atom? s2))
(= s1 s2)
(and (equal? (car s1) (car s2))
(equal? (cdr s1) (cdr s2)))))
cons
Allocation is the big cost.
Algebraic law for append, by cases, what is xs ys
?
'() .. ys == ys
(x .. xs) .. ys == x .. (xs .. ys)
(cons x xs) .. ys = (cons x (xs .. ys))
From there to
*** Equations and function for append ***
(append '() ys) == ys
(append (cons z zs) ys) == (cons z (append zs ys))
(define append (xs ys)
(if (null? xs) ys
(cons (car xs) (append (cdr xs) ys))))
Why does it terminate?
How many cons cells are allocated?
(N.B. You won't find xs
in the book; that update missed the print deadline. The book calls a list l
, which is not as good.)
Other laws
(append (list1 x) ys) = ???
(if p #t #f) = ???
What about list reversal?
reverse '() = '()
reverse (x .. xs) = reverse xs .. reverse x = reverse xs .. x
And the code?
*** Naive list reversal ***
(define reverse (xs)
(if (null? xs) '()
(append (reverse (cdr xs))
(list1 (car xs)))))
How many cons cells are allocated?
Let's try a new algebraic law:
reverse (x .. xs) .. zs = reverse xs .. x .. zs = reverse xs .. (cons x zs)
reverse '() .. zs = zs
The code
*** Reversal by accumulating parameters ***
(define revapp (xs zs)
(if (null? xs) zs
(revapp (cdr xs) (cons (car xs) zs))))
(define reverse (xs) (revapp xs '()))
Parameter zs
is the accumulating parameter.
(A powerful, general technique.)
Implementation: list of keyvalue pairs
'((k1 v1) (k2 v2) ... (kn vn))
Picture with spine of cons cells, car
, cdar
, caar
, cadar
.
*** Alist observer: find ***
(define caar (xs) (car (car xs)))
(define cdar (xs) (cdr (car xs)))
(define cadar (xs) (car (cdar xs)))
(define find (x alist)
(if (null? alist) '()
(if (equal? x (caar alist))
(cadar alist)
(find x (cdr alist)))))
*** Alist producer: bind ***
(define bind (x y alist)
(if (null? alist)
(list1 (list2 x y))
(if (= x (caar alist))
(cons (list2 x y) (cdr alist))
(cons (car alist)
(bind x y (cdr alist))))))
*** Alist example ***
> (find 'Room '((Course 105) (Room H111A)
(Instructor Ramsey)))
H111A
> (val nr (bind 'Office 'Extension006
(bind 'Courses '(105)
(bind 'Email 'comp105staff '()))))
((Email comp105staff) (Courses (105)) (Office Extension006))
> (find 'Office nr)
Extension006
> (find 'Favoritefood nr)
()
Notes:
*** Laws of assocation lists ***
(find k (bind k v l)) = v
(find k (bind k' v l)) = (find k l), provided k != k'
(find k '()) = '()  bogus!
Depth, not breadth.
Today:
Notes:
The people upstairs broke clang. I think I fixed it.
If you are only getting started, you are in for a rough week.
Local variables: the two honorable choices
Simplify, simplify, simplify
Accumulating parameters can improve efficiency but are usually less readable.
Delay by snow means Andrew and I just started reading yesterday.
Option: Show your theory work to your section leader.
Metatheory is a powerful, wonderful tool, but you can get by with some soft spots in your understanding.
Important: understand and use operational semantics as a descriptive tool. You must understand this completely.
Example: undeclared variables:
x notin dom xi x notin dom rho

<VAR(x), xi, phi, rho> evals < ???? >
let
binding *** Introduce local names into environment ***
(let ((x1 e1)
...
(xn en))
e)
Evaluate e1
through en
, bind answers to x1
, ... xn
Creates new environment for local use only:
rho {x1 > v1, ..., xn > vn}
Also let*
(one at a time) and letrec
(local recursive functions)
Note that we really have definititions and it might be easier to read if McCarthy had actually used definition syntax, which you'll see in ML, Haskell, and other functional languages:
*** What McCarthy should have done ***
(let ((val x1 e1)
...
(val xn en))
e)
Things that should offend you about Impcore:
Look up function vs look up variable requires different interfaces!
To get a variable, must check 2 or 3 environments.
All these problems have one solution: lambda
From Church's lambdacalculus:
(lambda (x) (+ x x))
"The function that maps x to x plus x"
At top level, like define
. (Or more accurately, define
is a synonym for lambda
.)
In general, \x.E
or (lambda (x) E)
x
is bound in E
E
The ability to "capture" free variables is what makes it interesting:
(lambda (x) (+ x y)) ; means what??
What matters is that y
can be a parameter or letbound variable of an enclosing functions.
First example:
*** A zerofinder ***
(define findzerobetween (f lo hi)
; binary search
(if (>= (+ lo 1) hi)
hi
(let ((mid (/ (+ lo hi) 2)))
(if (< (f mid) 0)
(findzerobetween f mid hi)
(findzerobetween f lo mid)))))
(define findzero (f) (findzerobetween f 0 100))
findzerobetween
is a higherorder function because it takes another function as an argument. You can do this in C.
*** Function escapes! ***
> (define tothenminusk (n k)
(let
((xtothenminusk (lambda (x)
( (exp x n) k))))
xtothenminusk))
> (val xcubedminus27 (tothenminusk 3 27))
> (xcubedminus27 2)
19
tothenminusk
is a higherorder function because it returns another (escaping) function as a result. Much more interesting!
*** No need to name the escaping function ***
> (define tothenminusk (n k)
(lambda (x) ( (exp x n) k)))
> (val xcubedminus27 (tothenminusk 3 27))
> (xcubedminus27 2)
19
*** The cube root of 27 and the square root of 16 ***
> (findzero (tothenminusk 3 27))
3
> (findzero (tothenminusk 2 16))
4
*** What's an "escaping" function? ***
> (define tothenminusk (n k)
(lambda (x) ( (exp x n) k)))
"Escape" means "outlive the activation in which the lambda
was evaluated."
Typically returned
More rarely, stored in a global variable or a heapallocated data structure
Escaping and lifetimes are some of those universal decisions every programmer has to think about.
Where are n
and k
stored???
C programmers do this explicitly with malloc
In a language with firstclass, nested functions, part of the semantics of lambda
Closures:
Operational semantics of closures:
Preview: in math, what is the following equal to?
(f o g)(x) == ???
Another algebraic law, another function:
*** Functions create new functions ***
> (define o (f g) (lambda (x) (f (g x))))
> (define even? (n) (= 0 (mod n 2)))
> (val odd? (o not even?))
> (odd? 3)
> (odd? 4)
Plan for today:
Preparation for some particular homework problems
An important example of functions creating functions: Currying (and its partner, partial application)
Start (but don't finish) HOFs on lists
Imposing your will on Sexpressions
Must use data to guide the code
Yet data types exist only in the mind of the programmer (a weakness Scheme shares with the scripting languages)
Board:
LIST(A) = UNIT u A * LIST(A)
ALIST(K, V) = LIST( { (list2 k v)  k in K, v in V } )
Not every Sexpression is a list:
Rose trees ($k$ary trees):
Board:
ROSE(A) = A u LIST(ROSE(a))
Your job:
Figure out the recursion pattern that goes with ROSE
trees.
For problem 20 and problem T, don't wait for continuations
(figure out now what is the inductive definition of an algebraic formula)
Write formula?
. Write satisfies?
.
Q: What can you do if you write a function that consumes a function?
A: apply it!
You can't mutate or modify a function, but you can create as many new ones as you like:
*** SLIDE ***
> (define o (f g) (lambda (x) (f (g x))))
*** Classic functional technique: Currying ***
> (val positive? (lambda (y) (< 0 y)))
> (positive? 3)
> (val <curried (lambda (x) (lambda (y) (< x y))))
> (val positive? (<curried 0)) ; "partial application"
> (positive? 0)
No need to Curry by hand!:
What's the algebraic law for curry
?
... (curry f) ... = ... f ...
Keeping in mind all you can do with a function is apply it?
*** Bonus content: vulnerable variables? ***
> (val seed 1)
> (val rand (lambda ()
(set seed (mod (+ (* seed 9) 5) 1024)))))
> (rand)
14
> (rand)
131
> (set seed 1)
1
> (rand)
14
*** Bonus: Lambda as abstraction barrier! ***
> (val mkrand (lambda (seed)
(lambda ()
(set seed (mod (+ (* seed 9) 5) 1024))))))
> (val rand (mkrand 1))
> (rand)
14
> (rand)
131
> (set seed 1)
error: set unbound variable seed
> (rand)
160
Capture common patterns of computation
exists?
, all?
(is there a number?)filter
(take only the numbers?)map
(add 1 to every element)Folds also called reduce
, accum
, "catamorphism"
exists?
Algebraic law:
(exists? p? '()) == ???
(exixts? p? '(cons a as)) == ???
*** SLIDE ***
> (define exists? (p? xs)
(if (null? xs)
(if (p? (car xs))
(exists? p? (cdr xs)))))
> (exists? pair? '(1 2 3))
> (exists? pair? '(1 2 (3)))
> (exists? ((curry =) 0) '(1 2 3))
> (exists? ((curry =) 0) '(0 1 2 3))
filter
Problem: Give me a list of numbers; return only the even elements.
What are the laws?
(filter even? '()) == ???
(filter even? '(cons a as)) == ???
(filter even? '(cons a as)) == ???
*** SLIDE ***
> (define filter (p? xs)
(if (null? xs)
'()
(if (p? (car xs))
(cons (car xs) (filter p? (cdr xs)))
(filter p? (cdr xs)))))
> (filter (lambda (n) (> n 0)) '(1 2 3 4 5 6))
(1 2 5 6)
> (filter (lambda (n) (<= n 0)) '(1 2 3 4 5 6))
(3 4)
> (filter ((curry <) 0) '(1 2 3 4 5 6))
(1 2 5 6)
> (filter ((curry >=) 0) '(1 2 3 4 5 6))
(3 4)
*** List filtering: composition revisited ***
> (val positive? ((curry <) 0))
<procedure>
> (filter positive? '(1 2 3 4 5 6))
(1 2 5 6)
> (filter (o not positive?) '(1 2 3 4 5 6))
(3 4)
map
Problem: square every element of a list.
What are the laws?
(map square '()) ==
(map square (cons a as)) ==
*** SLIDE ***
> (define map (f xs)
(if (null? xs)
'()
(cons (f (car xs)) (map f (cdr xs)))))
> (map number? '(3 a b (5 6)))
(#t #f #f #f)
> (map ((curry *) 100) '(5 6 7))
(500 600 700)
> (val square* ((curry map) (lambda (n) (* n n))))
<procedure>
> (square* '(1 2 3 4 5))
(1 4 9 16 25)
fold
Another view of operator folding:
foldr
is a catamorphism
Another catamorphism: foldl
associates to left
*** Existence, by laws ***
(exists? p? '()) == #f
(exists? p? '(cons a as)) == (p? a) \/ (exists p? as)
*** Filter, by laws ***
(filter p? '()) == '()
(filter p? '(cons a as)) == (cons a (filter p? as)),
if (p? a)
(filter p? '(cons a as)) == (filter p? as),
otherwise
*** Filter, by code ***
> (define filter (p? xs)
(if (null? xs)
'()
(if (p? (car xs))
(cons (car xs) (filter p? (cdr xs)))
(filter p? (cdr xs)))))
> (filter ((curry <) 0) '(1 2 3 4 5 6))
(1 2 5 6)
> (filter ((curry >=) 0) '(1 2 3 4 5 6))
(3 4)
What have we got:
What we haven't got:
map
Problem: square every element of a list.
What are the laws?
(map square '()) ==
(map square (cons a as)) ==
*** SLIDE ***
> (define map (f xs)
(if (null? xs)
'()
(cons (f (car xs)) (map f (cdr xs)))))
> (map number? '(3 a b (5 6)))
(#t #f #f #f)
> (map ((curry *) 100) '(5 6 7))
(500 600 700)
> (val square* ((curry map) (lambda (n) (* n n))))
<procedure>
> (square* '(1 2 3 4 5))
(1 4 9 16 25)
fold
* SLIDE* > (define foldr (plus zero xs) (if (null? xs) zero (plus (car xs) (foldr plus zero (cdr xs))))) > (val sum (lambda (xs) (foldr + 0 xs))) > (val prod (lambda (xs) (foldr * 1 xs))) > (sum '(1 2 3 4)) 10 > (prod '(1 2 3 4)) 24
Another view of operator folding:
foldr
is a catamorphism
Another catamorphism: foldl
associates to left
Direct style  last action is to return a value
Continuationpassing style  last action is to "throw" value to a continuation
Simulate with an ordinary tail call.
How functions finish:
Problem in interface design:
Ideas?
Bad choices:
'fail
Good choice:
Solution: new interface:
*** Coding \lit{witness} with continuations ***
(define witnesscps (p? xs succ fail)
(if (null? xs)
(fail)
(let ((x (car xs)))
(if (p? x)
(succ x)
(witnesscps p? (cdr xs) succ fail)))))
Example use: instructor lookup:
*** Simple continuations at work ***
> (val 2011s '((Ramsey 105)(Hescott 170)(Chow 116)))
> (instructorinfo 'Ramsey 2011s)
(Ramsey teaches 105)
> (instructorinfo 'Chow 2011s)
(Chow teaches 116)
> (instructorinfo 'Souvaine 2011s)
(Souvaine isnotonthelist)
Crucial question:
Review: CPS as an interface:
Writing code in CPS:
{Continuations for search problems}:
Concluding remark (board): functions are cheap
First three of five questions: Syntax, values, environments
Key changes from Impcore
New constructs: let, lambda, application (not just names)
New values, including functions (closures)
A single environment
Environments get copied
Environment maps names to mutable locations, not values
{\uscheme\ vs \impcore \ (Probably overdue)}:
{Evaluation rules}:
{Implementation of closures}:
{Applying closures}:
{Locations in closures}:
On the heap, but
rho
Example: SML/NJ has closure register, arg register.
{Closures in pictures}:
Closure optimizations
Major issue in making functional programs efficient
eliminating closures (when functions don't escape)
{\uscheme\ in one slide}:
A Scheme program is just another Sexpression
and
, let
, others... *** Real Scheme Conditionals ***
(cond (c1 e1) ; if c1 then e1
(c2 e2) ; else if c2 then e2
... ...
(cn en) ; else if cn then en
; Syntactic sugar'if' is a macro:
(if e1 e2 e3) == (cond (e1 e2)
(#t e3))
A call in tail position.
What is tail position?
Tail position is defined inductively:
(if e1 e2 e3)
is in tail position, so are e2
and e3
(let (...) e)
is in tail position, so is e
, and similary for letrec
and let*
.(begin e1 ... en)
is in tail position, so is en
.Anything in tail position is the last thing executed!
Key idea is tailcall optimization!
Tailcall optimization:
Example of tail position:
Example of tail position:
Question: how much stack space is used by the call?
Another example of tail position:
Another example of tail position:
Question: how much stack space is used by the call?
A third example of tail position:
A third example of tail position:
Question: how much stack space is used by the call?
Question: in your previous life, what did you call a construct that
Common Lisp, Scheme:
Down sides:
Bottom line: it's all about lambda
Problems A and 12 (the strange let binding): making your head explode, creating room for new knowledge
Problem 1: understanding the structure of data and the recursion patterns that go with it (preparation for the solver)
Problems 5 and 6: mastering higherorder functions
Problem 13 (characteristic functions): understanding lambda
Problem B (quicksort): putting it all together in an application
Problems 20 and T (formula solver): inductive data types, continuationpassing style
Problems 26 and 35: more practice with semantics and implementation
Back to fundamentals:
Wellchosen names
Documentation of contracts, not implementations (as a good example, think witnesscps
)
Apply your new knowlwedge in Standard ML:
Much less intellectual effort
What's the one thing I wanted you to remember from Friday's lecture?
Different kind of effort:
This week:
Meta: Not your typical introduction to a new language
ML Overview
Designed for programs, logic, symbolic data
Apples to Oranges: uScheme in ML
Theme: talking about data
Board: ML = uScheme + pattern matching + exceptions + static types
Three new ideas:
And lots of new concrete syntax!
MLThe Five Questions
Syntax: expressions, definitions, patterns, types
Values: num/string/bool, record/tuple, \rlap{\high{algebraic data}}
Environments: names stand for values (and types)
Evaluation: uScheme + case
and pattern matching
Initial Basis: medium size; emphasizes lists
\medskip
(Question Six: type systema coming attraction)
A note about books
Ullman is easy to digest
Ullman is clueless about good style
Suggestion:
(My style is not flawlesssome Haskell style has crept inbut it will serve you better than Ullman. And I've made some effort at consistency.)
Tidbits:
The most important idea in ML!
Originated with Hope (Burstall, MacQueen, Sannella), in the same lab as ML, at the same time (Edinburgh!)
(Wikipedia article among the worst ever.)
Board:
A "Boolean" is produced using true
or false
A "list of A" is produced using nil
or a :: as
, where a
is an A and as
is a "list of A"
A "Rose tree of A" is produced using a A or a list of rose trees of A
*** SLIDE ***
datatype bool = true  false (* copy me NOT! *)
datatype 'a list = nil (* copy me NOT! )  op :: of 'a 'a list
datatype 'a srose = LEAF of 'a  INTERNAL of 'a srose list
type bool val true : bool type 'a list val false : bool type 'a srose val nil : forall 'a . 'a list val op :: : forall 'a . 'a * 'a list > 'a list val LEAF : forall 'a . 'a > 'a srose val INTERNAL : forall 'a . 'a srose list > 'a srose
Exegesis (on board):
Notation 'a
is a type variable
Name before =
introduces a new type constructor into the type environment. Type constructors may be nullary.
Alternatives separated by bars are value constructors of the type
They are new and hide previous names
(Do not hide builtin names list
and bool
from the initial basis!)
Value constructors participate in pattern matching (coming)
Complete by themselves: true
, false
, nil
Expect parameters to make a value or pattern: ::
, LEAF
, INTERNAL
op
enables an infix operator to appear in a nonfix context
Type application is postfix
'a srose
is the type parameter to list
New names into two environments:
bool
, list
, srose
stand for new type constructors
true
, false
, nil
, ::
, LEAF
, INTERNAL
stand for new value constructors
Algebraic datatypes are inherently inductive (list
appears in its own definition)to you, that means finite trees
'a * 'a list
is a pair type  infix operators are always applied to pairs
*** Syntactic sugar for lists ***
 1 :: 2 :: 3 :: 4 :: nil; (* :: associates to the right *)
> val it = [1, 2, 3, 4] : int list
 INTERNAL (LEAF 1 :: INTERNAL (LEAF 2 :: LEAF 3 :: nil) ::
LEAF 4 :: nil);
> val it = INTERNAL [LEAF 1, INTERNAL [LEAF 2, LEAF 3],
LEAF 4] : int srose
Consuming algebraic types
New language construct case
(an expression)
fun length xs =
case xs
of [] => 0
 (x::xs) => 1 + length xs
Exegesis:
fun
is close to define
nil
+
is an infix identifiercase
is checked by the compiler:
case
is still no fun!Types and their uses:
Type  Produce  Consume 
Introduce  eliminate  
arrow  Lambda (fn)  Application 
algebraic  Apply constructor  Pattern match 
tuple  (e1, ..., en)  Pattern match! 
Consuming algebraic typesbetter
Case analysis in definition
fun length [] = 0
 length (x::xs) = 1 + length xs
(Very close to an algebraic law.)
This is the standard.
Harper: my students don't believe...
*** Type of a function consuming lists ***
 map;
> val ('a, 'b) it = fn : ('a > 'b) > 'a list > 'b list
(* means this:
val map: forall 'a, 'b . ('a > 'b) > ('a list > 'b list)
^^^^^^^^^^^^^^^
*)
Milner left out the forall
. A tragic mistake ("language you need a PhD to use").
And the arrow type associates to the right
*** Code for a function consuming lists ***
fun map f [] = []
 map f (x::xs) = f x :: map f xs
(* FUNCTION APPLICATION HAS HIGHER
PRECEDENCE THAN ANY INFIX OPERATOR! *)
*** SLIDE ***
val filter : forall 'a . ('a > bool) >
'a list > 'a list
fun filter p [] = []
 filter p (x::xs) =
case p x
of true => x :: filter p xs
 false => filter p xs
fun filter p [] = []
 filter p (x::xs) = (* 'if' better than 'case' *)
if p x then x :: filter p xs
else filter p xs
Exercise  write
val flatten : forall 'a . 'a srose > 'a list
where
datatype 'a srose
= LEAF of 'a
 INTERNAL of 'a srose list
*** SLIDE ***
fun flatten (LEAF a) = [a]
 flatten (INTERNAL ts) = (List.concat o map flatten) ts
fun flatten (LEAF a) = [a]
 flatten (INTERNAL ts) = List.concat (map flatten ts)
fun flatten t =
let fun flatapp (LEAF a, tail) = a :: tail
 flatapp (INTERNAL ts, tail) =
foldr flatapp tail ts
in flatapp (t, [])
end
Board:
curry flip infixr 0 $ optmap
uncurry fst
snd
Basic values and expression
Infix arithmetic with "high minus"
~3 + 3 = 0
String concatenation: concat
, infix hat ^
Shortcircuit andalso
, orelse
for Booleans
Strings "as in C and C++"
Characters #"a"
, #"b"
, ...
'
marks a type variable or is "prime"Identifiers
Formation rules:
_
, prime '
+/*<>=!@#$%^&~\?:
(and `)'
? You're a type variableFixity:
infixr 5 @ (* associativity, precedence *)
Revert using op
*** Lists and fixity ***
 [+, , *, div];
! Toplevel input:
! [+, , *, div];
! ^
! Illformed infix expression
 [op +, op , op *, op div];
> val it = [fn, fn, fn, fn] : (int * int > int) list
 length it;
> val it = 4 : int

The value environment:
Beware the semicolon!
Interactive REPL uses semicolons
Never put a semicolon in a file
(Did I mention Ullman was clueless?)
Nesting environments
At top level, definitions
Definitions contain expressions:
def ::=
val
ident=
exp
Expressions contain definitions:
exp ::=
let
defsin
expend
Sequence of defs has letstar semantics
Structured values and terms: lists:
Structured values and terms: tuples:
Function pecularities: lambda
Lambda is spelled fn
:\normalsize
val rec factorial =
fn n => if n = 0 then 1 else n * fact (n1)
Function pecularities: 1 argument
Each function takes 1 argument, returns 1 result
For "multiple arguments," use tuples!\normalsize
fun factorial n =
let fun f (i, prod) =
if i > n then prod else f (i+1, i*prod)
in f (1, 1)
end
fun factorial n = (* you can also Curry *)
let fun f i prod =
if i > n then prod else f (i+1) (i*prod)
in f 1 1
end
Tuples are "usual and customary."
Mutual recursion:
Patternmatching in definitions
def ::=
val
pattern=
exp
Example:
val (pivot, left, right) = split xs
Function definition:
def ::=
fun
ident patterns=
exp
{
ident patterns=
exp }
Cases must be exhaustive! (Overlap discouraged)
What is a pattern?
pattern ::= variable
 wildcard
 valueconstructor [pattern]
 tuplepattern
 recordpattern
 integerliteral
 listpattern
Design bug: no lexical distinction between
Workaround: programming convention
Patternmatching example:
Type examples:
Values and types:
Syntax of ML types:
Syntax of ML types:
Polymorphic types:
Old and new friends:
Board:
Syntax:
raise E
where E : exn
exception EmptyQueue
Semantics:
Handler uses pattern matching
e handle pat1 => e1  pat2 => e2
*** Exception handling in action ***
loop (evaldef (reader (), rho, echo)) handle EOF => finish ()  Div => continue "Division by zero"  Overflow => continue "Arith overflow"  RuntimeError msg => continue ("error: " ^ msg)  IO.Io {name, ...} => continue ("I/O error: " ^ name)  SyntaxError msg => continue ("error: " ^ msg)  NotFound n => continue (n ^ "not found")
Gotcha  Value polymorphism:
*** Gotcha  overloading ***
 fun plus x y = x + y;
> val plus = fn : int > int > int
 fun plus x y = y + y : real;
> val plus = fn : real > real > real
Gotcha  equality types:
Gotcha  parentheses
Put parentheses around anything with 
case
,handle
,fn
Function application has higher precedence than any infix operator
To run Moscow ML:
ledit mosml P full
And tell it:
help "";
use "warmup.sml";
Also /usr/bin/sml
, /usr/bin/mlton
Recursion for lists:
(if (null? xs)
... one case ...
... the other case ...)
Recursion for rose trees:
(if (atom? t)
... one case ...
... the other case ..)
Never correct:
... (atom? (car l)) ... ; there is no such data structure
Code improvement:
(append (list1 x) l)
[N.B. "Static" means "at compile time"]
Why have a static type system?
1970 answer: efficient code
Why are we learning about types?
But why are we learning this theory stuff?
Two flavors:
What is a type?
As a working definition, a set of values
As a precise definition, a classifier for terms!!
Examples: types
These are types:
int
bool
int * bool
int * int > int
Board: little language of types
Examples: not really types
These "types in waiting" don't classify any terms
list
(but int list
is a type)array
(but char array
is a type)ref
(but (int > int) ref
is a type)These are utter nonsense
int int
bool * array
Type constructors
Technical name for "types in waiting"
Zero or more args, produce type
int
, bool
, char
also called base typeslist
, array
, ref
>
More complex type constructors:
Rules for constructors:
Connecting types to values:
Type soundness:
Types help programmers
What do I write next?
Plan:
Typical syntactic support for types
Explicit types on lambda
and define
:
For lambda
, argument types:\normalsize
(lambda ((int n) (int m)) (+ (* n n) (* m m)))
For define
, argument and result types:
(define int max ((int x) (int y)) (if (< x y) y x))
Abstract syntax: \small
datatype exp = ...
 LAMBDA of (name * tyex) list * exp
...
datatype def = ...
 DEFINE of name * tyex * ((name * tyex) list * exp)
...
Type formation rules (Typed Impcore):
Board: what datatype goes with these rules?
Typical syntactic support for types
Explicit types on lambda
and define
:
For lambda
, argument types:\normalsize
(lambda ((int n) (int m)) (+ (* n n) (* m m)))
For define
, argument and result types:
(define int max ((int x) (int y)) (if (< x y) y x))
Type judgments for monomorphic system
Two judgments
Type environment gives type of each variable.
Type soundness again:
Type rules for variables:
Notice: one rule for if!! (and while)
Type rules for control:
Product types: both x and y:
Pair rules generalize to product types with many elements (tuples,''
structs,'' ``records'')
Sum types: either x or y:
Sum types generalize to ``union'' of N alternatives
*** Digression: Product and sum in ML ***
type ('a, 'b) prod = 'a * 'b
val pair = fn (a, b) => (a, b)
val fst = fn (a, b) => a
val snd = fn (a, b) => b
datatype ('a, 'b) sum (* often "either" *)
= LEFT of 'a
 RIGHT of 'b
Arrow types: function from x to y:
Array types: array of x:
Type checking:
Board: Lambda rule
*** Typechecking LAMBDA ***
datatype exp = LAMBDA of (name * tyex) list * exp
...
fun ty (LAMBDA (formals, body)) =
let val Gamma' = (* body gets new env *)
foldl (fn ((n, ty), g) => bind (n, ty, g))
Gamma formals
val bodytype = typeof (body, Gamma')
val formaltypes =
map (fn (n, ty) => ty) formals
in funtype (formaltypes, bodytype)
end
Monomorphic types are limiting
Each new type constructor requires
Monomorphic burden: Array types:
Notes:
Monomorphism hurts programmers too
Users can't create new syntax
Userdefined functions are monomorphic:\normalsize
(define int lengthI ((list int) l)
(if (null? l) 0 (+ 1 (lengthI (cdr l)))))
(define int lengthB ((list bool) l)
(if (null? l) 0 (+ 1 (lengthB (cdr l)))))
(define int lengthS ((list sym) l)
(if (null? l) 0 (+ 1 (lengthS (cdr l)))))
Quantified types:
*** Substitute for quantified variables ***
> length
<proc> : (forall ('a) (function ((list 'a)) int))
> (@ length int)
<proc> : (function ((list int)) int)
> (length '(1 2 3))
type error: function is polymorphic; instantiate before applying
> ((@ length int) '(1 2 3))
3 : int
*** Substitute what you like ***
> length
<proc> : (forall ('a) (function ((list 'a)) int))
> (@ length bool)
<proc> : (function ((list bool)) int)
> ((@ length bool) '(#t #f))
2 : int
*** More "Instantiations" ***
> (val lengthint (@ length int))
lengthint : (function ((list int)) int)
> (val consbool (@ cons bool))
consbool : (function (bool (list bool))
(list bool))
> (val cdrsym (@ cdr sym))
cdrsym : (function ((list sym)) (list sym))
> (val emptyint (@ '() int))
() : (list int)
Bonus instantiation:
> map
<proc> :
(forall ('a 'b)
(function ((function ('a) 'b)
(list 'a))
(list 'b)))
> (@ map int bool)
<proc> : (function ((function (int) bool)
(list int))
(list bool))
Create your own!
Abstract over unknown type using typelambda
> (val id (typelambda ('a)
(lambda (('a x)) x)))
id : (forall ('a) (function ('a) 'a))
\large 'a
is type parameter (an unknown type)
This feature is parametric polymorphism
Board: two forms of abstraction:
term  type 
`lambda`  `function` (arrow) 
`typelambda`  `forall` 
Power comes at notational cost
Function composition
> (val o (typelambda ('a 'b 'c)
(lambda (((function ('b) 'c) f)
((function ('a) 'b) g))
(lambda (('a x)) (f (g x))))))
o : (forall ('a 'b 'c)
(function ((function ('b) 'c)
(function ('a) 'b))
(function ('a) 'c)))
Aka \black o : \mbox{$\forall \alpha, \beta, \gamma \mathrel. (\beta \arrow \gamma) \cross (\alpha \arrow \beta) \arrow (\alpha \arrow \gamma) $}
Midterm evaluations (board  participation grade)
Typed uScheme policy reversal
Fresh issue of extension tokens to total 7
(blots!)
Also length
considered harmful
*** SLIDE ***
datatype 'a tree
= NODE of 'a tree * 'a * 'a tree
 LEAF
*** SLIDE ***
fun tfr f LEAF tail = tail
 tfr f (NODE(LEAF, x, LEAF)) tail = f (x, tail)
 tfr f (NODE(LEAF, x, right)) tail =
f (x, tfr f right tail)
 tfr f (NODE(left, x, LEAF)) tail =
tfr f left (f (x, tail))
 tfr f (NODE(left, x, right)) tail =
tfr f left (f (x, tfr f right tail))
*** SLIDE ***
fun length (nil) = 0
 length (x::xs) =
foldr (fn (_, n) => n + 1) 0 (x::xs)
From a monomorphic language to a polymorphic language:
Shrink the language (always an agenda)
begin{center} \trule {\ptypeis e {\forall\ldotsn\alpha.\tau}} {\ptypeis {\xtyapply(e, \ldotsn\tau)} {\tau[\ldotsmapston \alpha \tau]}} \end{center} es
Generalize with typelambda:
Arrowintroduction, corrected {\&} compared:
Type formation through kinds:
Use kinds to give arities:
The kinding judgment:
*** The phase distinction revealed ***
fun checkThenEval (d, (delta, gamma, rho)) =
let val (gamma', tystring) = elabdef (d, gamma, delta)
val (rho', valstring) = evaldef (d, rho)
val _ = print (valstring ^ " : " ^ tystring)
in (delta, gamma', rho')
end
How to write elabdef
:
Kinding rules for types:
Three environments:
Three environments revealed:
Three environments:
Three environments revealed:
Relationship between case
and either
:
case e
of LEFT x => a
 RIGHT y => b
becomes
(either e
(lambda (x) a)
(lambda (y) b))
and it is continuationpassing style
Why do this???*
*** Working up to type inference ***
> (define double (x) (+ x x))
double ;; uScheme
> (define int double ((int x)) (+ x x))
double : (function (int) int) ;; Typed uSch.
> (define double (x) (+ x x))
double : int > int ;; uML
*** Infer a polymorphic type ***
> (lambda (f) ;; uScheme
(lambda (x) (f (f x))))
<procedure>
> (typelambda ('a) ;; Typed uScheme
(lambda (((function ('a) 'a) f))
(lambda (('a x)) (f (f x)))))
<procedure> : (forall ('a)
(function ((function ('a) 'a)) (function ('a) 'a)))
> (lambda (f) ;; uML
(lambda (x) (f (f x))))
<procedure> : forall 'a . ('a > 'a) > ('a > 'a)
 fn f => fn x => f (f x); ;; Moscow ML
> val 'a it = fn : ('a > 'a) > 'a > 'a
The HindleyMilner type system
Restricted polymorphism:
Advantage: no notation
let
/val
bindingML to Typed uScheme:
val id = fn x => x
becomes
(val id (typelambda ('a) (lambda (('a x)) x)))
Term
3 :: []
becomes
((@ cons int) 3 (@ '() int))
Representing HindleyMilner types:
*** Type schemes in codeland ***
datatype ty
= TYVAR of name
 TYCON of name
 CONAPP of ty * ty list
datatype type_scheme
= FORALL of name list * ty
Key ideas:
Key ideas repeated:
{ML type rules can be nondeterministic!}:
{ML type rules, continued}:
{Things to notice}:
{Type inference}:
{Instances and substitution}:
{Instance properties}:
{Instantiation intuition}:
{More instantiation}:
John Hughes roadshow 29 March
Tucket Taft 31 March
Board
{Unification}:
{Most General Unifiers}:
{Implementing unification}:
{From type rules to type inference}:
{Type rules to type inference, cont'd}:
{Type rules to type inference, finish}:
Unification review:
{Type inference review: Lambda}:
{Type rules to type inference: apply}:
{More explicit substitutions}:
{Type inference, operationally}:
{Implementing type inference}:
{Type inference example}
Try
val f = fn x => op + (x, 1)
{Type systems: Things to remember}:
On to the lambda calculus!
What is theory and why is it important:
Precise answer to what does this program mean
Guides implementors and programmers
Elements of theory
World's simplest language
exp ::= var  \var.exp  exp exp
Also typically written as
M,N ::= x  \x.M  M N
Application associates to the left and binds tighter than abstraction
(Equivalent: body of lambdaterm extends to the closing right paren)
Lambda is as powerful as any language known
Ideal for
Abstract syntax: application, abstraction, variable
Values: no values! Values are just terms
Environments: not used! (But names stand for terms)
Evaluation rules: coming
Initial basis: often add constants like + or 2
New kind of semantics: smallstep
New judgment form
M > N ("M reduces to N in one step")
No context!! No turnstile!!
Just pushing terms around == calculus
{Reduction rules}:
Example reduction to normal form:
(\x.\y.yx)(+ 3 4)(\x.+ x 2) > (beta)
(\y.y(+ 3 4))(\x.+ x 2) > (beta)
(\x.+ x 2)(+ 3 4) > (beta)
+(+ 3 4)2
This is a normal form. If we have rules for constants, we can go further (to 9)
The substitution in the beta rule is the heart of the lambda calculus
Free variables:
Examples of free variables:
\x . + x y
\x. \y. x
Substitution:
Over 20 pages of feedback
Focus on quick improvements
Substitution:
Example:
(\yes.\no.yes)(\time.no) >
\z.\time.no
and never
\no.\time.no // WRONG!!!!!!
{Renaming of bound variables}:
Conversion:
Nondeterminism of conversion:
A
/ \
V V
B C
Now what??
ChurchRosser Theorem:
What if we have A but no B with A > B
(Each set of reduction rules induces its own normal form)
Normal forms are values:
Given a betaredex
(\x.M) N
do the betareduction only if N
is in normal form
Q: Does applicative order ever prevent you from making progress?
A: No. We can prove it by induction on the number of lambdas in N
Always choose leftmost, outermost redex
"Normalorder" stands for produces a normal form, not for "the normal way of doing things"
Normalorder illustration:
Not your typical callbyvalue semantics!
Anybody remember propositions, logic, tautologies?
Logic  PL 
Propositions  Types 
Proofs  Terms 
Basis of many modern theorem provers
Logical formulas:
A > B > A
A > B > B
A /\ B > A
A /\ B > B
A > A \/ B
B > A \/ B
Types:
A > B > A
A > B > B
A * B > A
A * B > B
A > A + B
B > A + B
What terms have these types?
Is
A > B
a tautology?
Q: What term could possibly have that type?
A: Type soundness guarantees that if it terminates, it's good
A: So, a term that doesn't terminate
Q: How would you represent a pair, using only functions?
Followup: given a pair, what can a function do
A: continuationpassing style
Q: So then what function type might represent A * B
A: forall C . (A > B > C) > C
Followup: does (forall C . (A > B > C) > C) > A
?
A: Yes; substitute A for C and ((A > B > A) > A) > A
Followup: a term of that type?
A: \f. f (\a.\b.a)
Agenda: code a simple functional language, like Scheme, using lambda calculus.
Idea: continuationpassing style
if M then N else P ===> M N P
M
chooses N
or P
based on truth or falsehood?
Q: How are true and false coded??
A: \x.\y.x
and \x.\y.y
Q: What is the capability of a pair?
A: To access both elements
Q: Then what does the continuation look like?
A: Takes both elements and produces an answer
Q: How do we represent the pair (M, N)?
A: \k. k M N
Q: What does the pair
function look like?
A: `pair = \x.\y.\f.f x y
Q: How do we implement fst
and snd
?
A: fst = \p.p(\x.\y.x)
A: snd = \p.p(\x.\y.y)
Q: What is the capability of a pair?
A: To access either one element or the other
Q: How many continuations should a sum expect?
A: Two
Q: What does a continuation look like?
A: Takes one element and produces an answer
Q: How do we represent the value (LEFT M)
A: \f.\g. f M
Q: How do we represent the value (RIGHT N)
A: \f.\g. g N
Q: So what do the LEFT and RIGHT functions look like?
A: left = \a.\f.\g.f a
A: right = \b.\f.\g.g b
Q: How do we do case analysis?
A: Our old friend either
!
case X of LEFT a => M  RIGHT b => N
either X (\a.M) (\b.N)
either s f g = s f g
Simulating Sexpressions:
datatype bool = true  false
datatype 'a heap = EMPTY  HEAP of 'a heap * 'a * 'a heap
datatype 'a seq = ESEQ  SINGLE of 'a  APPEND of 'a seq * 'a seq
What are the Church encodings of the following?
true
false
EMPTY
HEAP
ESEQ
SINGLE
APPEND
Natural numbers as functions:
Church Numerals:
*** Church Numerals to machine integers ***
> (define toint (n)
((n ((curry +) 1)) 0))
> (toint three)
3
> (toint ((times three) four))
12
*** Church Numerals in $\lambda$ ***
zero = \f.\x.x;
succ = \n.\f.\x.f (n f x);
plus = \n.\m.n succ m;
times = \n.\m.n (plus m) zero;
...
> four;
\f.\x.f (f (f (f x)))
> three;
\f.\x.f (f (f x))
> times four three;
\f.\x.f (f (f (f (f (f (f (f (f (f (f (f x)))))))))))
Typed lambda is a simplified version of Typed uScheme:
M ::= x
 \x:tau . M
 M N
 /\'a . M
 M [tau] // instantiation
 constant
tau ::= mu // type constructor
 'a // type variable
 forall 'a . tau
 tau tau
There is also a simplytyped lambda calculus without typelambda, instantiation, or quantified types.
Church numerals in a typed calculus:
Lambda calculus is just pushing terms around
Operational semantics is just executing some automaton
How do we justify calling it "functional programming?" Where are the functions????
Functions are mathematical objects
Denotational semantics says what mathematical object corresponds to
A function is a value.
If values are V, then V must include V > V.
What is V > V? It can't be all functions from V to V
But wait! There are only countably many source codes!
Intuition: include only those functions whose behavior is required by the source code
Two heroes here:
Dana Scott figured out which mathematical functions (and other objects) are denoted by computer programs. He described semantic domains; you'll get a taste of domain theory.
Christopher Strachey related it to language by syntaxdirected definitions which are compositional
Scott's ideas underly classical compiler optimizations based on dataflow analysis. Plus they apply to all recursion equations
Strachey's ideas are used in every compiler and interpreter.
(N.B. First book ever written on these topics was written by Joe Stoy.)
Our goals (board):
Understand how recursion equations are solved by the fixedpoint method
Learn to compute fixed points using the method of successive approximations
Write definitional interpreters in Strachey's style, try out some simple control operators
Avoid "junk" values, including noncomputable functions
Deal with nontermination through a bottom element
Solve recursion equations with fixed points
Bottom element
Partial order approximation
Functions in semantic domains are not "overdefined"
Partial order review:
Examples:
set inclustion
approximation on partial functions
instance relation on types
trivial partial order for flat domains (Int, Bool)
Complete partial order or CPO
 <= x for all x
every ascending chain a1 <= a2 <= ... has a least upper bound _ { a_i }
The "least" upper bound makes the fewest assumptions
Given a cpo S, what's the semantic domain of functions from S to S?
Lift approximation to functions pointwise:
Refinement ordering on functions:
Wellbehaved functions:
Bonuses:
Recursion = Fixed point:
Recursion = Fixed point:
Suppose g F = F
. Proof that F
is factorial.
For all n
, g F n
= n!
, by induction:
F 0 = g F 0 = 1
F n
= { by assumption }
g F n
= { definition of g }
if n = 0 then 1 else n * F (n1)
= { assumption, n > 0 }
n * F (n1)
= { induction hypothesis }
n * (n1)!
= { definitiion of factorial }
n!
Now you do it
*** Conversion to fixed point ***
length = \xs.null? xs 0 (+ 1 (length (cdr xs)))
lg = \lf.\xs.null? xs 0 (+ 1 (lf (cdr xs)))
Monotonic, continous $g$ has fixed point:
Fixedpoint construction:
What do I want from you?
Fixed points in (untyped) lambda:
Fixed points work on domains:
This evening, my friends and I discovered that we had a hard time answering a question on the assignment because we had no way to frame what it meant to "write the function whose fixed point is a solution to the equation".
Write a function f such that if (f ys = ys) then
ys = (cons 0 (map ((curry +) 1) ys)))
Could you produce a different, dissimilar recursion equation and use the steps listed on the assignment to solve it? I think such an example would be helpful.
Equation
p = pair 0 (fst p)
Function:
f = \p.pair 0 (fst p)
zero approximation:
p0 = __
p1 = f p0 = pair 0 (fst __) = pair 0 __
p2 = f p1 = pair 0 (fst (pair 0 __)) = pair 0 0
p3 = f p2 = pair 0 (fst (pair 0 0)) = pair 0 0
p2
is the least fixed point of function f
and the least defined solution to the recursion equation.
Directstyle value semantics for a stratified language with statements and expression.
Domains: (board)
V = Int
Ide
sigma : S = Ide > V
Exp = S > V
Stm = S > S
Notes:
Semantic functions on base types and on syntax:
I : ID > Ide
N : NUM > V
E : exp > Exp
S : stm > Stm
Semantic functions for values and store
plus : V > V > V
update : S * Ide * V > S
Abstract syntax and semantic equations for statements:
S [[ID := exp]] sigma = update(sigma, I[[ID]], E[[exp]]sigma)
S [[s1; s2]] sigma = S[[s2]](S[[s1]] sigma)
Abstract syntax and semantic equations for expressions:
E [[NUM]] sigma = N[[NUM]] // note state is ignored
E [[ID]] sigma = sigma(I[[ID]])
E [[e1 + e2]] sigma = plus (E [[e1]] sigma) (E [[e2]] sigma)
Things to notice:
__
)Today's plan:
DON'T FORGET JOE STOY'S TALK TOMORROW, 3PM
Ideas:
Answer
Domains: (board)
V = Int
Ide
l : L = Locations // new
rho : Env = Ide > L // new
sigma : S = L > V // changed
Exp = Env > S > V // changed
Stm = Env > L > S > S // changed
Notes:
Semantic functions for base types and syntax
I : ID > Ide
N : NUM > V
E : exp > Exp
S : stm > Stm
P : prog > Env > L > S > L > V
^ ^
 + location where answer is stored

+ location for next local variable
Semantic functions for values, environment, and store
plus : V > V > V
update_S : S * L * V > S
update_E : Env * Ide * L > Env
New functions:
l0 : L // location supply
next : L > L
sigma0 : S // initial state
rho0 : Env // initial env
rho1 : Env = update_E(rho0, I [[Answer]], l0)
Abstract syntax and semantic equations for programs:
P [[s]] = S [[s] rho_1 (next l0) sigma0 l0
Abstract syntax and semantic equations for statements:
S [[ID := exp]] rho l sigma = update_S(sigma, rho(I[[ID]]), E[[exp]] rho sigma)
S [[{ VAR x ; s }]] rho l sigma =
S [[s]](update_E(rho, I[[x]], l))(next l) sigma
S [[s1; s2]] rho l sigma = S[[s2]] rho l (S[[s1]] rho l sigma)
Abstract syntax and semantic equations for expressions:
E [[NUM]] rho sigma = N[[NUM]] // note state is ignored
E [[ID]] rho sigma = sigma(rho(I[[ID]]))
E [[e1 + e2]] rho sigma = plus (E [[e1]] rho sigma) (E [[e2]] rho sigma)
Things to notice:
??????
Ideas:
Domains: (board)
V = Int
Ide
sigma : S = Ide > V // changed
Exp = S > V // changed
Stm = S > S // changed
Semantic functions for base types and syntax
I : ID > Ide
N : NUM > V
E : exp > Exp
S : stm > Stm
Semantic functions for values, environment, and store
plus : V > V > V
update : S * L * V > S
ifnz : V * S * S > S
Abstract syntax and semantic equations for statements:
S [[ID := exp]] sigma = update(sigma, I[[ID]], E[[exp]] sigma)
S [[s1; s2]] sigma = S[[s2]] (S[[s1]] sigma)
S [[IF exp THEN s1 ELSE s2]] sigma =
ifnz(E[[exp]]sigma, S[[s1]] sigma, S[[s2]]sigma)
S [[WHILE exp DO stm]] sigma =
Y(\f.\sigma.ifnz(E[[exp]]sigma,f(S[[stm]]sigma),sigma)) sigma
S [[skip]] sigma = sigma
Abstract syntax and semantic equations for expressions:
E [[NUM]] sigma = N[[NUM]] // note state is ignored
E [[ID]] sigma = sigma(I[[ID]])
E [[e1 + e2]] sigma = plus (E [[e1]] sigma) (E [[e2]] sigma)
Things to notice:
??????
Ideas:
Why are we doing this?
We want to model those constructs.
Domains: (board)
V = Int
Ide
sigma : S = Ide > V
A = Answers // new
theta : C = S > A // new
Exp = S > V // same
Stm = C > C // changed
Prog = A // by definition, the answer you get when
// you run it (e.g., stdout, stderr, exit code)
Semantic functions for base types and syntax
I : ID > Ide
N : NUM > V
E : exp > Exp
S : stm > Stm
P : prog > Prog
Semantic functions for values, environment, and store
plus : V > V > V
update : S * L * V > S
ifnz : V * S * S > S
done : C // end of the program
Abstract syntax and semantic equations for a program:
P[[stm]] = S[[stm]] done sigma_0
Abstract syntax and semantic equations for statements:
S [[ID := exp]] theta = \sigma.theta(update(sigma, I[[ID]], E[[exp]] sigma))
S [[s1; s2]] theta = S[[s1]] (S[[s2]] theta) // order restored!
S [[IF exp THEN s1 ELSE s2]] theta =
\sigma.ifnz(E[[exp]]sigma, S[[s1]] theta sigma, S[[s2]]theta sigma)
Abstract syntax and semantic equations for expressions:
E [[NUM]] sigma = N[[NUM]] // note state is ignored
E [[ID]] sigma = sigma(I[[ID]])
E [[e1 + e2]] sigma = plus (E [[e1]] sigma) (E [[e2]] sigma)
Things to notice:
Context: operational, type theory, denotational
Gotos and labels: Important recursion equation
rho_final, theta = S[[stm]] (rho_final, done)
where
Stm = C env * C > C env * C
Domains: (board)
V = Int
Ide
sigma : S = Ide > V
A = Answers // new
theta : C = S > A // new
Exp = S > V // same
Stm = C > C // changed
Prog = A // by definition, the answer you get when
// you run it (e.g., stdout, stderr, exit code)
Semantic functions for base types and syntax
I : ID > Ide
N : NUM > V
E : exp > Exp
S : stm > Stm
P : prog > Prog
Semantic functions for values, environment, and store
plus : V > V > V
update : S * L * V > S
ifnz : V * S * S > S
done : C // end of the program
Abstract syntax and semantic equations for a program:
P[[stm]] = S[[stm]] done sigma_0
Abstract syntax and semantic equations for statements:
S [[ID := exp]] theta = \sigma.theta(update(sigma, I[[ID]], E[[exp]] sigma))
S [[s1; s2]] theta = S[[s1]] (S[[s2]] theta) // order restored!
S [[IF exp THEN s1 ELSE s2]] theta =
\sigma.ifnz(E[[exp]]sigma, S[[s1]] theta sigma, S[[s2]]theta sigma)
Abstract syntax and semantic equations for expressions:
E [[NUM]] sigma = N[[NUM]] // note state is ignored
E [[ID]] sigma = sigma(I[[ID]])
E [[e1 + e2]] sigma = plus (E [[e1]] sigma) (E [[e2]] sigma)
Things to notice:
Ideas:
What if evaluating an expression can raise an exception or cause another controlflow event?
Expression appears in a context, which
Domains: (board)
V = Int
Ide
sigma : S = Ide > V
A = Answers
theta : C = S > A
kappa : K = V > C // new
Exp = K > C // changed
Stm = C > C // same
Prog = A // same
Semantic functions for base types and syntax
I : ID > Ide
N : NUM > V
E : exp > Exp
S : stm > Stm
P : prog > Prog
Semantic functions for values, environment, and store
plus : V > V > V
update : S * L * V > S
ifnz : V * A * A > A // changed
done : C // end of the program
error : A // an error answer
Abstract syntax and semantic equations for a program:
P[[stm]] = S[[stm]] (\sigma.error) sigma_0
Abstract syntax and semantic equations for statements:
S [[ID := exp]] theta = E[[exp]](\v.\sigma.theta(update(sigma, I[[ID]], v)))
S [[RETURN exp]] theta = E[[exp]](\v.\sigma.done v) // ignores theta!!
S [[s1; s2]] theta = S[[s1]] (S[[s2]] theta) // unchanged
Abstract syntax and semantic equations for expressions:
E [[NUM]] kappa = kappa(N[[NUM]]) // feed value to context
E [[ID]] kappa = \sigma.kappa(sigma(I[[ID]]))sigma // lookup and continue
E [[e1 + e2]] kappa = // must make contexts for e1 and e2
E[[e1]](\v1.E[[e1]](\v2.kappa (plus v1 v2)))
E [[stm; exp]] kappa = S[[stm]](E[[exp]]kappa) // bonus construct!
Things to notice:
Definitional interpreter!
*** ML domains ***
type V = int
type Ide = string
type S = Ide > V
datatype A = ERROR
 ANSWER of V (* value returned *)
type C = S > A
type K = V > C
type Exp = K > C
type Stm = C > C
*** Starting values ***
exception NotFound
val plus : V * V > V = op +
val upd : S * Ide * V > S
= fn (sigma, n, v) => fn n' =>
if n = n' then v else sigma n'
val done : V > A = fn x => ANSWER x
val sigma0 : S = fn n => raise NotFound
*** Syntax of programs ***
datatype prog = PROG of stm
and stm = op := of Ide * exp
 RETURN of exp
 SEQs of stm * stm
and exp = SEQe of stm * exp
 NUM of V
 ID of Ide
 op + of exp * exp
*** Semantic equations, part I ***
fun P (PROG stm1) = S stm1 (fn s => ERROR) sigma0
and S (id := exp1) th =
E exp1 (fn v => fn s => th (upd(s, id, v)))
 S (RETURN exp1) th =
E exp1 (fn v => fn s => done v)
 S (SEQs (stm1, stm2)) th = S stm1 (S stm2 th)
val _ = P : prog > A
val _ = S : stm > C > C
*** Semantic equations, part II ***
and E (SEQe (stm1, exp1)) k = S stm1 (E exp1 k)
 E (NUM v) k = k v
 E (ID n) k = (fn s => k (s n) s)
 E (exp1 + exp2) k =
E exp1 (fn v1 =>
E exp2 (fn v2 => k (plus(v1, v2))))
val _ = E : exp > K > C
Ideas:
Put it in a giant recursion equation:
Domains: (board)
V = Int
Ide
Lbl // labels, new
sigma : S = Ide > V
A = Answers
theta : C = S > A
rho : Lenv= Lbl > C // label meanings, new
Exp = S > V // back to directstyle expressions
Stm = Env * C > Env * C // changed
Prog = A
Semantic functions for base types and syntax
I : ID > Ide
N : NUM > V
E : exp > Exp
S : stm > Stm
P : prog > Prog
Semantic functions for values, environment, and store
plus : V > V > V
update : S * L * V > S
ifnz : V * S * S > S
done : C // end of the program
fst, snd
Abstract syntax and semantic equations for a program:
P[[stm]] = snd(Y(\(rho_L, theta).S[[stm]](rho_L, done))) sigma_0
rho_final, theta = S[[stm]] (rho_final, done)
Abstract syntax and semantic equations for statements:
S [[ID := exp]] (rho_L,theta) =
(rho_L,\sigma.theta(update(sigma, I[[ID]], E[[exp]] sigma)))
S [[s1; s2]] (rho_L,theta) = S[[s1]] (S[[s2]] (rho_L,theta))
S [[IF exp THEN GOTO L]] (rho_L,theta) =
(rho_L, \sigma.ifnz(E[[exp]]sigma, rho_L([[L]]), theta sigma))
S [[L:]] (rho_L, theta) = (update(rho_L, [[L]], theta), theta)
Abstract syntax and semantic equations for expressions:
E [[NUM]] sigma = N[[NUM]] // note state is ignored
E [[ID]] sigma = sigma(I[[ID]])
E [[e1 + e2]] sigma = plus (E [[e1]] sigma) (E [[e2]] sigma)
Things to notice:
80 and up Excellent
60 to 79 Very Good
42 to 59 Good
30 to 41 Fair
below 30 Poor
Threequarters of exams are Good or better.
Median grade is 53.
40 or below is cause for concern
15% or more improvement on the final replaces the midterm grade completely
Theory is so over!
Where have we been?
Success stories:
What about big programs?
An area of agreement and a great divide:
INFORMATION HIDING
/ \
/ \
reuse / \ modular reasoning
/ \
/ \
OBJECTS MODULES
Two kinds of reuse:
Don't conflate them!
Why Smalltalk?
Alive and well today
The five questions:
Values are objects (even true
, 3, "hello"
)
Even classes are objects!
There are no functionsonly methods on objects
Syntax:
mutable variables
message send
sequential composition of mutations and message sends (side effects)
"blocks" (really closures, objects and closures in one, used as continuations)
No if
or while
. These are implemented by passing continuations to Boolean objects.
(Smalltalk programmers have been indoctrinated and don't even notice)
Environments
Name stands for a mutable cell containing an object:
Dynamic semantics
The initial basis is enormous
*** Abstract syntax (approximate) ***
datatype exp = VAR of name
 SET of name * exp
 BEGIN of exp list
 SEND of name * exp * exp list
 BLOCK of name list * exp list
 LITERAL of rep
Analysis of syntax (board):
VAR
, SET
, BEGIN
as in ImpcoreSEND
is message passing
BLOCK
and LITERAL
are special objects.
Protocol is Smalltalk term of art:
Ruby dialect "duck typing" is a statement about protocols
Protocol is determined by the class of which an object is an instance
Arities:
Every object gets not just the protocol but the implementations from its class. And a class can inherit from its superclass (more undefined terms) all the way up to class Object
.
*** Simple examples ***
> (isKindOf: 3 Object)
<True>
> (isMemberOf: 3 Object)
<False>
> (isNil 3)
<False>
> (isNil nil)
<True>
> (println nil)
nil
nil
*** SLIDE ***
> true
<True>
> True
<class True>
> Object
<class Object>
Key strategy for reuse in objectoriented languages: subtype polymorphism
A value of the subtype can be used wherever a value of the supertype is expected.
Board:
Only crippled languages like C++ identify subtype with subclass
Only the ignorant and uneducated don't know the difference
Subtyping mathematically:
Subtyping is not inheritance:
On offer:
Metaclasses
Object!
/
/
Class
/
/
Object ===> Object's metaclass
/ /
/ /
beetle ====> Car ===> Car's metaclass
/ /
/ /
prius ====> HybridCar ===> Hybrid Car's metaclass
Smalltalk
hides all internal state (instance variables are private)
exposes all methods (methods are public)
"Private method" is just a programming convention
(Other objectoriented languages have elaborate controls for public/private)
Typical objectorientation:
What if you need to choose code based on both receiver and argument?
Solution: method name encodes both operation and type of argument
Examples:
addIntegerTo:
addFloatTo:
Blocks are closures
(block (formals) expressions)
For paremeterless blocks (normally continuations), [expressions]
Blocks are objects
You don't "apply" a block; you "send it the value
message"
*** SLIDE ***
> (val twice (block (n) (+ n n)))
Booleans use continuationpassing style
Blocks delay evaluation
*** Boolean example: minimum ***
> (val x 10) > (val y 20) > (ifTrue:ifFalse: (<= x y) [x] [y]) 10
Protocol for Booleans:
Booleans implemented with two classes True
and False
one value apiece
*** Classes True and False ***
(class True Boolean () (method ifTrue:ifFalse: (trueBlock falseBlock) (value trueBlock)) ) (class False Boolean () (method ifTrue:ifFalse: (trueBlock falseBlock) (value falseBlock)) )
*** Common methods in superclass ***
(class Boolean Object () (method ifTrue:ifFalse: (trueBlock falseBlock) (subclassResponsibility self)) (method ifTrue: (trueBlock) (ifTrue:ifFalse: self trueBlock [])) (method not ()
(ifTrue:ifFalse: self [false] [true])) ... )
*** Blocks manage loops ***
> (val x 10)
> (val y 20)
> (whileTrue: [(<= x (* 10 y))]
[(set x (* x 3))])
nil
> x
270
Protocol for blocks:
Goal of objects is reuse
Key to successful reuse is a welldesigned class hierarchy
Killer app: toolkits for building user interfaces
Smalltalk blue book is 90 pages on language, 300 pages on library
Lots of abstract classes
Boolean
``Collection hierarchy'':
Collection  contain things 
Set  objects in no particular order 
KeyedCollection  objects accessible by keys 
Dictionary  any key 
SequenceableCollection  keys are consecutive integers 
List  can grow and shrink 
Array  fixed size, fast access 
Collection mutators:
Collection observers:
Collection iterators:
Most subclass methods work by delegating all or part of work to list members
N.B. Set
is a client of List
, not a subclass!
ENGINEERING FELLOWS FOCUS GROUPS
Unlocking the final door for building large software systems
You have all gotten good at firstclass functions, algebraic data types, and polymorphic types
Modules are the last tool you need to build big systems
Many paths will be open to you:
Direct use of functional programming with modules, largely at hedge funds and investment banks
Ideas translate to many other languages
Functions of a true modules system:
Hide representations, implementations, private names
"Firewall" separately compiled units (promote independent compilation)
Possibly reuse units
Real modules include separately compilable interfaces and implementations
Designers almost always choose static type checking, which should be "modular" (i.e., independent)
C and C++ are cheap plastic imitations
Collect declarations
Things typically declared:
Variables or constants (values, mutable or immutable)
Types
Procedures (if different from values)
Exceptions
Terminology: a module is a client of the interfaces it depends on
Roles of interfaces in programming:
The unit of sharing and reuse
Explainer of libraries
Underly component technology
The bestproven technology for structuring large systems.
Means of hiding information (ask "what secret does it hide?")
A contract between programmers
A way to limit what we have to understand about a program
Interface is the specification or contract that a module implements
Two approaches to writing interfaces
Interface "projected" from implementation:
Full interfaces:
Structure of implementations
Holds all dynamically executed code (expressions/statements)
May include private names
May depend only on interfaces, or on interfaces and implementations both (max cognitive benefits when all dependency is mediated by interfaces)
Dependencies may be implicit or explicit (import
, require
, use
)
The Perl of module languages?
Has all known features
Supports all known styles
Rejoice at the expressive power
Weep at the terminology, the redundancy, the bad design decisions
What we've been using so far is the core language
Modules are a separate language layered on top.
ML module terminology
Interface is a signature
Implementation is a structure
Generic module (parameterized implementation) is a functor
You have too many choices
Implementations
May depend directly on other implementations (bad design  no true separate compilation)
May be parameterized over interfaces (signatures) and then instantiated with implementations
May be "constrained" by interfaces (good)
using "signature ascription"
Structures match signatures
The implementation relation is called matching
Foo
matches a signature FOO
" if Foo
is an implementation of FOO
Surprise! A single structure can match many signatures
A structure has a private name space
To refer to other structures, use qualified names
Int.toString
TextIO.stdIn
Real.fromInt
Structures relate to other structures
No explicit imports (terrible decision)
Explicit exports handled by signature ascription (good decision)
Structures may be nested (great decision)
Workaround for lack of imports
External structures should usually be abbreviated:
structure S = String
structure C = Char
val ucase = S.map C.toUpper
val doubleVision =
S.translate (fn c => S.implode [c, c])
If you use open
, I will hunt you down and hurt you
(breaks modularity)
Signature says what's in a structure
Specify types (w/kind), values (w/type), exceptions.
Ordinary type examples:
type t // abstract type, kind *
eqtype t
type t = ... // 'manifest' type
datatype t = ...
Type constructors work too
type 'a t // abstract, kind * => *
eqtype 'a t
type 'a t = ...
datatype 'a t = ...
*** Example: environment signature ***
signature ENV = sig
type 'a env
eqtype name
exception NotFound of name
val empty : 'a env
val bind : name * 'a * 'a env > 'a env
val find : name * 'a env > 'a
end
*** Environment structure ***
structure ListEnv : ENV = struct
type name = string
type 'a env = (name * 'a) list
exception NotFound of name
val empty : 'a env = []
fun bind(n, x, rho) = (n,x) :: rho
fun find(n, []) = raise NotFound n
 find(n, (n', x')::l) =
if n = n' then x' else find(n, l)
Matching is OK, but too many details exposed
Two kinds of ascription (matching)
What is hidden?
Transparent: hides only names
(all types are made manifest!)
Opaque: hides names and types
Transparent ascription ListENV : ENV
Opaque ascription ListENV :> ENV
>
symbol plugs the leak)But wait! The opaque version is useless! We don't know what type name
is.
Reveal that names are strings:
Going back, signature does not really work with search trees
Equality types are a bad idea anyway
*** More abstract environments ***
signature ENV = sig
type 'a env
type name
exception NotFound of name
val empty : 'a env
val bind : name * 'a * 'a env > 'a env
val find : name * 'a env > 'a
end
*** Environment with binarysearch tree ***
signature ORD_KEY = sig
type ord_key
val compare : ord_key * ord_key > order
end
functor SearchTreeFn(Key: ORD_KEY) : ENV =
struct
... binarytree implementation ...
end
(ord_key
?! These people have no taste!)
You give me an ordered type, and I'll give you an environment:
Heavier weight than corelanguage solution
But you don't have to futz with passing comparison functions
And therefore it's much harder to get wrong!
Key idea: SearchTreeFn
is compiled once, used over and over, and supports modular type checking
It is guaranteed to work (or not) based on signature alone
C++ templates are desperate to have this feature.
Let's see the code!
*** Implementation of searchtree functor ***
functor SearchTreeFn(Key: ORD_KEY) : ENV = struct
type name = Key.ord_key
datatype 'a tree
= NULL
 NODE of name * 'a * 'a tree * 'a tree
type 'a env = 'a tree
exception NotFound of name
val empty = NULL
fun find (name, NULL) = raise NotFound name
 find (name, NODE (n', x', L, R)) =
case Key.compare (name, n')
of EQUAL => x'
 LESS => find (name, L)
 GREATER => find (name, R)
*** Searchtree functor, finale ***
fun bind (name, x, NULL) = NODE (name, x, NULL, NULL)
 bind (name, x, NODE (n', x', L, R)) =
case Key.compare (name, n')
of EQUAL => NODE (name, x, L, R)
 LESS => NODE (n', x', bind(name, x, L), R)
 GREATER => NODE (n', x', L, bind(name, x, R))
end
*** Using the functor ***
structure OrderedString = struct
type ord_key = string
val compare = String.compare
end
structure StringEnv =
SearchTreeFn(OrderedString)
Note that ORD_KEY
is not required
Twostep process
Signature matching in two steps:
Step one: realize the signature
Step two: cut down the structure to match
Discard components (like NODE
and NULL
)
Implementation may be more polymorphic than required
A form of subtyping!!
"More polymorphic"
name list > int
could be represented by forall 'a . 'a list > int
.signatures, structures, functors
signature matching:
When are two types equal?
Abbreviation defined with type
equals RHS
Type defined with datatype
is always fresh
datatype
construct is generativeAbstract types are never equal,
Example with sharing
Interactive interpreter with persistent store.
signature INTERPRETER = sig
type value
type exp
val eval : exp * value ref Env.env > value
end
signature PERSISTENT_STORE = sig
type value
val save : (string * value) list > unit
val restore : unit > (string * value) list
end
Abstractions must agree
Interpreter and store must have same value
type:
functor INTERACTIVE_SYSTEM (
structure I : INTERPETER
structure S : PERSISTENT_STORE
sharing type I.value = S.value
) = struct ... end
Could use SML like Modula, Ada
Doesn't work as well as Modula, Ada:
Goal of SML is flexibility, not control.
Tofte style:
(See "four lectures on standard ML")
Crucial advantages:
Because there are no structures until the end, properties of structures can't "leak" by accident.
Dependencies between modules are explicit and totally controlled.
What most people do: a haphazard mix of functors and structures
Core language  Modules language 
Function maps value to value  Functor maps structure to structure 
Cannot depend on a type  Can depend on a type 
Unknown type? Must carry around clunky type parameter  Unknown type? No problem! It's invisible. (Think `type 'a env` does not depend on names) 
Lightweight notation  Heavyweight notation 
Inconvenient if there are a lot of arguments  Easy when there are a lot of arguments (names!) 
Has a runtime cost  Essentially no runtime cost 
Modules more flexible about information hiding
Comparing modules and objects
A module can encapsulate two types:
signature CONTROL_FLOW_GRAPH = sig
type graph
type node
val entry : graph > node
val exit : graph > node
val graph_of : node > graph
val successors : node > node list
...
end
Both graph and node ``methods'' know all
Modules, objects, and reuse
Like Smalltalk protocols and Java interfaces,
SML signatures support subtyping
But no automatic code reuse (no inheritance)
SML modules: powerful but strange
Power beyond mortal ken
Clean winner on expressive power
Reliability, not so much
Implicit imports bad, open
worse
No true separate compilation.
Too much freedom?
Module semantics is considered intimidating!
Needed: good models, guidelines, tutorials