COMP 105 pages:	Admin	Calendar	Syllabus	ML	Homework	Solutions	Coding Style	Coding Rubric	Lecture Notes	Recitation Notes	Stats

9 September 2020: Introduction to Comp 105

Introduction

My pronouns are she, her, hers.

Please call me “Kathleen,” or “Professor Fisher”

Reflection: Are all programming languages created equal?

Under what circumstances is one programming language better than another one? Write 2-3 sentences explaining your thoughts.

Why we are here

Write code from scratch
In a language you’ve never seen
That sails past code review

Why it’s different from COMP 40:

Machines haven’t really changed since 1970 (mainframes), 1980 (minicomputers), 1982 (microcomputers)
Languages are changing all the time

What it involves:

Programming practice (emphasize widely used features: functions, types, modules, objects)
Mathematical description (how things work, see patterns)
(Example: see patterns of recursion in homework)

Today:

Start with technical work
Then a little about experience and expectations

What is a programming language?

What is a PL?

If you’ve taken 170, a caution: every programming language is a formal language, but not every formal language is a programming language.

My answer: a language with a compositional structure that has an element of run time to it.

Key idea: phase distinction

Something you write down, which just sits there
Something that runs

We will talk about “compile time” and “run time” (and later, type-checking time).

Key ideas: static and dynamic

The thing that you write down is syntax

What flies around at run time is values

(You know about syntax and values already—but this semester we’re going to think about them very systematically.

First three classes: new ways of thinking about stuff you already know

Check your understanding: The big picture

The goal of this course is to teach you principles of programming languages so you’ll be able to teach yourself new languages quickly and improve your coding in many different languages.

True
False

Almost all languages are equally powerful, and so are equally well suited to solving any programming task.

True
False

Check your understanding: Syntax vs. semantics

Indicate whether each statement is more about the syntax or the semantics of a program:

A valid C program always has a matching } for every {.

syntax
semantics

A valid C program never dereferences an uninitialized pointer.

syntax
semantics

In a valid C program, the right-hand side of an = operation must be an expression.

syntax
semantics

In our house style, every if statement has a corresponding else.

syntax
semantics

When a ternary conditional expression x ? y : z is evaluated, it acts a lot like an if

syntax
semantics

Reflection: Inductive definitions for numerals

Numerals are an example of syntax: the rules for how things are written. At run time, a numeral stands for a natural number; the natural number is its semantics. Numerals are things like 0, 2020, 73, and 1852.
They are not things like 311t3, 2+2, or -12.

Write an inductive definition of numerals (think base case and inductive case).
There’s more than one! When you finish, look for another.

Value structure for numerals: Induction

Example: Numerals

Numerals are syntax; Natural numbers are semantics

Numerals:

2018
73
1852

Not numerals:

3l1t3
2+2
-12

Approach 1: the cons approach

if d is a digit then d is a numeral
if d is a digit and n is a numeral then dn is a numeral

Approach 2: the snoc approach

if d is a digit then d is a numeral
if d is a digit and n is a numeral then nd is a numeral

Approach 3: the tree approach

if d is a digit then d is a numeral
if n1 is a numeral and n2 is a numeral then n1 n2 is a numeral

have written a list or sequence of digits as a definition. That’s good thinking, but it leaves open the problem of defining “list” or “sequence”. These definitions take care of that issue.

Check your understanding: Representations of numerals

Different representations make different operations easier. Consider the three different ways we discussed to represent numerals, which we called the cons, snoc, and tree approaches.

Which representation makes it easiest to access the most significant digit?

cons
snoc
tree

Which representation makes it easiest to access the least significant digit?

cons
snoc
tree

Which representation makes it equally easy to access either the first or last digit?

cons
snoc
tree

Reflection: Natural numbers

We have limited operations on numerals: stick a digit before or after a numeral, or stick two numerals together. But on numbers we have many more operations. What are two simple, useful operations that are defined on natural numbers?

Value structure for natural numbers: induction again

Natural numbers are semantics for numerals.

Infinite set, that we’ll define using induction.

Base case: a digit is a nat
Inductive case: if m is a nat and d is a digit then 10*m + d is a nat

Check your understanding: Representations of numerals

The decimal definition of natural numbers in the video ignores almost all operations on natural numbers. It uses just addition and multiplication. And not just any addition and multiplication, but a highly stylized one: multiply by ten and then add. So out of all the gazillions of ways we might compute a natural number, we are limiting ourselves. According to this definition, how many different ways of forming natural numbers must we consider to be able to account for every natural number?

one
two
ten
gazillions
infinite

Reflection: Defining functions over natural numbers

We want to define a function that examines a natural number (actually a machine integer) to find out if its numeral can written using only the digit 4. How many cases will such a function have to consider? Do any of the cases look like they will require recursion?

Writing code with recursion and induction

Data-driven design process. Connections to design ideas from 1970s, 1980s, 1990s, plus test-driven development (2000s).

Recursive-function problem

Exercise: all-fours?

Write a function that takes a natural number n and returns true (1) if and only if all the digits in n’s numeral are 4’s.

Key design step: Form of number

Choose inductive structure for natural numbers:

Which case analysis do we want?

Step 1: Forms of DECNUMERAL proof system (1st lesson in program design):

Either a single digit d
Or 10 × m + d, where m ≠ 0

Example inputs

Step 2:

Single digits: 4, 9
Multi-digits: 44, 907, 48

Function’s name and contract

Steps 3 and 4:

Function (all-fours? n) returns nonzero if and only if the decimal representation of n can be written using only the digit 4.

Example results

Step 5: write expected results as unit tests:

(check-assert      (all-fours?  4))
(check-assert (not (all-fours?  9)))
(check-assert      (all-fours? 44))
(check-assert (not (all-fours? 48)))
(check-assert (not (all-fours? 907)))

Algebraic laws

Step 6: Generalize example results to arbitrary forms of data

(all-fours? d) ==  (= d 4)

(all-fours? (+ (* 10 m) d)) ==
   (= d 4) && (all-fours? m)

Left-hand sides turn into case analysis

Step 7:

; (all-fours? d) == ...
; (all-fours? (+ (* 10 m) d)) ==  ...

(define all-fours? (n)
  (if (< n 10)
    ... case for n = d ...
    ... case for n = (+ (* 10 m) d),
        so m = (/ n 10) and
        d = (mod n 10) ...))

Each right-hand side becomes a result

Step 8:

; (all-fours? d) ==  (= d 4)
; (all-fours? (+ (* 10 m) d)) ==
;           (= d 4) && (all-fours? m)

(define all-fours? (n)
   (if (< n 10)
       (= n 4)
       (and (= 4 (mod n 10))
            (all-fours? (/ n 10)))))

Revisit tests:

Step 9:

(check-assert      (all-fours?  4))
(check-assert (not (all-fours?  9)))
(check-assert      (all-fours? 44))
(check-assert (not (all-fours? 907)))

(check-assert (not (all-fours? 48)))

Checklist:

For each form of data, one true and one false
One extra corner case (partly fours)
Tests pass

Check your understanding: Design process

Put the nine steps of the design process for writing functions guided by the structure of the data and algebraic laws in order.

Turn right-hand sides of algebraic laws into results
Specify example results
Choose function’s name
Revisit Tests
Describe function’s contract
List example inputs
Choose representation of inputs
Turn left-hand sides of algebraic laws into case analysis
Write algebraic laws

The `all-fours?` function

;; forms of data
;; Either a single digit d
;; OR 10 * m + d where m not equal 0

;; example inputs
(val x1 4 )
(val x2 9)
(val x3 44)
(val x4 907)
(val x5 48)

;; Pick the name now
;; Write the contract now
;; Write check assert cases now

;; Algebraic Laws
;; (all-fours? d) = (= d 4)
;; (all-fours? (+ (* 10 m) d)) == 
;;     (= d 4) && (all-fours? m)       

;; The code!
  ;; return nonzero if and only if 
  ;; decimal rep of n can be written
  ;; using only the digit 4
(define all-fours? (n)
  (if (< n 10)
      (= n 4)
      (and (= 4 (mod n 10)) 
           (all-fours? (/ n 10)))))

        (check-assert      (all-fours? 4))
        (check-assert (not (all-fours? 9)))
        (check-assert      (all-fours? 44))
        (check-assert (not (all-fours? 48)))
        (check-assert (not (all-fours? 907)))

Coding Interlude: `all-fours?`

Consider the all-fours? code from the previous video. Run the code on the sample data.

Change the code to make it be the all-nines? function.

Reflection: Various syntactic forms

Consider the following two functions that show all the syntactic forms in Impcore.

(define even? (n) (= (mod n 2) 0))

(define 3n+1-sequence (n) ; from Collatz
  (begin
    (while (!= n 1)
       (begin
         (println n)
         (if (even? n)
             (set n (/ n 2))
             (set n (+ (* 3 n) 1)))))
    n))

What do you think these functions do?
What do you notice about the syntax of these functions (ie, how they are written down)?
How might you write these same two functions in C?

Syntax in programming languages

Idea of LISP syntax

Parenthesized prefix syntax:

Names and numerals are basic atoms
Other constructs bracketed with (…) or […] (Possible keyword after opening bracket)

Examples:

(+ 2 2)
(if (isbound? x rho) (lookup rho x) (error 99))

(For now, we use just the round brackets)

Impcore structure

Two syntactic categories: expressions, definitions

No statements!—expression-oriented

(if e1 e2 e3)
(while e1 e2)
(set x e)
(begin e1 ... en)
(f e1 ... en)

Evaluating e has value, may have side effects

Functions f named (e.g., + - * / = < > print)

The only type of data is “machine integer” (deliberate oversimplification)

Syntactic structure of Impcore

An Impcore program is a sequence of definitions

(define mod (m n) (- m (* n (/ m n))))

Compare

int mod (int m, int n) { 
  return m - n * (m / n); 
}

Impcore variable definition

Example

(val n 99)

Compare

int n = 99;

14 September 2020: Abstract syntax and operational semantics

There are PDF slides for 9/15/2020.

Retrospective: Introduction to the study of programming languages, induction and recursion, and Impcore.

Pedagogical studies show that reflection about what you are learning really helps the learning process, so we will start each module by asking you to briefly reflect on what you learned in the previous module.

Write one thing you learned in the previous module and one question you have about that material.

Your responses may be anonymously shared with the rest of the class.

Reflection: Exploring composition in programming languages

Composition, which is the ability to build many sensible things using just a few construction principles, is a critical to programming languages. An example at the syntactic level is the ability to build bigger programs out of smaller pieces. To get some practice with this idea, complete the following exercises:

In C++, write down three expressions and two statements.
Now compose one of the three expressions with the two statements to make another statement.
Now compose all three expressions to make another expression.

Programming-language semantics

“Semantics” means “meaning.”

What problem are we trying to solve?

Know what’s supposed to happen when you run the code

Ways of knowing:

People learn from examples
You can build intuition from words
(Book is full of examples and words)
To know exactly, unambiguously, you need more precision

You may recall the beginner exercise “make a peanut butter and jelly sandwich”

You can watch and learn, but a computer can’t
We must specify exactly what must be done.

Why bother with precise semantics?

Prepare to program in languages you’ve never seen before
- “X-ray glasses”
Compare languages by translating into semantics
- Python and JavaScript are much more similar than they look
- C++ and Java are much different than they look
Prove properties of programs and programming languages
- Useful for security audits and program optimizations
Understand what a correct language implementation must do

The programming languages you encounter after 105 will certainly look different from what we study this term. But most of them will actually be the same. Studying semantics helps you identify that.

The payoff: Your new skills will apply

Behavior decomposes

We want a computational notion of meaning.

What happens when we run (* y 3)?

We must know something about *, y, 3, and function application.

Knowledge is expressed inductively

Atomic forms: Describe behavior directly (e.g., constants, variables)
Compound forms: Behavior specified by composing behaviors of parts

Example of domain where compositionaliy doesn’t apply: English spelling and pronunciation

fish vs ghoti
Both composed from letters, but no rules of composition guide pronunciation

Check your understanding: Principle of composition

The following questions are designed to give you practice with composition by asking you to think about how C statements and expressions are composed to form more complex C statements and expressions.

C has a construct of the form ___?___:___. For the result to be syntactically well formed, what kinds of things can go in the blanks?

statements only
expressions only
a mix

Is the result of filling in the blanks in the above C construct

always a statement?
always an expression?
either a statement or an expression, depending on whether the second two blanks were filled with expresions or statements?

C also has an if construct of the form if (___) ___; else ___;.
For the result to be syntactically well formed, what kinds of things can go in the blanks?

statements only
expressions only
a mix

Is the result of filling in the blanks in the C if construct

always a statement?
always an expression?
either a statement or an expression, depending on whether the second two blanks were filled with expresions or statements?

Recall Impcore concrete syntax

Definitions and expressions:

Define behaviors inductively

We’ll focus on expressions

Base cases (plural!): numerals, names

Inductive steps: compound forms

To determine behavior of a compound form, look at behaviors of its parts

First, simplify the task of specification

What’s different? What’s the same?

 x = 3;               (set x 3)

 while (i * i < n)    (while (< (* i i) n)
   i = i + 1;            (set i (+ i 1)))

Abstract away gratuitous differences

Abstract syntax

Same inductive structure as concrete syntax

But,

More uniform representation
Designed for compiler, not programmer

Concrete syntax: sequence of symbols

Abstract syntax: ???

The abstraction is a tree

The abstract-syntax tree (AST):

Exp = LITERAL (Value)
    | VAR     (Name)
    | SET     (Name name, Exp exp)
    | IFX     (Exp cond, Exp true, Exp false)
    | WHILEX  (Exp cond, Exp exp)
    | BEGIN   (Explist)
    | APPLY   (Name name, Explist actuals)

One kind of “application” for both user-defined and primitive functions (One syntax, two behaviors)

In C, trees are fiddly

typedef struct Exp *Exp;
typedef enum {
  LITERAL, VAR, SET, IFX, WHILEX, BEGIN, APPLY
} Expalt;        /* which alternative is it? */

struct Exp {  // 'alt' combines with any field in union
    Expalt alt;
    union {
        Value literal;
        Name var;
        struct { Name name; Exp exp; } set;
        struct { Exp cond; Exp true; Exp false; } ifx;
        struct { Exp cond; Exp exp; } whilex;
        Explist begin;
        struct { Name name; Explist actuals; } apply;
    };
};

ASTs

Question: What do we assign behavior to?

Answer: The Abstract Syntax Tree (AST) of the program.

An AST is a data structure that represents a program.
A parser converts program text into an AST.

Question: How can we represent all while loops?

while (i < n && a[i] < x) { i++ }

Answer:

Tag code as a while loop
Identify the condition, which can be any expression
Identify the body, which can be any expression

As a data structure:

WHILEX(exp1, exp2), where
exp1 is the representation of (i < n && a[i] < x), and
exp2 is the representation of i++

Let’s picture some trees

An expression:

  (f x (* y 3))

Let’s picture some trees

And a definition:

  (define abs (n)
    (if (< n 0) (- 0 n) n))

Check your understanding: AST Representations

Which of the following is the AST representation of the concrete Impcore syntax (if (> x 9) 1 0)?

WHILEX( APPLY (Name '>', Explist ( VAR 'x', LITERAL LITERAL 1, LITERAL 0))]
(IFX( APPLY (Name '>', Explist ( VAR 'x', LITERAL 9)), LITERAL 0, LITERAL 1))
(IFX( APPLY (Name '>', Explist ( VAR 'x', LITERAL 7)), LITERAL 1, LITERAL 0))
(IFX( APPLY (Name '>', Explist ( VAR 'y', LITERAL 9)), LITERAL 1, LITERAL 0))
(IFX( APPLY (Name '>', Explist ( VAR 'x', LITERAL 9)), LITERAL 1, LITERAL 0))

Key question: names

Reflection: What’s in a name?

We’ve talked about how composition is really important in giving meaning to programs: you can understand a program fragment such as (+ e1 e2) by understanding each of its pieces (+, e1, and e2) and then putting them together.

Some parts of programs can’t be broken down further. An integer, for example, doesn’t have any smaller pieces that are still an integer. But we can understand an integer just by look at the bits that comprise it (aka, its representation).

What about names, such as a variable x? A name doesn’t have any smaller pieces, so we can’t understand it by understanding its pieces. And a name doesn’t contain in itself its meaning the way an integer does.

Explain how you think language implementors might give meaning to a name.

Hint: think about how dictionaries give meanings to words.

From ASTs to behaviors

Behaviors of ASTs, part I: Atomic forms

Numeral: stands for a value

Name: stands for what?

``Environment’’ is formal term

You may also hear:

Symbol table
Name space

So-called “Scope rules” determine which environment governs where

Find behavior using environment

Consider the program fragment

  (* y 3)   ;; what does it mean?

Key Idea: Look up the meaning of variables in an environment

Impcore uses three environments

Global variables ξ (“ksee”)

Functions ϕ (“fee”)

Formal parameters ρ (“roe”)

There are no local variables

Just like awk; if you need temps, use extra formal parameters

Function environment ϕ not shared with variables—just like Perl

Check your understanding: Operational semantics judgments

We find the meaning of variables by looking them up in an environment.
- True
- False
If an Impcore name f stands for a function, which environment will it be in?
- ξ (“ksee”)
- ϕ (“fee”)
- ρ (“roe”)
If an Impcore name x is defined in environment ξ (meaning x is in the domain of ξ), then it represents
- a global variable
- a function
- a formal parameter

16 September 2020: Judgments, Rules, and Proofs

There are PDF slides for 9/17/2020.

Handout: Redacted Impcore rules

Handout: Overview of Induction

Reflection: Modeling computation in math

Suppose the notation e ⇓ 3 means that expression e evaluates to 3 and the notation n ∈ dom ρ means that variable n is in the environment ρ (and hence is defined in ρ).

How would you say an expression evaluates to 7?
How would you say formally that an expression like (+ n 1) evaluates successfully only if n is defined?
How would you say what (set x 0) does, and also (set n 0), and (set m 0), without having to repeat yourself for every variable you want to set to 0?
What information would you need to know what (all-fours? 344) evaluates to?
Why would you want to write these expressions in math, instead of just writing in natural language?

Rules and metatheory

Reflection: Operational semantics judgments

What do you think the rules are for evaluating a literal and for evaluating a variable? Fill in the question marks with your best guess!

⟨LITERAL(v), ξ, ϕ, ρ⟩ ⇓ ⟨?, ?, ?, ?⟩
⟨VAR(x), ξ, ϕ, ρ⟩ ⇓ ⟨?, ?, ?, ?⟩
In the second judgment, how can you tell which environment x belongs to?

Check your understanding: Semantics of variables

Consider the semantic rules for assignment to variables FormalAssign and GlobalAssign.

If x is in both the global environment ξ and the formal environment ρ, which version takes precedent?

Global (ξ)
Formal (ρ)

During the execution of the expression set(x, e), which environments may change? (Check all that apply)

Reflection: Mapping semantics to code

Consider again the semantic rules for assignment to variables FormalAssign and GlobalAssign.

How might these rules guide a language implementor?

Check your understanding: C implementation of environment operations

In the implementation of variable lookup, what C operation corresponds to the semantic check that x ∈ dom ρ?

isvalbound(e->var, formals)
isvalbound(e->var, globals)
fetchval(e->var, formals)

In the implementation of assignment, what part of the semantic rules corresponds to the C operation of bindval(e->set.name, v, globals)?

x∈ dom(ρ)
ρ′{x ↦ v}
x∈ domξ
ξ′{x ↦ v}

Note that the implementation doesn’t precisely follow the semantics. Instead of returning a new environment as directed by the semantics, the implementation instead updates the old environment in place using a side-effecting operation. This choice is more efficient than copying the old environment. It’s a metatheorem of Impcore that this optimization is sound. It’s a metatheorem that will no longer hold when we shift to Scheme and start copying environments to form closures.

Reflection: Using semantics to understand how programs are evaluated

Consider the following code:

      (define and (p q)
        (if p q 0))

      (define digit? (n)
        (and (<= 0 n) (< n 10)))

Suppose we evaluate (digit? 7)

In the body of digit?, what expressions are evaluated in what order?

The body of the digit? function matches the template (f e₁ e₂) where f, e₁, and e₂ are meta-variables that stand for different program fragments. In matching the template to the body of digit?,
- What is f?
- What is e₁?
- What is e₂?

Good and bad judgments

Reflection: Good and bad judgments

Just because something is written in math notation doesn’t make it true!

Which of the following judgements correctly describe what happens at runtime? Explain your answers. Pay special attention to the lack of primes on the concluding environments.

⟨(+ 2 2), ξ, ϕ, ρ⟩ ⇓ ⟨4, ξ, ϕ, ρ⟩
⟨(+ 2 2), ξ, ϕ, ρ⟩ ⇓ ⟨99, ξ, ϕ, ρ⟩
⟨(+ 2 2), ξ, ϕ, ρ⟩ ⇓ ⟨4, ∅, ϕ, ρ⟩
⟨(while 1 0), ξ, ϕ, ρ⟩ ⇓ ⟨77, ξ, ϕ, ρ⟩
⟨(begin (set n (+ n 1)) 17), ξ, ϕ, ρ⟩
⇓ ⟨17, ξ, ϕ, ρ⟩

Proofs: Putting rules together

Every terminating computation is described by a data structure—we’re going to turn computation into a data structure: a tree. Proofs about computations are hard (see: COMP 170), but proofs about trees are lots easier (see: COMP 61).

How do we know it’s right?

From rules to proofs

Judgment speaks truth when ``derivable’’

Special kind of proof: derivation

It’s a data structure (a derivation tree)
- Leaves are axioms: variables and constants
- Nodes are compound expressions
Valid derivation formed by instantiating semantic rules
Spacelike representation of timelike behavior

A form of “syntactic proof”

Recursive evaluator constructs derivation

Root of derivation at the bottom (surprise!)

Build

Start on the left, go up
Cross the ⇓
Finish on the right, go down

First let’s see a movie

<– The ``Tony Hawk’’ algorithm –>

Building a derivation

Building derivations

Check your understanding: Derivation trees

In constructing the derivation tree for the program (digit? 7), what is the rule at the bottom of the tree?

ApplyUser
Var
Literal

What environment will be used to find the meaning of the name digit??

What rule will be used to find the meaning of the sub-expression 7?

ApplyUser
Var
Literal

21 September 2020: Derivations and metatheory

There are PDF slides for 9/22/2020.

All your operational-semantics questions are answered.

Reflection: Metatheory vs theory

Metatheory allows us to prove properties about all derivations, which means it allows us to prove properties about all programs in a language.

The derivation that demonstrates that (digit? 7) evaluates to true is a proof about that particular program, so it theory but not metatheory. A metatheoretic property about Java is that in a well-typed program, every variable mentioned in the program has previously been defined.

Classify the following properties as theory or metatheory:

The evaluation of any Impcore program will never access undefined memory.
- theory
- metatheory
The evaluation of (and (<= 0 n) (< n 10)) will access the value associated with the variable n.
- theory
- metatheory
For any Impcore program p, every variable mentioned in p will be in the domain of either ξ, ϕ, or ρ.
- theory
- metatheory

Note: a theoretic or metatheoretic property can be either true or false. The distinction is whether the property would apply only to specific programs or to all programs were it to be true. The proof methods used to prove theoretic and metatheoretic properties differ because a theory proof applies only to a specific program while a metatheoretic proof applies to all programs in a language.

Proofs about all derivations: metatheory

Syntactic proofs empower metatheory

Derivation/proof 𝒟 is a data structure

Got a fact about all derivations?

It’s a fact about all terminating evaluations
They are in 1 to 1 correspondance

Prove meta-theoretic properties by structural induction over derivations

aka “induction on height of derivation tree”

Example: Evaluating an expression doesn’t create or destroy any global variables (the set of defined global variables is invariant)

Metatheorems often help implementors

More example metatheorems:

OK to mutate environments if you use a stack (Impcore)
Interactive browser doesn’t leak space (POPL 2012)
Device driver can’t harm kernel (Microsoft Singularity)

Reflection: Writing metatheoretic properties

Using the judgment form for Impcore evaluation that we are already familiar with, ⟨e, ξ, ϕ, ρ⟩ ⇓ ⟨v, ξ′, ϕ′, ρ′⟩, how would you formally write “In Impcore, evaluation doesn’t change the set of defined global variables?”?

Using the judgment form for Impcore evaluation that we are already familiar with, ⟨e, ξ, ϕ, ρ⟩ ⇓ ⟨v, ξ′, ϕ′, ρ′⟩, how would you formally write “In Impcore, evaluation doesn’t change the values of any global variables”?

Metatheorems are proved by induction

Induction over height of derivation trees 𝒟

These are “math-class proofs” (not derivations)

Proof

Goes by case analysis of the last rule in the derivation.
Has one case for each rule
Base cases don’t have proper sub-derivations.
Inductive cases assume the induction hypothesis for any proper sub-derivation

Let’s try it!

Cases to try:

Literal
GlobalVar
SetGlobal
IfTrue
ApplyUser2

For your homework, “Simplified Impcore” leaves out While and Begin rules.

Check your understanding: Proving metatheoretic properties

The following questions all relate to the metatheoretic proof that evaluation in Impcore does not change the set of defined global variables, ie, for any Impcore expression e, if ⟨e, ξ, ϕ, ρ⟩ ⇓ ⟨v, ξ′, ϕ′, ρ′⟩, then dom(ξ) = dom(ξ′). We prove this property by induction on the derivation of ⟨e, ξ, ϕ, ρ⟩ ⇓ ⟨v, ξ′, ϕ′, ρ′⟩.

The proof proceeds by case analysis on the last rule used in the derivation ⟨e, ξ, ϕ, ρ⟩ ⇓ ⟨v, ξ′, ϕ′, ρ′⟩.
- True
- False

One of the rules that must be considered is the IfFalse rule
- True
- False

The case for GlobalVar is an inductive case.
- True
- False

The case for GlobalAssign is an inductive case.
- True
- False

Summary and Next Steps

Theory: Operational semantics

A “high-level assembly language”
Justify algebraic laws
Guide to implementation
(Will return to formal rules about code in about a month)

Practice: For the next month, programming!

Return to algebraic laws! Lean on them!

23 September 2020: Scheme I: Recursive Programming with Lists and S-Expressions

There are PDF slides for 9/24/2020.

Retrospective: Operational Semantics

Write one thing you learned about operational semantics and one question you still have. Your answers may be anonymously shared with your classmates.

Reflection: From inference rules to recursive functions

In the previous module, you encountered a whole bunch of inference rules. How do you think those inference rules related to functions? When might they be recursive functions?

For a new language, five powerful questions

As a lens for understanding, you can ask these questions about any language:

What are the values? What do expressions/terms evaluate to?
What environments are there? That is, what can names stand for?
What is the abstract syntax? What are the syntactic categories, and what are the forms in each category?
How are terms evaluated? What are the judgments? What are the evaluation rules?
What’s in the initial basis? Primitives and otherwise, what is built in?

(Initial basis for μScheme on page 99)

Introduction to Scheme

Why Scheme?

Vehicle to study recursion and composition, two powerful programming techniques:

Recursive functions in depth
Two recursive data structures: the list and the S-expression
More powerful ways of putting functions together (compositionality again, and it leads to reuse)

Today: programming with lists and S-expressions using algebraic laws

Two new kinds of data:

The function closure: the key to “first-class” functions
Pointer to automatically managed cons cell (mother of civilization)

Scheme Values

Most values are S-expressions.

An fully general S-expression is one of

a symbol 'Halligan 'tufts
a literal integer 0 77
a literal Boolean #t #f
(cons v₁ v₂), where v₁ and v₂ are S-expressions

It is very common to work with a list of S-expressions, which is either

the empty list '()
(cons v vs), where v is an S-expression and vs is a list of S-expressions

We say “an S-expression followed by a list of S-expressions”

Lists: A subset of S-Expressions.

These are your linked lists from COMP 11 or COMP 15.

Can be defined via a recursion equation or by inference rules:

Check your understanding: S-Expressions

Every Scheme list is an S-expression.

True
False

Every Scheme S-expression is a List(A) for some A.

True
False

Select each of the following S-expressions that are also List(A) where A is the set of symbols.

(cons 'Tufts 'Jumbos)
(cons 'Tufts '())
(cons '2020 '())

Reflection: Manipulating values

Values are only useful to the extent that they can be manipulated. What operations do you think would be useful to have on a cons cell?
Recall how we defined numbers via different inductive definitions in Impcore. Using similar thinking, what cases will we need to handle for lists?

Understanding values through their operations

Constructors:

Form values
List constructors: '() and cons
Any list is therefore constructed with '() or with cons applied to a value and a smaller list.

Observers:

These ask, “how were you formed, and from what parts?”
They are null?, pair?, car, and cdr (car and cdr also known as “first” and “rest”, “head” and “tail”, and many other names)

Laws of lists:

(null? '()) == #t
(null? (cons v vs)) == #f
(car (cons v vs)) == v
(cdr (cons v vs)) == vs

Check your understanding: Deconstructing lists

A list is formed with either '() or cons.

What operation distinguishes between these two types of list?

car
cdr
null?

Given this structure, how many cases will a function that consumes a list typically have?

Why are lists useful?

Sequences a frequently used abstraction
Can easily approximate a set
Can implement finite maps with association lists (aka dictionaries)
You don’t have to manage memory

These “cheap and cheerful” representations are less efficient than balanced search trees, but are very easy to implement and work with—see the book.

The only thing new here is automatic memory management. Everything else you could do in C. (You can have automatic memory management in C as well.)

Programming with lists: The `member?` function

;; forms of data
;; '()
;; (cons n ns) where n is a number and 
               ns is a list of numbers

;; example inputs
(val x1 '())
(val x2 '(1 2 3 4))
(val x3 '(8))
(val x4 '(10 10))

;; Write check assert cases now

;; Algebraic Laws
;; (member? n '()) = #f
;; (member? n (cons n ns)) = #t
;; (member? n (cons n' ns)) = (member? n ns), 
                              where n' != n

;; The code!
  ;; return true if and only if 
  ;; list of numbers ms contains
  ;; number m
(define member? (m ms)
  (if (null? ms)
      (if (= m (car ms))
      (member? m (cdr ms)))))

        (check-assert (not (member? 8 '())))
        (check-assert (not (member? 8 '(1 2 3))))
        (check-assert      (member? 8 '(8)))
        (check-assert      (member? 8 '(10 7 8 12)))

Coding Interlude: `member?`

Consider the member? code from the previous video. Run the code on the sample data.

Add a check-assert case where the element is in the list.
Add a check-assert case where the element is not in the list.

Live coding: do two distinct elements sum to 16?

Programming with lists: The `sum-16?` function

Do two distinct list elements sum to 16?

;; forms of data
;; '()
;; (cons n ns) where n is a number and 
               ns is a list of numbers

;; example inputs
(val x5 '(6 10))
(val x6 '(3 6 12 7 8 9 10))

;; Write check assert cases now

;; Algebraic Laws
;; (sum-16? '()) = #f
;; (sum-16? (cons n ns)) = 
;;      (sum-16? ns)  OR
;;      n and an element in ns sum to 16
;;      i.e., (member?  (- 16 n) ns))


;; The code!
  ;; returns #t if and only if ms contains 
  ;; two distinct elements that sum to 16, 
  ;; where ms is a list of numbers

(define sum-16? (ms)
  (if (null? ms)
      (|| (sum-16? (cdr ms))  
       (member? (- 16 (car ms)) (cdr ms)))))

        (check-assert (not (sum-16? x1)))
        (check-assert (not (sum-16? x2)))
        (check-assert (not (sum-16? x3)))
        (check-assert (not (sum-16? x4)))
        (check-assert      (sum-16? x5)))
        (check-assert      (sum-16? x6)))

Coding Interlude: `sum-16?`

Consider the sum-16? code from the previous video. Run the code on the sample data.

Modify the algebraic laws and the code to check if the product of two elements in the list equals 16. Note that * and / are the multiplication and division operations in μScheme.

Reflection: Quote syntax

Nobody wants to write a constant list of numbers as (cons 0 (cons 1 (cons 2 (cons 3 (cons 4 (cons 5 (cons 6 (cons 7 (cons 8 (cons 9 '())))))))).

What notation would you suggest instead?

Quoted literals

Defining constants can be verbose:

(cons 6 (cons 7 (cons 8 (cons 9 '()))))

Quotation syntax more compact:

(quote (6 7 8 9)) or '(6 7 8 9)

Works with any constant S-expression:

(cons 'a (cons (cons 'b '()) '()))

replaces

'(a (b))

Check your understanding: Quoted literals

Write each ordinary S-expression using quote syntax:

(cons 1 '()) = BLANK1
(cons 1 (cons 2 '())) = BLANK2
(cons (cons 1 '()) '()) = BLANK3
(cons (cons 'a (cons 'b '())) (cons 'c '())) = BLANK4

Reflection: New syntactic forms

To program with lists and S-expressions, we absolutely have to have functions like null?, cons, car, and cdr. And the quote syntax is certainly useful. To make it even easier to program with S-expressions, what other new syntax might be useful?

μScheme’s new syntax

28 September 2020: First-class and higher-order functions

There are PDF slides for 9/29/2020.

Big picture:

Think of S-expressions and lists, not pointers
Write less code and reuse more code

Retrospective: Scheme I: Recursive programming with lists

Write one thing you learned in the previous module. Your answers may be anonymously shared with your classmates.

Reflection: Where is there shared structure?

Consider writing code to answer the following:

Do two list elements add to 16?
Do two list elements multiply to 16?
Are two list elements identical (duplicate?)
Are two students driving to same location?
Do two TAs combine to cover hours 4:00–6:00?

Now answer these questions:

How is the code for each question similar?
How does the code for each question differ?

Functions are values

Review

Do two elements in a list add to 16?

;; (sum-16? '())         = #f
;; (sum-16? (cons n ns)) =
;;      (sum-16? ns)  OR
;;      ns has an element m with n + m = 16

C implementation: nested for loops

Functional implementation: recursive functions

Reflection: Shared structure

Consider again the following questions:

Do two list elements add to 16?
Do two list elements multiply to 16?
Are two list elements identical (duplicate?)
Are two students driving to same location?
Do two TAs combine to cover hours 4:00–6:00?

For each question, describe the question-specific search criteria. For example, for the first question, the criterion is

  (define sum-is-16? (n m) (= (+ m n) 16))

Answers in pseudo-code or English are fine.

Implementing `any-two?`

Implementing

Algebraic laws for special case:

;; (sum-16? '())         = #f
;; (sum-16? (cons n ns)) = (sum-16? ns)  OR
;;                         ns has an element m
;;                         with m + n = 16

Algebraic laws for general case:

;; (any-two? q? '())         == ...
;; (any-two? q? (cons y ys)) == ...

Implementing

Algebraic laws for special case:

;; (sum-16? '())         = #f
;; (sum-16? (cons n ns)) = (sum-16? ns)  OR
;;                         ns has an element m
;;                         with m + n = 16

Algebraic laws for general case:

;; (any-two? q? '())         == #f
;; (any-two? q? (cons y ys)) == (any-two? q? ys) OR
;;                              ys has z with (q? y z)

Start with linear search!

Linear search

;; (exists? p? '())         == ...
;; (exists? p? (cons y ys)) == ...

Linear search

;; (exists? p? '())         == #f
;; (exists? p? (cons y ys)) == (|| (p? y)
;;                                 (exists? p? ys))

Operator || short-circuits:
Evaluates second argument only if first returns #f

Check your understanding: The `exists?` function

The call (exists? even? '()) returns

#t
#f

The call (exists? odd? '(0 2 4 7 8)) returns

#t
#f

How many of the items in the list does the call to (exists? odd? '(0 2 4 7 8)) examine?

Introduction to `lambda`

Two-element search

Algebraic laws:

;; (any-two? q? '())         == #f
;; (any-two? q? (cons y ys)) == (any-two? q? ys) OR
;;                              (exists? q-with-y? ys)
;;               where (q-with-y? z) == (q? y z)

But where does q-with-y? come from?

It is different for every y!

Functional programming, Part Two

(q-with-y? z) == (q? y z)

q-with-y? = a function that takes z, returns

q-with-y? = (lambda (z) (q? y z))

Built into Scheme since 1973!
Added to C++ in 2011

Two-element search, complete {&} formal

;; (any-two? q? '())         == #f
;; (any-two? q? (cons y ys)) ==
;;   (|| (any-two? q? ys)
;;       (exists? (lambda (z) (q? y z)) ys))

Code in lecture notes.

(define any-two? (q? xs)
  ;; tell if any two elements y and z 
  ;; in order satisfy (q? y z)
  (if (null? xs)
      (|| (any-two? q? (cdr xs))
          (exists? (lambda (z) (q? (car xs) z))
                   (cdr xs)))))

Lambda expressions in general

In general, (lambda (x1 ... xn) e)

lambda is an anonymous function
- x1 to xn are parameters
- e is the body
x1 to xn are bound in e
Other variables are free in e

Capture: lambda “takes” free variables with it.

Programming with functions

Functions are “first-class” values

Passed as parameters
Returned as results
Stored in data structures

Created with lambda
(In Scheme, define combines “val” and “lambda”)

Check your understanding: Lambda

Consider the following code when answering the questions below:

(define combine (p1? p2?)
   (lambda (x) (if (p1? x) (p2? x) #f)))

(define divvy (p1? p2?)
   (lambda (x) (if (p1? x) #t (p2? x))))

(val c-p-e (combine prime? even?))
(val d-p-o (divvy   prime? odd?))

The expression (c-p-e 9) equals
- #t
- #f
The expression (c-p-e 8) equals
- #t
- #f
The expression (c-p-e 7) equals
- #t
- #f
The expression (d-p-o 9) equals
- #t
- #f
The expression (d-p-o 8) equals
- #t
- #f
The expression (d-p-o 7) equals
- #t
- #f

Escaping functions

μScheme’s semantics: Overview and `let`

Reflection: Scheme vs Impcore

We’ve now studied two different languages: Impcore and μScheme.

Name two things you can do in μScheme that you can’t do in Impcore.
Speculate on how the semantics of μScheme might differ from that of Impcore to account for those features.

Impcore: Things that should offend you

Look up function vs look up variable:

Different interfaces!

To get variable, check multiple environments

Create a function? Must give it a name

High cognitive overhead
A sign of second-class citizenship

It’s not precisely true that rho never changes.
New variables are added when they come into scope.
Old variables are deleted when they go out of scope.
But the location associated with a variable never changes.

Strange syntax, but simple semantics!

Key idea: don’t worry about memory

Check your understanding: meta-variables

Consider the semantic rule for let.

Match the meta-variables with their meanings.

Meta-variables:

Meanings:

The names that are being given values.
The expressions that specify the values to associate with each variable.
The values the expressions evaluate to.
The locations where values will be stored.
The environment that associates variable names with locations.
The store that maps locations to values.

Another way to write this one:

The names that are being given values.
The expressions that specify the values to associate with each variable.
The values the expressions evaluate to.
The locations where values will be stored.
The environment that associates variable names with locations.
The store that maps locations to values.

Check your understanding: The semantics of `let`

Consider the semantic rule for let.

Expression e₁ is evaluated after expression e₂.

True
False

The memory cells allocated for x₁ and x₂ must be distinct, meaning they cannot alias.

True
False

In evaluating the body of the let e, any side effects that occured during the evaluation of e₁ and e₂ will be visible and variables x₁ and x₂ will be in scope.

True
False

Semantics of `lambda`

Key idea: ``every name I can see—remember where it is stored.’’

Check your understanding: Semantics of lambda

Consider the semantic rule for lambda.

Side effects (changes to the global store) triggered by the evaluation of the body of the lambda (ie, e) will take place during the evaluation of the lambda.

True
False

The environment captured in the closure is the same as the environment in force at the point where the lambda appears in the source program

True
False

Semantics of function application

Questions about ApplyClosure:

What if we used σ instead of σ₀ in evaluation of e₁?
What if we used σ instead of σ₀ in evaluation of arguments?
What if we used ρ_c instead of ρ in evaluation of arguments?
What if we did not require ℓ₁, ℓ₂ ∉ dom(σ)?
What is the relationship between ρ and σ?

Check your understanding: Semantics of application

Consider the semantic rule for ApplyClosure2.

Which expression evaluates to a closure?

e
e₁
e₂
e_c

Which expression represents the first argument to the function?

e
e₁
e₂
e_c

Which expression represents the body of the function being applied?

e
e₁
e₂
e_c

In what order are the expressions mentioned in the rule evaluated?

e₁, e₂, e_c, e
e_c, e₁, e₂, e
e, e₁, e₂, e_c
e_c, e₂, e₁, e

Which environment is in force at the point where the body of the function is evaluated?

ρ
ρ_c
ρ_c{x₁ ↦ ℓ₁, x₂ ↦ ℓ₂}

Immutable data: its costs and benefits (not covered in class)

Key idea of functional programming: constructed data. Instead of mutating, construct a new one. Supports composition, backtracking, parallelism, shared state.

Already doing it: immutable data (append)

Always safe to share data (I can’t mess up things for you)
Perfect for parallel/distributed (think Erlang)
Perfect for web (JSON, XML)

More functional programming

Back to lists!

Append two lists

(define append (xs ys)
  ;; returns a new list that holds all
  ;; the elements of xs followed by
  ;; all the elements of ys
  ...)

Rapid design process

(define append (xs ys) ...)

Forms of data?

Algebraic laws?

(append '()         ys) == 

(append (cons z zs) ys) ==

Equations and function for append

(append '()         ys) == ys

(append (cons z zs) ys) == (cons z (append zs ys))


(define append (xs ys)

  (if (null? xs)

      ys

      (cons (car xs) (append (cdr xs) ys))))

Cost model

The major cost center is cons because it corresponds to allocation.

How many cons cells are allocated?

Let’s rigorously explore the cost of append.

Structural Induction Principle for List(A)

Suppose I can prove two things:

IH (’())
Whenever a in A and also IH(as), then IH (cons a as)

then I can conclude

Forall as in List(A), IH(as)

Claim: Cost (append xs ys) = (length xs)

Proof: By induction on the structure of xs.

Base case: xs = ’()

I am not allowed to make any assumptions.
```
(append '() ys)
= { because xs is null }
ys
```
Nothing has been allocated, cost is zero

(length xs) is also zero.

Therefore, cost = (length xs).

Inductive case: xs = (cons z zs)

I am allowed to assume the inductive hypothesis for zs.

Therefore, I may assume the number of cons cells allocated by (append zs ys) equals (length zs)

Now, the code:

(append (cons z zs) ys)
  = { because first argument is not null }
  = { because (car xs) = z }
  = { because (cdr xs) = zs }
(cons z (append zs ys))

The number of cons cells allocated is 1 + the number of cells allocated by (append zs ys).

cost of (append xs ys)
 = { reading the code }
1 + cost of (append zs ys)
 = { induction hypothesis }
1 + (length zs)
 = { algebraic law for length }
(length (cons z zs))
 = { definition of xs }
(length xs)

Conclusion: Cost of append is linear in length of first argument.

More functional programming (not covered in class)

List-design shortcuts

Three forms of “followed by”

Given: Element .. list of values = (cons x xs)
Define: List of values .. list of values = (append xs ys)
Ignore: List of values .. element = (snoc xs y)
Or (append xs (list1 y))

Two lists? Four cases!

Using informal math notation with .. for “followed by” and e for the empty sequence, we have these laws:

xs .. e         = xs
e .. ys         = ys
(z .. zs) .. ys = z .. (zs .. ys)
xs .. (y .. ys) = (xs .. y) .. ys

The underlying operations are append, cons, and snoc. Which ..’s are which?

But we have no snoc
If we cross out the snoc law, we are left with three cases… but case analysis on the first argument is complete.

So cross out the law xs .. e == xs.
Which rules look useful for writing append?

Costs of list reversal

Algebraic laws for list reversal:

reverse '() = '()
reverse (x .. xs) = reverse xs .. reverse '(x) = reverse xs .. '(x)

And the code?

Naive list reversal

(define reverse (xs)
   (if (null? xs)
       '()
       (append (reverse (cdr xs))
               (list1 (car xs)))))

The list1 function maps an atom x to the singleton list containing x.

How many cons cells are allocated? Let’s let n = |xs|.

Q: How many calls to reverse? A: n
Q: How many calls to append? A: n
Q: How long a list is passed to reverse? A: n-1, n-2, … , 0
Q: How long a list is passed as first argument to append? A: n-1, n-2, … , 0
Q: How many cons cells are allocated by call to list1? A: one per call to reverse.
Conclusion: O(n²) cons cells allocated. (We could prove it by induction.)

The method of accumulating parameters

Write laws for

(revapp xs ys) = (append (reverse xs) ys)

Who could write the code?

Reversal by accumulating parameters

(define revapp (xs ys)
   ; return (append (reverse xs) ys)
   (if (null? xs)
       ys
       (revapp (cdr xs) 
               (cons (car xs) ys))))

(define reverse (xs) (revapp xs '()))

The cost of this version is linear in the length of the list being reversed.

Parameter ys is the accumulating parameter.
(A powerful, general technique.)

30 September 2020: Higher-order functions

There are PDF slides for 10/1/2020.

Reflection: List functions

Consider the function, with the algebraic laws:

This algorithm searches for an element in a list.

List other algorithms you are familiar with that operate over a list (or linked list, or an array) and explain briefly what they do.

Predefined list algorithms

Some classics:

exists? (Example: Is there a number?)
all? (Example: Is everything a number?)
filter (Example: Select only the numbers)
map (Example: Add 1 to every element)
foldr (Visit every element;
also called reduce, accum, a “catamorphism”)

The power of lambda

Using first-class functions to enlarge your vocabulary

List computations
Cheap functions from other functions

Supported in many languages: Haskell, ML, Python, JavaScript

List search: `exists?`

Algorithm encapsulated: linear search

Example: Is there a number in the list?

Algebraic laws:

(exists? p? '())          == ???
(exixts? p? (cons x xs))  == ???


(exists? p? '())          == #f
(exists? p? (cons x xs))  == #t,
                             if (p? x)
(exixts? p? (cons x xs))  == exists? p? xs,
                             otherwise

Defining

; (exists? p? '())         = #f
; (exists? p? (cons y ys)) = #t,          if (p? y)
; (exists? p? (cons y ys)) = (exists? p? ys),
                                          otherwise
-> (define exists? (p? xs)
      (if (null? xs)
          #f
          (if (p? (car xs)) 
              #t
              (exists? p? (cdr xs)))))
-> (exists? number? '(1 2 zoo))
#t
-> (exists? number? '(apple orange))
#f

Coding Interlude: `exists?`

Define a list ys and two predicates p1? and p2? such that exists? p1? ys evaluates to #t and exists? p2? ys evalutes to #f. Run the code to confirm your answers are correct.

An alternative formulation of the algebraic laws for the exists? function is:

;; (exists? p? '() = #f
;; (exists? p? (cons x xs)) = (|| (p? x) 
;;                                (exists? p? xs))

Modify the definition of exists? following this algebraic formulation instead, and re-run your examples.

List selection: filter

Algorithm encapsulted: Filter list according to a predicate

Example: Keep positive numbers in a list.

Defining

; (filter p? '())         == '()
; (filter p? (cons y ys)) ==
;         (cons y (filter p? ys)), when (p? y)
; (filter p? (cons y ys)) ==
;         (filter p? ys), when (not (p? y))

-> (define filter (p? xs)
     (if (null? xs)
       '()
       (if (p? (car xs))
         (cons (car xs) (filter p? (cdr xs)))
         (filter p? (cdr xs)))))

Running filter

-> (filter (lambda (n) (>  n 0)) '(1 2 -3 -4 5 6))
(1 2 5 6)
-> (filter (lambda (n) (<= n 0)) '(1 2 -3 -4 5 6))
(-3 -4)

Coding Interlude: `filter`

Use lambda to define a predicate that returns true iff the argument is a number larger than 9.

Define a list ys that contans a mix of one- and two-digit numbers.

Use your lambda function and the filter function defined in the video to produce a new list zs that contains only the elements in ys with at least two-digits.

Run your code.

How would you modify the code to produce a list with only one-digit numbers?

Reflection: Lifting functions to lists: `map`

Consider the map function. The algorithm it encapsulates is to transform every element according to an argument function. If sq is the function that squares its argument, then map sq '(1,2,3) will produce the list '(1,4,9).

Please complete the algebraic laws for the map function:

(map f '())         ==  ???
(map f (cons n ns)) ==  ???

Algorithm encapsulated: Transform every element according to argument function

Example: Square every element of a list.

Algebraic laws:

(map f '())         ==  '()
(map f (cons n ns)) ==  (cons (f y) (map f ys))

Defining and running map


-> (define map (f xs)
     (if (null? xs)
       '()
       (cons (f (car xs)) (map f (cdr xs)))))
-> (map number? '(3 a b (5 6)))
(#t #f #f #f)
-> (map (lambda(x)(* x x)) '(5 6 7))
(25 36 49)

The universal list function: `foldr`

Algorithm encapsulated: Aggregate all elements in a list.

Example: What is the sum of the numbers in a list?

foldr takes two arguments:

zero: What to do with the empty list.
plus: How to combine next element with running results.

Example: foldr plus zero '(a b)

cons a (cons b '())
 |       |      |
 v       v      v
plus a (plus b zero)

Check your understanding:

What does the following code evaluate to?

-> (define foldr (plus zero xs)
     (if (null? xs)
       zero
       (plus (car xs) (foldr plus zero (cdr xs)))))
-> (define combine (x a) (+ 1 a))
-> (foldr combine 0 '(2 3 4 1))

0
10
24
4

Reflection: Language design: Why the redundancy?

We’ve just seen that functions like exists?, map, and filter can all be easily written in terms of the foldr function. The foldr function, in turn, can easily be written in terms of basic recursion. Other languages have similar redundancy involving loops: for, while, etc.

It takes extra effort for library designers to code more functions or for language designers to support more features.

What do you think makes it worth their effort to support this range of functions and looping constructs?

Currying

Deriving one-argument functions from two-argument functions

Currying: The motivation

Remember me?

q-with-y? = (lambda (z) (q? y z))

Happens so often, there is a function for it:

q-with-y? = ((curry q?) y)

Called a partial application (one now, one later)

Map/search/filter love curried functions

-> (map     ((curry +) 3) '(1 2 3 4 5))
; add 3 to each element


-> (exists? ((curry =) 3) '(1 2 3 4 5))
; is there an element equal to 3?


-> (filter  ((curry >) 3) '(1 2 3 4 5))
; keep elements that 3 is greater then

The idea of currying

Input: a binary function f(x,y)
Output: a function f'
- Input: argument x
- Output: a function f''
  - Input: argument y
  - Output: f(x,y)

What is the benefit?

Functions like exists?, all?, map, and filter expect a function of one argument.

To get there, we use currying and partial application.

Slogan: Curried functions take their arguments “one-at-a-time.”

Check your understanding:

What does the following code evaluate to?

-> (map     ((curry +) 3) '(1 2 3 4 5))

15
18
'(4 5 6 7 8)
'(1 2 3 3 4 5)
'(3 4 5 6 7 8)
4

-> (exists? ((curry =) 3) '(1 2 3 4 5))

'()
#t
#f
0

-> (filter  ((curry >) 3) '(1 2 3 4 5))
                           ; tricky

'(1 2)
'(1 2 3)
’()
'(3 4 5)
'(4 5)

Composing functions

Composing Functions

In math, what is the following equal to?

(f o g)(x) == ???

Another algebraic law, another function:

(f o g) (x) = f(g(x))
(f o g) = lambda x. (f (g (x)))

One-argument functions compose

-> (define o (f g) (lambda (x) (f (g x))))
-> (define even? (n) (= 0 (mod n 2)))
-> (val odd? (o not even?))
-> (odd? 3)
#t
-> (odd? 4)
#f

Coding Interlude: `compose`

The predicate null? returns #t iff the argument list is empty, while the function not inverts a boolean. Use the compose function (o) to write a predicate that tests whether a list is non-empty.

Run your code on both empty and non-empty lists.

Proofs about functions

Function consuming A is related to proof about A

Q: How to prove two lists are equal?

A: Prove they are both ’() or that they are both cons cells cons-ing equal car’s to equal cdr’s
Q: How to prove two functions equal?

A: Prove that when applied to equal arguments they produce equal results.

5 Oct 2020: Costs, tail calls, and continuations

There are PDF slides for 10/6/2020.

Retrospective: Higher-order functions retrospective

Write one thing you learned about higher-order and anonymous functions. Your answers may be anonymously shared with your classmates.

Reflection: Expected performance

Consider the following algebraic laws for list reversal:

reverse '() = '()
reverse (x .. xs) = reverse xs .. reverse '(x) 
                  = reverse xs .. '(x)

where .. denotes the append operation on lists.

Now consder the corresponding code:

(define reverse (xs)
   (if (null? xs)
       '()
       (append (reverse (cdr xs))
               (list1 (car xs)))))

where list1 is a function that makes a one-element list from its argument.

Answer these questions:

For a list reversal function to be considered to have good performance, what should its running time be in terms of the length of its argument?
What is the running time of the reverse function above?

Where have we been, where are we going: functions

Today: costs and control flow

Costs of list reversal

Naive list reversal

(define reverse (xs)
   (if (null? xs)
       '()
       (append (reverse (cdr xs))
               (list1 (car xs)))))

Consider the call reverse xs and let n = |xs|.

How many stack frames are allocated for reverse and append?

How many cons cells are allocated?

Q: How many calls to reverse? A: n
Q: How many calls to append?
A: n
Q: How long is list passed to reverse?
A: n-1, n-2, … , 0
Q: How long is list passed as 1st arg to append?
A: n-1, n-2, … , 0
Q: How many cons cells are allocated by call to list1? A: one per call to reverse.
Conclusions: (We could prove by induction.)
- O(n) stack frames are allocated.
- O(n²) cons cells allocated.
- Function is quadratic.

Alarm bells should be ringing: Reversing a list should be linear!

We’ll get to a linear verson of list reversal via tail calls and the method of accumulating parameters.

Tail calls

What is tail position?

Tail position is defined inductively:

The body of a function is in tail position
When (if e1 e2 e3) is in tail position, so are e2 and e3
When (let (...) e) is in tail position, so is e, and similary for letrec and let*.
When (begin e1 ... en) is in tail position, so is en.

Idea: The last thing that happens

Check your understanding: Tail Position

Consider the following code

-> (define foldr (plus zero xs)
     (if (null? xs)
       zero
       (plus (car xs) (foldr plus zero (cdr xs)))))

Now answer the following questions.

The occurrence of expression zero in the true-branch of the if expression is in tail position.

True
False

The occurrence of expression plus in the false-branch of the if expression is in tail position.

True
False

The occurrence of expression foldr in the false-branch of the if expression is in tail position.

True
False

Tail calls and the method of accumulating parameters

Tail-call optimization & \ accumulating parameters

Tail-call optimization

Anything in tail position is the last thing executed,
so its stack frame can be recycled
Before executing a call in tail position,
abandon current stack frame
Results in asymptotic space savings

Accumulating Parameter

Add a parameter to accumulate answer
Moves recursive call to tail position

Example: Accumulating reverse function

Key ideas:

Recursive call is in a tail position
Accumulating parameter collects answer

Contract:

  (revapp xs ys) = (append (reverse xs) ys)

Laws:

  (revapp '()         ys) == ys
  (revapp (cons z zs) ys) ==
             (revapp zs (cons z ys))

Reversal by accumulating parameters

; laws: (revapp '()         ys) = ys
;       (revapp (cons z zs) ys) =
;                     (revapp zs (cons z ys))

(define revapp (xs ys)
   ; return (append (reverse xs) ys)
   (if (null? xs)
       ys
       (revapp (cdr xs) 
               (cons (car xs) ys))))


(define reverse (xs) (revapp xs '()))

We’ll seee that the cost of this version is linear in the length of the list being reversed.

Parameter ys is the accumulating parameter.

Example: revapp '(1 2) '()

Call stack: (each line replaces previous one)

revapp '(1 2) '()    --> 
revapp '(2)   '(1)   -->
revapp '()    '(2 1)

Consider the call revapp xs '() and let n = |xs|.

How many stack frames are allocated?

How many cons cells are allocated?

Q: How many calls to revapp? A: n
Q: How many of them are tail calls? A: n
Q: How many stack frames are allocated for revapp? A: 1
Q: How much work is done per call to revapp? A: constant.
Conclusions: (We could prove by induction.)
- O(1) stack frames for revapp are allocated.
- O(n) cons cells allocated.
- Function is linear in length of list being reversed.

Check your understanding: Accumulating Parameters

An accumulating parameter to a recursive function f accumulates the answer so that a recursive call to f can be in tail position.

True
False

The stack frame of a function call that is not in tail position cannot be shared with subsequent calls because its values may still be needed.

True
False

Tail-call optimizations and the use of accumulating parameters can only result in constant-time improvements to the complexity of a function.

True
False

Continuations

Reflection: Another construct that transfers control

Are tail calls familiar? What other programming construct

Transfers control to another point in the code?
Uses no stack space?

Tail calls as gotos with arguments

Gotos

Transfer control to another point in the code
Use no stack space
Can only go to code marked with labels

Tail calls

Transfer control to another point in the code
Use no stack space
Can transfer control to any function
Can take parameters!

Continuations: First-class tail calls

A continuation is code that represents “the rest of the computation.”

Not a normal function call because continuations never return
Think “goto with arguments”

Context: an advanced technique, heavily used in event-driven programming:

GUIs, games, web, embedded devices

Different coding styles

Direct style

Last action of a function is to return a value. (This style is what you are used to.)

Continuation-passing style (CPS)

Last action of a function is to “throw” answer to a continuation.
(For us, tail call to a parameter.)

Uses of continuations

Compiler representation:
- Compilers for functional languages often convert direct-style user code to CPS because CPS matches control-flow of assembly.
First-class continuations.
- Some languages provide a construct for capturing the current continuation and giving it a name k.
- Control can be resumed at captured continuation by throwing to k.
A style of coding that can mimic exceptions
Callbacks in GUI frameworks and web services
Event loops in game programming and other concurrent settings

Implementation

We’re going to simulate continuations with function calls in tail position.
First-class continuations require compiler support.
- primitive function that materializes “current continuation” into a variable.

Check your understanding: Continuation Introduction

A continuation is different from a normal function because it never returns to the context from which it was invoked.

True
False

Continuations cannot take arguments.

True
False

Continuations are used in GUI frameworks and to mimic exceptions.

True
False

Motivating Example: Using continuations to signal exceptions

Reflection: Handling unusual circumstances

Consider the following algebraic laws for a witness function:

  (witness p? xs) == x, where (member x xs),
                        provided (exists? p? xs)

Now answer this questions:

What should the function do if there exists no such x?

Design Problem: Missing Value

Provide a witness to existence:

 (witness p? xs) == x, where (member x xs),
                       provided (exists? p? xs)

Problem: What if there exists no such x?

Bad ideas:

nil
special symbol fail
run-time error

Good idea: exception (but not available in uScheme)

Question: How much stack space is used by the call?

Answer: Constant

Example Use: Instructor Lookup

-> (val 2020f '((Fisher 105)(Monroe 40)(Chow 116)))
-> (instructor-info 'Fisher 2020f)
(Fisher teaches 105)

-> (instructor-info 'Chow 2020f)
(Chow teaches 116)

-> (instructor-info 'Votipka 2020f)
(Votipka is-not-on-the-list)

Coding Interlude: `witness-cps`

Consider the following code:

(define witness-cps (p? xs succ fail)
  (if (null? xs)
      (fail)
      (let ([z (car xs)])
        (if (p? z)
            (succ z)
            (witness-cps p? (cdr xs) succ fail)))))


(val 2020f '((Fisher 105)(Monroe 40)(Chow 116)))


(define instructor-info (instructor classes)
  (let (
         [s  (lambda (info)    ; success continuation
                (list3 instructor 'teaches (cadr info)))]
         [f  (lambda ()        ; failure continuation
                (list2 instructor 'is-not-on-the-list))])
        (witness-cps (o ((curry =) instructor) car)
                      classes s f)))

(instructor-info 'Fisher 2020f)
(instructor-info 'Chow 2020f)
(instructor-info 'Votipka 2020f)

Modify the code to add another professor and course number, and add a test case for that professor.

Run your code.

Modify the code to return 'is-not-teaching-this-semester if the instructor’s name is not on the list.

Run your code to confirm it returns your new response when given a name not on the list.

Explain how would you modify the code to search for a course number instead of an instructor name.

7 Oct 2020: Continuations for backtracking

There are PDF slides for 10/8/2020.

Expressions to avoid

(if <p> #t #f)  ; worst

(append (list1 <x>) <ys>) ; most common

(cons <x> (cons <y> '())) ; prefer `list2`

Homework alert

Functions list-of?, formula?

Can be passed any value
Must handle all cases (Figure 2.2, page 94)

Reflection: Solving Boolean formulas

On your homework, you will use continuations to organize a search to find satisfying assignments for Boolean formulas.

A formula has one of these forms:

given the following record declarations:

For each formula, find a satisfying assignment or explain why none exists.

(val f1 (make-and (list4 'x 'y 'z (make-not 'x))))
   ; x /\ y /\ z /\ !x

(val f2 (make-not (make-or (list2 'x 'y)))) 
   ; !(x \/ y)

(val f3 (make-or (list2 'x 'y)))
   ; x \/ y

(val f4 (make-not (make-and (list3 'x 'y 'z))))
   ; !(x /\ y /\ z)

Solving Boolean formulas with continuations

A formula is one of these:

given the following record definitions (in book):

Solving Boolean Formulas

(val f1 (make-and (list4 'x 'y 'z (make-not 'x))))
   ; x /\ y /\ z /\ !x

Answer: NONE

Solving Boolean Formulas

(val f2 (make-not (make-or (list2 'x 'y)))) 
   ; !(x \/ y)

Answer:

{ x |-> #f, y |-> #f }

Solving Boolean Formulas

(val f3 (make-or (list2 'x 'y)))
   ; x \/ y

Answer: Three solutions!

{ x |-> #t, ... }
{ y |-> #t, ... }

Solving Boolean Formulas

(val f4 (make-not (make-and (list3 'x 'y 'z))))
   ; !(x /\ y /\ z)

Answer: Many solutions
(all 7 ways the variables can not all be #t):

{ x |-> #f, ... }, ...

Searching via continuations: An example

Finding a satisfying assignment

Example formula:

  (x \/ y) /\ (!x /\ !z)

Continuations for Search

the name of the literal in the box.

Solving a Literal

start carries a partial truth assignment to variables current

Box describes how to extend current to make a variable, say x, true.

Case 1: current(x) = #t

Call success continuation with current

Pass fail as resume continuation (argument to success)

Case 2: current(x) = #f

Call fail continuation

Case 3: x not in current

Call success continuation with current{x -> #t}

Pass fail as resume continuation

Solving a literal

; (satisfy-literal-true x current succ fail) =
;   (succ current fail), when x is bound to #t in cur
;   (fail),              when x is bound to #f in cur
;   (succ (bind x #t current) fail), x unbound in cur

(define satisfy-literal-true (x current succ fail)
  (if (bound? x current)
      (if (find x current)
          (succ current fail)
          (fail))
      (succ (bind x #t current) fail)))

Check your understanding: Solving a Negated Literal

Consider the schema we’ve been using to find satisfying assignments for Boolean formulas. In the previous video, we walked through how to instantiate the schema for a positive literal. The following questions ask you to do the same thing for a negated literal, which is a term of the form not x where x is a literal.

The start continuation is called with argument current, the partial truth assignment constructed so far. The box needs to describe how to extend current to make the negated variable not x true.

If current(x) = #t, what should the box do?

Call success continuation with current as argument.
Call fail continuation
Call success continuation with current extended with {x->#t}
Call success continuation with current extended with {x->#f}

If current(x) = #f, what should the box do?

Call success continuation with current as argument.
Call fail continuation
Call success continuation with current extended with {x->#t}
Call success continuation with current extended with {x->#f}

If x is not in current, what should the box do?

Call success continuation with current as argument.
Call fail continuation
Call success continuation with current extended with {x->#t}
Call success continuation with current extended with {x->#f}

The success continuation takes two arguments: current (the partial solution found so far) and resume (the continuation that takes over if current is found to be inconsistent with some new part of the formula. When calling the success continuation, what should we pass as the resume continuation?

(lambda (x) x)
fail
(lambda (x) (not x))
start

Please explain your answer.

Solving A and B

Solver enters A
If A is solved, newly allocated success continuation starts B
If B succeeds, we’re done! Use success continuation from context.
If B fails, use resume continuation A passed to B as fail.
If A fails, the whole thing fails. Use fail continuation from context.

Solving A or B

Solver enters A
If A is solved, we’re good! But what if context doesn’t like solution? It can resume A using the resume continuation passed out as fail.
If A can’t be solved, don’t give up! Try a newly allocated failure continuation to start B.
If ever B is started, we’ve given up on A entirely. So B’s success and failure continuations are exactly the ones in the context.
If B succeeds, but the context doesn’t like the answer, the context can resume B.
If B fails, abject failure all around; call the original fail continuation.

Lisp and Scheme Retrospective

Five powerful questions

What are the values?
What do expressions/terms evaluate to?
What environments are there?
What can names stand for?
What is the abstract syntax?
Syntactic categories? Terms in each category?
How are terms evaluated?
Judgments? Evaluation rules?
What’s in the initial basis?
Primitives and predefined, what is built in?

Next steps

Before tomorrow’s lecture,

Consider good and bad points of Scheme

(Informed by your experience)

Common Lisp, Scheme

Advantages:

High-level data structures
Cheap, easy recursion
Automatic memory management (garbage collection!)
Programs as data!
Hygenic macros for extending the language
Big environments, tiny interpreters, everything between
Sophisticated Interactive Development Environments
Used in AI applications; ITA; Paul Graham’s company Viaweb

Down sides:

Hard to talk about data
Hard to detect errors at compile time

Bottom line: it’s all about lambda

Major win
Real implementation cost (heap allocation)

Next time: think about good and bad points

Bonus content: Scheme as it really is

Macros!
Cond expressions (solve nesting problem)
Mutation
…

Macros!

Full Scheme: Macros

A Scheme program is just another S-expression

Function define-syntax manipulates syntax at compile time
Macros are hygienic—name clashes impossible
let, &&, record, others implemented as macros

(See book sections 2.16, 2.17.4)

Conditional expressions

Full Scheme: Conditionals

(cond [c1 e1]    ; if c1 then e1
      [c2 e2]    ; else if c2 then e2
       ...            ...
      [cn en])   ; else if cn then en

; Syntactic sugar---'if' is a macro:
(if e1 e2 e3) == (cond [e1 e2]
                       [#t e3])

Mutation

Full Scheme: Mutation

Not only variables can be mutated.

Mutate heap-allocated cons cell:

(set-car! '(a b c) 'd)  => (d b c)

Circular lists, sharing, avoids allocation

still for specialists only

12 Oct 2020: Introduction to ML

There are PDF slides for 10/13/2020.

Retrospective: Continuations

Write one thing you learned about continuations. Your answers may be anonymously shared with your classmates.

Reflection: μScheme evaluation

Evaluating an idea or an artifact is an advanced cognitive task:

Remember
Understand
Apply
Analyze
Evaluate
Create

Given what you have learned so far in this course, you are in a good position to start to evaluate μScheme.

Please enter at least two strengths of μScheme:

Please enter at least two weaknesses of μScheme.

Your answers may be anonymously shared with your classmates.

The language that we will study next, ML, addresses some of the weaknesses of μScheme.

Introduction to ML

Meta: Apply your new knowledge in Standard ML:

You’ve already learned (most of) the ideas
There will be a lot of new detail
Not your typical introduction to a new language
Not definition before use, as in a manual
Not tutorial, as in Ullman
Instead, the most important ideas that are most connected to your work up to now

ML Overview

Designed for manipulating programs, logic, and symbolic data
Good language for implementing language features
Good language for studying type systems
Theme: Precise ways to describe data

Three new ideas

ML = uScheme + pattern matching + static types + exceptions

Pattern matching is big and important. You will like it. It’s “coding with algebraic laws”
Static types get two to three weeks in their own right.
Exceptions are easy

Pattern matching makes code look more like algebraic laws: one pattern for each case

Static types tell us at compile time what the cases are.

Exceptions solve the problem “I can’t return anything sensible!”

And lots of new concrete syntax!

Example: The `length` function.

Algebraic laws:

length []      = 0
length (x::xs) = 1 + length xs

The code:

fun length []      = 0
  | length (x::xs) = 1 + length xs

val res = length [1,2,3]

Things to notice:
- No brackets! (I hate the damn parentheses)
- Function application by juxtaposition
- Function application has higher precedence than any infix operator
- Compiler checks all the cases (try in the interpreter)

Coding Interlude: `length`

Use the length function

   fun length []      = 0
     | length (x::xs) = 1 + length xs

to calculate the length of the list [1,2,3,4].

Now write the sum function in ML that calculates the sum of the same list.

μScheme to ML Rosetta stone

uScheme                    SML

 (cons x xs)             x :: xs

 '()                     []
 '()                     nil

 (lambda (x) e)          fn x => e

 (lambda (x y z) e)      fn (x, y, z) => e

 ||  &&                  andalso    orelse


 (let* ([x e1]) e2)      let val x = e1 in e2 end

 (let* ([x1 e1]          let val x1 = e1
        [x2 e2]              val x2 = e2
        [x3 e3]) e)          val x3 = e3
                         in  e
                         end

Check your understanding: ML Syntax

Match the μScheme syntax with its ML equivalent.

μScheme:

 &&                  

 (lambda (x y z) e)      

 '()                     

 ||  

 (lambda (x) e)          

 (let* ([x e1]) e2)      

 (cons x xs)             

 '()                     


 (let* ([x1 e1]          
        [x2 e2]          
        [x3 e3]) e)

 fn (x, y, z) => e

 nil

 fn x => e

 x :: xs

 andalso    

 []

 false

 let val x = e1 in e2 end

 true   

 let val x1 = e1
     val x2 = e2
     val x3 = e3
 in  e
 end

 orelse

ML forms of data

Live coding:

(* ML's comment form *)
bool (* true, false *)
string (* "Tufts" *)
char (* #"A" *)
int  (* including negative integers, eg ~12 *)
list of int  [1,2,3]
list of bool [true, false]
ill-typed list [1, true]
pair  (1, "Halligan")
triple  (105, "Scheme", true)

Check your understanding: Simple ML Types

Which of the following is not syntactically correct ML code?

[1,2,3,5]
(* This is a comment *) [true, false, true]
(true, "hello")
~12
#"K"
None of the above

ML Functions

(* Can't use ? in the names of functions in ML *)
fun even x = (x mod 2 = 0)

fun map f []        = []
  | map f (y :: ys) = f y  :: map f ys

(* Anonymous functions are more verbose in ML *)
val res1 = map (fn (x) => (x + 1)) [1,2,3]

(* ML also has a case statement;
   It's not as elegant as clausal definitions. *)
(* orelse shortcircuits *)
fun exists p xs = 
    case xs 
      of [] => false
       | y :: ys => p y orelse exists p ys


val res2 = exists even [1,2,3,4]

(* Higher order functions exist in ML too *)
(* Notation (op +) converts infix to prefix *)
fun sum xs = foldr (op +) 0 [1,2,3,4]

Check your understanding: ML Functions

Which of the following functions correctly implements the filter function, defined by the following algebraic laws? (Note, all of the functions are syntatically correct.)

    filter p [] = []
    filter p (y :: ys) = 
        if p y then y :: filter p ys
               else filter p ys

fun filter p xs =
  case xs
    of [] => []
     | y :: ys => 
          if p y then y :: filter p ys
                 else filter p ys

fun filter p [] = []
  | filter p (y :: ys) = 
       if p y then y :: filter p ys
             else filter p ys

fun filter p xs = 
     foldr (fn(x,a) => 
              if p x then (x::a) else a) 
           [] xs

All of the above.

New Forms of Data

Example: Binary Tree of Integers

(* LEAF and NODE are called data type constructors *)
datatype itree = LEAF
               | NODE of itree * int * itree


(* Data type constructors build data structures*)
(* and can be used in pattern matching *)
fun insert n LEAF                    = 
        NODE (LEAF, n, LEAF)
  | insert n (NODE (left, m, right)) =
        if n < m then
       NODE (insert n left, m, right)
        else if n > m then
       NODE (left, m, insert n right)
    else
       NODE (left, m, right)


(* insert is curried; 
   we might also want an uncurried version of insert *)
fun insert' (n, tree) = insert n tree

(* @ is ML's notation for appending two lists *)
fun elements LEAF = []
  | elements (NODE (left, n, right)) = 
      elements left @ (n :: elements right)

(* o is ML's composition operation *)
val isort = elements o foldl insert' LEAF
  (* uses binary-search tree for sorting *)

val result = isort [4,2,7,3,4,2,8,4,8,9];

Check your understanding: ML Data Type Introduction

Consider the itree datatype declaration:

datatype itree = LEAF
               | NODE of itree * int * itree

and the function elements

fun elements LEAF = []
  | elements (NODE (left, n, right)) = 
      elements left @ (n :: elements right)

If the elements function were applied to each of the following itree data structures, which would return the list [1,0,5]?

NODE (NODE (LEAF,0,LEAF),1,NODE (LEAF,5,LEAF))
NODE (NODE (LEAF,5,LEAF),1,NODE (LEAF,0,LEAF))
NODE (NODE (LEAF,1,LEAF),0,NODE (LEAF,5,LEAF))

Exceptions in ML

Example: Take from a list

(* Note use of exceptions. *)
(* And ; to sequence imperative operations *)
exception TooShort
fun take 0 _       = []  (* wildcard! *)
  | take n []      = raise TooShort
  | take n (x::xs) = x :: (take (n-1) xs)

val res5 = take 2 [1,2,3,4]
val res6 = take 3 [1]
           handle TooShort =>
             (print "List too short!"; [])

Check your understanding: ML Exceptions

Recall the code for the take function:

exception TooShort
fun take 0 _       = []  (* wildcard! *)
  | take n []      = raise TooShort
  | take n (x::xs) = x :: (take (n-1) xs)

and some code that exercises it:

val res5 = take 2 [1,2,3,4]
val res6 = take 3 [1]
           handle TooShort =>
             (print "List too short!"; [])

Consider the partially-written function find which satisfies the following algebraic laws:

find p [] = raise NotFound
find p (x::xs) = x          if p x
                 find p xs  otherwise

and some code that exercises it:

exception Blank1
fun find p [] = Blank2 NotFound
  | find p (x::xs) = 
       if p x then x 
              else find p xs
val res = find even [1,3,5] 
            Blank3 NotFound => Blank4

Fill in the blanks to correctly complete the find function and return an answer of 0?

ML—Five Questions

Values: num/string/bool, constructed data

Environments: names stand for values (and types)

Syntax: definitions, expressions, patterns, types

Evaluation: uScheme + case and pattern matching

Initial Basis: medium size; emphasizes lists

(Question Six: type system—a coming attraction)

in a course-specific video or in some other medium.

A note about books

Ullman is easy to digest

Ullman costs money but saves time

Ullman is clueless about good style

Suggestion:

Learn the syntax from Ullman
Learn style from Ramsey, Harper, & Tofte

Details in course guide Learning Standard ML

Length

fun length []      = 0
  | length (x::xs) = 1 + length xs

val res = length [1,2,3]

Map

fun map f []      = []
  | map f (x::xs) = (f x) :: (map f xs)

val res1 =
  map length [[], [1], [1,2], [1,2,3]]

Map, without redundant parentheses

fun map f []      = []
  | map f (x::xs) =  f x  ::  map f xs

val res1 =
  map length [[], [1], [1,2], [1,2,3]]

Filter

fun filter pred []      = [] 
  | filter pred (x::xs) =  (* no 'pred?' *)
      let val rest = filter pred xs
      in  if pred x then
            (x :: rest) 
          else
            rest
      end

val res2 =
  filter (fn x => (x mod 2) = 0) [1,2,3,4]

Filter, without redundant parentheses

fun filter pred []      = [] 
  | filter pred (x::xs) =  (* no 'pred?' *)
      let val rest = filter pred xs
      in  if pred x then
             x :: rest
          else
            rest
      end

val res2 =
  filter (fn x => (x mod 2) = 0) [1,2,3,4]

Exists

fun exists pred []      = false
  | exists pred (x::xs) = 
      (pred x) orelse (exists pred xs)

val res3 =
  exists (fn x => (x mod 2) = 1) [1,2,3,4]
(* Note: fn x => e is syntax for lambda *)

Exists, without redundant parentheses

fun exists pred []      = false
  | exists pred (x::xs) = 
       pred x  orelse  exists pred xs

val res3 =
  exists (fn x => (x mod 2) = 1) [1,2,3,4]
(* Note: fn x => e is syntax for lambda *)

All

fun all pred []      = true
  | all pred (x::xs) =
      (pred x) andalso (all pred xs)

val res4 = all (fn x => (x >= 0)) [1,2,3,4]

All, without redundant parentheses

fun all pred []      = true
  | all pred (x::xs) =
       pred x  andalso  all pred xs

val res4 = all (fn x => (x >= 0)) [1,2,3,4]

Take

exception TooShort
fun take 0 _       = []  (* wildcard! *)
  | take n []      = raise TooShort
  | take n (x::xs) = x :: (take (n-1) xs)

val res5 = take 2 [1,2,3,4]
val res6 = take 3 [1]
           handle TooShort =>
             (print "List too short!"; [])

(* Note use of exceptions. *)

Take, without redundant parentheses

exception TooShort
fun take 0 _       = []  (* wildcard! *)
  | take n []      = raise TooShort
  | take n (x::xs) = x ::  take (n-1) xs

val res5 = take 2 [1,2,3,4]
val res6 = take 3 [1]
           handle TooShort =>
             (print "List too short!"; [])

(* Note use of exceptions. *)

Drop

fun drop 0 zs      = zs
  | drop n []      = raise TooShort
  | drop n (x::xs) = drop (n-1) xs

val res7 = drop 2 [1,2,3,4]
val res8 = drop 3 [1]
           handle TooShort =>
             (print "List too short!"; [])

Takewhile

fun takewhile p [] = []
  | takewhile p (x::xs) = 
      if p x then (x :: (takewhile p xs))
      else []

fun even x = (x mod 2 = 0)
val res8 = takewhile even [2,4,5,7]
val res9 = takewhile even [3,4,6,8]

Takewhile, without redundant parentheses

fun takewhile p [] = []
  | takewhile p (x::xs) = 
      if p x then  x ::  takewhile p xs
      else []

fun even x = (x mod 2 = 0)
val res8 = takewhile even [2,4,5,7]
val res9 = takewhile even [3,4,6,8]

Dropwhile

fun dropwhile p []              = []
  | dropwhile p (zs as (x::xs)) = 
      if p x then (dropwhile p xs) else zs
val res10 = dropwhile even [2,4,5,7]
val res11 = dropwhile even [3,4,6,8]

(* fancy pattern form: zs as (x::xs) *)

Dropwhile, without redundant parentheses

fun dropwhile p []              = []
  | dropwhile p (zs as (x::xs)) = 
      if p x then  dropwhile p xs  else zs
val res10 = dropwhile even [2,4,5,7]
val res11 = dropwhile even [3,4,6,8]

(* fancy pattern form: zs as (x::xs) *)

Folds

fun foldr p zero []      = zero
  | foldr p zero (x::xs) = p (x, (foldr p zero xs))

fun foldl p zero []      = zero
  | foldl p zero (x::xs) = foldl p (p (x, zero)) xs


val res12 = foldr (op +)  0 [1,2,3,4] 
val res13 = foldl (op * ) 1 [1,2,3,4] 

(* Note 'op' to use infix operator as a value *)

Folds, without redundant parentheses

fun foldr p zero []      = zero
  | foldr p zero (x::xs) = p (x,  foldr p zero xs )

fun foldl p zero []      = zero
  | foldl p zero (x::xs) = foldl p (p (x, zero)) xs


val res12 = foldr (op +)  0 [1,2,3,4] 
val res13 = foldl (op * ) 1 [1,2,3,4] 

(* Note 'op' to use infix operator as a value *)

14 Oct 2020: Programming with constructed data and types

There are PDF slides for 10/15/2020.

Handout: Learning ML

be two text answer boxes, one for the C/scheme answer and one for the ML answer.

Reflection: Defining new kinds of data

Suppose you were going to program a system to play card games in which suits (CLUBS, DIAMONDS, HEARTS, and SPADES) matter, such as poker, solitaire, bridge, or gin rummy. Briefly describe how you would represent the suits in C and in μScheme.

Now consider ML’s datatype mechanism that we used to define the itree type. Briefly describe how you could use ML’s datatype mechanism to represent suits.

Programming with Types

Pain points from Scheme

Unsolved:

Clunky records (no dot notation)

Solved:

Land of Infinite Parentheses
Algebraic laws are just comments
car or cdr of empty list
car or cdr of non-list
Too many ways to use cons
Return value of wrong type
Wrong types/number of arguments

Programming with Types

Three tasks:

Define a type
Create value (“introduction”)
Observe a value (“elimination”)

For functions,

All you can do with a function is apply it

For constructed data,

“How were you made & from what parts?”

Foundation: Data

Syntax is always the presenting complaint, but data is what’s always important

“Language support for forms of data”

Base types: int, real, bool, char, string
Functions: A -> B
Constructed data:
- Tuples: pairs A*B, triples A*B*C, etc.
- (Records with named fields)
- Lists and other algebraic types
Deconstruct using pattern matching

“Distinguish one cons cell (or one record) from another”

Lightweight constructed data: Tuples

Types defined wtih *: A₁ * A₂ * … * A_n
Expressions to create: (e₁, e₂, …, e_n)
Patterns to observe: (p₁, p₂, …, p_n)

Example (splitList from merge sort):

    let val (left, right) = splitList xs
        in  merge (sort left, sort right)
        end

Enumerated types

Datatypes can define an enumerated type and associated values.

datatype suit = HEARTS | DIAMONDS | SPADES | CLUBS

Here suit is the name of a new type.

The value constructors HEARTS, DIAMONDS, SPADES, and CLUBS are the values of type suit.

Value constructors are separated by vertical bars.

Pattern matching

Datatypes are deconstructed using pattern matching.

fun toString HEARTS   = "hearts"
  | toString DIAMONDS = "diamonds"
  | toString SPADES   = "spades"
  | toString CLUBS    = "clubs"

val suitName = toString HEARTS

Check your understanding: Basic Datatypes

Consider the following partially written isRed function that is supposed to determine if a given suit is one of the two red suits (ie, is either a heart or a diamond).

fun isRed  HEART  = BLANK1
  | BLANK2 BLANK3 = true
  | isRed  _      = false

But wait, there’s more: Value constructors can take arguments!

datatype int_tree = 
    LEAF | NODE of int * int_tree * int_tree

int_tree is the name of a new type.

There are two data constructors: LEAF and NODE.

NODEs take a tuple of three arguments: a value at the node, and left and right subtrees.

The keyword of separates the name of the data constructor and the type of its argument.

When fully applied, data constructors have the type of the defining datatype (ie, int_tree).

It’s recursive!

Building values with constructors

We build values of type int_tree using the associated constructors:

 val tempty = LEAF
 val t1 = NODE (1, tempty, tempty)
 val t2 = NODE (2, t1, t1)
 val t3 = NODE (3, t2, t2)

What is the in-order traversal of t3?

 [1,2,1,3,1,2,1]

What is the pre-order traversal of t3?

 [3,2,1,1,2,1,1]

Deconstruct values with pattern matching

(The @ symbol denotes append in ML)

fun inOrder LEAF = []
  | inOrder (NODE (v, left, right)) = 
       inOrder left @ [v] @ inOrder right

val il3 = inOrder t3



fun preOrder LEAF = []
  | preOrder (NODE (v, left, right)) = 
       v :: preOrder left @ preOrder right

val pl3 = preOrder t3

Reflection: How general is the code?

Consider the int_tree datatype and the code we’ve written to manipulate it:

   datatype int_tree = 
      LEAF | NODE of int * int_tree * int_tree

   fun inOrder LEAF = []
      | inOrder (NODE (v, left, right)) = 
           inOrder left @ [v] @ inOrder right

    fun preOrder LEAF = []
      | preOrder (NODE (v, left, right)) = 
           v :: preOrder left @ preOrder right

Discuss the extent to which this code requires the value stored at each node to be an integer. Type errors aside, would the code still work if we stored a different kind of value in the tree?

But wait, there’s still more: Polymorphic datatypes!

int_tree is monomorphic because it has a single type.

Note though that the inOrder and preOrder functions only cared about the structure of the tree, not the payload value at each node.

Polymorphic datatypes are written using type variables that can be instantiated with any type.

datatype 'a tree = 
   CHILD | PARENT of 'a * 'a tree * 'a tree

tree is a type constructor (written in post-fix notation), which means it produces a type when applied to a type argument.

Examples:

int tree is a tree of integers
bool tree is a tree of booleans
char tree is a tree of characters
int list tree is a tree of a list of integers.

'a is a type variable: it can represent any type.

It is introduced on the left-hand of the = sign. References on the right-hand side are types.

CHILD and PARENT are value constructors.

CHILD takes no arguments, and so has type 'a tree

When given a value of type 'a and two 'a trees, PARENT produces a 'a tree

Constructors build `tree` values

val empty = CHILD
val tint1 = PARENT (1, empty, empty)
val tint2 = PARENT (2, tint1, tint1)
val tint3 = PARENT (3, tint2, tint2)

val tstr1 = PARENT ("a", empty, empty)
val tstr2 = PARENT ("b", tstr1, tstr1)
val tstr3 = PARENT ("c", tstr2, tstr2)

Pattern matching deconstructs `tree` values

fun inOrder CHILD = []
  | inOrder (PARENT (v, left, right)) = 
       (inOrder left) @ [v] @ (inOrder right)

fun preOrder CHILD = []
  | preOrder (Parent (v, left, right)) = 
       v :: (preOrder left) @ (preOrder right)

Functions inOrder and preOrder are polymorphic: they work on any value of type 'a tree. 'a is a type variable and can be replaced with any type.

Environments

Datatype definitions introduce names into:

the type environment: suit, int_tree, tree
the value environment: HEART, LEAF, PARENT

Inductive

Datatype definitions inherently inductive:

the type int_tree appears in its own definition
the type tree appears in its own definition

Check your understanding: Defining Datatypes

We can define a subset of the S-expressions in Scheme using the following specification:

An SX1 is either an ATOM or a list of SX1s, where ATOM is represented by the ML type atom.

Complete the encoding of this definition by filling in the blanks in the following ML datatype:

datatype sx1 = ATOM of BLANK1
             | LIST of BLANK2 list

Another way of defining a subset of the S-expressions in Scheme uses a different specification:

An SX2 is either an ATOM or the result of consing together two values v1 and v2, where both v1 and v2 are themselves members of the set SX2.

Complete the encoding of this definition by filling in the blanks in the following ML datatype:

datatype sx2 = ATOM of BLANK1
             | PAIR of BLANK2 BLANK3 BLANK4

Coding interlude: Defining a datatype

Designing a datatype is a three step process:

For each form, choose a value constructor
Identify the “parts” type that each constructor is of
Write the datatype definition

Another definition of a Scheme S-expression is that it is one of:

A symbol (string)
A number (int)
A Boolean (bool)
A list of S-expressions

Define an ML datatype sx that encodes this version of an S-expression.

Making types work for you

Type-directed programming

Common idea in functional programming: lifting

val lift : forall 'a . ('a -> bool) -> ('a list -> bool)

The type of a function tells us a lot about how to write it—the type often narrows the code down to a very few sensible choices. This technique works especially well with polymorphic functions.

Here’s the problem we’re going to solve: we have a predicate on single elements, and we want a predicate on lists of elements:

val lift : forall 'a . ('a -> bool) -> ('a list -> bool)

And here is the main tool we are going to use: for every type, we are going to be aware of how to make a value of the type (its introduction form) and what we can do with a value of the type (its elimination form).

Let’s review for the types we have:

Arrow types: A function of any type can always be create with an explicit lambda (fn). And the only thing we can do with a function is apply it.
Boolean: can make Booleans with true and false. We can use them in if expressions and we can pattern match on them.
List types: Any value of list type is always made with ::.
And to take one apart, we use pattern matching (in case or fun).
The unknown type 'a: We have to be prepared for any 'a at all, so there’s no syntax that goes with it. Instead
- If we want an expression of unknown type 'a, our choices are that we have a variable of 'a (maybe a parameter), or maybe we can apply a function that returns a value of 'a.
- All we can do with a value of unknown type are the things we can do with any type: pass it to functions or store it in data structures.

Table:

     Type                 Intro                  Elim
     
      ... -> ...           fn                     application
      
      bool                true, false             if, pattern matching (case/fun)
   
      ... list            [], ::                  pattern matching (case/fun)


      'a                    ---                    ---

Let’s get started: our goal is to make a function of type

val lift : forall 'a . ('a -> bool) -> ('a list -> bool)

We can always use fun

fun lift p = _goal1

Q: What is the type of p? How do we know?

Q: What is the type of _goal1? How do we know?

Answers:

val lift : forall 'a . ('a -> bool) -> ('a list -> bool)

fun lift (p : 'a -> bool) = (_goal1 : 'a list -> bool)

Our new goal is a term of type 'a list -> bool. Which of these forms of expression is always safe and makes progress toward a goal of this type?

A recursive call
A lambda expression of the form (fn ... => ...)
A let expression

Our goal is an arrow type, and it is always safe to make a value of arrow type using lambda:

_goal1 = (fn xs => _goal2)

val lift : forall 'a . ('a -> bool) -> ('a list -> bool)

fun lift (p : 'a -> bool) = (fn xs => _goal2)

Q: What is the type of xs? How do we know?

Q: What is the type of _goal2? How do we know?

Answers:

_goal1 = (fn xs => _goal2)

val lift : forall 'a . ('a -> bool) -> ('a list -> bool)

fun lift (p : 'a -> bool) = (fn (xs : 'a list) => (_goal2 : bool))

Things we have

lift : ('a -> bool) -> ('a list -> bool)   [it could call itself recursively]
p : 'a -> bool
xs : 'a list

Now we have a goal of type bool and we want to use the parameter xs of type 'a list. What is the safest tactic to make progress using xs?

Break it down by cases, that is, ask whether it is nil or cons
Pass it to foldl or foldr
Pass it to lift p

Now we have a new goal of type bool. And we know two ways to make a value of type bool: we can return true or false. But if we choose true or false, we will have defined a function that is not very interesting, and we will have completely ignored all the parameters to lift. That is probably not a great design choice.

Is there some other way to make a bool? Some way to use what we have?

Our p has type 'a -> bool, and that’s an arrow type, so the only thing we can do with it is apply it. But we don’t have a value of 'a to apply it to. Let’s set it aside.
Our xs has type 'a list. This doesn’t solve our immediate goal of a bool, but since it’s a list, we can try to take it apart. We know xs is either nil or cons. So let’s fill the hold [goal2] with a case expression.

Next take:

val lift : forall 'a . ('a -> bool) -> ('a list -> bool)

fun lift (p : 'a -> bool) =
  fn (xs : 'a list) =>
     case xs
       of [ ] => _goal3 : bool
        | y :: ys => _goal4 : bool

Now instead of one hole we have two holes, each of type bool. But we have used xs, so that’s progress. What’s next? Let’s look at the first hole.

This first hole still has type bool. And we now know that xs is empty, so we have used all the information available in xs, but we still haven’t used p. Which of the following tactics is viable for fulfilling our goal of type bool? Mark all that apply.

Look for a value of type 'a to apply p to
Give up on p and fill the hole with true
Give up on p and fill the hole with false

The hole _goal3 has a type bool. We know everything available to be known about xs (namely, it’s empty). All we have left is p.

And p has type 'a -> bool, with no prospect of 'a to apply it to. But we have to fill the hole.

Our only choices are true and false. Which would you like to pick?

Let’s flip a coin. My coin comes up true. So now we have

val lift : forall 'a . ('a -> bool) -> ('a list -> bool)

fun lift (p : 'a -> bool) =
  fn (xs : 'a list) =>
     case xs
       of [ ] => true
        | y :: ys => _goal4 : bool

Our final goal has type bool, but now we have more Booleans than we know what to do with: both p y and lift p ys. What answer best describes what we should do?

Ignore p y and fill the hole with lift p ys
Ignore lift p ys and fill the hole with p y
Combine the two Booleans using andalso
Combine the two Booleans using orelse
Two Booleans can be combined in 16 different ways, and they are all equally good.

We have to fill that last hole. And we have new variables y and ys. What are their types? How do we know?

y : 'a
ys : 'a list

Add them to everything we have and it’s now

lift : ('a -> bool) -> ('a list -> bool)
p : 'a -> bool
y : 'a
ys : 'a list

How can we use y and ys?

We can apply p y and that’s an expression of bool.
But we’d also like to use ys.

Q: Can we use ys?

A: Yes! We can all lift p recursively. We know it terminates because the list argument keeps getting smaller.

So now these are the things we have:

lift : ('a -> bool) -> ('a list -> bool)
p : 'a -> bool
y : 'a
ys : 'a list

p y : bool
lift p ys : bool

We have just one hole to fill, but we have two terms of type bool. If we want to use all our information, both y and ys, we’d better combine them.

Q: What are common ways to combine two Booleans?

A: Two good ways are andalso and orelse.

Q: Which one do you like here?

A: Let’s think about both choices:

Q: Suppose we pick orelse here. How does lift behave?

A: It always returns true
Q: Suppose we pick andalso here. How does lift behave?

A: Hey! It behaves just like the all function! I know that function!

Q: Nice. And here it is:

val lift : forall 'a . ('a -> bool) -> ('a list -> bool)

fun lift (p : 'a -> bool) =
  fn (xs : 'a list) =>
     case xs
       of [ ] => true
        | y :: ys => p y andalso lift p ys

Or if we pretty it up a little,

val lift : forall 'a . ('a -> bool) -> ('a list -> bool)

fun lift p xs = 
     case xs
       of [ ] => true
        | y :: ys => p y andalso lift p ys

or even

fun lift p []        = true
  | lift p (y :: ys) = p y andalso lift p ys

Making types work for you: The `false` case

Q: Suppose back at the empty-list step, the coin had come down differently and we had used true there. What would we want to do then?

A: I know this one! We combine two Booleans with orelse instead of andalso! That’s the exists function!

Q: Indeed:

fun lift' p []        = false
  | lift' p (y :: ys) = p y orelse lift p ys

Potential bonus content: inorder traversal of binary tree.

val find : forall 'a . ('a -> bool) -> ('a list -> 'a option)

Bonus content

The rest of this slide deck is “bonus content”

New vocabulary for ML

Data:

Constructed data
Value constructor

Code:

Pattern
Pattern matching
Clausal definition
Clause

Types:

Type variable (’a)

Structure of algebraic types

An algebraic data type is a collection of alternatives

Each alternative must have a name

The thing named is the value constructor

(Also called “datatype constructor”)

Additional language support for algebraic types: case expressions

“Eliminate” values of algebraic types

New language construct case (an expression)

fun length xs =
  case xs
    of []      => 0
     | (x::xs) => 1 + length xs

Clausal definition is preferred
(sugar for val rec, fn, case)

works for any datatype

 fun toStr t = 
     case t 
       of EHEAP => "empty heap"
        | HEAP (v, left, right) =>
                   "nonempty heap"

But often a clausal definition is better style:

 fun toStr' EHEAP = "empty heap"
   | toStr' (HEAP (v,left,right)) =
                    "nonempty heap"

Bonus content: Exceptions — Handling unusual circumstances

Syntax:

Definition: exception EmptyQueue
Introduction: raise e where e : exn
Elimination: e1 handle pat => e2

Informal Semantics:

alternative to normal termination
can happen to any expression
tied to function call
- if evaluation of body raises exn, call raises exn
Handler uses pattern matching

e handle pat1 => e1 | pat2 => e2

Exception handling in action

loop (evaldef (reader (), rho, echo))
handle EOF            => finish ()
  | Div               => continue "Division by zero"
  | Overflow          => continue "Arith overflow"
  | RuntimeError msg  => continue ("error: " ^ msg)
  | IO.Io {name, ...} => continue ("I/O error: " ^
                                   name)
  | SyntaxError msg   => continue ("error: " ^ msg)
  | NotFound n        => continue (n ^ "not found")

Bonus Content: A different explanation of ML Datatypes

Tidbits:

The most important idea in ML!
Originated with Hope (Burstall, MacQueen, Sannella), in the same lab as ML, at the same time (Edinburgh!)

Notes:

A “suit” is constructed using HEARTS, DIAMONDS, CLUBS, or SPADES
A “list of A” is constructed using nil or a :: as, where a is an A and as is a “list of A”
A “heap of A” is either empty or it’s an A and two child heaps

Datatype definitions

datatype  suit    = HEARTS | DIAMONDS | CLUBS | SPADES

datatype 'a list  = nil           (* copy me NOT! *)
                  | op :: of 'a * 'a list

datatype 'a heap  = EHEAP
                  | HEAP of 'a * 'a heap * 'a heap

type suit          val HEARTS : suit, ...
type 'a list       val nil   : forall 'a . 'a list
                   val op :: : forall 'a . 
                               'a * 'a list -> 'a list
type 'a heap
val EHEAP: forall 'a.                          'a heap
val HEAP : forall 'a.'a * 'a heap * 'a heap -> 'a heap

Exegesis (on board):

Notation 'a is a type variable
- On left-hand side, it is a formal type parameter
- On right-hand side it is an ordinary type
- In both cases it represents a single unknown type
Name before = introduces a new type constructor into the type environment. Type constructors may be nullary.
Alternatives separated by bars are value constructors of the type
- They are new and hide previous names
- (Do not hide built-in names nil and list from the initial basis!)
- Value constructors build constructed data
- Value constructors participate in pattern matching
- Complete by themselves: HEARTS, SPADES, nil
- Expect parameters to make a value or pattern: ::, HEAP
- The value a constructor is applied to is said to be carried
Every val on this slide is a value constructor
op enables an infix operator to appear in a nonfix context
Type application is postfix
- A list of integer lists is written: int list list
New names into two environments:
- suit, list, heap stand for new type constructors
- HEARTS, CLUBS, nil, ::, EHEAP, HEAP stand for new value constructors
Algebraic datatypes are inherently inductive (list appears in its own definition)—to you, that means finite trees
'a * 'a list is a pair type — infix operators are always applied to pairs

Bonus Content: ML traps and pitfalls

Order of clauses matters


fun take n (x::xs) = x :: take (n-1) xs
  | take 0 xs      = []
  | take n []      = []

(* what goes wrong? *)

Gotcha — overloading

- fun plus x y = x + y;
> val plus = fn : int -> int -> int
- fun plus x y = x + y : real;
> val plus = fn : real -> real -> real

Gotcha — parentheses

Put parentheses around anything with |

case, handle, fn

Function application has higher precedence than any infix operator

Bonus content (seen in examples)

Syntactic sugar for lists

- 1 :: 2 :: 3 :: 4 :: nil; (* :: associates to the right *)
> val it = [1, 2, 3, 4] : int list

- "the" :: "ML" :: "follies" :: [];
> val it = ["the", "ML", "follies"] : string list

> concat it;
val it = "theMLfollies" : string

Bonus content: ML from 10,000 feet

Environments

The value environment

Names bound to immutable values

Immutable ref and array values point to mutable locations

ML has no binding-changing assignment

Definitions add new bindings (hide old ones):

val pattern = exp
val rec pattern = exp
fun ident patterns = exp
datatype … = …

Nesting environments

At top level, definitions

Definitions contain expressions:

def ::= val pattern = exp

Expressions contain definitions:

exp ::= let defs in exp end

Sequence of defs has let-star semantics

Patterns

What is a pattern?

pattern ::= variable
          | wildcard
          | value-constructor [pattern]
          | tuple-pattern
          | record-pattern
          | integer-literal
          | list-pattern

Design bug: no lexical distinction between

VALUE CONSTRUCTORS
variables

Workaround: programming convention

Functions

Function pecularities: 1 argument

Each function takes 1 argument, returns 1 result

For “multiple arguments,” use tuples!

 fun factorial n =
   let fun f (i, prod) = 
         if i > n then prod else f (i+1, i*prod)
   in  f (1, 1)
   end


 fun factorial n =  (* you can also Curry *)
   let fun f i prod = 
         if i > n then prod else f (i+1) (i*prod)
   in  f 1 1
   end

Tuples are “usual and customary.”

Types

19 Oct 2020: Types

There are PDF slides for 10/20/2020.

Retrospective: Evaluating ML

You are now in a position to start to evaluate ML.

Please enter at least one strength of ML.

Please enter at least one weakness of ML.

Your answers may be anonymously shared with your classmates.

Reflection: Catching errors with types

We have seen that one of the key differences between μScheme and ML is that ML has a static type system. One of the key roles of a type system is to detect errors. List at least two kinds of errors that type systems can detect at compile time (Hint: think about the kinds of type errors the ML compiler complains about).

Type systems aren’t omniscient, though. Now list at least one kind of error that ML’s type system can’t find and do your best to explain why it can’t.

Type systems

What they do:

Guide coding
Document code (checked by compiler!)
Rule out certain errors
Provide information to compiler

How they work

Constrain what values can be at run time

World’s most widely deployed static analysis

Where type systems are found:

C#, Swift, Go
Java, Scala, Rust (polymorphism)
ML, Haskell, TypeScript, Hack (inference)

Our trajectory:

Formalize familiar, monomorphic type systems (like C)
Learn polymorphic type systems
Eventually, infer polymorphic types

Monomorphic types systems are the past.
Inference and polymorphism are the present and future.
(Visible trend just like lambda.)

This “lecture”:

Formal type system with only two types:
- words and flags
Type checker derived from inference rules

Next time:

Unbounded number of types! (Formation, introduction, elimination)

Type systems

What kind of thing is it?

``Types classify terms’’

“Term” is theory word for “syntax”:

n + 1  : int

"hello" ^ "world"  : string

(fn n => n * (n - 1))  : int -> int

if p then 1 else 0 : int, provided  p : bool

Questions type systems can answer:

What kind of value does it evaluate to (if it terminates)?
What is the contract of the function?
Is each function called with the right number of arguments?
(And similar errors)
Who has the rights to look at it/change it?
Is the number miles or millimeters?

Questions type systems typically do not answer:

Can it divide by zero?
Can it access an array out of bounds?
Can it take the car of '()?
Will it terminate?

Decidability and Type Checking

Suppose L is a Turing-Complete Language.

P is the set of programs in L that terminate.

Wish: a type system to statically classify terminating programs:

Expression e in L has type T (e : T) iff e terminates.

But: Undecideable!

We can prove no such type system exists. (See Comp170 for the Halting Problem)

Choices:

Allow type checker to run forever.
Don’t use type system to track termination.

Static vs. Dynamic Type Checking

Most languages use a combination of static and dynamic checks

Static: “for all inputs”

input independent
efficient at run-time
approximate : rules out some programs that won’t trigger errors
example: (if false then 2 else "hi") ^ "there"

Dynamic: “for some inputs”

depends on input
run-time overhead
precise

Check your understanding: Static vs. Dynamic Type Systems

Type systems can statically detect all errors.

True
False

Static type systems generally use a combiniation of static and dynamic checks.

True
False

Dynamic type systems impose runtime overhead.

True
False

Type System and Checker for a Simple Language

Why do we write a type checker?

Canonical example of static analysis. Used pervasively in industry in code improvement and security analysis.
Great practice connecting math to code. To be educated about programming languages, you must be able to realize inference rules in code. Eventually you should learn to “speak” inference rules like a native. Implementing a type system is a valuable way to build these competencies.
Tool for structuring language design, which applies to input validation. If (when!) you get to do your own language designs, type systems are an area where you are most likely to be able to innovate. The ideas behind type systems apply any time you need to validate user input, for example. This is the highest level of cognitive task: creation of new things.

Three questions any type system must answer

What types do I have?
Given a type, how do I create a value
What can I do with a value (of that type)?

Simple Type System Example

Simple language of machine-level expressions

What types do I have?

word predicts a machine word
(in a general-purpose register)
flag predicts a single bit
(in a flags register)

Type-system example (machine model)

How do I create a word?

Arithmetic

What can I do with a word?

Arithmetic, comparison

How do I create a flag?

Comparison

What can I do with a flag?

Drive conditional (IF)

Language of expressions to type

Words and flags:

datatype exp = ARITH of arithop * exp * exp
             | CMP   of relop   * exp * exp
             | LIT   of int
             | IF    of exp     * exp * exp
and      arithop = PLUS | MINUS | TIMES | ...
and      relop   = EQ | NE | LT | LE | GT | GE

datatype ty = WORDTY | FLAGTY

(Looks a lot like int and bool)

Examples to rule out

Can’t add a word and a flag:

3 + (3 < 99)

(ARITH(PLUS, LIT 3, CMP (LT, LIT 3, LIT 99)))

Can’t compare a word and a flag

(3 < (4 = 24))

CMP (LT, LIT 3, CMP(EQ (LIT 4, LIT 24)))

Check your understanding: AST Representation of Simple Language

Recall the AST for the simple machine language:

datatype exp = ARITH of arithop * exp * exp
             | CMP   of relop   * exp * exp
             | LIT   of int
             | IF    of exp     * exp * exp
and      arithop = PLUS | MINUS | TIMES | ...
and      relop   = EQ | NE | LT | LE | GT | GE

datatype ty = WORDTY | FLAGTY

The AST term

CMP(GT, (CMP(EQ, LIT 105, LIT 170)), LIT 160)

which encodes the expression

((105 = 170) > 160)

should be flagged as a type error.

True
False

Inference rules to define a type system

Form of judgment Context |- term : type

Written |- e : tau

Contexts vary between type systems

(Right now, the empty context)
Judgment is proved by derivation
Derivation made using inference rules
Inference rules determine how to code val typeof : exp -> ty:

Given e, find tau such that |- e : tau

Rule for arithmetic operators

Informal example:

|- 3 : word    |- 5 : word 
-------------------------
|- 3 + 5 : word

Rules out:

|- 'a' : char    |- 5 : word
---------------------------
|- 'a' + 5 : ???

General form:

|- e1 : word    |- e2 : word
-----------------------------
|- ARITH ( _ , e1, e2) : word

Rule for comparisons

Informal example:

|- 7 : word    |- 10 : word
-----------------------------
|- 7 < 10 : flag

General form:

|- e1 : word    |- e2 : word
-----------------------------
|- CMP ( _ , e1, e2) : flag

Rule for literals

Informal example:

|- 14 : word

General form:

--------------------
|- LIT (n) : word

Rule for conditionals:

Informal example:

|- true : flag    
|- 3    : word
|- 42   : word      
--------------------------
|- IF (true, 3, 42) : word

General form:

|- e : flag    
|- e1 : tau1   
|- e2 : tau2      tau1 equiv tau2
-----------------------------------
|- IF ( e, e1, e2) : tau1

Typing rules let us read off what a type checker needs to do.

input to checker: e
output from checker: tau

Type checker in ML

val typeof : exp -> ty
exception IllTyped
fun typeof (ARITH (_, e1, e2)) = 
      case (typeof e1, typeof e2) 
        of (WORDTY, WORDTY) => WORDTY
         | _              => raise IllTyped
  | typeof (CMP (_, e1, e2)) = 
      case (typeof e1, typeof e2) 
        of (WORDTY, WORDTY) => FLAGTY
         | _              => raise IllTyped
  | typeof (LIT _) = WORDTY
  | typeof (IF (e,e1,e2)) = 
      case (typeof e, typeof e1, typeof e2) 
        of (FLAGTY, tau1, tau2) => 
           if eqType (tau1, tau2) 
           then tau1 else raise IllTyped
         | _                    => raise IllTyped

Check your understanding: A type checker from inference rules

Recall the AST for the simple machine language:

datatype exp = ARITH of arithop * exp * exp
             | CMP   of relop   * exp * exp
             | LIT   of int
             | IF    of exp     * exp * exp
and      arithop = PLUS | MINUS | TIMES | ...
and      relop   = EQ | NE | LT | LE | GT | GE

datatype ty = WORDTY | FLAGTY

and the inference rule for comparisons:

    |- e1 : word    |- e2 : word
    -----------------------------
    |- CMP ( _ , e1, e2) : flag

and our in-progress type checking code typeof

    val typeof : exp -> ty
    exception IllTyped
    fun typeof (ARITH (_, e1, e2)) = 
          case (typeof e1, typeof e2) 
            of (WORDTY, WORDTY) => WORDTY
             | _              => raise IllTyped
      | typeof (CMP (_, e1, e2)) = 
          case (BLANK1, BLANK2) 
            of (BLANK3, BLANK4) => BLANK5
             | _              => raise IllTyped
      | typeof (LIT _) = WORDTY
      | typeof (IF (e,e1,e2)) = 
          case (typeof e, typeof e1, typeof e2) 
            of (FLAGTY, tau1, tau2) => 
               if eqType (tau1, tau2) 
               then tau1 else raise IllTyped
             | _                    => raise IllTyped

Fill in the blanks based on the comparison inference rule to complete the code for the CMP case.

An implementor’s trick: If you see identical types in a rule,

Give each type a distinct subscript
Introduce equality constraints
Remember to be careful using primitive equality to check types—you are better off with eqType.

Typing Rules: Contexts and Term Variables

Let’s add variables!

    datatype exp = ARITH of arithop * exp * exp
                 | CMP   of relop   * exp * exp
                 | LIT   of int
                 | IF    of exp     * exp * exp
                 | VAR   of name
                 | LET   of name    * exp * exp
    and      arithop = PLUS | MINUS | TIMES | ...
    and      relop   = EQ | NE | LT | LE | GT | GE

    datatype ty = WORDTY | FLAGTY

New forms to support variables

    datatype exp = ...
                 | VAR   of name
                 | LET   of name    * exp * exp

    datatype ty = WORDTY | FLAGTY

What could go wrong with a variable?

Used inconsistently:

;; x can’t be both an integer and a list

x + x @ x

;; y can’t be both an integer and a string

let y = 10 in y ^ “hello” end

So we need to track variable use to ensure consistency

Key idea: Type environment (Gamma) tracks the types of variables.

Rule for var

x in dom Gamma        tau = Gamma(x) 
----------------------------------------
Gamma |- VAR x : tau

So as long as x is recorded in Gamma as a WORDTY, it can only be used as a WORDTY.

Rule for let

Gamma          |- e1 : tau1
Gamma{x->tau1} |- e2 : tau2   
-------------------------------------
Gamma |- LET x = e1 in e2 : tau2

Type Checker

Type checker needs Gamma – gives type of each “term variable”.

Type checking for variables

val typeof : exp * ty env -> ty
fun typeof (ARITH ..., Gamma ) = <as before>
  | typeof (VAR x, Gamma)      =
      (case maybeFind (x, Gamma)
         of SOME tau => tau
          | NONE     => raise IllTyped)
  | typeof (LET (x, e1, e2), Gamma) = 
      <your turn>

Check your understanding: Adding variables

Given the typing rule for the let form we added to the simple machine language:

    Gamma          |- e1 : tau1
    Gamma{x->tau1} |- e2 : tau2   
    -------------------------------------
    Gamma |- LET x = e1 in e2 : tau2

Fill in the missing blanks to complete the type checker we started in the video:

    val typeof : exp * ty env -> ty
    fun typeof (ARITH ..., Gamma ) = <as before>
      | typeof (VAR x, Gamma)      =
          (case maybeFind (x, Gamma)
             of SOME tau => tau
              | NONE     => raise IllTyped)
      | typeof (LET (x, e1, e2), Gamma) = 
          let tau1 = typeof (BLANK1, BLANK2)
          in  BLANK3 (BLANK4, extend Gamma x BLANK5)
          end

Taking Stock

I gave you syntax for simple language.
We came up with formal typing rules.
We used the typing rules to implement the type checker.
On your homework,
- You will design new syntax and typing rules for lists
- You will extend an existing type checker
- You will implement a full type checker from scratch

This is a big chunk of what language designers do.

Bonus content

What is a type?

Weak: a set of values
Better: a conservative prediction about values
Best: the precise definition: classifier for terms!!
- The relationship to values becomes a proof obligation.
- Note: a computation can have a type even if it doesn’t terminate! (Or doesn’t produce a value)

Source of new language ideas for next 20 years

Needed if you want to understand advanced designs (or create your own)

Type soundness

Key theorem: prediction is accurate

Will state more precisely next week
Best explanation for how/why type system works
Proof beyond the scope of 105

What types can rule out

things that could go wrong:

(8 < 10) + 4

(8 == 8) < 9

x + (x :: xs)

let val y = 10 in length y end

21 October 2020: Type checking with type constructors

There are PDF slides for 10/22/2020.

should have two separate answer boxes: one for what they have learned and one for what they have questions about.

Reflection: Simple type checking

In the first part of this module on type checking, we talked about the purposes and limitations of type systems, we formalized a simple type system with just two types using inference rules, and we converted those inference rules into a type checker.

Describe two specific things you have learned so far in this module.

Write one question you have about what we’ve learned so far.

Your responses may be shared anonymously with your classmates.

Reflection: How many types?

In the previous “lecture”, we studied a type system with just two types: words and flags. In this “lecture”, we’re going to add type constructors such as function types and lists. When we ``add lists" to a language, how many types are we adding?

Where we’ve been

New watershed in the homework

You’ve been developing and polishing programming skills: recursion, higher-order functions, using types to your advantage. But the problems have been mostly simple problems around simple data structures, mostly lists.
You’ve already seen everything you need to know to implement a basic type checker, and you are almost fully equipped to add array operations and types to Typed Impcore.

Where we’re going

We’re now going to shift and spend the next several weeks doing real programming-languages stuff, starting with type systems.
What’s next is much more sophisticated type systems, with an infinite number of types.

Focus on two questions about type systems:

What is and is not a good type, that is, a classifier for terms?
How shall we represent types?

We’ll look at these questions in two contexts: monomorphic (this “lecture”) and polymorphic (next “lecture”) languages.

Monomorphic vs Polymorphic Types

Monomorphic types have “one shape.”

Examples: int, bool, int -> bool, int * int

Polymorphic types have “many shapes.”

Examples:

(forall ('a) (list 'a))
(forall ('a) ((list 'a) -> int))
(forall ('a) ((list 'a) -> (list 'a)))

Substitute any type you like for 'a.

ML Examples: 'a list, 'a list -> int, 'a list -> 'a list

In ML, forall is only at the top level, and it isn’t written explicitly

Designing monomorphic type systems

Mechanisms:

Every new variety of type requires special syntax (examples: structs, pointers, arrays)
Implementation is a straightforward application of what you already know.

Language designer’s process:

What new types do I have?
- Formation rules
How do I create new values with that type?
- Introduction rules
For introduce think “produce”, “create”, “make”, “define”
What new syntax do I have to observe terms of a type?
- Elimination rules
For eliminate think “consume”, “examine”, “interrogate”, “look inside”, or “take apart”, “observe”, “use”, “mutate”

Words “introduce” and “eliminate” are the native vocabulary of type-theoretic language design—it’s like knowing what to say when somebody sneezes.

Managing the set of types: Type formation

Examples: Not yet types

``Types in waiting’’ (yet)

Examples: Not types at all!

These are

Type-formation rules

Are about not trusting the source code

We need a way to classify type expressions into:

types that classify terms
type constructors that build types
nonsense that doesn’t mean anything

Type judgments for monomorphic system

Two judgments:

The familiar typing judgment Γ ⊢ e : τ
Now we add the judgment “τ is a type”

Check your understanding: Types, type constructors, or garbage

In each of the following, indicate whether the “type” is in fact a type (ie, something that classifies terms), a type constructor (ie, something that will classify terms when applied to some number of type arguments), or utter nonsense.

bool list

type
type constructor
utter nonsense

array

type
type constructor
utter nonsense

bool bool

type
type constructor
utter nonsense

->

type
type constructor
utter nonsense

These rules are actually the same as we saw for the simple arithmetic language, but we’ll review them again here.

Monomorphic type rules

Check your understanding: Checking type equalities

We saw how to modify the typing rule for Set to explicitly mark where two types were being compared for equality. The typing rule for If that we saw in the video has a similar implicit type equality check. Fill in the missing blanks in the typing rule below to create a version with an explicit equality check.

Classic types for data structures

At run time, these pairs are identical to cons, car, cdr.

These rules dictate type checker code just like in previous “lecture”.

Typical syntactic support for types

Explicit types on lambda and define:

For lambda, argument types:

(lambda ([n : int] [m : int]) (+ (* n n) (* m m)))

For define, argument and result types:

(define int max ([x : int] [y : int]) 
                (if (< x y) y x))

Abstract syntax:

datatype exp = ...
 | LAMBDA of (name * tyex) list * exp
    ...
datatype def = ...
 | DEFINE of name * tyex * ((name * tyex) list * exp)
    ...

Check your understanding: Writing Typing Rules

ML References are similar to C/C++ pointers. Given the following surface syntax (on the left) and corresponding AST represenations (on the right):

Surface Syntax	AST
`ref` τ	ref(τ)
`ref` `e`	ref-make(e)
`!e`	ref-get(e)
`e1 := e2`	ref-set(e1,e2)

fill in the blanks to correctly complete the following formation, introduction, and elimination rules.

=10pt

Bonus Content

Coding the arrow-introduction rule

Type-checking LAMBDA

datatype exp = LAMBDA of (name * tyex) list * exp 
   ...
fun ty (Gamma, LAMBDA (formals, body)) = 
  let val Gamma' = (* body gets new env *)
        foldl (fn ((x, ty), g) => bind (x, ty, g))
              Gamma formals
      val bodytype = ty (Gamma', body)
      val formaltypes = 
        map (fn (x, ty) => ty) formals
  in  FUNTY (formaltypes, bodytype)
  end

Approaching types as programmers

Understanding language design

Questions about types never seen before:

What types can I make?
What syntax goes with each form?
- To create values?
- To do things with values?
What functions?
What about user-defined types?

Examples: pointer, struct, function, record

Talking type theory

Formation: make new types

Introduction: make new values

Elimination: observe (“take apart”) existing values

Talking type theory: Introduction and elimination constructs

Part of learning any new field: talk to people in their native vocabulary

Introduce means “produce”, “create”, “make”, “define”
Eliminate means “consume”, “examine”, “observe”, “use”, “mutate”

It’s like knowing what to say when somebody sneezes.

Type soundness

26 Oct 2020: Polymorphic Type Checking

There are PDF slides for 10/27/2020.

Your type-system questions are answered.

Today

Burdens of monomorphism
- On designers and implementors
- On programmers
Lift burden with quantified types
Extensible type formation

Reflection: Type checking type constructors

In the previous two “lectures” we studied how to type check a known set of types. In each case, the syntax for the type was hard-coded into the AST for the language. Type checking the creation and uses of values with the type were also hard-coded into the type checker. That approach works fine if the language designers know all the types in advance. But, in any language worth its salt, programmers can create new types, (for example, by defining algebraic datatypes in ML, structs in C, or classes in Java). The language has to be able to type check these new types without having to recompile the language implementation first.

Spend a few minutes thinking about how we might redesign the implementation of the type checker to accommodate user-defined types.

Describe your preliminary design in a few sentences.

Limitations of monomorphic type systems

With monomorphism, new types are costly

Closed world

Only language designer can add a new type constructor

A new type constructor (“array”) requires

New expression and type syntax
New internal representation (type formation)
New type rules (intro, elim)
New code in type checker
New or revised proof of soundness

Notes:

Implementing arrays on homework
Writing rules for lists on homework

Costly for programmers

Monomorphism leads to code duplication

User-defined functions are monomorphic:

(check-function-type swap
                     ([array bool] int int -> unit))
(define unit swap ([a : [array bool]]
                   [i : int]
                   [j : int])
  (begin
     (set tmp       (array-at a i))
     (array-put a i (array-at a j))
     (array-put a j tmp)
     (begin)))

Check your understanding: Monomorphism limitations

If a language has a monomorphic type system, it will not let programmers write a single length function for lists. Instead, programmers must write a length function for each kind of list they want to calculate the length of, eg, a length function for lists of booleans, and another for lists of integers, etc.

True
False

Quantified types

Idea: Use functions not syntax to represent new types

Benefits:

No new syntax
No new internal representation
No new type rules
No new code in type checker
No new proof of soundness
Programmers can add new types

Requires: more expressive function types

Better type for a swap function

(check-type
    swap
    (forall ('a) ([array 'a] int int -> unit)))

Check your understanding: Quantified Types

The type theory expression and the expression are two different representations for the same polymorphic type.

True
False

In the type (forall ('a) ([list 'a] -> 'a)), the notation 'a denotes a type variable that stands for the type of element in the list argument and the type of the result returned by the function.

True
False

The type (forall ('a) ([list 'a] -> [list 'a])) is the type of the car function.

True
False

Programming with quantified types

Substitute for quantified variables: “instantiate”

-> length
<function> : (forall ('a) ((list 'a) -> int))
-> [@ length int]
<function> : ((list int) -> int)
-> (length '(1 2 3))
type error: function is polymorphic; 
instantiate before applying
-> ([@ length int] '(1 2 3))
3 : int

Instantiate as you like

-> length
 : (forall ('a) ((list 'a) -> int))
-> [@ length bool]
 : ((list bool) -> int)
-> ([@ length bool] '(#t #f))
2 : int

More instantiations

-> (val length-int [@ length int])
length-int : ((list int) -> int)
-> (val cons-bool [@ cons bool])
cons-bool : ((bool (list bool)) -> (list bool))
-> (val cdr-sym [@ cdr sym])
cdr-sym : ((list sym) -> (list sym))
-> (val empty-int [@ '() int])
() : (list int)

Bonus instantiation:

-> map
<function> : 
  (forall ['a 'b]
    (('a -> 'b) (list 'a) -> (list 'b)))
-> [@ map int bool]
<function> : 
  ((int -> bool) (list int) -> (list bool))

Polymorphic array swap

(check-type swap
    (forall ('a) ([array 'a] int int -> unit)))

(val swap
 (type-lambda ('a)
  (lambda ([a : (array 'a)]
           [i : int]
           [j : int])
   (let ([tmp ([@ Array.at 'a] a i)])
    (begin
      ([@ Array.put 'a] a i ([@ Array.at 'a] a j))
      ([@ Array.put 'a] a j tmp))))))

Two forms of abstraction:

Power comes at high notational cost

Function composition

-> (val o (type-lambda ['a 'b 'c]         
    (lambda ([f : ('b -> 'c)] 
             [g : ('a -> 'b)])
     (lambda ([x : 'a]) (f (g x))))))    

o : (forall ('a 'b 'c) 
       (('b -> 'c) ('a -> 'b) -> ('a -> 'c)))

Aka o :

Coding Interlude: Instantiating Polymorphic Functions

Enter the length function into the interpreter to observe its type.

Apply the length function to the list [1,2,3]. Observe the response.

Create two instantiations of the length function: one for ints and one for bools.

Apply the length-int function to the list [1,2,3] and the length-bool function to the list [#t, #f] and observe the result.

What do you see as the advantages and disadvantages of ’s approach to polymorphism so far?

Polymorphic Type Checking

Representing quantified types

Two new alternatives for tyex:

datatype tyex
 = TYCON  of name                // int
 | CONAPP of tyex * tyex list    // (list bool)
 | FUNTY  of tyex list * tyex    // (int int -> bool)
 | TYVAR  of name                // 'a
 | FORALL of name list * tyex    // (forall ('a) ...)

Check your understanding: Representation of quantified types

Given the tyex datatype:

datatype tyex
 = TYCON  of name                // int
 | CONAPP of tyex * tyex list    // (list bool)
 | FUNTY  of tyex list * tyex    // (int int -> bool)
 | TYVAR  of name                // 'a
 | FORALL of name list * tyex    // (forall ('a) ...)

Match the surface syntax for polymorphic types with their representations using the tyex datatype.

Surface syntax:

bool
'a
(list int)
(bool bool -> int)
(forall ['b] ((list 'b) -> int))

Represenation:

FUNTY ([TYCON "bool", TYCON "bool"], TYCON "int")
TYVAR "a"
FORALL ("b", FUNTY ([CONAPP (TYCON "list", [TYVAR "b"])], TYCON "int"))
TYCON "bool"
CONAPP (TYCON "list", [TYCON "int"])

Type rules for polymorphism

Check your understanding: Rules for `type-lambda`

In the new judgement form for type checking type-lambda, the meta-variable Δ is a kind environment that records the type variables that have been used in typing a program.

True
False

The AST term tylambda(α₁,…,α_n, e) is the representation of the surface syntax (@ e τ₁ ... τ_n)

True
False

In the introduction rule for type-lambda, it is okay if the type variables to be quantified over already appear in the context Γ.

True
False

In the elimination rule for type-lambda, the resulting type is obtained by substituting the type arguments for the corresponding quantified type variables. If the type of the expression e is quantified over m type variables and e is applied to n type arguments, a type error should result if n ≠ m.

True
False

Type formation through kinds

What have we gained?

No more type-specific introduction rules:

Instead, use polymorphic functions

No more type-specific elimination rules:

Instead, use polymorphic functions

But, we still need formation rules

User types can’t be blindly trusted

-> (lambda ([a : array]) (Array.size a))
type error: used type constructor `array' as a type

-> (lambda ([x : (bool int)]) x)
type error: tried to apply type bool as type constructor

-> (@ car list)
type error: instantiated at type constructor `list', 
which is not a type

How can we know which types are OK?

Check your understanding: Use kinds to classify types

For each type expression, select its kind or indicate it is not well formed.

char

*
* => *
not well defined

list list

*
* => *
not well defined

array

*
* => *
not well defined

Opening the closed world

Designer’s burden reduced

To extend Typed Impcore:

New syntax
New type rules
New internal representation
New code
New soundness proof

To extend Typed μScheme, none of the above! Just

New functions
New primitive type constructor in Δ

You’ll do arrays both ways

Programmer’s power increased?

Typed Impcore:

Closed world (no new types)
Simple formation rules

Typed μScheme:

Semi-closed world (new type variables)
Types are formed via explicit abstraction and instantiation

Standard ML:

Open world (programmers create new types)
How are types formed (from other types)?

Bonus content: a definition manipulates three environments

28 October 2020: Introduction to type inference

There are PDF slides for 10/29/2020.

Retrospective: Type Checking

Write one thing you learned in the previous module about type checking and one question you have about that material.

Your responses may be anonymously shared with the rest of the class.

Reflection: How does the compiler know?

Consider the following ML code for the append function:

fun append (x::xs) ys = x :: append xs ys
  | append []      ys = ys

When you enter this code into the ML comiler, it returns the type of the function.

Describe how you think the ML compiler might figure out the type of the function.

Why study type inference?

Useful in its own right (as we’ll see shortly)
- Essential in modern languages with complex type systems
Canonical example of Static Analysis. Key for:
- Program optimization
- Software engineering
- Cybersecurity

What type inference accomplishes

-> (define     double (x)       (+ x x))
double                         ;; uScheme
-> (define int double ([x : int]) (+ x x))
double : (int -> int)          ;; Typed uSch.
-> (define     double (x)       (+ x x))
double : (int -> int)          ;; nML

What else type inference accomplishes

-> ([@ cons bool] #t ([@ cons bool] #f [@ '() bool]))
(#t #f) : (list bool)    ;; typed uScheme
-> (   cons       #t (   cons       #f    '()      ))
(#t #f) : (list bool)    ;; nML

How it works

For each unknown type, a fresh type variable
Every typing rule adds equality constraints
Instantiate every variable automatically
Introduce polymorphism at let/val bindings

Plan of study

First, we’ll walk through several examples, showing how to generate constraints
Then, you’ll do an example
Then, we focus on solving constraints
For the next homework, you’ll implement both algorithms

Let’s do some examples

N.B. Book is “constraints first;” lecture will be “type system first.” Use whatever way works for you

Example: `(val-rec double (lambda (x) (+ x x)))`

What do we know?

double has type ′a₁
x has type ′a₂
+ has type int * int -> int
(+ x x) is an application, what does it require?
- ′a2 ~ int and ′a2 ~ int
Is this possible?

Key idea: Record the constraint in a typing judgement.

'a2 ~ int /\ 'a2 ~ int, 
{ double : 'a1, x : 'a2 } |- (+ x x) : int

Example: `(if y 1 0)`

y has type ′a3, 1 has type int, 0 has type int
Requires what constraints? (int ~ int, ′a3 ~ bool)

Example: `(if z z (- 0 z))`

z has type ′a3, 0 has type int, - has type int * int -> int
Requires what constraints? (′a3 ~ bool /\ int ~ int /\ ′a3 ~ int)
These constraints are not solveable so the program is ill-typed.

Check your understanding: Gathering constraints

In this judgment

    'ax ~ int /\ 'ay ~ int, 
    {x : 'ax, y : 'ay } |- (+ x y) : int

the notation 'ay ~ int is a constraint that means that the type variable 'ay must be equal to type int.

True
False

In this judgment

    'ax ~ int /\ 'ay ~ int, 
    {x : 'ax, y : 'ay } |- (+ x y) : int

the notation y : 'ay means that expression variable y has type 'ay. Furthermore, because y has a type variable for its type, y’s type can become any syntactically valid type.

True
False

Which set of constraints might have been generated when running type inference on the following program fragment? Pick the most complete set that applies.

(if x (+ 1 y) z)

'ax ~ 'ay, 'ay ~ int, 'az ~ int
'ax ~ bool, 'ay ~ int, 'az ~ int
'ax ~ 'ay, 'ay ~ 'az
'ay ~ int, 'az ~ int
'ax ~ bool, 'ay ~ int

Inferring polymorphic types

(val app2 (lambda (f x y)
             (begin
                (f x)
                (f y))))

Assume f : 'a_f

Assume x : 'a_x

Assume y : 'a_y

f x implies 'a_f ~ ('a_x -> 'a1)

f y implies 'a_f ~ ('a_y -> 'a2)

Together, these constraints imply 'a_x ~ 'a_y and 'a1 ~ 'a2

begin implies result of function is 'a2

So,

app2 : (('a_x -> 'a2) 'a_x 'a_x -> 'a2)

'a_x and 'a2 aren’t mentioned anywhere else in program, so

we can generalize to:

(forall ('a_x 'a2) (('a_x -> 'a2) 'a_x 'a_x -> 'a2))

which is the same thing as:

app2 : (forall ('a 'b) (('a -> 'b) 'a 'a -> 'b))

Check your understanding: Type inference with polymorphic functions

This question walks you through generating constraints for a simple function that makes two calls to a polymorphic function. It illustrates how those two calls can end up being used at different types.

Each subquestion builds on the previous one. The explanations given after each subquestion explain what you need to know for the next part.

Consider the following program:

(val cc (lambda (nss) (car_2 (car_1 nss))))

where car_1 and car_2 are two instances of the polymorphic car function that has type

car : forall 'a . 'a list -> 'a

For the purposes of this question we’ve named these uses car_1 and car_2 so we can be clear which use we are talking about as we study their effect on the type of the expression.

Variable nss is a parameter, and so during type checking, we assign it a fresh type variable, say 'b.
- True
- False

We need to instantiate the two instances of the polymorphic car function before we can apply them. Which set of instantiations is correct?

car_1 : 'a1 list -> 'a1 and car_2 : 'a1 list -> 'a1
car_1 : 'a1 list -> 'a2 and car_2 : 'a2 list -> 'a1
car_1 : 'a2 list -> 'a1 and car_2 : 'a1 list -> 'a2
car_1 : 'a1 list -> 'a1 and car_2 : 'a2 list -> 'a2

Given what we have already inferred about the types of car_1 and nss (and given in the answers in parts 1 and 2), which of the following constraints is generated by the application (car_1 nss)?

'b ~ 'a1
'b ~ 'a1 list
'a1 ~ 'a2 list
'b ~ 'a2 list

Given what we have already inferred about the types of car_1, car_2, and nss (and given in the answers in parts 1, 2, and 3), which of the following constraints is generated by the application of the function car_2 to the argument (car_1 nss) in the term (car_2 (car_1 nss)):

'a2 ~ 'a1
'a2 ~ 'a1 list
'b ~ 'a2
'a1 ~ 'a2 list

Extra examples

Bonus example

-> (val second (lambda (xs) (car (cdr xs))))
second : ...
-> (val two    (lambda (f) (lambda (x) (f (f x)))))
two : ...

Bonus example solved

-> (val second (lambda (xs) (car (cdr xs))))
second : (forall ('a) ((list 'a) -> 'a))
-> (val two    (lambda (f) (lambda (x) (f (f x)))))
two :    (forall ('a) (('a -> 'a) -> ('a -> 'a)))

Precise inference with Hindley-Milner types

Making Type Inference Precise

Sad news:

Type inference for polymorphism is undecidable

Solution:

Each formal parameter has a monomorphic type

Consequences:

The argument to a higher-order function cannot be mandated to be polymorphic
forall appears only outermost in types

Representing Hindley-Milner types

type tyvar = name
datatype ty
  = TYVAR  of tyvar
  | TYCON  of name
  | CONAPP of ty * ty list

datatype type_scheme
  = FORALL of tyvar list * ty

fun funtype (args, result) = 
  CONAPP (TYCON "function", 
          [CONAPP (TYCON "arguments", args),
           result])

All the pieces

Hindley-Milner types
Bound names : σ, expressions : τ
Type inference yields type-equality constraint
Constraint solving produces substitution
Substitution refines types
Call solver, introduce polytypes at val
Call solver, introduce polytypes at all let forms

To code the type-inference algorithm, replace eqType with constraint generation!

Check your understanding: Hindley-Milner Types

With Hindley-Milner types, the meta-variable τ cannot stand for a type variable α.

True
False

With Hindley-Milner types, the meta-variable σ represents a polymorphic type and has the form ∀ α₁,… α_n . τ.

True
False

Choose the appropriate values for the blanks in the following ML data structure to make it represent the the Hindley-Miler type scheme ∀ α . α list -> α.

FORALL (BLANK1, 
        CONAPP(BLANK2, 
               [CONAPP (TYCON "arguments", BLANK3), 
                        TYVAR "alpha" ]))

BLANK1:

“alpha”
[“alpha”]
TYVAR “alpha”

BLANK2:

TYCON “function”
[TYCON “function”]
“function”

BLANK3:

CONAPP(TYCON “list”, [TYVAR “alpha”])
CONAPP(“list”, [TYVAR “alpha”])
CONAPP(TYCON “list”, TYVAR “alpha”)
CONAPP(“list”, “alpha”)

In Hindley-Miler type inference, type environments Γ always bind variables to type scheme σ. If a variable is monomorphic, the type scheme will quantify over an empty list of type variables (ie, be of the form ∀ . τ)

True
False

Match each step in Hindley-Milner type inference with its purpose.

Steps:

constraint generation
constraint solving
substitution
generaliation
instantiation

Purposes:

Refining types based on information inferred during constraint solving.
Collating clues to make sure expressions are used consistently.
Converting type scheme into a type so that it may be used in a given context.
Collecting clues about how expressions are used in the program.
Introducing type scheme to memorialize inference that given expression will work for any type.

2 November 2020: Making type inference formal

There are PDF slides for 11/3/2020.

Retrospective: Introduction to type inference

Write one thing you learned about type inference so far and one question you have. Your answers may be anonymously shared with your classmates.

Type inference algorithm, formally

All the pieces

Hindley-Milner types
Bound names : σ, expressions : τ
Type inference yields type-equality constraint
Constraint solving produces substitution
Substitution refines types
Call solver, introduce polytypes at val
Call solver, introduce polytypes at all let forms

Apply rule: e’s type could just be a type variable—we need to force it into the arrow form we need.

What you know and can do now

Taking stock

You can complete typeof

Takes e and Γ, returns τ and C

Except for let forms.

Next up: Solving constraints

Check your understanding: Type inference algorithm

The judgment form C, Γ |- e : τ corresponds to the function typeof. For each element of the judgment, say whether that element is an input or an output of the function typeof:

Constraint C:
- input
- output

Context Γ:
- input
- output

Expression e:
- input
- output

Type τ:
- input
- output

Consider the typing rule for apply written in the new style:

Now answer the following questions about the rule.

A. The type τ_f is the type of the first argument to the function being applied.

True
False

B. The arguments to the function e₁, …, e_n have types τ₁, … τ_n, respectively.

True
False

C. The type variable α is the result type of the function.

True
False

Reflection: Solving constraints

For each of the following constraints, say if it is solvable. If it is, give a solution.

 1. int           ~ bool
 2. (list int)    ~ (list bool)
 3. 'a            ~ int
 4. 'a            ~ (list int)
 5. 'a            ~ ((args int) -> int)
 6. 'a            ~ 'a
 7. (args 'a int) ~ (args bool 'b)
 8. (args 'a int) ~ ((args bool) -> 'b)
 9. 'a            ~ (pair 'a int)
10. 'a            ~ tau    // arbitrary tau

What does it mean to solve constraints?

Representing constraints

But first, we need to represent them!

datatype con = ~   of ty  * ty
             | /\  of con * con
             | TRIVIAL
infix 4 ~
infix 3 /\

Example:

  TYVAR "alpha1" ~ TYCON "int" /\  
  TYVAR "alpha2" ~ TYCON "bool"

Two questions: what’s substitution, and when is a constraint satisfied?

just formalize it. It’s a function that behaves a certain way.

Constraint satisfaction: equal types mean equal constructors applied to equal arguments—same is in eqType.

Check your understanding: Substitutions and satisfied constraints

Answer the following questions.

The constraint τ₁ ~ τ₂ is satisfied if τ₁ = τ₂.

True
False

A substitution θ is a mapping from type variables to type variables, but not more complex types.

True
False

A substitution θ ``solves’’ a constraint C by ensuring all types related in C via the ~ equivalence relation are equal.

True
False

Solving simple type equalities

Reflection: Solving type-equality constraints

One type of constraint we need to solve is that involving two types τ₁ ∼ τ₂.

Recall that each type can have one of three forms:

datatype ty
  = TYVAR  of tyvar
  | TYCON  of name
  | CONAPP of ty * ty list

Briefly describe how you would determine whether the constraint τ₁ ∼ τ₂ was satsified.

How many potential cases are there to consider?

Solving type equalities: The nine cases

tau1 \ tau2    TyVar2    TyCon2      ConApp2

                alpha2     mu2     (tau21,...,tau2n)tau2

TyVar1             1        2          3

TyCon1             S      mu1=mu2?     Fail

ConApp1            S        S          Recurse

I = immediate; S = symmetry; 

1 = alpha1 -> alpha2

2 = alpha1 -> mu2

3 = if occurs check passes, alpha1 -> (tau21,...,tau2n)tau2

Solving conjunctions

Check your understanding: Solving constraints

When asked to produce a substitution θ that solves an equality constraint, which has the form τ₁ ~ τ₂, there are nine cases to consider. The number nine comes from the fact that each type has three possible forms: a type variable α, a type constructor μ, or a type application (τ₁,…,τ_n)τ.

True
False

When asked to produce a substitution θ that solves an equality constraint, which has the form τ₁ ~ τ₂, if τ₁ has the form μ₁ and τ₂ has the form μ₂, then there is no solution.

True
False

When asked to produce a substitution θ that solves an equality constraint, which has the form τ₁ ~ τ₂, if τ₁ has the form μ₁ and τ₂ has the form (τ₁,…,τ_n)τ then there is no solution.

True
False

In the type system we have been studying, the constraint α ~ α * τ is solvable.

True
False

When solving the conjunction of constraints C₁ ∧ C₂, it is sufficient to produce a solution θ₁ for constraint C₁ and a separate solution θ₂ for constraint C₂, and then compose those substitutions to produce a solution θ₁ o θ₂ for the conjunction C₁ ∧ C₂.

True
False

The inference algorithm, formally

All the pieces

Hindley-Milner types
Bound names : σ, expressions : τ
Type inference yields type-equality constraint
Constraint solving produces substitution
Substitution refines types
Call solver, introduce polytypes at val
Call solver, introduce polytypes at all let forms

What you can do so far

Write type inference for everything except VAL, VALREC, and LETX

Write function solve, which, given a constraint C, has one of three outcomes:

Returns the identity substitution in the case where C is trivially satisfied
Returns a non-trivial substitution if C is satisfiable otherwise.
Calls unsatisfiableEquality in when C cannot be satisfied

4 November 2020: Instantiation and generalization

There are PDF slides for 11/5/2020.

Retrospective: Making type inference precise

Write one thing you learned about formalizing type inference so far and one question you have. Your answers may be anonymously shared with your classmates.

Reflection: Introducing type schemes

Suppose your type inferencer has produced a type that mentions a type variable α. Under what circumstances do you think it is safe to generalize that type to a polymorphic type scheme ∀ α . τ?

Introducing and eliminating type schemes

All the pieces

Hindley-Milner types
Bound names : σ, expressions : τ
Type inference yields type-equality constraint
Constraint solving produces substitution
Substitution refines types
Call solver, introduce polytypes at val form
Call solver, introduce polytypes at all let forms

Moving from type scheme to types (Instantiation)

Moving from types to type scheme (Generalization)

Check your understanding: Instantiation

Instantiation is the process of converting from a polymorphic type scheme σ to a monomorphic instance τ.
Such conversions always take place when variables are type checked using the Var typing rule.

True
False

Consider the Var typing rule:

Suppose we want to infer a type for the expression

(if y (fst 2 3) 4)

in the environment

Gamma = {fst : forall 'a 'b. 'a * 'b -> 'a, y : 'c}

Suppose we instantiate fst as follows:

fst : 'c * 'b -> 'c

This instantiation and the expression gives rise to the following constraints:

'c ~ bool  ;; because y:'c and y is tested by the if

'c ~ int   ;; because first argument to fst is an int

These constraints aren’t satisfiable.

But, the expression should have the typing

 'c ~ bool, Gamma |- (if y (fst 2 3) 4) : int

What went wrong?

We ignored the freshness requirement when instantiating fst.
We ignored the distinctness requirement when instantiating fst.
Nothing: the program shouldn’t be typeable.

Bonus question: Why the distinctness requirement?

Consider again

Gamma = {fst : forall 'a 'b. 'a * 'b -> 'a, y : 'c}

??, Gamma |- (if y (fst 2 3) 4) : ??

Now suppose we instantiate fst as follows:

fst : 'a * 'a -> 'a

Gamma |- fst 2 #t

Application rule yields constraints:

'a ~ int

'a ~ bool

which aren’t satisfiable. Which means this code would also be erroneously flagged as an error.

Correct typing:

Gamma |- fst 2 #t : int

The mistake was ignoring the distinctness requirement when instantiating the type scheme.

From types to type scheme: Val rule

The set A above will be useful when some variables in τ are mentioned in the environment.

We can’t generalize over those variables.

Applying idea to the type inferred for the function fst:

generalize('a * 'b -> 'a, emptyset) = 
forall 'a, 'b. 'a * 'b -> 'a

Note the new judgement form above for type checking a declaration.

On the condition θΓ = Γ: Γ is “input”: it can’t be changed.
The condition ensures that θ doen’t conflict with Γ.

We can’t generalize over free type variables in Γ.

Their presence indicates they can be used somewhere else, and hence they aren’t free to be instantiated with any type.

Check your understanding: The full Val rule

Consider the Val rule:

In this series of questions, we will walk through using the Val rule to type check the following definition:

(val pick-t (lambda (y z) (pick #t y z)))

in the context

Γ = { pick : ∀ α.bool $\cross$ α $\cross$ α $\arrow$ α}

The first hypothesis above the line in the Val rule that we must check is the judgment C, Γ |- e : τ. Suppose we have already derived the following instance of that judgment as we infer a type for the right-hand side of the pick-t definition:

$$ \begin{array}{l} \alpha_y \sim \alpha_p \land \alpha_z \sim \alpha_p, \Gamma \vdash \\ \quad \lit{(lambda (y z) (pick \#t y z)) }: \alpha_y \cross \alpha_z \arrow \alpha_p \\ \end{array} $$

To derive that judgment, which of the following might have been how the polymorphic pick function was instantiated, assuming α_y and α_z are the types assumed for variables y and z, respectively? Consider what instantiation would have been necessary to produce the observed constraints and resulting type.

pick : bool $\cross$ α_p $\cross$ α_y $\arrow$ α_y
pick : bool $\cross$ α_z $\cross$ α_z $\arrow$ α_z
pick : bool $\cross$ α_p $\cross$ α_p $\arrow$ α_p
pick : α_p $\cross$ α_p $\cross$ α_p $\arrow$ α_p

The next step is to produce a substitution θ that solves the constraints α_y ~ α_p ∧ α_z ~ α_p and leaves Γ unchanged (θ Γ = Γ). Which of the following substitutions is a such solution?

θ = { α_z -> α_p}
θ = { α_y -> α_p, α_z -> α_p}
θ = { α_p -> α_x, α_z -> α_p}
None of the above because they all change Γ.

The next step is to generalize. Given the substitution θ = { α_y -> α_p, α_z -> α_p}, what type should be passed as the first argument to the generalize function? (The second argument is the empty set because ftv(Γ) = ∅.

θ(α_y $\cross$ α_z $\arrow$ α_p) = α_p $\cross$ α_p $\arrow$ α_p
θ(α_y $\cross$ α_z $\arrow$ α_p) = α_z $\cross$ α_z $\arrow$ α_z
θ(α $\cross$ α $\arrow$ α) = α $\cross$ α $\arrow$ α
θ(α_y) = α_p

Reflection: Type inference for `let` form

Consider the following nML code. ML will infer a type scheme for the first and third examples, but reports an error for the second.

Say what each of the functions is attempting to do and explain why you think ML reports a type error for the second function.

(lambda (ys)  
   (let ([s (lambda (x) (cons x '()))])
      (pair (s 1) (s #t))))
<function> : 
  (forall ['a] ('a -> (pair (list int) (list bool))))

(lambda (ys)  
   (let ([extend (lambda (x) (cons x ys))])
      (pair (extend 1) (extend #t))))
type error: cannot make int equal to bool

(lambda (ys)  ; OK
    (let ([extend (lambda (x) (cons x ys))])
       (extend 1)))
<function> : ((list int) -> (list int))

Let with constraints, operationally:

typesof: returns τ₁, …, τ_n and C
val theta = solve C
C-prime from map, conjoinConstraints, dom, inter, freetyvarsGamma
freetyvarsGamma, union, freetyvarsConstraint
Map anonymous lambda using generalize, get all the σ_i
Extend the typing environment Gamma (pairfoldr)
Recursive call to type checker, gets C_b, \tau
Return (tau, C' /\ C_b)

Check your understanding: The Let rule

The following questions ask about the Let rule:

The let-bound variables x₁, …, x_n are given polymorphic type schemes within the scope of the body of the let e, but the type inferred for e is monomorphic.
- True
- False

If the constraint solver returned the substitution θ = {α -> (α, β)}, type inference would be able to proceed with calculating a residual constraint and generalization.

True
False

The residual constraint C′ retains all information in the substitution θ that relates to any type variable that might be relevant after type checking the let expression as determined by being mentioned in the free type variables of the context Γ.

True
False

During generalization, we only need to worry about not generalizing over the free type variables mentioned in Γ because those type variables might be constrained in various ways by other parts of the program.

True
False

Check your understanding: The val-rec and letrec rules

The key insight with the two recursive type inference rules is to follow a three-step process:

Introduce a fresh type variable α to represent the unknown type for the recursive binding.
Infer constraints and a type τ for the recursive binding as usual
Add a constraint relating the type variable and the inferred type (α ~ τ).

Otherwise, the recursive variants of the type inference rules proceed just as the non-recursive versions.

True
False

Forall things

Managing Quantified types
`val` and `val-rec`	`let`, `letrec`, …	`lambda`
FORALL contains all variables (because none are free in the context)	FORALL contains variables not free in the context	FORALL is empty
Generalize over all variables (because none are free in the context)	Generalize over variables not free in the context	Never generalize

9 November 2020: Introduction to Object-orientation

There are PDF slides for 11/10/2020.

Retrospective: Type Inference

Write one thing you learned in the previous module about type inference and one question you have about that material.

Your responses may be anonymously shared with the rest of the class.

Reflection: Introduction to Object-Oriented Programming

Suppose you were implementing a drawing program that lets users manipulate various kinds of shapes: circles, triangles, diamonds, etc.

What information would each shape need to know to be able to respond to user commands to display itself or to move to a new location?
What information would each shape need to know about the other shapes?
What benefits would there be to minimizing the amount of information shared between objects?
How would you organize your code if you knew you were going to have to add lots of new shapes in the future?

Information Hiding

So far: Programming in the small

Expressive power
Success stories:
- First-class functions
- Algebraic data types and pattern matching
- Polymorphic type systems

What about big programs?

Immersion in pile of interpreter code: not fun

An area of agreement and a great divide:

                 INFORMATION HIDING
             (especially: mutable data)
                      /       \
                     /         \
       code reuse   /           \  modular reasoning 
                   /             \ 
interoperability  /               \  internal access
between reps     /                 \  to reps
                /                   \
            OBJECTS                 MODULES               
                                  ABSTRACT TYPES

Key idea: Interfaces

Provide external abstraction of gory details
Limit access so internals can evolve

Introduction to OOP

Example Languages: JavaScript, Ruby, Java, C++, Python, …

Goals:

Abstraction
Code reuse
Scaling to large code bases

Mechanism:

Every object is a black box
All you can do is send it messages!
What messages? What do they do?
That’s the protocol
(an example of an interface)

Object-oriented demo

Demo: circle, diamond, triangle cards with these methods:

Front:

Object 1 : Circle

pos: cardinal-point
set-pos:to: cardinal-point coord
draw

Back:

Object 1: Circle

;; stores center of shape

center: coord

methods: ...

Message sends:

object1 set-pos:to: 'S (0,0)
object1 pos: 'N answer: (0,2)
object2 set-pos:to: 'S (0,2)
object2 pos: 'N answer: (0,4)
object3 set-pos:to: 'SW (0,4)
object1 draw
object2 draw
object3 draw

Key concepts of object-orientation

Don’t ask about my form:
It’s not your concern!
(information hiding)
Instead:
- Ask me to tell you something
- Ask me to do something
(In object-oriented world, imperative “do this” is common.)
If you ask me something, I might delegate to my parent.

(Code reuse: one parent can do things for many children.)

Functional vs. OOP mindset

Functional and procedural:
- I find out what form you are
- I decide what to do and how to do it.
- Function name comes first in the syntax.
Object oriented:
- I ask you what to do
- You decide how to do it, based on what form you are.
- Receiving object comes first in the syntax.

Reflected in the syntax: the decider comes first!

Examples

Print every element of a list

Functional: Are you nil or cons?
OOP: Ask every element to do something.

Multiplication of natural numbers, base 10

Functional: Are you zero or nonzero?
OOP: return a number that is 10 times yourself.

Check your understanding: Key concepts of object-orientation

In object-oriented programming, computation proceeds by sending messages to objects.

True
False

Information-hiding is important to large-scale programming because it allows a large program to be divided into smaller units that can be understood separately.

True
False

In both functional and object-oriented languages it is idiomatic to ask about the representation of arguments to functions/methods.

True
False

One way an interface supports programming-in-the-large is by limiting access to the internal details of part of a program so those details may evolve independently of the rest of the program.

True
False

the fact that the concepts are intertwined to the point that it is hard to present them one at a time. In this video, I’ll review some of the key terms to give you some context, but we’ll introduce them again as we see them in use.

OOP terminology

Object:
- Dynamic entity that receives messages
- Encapsulates instance variables (generally imperative) and methods (aka instance methods)
- Special variable self refers to the host object
Class: Construct for defining objects
- Specifies methods for objects
- Defines object initialization in class methods
- New objects created by sending message to class
- May inherit behavior from parent or super class
- An object instantiated from a class is called an instance of that class.
Dynamic dispatch: Another term for message sending

Example: List select (aka filter)

-> (val ns (List withAll: '(1 2 3 4 5)))    
List( 1 2 3 4 5 )
-> (ns select: [block (n) ((n mod: 2) = 0)])
List( 2 4 )

Vocabulary notes:

List is a class
withAll is a class method
- It encapsulates a use of new
ns is an object and an instance of class List
ns is sent the message select
select is an instance method of class List
Bonus: block construct analogous to lambda

Check your understanding: OOP Terminology

Match the OOP term to its definition:

OOP term:

object
class
instance variables
class method
dynamic dispatch
inheritance
instantiation

Definition:

construct used for defining objects
imperative state allocated on a per-object basis
process of creating an object from a class; the new object is called an instance of the class
dynamic entity that receives messages
behavior invoked at the class level typically for instantiating and initializing objects
mechanism whereby one class can be defined by extending the behavior of a parent or super class
another term for message sending

Design process in an OO setting

No interrogation about form!

Design process still works

Each method defined on a class
- The class knows the form!
Class determines
- How object is formed (class method)
- From what parts (instance variables)
- How object responds to messages (instance method)

Each form of data gets its own methods in its own class!

Using classes to implement select

Class determines how object responds.

Key classes in lists:

class ListSentinel: an instance denotes end of list
class Cons: instance is cons cell with car and cdr instance variables

(method select: (_) self) ;; on ListSentinel

(method select: (aBlock)  ;; on Cons
   ([aBlock value: car] ifFalse:ifTrue:
       {(cdr select: aBlock)}
       {(((Cons new) car: car) 
                     cdr: (cdr select: aBlock))}))

Check your understanding: Classes represent forms of data

When coding lists using object-oriented idioms, different classes are used to represent the different forms of data: one class for the empty list and another class for non-empty lists.

True
False

Dynamic dispatch replaces if

In OOP, conditionals are deprecated

Instead, idiomatic code uses dynamic dispatch.

We’ll expore via example: List filtering via iteration

Uses imperative idioms (common in OOP)

Iteration in Scheme

Ask value about form:

(define app (f xs)
  (if (null? xs)
      'do-nothing
      (begin
         (f (car xs))
         (app f (cdr xs)))))

Iteration in uSmalltalk

Use dynamic dispatch:

Instead of (app f xs), we have

(xs do: f-block)

For each element x in xs,

send

Example: Iteration

-> (val ms (ns select: [block (n) ((n mod: 2) = 0)]))
List( 2 4 )

-> (ms do: [block (m) ('element print) (space print)
                      ('is print) (space print)
                      (m println)])
element is 2
element is 4
nil

Implementing iteration

What happens if we send “do f” to an empty

What happens if we send “do f” to a cons cell?

Iteration by dynamic dispatch

Sending do: to the empty list:

  (method do: (aBlock) nil)
      ; nil is a global object

Sending do: to a cons cell:

  (method do: (aBlock)
      (aBlock value: car)
      (cdr do: aBlock))

Look! No if! Decisions made by dynamic dispatch

Basic OOP mechanism review

Message send replaces function application
- (ns select: pred) vs filter ns pred
Receiver appears first: it’s in charge!
Object responds to message by evaluating method
Method determined by object’s class

Not shown: a method can use primitive operations.

Check your understanding: Code selection via dynamic dispatch

Dynamic dispatch means it is the run-time class of the receiver object that determines the code to be executed.

True
False

Idiomatic object-oriented code often avoids conditionals through the use of dynamic dispatch, for example by sending messages to objects from different classes to trigger class-appropriate behavior instead of asking the object about its representation and then branching on the result.

True
False

The six questions for Smalltalk

The six questions

Values are objects (even true, 3, "hello")

Even classes are objects!

There are no functions—only methods on objects

The six questions

Syntax:

Mutable variables
Message send
Sequential composition of mutations and message sends
(side effects used commonly)
“Blocks”
(objects and closures in one, used as continuations)
No if or while.
Functionality implemented by passing continuations to Boolean objects.

Message passing

Look at SEND

Message identified by name (messages are not values)
Always sent to a receiver
Optional arguments must match arity of message name
(no other static checking)

N.B. BLOCK and LITERAL are special objects.

The six questions

Environments

Name stands for a mutable cell containing an object:
- Global variables
- Instance variables

The six questions

Types

There is no compile-time type system.

At run time, Smalltalk uses behavioral subtyping, known to Rubyists as “duck typing”
Dynamic semantics
- Main rule is method dispatch (complicated)
- The rest is familiar
The initial basis is enormous
- Why? To demonstrate the benefits of reuse, you need something big enough to reuse.

Check your understanding: The six questions for Smalltalk

In μSmalltalk, all values are objects, even booleans, integers, strings, classes.

True
False

Because μSmalltalk focuses so strongly on objects, it does not provide any language feature akin to anonymous functions.

True
False

The μSmalltalk type system statically guarantees that no message-not-understood errors will happen at run-time. That is, if a message m is sent to an object obj in the text of the program, then obj belongs to a class in which method m is defined.

True
False

10 November 2020: OOP Mechanisms

There are PDF slides for 11/11/2020.

Retrospective: Introduction to OOP

Write one thing you learned about OOP so far and one question you have. Your answers may be anonymously shared with your classmates.

Smalltalk syntax

little livecoding.

Coding in uSmalltalk

;; blocks
(val twice [block (n) (n + n)])
(twice value: 3)
;; note lightweight syntax for parameterless blocks
(val delayed {(’hello println) 42})
delayed
(delayed value)
;; blocks sent to Booleans
(val x 10)
(val y 20)
;; Note the number of columns matches number of args
((x <= y) ifTrue:ifFalse: {x} {y})
;; numbers
;; Fractions: exact 1 / 3
;; (1/3) is a name; (1 / 3) is what you want!
((1 / 3) - (1 / 2))
((1 / 3) + 1) ;; mixing classes!
((1 / 3) asFloat) ;; software floating point
(2 asFloat)
((2 asFloat) sqrt)
(((2 asFloat) sqrt) squared)
(100 squared)
(it squared)
… keep going until overflow …
(use bignum.smt)
100
(it squared)
Try again, go until CPU cap, stack trace

Coding Interlude: Coding in Smalltalk

Consider the following code:

  ;; blocks
  (val twice [block (n) (n + n)])
  (twice value: 3)
  ;; note lightweight syntax for parameterless blocks
  (val delayed {('hello println) 42})
  delayed
  (delayed value)
  ;; blocks sent to Booleans
  (val x 10)
  (val y 20)
  ;; Note the number of columns matches number of args
  ((x <= y) ifTrue:ifFalse: {x} {y})

Enter the code, run it, and observe the results.
Write a block square that multiples its argument object by itself (the symbol * represents multiplication), and use it to square the number 5.

Now answer the following questions:

Why doesn’t entering delayed cause the string hello to be printed and the number 42 to be returned?
Why does entering (delayed value) cause the string hello to be printed and the number 42 to be returned?
What is the meaning of the braces, as in {('hello println) 42} and in the arguments to theifTrue:ifFalse:method ({x}and{y}`)?
What about the syntax of the ifTrue:ifFalse method tells you how many arguments it is expecting?

Classes

Parts of a class definition

Superclass: What do I inherit?
Instance variables:
- Parts from which objects are formed
Class methods:
- How will class respond to class messages?
  (Yes, in uSmalltalk, classes are also objects)
- Typically used to create and initialize instances.
Instance methods:
- How will each object respond to messages?
- Some may override methods in parent class.

Example: CoordPair

(class CoordPair
   [subclass-of Object]   ;; Inherits from class Object
   [ivars x y]            ;; Instance variables

   (class-method withX:y: (anX aY)
      ((self new) initializeX:andY: anX aY))
   (method initializeX:andY: (anX aY) ;; private
        (set x anX)
        (set y aY)
        self)

   (method x () x)
   (method y () y)
   (method = (coordPair)
      ((x = (coordPair x)) & (y = (coordPair y))))
   (method * (k)           ;;;;;; Scaling and translation
        (CoordPair withX:y: (x * k) (y * k)))
   ... )

Check your understanding: Parts of a class definition

Consider the following class definition:

(class CoordPair
   [subclass-of Object]   ;; Inherits from class Object
   [ivars x y]            ;; Instance variables

   (class-method withX:y: (anX aY)
      ((self new) initializeX:andY: anX aY))
   (method initializeX:andY: (anX aY) ;; private
        (set x anX)
        (set y aY)
        self)

   (method x () x)
   (method y () y)
   (method = (coordPair)
      ((x = (coordPair x)) & (y = (coordPair y))))
   (method * (k)           ;;;;;; Scaling and translation
        (CoordPair withX:y: (x * k) (y * k)))
   ... )

Now answer the following questions.

Class CoordPair inherits from a class Pair

True
False

Objects instantiated from class CoordPair have four instance variables: x, y, anX, and anY.

True
False

The class method withX:y: is sent to the class CoordPair.

True
False

The method = is an instance method that is sent to objects instantiated from the class CoordPair.

True
False

In the class CoordPair, the identifier x is used as both an instance variable and an instance method.

True
False

Access rights

Instance variables

Visible only to me (“private”)
I can’t access argument instance variables directly;
I can only send messages.
- Instance variables are encapsulated

Methods

By mechanism, “public” (anyone can send)
By convention, some are private
- Send to myself, or to an instance I create
  (Simplest thing that could possibly work.)

coordPair; need to use accessor methods; common idiom.

Example: Access rights

(class CoordPair
   [subclass-of Object]   ;; Inherits from class Object
   [ivars x y]            ;; Instance variables

   (method initializeX:andY: (anX aY) ;; private
        (set x anX)
        (set y aY)
        self)

   (method x () x)
   (method y () y)
   (method = (coordPair)
      ((x = (coordPair x)) & (y = (coordPair y))))

   ...
)

Check your understanding: Access rights

Consider the following code:

(class CoordPair
   [subclass-of Object]   ;; Inherits from class Object
   [ivars x y]            ;; Instance variables

   (method initializeX:andY: (anX aY) ;; private
        (set x anX)
        (set y aY)
        self)

   (method x () x)
   (method y () y)
   (method = (coordPair)
      ((x = (coordPair x)) & (y = (coordPair y))))

   ...
)

Now answer the following questions:

In the class CoordPair, the instance variable y is private, which means it is accessible within the methods of the class, but not elsewhere. As a result, it is accessible within initializeX:andY:, y, and =. However, any y instance variable that might be part of the argument object coordPair is not accessible in the = method.

True
False

μSmalltalk makes it impossible to access the method initializeX:andY: outside of the class CoordPair because of the comment private.

True
False

invariants for each class in hierarchy.

Object initialization

Objects created by sending creator method to class.

(class CoordPair
   [subclass-of Object]   ;; Inherits from class Object
   [ivars x y]            ;; Instance variables

   (class-method withX:y: (anX aY)
      ((self new) initializeX:andY: anX aY))

   (method initializeX:andY: (anX aY) ;; private
        (set x anX)
        (set y aY)
        self)
    ...

   (method * (k)          ;;;;;; Scaling and translation
        (CoordPair withX:y: (x * k) (y * k)))
)

Check your understanding: Object initialization

Consider the following code:

(class CoordPair
   [subclass-of Object]   ;; Inherits from class Object
   [ivars x y]            ;; Instance variables

   (class-method withX:y: (anX aY)
      ((self new) initializeX:andY: anX aY))

   (method initializeX:andY: (anX aY) ;; private
        (set x anX)
        (set y aY)
        self)
    ...

   (method * (k)          ;;;;;; Scaling and translation
        (CoordPair withX:y: (x * k) (y * k)))
)

Now answer the following questions:

The method * sends the message withX:y: to the class CoordPair to create a new CoordPair object initialized with scaled up x and y coordinates.

True
False

The class method withX:y: sends the new message to self to allocate a fresh CoordPair object. It then sends the message initializeX:andY: to the resulting object to initialize its instance variables x and y to the parameter values anX and aY.

True
False

An alternative way to get a new CoordPair object in a method defined within the CoordPair class would be to send the message initializeX:andY: to self.

True
False

Special name self

Allows message sends to host object
Idiomatic to return self to support message chaining
Not a proper variable because can’t be set

Example: A short-hand method

     (method toOrigin ()
         (self setLoc: (0,0)))

Special name super

Starts method search with parent class
Statically determined based on defining class of method
Typically used when refining inherited behavior
Not a proper variable because can’t be set

Example: A speedy shape

(method move: (coord) (super move: (coord * 2)))

Check your understanding: Special names self and super

A method defined on an object obj sends a message to the special name self to invoke another one of obj’s methods.

True
False

A method defined in a class S that inherits from class P sends a message m to the special name super when it wants to use the method implementation that P would have used for m.

True
False

Abstract and concrete classes

Abstract class

Meant to be inherited from
Some ( > 0) subclassResponsibility methods
Examples: Shape, Boolean, Collection

Concrete class (aka, a regular class)

Meant to be instantiated
No subclassResponsibility methods
Examples: Triangle, True, List

the collction protocol:

definitions with more efficient implementations.

Check your understanding: Concrete and abstract classes

An abstract class has exactly one abstract method whose implementation is subclassResponsibility.

True
False

The purpose of an abstract class is to define some concrete methods in terms of the abstract methods so that behavior can be inherited by subclasses.

True
False

Method Dispatch

dispatch. We’ll use Booleans as an example.

class implements: it’s interface. Often standard methods defined on all objects omitted (like print).

send a message to the object true.

Method dispatch: The algorithm

To answer a message:

Consider the class of the receiver
Is the method with that name defined?
If so, use it
If not, repeat with the superclass

Run out of superclasses?

“Message not understood”

Method dispatch: An example

Q: What happens if is sent to ?

(true ifTrue: aBlock)

A: Search starts in True class and method is found.

Q: What if is sent to ?

(true print)

A: Search starts in True, moves to parent Boolean, stops in class Object where it finds the print method.

Q: What if is sent to ?

(true move)

A: Search starts in True, moves to Boolean, moves to Object, fails to find move, so reports “Message not understood” error.

Check your understanding: Method dispatch

When a message is sent to an object, the runtime system searches for a corresponding method definition starting with the class that defined the object.

True
False

If a message m is sent to an object obj and if the runtime system cannot find a method definition in obj’s class, then the system reports a message-not-understood error.

16 November 2020: OOP Case Study: The Number Hierarchy

There are PDF slides for 11/17/2020.

Case study: Magnitudes and numbers

Reflection: Design of a Number Hierarchy

Imagine you are designing a hierarchy of number types such as reals, floats, integers and fractions. How would you organize this hierarchy? How are these number types similar and how are they different?

Key problems on homework

Natural is a Magnitude
“Large integer” is a Number

Case Study: The Number Hierarchy

Goal: Develop feel for programming in the large

Issues we’ll consider along the way:

Class design
- What methods go where?
Invariants
- Class methods establish invariants
- Instance methods must maintain them
Information hiding
- “semi-private” methods for like objects
- Double-dispatch when argument form matters.

Implementation of : Reuse

(class Magnitude            ; abstract class
    [subclass-of Object]
    (method =  (x) (self subclassResponsibility))
                    ; may not inherit = from Object
    (method <  (x) (self subclassResponsibility))
    (method >  (y) (y < self))
    (method <= (x) ((self > x) not))
    (method >= (x) ((self < x) not))
    (method min: (aMag)
       ((self < aMag) ifTrue:ifFalse: {self} {aMag}))
    (method max: (aMag)
       ((self > aMag) ifTrue:ifFalse: {self} {aMag}))
)

Check your understanding: Design of magnitude class

Consider the Magnitude Class

(class Magnitude ; abstract class
  [subclass-of Object]
  (method = (x) (self subclassResponsibility))
                ; may not inherit = from Object
  (method < (x) (self subclassResponsibility))
  (method > (y) (y < self))
  (method <= (x) ((self > x) not))
  (method >= (x) ((self < x) not))
  (method min: (aMag)
      ((self < aMag) ifTrue:ifFalse: {self} {aMag}))
  (method max: (aMag)
      ((self > aMag) ifTrue:ifFalse: {self} {aMag}))
)

What methods must a subclass of Magnitude implement at a minimum?

<, =
=
<,>,=
<,=,<=,>=,>

True or False: Can you compare two instances of Magnitude for equality using the = method as defined?

True
False

receiver but represents the same value as the argument

Reflection: OOP Design: Implementing class Number

Code reuse is a powerful feature. Consider the methods negated, reciprocal, +, -, * / .

How might you implement some of these methods in terms of others?

Concrete Number Methods

(method -  (y) (self + (y negated)))
(method abs () ((self isNegative) ifTrue:ifFalse:
                   {(self negated)}
                   {self}))
(method /  (y) (self * (y reciprocal)))

(method isNegative         ()
   (self  < (self coerce: 0)))
(method isNonnegative      ()
   (self >= (self coerce: 0)))
(method isStrictlyPositive ()
   (self  > (self coerce: 0)))

Abstract Number Methods

(class Number
    [subclass-of Magnitude] ; abstract class 
    ;;;;;;; arithmetic
    (method +   (aNumber)     (self subclassResponsibility)) 
    (method *   (aNumber)     (self subclassResponsibility)) 
    (method negated    ()     (self subclassResponsibility)) 
    (method reciprocal ()     (self subclassResponsibility)) 

    ;;;;;;; coercion
    (method asInteger ()      (self subclassResponsibility)) 
    (method asFraction ()     (self subclassResponsibility)) 
    (method asFloat ()        (self subclassResponsibility)) 
    (method coerce: (aNumber) (self subclassResponsibility)) 
)

Check your understanding: Design of number class

Consider the Extended Number Heirarchy

At a minimum what messages will instances of Fraction, Float and Integer be able to respond to?
- Methods defined in Number, Magnitude and Object
- Methods defined in Number only

At a minimum what messages will instances of Small Integer and LargeInteger be able to respond to?
- Only Methods defined in Number
- Methods defined in Number and Methods Defined in Integer

Fractions: A case study in information hiding and initialization

the denominator is always positive
If the numerator is zero, the denominator is 1.
The numerator and the denominator are reduced to lowest terms, that is, their only common divisor is 1.

Example class Fraction: Initialization

(class Fraction
    [subclass-of Number]
    [ivars num den] 
      ;; invariants: lowest terms, minus sign on top
      ;;             if num = 0, den = 1
      ;; maintained by divReduce, signReduce
    (class-method num:den: (a b) 
        ((self new) initNum:den: a b))
    (method initNum:den: (a b) ; private
        (self setNum:den: a b)
        (self signReduce)
        (self divReduce))
    (method setNum:den: (a b)
        (set num a) (set den b) self) ; private
    .. other methods of class Fraction ... )

Information revealed to self

Instance variables num and den

Directly available
Always and only go with self

Object knows its own representation, invariants, private methods:

(method asFraction ()
   self)
(method print () 
   (num print) ('/ print) (den print) self)
(method reciprocal ()
   (((Fraction new) setNum:den: den num) signReduce))

Information revealed to others

Reflection: Implementing coerce

Coercion is the process of making an instance fit a specified type. For instance, the Integer 1 can be coerced to the Float 1.0.

Imagine you wanted to add a Float x to an Integer y without losing any precision. Which instance should be coerced to what type before adding?

The method coerce: implements this idea. It returns the value of the argument in the representation of the receiver. How would you implement coerce: ?

What does the caller of the method coerce: need to know about the type and protocol of the receiver?

Information revealed to others: coerce

How would you implement coerce:?

Value of argument, rep of receiver
Challenge: Can’t access rep of argument!

(method asFraction ()
   self)
(method print () 
   (num print) ('/ print) (den print) self)
(method reciprocal ()
   (((Fraction new) setNum:den: den num) signReduce))
(method coerce: (aNumber)
   (aNumber asFraction))

Saved: Number protocol includes asFraction!

Reflection: Implementing multiply on fractions

(class Fraction
    [subclass-of Number]
    [ivars num den]
        ;; invariants: lowest terms, minus sign on top
        ;; if num = 0, den = 1
        ;; maintained by divReduce, signReduce
    (class-method num:den: (a b)
        ((self new) initNum:den: a b))
    (method initNum:den: (a b) ; private
        (self setNum:den: a b)
        (self signReduce)
        (self divReduce))
    (method setNum:den: (a b)
        (set num a) (set den b) self) ; private
    .. other methods of class Fraction ... )

Consider implementing multiply between instances of the Fraction class. How do you multiply two fractions? How might you implement multiply given what we know about multiplying two fractions?

Reflection: Implementing multiply across differnt kinds of numbers

How might you implement multiply when we have lots of different kinds of numbers with different representations?

Extending open systems

Number protocol: like multiplies with like

Goal:

Large integers and small integers are both Integers
Messages =, <, +, * ought to mix freely

Constraint: Each object has its own algorithm.

Small: Use machine-primitive multiplication
Large: Multiply magnitudes; choose sign

Double dispatch to the rescue!

In the following slide, the notation :+ and :- are value constructors for positive and negative large integers. The n and m are magnitudes (natural numbers).

Double dispatch to implement addition

How do you act?

As small integer, you’re asked to add large positive integer N to self (aka plusLP: N)

Response: ((self asLargeInteger) plus N)

As small integer, you’re asked to add small integer n to self (aka plusSP: n)

Response: (self plus n)

As large positive integer, you’re asked to add large positive integer N to self (aka plusLP: N)

Response: (self plus n)

As large positive integer, you’re asked to add small integer n to self (aka plusSP: n)

Response: (self plus (n asLargeInteger))

Check your understanding: Double dispatch

What two things does a double dispatch method encode?
- Operation and protocol
- Operation and instance
- Instance and protocol

Consider using double dispatch to implement multiplication between your objects.

For each item, consider your class, and then the class of the object you are being multiplied against.

Do you need to coerce yourself into a different type of number for the operation to be successful?

Choose the appropriate smalltalk code to accomplish what you’re asked to do.

As small integer, you’re asked to multiply large positive integer N by self (aka timesLP: N)
- (self * n)
- ((self asLargeInteger) * N)
As small integer, you’re asked to multiply small integer n by self (aka timesSP: n)
- (self * n)
- ((self asLargeInteger) * N)
As large positive integer, you’re asked to multiply large positive integer N by self (aka timesLP: N)
- (self * n)
- ((self asLargeInteger) * N)
As large positive integer, you’re asked to multiply small integer n by self (aka timesSP: n)
- (self * n)
- ((self asLargeInteger) * N)

Take a look at the code below. Can you determine which class each method belongs to? Hint: Your instance knows it’s own type and broadcasts that to the object the method is operating on.

(method + (aNumber) (aNumber addSmallIntegerTo: self))
- SmallInteger
- LargePositiveInteger
(method * (anInteger) (anInteger multiplyByLargePositiveInteger: self))
- SmallInteger
- LargePositiveInteger

Your turn: Double dispatch to implement multiplication

How do you act?

As small integer, you’re asked to multiply large positive integer N by self (aka timesLP: N)

Response: ((self asLargeInteger) * N)

As small integer, you’re asked to multiply small integer n by self (aka timesSP: n)

Response: (self * n)

As large positive integer, you’re asked to multiply large positive integer N by self (aka timesLP: N)

Response: (self * n)

As large positive integer, you’re asked to multiply small integer n by self (aka timesSP: n)

Response: (self * (n asLargeInteger))

Knowledge of protocol

Three levels:

I know only your

Example: send to any collection

You are like me: share

Example: send or to

I must get to know you:

Example: send to to

Extra: Dealing with overflow

New law for multiplication:

  (self * small) 
=
  ((primitive mulWithOverflow
       self
       small
       {((self asLargeInteger) * small)})
   value))

Primitive is not a method

Answers good block or exception block
Answer is then sent value

18 November 2020: Subtyping and Equality

There are PDF slides for 11/19/2020.

Retrospective: Double Dispatch

Write one thing you learned about Double Dispatch and one question you have. Your answers may be anonymously shared with your classmates.

Reflection: Subtyping

Consider when can an object of one type be used in a context expecting an object of another type.

What needs to be true for this substitution to work?

Hint: Imagine you need to pound a nail. In this context you expect to use a hammer. If you had to use something else besides a hammer, what needs to be true about the alternative object for to accomplish your task?

Subtyping

Subtype polymorphism

Key mechanism to support code reuse
A is a subtype of B (written A <: B) if value a:A can be used whenever a value of supertype B is expected.
Example: Circle, Diamond, and Triangle can be used in any context expecting a Shape
Subtyping relationship can be checked statically (e.g. Java, C++, Scala) or dynamically (e.g. Smalltalk, Ruby)

Behavioral subtyping

(in Ruby, ``duck typing’’)

Types aren’t enough:

Methods should also behave as expected.

Example:

draw method in Shape hierarchy vs.
draw method in first-person shooter game

Not enforced in type systems because of decidablity.

Check your understanding: Subtyping

True or False: A subtype doesn’t need to understand all the messages of its supertype.

True
False

True or False: If A is a subtype of B and B is a subtype of C A can understand all the messages of C.

True
False

Reflection: Subtyping vs. Inheritance

Subtyping describes the relationship between two objects in terms of what messages both objects understand.

Inheritance describes the relationship between two objects in terms of what code is shared between objects.

First, describe two objects that aren’t related via inheritance, but that are subtypes.

Next, describe two objects that are related via inheritance, but that aren’t subtypes.

Subtyping vs Inheritance

Subtyping is not inheritance

  - **SUBTYPE != SUBCLASS**
  - **SUPERTYPE != SUPERCLASS**

Some languages like C++ identify subtype with subclass, but conceptually they are different.

Examples

Example: circle and square with same protocol but defined independently

Subtypes but not related via inheritance

Example: Non-mutable dictionary that inherits from Dictionary but hides all methods that can update entries

Related via inheritance but not subtypes

Subtyping and Inheritance: Code Reuse

What reuse is being enabled?

Inheritance: Parent code to facilitate class definition
Subtyping: All the client code

Food for thought: Which has the bigger impact?

Check your understanding: Subtyping vs. Inheritance

Match each term with the correct definition.

Subtyping
- related in class definition
- substitutability
Inheritance
-related in class definition
- substitutability

True or False: When you add a method to a class, you need to add it to all the classes which inherit from that class.

True
False

Reflection: Equality

When should two objects be considered equal? Consider comparing two strings with == in C. What problems could result?

Equality

When are two entities equal?

Easy to get wrong esp. for abstract types
Logically equal but have different reps

Example: lists used to represent sets

[1,2,3] and [2,1,3]

Logically distinct by have the same rep

Example: Simulation of an ecosystem

Two distinct animals may have same data

Different answers for different types

Scalars

Examples: int, bool, char, etc.

Test: Same bit representation

Efficient
But: Don’t use for floats!

Functions

Example: int * int list -> int list

Test: Same code?

Decidable, but meaningful?

Test: Same math function?

Undecideable
Reason for ML’s ''a types:
any type that supports equality

Usually disallowed.

Compound types: Option 1

Examples:

a dict object from class Dictionary
a rabbit from a simulation class

Test: object identity

Same object in memory
Very efficient
May fail to equate things that should be equal

Compound types: Option 2

Examples:

a dict object from class Dictionary
a rabbit from a simulation class

Test: structural equality

Recursively examine all the parts
Can be expensive!
Less picky, but may also fail to equate things that should be equal

Compound types: Option 3

Examples:

a dict object from class Dictionary
a rabbit from a simulation class

Test: User-defined

Can precisely define correct notion of equality
May or may not be efficient
Requires programmer attention

Tension

Correctness

vs. Efficiency

vs. Programmer Convenience

Check your understanding: Equality

True or False: Different types may require different notions of equality. For example a notion of equality between two integers may be different than a notion of equality between two functions, or two expressions or two strings.

True
False

Semantics of Equivalence

Observational equivalence

Key idea: observational equivalence

Two values should be considered equal if no context can tell them apart

Key question: What should the context be allowed to do?

If context can use reflection (eg, class methods to query object formation), then no abstraction
- Rule out reflection
Internals of an abstraction can see more than intended
- Only let context use client code
If object is mutable and context is allowed to mutate it
- either: Object identity
- or: Rule out mutation

Check your understanding: Observational Equivalence

What is the key idea of Observational Equivalence?

Two values should be considered equal if no context can tell them apart.
Two values should be considered equal if they have the same structure or binary representation in memory.
Two values should be considered equal if there is a context which produces the same result for both.

Equality in uSmalltalk

Method == is object identity

;; object identity
(method  == (anObject) (primitive sameObject self anObject))

Method = is a form of observational equivalence

Principle: Two objects are considered equivalent if, without mutating either object, client code cannot tell them apart
Each class decides for itself by overwriting default method
Defaults to object identity

(method = (anObject) (self == anObject))
Used by check-expect
Why disallow client code from using mutation to observe?
Pragmatic: already have object-identity with = method

Example: Equivalent but not equal

(-> (val ns (List new)) List( )
-> (ns addFirst: 3)
-> (ns addFirst: 2)
-> (ns addFirst: 1)
-> ns

List( 1 2 3 )
-> (val ms (List new)) List( )
-> (ms addLast: 1)
-> (ms addLast: 2)
-> (ms addLast: 3)
-> ms
List( 1 2 3 )

-> (ns == ms) 
<False>
-> (ns = ms)
<True>

Check your understanding: Equality in Smalltalk

Match each of the smalltalk operators ( =, == ) with the correct definition.

==
- Are two objects observationally equivalent?
- Are two objects pointing to the same place in memory?
=
- Are two objects observationally equivalent?
- Are two objects pointing to the same place in memory?

Review: Designing with objects

Visibility rules

Private messages to self (or instance of self)

Secret protocol to other objects of the same class (which can be spoofed)

How to approach object-orientation

What to remember:

Dynamic dispatch tells your own form of data
Avoid knowing any argument’s form!
Use semi-private methods, or (last resort) double dispatch

What not to do:

Don’t guess what code will answer a message
Don’t try to trace follow procedures step by step
(crosses too many class/method boundaries)

What to do instead:

Trust the contract
Keep each method dirt simple

23 November 2020: Hiding information with abstract data types

There are PDF slides for 11/24/2020.

Retrospective: Object-oriented programming

Write one thing you learned about object-oriented programming and one question you still have. Your answers may be anonymously shared with your classmates.

Reflection: Introduction to Module Systems

Describe two issues that arise when working on large programs that to not arise when working on small ones.

Pick a particular language feature or programming idiom and describe why that feature or idiom makes programming-in-the-large either easier or harder.

Data abstraction

How to write large programs?

An area of agreement and a great divide:

                 INFORMATION HIDING
             (especially: mutable data)
                      /       \
                     /         \
 modular reasoning  /           \  code reuse
                   /             \ 
internal access   /               \  interoperability 
to rep           /                 \  between reps
                /                   \
            MODULES               OBJECTS           
        ABSTRACT TYPES

Introduction to Module Systems

Key idea: Internal details hidden behind clean interface

        Implementation      Interface

      ------------------   -----
      |                |   |   |
      |     Module     |   | I |
      |                |   |   |
      ------------------   -----
                ^            ^
                |            |
   Nitty gritty -            - Public face

Key idea: Hierarchical building blocks

     ------        ------
     | C1 |        | C2 |
     ------        ------
        \            /
         \          /
     --------------------
     |        I3        |
     --------------------
     |        M3        |
     |  ------  ------  |
     |  | I1 |  | I2 |  |
     |  ------  ------  |
     |  |    |  |    |  |
     |  | M1 |  | M2 |  |
     --------------------

Terminology: a module is a client of the interfaces it depends on

Module system overview

Functions of a module system:

Hide representations, implementations, private names
“Firewall” separately compiled units
(promote independent compilation)
Possibly reuse units

Desirable features:

Modular (i.e., independent) static type checking
Separately compilable interfaces and implementations

Aside: C and C++ are cheap imitations

C doesn’t provide namespaces
C++ doesn’t provide modular type checking for templates

Interfaces

Collect declarations

Unlike definition, provides partial information about a name
(usually environment and type)

Things typically declared:

Variables or constants (values, mutable or immutable)
example: val MAX_INT : int
Types example: type nat
Procedures (if different from values)
example: val intOfNat : nat -> int
Exceptions
example: exception Negative

Key idea: a declared type can be abstract

Just like a primitive type
Cannot ask “how were you formed, and from what parts?”

Ways of thinking about interfaces

Specification or contract for a module
- The “type” of a module
- Includes contracts for all declared functions
Means of hiding information (“what secret does it protect?”)
A way to limit what we have to understand
- The unit of sharing and reuse
A contract between programmers
- Parties might be you and your future self
- Explainer of libraries
The best technology for structuring large systems.
- Estimated force multiplier x10

Example interface

signature NATURAL_NUMBER = sig
  type nat
  val intOfNat  : nat -> int
  val natOfInt  : int -> nat
  val natString : nat -> string

  exception Negative
  val addNats : nat * nat -> nat
  val subNats : nat * nat -> nat
  val mulNats : nat * nat -> nat
end

Design choice: How are interfaces constructed?

Interface “projected” from implementation:

No separate interface
Compiler extracts from implementation
(CLU, Java class, Haskell)
When code changes, must extract again
Easy to construct but few cognitive benefits

Interface written independently:

Distinct file, separately compiled
(Caml, Go, Java interface, Modula, Ada)
Implementations can change independently
Harder to construct but full cognitive benefits

Check your understanding: Interfaces

An interface can be thought of as the type of a module.

True
False

An interface typically includes the implementation of one or more functions.

True
False

An interface can declare a type but leave its definition abstract, forcing client code to treat that type abstractly, as if it were a primitive type.

True
False

Module Implementations

     ------        ------
     | C1 |        | C2 |
     ------        ------
        \            /
         \          /
     --------------------
     |        I3        |
     --------------------
     |        M3        |
     |  ------  ------  |
     |  | I1 |  | I2 |  |
     |  ------  ------  |
     |  |    |  |    |  |
     |  | M1 |  | M2 |  |
     --------------------

Hold all dynamically executed code
- expressions/statements
May include private names
May depend only on interfaces, or on interfaces and implementations both
- max cognitive benefits when all dependency is mediated by interfaces
Dependencies may be implicit or explicit (import, require, use)

Check your understanding: Modules and module systems

A module provides the implementation of one or more interfaces.

True
False

A module may have private information hidden from client code.

True
False

Standard ML Modules

The Perl of module languages?

Has all known features
Supports all known styles
Rejoice at the expressive power
Weep at
- the terminology,
- the redundancy,
- the confusing syntax

What we’ve been using so far is the core language

Modules are a separate language layered on top.

ML module terminology

Interface is a signature

Implementation is a structure

Generic module is a functor

A compile-time function over structures
The point: Reuse without violating abstraction

Structures and functors match signature

Analogy: Signatures are the ``types’’ of structures.

Signature basics

Signature says what’s in a structure

Specify types (w/kind), values (w/type), exceptions

Examples: Ordinary types

    type t        // abstract type, kind *
    eqtype t
    type t = ...  // 'manifest' type
    datatype t = ...

Examples: Type constructors work too

    type 'a t     // abstract, kind * => *   
    eqtype 'a t
    type 'a t = ...
    datatype 'a t = ...

Signature example: Ordering


signature ORDERED = sig
  type t
  val lt : t * t -> bool
  val eq : t * t -> bool
end

operation that quickly finds and removes a minimal element.

without knowing the details of the implementation.

Signature example: Priority queue

signature PQUEUE = sig
  type elem           (* <-- ABSTRACT type *)
  val compare_elem : elem * elem -> order

  type pqueue         (* <-- ABSTRACT type *)
  val empty  : pqueue
  val insert : elem * pqueue -> pqueue
  val isEmpty : pqueue -> bool
  exception Empty
  val deletemin : pqueue -> elem * pqueue
end

Check your understanding: The anatomy of a signature

When specifying a value in an ML signature, it is necessary to give a type for that value.

True
False

An ML signature can specify types, but not type constructors.

True
False

An ML signature cannot include exceptions.

True
False

ML modules extended example: Numbers

Signature example: Integers

signature INTEGER = sig
  eqtype int             (* <-- ABSTRACT type *)
  val ~   : int -> int
  val +   : int * int -> int
  val -   : int * int -> int
  val *   : int * int -> int
  val div : int * int -> int
  val mod : int * int -> int
  val >   : int * int -> bool
  val >=  : int * int -> bool
  val <   : int * int -> bool
  val <=  : int * int -> bool
  val compare : int * int -> order
  val toString   : int    -> string
  val fromString : string -> int option
end

Implementations of integers

A selection…

structure Int    :> INTEGER
structure Int31  :> INTEGER  (* optional *)
structure Int32  :> INTEGER  (* optional *)
structure Int64  :> INTEGER  (* optional *)
structure IntInf :> INTEGER  (* optional *)

Natural Numbers

signature NATURAL = sig
   type nat   (* abstract, NOT 'eqtype' *)
   exception Negative
   exception BadDivisor

   val ofInt   : int -> nat 
   val /+/     : nat * nat -> nat
   val /-/     : nat * nat -> nat
   val /*/     : nat * nat -> nat
   val sdiv    : nat * int -> 
                 { quotient : nat, remainder : int }
   val compare : nat * nat -> order
   val decimal : nat -> int list
end

Large Integers

signature BIGNUM = sig
   type bigint
   exception BadDivision

   val ofInt    : int -> bigint
   val negated  : bigint -> bigint
   val <+>      : bigint * bigint -> bigint
   val <->      : bigint * bigint -> bigint
   val <*>      : bigint * bigint -> bigint
   val sdiv : bigint * int -> 
              { quotient : bigint, remainder : int }
   val compare  : bigint * bigint -> order
   val toString : bigint -> string
end

Check your understanding: Number signatures

Consider the INTEGER signature:

signature INTEGER = sig
  eqtype int             (* <-- ABSTRACT type *)
  val ~   : int -> int
  val +   : int * int -> int
  val -   : int * int -> int
  val *   : int * int -> int
  val div : int * int -> int
  val mod : int * int -> int
  val >   : int * int -> bool
  val >=  : int * int -> bool
  val <   : int * int -> bool
  val <=  : int * int -> bool
  val compare : int * int -> order
  val toString   : int    -> string
  val fromString : string -> int option
end

In this signature, the representation of int is kept abstract except for the fact that values of type int may be compared for equality using the built-in equality function.

True
False

Consider the NATURAL signature:

signature NATURAL = sig
   type nat   (* abstract, NOT 'eqtype' *)
   exception Negative
   exception BadDivisor

   val ofInt   : int -> nat 
   val /+/     : nat * nat -> nat
   val /-/     : nat * nat -> nat
   val /*/     : nat * nat -> nat
   val sdiv    : nat * int -> 
                 { quotient : nat, remainder : int }
   val compare : nat * nat -> order
   val decimal : nat -> int list
end

Given this signature, if you do not already have a value of type nat, the only way to get a value one is to use the ofInt function.

True
False

Signatures collect

signature QUEUE = sig
  type 'a queue    (* another abstract type *)
  exception Empty

  val empty : 'a queue
  val put : 'a * 'a queue -> 'a queue
  val get : 'a queue -> 'a * 'a queue   (* raises Empty *)

  (* LAWS:  get(put(a, empty))     ==  (a, empty)
            ...
   *)
end

Structures collect

structure Queue :> QUEUE = struct   (* opaque seal *)
  type 'a queue = 'a list
  exception Empty

  val empty = []
  fun put (x,q) = q @ [x]
  fun get [] = raise Empty
    | get (x :: xs) = (x, xs)


  (* LAWS:  get(put(a, empty))     ==  (a, empty)
            ...
   *)
end

Dot notation to access components

fun single x = Queue.put (Queue.empty, x)
val _ = single : 'a -> 'a Queue.queue

Coding Interlude: Signature for a stack

Given the following SML structure, write a corresponding SML signature that hides the implementation as a list but allows polymorphic stacks to be created and elements to be pushed and popped.

Structure for a stack

structure Stack = struct
   type 'a stack = 'a list
   exception Empty
   val empty = []
   val push  = op ::
   fun pop []            = raise Empty
     | pop (top :: rest) = (top, rest)
end

Reflection: When does a structure match a signature?

If we can write structures and signatures independently, we must have some way to determine whether a given structure implements given signature.

Give examples of three different circumstances that would mean a given structure does not implement an interface. One such example would be that the interface requires the presence of a value MAX_INT but the structure fails to define such a value.

What interface with what implementation?

Maybe mixed together, extracted by compiler!

CLU, Haskell

Maybe matched by name:

Modula-2, Modula-3, Ada

Best: any interface with any implementation:

Java, Standard ML

But: not “any”—only some matches are OK

Signature Matching

Structure declaration is well-formed

 structure Queue :> QUEUE = QueueImpl

if principal signature of QueueImpl matches ascribed signature QUEUE

Every type in QUEUE is in QueueImpl
Every exception in QUEUE is in QueueImpl
Every value in QUEUE is in QueueImpl
(type could be more polymorphic)
Every substructure matches, too (none here)

Matching items must be compatible.

Signature Ascription

Ascription attaches signature to structure

Transparent Ascription: types are revealed
```
structure strid : sig_exp = struct_exp
```
This method leaks information (legacy)
(But it’s awfully convenient)
Opaque Ascription: types are hidden (“sealing”)
```
structure strid :> sig_exp = struct_exp
```
This method respects abstraction
(And when you need to expose, can be tiresome)

Slogan: “use the beak”

Transparent Ascription

Not recommended!

Example:

  structure IntLT : ORDERED = struct
     type t = int
     val le = (op <)
     val eq = (op =)
  end

Exposed: IntLT.t = int

Violates abstraction

Opaque Ascription

Recommended

Example:

 structure Queue :> QUEUE = struct
   type 'a queue = 'a list
   exception Empty

   val empty = []
   fun put (x, q) = q @ [x]
   fun get [] = raise Empty
    | get (x :: xs) = (x, xs)
 end

Not exposed: type ’a Queue.queue = ’a list

Respects abstraction

Check your understanding: Signature ascription

The principal signature of a structure is the signature that contains the maximal interface information for that structure.

True
False

With opaque signature ascription (Queue :> QUEUE), type information in the structure is hidden. Only information described in the ascribing signature is available to client code.

True
False

Abstract data types

Opaque ascription and access to internals

Outside module, no access to representation

Protects invariants
Allows software to evolve
Enforced by type system

Inside module, complete access to representation

Every function sees rep of every argument
Key distinction modules vs object

Abstract data types and number hierarchy

Natural numbers

signature NATURAL = sig
   type nat   (* abstract, NOT 'eqtype' *)
   ...
   val /+/     : nat * nat -> nat
   val /-/     : nat * nat -> nat
   val /*/     : nat * nat -> nat
end

Funs/+/,/-/,/*/ see both representations
Makes arithmetic relatively easy
But type nat works only with type nat
(no “mixed” arithmetic, unlike in OOP)

Check your understanding: Module access to representations

Consider the signature NATURAL:

signature NATURAL = sig
   type nat   (* abstract, NOT 'eqtype' *)
   ...
   val /+/     : nat * nat -> nat
   val /-/     : nat * nat -> nat
   val /*/     : nat * nat -> nat
end

Now answer the following question.

In any structure implementing this signature, the code for the /+/ operation will be able to access the representation of both arguments.

True
False

30 November 2020: Functors and Extended Case Study

There are PDF slides for 12/1/2020.

Retrospective: Structures and Signatures

Write one thing you learned about structures and signatures so far and one question you have. Your answers may be anonymously shared with your classmates.

Reflection: Parameterizing a structure by another structure

We’ve seen a couple of different kinds of functions. A lambda expression takes an expression-level value as an argument, while a type-lambda expression takes a type as an argument.
Suppose we introduced a new kind of function that took a structure as an argument and returned a structure as a result.

How do you think we might we “type” the argument and the result?
What do you think such functions might be used for?

Functors (aka generic modules aka parameterized structures)

A is a generic module

A new form of parametric polymorphism:

lambda and type-lambda in one mechanism
Introduction form is functor (definition form)

“Generics” found across language landscape
(wherever large systems are built)

Introducing functors

A functor abstracts over a structure

Formal parameters are declarations

Example:

functor MkSingle(structure Q:QUEUE) = 
   struct
     structure Queue = Q
     fun single x = Q.put (Q.empty, x)
   end

Functors combine familiar ideas:

Higher-order functions (value parameter Q.put)
type-lambda (type parameter Q.queue)

Using Functors

Actual parameters are definitions

structure QueueS  = MkSingle(structure Q = Queue)
structure EQueueS = MkSingle(structure Q = EQueue)

(EQueue might be a more efficient implementation)

functor MkSingle(structure Q:QUEUE) = 
   struct
     structure Queue = Q
     fun single x = Q.put (Q.empty, x)
   end

Functor applications are evaluated at compile time.

Check your understanding: Functors

A functor is a function over structures that combines elements of higher-order functions and type-lambda.

True
False

Functor applications are evaluated at runtime.

True
False

Check your understanding: where type

Consider the following code:

signature ORDER = sig
  type t
  val compare : t * t -> order
end

signature MAP = sig
  type key
  type 'a table
  val insert : key -> 'a -> 'a table -> 'a table
  ...
end

functor RBTree(structure Ord : ORDER) 
                  :> MAP where type key = Ord.t = 
   struct  ...   end

Now answer the following questions.

The ORDER signature defines an abstract type t.

True
False

If applying the RBTree functor to some concrete structure Foo passes the type checker, then Foo must not define a type t.

True
False

The MAP signature defines an abstract type key.

True
False

The structure that results from applying the functor RBTree to structure Foo will satisfy the MAP signature refined by the where type clause so that key type is defined to be equal to Foo’s definition of type t.

True
False

Extended example: Error-tracking Interpreter

Reusable Abstractions: Extended Example

Error-tracking interpreter for a toy language

Why this example?

Sketch composing large systems
Lots of interfaces using ML signatures
Some ambitious, very abstract abstractions—it’s toward the end of term, and you should see something ambitious.
Practice implementing functors

Error modules: Three common implementations

Collect all errors
Keep just the first error
Keep the most severe error

Classic ``accumulator’’ for errors

signature ERROR = sig
  type error   (* a single error *)
  type summary (* summary of what errors occurred *)

  val nothing : summary                  (* no errors *)
  val <+> : summary * summary -> summary (* combine *)

  val oneError : error -> summary

  (* laws:   nothing <+> s == s
             s <+> nothing == s
             s1 <+> (s2 <+> s3) == (s1 <+> s2) <+> s3    
                                        // associativity
   *)
end

First Error: Implementation

structure FirstError :> 
    ERROR where type error = string
            and type summary = string option =
  struct
    type error   = string
    type summary = string option

    val nothing = NONE
    fun <+> (NONE,   s) = s
      | <+> (SOME e, _) = SOME e

    val oneError = SOME
  end

All Errors: Implementation

structure AllErrors :>
    ERROR where type error   = string
            and type summary = string list =
  struct
    type error   = string
    type summary = error list

    val nothing = []
    val <+> = op @
    fun oneError e = [e]
  end

Check your understanding: Error signature and implementations

Consider the ERROR signature:

signature ERROR = sig
  type error   (* a single error *)
  type summary (* summary of what errors occurred *)

  val nothing : summary                  (* no errors *)
  val <+> : summary * summary -> summary (* combine *)

  val oneError : error -> summary

  (* laws:   nothing <+> s == s
             s <+> nothing == s
             s1 <+> (s2 <+> s3) == (s1 <+> s2) <+> s3    
                                        // associativity
   *)
end

Now answer the following questions.

There is only one sensible way to implement this signature.

True
False

In the signature, the type error represents one detected error.

True
False

In the signature, the combinator <+> concatenates a new error onto a summary of the errors detected so far.

True
False

Computations Abstraction

Ambitious! (will make your head hurt a bit)

Computations either:

return a value
produce an error

Errors must be threaded through everything :-(

Example: Simple arithmetic interpreter

(* Given: *)
datatype 'a comp = OK of 'a | ERR of AllErrors.summary

datatype exp = LIT  of int
             | PLUS of exp * exp
             | DIV  of exp * exp

(* Write an evaluation function that tracks errors. *)
val eval : exp -> int comp = ...

Reflection: Error tracking interpreter

Given the AllErrors implementation of the ERROR signature:

structure AllErrors :>
    ERROR where type error   = string
            and type summary = string list =
  struct
    type error   = string
    type summary = error list

    val nothing = []
    val <+> = op @
    fun oneError e = [e]
  end

and the following datatype declarations:

datatype 'a comp = OK of 'a | ERR of AllErrors.summary 

datatype exp = LIT  of int
             | PLUS of exp * exp
             | DIV  of exp * exp

Fill in evaluation code for PLUS case using the AllErrors implementation of ERROR signature:

fun eval (LIT n) = OK n
  | eval (PLUS (e1, e2)) = ...

Fill in evaluation code for DIV case using the AllErrors implementation of ERROR signature:

fun eval (LIT n) = OK n
  | ...
  | eval (DIV  (e1, e2)) - ...

Interpreter: LIT and PLUS cases

fun eval (LIT n) = OK n
  | eval (PLUS (e1, e2)) = 
     (case eval e1
        of OK v1 => 
          (case eval e2
             of OK  v2 => OK (v1 + v2)
              | ERR s2 => ERR s2)
       | ERR s1 =>
          (case eval e2
             of OK  _  => ERR s1
              | ERR s2 => ERR (AllErrors.<+> (s1, s2))))

Interpreter: DIV case

  | eval (DIV (e1, e2)) = 
     (case eval e1
        of OK v1 =>
          (case eval e2
             of OK   0 => ERR (AllErrors.oneError "Div 0")
              | OK  v2 => OK  (v1 div v2)
              | ERR s2 => ERR s2)
       | ERR s1 =>
           (case eval e2
              of OK  v2 => ERR s1
               | ERR s2 => ERR (AllErrors.<+> (s1, s2)))

That’s really painful!

We can extend the computation abstraction with sequencing operations to help.

Combining computations

signature COMPUTATION = sig
  type 'a comp    (* Computation! When run, results in
                     value of type 'a or error summary. *)

  (* A computation without errors always succeeds. *)
  val succeeds : 'a -> 'a comp

  (* Apply a pure function to a computation. *)
  val <$> : ('a -> 'b) * 'a comp -> 'b comp

  (* Application inside computations. *)
  val <*> : ('a -> 'b) comp * 'a comp -> 'b comp

  (* Computation followed by continuation. *)
  val >>= : 'a comp * ('a -> 'b comp) -> 'b comp
end

Example:

eval e1 + eval e2

(op +) (eval e1, eval e2)

curry (op +) (eval e1) (eval e2)

curry (op +) <$> eval e1 <*> eval e2

The first three versions are not well typed. Why?

The last version will thread errors through the compuation behind the scenes.

Note:

eval e1, eval e2 : int comp

curry (op +) : int -> (int -> int)

<$> : (int -> (int -> int)) * (int comp) -> (int -> int) comp

<*> : (int -> int) comp * int comp -> int comp

curry (op +) <$> eval e1 : (int -> int) comp

let pa = curry (op +) <$> eval e1

note by above,  pa : (int -> int) comp 

pa <*> eval e2  : int comp

Buckets of algebraic laws

succeeds a >>= k  == k a                  // identity

comp >>= succeeds == comp                 // identity

comp >>= (fn x => k x >>= h) == (comp >>= k) >>= h  
                                          // associativity

succeeds f <*> succeeds x == succeeds (f x)  // success

...

Environments can use computation abstraction, too!

signature COMPENV = sig
  type 'a env   (* environment mapping strings
                   to values of type 'a *)
  type 'a comp  (* computation of 'a or
                   an error summary *)

  val lookup : string * 'a env -> 'a comp
end

Check your understanding: Computation combinators

Consider the signature of the computation abstraction:

signature COMPUTATION = sig
  type 'a comp    (* Computation! When run, results in
                     value of type 'a or error summary. *)

  (* A computation without errors always succeeds. *)
  val succeeds : 'a -> 'a comp   

  (* Apply a pure function to a computation. *)
  val <$> : ('a -> 'b) * 'a comp -> 'b comp

  (* Application inside computations. *)
  val <*> : ('a -> 'b) comp * 'a comp -> 'b comp

  (* Computation followed by continuation. *)
  val >>= : 'a comp * ('a -> 'b comp) -> 'b comp
end

Now answer the following questions:

The type 'a comp represents a computation that either produces a value of type 'a or results in an error.

True
False

A value of type 'a comp created by calling the succeeds combinator will always return a value of type 'a.

True
False

The <*> combinator applies a pure function f with type 'a -> 'b to a value with type 'a comp.

True
False

Error-tracking interpreter

functor InterpFn(
  structure Error : ERROR
  structure Comp  : COMPUTATION
  structure Env   : COMPENV
  val zerodivide  : Error.error
  val error       : Error.error -> 'a Comp.comp
  sharing type Comp.comp = Env.comp) =
struct
  val (<*>, <$>, >>=) = (Comp.<*>, Comp.<$>, Comp.>>=)
  
  (* Definition of Interpreter *)

end

Error-tracking intepreter, continued

datatype exp = LIT of int
             | VAR of string
             | PLUS of exp * exp
             | DIV  of exp * exp
fun eval (e, rho) =
 let fun ev(LIT n) = Comp.succeeds n
       | ev(VAR x) = Env.lookup (x, rho)
       | ev(PLUS (e1, e2)) = curry op + <$> ev e1 <*> ev e2
       | ev(DIV (e1, e2))  = ev e1 >>= (fn n1 =>
                             ev e2 >>= (fn n2 =>
                             case n2
                               of 0 => error zerodivide
                                | _ => Comp.succeeds
                                              (n1 div n2)))
 in  ev e
 end

Computation abstraction is a ``monad’’

Supported by special syntax in Haskell:

eval :: Exp -> Hopefully Int
eval (LIT v)       = return v
eval (PLUS  e1 e2) = 
  do { v1 <- eval e1
     ; v2 <- eval e2
     ; return (v1 + v2) }
eval (DIV  e1 e2) =
  do { v1 <- eval e1
     ; v2 <- eval e2
     ; if v2 == 0 then Error "div 0"
       else return (v1 `div` v2) }

Check your understanding: Using the computation abstraction

Consider the following code for implementing an expression interpreter using the computation abstraction:

functor InterpFn(
  structure Error : ERROR
  structure Comp  : COMPUTATION
  structure Env   : COMPENV
  val zerodivide  : Error.error
  val error       : Error.error -> 'a Comp.comp
  sharing type Comp.comp = Env.comp) =
struct
  val (<*>, <$>, >>=) = (Comp.<*>, Comp.<$>, Comp.>>=)
  

datatype exp = LIT of int
             | VAR of string
             | PLUS of exp * exp
             | DIV  of exp * exp

fun eval (e, rho) =
 let fun ev(LIT n) = Comp.succeeds n
       | ev(VAR x) = Env.lookup (x, rho)
       | ev(PLUS (e1, e2)) = curry op + <$> ev e1 <*> ev e2
       | ev(DIV (e1, e2))  = ev e1 >>= (fn n1 =>
                             ev e2 >>= (fn n2 =>
                             case n2
                               of 0 => error zerodivide
                                | _ => Comp.succeeds
                                              (n1 div n2)))
 in  ev e
 end

end

Now answer the following questions:

The InterpFn functor takes as a parameter a value declaration that represents a divide by zero error. This value is used in the evaluation of the DIV operation.

True
False

The evaluation of the VAR case can never fail and so doesn’t need to be part of the computation abstraction.

True
False

The evaluation of the PLUS operation uses the <$> and <*> combinators to thread errors through the computation.

True
False

In the evaluation of DIV, n1 is bound to the result of evaluating e1 only in the case where e1 evaluates without error.
If e1 evaluates to an error, the rest of the computation is short-circuited and the error is returned directly.

True
False

ML Module summary

ML module summary

New syntactic category: declaration

Of type, value, exception, or module

Signature groups declarations: interface

Structure groups definitions: implementation

Functor (generic module) enables reuse:

Formal parameter: declarations
Actual parameter: definitions

Opaque ascription hides information

Enforces abstraction

2 December 2020: Drive-by tour of lambda calculus

There are PDF slides for 12/3/2020.

Retrospective: SML’s Module System

Write one thing you learned about ML’s module system so far and one question you have. Your answers may be anonymously shared with your classmates.

Reflection: Models of computation

Think back to our formal models of impcore and μscheme. The simpler a formal model is, while still capturing the essence of what is being modelled, the better, because it allows us to focus on what matters most.

In programming languages, we generally want the langauge to be Turing complete. Practically speaking, that means it has to be able to express conditionals, natural numbers, and general recursion.

Describe how you would go about designing a minimal language that supported these features.

Why study lambda calculus?

A metalanguage for describing other languages, known to all educated PL people

Church-Turing Thesis: Any computable operator can be encoded in lambda calculus
Theoretical underpinnings for most programming languages (all in this class).
Test bench for new language features

The world’s simplest reasonable programming language

Syntax: Only three forms!

M ::= x | \x.M | M M'

Everything is programming with functions

Everything is curried
Application associates to the left
Arguments are not evaluated before application

First example:

(\x.\y.x) M N --> (\y.M) N --> M

Crucial:

Argument N is never evaluated (could have an infinite loop)
Substitution is “capture-avoiding”

Programming in Lambda Calculus

Absolute minimum of code forms:

no set, while, begin
but also no if and no define!

Assembly language of programming languages

Systematic approach to constructed data:

Everything is continuation-passing style

Alert to the reading: Wikipedia is reasonably good on this topic

EXCEPT for the way they encode lists, which is bogus (violates abstraction)

Coding Booleans and `if` expressions

Q: Who remembers the boolean-formula solver?

Q: What classes of results could it produce?

Q: How were the results delivered?

Q: How shall we do Booleans?

A Boolean takes two continuations:

one for true, one for false

Laws:

true  s f = s
false s f = f

Code:

true  = \x.\y.x
false = \x.\y.y

Coding the if expression, laws:

if true  then T else E = T
if false then T else E = E

Therefore, code is:

if B then T else E = 
if-then-else B T E = \B.\T.\E.B T E

Code:

true  = \x.\y.x
false = \x.\y.y
if    = \b.\t.\e.b t e

Coding Interlude: Coding `not` in lambda calculus

Given that we can code booleans in lambda calculus as

true  = \x.\y.x
false = \x.\y.y
if    = \b.\t.\e.b t e

your job is to write the not function.

As a first step, write the algebraic laws for not. What are the forms of the input data? For each form, what should the output be?
Now write the code for not

You can try out your code in the lamstep interpreter on the homework cluster.

Coding data structures

Coding pairs

To pair two values, how many alternatives are there?
(A: one, pair constructor)
How many continuations?
(A: two, one for each element of the pair)
What information does each expect?
(A: the two elements of the pair)
What are the algebraic laws?
```
fst (pair x y) = x
snd (pair x y) = y
```

Code pair, fst, snd
pseudo-code:

pair x y k = k x y
fst p = p (\x.\y.x)
snd p = p (\x.\y.y)

lambda calculus:

pair = \x.\y.\k.k x y
fst  = \p.p (\x.\y.x)
snd  = \p.p (\x.\y.y)

Code pair, fst, snd

pair x y k = k x y
fst p = p (\x.\y.x)
snd p = p (\x.\y.y)

pair = \x.\y.\k.k x y
fst  = \p.p (\x.\y.x)
snd  = \p.p (\x.\y.y)

Coding lists

List laws

null? nil         = true
null? (cons x xs) = false
car (cons x xs)   = x
cdr (cons x xs)   = xs

How many ways can lists be created? (A: two: nil and cons)
How many continuations?
(A: two, one for each, kn and kc)
What does each continuation expect?
- nil: nothing
- cons: car and cdr

For each creator, what are its continuation laws?

cons y ys kn kc = kc y ys
nil       kn kc = kn

car xs = xs error (\y.\ys.y)
cdr xs = xs error (\y.\ys.ys)

null? xs = xs true (\y.\ys.false)

What are the definitions?

cons = \y.\ys.\kn.\kc.kc y ys
nil  = \kn.\kc.kn

car = \xs.xs error (\y.\ys.y)
cdr = \xs.xs error (\y.\ys.ys)

null? = \xs.xs true (\y.\ys.false)

What do those second continuations look like? (This is the source of Wikipedia’s terrible hack)

What are the definitions?

cons = \y.\ys.\kn.\kc.kc y ys
nil  = \kn.\kc.kn

car = \xs.xs error (\y.\ys.y)
cdr = \xs.xs error (\y.\ys.ys)

null? = \xs.xs true (\y.\ys.false)

Check your understanding: Coding data structures in lambda calculus

The encoding of a pair constructor in lambda calculus is as a function that takes the two elements x and y of the pair and a continuation k that says what to do with those elements.

True
False

The function \p.p (\x.y.x) encodes the snd projection function on pairs in the lambda calculus.

True
False

Lists are coded in lambda calculus as functions that take two continuations: one for what to do if the list is nil and one for what to do if the list is a cons cell. The nil continuation takes no arguments, while the cons case takes one argument: what to do for the head of the list.

True
False

The function \xs.xs true (\y.\ys.false) encodes the null? function on lists in the lambda calculus.

True
False

Coding numbers: Church Numerals

Church numerals

Key idea: Encode a number as the number of times a function f is applied.

The number n: Apply f to x, n times

 \f.\x.fold f x n

Church Numerals in lambda calculus

zero  = \f.\x.x;
succ  = \n.\f.\x.f (n f x);
plus  = \n.\m.n succ m;
times = \n.\m.n (plus m) zero;
 ...
-> four;
\f.\x.f (f (f (f x)))
-> three;
\f.\x.f (f (f x))
-> times four three;
\f.\x.f (f (f (f (f (f (f (f (f (f (f (f x)))))))))))

Check your understanding: Church numerals

The Church numeral encoding of the number 3 is \f.\x.f(f(f(f x)))

True
False

The function \n.\f.\x.f(n f x) is the successor function on Church numerals

True
False

Taking stock:

bools
pairs
lists
numbers

Q: What’s missing from this picture?

A: Recursive functions.

Astonishing fact: We don’t need letrec or val-rec.

We can encode recursion in the lambda calculus!
Recursion + numbers makes the lambda calculus Turing complete.

Reflection: Encoding recursion

What value of
f
solves this equation:

f = λn.if n = 0 then 1 else n × f(n − 1)?

Coding recursion

Factorial in lambda calculus

Wish for:

fact = \n.(zero? n) 1 (times n (fact (pred n)));

But: on right-hand side, fact is not defined.

Successive approximations

Function bot always goes into an infinite loop.

Consider how these functions relate to factorial:

fact0 = \n.(zero? n) 1 (times n (bot   (pred n)));


fact1 = \n.(zero? n) 1 (times n (fact0 (pred n)));


fact2 = \n.(zero? n) 1 (times n (fact1 (pred n)));

General pattern:

fact-i = \n.(zero? n) 1 (times n (fact-i-1   (pred n)));

cessive approximations (construct

fact-i = \n.(zero? n) 1 (times n (fact-i-1   (pred n)));


g = \f.  \n.(zero? n) 1 (times n (f          (pred n)));


fact0 = g bot;


fact1 = g fact0;      // = g (g bot)


fact2 = g fact1;      // = g (g (g bot))


fact3 = g fact2;      // = g (g (g (g bot)))


   ...

Fixed point

Suppose f = g f.

f is a fixed point of g.

Claim: f n is n factorial!

Proof: by induction on n.

Left as an exercise to the viewer :-)

Fixed-point combinator

What if we had a combinator fix such that

fix g = g (fix g)

Term fix g is a fixed point of g, and so by previous slide,

fix g n is n factorial!

fix g = g (fix g)
      = g (g (fix g))
      = g (g (g (fix g)))
      = ...

Expand as much as you need to.

Check your understanding: Encoding recursion

Consider the recursively-defined length function:

length = \xs.null? xs 0 (+ 1 (length (cdr xs)))

If we were to take the fixpoint of the following functions, which of them would produce the length function?

lg1 = \lf.\xs.null? xs 1 (+ 1 (lf (cdr xs)))
lg2 = \lf.\xs.null? xs 0 (+ 0 (lf (cdr xs)))
lg3 = \lf.\xs.null? xs 0 (+ 1 (lf (cdr xs)))
lg4 = \lf.\xs.null? xs 0 (+ 0 (lf (car xs)))

Using fixed points

Enrichment Reflection.

For extra credit, for each of the following recursive equations, say whether there is a solution, (ie, a recursive function that makes the equation true). If there is a solution, say whether it is unique and what it is.

f1 = \n.\m.(eq? n m) n 
              (plus n (f1 (succ n) m));

f2 = \n.f2 (isZero? n 100 (pred n));

f3 = \xs.xs nil (\z.\zs.cons 0 (f3 zs));

f4 = \xs.\ys.f4 ys xs;

Summary

Lambda calculus is Turing Complete
- Conditionals (via continuations)
- Numbers (via Church encoding)
- Recursion (via Y combinator)
Essence of most programming languages
- All that we have studied
Evaluation proceeds by substituting arguments for formal variables (called beta reduction)

7 December 2020: Producing Software with Integrity: Motivations, Capabilities, and Impediments

There were no PDF slides on 12/8/2020.

Retrospective: Lambda calculus

Write one thing you learned about lambda calculus so far and one question you have. Your answers may be anonymously shared with your classmates.

Reflection: Security-critical software

Despite high-profile data breaches stemming from flaws in software and frequent demonstrations of the insecurity of security and safety critical systems at Black Hat, high-quality security-critical software seems to remain elusive. Why do you think that might be?

Misused buffers are a major security problem

Graph showing number of buffer overflow
- CVE: Common Vulnerability and Exposures Database (>67,000)
- CWE: Common Weaknesses Enumeration (CWE) (120 categories)
- CVSS: Common Vulnerability Scoring System (CVSS)
  - score 1-10, above a 7 is ‘severe’
  - severe vulnerability has these characteristics:
    - network based attack
    - complexity is low
    - no privilege is required
    - no user interaction is required
    - no workaround exists
  - buffer overflow most common: > 12,000 identified CVEs, more often than not secure
Examples:
- Heartbleed: missing bounds check, exfiltration
- Microsoft Security Bulletin MS15-078: buffer underflow, remote code execution in all supported windows platforms

Check your understanding: Software bugs lead to exploits

Given all the vulnerabilities listed in MITRE’s Common Vulnerability and Expostures Database, issues related to misusing buffers are a relatively minor problem.

True
False

Buffer overflow/underflow attacks such as Heartbleed and the issue reported in MS security bulletin MS15-078 can result in exfiltrating information from a system without leaving any trace in log files and/or allow a remote attacker to take over control of a computer.

True
False

Consequences of poor cybersecurity

Consequences of poor cybersecurity: Ransomware

Consequences of poor cybersecurity: Physical damage

Consequences of poor cybersecurity: Changing the nature of warfare

Check your understanding: Consequences of poor cybersecurity

Ransomware is a way hackers can monetize malware. It typical works when hackers break into a system and encypt files. They then demand payment to decrypt the files. Whether or not victims pay the ransom, recovery costs can be in the millions of US dollars.

True
False

When breaking into automobiles and other cyber-physical systems, hackers have to be physically close to the system they are attacking.

True
False

Although disguised as anonymous ransomware, subsequent analysis has revealed that NotPetya constituted a purely destructive attack launched by one sovereign nation (Russia) against another (Ukraine). It caused billions of U.S. dollars of damage, both within the Ukraine and worldwide.

True
False

There is no known evidence of successful attacks on critical infrastructure such as power distribution systems.

True
False

Reflection: Techniques to build better software

What approaches do you think might be effective for building higher quality software?

We know how to build higher-quality software

Check your understanding: Proof of concept for building higher-quality software

No C code was involved in the high-assurance code for the SMACCMcopter or Boeing’s Unmanned Little Bird.

True
False

At the end of DARPA’s HACMS program, the red team was unable to disrupt the flight operations or communication channels of Boeing’s Unmanned Little Bird despite being given root access to one of the partitions on the helicopter.

True
False

Reflection: Impediments to applying known technology

Assuming we do in fact know how to build higher-quality software, why do you think we might be choosing not to use such capabilities?

The economics of cybersecurity: Misaligned incentives and market failures

The economics of cybersecurity: Misaligned incentives

Check your understanding: Misaligned incentives

In retail banking in the 1990s, the UK had lower fraud rates than the US because consumers were held responsible.

True
False

Critical infrastructure operators are making uninformed decisions when they connect their cyber-physical systems to the internet given the security risks that such linkages cause.

True
False

Market Failure: Information asymmetries

Check your understanding: Information asymmetries

Information asymmetry is a market failure that plagues the software security market: customers can’t evaluate the security of software and so aren’t willing to pay a premium for software with better security. As a result, vendors are reluctant to invest in better security because they won’t be able to recoup the cost in a higher price for their product.

True
False

Another example of information assymetry in the software security industry is that lack of information about cybersecurity incidents. Lacking such data means we cannot properly assess the size of the problem, making it difficult to gauge whether we are properly allocating defensive resources.

True
False

Where do we go from here? Possible policy interventions

Possible policy interventions: Information disclosure

Possible policy interventions: Cyber insurance

Possible policy interventions: Software product liability

Check your understanding: Possible policy interventions

Mandatory disclosure of data breaches has increased the attention paid to data breaches, making cyberinsurance for data breaches the most mature segment of the cyberinsurance market.

True
False

Only a handful of insurance companies offer cyber insurance because it is too hard to measure the risk.

True
False

The software industry has successfully lobbied against software product liability on the grounds that it would stifle innovation.

True
False

Producing software with integrity

Software flaws enabled by obsolete language design enable remote code execution and information exfiltration.
Poor cybersecurity has significant negative consequences at many levels: financial losses, damage in the physical world, and threats to national security.
We know how to build software to much higher quality standards using a variety of techniques. HACMS highlighted some of these, but there are many others.
Misaligned incentives and market failures such as information asymmetry explain why we are not building better software even though we know how.
Policy interventions are necessary to align incentives and fix market failures. Possible policy interventions include requiring information disclosure, mandating cyber insurance, or introducing product liability for software.

A Prediction

Cybersecurity is changing the nature of international conflict.
We are going to take it a lot more seriously
Hopefully before something goes catastrophically wrong.

9 December 2020: Comp 105 Conclusion

There are PDF slides for 12/10/2020.

Retrospective: Producing Software with Integrity

Write one thing you learned about producing software with integrity so far and one question you have. Your answers may be anonymously shared with your classmates.

Comp105 in Review

Review the highlights of what we’ve learned this semester:
- topics
- languages
- projects
Worth reviewing because it’s easy to lose track
- you’ve learned a lot!
Briefly discuss where you might go from here
- languages
- courses

What have you learned?

Principled design process for writing code

Algebraic laws guided by the form of data
9-step design process
Recursive functions whose structure follows from laws
- Explain why function terminates
Inductive proofs that follow same structure
- Explain why induction is well-founded

Understanding a language: Six key questions

What is the abstract syntax?
What are the values?
What are the environments?
How does evaluation happen?
What is the initial basis?
What are the types?

Operational semantics

Precisely describe meaning of programs
- What value does a program evaluate to?
- What side effects are caused in the process?
Written using inference rules
Judgment forms capture all relevant information
Environments track information about variables
Stores map locations to values

First-class functions

What they are
How to use them effectively
Lambdas create anonymous functions
Closures
- run-time representation of functions;
- capture environment at closure-definition time
Continuations capture ``the rest of the computation’’

Cost Models and Optimizations

How many cons cells?
How many activation records?
Method of accumulating parameters
Tail calls

Datatypes
Pattern matching
Exceptions

Type systems

Type Systems: Big Idea

Static vs. Dynamic Typing

Expressiveness (+ Dynamic)
Don’t have to worry about types (+ Dynamic)
Dependent on input (- Dynamic)
Runtime overhead (- Dynamic)
Serve as documentation (+ Static)
Catch errors at compile time (+ Static)
Used in optimization (+ Static)

Type Systems: Big Idea

Undecideability forces tradeoff:
- Dynamic or
- Approximate or
- Non-terminating
Example: array bounds checking
- Occasional negative consequences: e.g., Heartbleed

Type Systems: Mechanics

Monomorphic and Polymorphic Types
Types, Type Constructors, Quantified Types (∀α.τ)
Kinds (κ) classify types:
- well-formed,
- types (*),
- type constructors: κ ⇒ κ
Type Environments: type identifiers → kinds
Typing Rules
- Introduction and Elimination forms
Type Checking

Type inference

Hindley-Milner Type Inference: Big Idea

Inferred vs Declared Types
- Advantages of Inference: write fewer types, infer more general types.
- Advantages of Declarations: better documentation, more general type system.
Canonical example of static analysis:
- Proving properties of programs based only on text of program.
- Useful for compilers and security analysis.

Hindley-Milner Type Inference: Mechanics

Use fresh type variables for unknown types
Generate constraints that collect clues
Solve constraints just before introducing quantifiers
Compromises to preserve decideability:
- Only generalize lets and top-level declarations
- Polymorphic functions aren’t first-class

Object-Oriented Programming

Object-Oriented Programming: Big Ideas

“Programming-in-the-medium”
Advantages and Disadvantages
- Enables code reuse
- Easy to add new kinds of objects
- Hard to add new operations
- Algorithms smeared across many classes
- Hard to know what code is executing
Good match for GUI programming
Smalltalk mantra: Everything is an Object
- Can redefine basic operations

Object-Oriented Programming: Mechanics

Classes and objects
Computation via sending messages
- special names: self and super
Blocks to code anonymous functions & continuations
Inheritance for implementation reuse
Subtyping (“duck typing”) for client code reuse
Subtyping is not Inheritance
Idioms for information sharing
- by convention private methods for like objects
- Double-dispatch

Module Systems

Module Systems a la SML: Big Ideas

“Programming-in-the-large”
Separate implementation from interface
Enforced modularity
- Swap implementations w/o breaking clients

Module Systems a la SML: Mechanics

Signatures describe interfaces
- types, values, exceptions, substructures
- include to extend
Structures provide implementations
Signature ascription hides structure contents (Heap :> HEAP)
Functors
- Functions over structures
- Executed at compile time

Lambda Calculus

Lambda Calculus: Big Ideas

Three forms:
e : : = x | λx.e | e₁e₂
Church-Turing Thesis:
- All computable functions expressable in lambda calculus
- Data types: booleans, pairs, lists, naturals
- Recursive functions via fixed points
Y combinator calculates fixed points:
- Y = λf.(λx.f(x x))(λx.f(x x))

Producing software with integrity

Software flaws have significant costs
- financial, physical, national-security
We know how to build better software
We aren’t building better software
- misaligned incentives, market failures
Policy interventions might change that
- information disclosure, cyber insurance, product liability

Programming experience

Programming Experience

Recursion and higher-order functions are now second-nature to you.
- You’ll miss pattern matching and algebraic data types in any language you use that doesn’t have them!
C for impcore (imperative language)
Scheme (dynamically typed functional language)
ML (statically typed functional language)
uSmalltalk (dynamically typed OO language)

Built substantial pieces of code

SAT solver using continuations
Type checker (ML pattern matching!)
Type inference system (using constraints, reading typing rules)
BigNums (Power of OO abstractions; resulting challenges)

Where might you go from here?

Haskell

At the research frontier: Still evolving.
Lazy:
- Expressions only evaluated when needed.
- Conflict with side-effects.
- Solution: Monads (computation abstraction)
Type Classes:
- Ad hoc polymorphism (aka, overloading)
- ML: Hard-wire certain operations (+, *)
- Haskell: User programmable.

New languages

Swift (Apple)
- type inference, modules, higher-order functions, pattern matching, memory management
Go (Google)
- advanced concurrency, modules (aka packages) higher-order functions, garbage collection
Rust (Mozilla)
- advanced type system, modules, higher-order functions, support for systems programming

Additional Courses

Compilers
Software engineering
Special Topics:
- Programming language design
- Static analysis
- Virtual machines

Congratulations!

You have learned an amazing amount.
You have really impressed me.
I’m confident you will be able to use your newly developed skills in many contexts.
- When that happens, tell us about it!

9 September 2020: Introduction to Comp 105

Introduction

Reflection: Are all programming languages created equal?

Why we are here

What is a programming language?

Check your understanding: The big picture

Check your understanding: Syntax vs. semantics

Reflection: Inductive definitions for numerals

Value structure for numerals: Induction

Example: Numerals

Check your understanding: Representations of numerals

Reflection: Natural numbers

Value structure for natural numbers: induction again

Check your understanding: Representations of numerals

Reflection: Defining functions over natural numbers

Writing code with recursion and induction

Check your understanding: Design process

The all-fours? function

Coding Interlude: all-fours?

Reflection: Various syntactic forms

Syntax in programming languages

14 September 2020: Abstract syntax and operational semantics

Retrospective: Introduction to the study of programming languages, induction and recursion, and Impcore.

Reflection: Exploring composition in programming languages

Programming-language semantics

What problem are we trying to solve?

Why bother with precise semantics?

Behavior decomposes

Check your understanding: Principle of composition

ASTs

Check your understanding: AST Representations

Key question: names

Reflection: What’s in a name?

From ASTs to behaviors

Check your understanding: Operational semantics judgments

16 September 2020: Judgments, Rules, and Proofs

Reflection: Modeling computation in math

Rules and metatheory

Reflection: Operational semantics judgments

Check your understanding: Semantics of variables

Reflection: Mapping semantics to code

Check your understanding: C implementation of environment operations

Reflection: Using semantics to understand how programs are evaluated

Good and bad judgments

Reflection: Good and bad judgments

Proofs: Putting rules together

From rules to proofs

Building a derivation

Building derivations

Check your understanding: Derivation trees

21 September 2020: Derivations and metatheory

Reflection: Metatheory vs theory

Proofs about all derivations: metatheory

Reflection: Writing metatheoretic properties

Check your understanding: Proving metatheoretic properties

Summary and Next Steps

23 September 2020: Scheme I: Recursive Programming with Lists and S-Expressions

Retrospective: Operational Semantics

Reflection: From inference rules to recursive functions

For a new language, five powerful questions

Introduction to Scheme

Why Scheme?

Scheme Values

Lists: A subset of S-Expressions.

Check your understanding: S-Expressions

Reflection: Manipulating values

Understanding values through their operations

Check your understanding: Deconstructing lists

Why are lists useful?

Programming with lists: The member? function

Coding Interlude: member?

Programming with lists: The sum-16? function

Coding Interlude: sum-16?

Reflection: Quote syntax

Check your understanding: Quoted literals

Reflection: New syntactic forms

μScheme’s new syntax

28 September 2020: First-class and higher-order functions

Retrospective: Scheme I: Recursive programming with lists

Reflection: Where is there shared structure?

The `all-fours?` function

Coding Interlude: `all-fours?`

Programming with lists: The `member?` function

Coding Interlude: `member?`

Programming with lists: The `sum-16?` function

Coding Interlude: `sum-16?`

Implementing `any-two?`

Check your understanding: The `exists?` function

Introduction to `lambda`

μScheme’s semantics: Overview and `let`

Check your understanding: The semantics of `let`

Semantics of `lambda`

List search: `exists?`

Coding Interlude: `exists?`

Coding Interlude: `filter`

Reflection: Lifting functions to lists: `map`

The universal list function: `foldr`

Coding Interlude: `compose`

Coding Interlude: `witness-cps`