Implementing Toy ML, Part I

In this part you will modify the Scheme interpreter written in ML to make it into a pure functional language. In the following assignment, you will add static ML-style type inference.

You will modify the Scheme interpreter in manageable steps:

Remove imperative features from the Scheme interpreter, including both the implementation and the language implemented.
Remove [[let]]. This is not really necessary, but it simplifies things.
Eliminate ``control ops'' by using abstract syntax instead.
Add suitable new primitives.

You should save your interpreter after each step in case you mess up one of the later stages and have to start over. You can get partial credit for earlier steps if for some reason you don't complete later ones.

You can work with the noweb source or with the generated ML.

Incidentally, if you call your interpreter ml.sml, you can build a standalone version in a.out by running mosmlc ml.sml.

A. Pure applicative programming. Modify scheme.ml to remove almost all of the imperative features from both the language and its implementation (we will retain print). This means:

Delete set, while, and begin from the language. At the top level, write val, which should use the same syntax as set did and should add a binding to the top-level environment. val should be a ``special form'' (processed directly in the interpreter, like define in the Scheme interpreter), not a new primop. Hint: to debug in this new language, try using the following idiom:
(val print-then (lambda (x y) y)) ... (print-then (print 'about-to-eval-e) e)
Remove all uses of ref, !, and := from the implementation. You may retain imperative constructs like print and sequence (;) since they are good for issuing error messages.

You'll have to make extensive changes to the code for environments and read-eval-print, and modest changes elsewhere. Be sure to clean up the environment code; it should become much simpler, since we'll essentially be able to use a ``map'' as an environment. The loop code will become a bit more complicated, as you'll have to pass the top-level environment around as an argument to loop, and you'll have to return a new top-level environment when you process a val declaration. (It isn't enough to fix the normal-case code in process; you also have to arrange for the exception handlers to return a ``new'' top-level environment, even though the environment hasn't actually changed. That's how you implement your error recovery.)
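As a minimal sketch of the idea above, here is a ref-free environment and a loop that threads the top-level environment through each input. Every name here (NotFound, bind, find, the toy toplevel type, and these simplified versions of process and loop) is an assumption made for illustration; none of it is taken from scheme.ml, and the real process handles full S-expressions rather than this two-case toy.

```sml
(* Hypothetical names throughout; a toy stand-in for the real interpreter. *)
exception NotFound of string

(* An environment is an association list: no ref cells anywhere. *)
type 'a env = (string * 'a) list

val emptyEnv : 'a env = []

(* bind returns a fresh environment; the old one is left unchanged. *)
fun bind (name, v, rho) = (name, v) :: rho

fun find (name, []) = raise NotFound name
  | find (name, (n, v) :: rest) = if name = n then v else find (name, rest)

(* Toy top-level inputs: (val x n), or a variable whose value is printed. *)
datatype toplevel = VAL of string * int | USE of string

(* process returns the top-level environment to use for the next input.
   On error, the handler returns the environment it was given. *)
fun process (VAL (x, n), globals) = bind (x, n, globals)
  | process (USE x, globals) =
      (print (Int.toString (find (x, globals)) ^ "\n"); globals)
      handle NotFound y => (print ("error: unbound " ^ y ^ "\n"); globals)

(* loop passes the current top-level environment along as an argument. *)
fun loop ([], globals) = globals
  | loop (t :: ts, globals) = loop (ts, process (t, globals))

val final = loop ([VAL ("x", 1), USE "y", VAL ("x", 2), USE "x"], emptyEnv)
```

Note that the error case and the normal case have the same result type: both hand an environment back to loop, which is what lets the interpreter recover and keep reading.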

The simplest possible implementation would make it impossible to write a recursive function. To permit recursion, we'll use a trick reminiscent of chapter 1. That is, we'll pass eval both a local environment resulting from any nested functions, and also a top-level environment for globals. You'll have to change the code in eval for VARexp to look up first in the local, then the global environment. Exception handlers make this easy.
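The two-level lookup might be sketched as follows. The names NotFound, find, and lookup are assumptions for illustration, and the environments are plain association lists rather than whatever representation you end up with.

```sml
(* Hypothetical helper names; environments are association lists here. *)
exception NotFound of string

fun find (name, []) = raise NotFound name
  | find (name, (n, v) :: rest) = if name = n then v else find (name, rest)

(* Look in the local environment first; if the name is absent there,
   the handler falls back to the global environment. *)
fun lookup (name, locals, globals) =
  find (name, locals) handle NotFound _ => find (name, globals)

val locals  = [("x", 1)]
val globals = [("x", 2), ("y", 3)]
val a = lookup ("x", locals, globals)   (* local binding shadows the global *)
val b = lookup ("y", locals, globals)   (* found only in the globals *)
```

Because the global environment is supplied when the function body is evaluated rather than captured when the closure is built, a function bound by val can find its own name there, which is what makes recursion possible.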

To use this trick, you'll also have to change the definition of primop to accept two environments instead of one. That will mean changing the definition of strict as well as the non-strict primitives. It won't be too bad because you'll be deleting primitives.

Finally, you'll have to change the initialization of environments. foldr makes it very easy.
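One way the foldr initialization could look, as a sketch only: the primitive table below holds simple integer functions so that the example stands alone, whereas real primops would take an argument list and two environments. The names bind, primitives, and initialEnv are assumptions.

```sml
(* Hypothetical names; bind builds a fresh association-list environment. *)
fun bind (name, v, rho) = (name, v) :: rho

(* Stand-in primitive table, using integer functions to stay self-contained. *)
val primitives =
  [ ("+", fn (a : int, b : int) => a + b)
  , ("-", fn (a : int, b : int) => a - b)
  , ("*", fn (a : int, b : int) => a * b)
  ]

(* foldr threads the environment through the table without any assignment:
   each step returns a new environment containing one more binding. *)
val initialEnv =
  foldr (fn ((name, f), rho) => bind (name, f, rho)) [] primitives
```

The anonymous function passed to foldr is the pure counterpart of an initializer that used to update a ref cell: it receives the environment built so far and returns the extended one.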

You are not required to implement use or define, but both are useful as special forms. If you implement a define special form you can test your interpreter on the functional topological sort from the Lisp homework.

Hint: Some functions that previously had side effects now have to return a fresh environment. They probably include loop and process as well as the anonymous function used to initialize the environment.

Your solution to problem 1 should be a bit shorter than scheme.ml. I had to add or change 84 lines, but I also got to delete lots of lines.

B. Let. Remove [[let]] from the language. This will make type inference simpler.

C. Abstract Syntax. Starting with your solution to problem 2, eliminate the ``control ops'' by translating them into abstract syntax. This means if and quote will be treated like lambda by being turned into something special during parse. Also, change the definition of primop so that all primitive operations are strict. Eliminate unnecessary functions like strict. Hints:

You'll have to rearrange some definitions. In particular, because abstract syntax can now include (quoted) S-expressions, and S-expressions include closures, which include lambda-terms, which include abstract syntax, you'll have to make sx and exp mutually recursive, and you'll want to use lambda as a withtype. In the process, you should change the definition of VALexp to include any sx. You'll also want to fiddle with quote---try making it operate directly on ipt instead of on exp. (Otherwise you'll find yourself doing painful and unnecessary work.)
The following [[datatype]] definition may be useful:
exp = VALexp of sx
    | VARexp of name
    | IFexp  of (* you fill in this part *)
    | APexp  of exp list
    | LAMexp of lambda

You'll probably want to write functions parse_if and parse_quote to play a role like that of parse_lambda. You can steal much of what you need from ifop and quoteop, which you'll be deleting. The parts you don't use in parsing should show up in eval.
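The rearrangement of definitions described in the hints might be sketched like this. The constructor names NILsx, NUMsx, SYMsx, and CLOsx and the shape of the closure's environment are assumptions, and IFexp is deliberately left out because filling it in is part of the problem.

```sml
type name = string

(* sx and exp refer to each other, so they are declared together with
   "and"; lambda is an abbreviation introduced by withtype. *)
datatype sx
  = NILsx
  | NUMsx of int
  | SYMsx of name
  | LISTsx of sx * sx
  | CLOsx of lambda * (name * sx) list   (* assumed closure representation *)
and exp
  = VALexp of sx
  | VARexp of name
  | APexp of exp list
  | LAMexp of lambda
withtype lambda = name list * exp

(* (lambda (x) (quote 7)) as abstract syntax, then closed over an empty
   environment, then embedded back into abstract syntax as a value. *)
val constSeven : lambda = (["x"], VALexp (NUMsx 7))
val closure = CLOsx (constSeven, [])
val quotedClosure = VALexp closure
```

The last three bindings trace the cycle the hint describes: abstract syntax contains an S-expression, which is a closure, which contains a lambda-term, which contains abstract syntax.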

For this problem you should have to add, change, or move about 50 lines from your solution to problem 1.

Mutation (extra credit). Add mutation back into the interpreter by adding new primitives ref, !, and := with the same meanings as in ML. You will have to add a case to the definition of type sx, but you should not have to touch environments or the evaluator. (You might also wish to add begin so you can write imperative programs more easily.) Write a paragraph or two comparing and contrasting these two ways to have imperative features.
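A rough sketch of how the extra case and the three primitives could fit together, under stated assumptions: the constructor REFsx, the primitive function names, and the choice to have := return the assigned value (ML's := returns unit, which this toy sx type does not have) are all hypothetical, and primitives are shown taking only an argument list.

```sml
(* Hypothetical constructor REFsx wraps an ML ref cell inside a value. *)
datatype sx = NILsx | NUMsx of int | REFsx of sx ref
exception Type of string

(* ref: allocate a new cell holding the argument. *)
fun refPrim [v] = REFsx (ref v)
  | refPrim _   = raise Type "ref expects one argument"

(* !: fetch the current contents of a cell. *)
fun derefPrim [REFsx r] = !r
  | derefPrim _         = raise Type "! expects a reference"

(* :=: overwrite the contents of a cell; returning the new value is an
   assumption made here, not something the handout specifies. *)
fun assignPrim [REFsx r, v] = (r := v; v)
  | assignPrim _            = raise Type ":= expects a reference and a value"

val cell = refPrim [NUMsx 1]
val _    = assignPrim [cell, NUMsx 2]
val now  = derefPrim [cell]
```

All of the mutable state lives inside values, so the environments and the evaluator can stay purely functional.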

Don't throw away your old interpreter! You'll need it for later parts. Type inference in the presence of mutation is very difficult, so you won't want to include it.

D. Primitives. Once we add type inference, list elements will all have to have the same type. We need another way of making heterogeneous data structures (i.e., structures containing values of different types). We can continue to use lists as the representation, but we'll provide [[pair]], [[fst]], and [[snd]] as constructor and observers. We could use cons cells to represent pairs, making our implementations exactly the same as for [[cons]], [[car]], and [[cdr]], but we'll see it's better to use two-element lists:

<<*>>=
fun sxnth 1 [LISTsx(car, cdr)] = car
  | sxnth n [LISTsx(car, cdr)] = sxnth (n-1) [cdr]
  | sxnth _ _ = raise Type "Impossible projection -- should never have typechecked"
<>=
("pair", inject_list) ::
("fst", sxnth 1) ::
("snd", sxnth 2) ::
@
Add these primitives to the interpreter.
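To see how a pair behaves as a two-element list, sxnth can be tried on its own. In this sketch the sx datatype is cut down to a few assumed constructors, the Type exception is declared locally, and inject_list is given an assumed definition (turning an ML list of values into a Scheme-style list); the version in your interpreter may differ.

```sml
(* Cut-down, assumed definitions so the example stands alone. *)
datatype sx = NILsx | NUMsx of int | SYMsx of string | LISTsx of sx * sx
exception Type of string

(* Assumed behavior of inject_list: build a Scheme list from an ML list. *)
fun inject_list []        = NILsx
  | inject_list (x :: xs) = LISTsx (x, inject_list xs)

(* sxnth as given in the handout: the second argument is the primitive's
   argument list, which must contain exactly one list value. *)
fun sxnth 1 [LISTsx (car, cdr)] = car
  | sxnth n [LISTsx (car, cdr)] = sxnth (n - 1) [cdr]
  | sxnth _ _ = raise Type "Impossible projection -- should never have typechecked"

val p      = inject_list [NUMsx 1, SYMsx "a"]   (* like (pair 1 'a) *)
val first  = sxnth 1 [p]                        (* NUMsx 1 *)
val second = sxnth 2 [p]                        (* SYMsx "a" *)
```

Asking for a third component, as in sxnth 3 [p], runs off the end of the list and raises Type, which is the case the error message says the type checker should rule out.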

Add primitive values [[t]] and [[f]] for use as booleans. Keep the same representation.