CS152 Homework: Smalltalk

Due Tuesday, December 7, at 12:10 AM (aka Monday night at midnight)

This assignment is short because of the short week. It counts for 40 points, not the usual 50.
Do problem 4 and problem C.

For this assignment, you will need my replacement for Kamin's Smalltalk interpreter, which is in ~cs152/bin/smalltalk. Source for the interpreter is online.
You will also find the code from Kamin online.

4. Collection classes. [15 points]
The purpose of this problem is to introduce you to the rich class hierarchy at the heart of Smalltalk.
Do exercise 4 on page 345(?) of Kamin, that is, add Bag to the collection hierarchy. Note especially the following advice:

Try to find a place in the Collection hierarchy to insert Bag so as to write as little new code as possible; do not change the existing hierarchy.

Hints: remember the characteristic-function representation of sets, and also the digit function from the CLU homework. Think about analogs for this assignment.
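As a reminder of the first hint, here is a minimal sketch in Standard ML of a set represented by its characteristic function. The names `intset`, `empty`, `insert`, and `member` are illustrative assumptions, not taken from Kamin or from the CLU homework, and the sketch is restricted to sets of integers.

```sml
(* A set of integers represented by its characteristic function:
   the set *is* the function that answers "is x a member?". *)
type intset = int -> bool

(* The empty set answers false for every element. *)
val empty : intset = fn _ => false

(* insert builds a new function that also answers true for x. *)
fun insert (x : int, s : intset) : intset =
  fn y => y = x orelse s y

fun member (x : int, s : intset) : bool = s x

val s = insert (3, insert (5, empty))
val hasThree = member (3, s)   (* true *)
val hasFour  = member (4, s)   (* false *)
```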

My solution to this problem requires 32 lines of code.

C. Caching methods. [25 points]
Smalltalk's "pure object" model might seem very inefficient. The purpose of this problem is to help develop your intuition about how Smalltalk might be implemented efficiently.

For Smalltalk to be efficient, method search must be efficient. Since all computation, even integer arithmetic, is done by message sending, reducing search overhead is critical. A surprisingly effective way to reduce this overhead is in-line caching of method addresses.

The idea works like this: With each message send in the program, associate two words of memory, one giving the class of the last object to receive that message, the other the address of the method that was invoked the last time the message was sent. Whenever the message is sent, this one-element cache is consulted; if the class of the current receiver is the same as the class of the last receiver (as stored in the cache), then the method stored in the cache is the one to be invoked, and the method search can be side-stepped. If the class of the current receiver is different, then the usual method search is undertaken and the cache is updated.

This technique will save time if there is a high probability that the receiver of a message will be of the same class as the most recent receiver of that message. Some researchers have placed the effectiveness of this cache at 95%.
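The one-element cache described above can be sketched in Standard ML roughly as follows. This is a toy model, not the interpreter's code: class identity is assumed to be a plain integer, the method type is left polymorphic, and `cachedLookup` and `slowSearch` are hypothetical names standing in for the cached lookup and the usual method search.

```sml
(* One cache per message send in the program: the class id of the last
   receiver and the method found for it. NONE means "still empty". *)
type 'm cache = (int * 'm) option ref

(* Returns the method together with a flag saying whether the lookup
   was a cache hit. slowSearch stands in for the usual method search. *)
fun cachedLookup (slowSearch : int -> 'm) (cache : 'm cache) (classId : int) =
  let
    (* Miss path: do the usual search and remember the result. *)
    fun miss () =
      let val m = slowSearch classId
      in  cache := SOME (classId, m); (m, false)
      end
  in
    case !cache
      of SOME (lastId, m) => if lastId = classId then (m, true) else miss ()
       | NONE => miss ()
  end

(* Toy "method search": class 0 and every other class answer differently. *)
fun slowSearch 0 = "Object.print"
  | slowSearch _ = "Point.print"

val site : string cache = ref NONE
val first  = cachedLookup slowSearch site 1   (* miss: cache was empty *)
val second = cachedLookup slowSearch site 1   (* hit: same class as last time *)
val third  = cachedLookup slowSearch site 0   (* miss: receiver class changed *)
```

The cache belongs to the send site rather than to the receiver, which is why a change in the receiver's class at the same site forces a fresh search.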

Instrument the interpreter with such caches and measure the hit ratio for a variety of programs.

Hints:

You will need to change the representation of ASTs to include the cache in the application, as follows:
```
and ast = ...
        | APP    of name * ast list * method_cache
   ...
withtype value = class * rep
and method_cache = (class * function option) option ref
```
Note that the ref cell contains an option because the cache is initially empty, and the function paired with the class is optional because the message send might actually be a call to a global function.
(In a real Smalltalk implementation, everything is a method and there are no functions, so the cache would be simpler.)
You have to be able to compare classes for equality, but class is not an equality type. You can't simply compare class names, because if you redefine a class, your code will break. You should therefore add a unique identifier to each class, as follows:
```
datatype class 
  = CLASS of
      { name    : name                 (* name of the class *)
      , super   : class option         (* superclass, if any *)
      , rep     : name list            (* instance variables *)
      , methods : function env         (* both exported and private *)
      , uid     : int                  (* unique identifier *)
      }
```
The following code will be useful for creating unique identifiers:
```
local val n = ref 0 in fun classUid () = !n before n := !n + 1 end
```
In a real Smalltalk implementation, you would simply use pointer comparison to compare classes, but ML doesn't have pointer operations.
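To illustrate why the unique identifier helps, here is a small sketch that compares classes by `uid`. It assumes a cut-down class record with only three fields, and `sameClass` is a hypothetical helper name, not part of the interpreter.

```sml
(* Cut-down class record: only the fields needed for the comparison. *)
datatype class = CLASS of { name : string, super : class option, uid : int }

(* Counter from the hint above: each call returns a fresh integer. *)
local val n = ref 0 in fun classUid () = !n before n := !n + 1 end

(* Hypothetical helper: two classes are the same iff their uids agree. *)
fun sameClass (CLASS {uid = u1, ...}, CLASS {uid = u2, ...}) = (u1 = u2)

(* Redefining a class keeps the name but receives a new uid. *)
val point  = CLASS {name = "Point", super = NONE, uid = classUid ()}
val point' = CLASS {name = "Point", super = NONE, uid = classUid ()}

val same      = sameClass (point, point)    (* true *)
val different = sameClass (point, point')   (* false, despite equal names *)
```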

We recommend that you change the evaluator as follows:

```
  | ev(APP (f, args, cache))  = 
      (case map ev args
         of [] => applyGlobalFun(f, [])
          | args as ((h as (class, v)) :: t) =>
              (case cachedMethodSearch(f, class, cache)
                 of SOME m => applyMethod(m, h, t)
                  | NONE => applyGlobalFun(f, args)))
```

You should write a function cachedMethodSearch that searches the cache, updates the cache if needed, and updates a global count of cache hits and cache misses.
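One possible shape for the global counts, and for the hit ratio computed from them, is sketched below. The names `hits`, `misses`, `recordHit`, `recordMiss`, and `hitRatio` are assumptions made for illustration; nothing here is taken from the interpreter.

```sml
(* Global counters, assumed to be bumped from the cached search. *)
val hits   = ref 0
val misses = ref 0

fun recordHit  () = hits   := !hits + 1
fun recordMiss () = misses := !misses + 1

(* Hit ratio as a fraction between 0 and 1;
   NONE if no lookups have been recorded yet. *)
fun hitRatio () =
  let val total = !hits + !misses
  in  if total = 0 then NONE
      else SOME (real (!hits) / real total)
  end

(* Example: three hits and one miss. *)
val () = (recordHit (); recordHit (); recordHit (); recordMiss ())
val ratio = hitRatio ()
```

Returning an option avoids a division by zero when a program sends no messages at all.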

Note that you could simplify the representation of the method cache, and also your code for cachedMethodSearch, by defining an internal class that is not the class of any object, and using that class to initialize each method cache.
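A minimal sketch of the sentinel idea follows. It assumes a cut-down class record, methods represented as plain strings, and real class uids that are never negative; `noClass`, `newCache`, and `isHit` are hypothetical names.

```sml
(* Cut-down class record for illustration. *)
datatype class = CLASS of { name : string, uid : int }

(* Hypothetical sentinel: no object ever has this class.
   Assumes real uids are never negative. *)
val noClass = CLASS {name = "<no class>", uid = ~1}

(* With a sentinel, the outer option in the cache type is not needed. *)
type cache = (class * string option) ref

fun newCache () : cache = ref (noClass, NONE)

(* A lookup hits exactly when the receiver's uid matches the cached one;
   a fresh cache can never match because it holds the sentinel. *)
fun isHit (CLASS {uid = u, ...}, cache : cache) =
  let val (CLASS {uid = cached, ...}, _) = !cache
  in  u = cached
  end

val point = CLASS {name = "Point", uid = 0}
val site = newCache ()
val beforeFill = isHit (point, site)          (* false: cache holds the sentinel *)
val () = site := (point, SOME "Point.print")
val afterFill = isHit (point, site)           (* true *)
```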
Finally, note that a significant fraction of your grade will depend on your report of your measurements. You need not write any new code for the measurements; instead, use the code and examples from Kamin to learn as much as you can. In addition to the code in the Smalltalk chapter, you should be able to use code from Chapter 1 without change. You may also be able to use some of the list code from Chapter 2, perhaps with minor modifications.

To solve this problem, I had to add or change no more than 40 lines of code, some of which are shown above.

What to submit

For this assignment, please submit the files README, 4.small, and smallC.sml.