* Programming Languages for Reusable Software Components
Matthew Flatt, Ph.D. Thesis. June 1999.
http://www.cs.rice.edu/CS/PLT/Publications/thesis-flatt.ps.gz

Reuse here means _black-box reuse_: reuse of a component without
inspecting or modifying its source code. This means components can be
developed and compiled separately, distributed and combined in their
_binary_ form. A component's vendor retains the freedom to
extend/improve/re-implement the component; the user of the component
will be unaffected as long as the old interfaces are preserved.

Examples: C++ templates cannot be compiled separately; closed functor
modules in ML can be compiled separately.

The thesis emphasizes the principle of external connections: A
language should separate component definitions from component
connections.

Example: why a module system based on 'packages' (as in Modula-3, Ada 95,
Haskell and Java) is deficient (pp. 4-5 of the thesis):

	For example, the definition of a dictionary package in Java
might include the following:
      import com.supersoft.splaytree.*;
This import specification hard-wires dictionary to the splay tree
implementation from SuperSoft. A programmer using dictionary might
want instead to use a compatible splay tree implementation from
UltraSoft. Even if SuperSoft and UltraSoft export the same classes and
methods for splay trees, the programmer must modify the definition of
dictionary to use UltraSoft's implementation, replacing supersoft with
ultrasoft. [The programmer must modify his _source code_ to use another
splaytree package with a compatible interface!] A Java programmer
might hack around the problem by defining a class loader to remap
com.supersoft.splaytree to com.ultrasoft.splaytree. This
name-remapping strategy, however, fails to scale to the general
case. Suppose that the programmer wants to use both dictionary and
thesaurus -- which also imports com.supersoft.splaytree -- and the
programmer wants to preserve the SuperSoft import for thesaurus while
switching dictionary's import to UltraSoft. Because a class loader
cannot map a single package path to multiple packages, the programmer
is forced to modify either dictionary or thesaurus.  The underlying
problem is that a package declares both the shape and source of its
imports. The shape part of the declaration specifies that the imports
include certain classes and methods; shape information is necessary
for separate compilation. The source part of the declaration specifies
that the com.supersoft.splaytree package provides those classes. Thus,
the failure of dictionary and thesaurus is a failure to obey the
principle of external connections, which indicates that the source of
a module's imports should be specified external to the module. If the
source of a package's imports were specified external to its
definition, then a programmer could use dictionary and thesaurus in
the same program, specifying different sources for each package's
imports without modifying either package.
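
As a rough Java illustration of the principle (not the thesis's unit
language), the sketch below moves the choice of splay tree out of
dictionary and thesaurus and into a separate linking step. The names
SplayTree, SuperSoftTree, UltraSoftTree and Linker are hypothetical,
and a plain list stands in for a real splay tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Shape of the import: what any splay tree implementation must provide.
interface SplayTree {
    void insert(String key);
    boolean contains(String key);
    String vendor();
}

// Hypothetical vendor implementations; a list stands in for the real tree.
class SuperSoftTree implements SplayTree {
    private final List<String> keys = new ArrayList<>();
    public void insert(String key) { keys.add(key); }
    public boolean contains(String key) { return keys.contains(key); }
    public String vendor() { return "SuperSoft"; }
}

class UltraSoftTree implements SplayTree {
    private final List<String> keys = new ArrayList<>();
    public void insert(String key) { keys.add(key); }
    public boolean contains(String key) { return keys.contains(key); }
    public String vendor() { return "UltraSoft"; }
}

// The components mention only the shape; the source arrives from outside.
class Dictionary {
    private final SplayTree words;
    Dictionary(Supplier<SplayTree> treeSource) { this.words = treeSource.get(); }
    void define(String word) { words.insert(word); }
    boolean knows(String word) { return words.contains(word); }
    String backend() { return words.vendor(); }
}

class Thesaurus {
    private final SplayTree entries;
    Thesaurus(Supplier<SplayTree> treeSource) { this.entries = treeSource.get(); }
    void add(String word) { entries.insert(word); }
    String backend() { return entries.vendor(); }
}

// The connections live here, external to both component definitions.
class Linker {
    public static void main(String[] args) {
        Dictionary dictionary = new Dictionary(UltraSoftTree::new);
        Thesaurus thesaurus = new Thesaurus(SuperSoftTree::new);
        dictionary.define("unit");
        System.out.println(dictionary.backend() + " / " + thesaurus.backend());
    }
}
```

Only Linker names a vendor, so dictionary and thesaurus can use
different sources in one program without either being edited. This is
ordinary constructor injection; it gives the separation of shape from
source, but none of the graph-based linking that units add.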

	On the other hand, ML's module system follows the principle of
external connections. An ML functor abstracts over a collection of
definitions in the same way that a procedure abstracts over an
expression. A functor imports other modules as formal arguments,
describing the shape of each import using a signature. The signature
does not specify the source of the imports; a given program may
contain several modules that all implement the same
interface. Instead, the programmer explicitly links the functor to the
source of its imports via a functor application.  Unfortunately,
although ML's module system satisfies the principle of external
connections, its mechanism for connecting functors is overly
restrictive. Functors cannot define mutually-recursive procedures,
since functor application can combine only a single functor with other
unparameterized modules. In addition, functor application conflates
linking with instantiation, which prohibits a mixture of hierarchical
linking and multiple instantiation. [See my "NRC work" notes for more
details.]

	Thus the dissertation sets out to overcome the above limitations
of ML functors: "This dissertation presents a new language of
modules, called units. Units combine the benefits of external linking
specifications, graph-based linking to support mutual recursion across
modules, and hierarchical linking separated from instantiation."

p. 9 of the thesis defines reuse in more concrete terms. Suppose
component1 implements an ADT and is used by client1, and suppose
component1x extends component1 with new operations _and_ data members.
It should be possible for component1x to rely on a _binary_,
separately compiled implementation of component1 as it was.
Furthermore, it should be possible to (dynamically) link component1x
with client1 without any modification to the source code of client1
(indeed, without any need to recompile client1). Secs. 2.1-2.2 give a
very elaborate example of this idea.

Although page 9 of the thesis speaks disapprovingly of
double-dispatching (citing Kuhne's [49] solution), the thesis proposes
basically the same thing in the way units are elaborated into closures
and indirection cells: see Figs. 2.9 (p. 18) and 2.13 (p. 23) for
examples of double indirection in units, and especially Fig. 3.20 on
p. 50.

Other comments: on p. 13, he says that "the standard OO architecture
does not support operation extensions to the shape datatype without
modifying existing clients.... An Abstract Factory pattern is
required". Indeed, keeping the example above (different from the one
in the thesis), client1 may at some point do 'new component1'. Because
the client refers to component1 explicitly, it is impossible to
substitute component1x for component1 without modifying the source
code of client1. One solution is to use an abstract factory: client1
imports a 'factory' object and asks it for an instance of component1:
factory.make(args). Depending on the concrete implementation of the
factory, it can return an instance of component1 or component1x. Note
that Smalltalk implements this factory pattern implicitly. Another
solution, not mentioned in the thesis, is that of a proxy: client1
deals with a proxy component, and the proxy redirects all the messages
(including the 'create' message) to either component1 or component1x.
This is the solution used in CORBA.
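
A minimal Java sketch of the abstract-factory variant, using the
component1/client1 names from these notes. The Component and
ComponentFactory interfaces and the describe() method are assumptions
made for illustration, and make() takes no arguments here.

```java
// What client1 is allowed to know about a component.
interface Component {
    String describe();
}

class Component1 implements Component {
    public String describe() { return "component1"; }
}

// The extension adds a data member and reuses Component1 unchanged.
class Component1x extends Component1 {
    private final String extra = "x";
    @Override public String describe() { return "component1" + extra; }
}

// The factory is the only place that names a concrete class.
interface ComponentFactory {
    Component make();
}

// client1 never writes 'new Component1()'; it asks the imported factory.
class Client1 {
    private final ComponentFactory factory;
    Client1(ComponentFactory factory) { this.factory = factory; }
    String run() { return "client1 uses " + factory.make().describe(); }
}

class FactoryDemo {
    public static void main(String[] args) {
        System.out.println(new Client1(Component1::new).run());
        System.out.println(new Client1(Component1x::new).run());
    }
}
```

Client1 is compiled once against Component and ComponentFactory;
swapping in Component1x changes only the factory passed to it.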


Note Fig. 2.14 (p. 24): Color-Shape extends Shape, which is the base
class of Rectangle, Circle, etc. A Color-Shape can subsume the Shape
in a Rectangle without requiring the Rectangle unit to be recompiled.
This basically solves the fragile base class problem!

Also note on Fig. 2.14: an export declaration in
Basic+Union+BB+Color-Shapes:
 (export ... (CR (C-Shape Rectangle)) ... )
This basically downcasts a C-Shape of CR into a Rectangle. It's safe
in this particular case; but in general downcasts are not safe!
Allowing them can lead to subtle problems!

Note that Matt's class* form contains an 'override' directive. That is,
in the MzScheme OO system, a derived class can override the base class
behavior. This leads to subtle subtyping errors (similar to those I
showed in my article about Bags and Sets).

Sec 3.1, Existing Module Languages and units (p. 27, bottom):
"Modern linking systems, such as ELF [77], support dynamic linking, but
even the most advanced linking systems rely on a global namespace of
function names and module (i.e., file) names. As a result, modules can
be linked and invoked only once in a program." That is not true: GNU
ld includes a --wrap option. With this option, a single module may be
instantiated several times, under different names (that's what I do in
HTTPFS). dlopen() also allows for wrappers (see man dlopen).

Sec 3.2, p. 28, a good phrase that tells the essence of the proposed
units: "Like a package in Java or Modula-3, a program unit is an
unevaluated fragment of code, but there is no global namespace of
units. Instead, like an ML functor, a unit describes its import
requirements without specifying a particular unit that supplies those
imports. The actual linking of the unit is specified externally at a
later stage. Unlike in ML, unit linking is specified for groups of
units with a graph of connections, which allows mutual recursion
across unit boundaries. Furthermore, the result of linking a
collection of units is a new (compound) unit that is available for
further linking."

I disagree with footnote 2 on p. 28: "Java's class system can also
be viewed as a kind of module system or as a complement to the package
system. Classes suffer the same drawbacks as packages: links, such as
a superclass name, are hard-wired to a specific class." That is true
for Java classes, but not for classes in general: in prototype-based OO
systems (JavaScript, Self) the 'superclass' (the content of a _proto
slot) can be changed at will.


Comment to p. 40: A functor application in ML is like a call-by-value
function application: the arguments (i.e., structures for a functor)
must be evaluated before the functor is applied. That's why one can't
have mutually-recursive functors in ML, just as an argument to a
call-by-value function can't refer to the function's result. Import
and export clauses of units follow a call-by-name semantics. Hence the
power of units to express mutually-recursive modules. 

Sec 3.5, The structure and interpretation of units. p.44 explains this
very well: "The rigorous description of the unit language, including
its type structures and semantics, relies on well-known type checking
and rewriting techniques for Scheme and ML [22, 34, 85]. In the
rewriting model of evaluation, the set of program expressions is
partitioned into a set of values and a set of non-values. Evaluation
is the process of rewriting a non-value expression within a program to
an equivalent expression, repeating this process until the whole
program is rewritten to a value. An atomic unit expression is a value,
whereas a compound unit expression -- a box containing linked boxes --
is not a value. Thus, a compound unit expression must be re-written to
obtain a value. A compound unit expression with known constituents can
be re-written to an equivalent unit expression by merging the text of
its constituent units. Invocation for a unit is similar: an 'invoke'
expression is rewritten by extracting the invoked unit's definitions
and initialization expression, and then replacing references to
imported variables with values. Otherwise, the standard rules for
functions, assignments, and exceptions apply." See Figure 3.19 for an
example of reduction rules (the operational semantics). Note that this
technique is akin to the dataflow analysis that a compiler performs in
an optimization pass: it traces data and relationships along the
evaluation paths (= control-flow paths), and finds dependent and
independent pieces of code (which can be rearranged or combined).

Note Figure 3.17 (p. 45): in my work I can define a similar language,
but augmented with higher-level linking expressions. As p. 47 of the
thesis states, the 'compound' form links variables by name. I am
thinking of a more complex linking expression than simple unification,
one that can include complex predicates and complex mappings of tuples
of variables to other tuples of values.

Note Figure 3.18 on p. 48: it lists _well-formedness rules_!
Well-formedness means that all the variables used in an expression
are bound, all imported variables are exported, etc. These checks are
context-sensitive: indeed, the expression (e1 e2) is well-formed in a
context only if both e1 and e2 are well-formed in it. The context is
explicitly denoted by Gamma. In Fig. 3.18, Gamma is the environment of
all defined variables. The letrec and unit forms augment the parent
environment within their body (which introduces a local scope and
binds local variables).

The reduction rules in Fig. 3.19 -- the operational semantics -- are
noteworthy. I can use them with my own units.

Figure 3.20 (p. 50) shows how dynamically-typed units are actually
implemented: extra indirection. All accesses to imported/exported
variables involve an extra indirection. On Fig. 3.20, assignment to
the oddcell is a side-effect of the unit's initialization expression.
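
The indirection can be imitated in Java as follows. This is only a
sketch of the idea, not the thesis's elaboration: Cell, EvenUnit,
OddUnit and EvenOddProgram are hypothetical names, and each "unit" is
reduced to a static method that receives the cells it is linked to.

```java
import java.util.function.IntPredicate;

// A mutable cell: the extra indirection through which linked variables are reached.
final class Cell<T> {
    T value; // stays null until the exporting unit's initialization has run
}

// Exports 'even', imports 'odd'.
final class EvenUnit {
    static void invoke(Cell<IntPredicate> evenCell, Cell<IntPredicate> oddCell) {
        // oddCell is dereferenced when the predicate is called, not when it is built.
        evenCell.value = n -> n == 0 || oddCell.value.test(n - 1);
    }
}

// Exports 'odd', imports 'even'; the assignment to oddCell is the
// side effect of this unit's initialization.
final class OddUnit {
    static void invoke(Cell<IntPredicate> evenCell, Cell<IntPredicate> oddCell) {
        oddCell.value = n -> n != 0 && evenCell.value.test(n - 1);
    }
}

// The "compound unit": creates the cells, wires both units to them,
// and only then runs the two initializations.
final class EvenOddProgram {
    final Cell<IntPredicate> evenCell = new Cell<>();
    final Cell<IntPredicate> oddCell = new Cell<>();

    EvenOddProgram() {
        EvenUnit.invoke(evenCell, oddCell);
        OddUnit.invoke(evenCell, oddCell);
    }

    boolean even(int n) { return evenCell.value.test(n); }
    boolean odd(int n) { return oddCell.value.test(n); }

    public static void main(String[] args) {
        EvenOddProgram program = new EvenOddProgram();
        System.out.println(program.even(10) + " " + program.odd(7));
    }
}
```

Because each body reads the other unit's cell only when called, the two
units can refer to each other even though neither exists when the
other is linked. The sketch assumes non-negative arguments.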

Typed units: unit signatures have a subtyping relationship, so
specialized units can be used in place of more general units. Note the
contravariance on the types of imported variables! (p. 53) Evaluation for
typed units augments the concept of expressions with pairs of
(typed-store, expression) where typed-store maintains the type
information for variant types. The typed-store is used only when
reducing the letrec-type form. 
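
The contravariance on imports has a loose analogue in Java's wildcard
types, sketched below. Treating a unit as a function from its import
to its export is my simplification, not the thesis's typing rule, and
Shape, ColorShape and UnitSubtyping are hypothetical names.

```java
import java.util.function.Function;

class Shape {
    final String label;
    Shape(String label) { this.label = label; }
    String describe() { return label; }
}

class ColorShape extends Shape {
    final String color;
    ColorShape(String label, String color) { super(label); this.color = color; }
    @Override String describe() { return color + " " + label; }
}

class UnitSubtyping {
    // The linking context supplies a ColorShape as the import and
    // expects at least a Shape as the export.
    static String link(Function<? super ColorShape, ? extends Shape> unit) {
        return unit.apply(new ColorShape("circle", "blue")).describe();
    }

    public static void main(String[] args) {
        // This unit demands less of its import (any Shape will do) and
        // promises more of its export (a ColorShape), so it fits.
        Function<Shape, ColorShape> repaint = s -> new ColorShape(s.label, "red");
        System.out.println(link(repaint));
    }
}
```

The import position accepts supertypes ("? super") while the export
position accepts subtypes ("? extends"): a unit may be substituted
when it requires no more and provides no less than the context expects.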

Typed units and more general typed units with dependency and
type-expressions seem academic. While dynamically-typed units are
implemented in MzScheme, typed units are introduced to prove type
soundness. No implementation is mentioned. Furthermore, the summary
(p. 65) says: "Our text-based model is far too verbose, and we do not
address the design of a linking language. Instead, we provide a simple
construct for linking units and rely on integration with the core
language to build up linking expressions. This integration simplifies
our presentation, and we believe it is an essential feature of
units. Nevertheless, future research should explore more carefully the
implications of integrating the core and module languages." My monads
and staging?

Chapter 4, "Mixins", introduces ClassicJava, on which mixins will be
later grafted. ClassicJava has a few very interesting features that
make formal reasoning easy. "Evaluation is modeled as a reduction on
expression-store pairs in the context of a static type graph. Each
object in the store is a tagged record of field values, where the tag
indicates the class of the object and its field values are references
to other objects. A single reduction step may extend the store with a
new object, or it may modify a field for an existing object in the
store. Dynamic method dispatch is accomplished by matching the class
tag of an object in the store with a node in the _static_ class tree;
a simple relation on this tree selects an appropriate method for the
dispatch.  The class model relies on as few implementation details as
possible. For example, the model defines a mathematical relation,
rather than a selection algorithm [e.g., dynamic dispatch via virtual
tables or CLOS-like dispatch], to associate fields with classes for
the purpose of type-checking and evaluation. Similarly, the reduction
semantics only assumes that an expression can be partitioned into a
proper redex and an (evaluation) context; it does not provide a
partitioning algorithm... [further, on pp. 73-74] The operational
semantics for ClassicJava is defined as a contextual rewriting system
on pairs of expressions and stores. A store S is a mapping from
objects to class-tagged field records. A field record F is a mapping
from elaborated field names to values. The evaluation rules are a
straightforward modification of those for imperative Scheme [22].  The
complete evaluation rules are in Figure 4.8. For example, the call
rule invokes a method by rewriting the method call expression to the
body of the invoked method, syntactically replacing argument variables
in this expression with the supplied argument values. The dynamic
aspect of method calls is implemented by selecting the method based on
the _run-time_ type of the object (in the store). In contrast, the
super reduction performs super method selection using the class
annotation that is _statically_ determined by the type-checker."
The thesis continues on p. 79: "In our semantics, types are simply the names of
entities declared in the program; the collection of types forms a dag,
which is specified by the programmer. The collection of types is
static during evaluation and is only used for field and method
lookups and casts. The evaluation rules describe how to transform
statements, formed over the given type context, into plain values. The
rules work on plain program text such that each intermediate stage of
the evaluation is a complete program. In short, the model is as simple
and intuitive as that of first-order functional programming enriched
with a language for expressing hierarchical relationships among data
types." The latter phrase is particularly well-said!
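
To make the store model concrete, here is a small Java sketch of a
store of class-tagged records with dispatch against a static class
tree. All names are hypothetical. Note that the thesis deliberately
defines a relation rather than an algorithm; the parent-chain walk
below is just one way to realize that relation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// A node of the static class tree; method bodies take the receiver's record.
final class ClassNode {
    final String name;
    final ClassNode parent; // null for the root class
    final Map<String, Function<TaggedRecord, String>> methods = new HashMap<>();
    ClassNode(String name, ClassNode parent) { this.name = name; this.parent = parent; }
}

// An object in the store: a class tag plus its field values.
final class TaggedRecord {
    final ClassNode tag;
    final Map<String, Object> fields = new HashMap<>();
    TaggedRecord(ClassNode tag) { this.tag = tag; }
}

// The store maps object references (plain ints here) to tagged records.
final class Store {
    private final Map<Integer, TaggedRecord> objects = new HashMap<>();
    private int nextRef = 0;
    int allocate(ClassNode tag) {
        objects.put(nextRef, new TaggedRecord(tag));
        return nextRef++;
    }
    TaggedRecord get(int ref) { return objects.get(ref); }
}

final class Dispatch {
    // Nearest class at or above 'start' that defines the method.
    static Function<TaggedRecord, String> lookup(ClassNode start, String method) {
        for (ClassNode c = start; c != null; c = c.parent) {
            Function<TaggedRecord, String> body = c.methods.get(method);
            if (body != null) return body;
        }
        throw new IllegalArgumentException("no method " + method);
    }
    // Ordinary call: selection starts from the run-time tag found in the store.
    static String call(Store store, int ref, String method) {
        TaggedRecord receiver = store.get(ref);
        return lookup(receiver.tag, method).apply(receiver);
    }
    // Super call: selection starts above the statically known class.
    static String superCall(Store store, int ref, ClassNode staticClass, String method) {
        return lookup(staticClass.parent, method).apply(store.get(ref));
    }
    public static void main(String[] args) {
        ClassNode shape = new ClassNode("Shape", null);
        ClassNode circle = new ClassNode("Circle", shape);
        shape.methods.put("describe", r -> "a shape");
        circle.methods.put("describe", r -> "a circle of radius " + r.fields.get("radius"));
        Store store = new Store();
        int ref = store.allocate(circle);
        store.get(ref).fields.put("radius", 2);
        System.out.println(call(store, ref, "describe"));
        System.out.println(superCall(store, ref, circle, "describe"));
    }
}
```

The class tree never changes during evaluation; only the store does.
That split is what the quoted passage relies on.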


Fig. 4.3 (p. 70) presents the syntax of ClassicJava. Notice the
absence of array types. No wonder Matt later proves that ClassicJava
is type-sound. When arrays, subclassing, and assignments are all
present, the Java type system is statically unsound: covariant arrays
force a run-time check on every array store (see the article in IEEE
Computer, Dealing with Java programming stress (in my notes)). Fig 4.4
(p. 71) shows well-formedness predicates. The figure makes it clear
that methods in ClassicJava are not the methods of real Java. To start
with, methods in ClassicJava do not receive as an implicit argument
the object they're invoked on (this raises the specter of
contravariance). This "trivial" distinction becomes crucial when we
consider the predicate ClassMethodsOK:
	 method(tag=t,signature=T,arguments=V,body=e) in class-c
   and
	 method(tag=t,signature=T',arguments=V',body=e') in class-c'
  => T = T' or not(c subclass c')
As the thesis itself says, method overriding preserves types. But this
_precludes_ overriding in the Java style.

For example, this code in Java:
    class A { public int a; public void inc() { a = a + 1; } }
    class B extends A
    { public int b; public void inc() { b = b + 1; a = a + 1; } }
cannot be represented in ClassicJava! Method A.inc has a signature
       A->void (note that the first argument is implicit in Java!)
Method B.inc has a signature
       B->void.
But this puts B.inc in violation of ClassMethodsOK. If B.inc is
restricted to be of type A->void (just as the method A.inc it
overrides), then B.inc cannot increment member B.b (which is declared
only in class B, and is not present in A). ClassMethodsOK makes it
impossible to implement a well-defined hierarchy of Points and Tiles
(see Cardelli's paper "On understanding of types..."). 
Note the similarity of ClassMethodsOK to my BRules, which ban overriding
as well. No wonder Matt could prove many soundness theorems: when
overriding is practically forbidden, many subtyping problems
disappear.

The same problem shows up again when they introduce mixins. Predicate
MixinMethodsOK demands that "method definitions match inheritance
interface". That is, whenever a mixin implements an abstract method
declared in one of the interfaces the mixin implements, the signature
of the implementation must match the signature of the declaration
exactly. No subsumption is allowed. This, coupled with the absence
of the implicit argument of a method invocation, makes overriding
useless, and makes methods useless! Methods can't access members of the
class the methods are defined in! For example,
      interface A { public void inc(); }
      class B implements A { int b; public void inc() { b = b+1; } }
cannot be implemented, as it is in violation of MixinMethodsOK.

Thus this whole chapter about mixins has nothing to do with real Java,
and is practically useless.


5.1 Units with Signatures in MzScheme, p. 97

"The MzScheme unit forms described in Chapter 2 provide no support for
managing groups of exported variables, which makes those forms
impractical for implementing realistic components. For example, a
typical component in DrScheme exports ten to twenty variables;
repeatedly listing all of the exports of a unit -- at its definition,
at every import site, and at every linking site -- is too unwieldy.  To
support practical programming with units, MzScheme provides the
following additional constructs:
  - a define-signature form for defining a signature, which is
a _named collection of variables_,
  - a unit/sig form for defining a unit with exports and imports that
match specified signatures, and
  - a compound-unit/sig form for linking together units with signature
information.
MzScheme implements these forms by elaborating them to the basic unit
forms. The make-signed-unit primitive creates a record that
encapsulates a unit along with _signature information for its imports
and exports_. The compound-unit/sig form uses the signature information
in a signed unit to validate linking."

Note, in the (statically) typed Units language (Sec. 3.5.2 and 3.5.3),
a signature is a unit's _type_: a collection of all exported and
imported variables along with their types. MzScheme is a
dynamically-typed language, yet it can still benefit from signatures
(to better manage lists of imported/exported variables).
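
The bookkeeping role of signatures in a dynamically-typed setting can
be sketched in Java as a name-only check at link time. This is not
MzScheme's implementation: Signature, SignedUnit, LinkChecker and all
variable names below are hypothetical, and only variable names (no
types) are compared.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

// A signature: a named collection of variable names.
final class Signature {
    final String name;
    final Set<String> variables;
    Signature(String name, String... variables) {
        this.name = name;
        this.variables = new LinkedHashSet<>(Arrays.asList(variables));
    }
}

// A unit packaged together with the signatures of its imports and exports.
final class SignedUnit {
    final String name;
    final Signature imports;
    final Signature exports;
    SignedUnit(String name, Signature imports, Signature exports) {
        this.name = name;
        this.imports = imports;
        this.exports = exports;
    }
}

final class LinkChecker {
    // Imported variables that the provider does not export;
    // an empty result means the link is acceptable.
    static Set<String> missing(SignedUnit provider, SignedUnit consumer) {
        Set<String> missing = new TreeSet<>(consumer.imports.variables);
        missing.removeAll(provider.exports.variables);
        return missing;
    }

    public static void main(String[] args) {
        SignedUnit shapes = new SignedUnit("shapes",
                new Signature("nothing"),
                new Signature("shape-ops", "make-circle", "draw"));
        SignedUnit gui = new SignedUnit("gui",
                new Signature("gui-needs", "draw", "erase"),
                new Signature("gui-ops", "show-window"));
        System.out.println(missing(shapes, gui));
    }
}
```

The linking site names two signatures instead of repeating ten to
twenty variables, and a mismatch is reported before any unit runs.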

Experience with mixins in MzScheme and DrScheme (Secs. 5.2-5.3,
pp. 98-100). It appears that mixins in MzScheme are implemented
similarly to the classes in my pure oo system: as a closure
encapsulating a message map. Note that MzScheme prohibits implicit
override: if a derived class wishes to override a method in a base
class, it must use the keyword 'override'. It's an error to declare a
'public' method that has the same name as a method in a base
class. As in my pure oo system, the (class ...) form is a
first-class expression (just like my make-class form). Mixins in
MzScheme are dynamically typed (so all the typechecking theory of
MixedJava in Chapter 4 does not apply).


Note a phrase: "Much of the existing literature on reuse fails to
distinguish between the reuse of source code and the reuse of semantic
abstractions that can be separately compiled. The distinction is
crucial to our view of components, and Krishnamurthi and Felleisen
[47] provide a foundation for formalizing the distinction." (p. 102).

He says on p. 103: "In present software practice, COM [72], CORBA
[64], and JavaBeans [40] define the standards for component
programming. These standards, however, merely define low-level wiring
conventions. They do not provide a language for specifying how
components are linked together, and they do not support verification
that components are linked properly before executing the program. Our
model of units as components addresses both of these problems." I
disagree. CORBA provides a repository of interfaces. Client code can
be statically checked against the repository. Dynamic verification is
also possible. The same holds for JavaBeans, which permit strong
typechecking of a dynamically loaded component (bean).

Interesting references:

[5] Ancona, D. and E. Zucca. An algebraic approach to mixins and
modularity. In Hanus, M. and M. Rodríguez-Artalejo, editors,
Proc. Conference on Algebraic and Logic Programming, volume 1139 of
Lecture Notes in Computer Science, pages 179-193. Springer-Verlag,
1996.

[14] Crary, K., R. Harper and S. Puri. What is a recursive module? In
Proc. ACM Conference on Programming Language Design and
Implementation, pages 50-63, May 1999.

[22] Felleisen, M. and R. Hieb. The revised report on the syntactic
theories of sequential control and state. Technical Report 100, Rice
University, June 1989.  Theoretical Computer Science, volume 102,
1992, pp. 235-271.

[30] Glew, N. and G. Morrisett. Type-safe linking and modular assembly
language.  In Proc. ACM Symposium on Principles of Programming
Languages, pages 250-261, January 1999.

[34] Harper, R., J. Mitchell and E. Moggi. Higher-order modules and
the phase distinction. In Proc. ACM Symposium on Principles of
Programming Languages, pages 341-354, January 1990.

[60] Mezini, M. and K. Lieberherr. Adaptive plug-and-play components
for evolutionary software development. In Proc. ACM Conference on
Object-Oriented Programming, Systems, Languages, and Applications,
pages 97-116, 1998.

[85] Wright, A. and M. Felleisen. A syntactic approach to type
soundness. Technical Report 160, Rice University, 1991. Information
and Computation, volume 115(1), 1994, pp. 38-94.