One of the most fun, exciting, and educational opportunities for
students is the chance to build interesting artifacts that may be seen
and used by people all over the world!
Many professors have laundry lists of such projects; here's mine.
All the projects more or less relate to my research interests in programming
languages and environments;
related material is covered in
CS 152.
Projects include:
See something that looks cool? Write nr@cs.tufts.edu.
We have tools that are pretty good at converting scanned documents to
texts, but there are many small errors.
The project is to adapt
Dasher
to enable a person to, at high speed, correct all the errors in an
OCR'ed document.
Many people want to run 32-bit codes on 64-bit hardware or vice
versa.
We've recently discovered
systematic way to run 32-bit
codes on a 64-bit machine.
The investigation turned up some surprisingly regular properties of
machine arithmetic.
The new project is to investigate techniques for running 64-bit codes
on 32-bit machines.
(We're interested in compilation techniques, not simulation techniques
like the Gnu Multiprecision
Library.)
We're hoping either to discover similar regular properties or figure
out why they don't exist.
Any ML programmer can write a function to abstract over a certain
pattern of values.
For example, if you want triples in which the first and last elements
are equal and the middle element is NONE, you can write
fun myTriple x = (x, NONE, x)
But tragically, there is no way to write something similar to get a
pattern that matches only these triples.
The project is to develop an extension to the ML dialect
Objective Caml to support abstraction
in pattern matching.
This is not just a language-design project---we are looking for practical
results we can apply to our C-- compiler.
Our community has gotten pretty good at getting different
languages to work together, but it's another story when it
comes to run-time systems.
Usually any language with a garbage collector things it should control
the world.
But there is an exception: the Lua
language is designed to coexist peacefully with other implementations.
The project is to push that design to try to get Lua to interoperate
with Perl.
(This might be the scripting version of ``Calling Hell from Heaven and
Heaven from Hell.'')
This project would have two benefits:
- It would be great community service if Lua programs could call
everything in the CPAN library.
- We are bound to learn something interesting about memory management
and run-time systems.
Soft typing
is an idea that allows you to get useful error messages from a static type
checker while still being able to run your code if type checking
fails.
Most of the work on soft typing has been done on functional languages.
It would be interesting to see how well the idea applies to
object-oriented languages.
I can imagine two interesting settings:
- In a classroom setting, one could try to apply soft typing to the
micro-Smalltalk language used in
CS 152.
- In a practical setting, one could try to apply soft typing to Lua.
System F has explicit polymorphism, but the explicit abstractions and
applications are a bit hard to program with.
One way to cope is to use Hindley-Milner type inference, which
eliminates explicit annotations by restricting polymorphism.
But another is to find a notation that makes explicit polymorphism
usable.
The programming language CLU offers one such
notation, and it also has some interesting mechanisms for constraining
the specialization of polymorphic types.
The project would be to formalize the CLU type system and develop a
precise characterization of its relationship to
System F.
There are also connections with a recent study of different languages'
support for polymorphism, done at Indiana's
Open Systems Lab
(there is an OOPSLA paper).
Kent Beck's discipline of Extreme Programming advocates continuous
refactoring of systems under construction.
Refactoring by hand requires that code be carefully edited by hand and
the system test suite be re-run.
But many refactoring transformations could be standardized:
- Pull out a chunk of code and convert it into a method.
- Change the relationship between objects from inheritance to
delegation, or vice versa.
- Adjust private methods so they touch fields only through accessor
methods---or not.
The project is to identify an interesting set of refactoring
transformations (e.g., based on Martin Fowler's book) and to develop a
refactoring tool for a modern programming language like Java or
Standard ML.
This project would make a good honors thesis.
The New Jersey
Machine-Code Toolkit generates an API for assembler-like
functions, but it can't conveniently write the results to disk until
all references have been resolved. The project is to develop a form
of object code usable with the toolkit.
Possible tasks are:
- Devise a machine-independent form of object code.
This should include performance measurements of alternative ways of
representing relocation information.
- Develop a machine-independent assembler, using the toolkit to
generate the machine-dependent parts.
- Develop a machine-independent linker usable with your object code.
Develop a loader that can turn an object file without dangling
references into an executable file.
I expect this work to lead to publication and to provide a foundation
for further research.
A companion to the previous project.
Here the idea is to dispose of distinctions between assemblers once
and for all, by having the assembler written in terms of the
semantics of the instructions.
No more remembering opcode names or the order of operands!
This work would be a chance to apply some of the Computer Systems
Description Languages being developed for the Zephyr project.
Automatic generation of processor emulators h2
Emulators are used not only for the high purpose of
playing NES games
on your PC, but also to build systems software for new hardware
even before it's built or to study the performance of computer
systems.
The project is to figure out how to generate an emulator,
given a precise description of the target machine's instruction set.
The long-term goal is to develop several strategies, ranging from
simple strategies that produce slow emulators easily, to the most
aggressive binary-translation techniques.
I expect this work to lead to publication.
-->
Pick your favorite compiler and add support for source-level
debugging, drawing on the ideas in the Debugging
Everywhere project.
I expect this work to lead to publication.
ldb is a retargetable debugger for ANSI C.
ldb works in part by exploiting a novel representation of
debugging information that is entirely executable.
One problem is that this representation is ten times larger than
conventional representations
The project is to devise and implement a new representation that mixes
data and executable parts, such that it retains the retargeting
advantages of the current representation but has the same efficiency
as more conventional representations.
I expect this work to lead to publication.
Simon Peyton Jones
and I are developing a new code generator for C--, a
language to be used as a target
for compilation.
The project is to choose a compiler for a high-level or
very-high-level language, and to retarget it to C--.
This project would make an excellent honors thesis
and might lead to publication.
We have some preliminary designs for
C-- to support
concurrency.
There are several possible avenues, ranging from deep development of
one or two ideas to pulling all the ideas together into a coherent design.
This project should lead to publication.
It's quite useful to be able to write linked data structures to disk
and read them back later (sometimes called ``pickling'' and ``unpickling''),
but to do it by hand is tedious.
The project is to find a good way to automate the process, first for
trees, then for general linked data structures.
The devil here is in the details:
- generating code with the right
idioms
- finding both a readable ASCII representation and also a
compact binary representation
- exploring XML compatibility
- making pickles self-identifying
- managing the software engineering of generating code for different
programming languages in a way that programs written in different
languages can
exchanging data transparently.
This project, tentatively named ``Pickle Master,'' could lead to
software that might be very widely used.
A ``fingerprinter'' is a function F that can be applied to very long
strings (e.g., disk files), such that for any string A, F(A) is of
fixed size (typical sizes are 64-128 bits), and
if A and B are two different
strings, then F(A) and F(B) are also different with very high
probability.
Fingerprints have applications in security as well as filesystem
replication.
The engineering tradeoffs involved in the implementation of
fingerprints are not well understood.
The project is to study alternatives experimentally in order to
engineer a very fast fingerprinter.
Application generators (e.g., parser generators) need to emit code in
high-level languages so the emitted code can be linked with
applications written in high-level languages.
The idioms used to express the code may vary depending on the language
or the application.
This project is to explore idiom choices for the New Jersey
Machine-Code Toolkit, which generates an API for assembler-like
functions.
Even restricting our attention to C, we have several ways to consider
emitting instructions:
- invoke a macro for each instruction
- call an external procedure for each instruction
- call through a function pointer for each instruction
Each of these approaches has advantages in different situations.
The project is to use the Standard ML modules system to devise a
suitable way of choosing an idiom for a particular application.
Familiarity with Standard ML is a plus but is not required.
I expect this work to lead to publication.
Even simple hypertext browsers like Netscape or Internet Explorer can make a
literate program dramatically easier to understand.
(See examples.)
Unfortunately, all existing literate-programming tools assume they are
indexing and cross-referencing a single document.
This project is to develop interdocument cross-reference tools for the
noweb literate-programming tool.
I expect this work to lead to publication.
The
Noweb literate-programming tool currently uses an
approximate algorithm to compute cross-reference information.
The project is to figure out how the Noweb framework can be changed to
provide for completely accurate cross-reference information,
and to develop one or more accurate cross-reference tools.
There are many front ends that might be adapted for cross reference,
supporting languages like Java (Sun javacc),
C (lcc front
end),
C++ (Edison Design Group front end),
Standard ML (Moscow ML front end or ML Kit), and Modula-3 (Compaq SRC
front end).
Prettyprinting technology is well developed, but it is not well
realized in flexible software. This project is to adapt existing
prettyprinting technology so it can work with any programming
language and in a variety of contexts, including standalone ASCII
prettyprinter, literate-programming tool, and PostScript or
FrameMaker.
As a bonus, one might consider extending TeX with support for
prettyprinting.
This work could possibly lead to publication.
Make rebuilds object files whenever source files have changed.
In a big system, this can cause slow builds.
Lots of people have build ``smart compilation'' systems, which
understand the changes and build new binaries only when
necessary.
But these systems have to be built over and over again for each new
language.
The project is to build a replacement for Make that can support smart
compilation for more than one language.
I expect this work to lead to publication.
The project is to develop SLED
and Lambda-RTL descriptions
for a small selection of popular machines.
These descriptions, which will be used in the
C-- project, can be used to
generate machine-dependent tools.
Candidate machines include the PowerPC, the StrongARM, the HP PA RISC, the Motorola
68k family, Hitachi HS-1, and other processors.
Sorin Lerner, Craig Chambers, and Todd Millstein have recently
published some interesting work on provably sound optimizations,
combined automatically.
At present, these optimizations are written in a special-purpose
domain-specific language, which is then interpreted in the Whirlwind
compiler.
This project has two parts:
- Identify what has to be done to adapt the ideas to other
compilers
- Look for opportunities to translate the domain-specific
optimizations to native code
This work would take place in the context of the Quick C-- project.
This work would make a good senior project and might lead to
publication.
The ldb debugger evaluates the conditions associated with
conditional breakpoints by
translating high-level expressions into a low-level intermediate code
and interpreting the intermediate code.
The language of the intermediate code is a subset of PostScript, so
there are two levels of interpretation in the system.
A better solution would be to use machine code
to evaluate the conditions.
If the machine code can go in the target address space, we can avoid
significant context-switching overhead in the case where the
breakpoint is not taken.
It would be interesting to see if the PostScript could be reused for
this purpose.
The project would include
- adding runtime support for managing code in the target address space
- writing new postscript to emit machine code into the target space
- possibly adding the appropriate Modula-3 support to the
New Jersey
Machine-Code Toolkit
I expect this work to lead to publication.
Todd Proebsting's
recent work on superoperators looks like a nice technique for building
bytecode interpreters, but it isn't very conveniently packaged.
The project is to combine this work and my earlier work on specifying
PostScript operators to form an environment for building very
efficient interpreters and code emitters.
This work may lead to publication.
EBNF is a parser generator written in Icon and emitting Icon that
accepts EBNF and generates LL(1) or SLR(1) parsers.
Possible projects include:
- Implement the DeRemer and Penello LALR(1) algorithm.
- Add support for emitting Modula-3 or Standard ML.
- Implement an error-correction algorithm.
The project should result in useful software that can be distributed on
the net.
The implementation of text
adventure games is greatly simplified by the use of
special-purpose languages like Inform and
TADS.
Still, designing and debugging these games is awkward, because it's
difficult to recover the game state once a change is made.
The project is to develop an interactive, incremental compiler so you
can change the state of a running game, then resume play.
This would make a good undergraduate thesis project.
One goal of interactive
fiction is to allow players to interact with seemingly autonomous
agents or ``non-player characters'' (NPCs).
NPCs are currently very difficult to program using interactive-fiction
languages like Inform and
TADS.
The purpose of this project is to develop a programming-language
extension for implementing NPCs.
This will involve
- Reading the literature on interactive NPCs/agents and developing a
simple programming model based either on that literature or on other
well-known techniques (e.g., state machines or goal-directed
evaluation).
- Designing a defining a notation for creating NPCs.
- Implementing the design as a preprocessor for a popular compiler like
TADS or Inform.
This would make a good undergraduate thesis project.
Most text adventure games use bytecode interpreters for portability.
This project is to design and implement a procedure-oriented bytecode interpreter that will
enable an implementation based on binary translation.
This in turn should lead to excellent playing times on such handheld
platforms as the Palm Pilot.
The interpreter alone might make an interesting undergraduate thesis
project.
A full demonstration, complete with back end and translator, would
make an excellent masters' project.
It is currently possible to use the PSTricks package to create figures
for use in TeX documents, but PSTricks is very complex and has a steep
learning curve.
The project is to explore established user-interface principles and
techniques, including two-view editing, to identify and correct the
weaknesses in PSTricks.
Back to Norman Ramsey's home page.