Stuart Shieber

Division of Applied Sciences
Harvard University
33 Oxford Street -- 14
Cambridge, MA 02138


Telephone: (617)495-2344
Fax: (617)495-2344




Speech and Natural Language Understanding.


Computational linguistics, natural-language processing, automated graphic design, combinatorial optimization


Documents are human artifacts used for human communication. In the information age, it would be attractive for computers to be document-fluent just as people are, and just as people make use of both textual and graphical languages in documents, we would like computers to be both linguistically and graphically articulate, if they are to aid in document processing in its widest sense.

The long-term goal of the research funded by the Interactive Systems Program as Presidential Faculty Fellow Award IRI-9350192 is the development of linguistic and graphical articulateness in computers. The path we are pursuing to this goal is through research on computational linguistics and automated graphic design.

Computational Linguistics

Towards the long-term goal of linguistically articulate computers, we address three central problems confronting researchers in computational linguistics: robustness, fluency, and modularity. These issues can be addressed through the use of systems of constraints as the underlying method for encoding the structures of natural language.

Automated Graphic Design

Towards the goal of graphically articulate computers, we have been developing methods for automated and semi-automated graphic design; the ultimate goal is to allow computers to communicate using informational graphics. As human beings have been using natural language for perhaps hundreds of thousands of years, but widespread use of symbolic graphical languages dates from only the late 18th century, graphical artifacts are quite a bit more conventional, providing some basis for the expectation that building a graphically articulate computer may be more practical in the shorter term than building a linguistically articulate one. However, many graphic-design problems, such as placing labels on maps or laying out nodes in a network diagram, are NP-hard, hence undoubtedly intractable. The problem is not merely theoretical. For instance, it has been reported that as much as half of the time in designing production-quality maps is taken up by label placement, with good cartographers able to place only 20 to 30 labels per hour on average.

Our approach has been to view graphic design problems as a kind of combinatorial optimization problem, where the combinatorics is derived from the space of possible graphical artifacts and the function to be optimized is the perceptual quality of the generated artifact. Many simple graphic design problems can be viewed in this way: chart design, graph layout, label placement, page layout, and so forth. Because the functions to be optimized are quite complex (since tied to the idiosyncrasies of perception), and the combinatorics are intractable, we have concentrated on weak heuristic methods such as stochastic search for solving them.

To date, we have addressed the following graphic design problems in this way: cartographic label placement, page layout, automated region delineation, computer animation.

Combinatorial Optimization

This research program -- viewing automated graphic design as combinatorial optimization and using stochastic methods for solution -- led naturally to investigations of stochastic methods for combinatorial optimization problems in general. To investigate the issues, we turned to some ``pure'' combinatorial optimization problems such as number and graph partitioning. Our studies showed that (contra the assumption in much work in stochastic optimization) variations in search space representation can be orders of magnitude more important than variations in search method. For the number partitioning problem, stochastic methods using a simple, direct representation performed several orders of magnitude worse than the best known deterministic heuristic regardless of the stochastic search method used. By changing representations, these same stochastic search methods, and even methods as trivial as ``random generate and test'' could be made to perform three orders of magnitude better than the best (previously) known method. From these studies, we were able to adduce several general principles for devising good representations for combinatorial optimization problems.

We have begun more detailed investigation applying the techniques we developed for number partitioning to other combinatorial optimization problems. In particular, we have begun an investigation of the problem of graph partitioning.


The following papers were prepared with partial support from this grant.

Auslander, J., A. Fukunaga, H. Partovi, J. Christensen, L. Hsu, P. Reiss, A. Shuman, J. Marks, and T. Ngo. 1995. Further developments in automatic motion synthesis for articulated figures. To appear in ACM Transactions on Graphics.

Chen, Stanley F. 1993. Aligning sentences in bilingual corpora using lexical information. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pages 9-16, Ohio State University, Columbus, Ohio, June.

Chen, Stanley F. 1995. Bayesian grammar induction for language modeling. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, Cambridge, Massachusetts, June. Massachusetts Institute of Technology.

Christensen, Jon, Joe Marks, and Stuart M. Shieber. 1994. Placing text labels on maps and diagrams. In Paul Heckbert, editor, Graphics Gems IV. Academic Press, Cambridge, Massachusetts.

Christensen, Jon, Joe Marks, and Stuart M. Shieber. 1995. An empirical study of algorithms for point-feature label placement. To appear in ACM Transactions on Graphics.

Christensen, Jon. 1995. Managing Design Complexity: Using Stochastic Optimization in the Production of Computer Graphics. Ph.D. thesis, Harvard University, Cambridge, Massachusetts, June.

Dalrymple, Mary and Andrew Kehler. 1995. On the constraints imposed by respectively. Linguistic Inquiry, 26(3).

Kehler, Andrew. 1994a. Common topics and coherent situations: Interpreting ellipsis in the context of discourse inference. In Proceedings of the 32nd Annual Conference of the Association for Computational Linguistics, University of New Mexico, Las Cruces, New Mexico, June.

Kehler, Andrew. 1994b. Temporal relations: Reference or discourse coherence? In Proceedings of the 32nd Annual Conference of the Association for Computational Linguistics (student session), University of New Mexico, Las Cruces, New Mexico, June.

Kehler, Andrew. 1995. Interpreting Cohesive Forms in the Context of Discourse Inference. Ph.D. thesis, Harvard University, Cambridge, Massachusetts, June.

Kehler, Andrew, Mary Dalrymple, John Lamping, and Vijay Saraswat. 1995. The semantics of resource sharing in lexical-functional grammar. In Proceedings of the Seventh Meeting of the European ACL, Dublin, Ireland, March.

Kosak, Corey, Joseph Marks, and Stuart Shieber. 1994. Automating the layout of network diagrams with specified visual organization. Transactions on Systems, Man and Cybernetics, 24(3), March.

Ruml, Wheeler, J. Thomas Ngo, Joe Marks, and Stuart M. Shieber. 1996. Easily searched encodings of number partitioning. Journal of Optimization Theory and Applications, 80(2). To appear.

Ryall, Kathy, Stuart M. Shieber, Joe Marks, and Murray Mazer. 1995. Semi-automatic delineation of regions in floor plans. In Proceedings of the Third International Conference on Document Analysis and Recognition (ICDAR '95), Montreal, Canada, August.

Schabes, Yves and Stuart M. Shieber. 1994. An alternative conception of tree-adjoining derivation. Computational Linguistics, 20(1):91-124.

Shieber, Stuart M., Yves Schabes, and Fernando C. N. Pereira. 1995. Principles and implementation of deductive parsing. Journal of Logic Programming, 24(1-2):3-36, 1995.

Shieber, Stuart M. 1994. Restricting the weak-generative capacity of synchronous tree-adjoining grammars. Computational Intelligence, 10(4):371-385.

Shieber, Stuart M., Fernando C. N. Pereira, and Mary Dalrymple. To appear. Interactions of scope and ellipsis. Linguistics and Philosophy.