TOWARDS LINGUISTICALLY AND GRAPHICALLY
ARTICULATE COMPUTERS

Stuart Shieber

Division of Applied Sciences
Harvard University
33 Oxford Street -- 14
Cambridge, MA 02138

CONTACT INFORMATION

Email: shieber@das.harvard.edu
Telephone: (617)495-2344
Fax: (617)495-2344

WWW PAGE

http://das-www/users/faculty/Stuart_Shieber/Stuart_Shieber.html

PROGRAM AREA

Speech and Natural Language Understanding.

KEYWORDS

Computational linguistics, natural-language processing, automated graphic design, combinatorial optimization

PROJECT SUMMARY

Documents are human artifacts used for human communication. In the information age, it would be attractive for computers to be as document-fluent as people are. Just as people make use of both textual and graphical languages in documents, computers must be both linguistically and graphically articulate if they are to aid in document processing in its widest sense.

The long-term goal of the research funded by the Interactive Systems Program as Presidential Faculty Fellow Award IRI-9350192 is the development of linguistic and graphical articulateness in computers. The path we are pursuing to this goal is through research on computational linguistics and automated graphic design.

Computational Linguistics

Towards the long-term goal of linguistically articulate computers, we address three central problems confronting researchers in computational linguistics: robustness, fluency, and modularity. These issues can be addressed through the use of systems of constraints as the underlying method for encoding the structures of natural language.

Robustness: Natural-language-processing systems tend to be fragile, especially in the face of novel or unknown aspects of language. Grammatical formalisms typically assume that all constraints are categorical rather than defeasible, which leads to brittle behavior. Probabilistic systems have, in the past, ignored the grammatical structure of language, which places intrinsic limits on their accuracy. We are exploring the integration of these two models, building on recent work on statistical analysis of natural-language corpora.
Fluency: Systems must become more fluent in reconstructing the ubiquitous implicit relationships that hold within and among sentences. These relationships are perhaps best exemplified by elliptical and coordinate constructions, which are found universally in language and which have been among the most intransigent problems in natural-language processing. These phenomena seem to require the ability to abstract higher-order relationships from concrete first-order patterns; such an ability is foreign to the basically first-order nature of constraint-based formalisms. We are studying the addition of higher-order constraints that may allow for solution of some of these problematic cases.
Modularity: To make natural-language-processing systems easier to build and extend, they must be structured in a more modular fashion. One of the motivating features of constraint-based formalisms is the observation that, informally speaking, the interaction of independent constraints increases expressivity geometrically. It is thus worthwhile examining new methods of factoring linguistic constraints in such a way that linguistic descriptions can be further simplified. For instance, the factoring of phrase-structure information postulated in categorial or tree-adjoining grammars can eliminate the need for many of the constraints found in more traditional constraint-based formalisms. We have investigated the space of possibilities to determine where more modular and expressive languages can be constructed without sacrificing computational effectiveness.

Automated Graphic Design

Towards the goal of graphically articulate computers, we have been developing methods for automated and semi-automated graphic design; the ultimate goal is to allow computers to communicate using informational graphics. Human beings have been using natural language for perhaps hundreds of thousands of years, whereas widespread use of symbolic graphical languages dates only from the late 18th century. Graphical artifacts are therefore considerably more conventional, which gives some basis for expecting that a graphically articulate computer may be more practical to build in the shorter term than a linguistically articulate one. However, many graphic-design problems, such as placing labels on maps or laying out nodes in a network diagram, are NP-hard, and hence almost certainly intractable in the worst case. The problem is not merely theoretical. For instance, it has been reported that as much as half of the time spent designing production-quality maps is taken up by label placement, with good cartographers able to place only 20 to 30 labels per hour on average.

Our approach has been to view graphic design problems as combinatorial optimization problems, in which the search space is the set of possible graphical artifacts and the function to be optimized is the perceptual quality of the generated artifact. Many simple graphic design problems can be viewed in this way: chart design, graph layout, label placement, page layout, and so forth. Because the functions to be optimized are quite complex (they are tied to the idiosyncrasies of perception) and the search spaces are intractably large, we have concentrated on weak heuristic methods such as stochastic search for solving them.

To date, we have addressed the following graphic design problems in this way: cartographic label placement, page layout, automated region delineation, and computer animation.
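
The optimization view described above can be illustrated on point-feature label placement. The following is a minimal sketch under stated assumptions, not the project's actual algorithm: the four candidate label positions, the fixed label size, the overlap-counting cost (a crude stand-in for perceptual quality), the linear cooling schedule, and all function names are hypothetical choices made for illustration. Simulated annealing is used here as one example of a stochastic search method.

```python
import math
import random

# Assumption: every label has the same size and may sit in one of four
# positions around its point (offsets of the label's lower-left corner).
LABEL_W, LABEL_H = 4.0, 1.0
OFFSETS = [(0.0, 0.0), (-LABEL_W, 0.0), (0.0, -LABEL_H), (-LABEL_W, -LABEL_H)]

def label_box(point, choice):
    """Return (xmin, ymin, xmax, ymax) of the label for a point."""
    dx, dy = OFFSETS[choice]
    x, y = point[0] + dx, point[1] + dy
    return (x, y, x + LABEL_W, y + LABEL_H)

def overlaps(a, b):
    """True if two axis-aligned boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def cost(points, placement):
    """Hypothetical quality measure: number of overlapping label pairs."""
    boxes = [label_box(p, c) for p, c in zip(points, placement)]
    return sum(
        overlaps(boxes[i], boxes[j])
        for i in range(len(boxes))
        for j in range(i + 1, len(boxes))
    )

def anneal(points, steps=2000, t0=2.0, seed=0):
    """Simulated annealing over label positions; returns (placement, cost)."""
    rng = random.Random(seed)
    current = [rng.randrange(len(OFFSETS)) for _ in points]
    current_cost = cost(points, current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        # Linear cooling; the small constant avoids division by zero.
        temperature = t0 * (1.0 - step / steps) + 1e-9
        i = rng.randrange(len(points))
        old = current[i]
        current[i] = rng.randrange(len(OFFSETS))
        new_cost = cost(points, current)
        delta = new_cost - current_cost
        # Always accept improvements; accept worsening moves with a
        # probability that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_cost = new_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        else:
            current[i] = old  # reject the move
    return best, best_cost
```

A call such as anneal([(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]) returns a list of position indices together with the number of remaining overlaps. A realistic cost function would also need to penalize labels that obscure map features and to express preferences among positions.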

Combinatorial Optimization

This research program -- viewing automated graphic design as combinatorial optimization and using stochastic methods for solution -- led naturally to investigations of stochastic methods for combinatorial optimization problems in general. To investigate the issues, we turned to some "pure" combinatorial optimization problems such as number and graph partitioning. Our studies showed that (contra the assumption in much work in stochastic optimization) variations in search space representation can be orders of magnitude more important than variations in search method. For the number partitioning problem, stochastic methods using a simple, direct representation performed several orders of magnitude worse than the best known deterministic heuristic regardless of the stochastic search method used. By changing representations, these same stochastic search methods, and even methods as trivial as "random generate and test", could be made to perform three orders of magnitude better than the best (previously) known method. From these studies, we were able to adduce several general principles for devising good representations for combinatorial optimization problems.

We have begun more detailed investigation applying the techniques we developed for number partitioning to other combinatorial optimization problems. In particular, we have begun an investigation of the problem of graph partitioning.
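
The role of representation in number partitioning can be made concrete with a small sketch. Everything named here is an assumption for illustration: the summary does not say which deterministic heuristic or which encodings were used. The sketch takes Karmarkar-Karp differencing as the deterministic baseline, a list of +1/-1 signs as the "direct" representation, and a "prepartitioning" of the numbers into groups (whose sums are then handed to the differencing heuristic) as one possible indirect representation. Both representations are searched by plain random generate and test.

```python
import heapq
import random

def kk_difference(numbers):
    """Karmarkar-Karp differencing: repeatedly replace the two largest
    values by their difference; the survivor is the partition difference."""
    heap = [-x for x in numbers]  # negate to get a max-heap from heapq
    heapq.heapify(heap)
    while len(heap) > 1:
        a = -heapq.heappop(heap)
        b = -heapq.heappop(heap)
        heapq.heappush(heap, -(a - b))
    return -heap[0] if heap else 0

def direct_difference(numbers, signs):
    """Direct representation: signs[i] in {+1, -1} picks the subset of
    numbers[i]; the cost is the absolute difference of the subset sums."""
    return abs(sum(s * x for s, x in zip(signs, numbers)))

def prepartition_difference(numbers, labels):
    """Indirect representation (an assumption of this sketch): numbers that
    share a label are forced into the same subset by summing them, and the
    differencing heuristic then partitions the group sums."""
    sums = {}
    for x, g in zip(numbers, labels):
        sums[g] = sums.get(g, 0) + x
    return kk_difference(list(sums.values()))

def random_direct(numbers, tries, rng):
    """Random generate and test over sign vectors."""
    best = None
    for _ in range(tries):
        signs = [rng.choice((-1, 1)) for _ in numbers]
        d = direct_difference(numbers, signs)
        if best is None or d < best:
            best = d
    return best

def random_prepartition(numbers, tries, rng):
    """Random generate and test over group labellings. Giving every number
    its own label reproduces plain differencing, so start from that."""
    n = len(numbers)
    best = kk_difference(numbers)
    for _ in range(tries):
        labels = [rng.randrange(n) for _ in numbers]
        best = min(best, prepartition_difference(numbers, labels))
    return best

if __name__ == "__main__":
    rng = random.Random(0)
    numbers = [rng.randrange(1, 10**10) for _ in range(50)]
    print("differencing alone:", kk_difference(numbers))
    print("direct, random    :", random_direct(numbers, 1000, rng))
    print("prepartition, rand:", random_prepartition(numbers, 1000, rng))
```

By construction, random_prepartition never returns a worse difference than kk_difference on the same input. How large the gap to the direct representation is depends on the instance and the number of tries, so the sketch should not be read as reproducing the reported results.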

PROJECT REFERENCES

The following papers were prepared with partial support from this grant.

Auslander, J., A. Fukunaga, H. Partovi, J. Christensen, L. Hsu, P. Reiss, A. Shuman, J. Marks, and T. Ngo. 1995. Further developments in automatic motion synthesis for articulated figures. To appear in ACM Transactions on Graphics.

Chen, Stanley F. 1993. Aligning sentences in bilingual corpora using lexical information. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pages 9-16, Ohio State University, Columbus, Ohio, June.

Chen, Stanley F. 1995. Bayesian grammar induction for language modeling. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, Cambridge, Massachusetts, June. Massachusetts Institute of Technology.

Christensen, Jon, Joe Marks, and Stuart M. Shieber. 1994. Placing text labels on maps and diagrams. In Paul Heckbert, editor, Graphics Gems IV. Academic Press, Cambridge, Massachusetts.

Christensen, Jon, Joe Marks, and Stuart M. Shieber. 1995. An empirical study of algorithms for point-feature label placement. To appear in ACM Transactions on Graphics.

Christensen, Jon. 1995. Managing Design Complexity: Using Stochastic Optimization in the Production of Computer Graphics. Ph.D. thesis, Harvard University, Cambridge, Massachusetts, June.

Dalrymple, Mary and Andrew Kehler. 1995. On the constraints imposed by respectively. Linguistic Inquiry, 26(3).

Kehler, Andrew. 1994a. Common topics and coherent situations: Interpreting ellipsis in the context of discourse inference. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, University of New Mexico, Las Cruces, New Mexico, June.

Kehler, Andrew. 1994b. Temporal relations: Reference or discourse coherence? In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (student session), University of New Mexico, Las Cruces, New Mexico, June.

Kehler, Andrew. 1995. Interpreting Cohesive Forms in the Context of Discourse Inference. Ph.D. thesis, Harvard University, Cambridge, Massachusetts, June.

Kehler, Andrew, Mary Dalrymple, John Lamping, and Vijay Saraswat. 1995. The semantics of resource sharing in lexical-functional grammar. In Proceedings of the Seventh Meeting of the European ACL, Dublin, Ireland, March.

Kosak, Corey, Joseph Marks, and Stuart Shieber. 1994. Automating the layout of network diagrams with specified visual organization. IEEE Transactions on Systems, Man, and Cybernetics, 24(3), March.

Ruml, Wheeler, J. Thomas Ngo, Joe Marks, and Stuart M. Shieber. 1996. Easily searched encodings of number partitioning. Journal of Optimization Theory and Applications, 80(2). To appear.

Ryall, Kathy, Stuart M. Shieber, Joe Marks, and Murray Mazer. 1995. Semi-automatic delineation of regions in floor plans. In Proceedings of the Third International Conference on Document Analysis and Recognition (ICDAR '95), Montreal, Canada, August.

Schabes, Yves and Stuart M. Shieber. 1994. An alternative conception of tree-adjoining derivation. Computational Linguistics, 20(1):91-124.

Shieber, Stuart M., Yves Schabes, and Fernando C. N. Pereira. 1995. Principles and implementation of deductive parsing. Journal of Logic Programming, 24(1-2):3-36.

Shieber, Stuart M. 1994. Restricting the weak-generative capacity of synchronous tree-adjoining grammars. Computational Intelligence, 10(4):371-385.

Shieber, Stuart M., Fernando C. N. Pereira, and Mary Dalrymple. To appear. Interactions of scope and ellipsis. Linguistics and Philosophy.
