TOWARDS LINGUISTICALLY AND GRAPHICALLY
ARTICULATE COMPUTERS
Stuart Shieber
Division of Applied Sciences
Harvard University
33 Oxford Street -- 14
Cambridge, MA 02138
CONTACT INFORMATION
Email: shieber@das.harvard.edu
Telephone: (617)495-2344
Fax: (617)495-2344
WWW PAGE
http://das-www/users/faculty/Stuart_Shieber/Stuart_Shieber.html
PROGRAM AREA
Speech and Natural Language Understanding.
KEYWORDS
Computational linguistics, natural-language processing, automated
graphic design, combinatorial optimization
PROJECT SUMMARY
Documents are human artifacts used for human communication. In the
information age, it would be attractive for computers to be
document-fluent just as people are, and just as people make use of
both textual and graphical languages in documents, we would like
computers to be both linguistically and graphically articulate, if
they are to aid in document processing in its widest sense.
The long-term goal of the research funded by the Interactive Systems
Program as Presidential Faculty Fellow Award IRI-9350192 is the
development of linguistic and graphical articulateness in computers.
The path we are pursuing to this goal is through research on
computational linguistics and automated graphic design.
Computational Linguistics
Towards the long-term goal of linguistically articulate computers,
we address three central problems confronting researchers in
computational linguistics: robustness, fluency, and modularity. These
issues can be addressed through the use of systems of constraints as the
underlying method for encoding the structures of natural language.
-
Robustness:
Natural-language-processing systems tend to be
fragile, especially in the face of novel or unknown aspects of
language. Grammatical formalisms assume that all constraints are
categorical as opposed to defeasible in some sense, leading to fragile
behavior. Probabilistic systems have, in the past, ignored the
grammatical structure of language, leading to intrinsic limitations on
their accuracy. We are exploring the integration of these two models
building on recent work on statistical analysis of natural-language
corpora.
-
Fluency:
Systems must become more fluent in reconstructing
the ubiquitous implicit relationships that hold within and among
sentences. These relationships are perhaps best exemplified by
elliptical and coordinate constructions, which are found universally
in language and which have been among the most intransigent problems
in natural-language processing. These phenomena seem to require the
ability to abstract higher-order relationships from concrete
first-order patterns; such an ability is foreign to the basically
first-order nature of constraint-based formalisms. We are studying
the addition of higher-order constraints that may allow for solution
of some of these problematic cases.
-
Modularity:
To make natural-language-processing systems
easier to build and extend, they must be structured in a more modular
fashion. One of the motivating features of constraint-based
formalisms is the observation that, informally speaking, the
interaction of independent constraints increases expressivity
geometrically. It is thus worthwhile examining new methods of
factoring linguistic constraints in such a way that linguistic
descriptions can be further simplified. For instance, the factoring
of phrase-structure information postulated in categorial or
tree-adjoining grammars can eliminate the need for many of the
constraints found in more traditional constraint-based formalisms. We
have investigated the space of possibilities to determine where more
modular and expressive languages can be constructed without
sacrificing computational effectiveness.
Automated Graphic Design
Towards the goal of graphically articulate computers, we have been
developing methods for automated and semi-automated graphic design;
the ultimate goal is to allow computers to communicate using
informational graphics. As human beings have been using natural
language for perhaps hundreds of thousands of years, but widespread
use of symbolic graphical languages dates from only the late 18th
century, graphical artifacts are quite a bit more conventional,
providing some basis for the expectation that building a graphically
articulate computer may be more practical in the shorter term than
building a linguistically articulate one. However, many
graphic-design problems, such as placing labels on maps or laying out
nodes in a network diagram, are NP-hard, hence undoubtedly
intractable. The problem is not merely theoretical. For instance, it
has been reported that as much as half of the time in designing
production-quality maps is taken up by label placement, with good
cartographers able to place only 20 to 30 labels per hour on average.
Our approach has been to view graphic design problems as a kind of
combinatorial optimization problem, where the combinatorics is derived
from the space of possible graphical artifacts and the function to be
optimized is the perceptual quality of the generated artifact. Many
simple graphic design problems can be viewed in this way: chart
design, graph layout, label placement, page layout, and so forth.
Because the functions to be optimized are quite complex (since tied to
the idiosyncrasies of perception), and the combinatorics are
intractable, we have concentrated on weak heuristic methods such as
stochastic search for solving them.
To date, we have addressed the following graphic design problems in
this way: cartographic label placement, page layout, automated region
delineation, computer animation.
Combinatorial Optimization
This research program -- viewing automated graphic design as
combinatorial optimization and using stochastic methods for solution
-- led naturally to investigations of stochastic methods for
combinatorial optimization problems in general. To investigate the
issues, we turned to some ``pure'' combinatorial optimization problems
such as number and graph partitioning. Our studies showed that
(contra the assumption in much work in stochastic optimization)
variations in search space representation can be orders of magnitude
more important than variations in search method. For the number
partitioning problem, stochastic methods using a simple, direct
representation performed several orders of magnitude worse than the
best known deterministic heuristic regardless of the stochastic search
method used. By changing representations, these same stochastic
search methods, and even methods as trivial as ``random generate and
test'' could be made to perform three orders of magnitude better than
the best (previously) known method. From these studies, we were able
to adduce several general principles for devising good representations
for combinatorial optimization problems.
We have begun more detailed investigation applying the techniques we
developed for number partitioning to other combinatorial optimization
problems. In particular, we have begun an investigation of the
problem of graph partitioning.
PROJECT REFERENCES
The following papers were prepared with partial support from this grant.
Auslander, J., A. Fukunaga, H. Partovi, J. Christensen, L. Hsu,
P. Reiss, A. Shuman, J. Marks, and T. Ngo. 1995. Further developments
in automatic motion synthesis for articulated figures. To appear in ACM
Transactions on Graphics.
Chen, Stanley F. 1993. Aligning sentences in bilingual corpora using
lexical information. In Proceedings of the 31st Annual Meeting of the
Association for Computational Linguistics, pages 9-16, Ohio State
University, Columbus, Ohio, June.
Chen, Stanley F. 1995. Bayesian grammar induction for language
modeling. In Proceedings of the 33rd Annual Meeting of the Association
for Computational Linguistics, Cambridge, Massachusetts, June.
Massachusetts Institute of Technology.
Christensen, Jon, Joe Marks, and Stuart M. Shieber. 1994. Placing text
labels on maps and diagrams. In Paul Heckbert, editor, Graphics Gems
IV. Academic Press, Cambridge, Massachusetts.
Christensen, Jon, Joe Marks, and Stuart M. Shieber. 1995. An empirical
study of algorithms for point-feature label placement. To appear in ACM
Transactions on Graphics.
Christensen, Jon. 1995. Managing Design Complexity: Using Stochastic
Optimization in the Production of Computer Graphics. Ph.D. thesis,
Harvard University, Cambridge, Massachusetts, June.
Dalrymple, Mary and Andrew Kehler. 1995. On the constraints imposed by
respectively. Linguistic Inquiry, 26(3).
Kehler, Andrew. 1994a. Common topics and coherent situations:
Interpreting ellipsis in the context of discourse inference. In
Proceedings of the 32nd Annual Conference of the Association for
Computational Linguistics, University of New Mexico, Las Cruces, New
Mexico, June.
Kehler, Andrew. 1994b. Temporal relations: Reference or discourse
coherence? In Proceedings of the 32nd Annual Conference of the
Association for Computational Linguistics (student session), University
of New Mexico, Las Cruces, New Mexico, June.
Kehler, Andrew. 1995. Interpreting Cohesive Forms in the Context of
Discourse Inference. Ph.D. thesis, Harvard University, Cambridge,
Massachusetts, June.
Kehler, Andrew, Mary Dalrymple, John Lamping, and Vijay Saraswat. 1995.
The semantics of resource sharing in lexical-functional grammar. In
Proceedings of the Seventh Meeting of the European ACL, Dublin, Ireland,
March.
Kosak, Corey, Joseph Marks, and Stuart Shieber. 1994. Automating the
layout of network diagrams with specified visual organization.
Transactions on Systems, Man and Cybernetics, 24(3), March.
Ruml, Wheeler, J. Thomas Ngo, Joe Marks, and Stuart M. Shieber. 1996.
Easily searched encodings of number partitioning. Journal of
Optimization Theory and Applications, 80(2). To appear.
Ryall, Kathy, Stuart M. Shieber, Joe Marks, and Murray Mazer. 1995.
Semi-automatic delineation of regions in floor plans. In Proceedings of
the Third International Conference on Document Analysis and Recognition
(ICDAR '95), Montreal, Canada, August.
Schabes, Yves and Stuart M. Shieber. 1994. An alternative conception
of tree-adjoining derivation. Computational Linguistics, 20(1):91-124.
Shieber, Stuart M., Yves Schabes, and Fernando C. N. Pereira. 1995.
Principles and implementation of deductive parsing. Journal of Logic
Programming, 24(1-2):3-36, 1995.
Shieber, Stuart M. 1994. Restricting the weak-generative capacity of
synchronous tree-adjoining grammars. Computational Intelligence,
10(4):371-385.
Shieber, Stuart M., Fernando C. N. Pereira, and Mary Dalrymple. To
appear. Interactions of scope and ellipsis. Linguistics and
Philosophy.