ELLIPSIS RESOLUTION IN ENGLISH

Daniel Hardt

Villanova University
Department of Computing Science
Villanova, PA 19085-1699

CONTACT INFORMATION

hardt@vill.edu
610-519-7337
610-519-7889 (fax)

WWW PAGE

http://renoir.vill.edu/faculty/hardt/html/home.html

PROGRAM AREA

Speech and Natural Language Understanding.

KEYWORDS

Corpus, Ellipsis, Discourse, NLP

PROJECT SUMMARY

The objective of the proposed research is to develop a system that reliably resolves ellipsis in English. The system will accept syntactically annotated input, and it will produce output in which elliptical expressions are resolved, either by producing a non-elliptical paraphrase or by linking the elliptical expression to its antecedent.
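As an illustration only, the two output modes described above (linking and paraphrase) might be sketched as follows. The token-span representation, the class name EllipsisSite, and the functions link and paraphrase are hypothetical and are not part of the proposed system; the antecedent is supplied by hand here, whereas finding it is the actual research problem.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EllipsisSite:
    """Hypothetical record for one VP ellipsis occurrence."""
    aux_index: int               # token position of the stranded auxiliary
    antecedent: Tuple[int, int]  # [start, end) token span of the antecedent VP


def link(tokens: List[str], site: EllipsisSite) -> List[str]:
    """Linking mode: return the antecedent VP the ellipsis points to."""
    start, end = site.antecedent
    return tokens[start:end]


def paraphrase(tokens: List[str], site: EllipsisSite) -> List[str]:
    """Paraphrase mode: copy the antecedent VP in after the auxiliary."""
    cut = site.aux_index + 1
    return tokens[:cut] + link(tokens, site) + tokens[cut:]


# Token positions: 0 John, 1 can, 2 swim, 3 ",", 4 and, 5 Mary, 6 can, 7 too
tokens = "John can swim , and Mary can too".split()
site = EllipsisSite(aux_index=6, antecedent=(2, 3))

print(" ".join(link(tokens, site)))
print(" ".join(paraphrase(tokens, site)))
```

The example is chosen so that the antecedent can be copied verbatim. Cases that need morphological adjustment (for instance "John left, and Mary did too") are deliberately outside this sketch.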

The system will be developed using the Penn Treebank, a syntactically annotated corpus of several million words, containing a wide range of texts of varying styles. The proposed project makes crucial use of newly available syntactically annotated Treebanks and associated tools. The system will be tested on examples from the Treebank to determine the percentage of cases in which ellipsis is correctly resolved. The goal is to achieve a success rate of 95%. Preliminary research suggests that this goal is realistic.
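The success rate mentioned above could be computed along the following lines. This is a minimal sketch under assumptions: the proposal does not define what counts as "correctly resolved", so exact match between a predicted and a hand-annotated antecedent span is assumed here, and the function name success_rate and the sample spans are invented for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # hypothetical [start, end) antecedent span


def success_rate(predicted: List[Span], gold: List[Span]) -> float:
    """Percentage of ellipsis cases whose predicted antecedent equals the gold one."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("no test cases supplied")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Invented sample data: four ellipsis cases, one resolved incorrectly.
gold = [(2, 3), (0, 4), (5, 7), (1, 2)]
predicted = [(2, 3), (0, 4), (5, 6), (1, 2)]

rate = success_rate(predicted, gold)
print(f"{rate:.1f}% correct; meets the 95% goal: {rate >= 95.0}")
```

A looser criterion, such as matching only the head verb of the antecedent, would give a different figure, so the matching rule needs to be fixed before results are compared.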

The system would represent a practical solution to a problem that confronts virtually any Natural Language Processing application that attempts to process English in a realistic setting. In addition, the system could be used as a tool in further annotating treebanks with ellipsis resolution information. Finally, the project would produce a massive amount of valuable data for theoreticians studying ellipsis and related phenomena.

The education activities planned for the proposed award period are teaching and developing a graduate and undergraduate course in Natural Language Processing (NLP), teaching and developing a variety of courses for non-majors, and teaching graduate and undergraduate courses in programming languages.

PROJECT REFERENCES

Hardt, Daniel. 1995. An Empirical Approach to VP Ellipsis. AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation.

Hardt, Daniel. 1994. Sense and Reference in Dynamic Semantics. Proceedings of the Ninth Amsterdam Colloquium. Amsterdam, Netherlands.

Hardt, Daniel. 1993. Verb Phrase Ellipsis: Form, Meaning, and Processing. Ph.D. Dissertation. University of Pennsylvania.

Hardt, Daniel. 1992. VP Ellipsis and Contextual Interpretation. Proceedings of the Fifteenth International Conference on Computational Linguistics. Nantes, France.

Hardt, Daniel. 1992. An Algorithm for VP Ellipsis. Proceedings, 30th Annual Meeting of the Association for Computational Linguistics. Newark, DE.

Hardt, Daniel. 1992. Some Problematic Cases of VP Ellipsis. Proceedings, 30th Annual Meeting of the Association for Computational Linguistics. Newark, DE.

Hardt, Daniel. 1992. VP Ellipsis and Semantic Identity. Proceedings of the Second Conference on Semantics and Linguistic Theory. Edited by Chris Barker and David Dowty. Columbus, OH.

Hardt, Daniel. 1991. A Discourse Model Approach to VP Ellipsis. Proceedings AAAI Symposium on Discourse Structure in Natural Language Understanding and Generation. Asilomar, CA.

Hardt, Daniel. 1991. Towards a Discourse Level Account of VP Ellipsis. Proceedings of the 8th Eastern States Conference on Linguistics. G. Westphal, J. Dai and B. Ao (editors). Ohio State University.

AREA BACKGROUND

The ultimate goal of Natural Language Processing (NLP) is the development of computer systems that can communicate using Natural Language. Closely related to this practical goal is a more theoretical one: the development of an adequate model of the human cognitive abilities underlying the use of language. There is a synergistic, if often contentious, relationship between the computational activities in NLP and the more theoretical inquiries of linguists, logicians, and others. In NLP we rely on concepts, methods, and frameworks developed by the theoreticians, but we also often cause the theories to be modified, extended, and improved.

The field of NLP can be divided into three sub-areas: Syntactic Processing, which involves the determination of a parse tree for a given sentence; Semantic Interpretation, which involves determining the truth conditions, or logical structure, of a sentence; and Context and World Knowledge, which involves the integration of sentences with the surrounding discourse as well as the general situation. The study of discourse context has long been important in NLP, and in the past two decades, the emphasis on discourse issues in NLP has contributed to a re-evaluation of the emphasis on individual sentences in theoretical linguistics, as seen in theories such as Discourse Representation Theory and Dynamic Semantics. Probably the most important current development in the field is the increased use of on-line corpora (that is, large bodies of text) as a means of developing and evaluating NL systems and theories of all kinds. I believe this development presents the opportunity to place the study of language, both computational and theoretical, on a sound empirical basis for the first time.

AREA REFERENCES

Allen, James. 1995. Natural Language Understanding. Benjamin/Cummings Publishing.

Pereira, Fernando and Stuart Shieber. 1987. Prolog and Natural-Language Analysis. Chicago: University of Chicago Press.

RELATED PROGRAM AREAS

4. Adaptive Human Interfaces. 5. Usability and User-Centered Design. 6. Intelligent Interactive Systems for Persons with Disabilities.

POTENTIAL RELATED PROJECTS

The proposed project will result in a system that resolves a variety of well-defined forms of ellipsis in English. This might be of use in the area "Usability and User-Centered Design". It is often more natural for humans to use elliptical or reduced forms of input. The proposed system could be used to make this possible, if a basic syntactic structure for the input is provided. This could be done either through the use of a broad-coverage parser for English, or by restricting the input language to a parsable fragment of English. In either case, the possibility of elliptical input might enhance the "cognitive ergonomics" of the system. These issues might also be relevant for the areas of "Adaptive Human Interfaces" and "Intelligent Interactive Systems for Persons with Disabilities".