Villanova University
Department of Computing Science
Villanova, PA 19085-1699
The system will be developed using the Penn Treebank, a syntactically annotated corpus of several million words drawn from texts of widely varying styles. The proposed project thus depends crucially on newly available syntactically annotated treebanks and their associated tools. The system will be tested on examples from the Treebank to determine the percentage of cases in which ellipsis is correctly resolved. The goal is a success rate of 95%, which preliminary research suggests is realistic.
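The evaluation measure described above, the percentage of cases in which ellipsis is correctly resolved, can be illustrated with a minimal Python sketch. It is not the proposed system: the record type `EllipsisCase`, its field names, and the toy examples are all hypothetical, and it assumes that a resolution counts as correct when the predicted antecedent exactly matches a human-annotated one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EllipsisCase:
    """One elliptical verb phrase; all field names are hypothetical."""
    case_id: str
    gold_antecedent: str       # antecedent chosen by a human annotator
    predicted_antecedent: str  # antecedent chosen by the system


def success_rate(cases):
    """Return the fraction of cases whose predicted antecedent matches the gold one."""
    if not cases:
        raise ValueError("no cases to evaluate")
    correct = sum(1 for c in cases if c.predicted_antecedent == c.gold_antecedent)
    return correct / len(cases)


TARGET = 0.95  # the success rate stated as the project goal

# Toy, invented examples for illustration only (not Treebank data).
cases = [
    EllipsisCase("ex1", "left early", "left early"),
    EllipsisCase("ex2", "read the report", "read the report"),
    EllipsisCase("ex3", "call the office", "sign the form"),
    EllipsisCase("ex4", "finish the draft", "finish the draft"),
]

rate = success_rate(cases)
print(f"{rate:.1%} correct; goal met: {rate >= TARGET}")
```

Exact string matching is the simplest possible criterion; a real evaluation would more plausibly compare antecedent nodes in the annotated trees.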
The system would represent a practical solution to a problem that confronts virtually any Natural Language Processing application that attempts to process English in a realistic setting. In addition, the system could be used as a tool in further annotating treebanks with ellipsis resolution information. Finally, the project would produce a massive amount of valuable data for theoreticians studying ellipsis and related phenomena.
The education activities planned for the proposed award period are: developing and teaching graduate and undergraduate courses in Natural Language Processing (NLP), developing and teaching a variety of courses for non-majors, and teaching graduate and undergraduate courses in programming languages.
Hardt, Daniel. 1994. Sense and Reference in Dynamic Semantics. Proceedings of the Ninth Amsterdam Colloquium. Amsterdam, Netherlands.
Hardt, Daniel. 1993. Verb Phrase Ellipsis: Form, Meaning, and Processing. Ph.D. Dissertation. University of Pennsylvania.
Hardt, Daniel. 1992. VP Ellipsis and Contextual Interpretation. Proceedings of the Fifteenth International Conference on Computational Linguistics. Nantes, France.
Hardt, Daniel. 1992. An Algorithm for VP Ellipsis. Proceedings, 30th Annual Meeting of the Association for Computational Linguistics. Newark, DE.
Hardt, Daniel. 1992. Some Problematic Cases of VP Ellipsis. Proceedings, 30th Annual Meeting of the Association for Computational Linguistics. Newark, DE.
Hardt, Daniel. 1992. VP Ellipsis and Semantic Identity. Proceedings of the Second Conference on Semantics and Linguistic Theory. Edited by Chris Barker and David Dowty. Columbus, OH.
Hardt, Daniel. 1991. A Discourse Model Approach to VP Ellipsis. Proceedings AAAI Symposium on Discourse Structure in Natural Language Understanding and Generation. Asilomar, CA.
Hardt, Daniel. 1991. Towards a Discourse Level Account of VP Ellipsis. Proceedings of the 8th Eastern States Conference on Linguistics. G. Westphal, J. Dai and B. Ao (editors). Ohio State University.
The field of NLP can be divided into three sub-areas: Syntactic Processing, which involves determining a parse tree for a given sentence; Semantic Interpretation, which involves determining the truth conditions, or logical structure, of a sentence; and Context and World Knowledge, which involves integrating sentences with the surrounding discourse as well as with the general situation. The study of discourse context has long been important in NLP, and over the past two decades the emphasis on discourse issues in NLP has contributed to a re-evaluation of the emphasis on individual sentences in theoretical linguistics, as seen in theories such as Discourse Representation Theory and Dynamic Semantics. Probably the most important current development in the field is the increased use of on-line corpora (that is, large bodies of text) as a means of developing and evaluating NL systems and theories of all kinds. I believe this development presents the opportunity to place the study of language, both computational and theoretical, on a sound empirical basis for the first time.
Pereira, Fernando and Stuart Shieber. 1987. Prolog and Natural Language Analysis. Chicago: Chicago University Press.