COLLABORATIVE RESEARCH ON KNOWLEDGE ACQUISITION
FOR JAPANESE-ENGLISH MACHINE TRANSLATION

Michiko Kosaka

Monmouth University
Computer Science Department
West Long Branch, NJ 07764

CONTACT INFORMATION

Mail: Monmouth University, Computer Science Department, West Long Branch, NJ 07764

E-mail: kosaka@monmouth.edu

Phone: (908) 571-4493

FAX: (908) 571-3554

WWW PAGE

http://www.monmouth.edu/faculty/kosaka/

PROGRAM AREA

Speech and Natural Language Understanding.

KEYWORDS

Machine translation, natural language understanding, transfer, knowledge acquisition

PROJECT SUMMARY

Machine translation systems require large amounts of detailed information about the correspondences between expressions in the source and target languages. System performance is limited to a great extent by our ability to encode such information manually. As an alternative, this project will seek to gather such correspondences from a pair of parallel corpora in the source and target languages, and then generalize from these correspondences to create rules for use in a translation system.

The texts in the source and target languages will be parsed and then syntactically regularized, producing regularized parse trees. A tree-matching procedure will then align the corresponding trees from the source and target texts, producing a set of detailed correspondences between source and target structures. A set of sublanguage (semantic) word classes will be defined, and to the extent possible these correspondences will be generalized using these word classes; the result will be a set of rules for the transfer phase of a translation system.
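The pipeline described above can be illustrated with a small sketch. It is not the project's implementation: the tree representation, the toy bilingual lexicon, the word-class names (ROUTINE, DATA), and the functions align and generalize are all assumptions made for illustration. The sketch matches two regularized trees top-down using the lexicon, then replaces the aligned argument words with their word classes to form one transfer rule.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Node of a regularized parse tree: a head word plus role-labeled dependents."""
    head: str
    children: dict = field(default_factory=dict)  # role name -> Node


# Hypothetical toy data (not from the project): a tiny Japanese-English
# lexicon and sublanguage word classes for a programming-manual domain.
LEXICON = {"kaesu": {"return"}, "kansuu": {"function"}, "atai": {"value"}}
JA_CLASSES = {"kansuu": "ROUTINE", "atai": "DATA"}
EN_CLASSES = {"function": "ROUTINE", "value": "DATA"}


def align(src, tgt, lexicon):
    """Match two trees top-down; return the list of aligned (src, tgt) node pairs."""
    if tgt.head not in lexicon.get(src.head, set()):
        return []  # heads are not known translations, so nothing below is aligned
    pairs = [(src, tgt)]
    used = set()  # target roles already claimed by a source dependent
    for s_child in src.children.values():
        for t_role, t_child in tgt.children.items():
            if t_role in used:
                continue
            sub = align(s_child, t_child, lexicon)
            if sub:
                used.add(t_role)
                pairs.extend(sub)
                break
    return pairs


def generalize(src, tgt, lexicon, src_classes, tgt_classes):
    """Build one transfer rule from an aligned node pair.

    Each aligned argument word is replaced by its word class when one is
    defined; otherwise the word itself is kept.
    """
    slots = []
    for s_role, s_child in src.children.items():
        for t_role, t_child in tgt.children.items():
            if t_child.head in lexicon.get(s_child.head, set()):
                slots.append((
                    s_role, src_classes.get(s_child.head, s_child.head),
                    t_role, tgt_classes.get(t_child.head, t_child.head),
                ))
    return (src.head, tgt.head, tuple(sorted(slots)))


# Regularized trees for "kansuu ga atai wo kaesu" / "the function returns a value".
ja = Node("kaesu", {"subj": Node("kansuu"), "obj": Node("atai")})
en = Node("return", {"subj": Node("function"), "obj": Node("value")})

pairs = align(ja, en, LEXICON)
rule = generalize(ja, en, LEXICON, JA_CLASSES, EN_CLASSES)
print([(s.head, t.head) for s, t in pairs])
print(rule)
```

Because the rule is stated over word classes rather than individual words, it would also cover other sentence pairs whose arguments fall in the same classes, which is the point of the generalization step. A realistic matcher would need to handle structural mismatches between the two trees, which this sketch ignores.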

This approach will be evaluated using corresponding programming language manuals in Japanese and English.

PROJECT REFERENCES

J. Barnett, I. Mani, E. Rich, C. Aone, K. Knight, J. Martinez. Capturing Language-Specific Semantic Distinctions in Interlingua-Based MT. In Proc. of Machine Translation Summit III, Washington, D. C., 1991, 25-32.

P. F. Brown, J. Cocke, S. A. Della Pietra, V. J. Della Pietra, F. Jelinek, J. D. Lafferty, R. L. Mercer, and P. S. Roossin. A Statistical Approach to Machine Translation. In Computational Linguistics, 16 (2), 1990.

P. F. Brown, J. C. Lai, and R. L. Mercer. Aligning Sentences in Parallel Corpora. In Proc. 29th Annl. Meeting Assn. for Computational Linguistics, Berkeley, CA, 1991, 169-176.

Jing-Shin Chang, Yih-Fen Luo, and Keh-Yih Su. GPSM: A Generalized Probabilistic Semantic Model for Ambiguity Resolution. In Proc. 30th Ann. Meeting Assn. for Computational Linguistics, Newark, DE, 1992, 177-184.

Mahesh Chitrao. Statistical Techniques for Parsing Messages. Doctoral dissertation, Dept. of Computer Science, New York University, December 1990.

Mahesh Chitrao and Ralph Grishman. Statistical Parsing of Messages. In Proc. Speech and Natural Language Workshop, Hidden Valley, PA, June 24-27, 1990, Morgan Kaufmann Publishers.

B. Dorr. Solving Thematic Divergences in Machine Translation. In Proc. 28th Annl. Meeting Assn. for Computational Linguistics, Pittsburgh, PA, 1990, 127-134.

W. A. Gale and K. W. Church. A Program for Aligning Sentences in Bilingual Corpora. In Proc. 29th Annl. Meeting Assn. for Computational Linguistics, Berkeley, CA, 1991, 177-184.

R. Grishman, L. Hirschman, and N. T. Nhan. Discovery Procedures for Sublanguage Selectional Patterns: Initial Experiments. In Computational Linguistics, 12, (3), 1986, 205-216.

R. Grishman and M. Kosaka. Combining Rationalist and Empiricist Approaches to Machine Translation. In Proc. 4th International Conf. on Theoretical and Methodological Issues in Machine Translation, Montreal, Canada, 1992, 263-274.

R. Grishman and J. Sterling. Acquisition of Selectional Patterns. In Proc. of the 14th International Conf. on Computational Linguistics (COLING 92), Nantes, France, 1992.

Z. Harris. Mathematical structures of language. New York: Wiley Interscience, 1968.

D. Hindle and M. Rooth. Structural Ambiguity and Lexical Relations. In Proc. of the 29th Annual Meeting of the Association for Computational Linguistics, 1991, 229-236.

L. Hirschman, R. Grishman, and N. Sager. Grammatically-based Automatic Word Class Formation. In Information Processing and Management, 11 (1/2), 1975, 39-57.

L. Hirschman. Discovering sublanguage structures. In R. Grishman and R. Kittredge (Eds.). Analyzing language in restricted domains: Sublanguage description and processing. Hillsdale, NJ: Erlbaum, 1986, 211-234.

H. Kaji, Y. Kida, and Y. Morimoto. Learning Translation Templates from Bilingual Text. In Proc. of 14th International Conf. on Computational Linguistics, Nantes, 1992, 672-678.

M. Kameyama, R. Ochitani, S. Peters. Resolving Translation Mismatches with Information Flow. In Proc. of 29th Annual Meeting of the Assoc. for Computational Linguistics, Berkeley, CA, 1991, 193-200.

M. Kosaka, V. Teller, and R. Grishman. A Sublanguage Approach to Japanese-English Machine Translation. In New Directions in Machine Translation, D. Maxwell, K. Schubert and A. Witkam, eds., 109-121. Dordrecht: Foris, 1988.

W. Lehnert and B. Sundheim. An Evaluation of Text Analysis Technologies. In AI Magazine, 12 (3), 1991, 81-94.

M. Nagao, J. Tsujii and J. Nakamura. The Japanese Government Project for Machine Translation. In Computational Linguistics, 11 (2-3), 1985, 91-110.

M. Nagao. Role of Structural Transformation in a Machine Translation System. In Machine Translation, S. Nirenburg, editor, Cambridge: Cambridge University Press, 1987.

J. Pustejovsky. The Generative Lexicon. In Computational Linguistics, 17 (4), 1991, 409-441.

V. Sadler and R. Vendelmans. Pilot Implementation of a Bilingual Knowledge Bank. In Proc. of the 13th International Conf. on Computational Linguistics, Helsinki, 1990, 449-451.

N. Sager. Natural Language Information Processing. Reading, MA: Addison-Wesley, 1981.

S. Sato and M. Nagao. Toward Memory-based Translation. In Proc. 13th International Conf. on Computational Linguistics, Helsinki, 1990, 247-252.

S. Sekine, J. Carroll, S. Ananiadou, and J. Tsujii. Automatic Learning for Semantic Collocation. In Proc. 3rd Conf. Applied Natural Language Processing, Trento, 1992, 104-110.

S. Sekine, S. Ananiadou, J. Carroll, and J. Tsujii. Linguistic Knowledge Generator. In Proc. 14th International Conf. on Computational Linguistics, Nantes, France, 1992.

E. Sumita and H. Iida. Experiments and Prospects of Example-Based Machine Translation. In Proc. 29th Annual Meeting Assn. for Computational Linguistics, Berkeley, CA, 1991.

V. Teller, M. Kosaka, and R. Grishman. A Comparative Study of Japanese and English Sublanguage Patterns. In Proc. Second Int'l Conf. on Theoretical and Methodological Issues in Machine Translation of Natural Languages, 1988.

J. Tsujii. Future Directions of Machine Translation. In Proc. 11th International Conf. on Computational Linguistics (COLING 86), Bonn, 1986.

T. Tsutsumi. Wide-Range Restructuring of Intermediate Representations in Machine Translation. In Computational Linguistics, 16 (2), 1990, 71-78.

T. Utsuro, Y. Matsumoto, M. Nagao. Lexical Knowledge Acquisition from Bilingual Corpora. In Proc. 14th International Conf. on Computational Linguistics, Nantes, 1992, 581-587.

P. Velardi, M. Pazienza and M. Fasolo. How to Encode Semantic Knowledge: A Method for Meaning Representation and Computer-Aided Acquisition. In Computational Linguistics, 17 (2), 1991, 153-170.

U. Zernik. Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, Hillsdale, NJ: Lawrence Erlbaum Assoc., 1991.

AREA BACKGROUND

Machine translation systems are built primarily on one of three models: direct, transfer, and interlingua. Speech translation and text translation also differ in significant ways. Our work addresses text translation within the transfer architecture.

AREA REFERENCES

M. Nagao. Machine Translation: How Far Can It Go? Oxford: Oxford University Press, 1989.

J. Allen. Natural Language Understanding. Redwood City, CA: Benjamin/Cummings, 1995.

RELATED PROGRAM AREAS

Adaptive Human Interfaces

Intelligent Interactive Systems for Persons with Disabilities