Building knowledge bases for natural language understanding

February 22, 2018
2:50pm - 4:00pm
Halligan 102

Abstract

One of the ways we can formulate natural language understanding is by treating it as a task of mapping natural language text to its meaning representation: entities and relations anchored to the world. Knowledge bases (KBs) can facilitate natural language understanding by mapping words to their meaning representations, for example, nouns to entities and verbs to relations. State-of-the-art projects such as NELL, Freebase, and YAGO have been successful at constructing such knowledge bases, which contain beliefs about real-world entities and relations, by leveraging the redundancy of millions of documents to detect language patterns. The accumulated knowledge has been used to improve the ability of intelligent systems to make inferences. Under multilingual and multimodal settings, knowledge bases present a virtuous learning opportunity: more beliefs, with higher confidence, can be extracted by processing data in more languages or modalities; in turn, since the entities and relations in a KB exist in the world regardless of the language or modality used to express them, KBs can act as an interlingua for relating corpora in different languages and modalities through shared KB entities and relations. This is especially useful for low-resource languages, where there are few, if any, aligned bilingual texts to support natural language processing (NLP) tasks such as machine translation or cross-lingual disambiguation. In this talk, I will elaborate on this virtuous circle, starting with building knowledge bases that map verbs to real-world relations, followed by results on using knowledge bases to translate words from monolingual-only corpora.
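As a loose illustration of the interlingua idea described above (a toy sketch, not material from the talk): if words in two languages are independently linked to the same KB entity IDs, those shared IDs relate the two vocabularies even without aligned bilingual text. All entity IDs and word entries below are hypothetical examples.

```python
# Toy sketch of a KB acting as interlingua between languages.
# Words in each language are linked to shared (hypothetical) KB entity IDs;
# words in two languages that resolve to the same entity become
# translation candidates, with no parallel corpus required.

KB_LINKS = {
    "en": {"river": "E:River", "riverbank": "E:Riverbank"},
    "id": {"sungai": "E:River", "tepi sungai": "E:Riverbank"},
}

def translation_candidates(word, src, tgt):
    """Return target-language words linked to the same KB entity as `word`."""
    entity = KB_LINKS[src].get(word)
    if entity is None:
        return []
    return [w for w, e in KB_LINKS[tgt].items() if e == entity]

print(translation_candidates("sungai", "id", "en"))  # ['river']
```

In practice the links on each side would be learned from monolingual corpora, which is what makes the circle virtuous: better KB coverage yields more cross-lingual anchors, and more anchors yield better extraction.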

Bio: Derry Wijaya is a postdoctoral researcher at the University of Pennsylvania. Her research interests include machine learning, natural language processing, and data mining. She works with Professor Chris Callison-Burch on using machine learning to build computer systems that intelligently process and understand human languages, particularly in low-resource and multilingual settings. She received her Ph.D. from Carnegie Mellon University, working with Professor Tom Mitchell on the Never-Ending Language Learning (NELL) project, and her MSc and Bachelor of Computing from the National University of Singapore.