Knowledge Graph learning using Open Information Extraction

September 13, 2018
3:00 PM
Halligan 102
Speaker: Damir Cavar, Indiana University
Host: Matthias Scheutz

Abstract

Using Knowledge Graphs as encodings of world knowledge in domains like conversational agents and dialog systems is a very common approach in AI-related systems. The automatic acquisition of coherent and consistent Knowledge Graphs from spoken-language conversations or unstructured text sources is the main topic of this presentation. We will discuss an ongoing project at IU on Open Information Extraction (OpenIE) from unstructured text sources. Our approach is based on deep linguistic processing of text using natural language processing (NLP). Core semantic relations from unstructured text sources are mapped at the clause and sentence level onto structured and well-defined graph representations. My goal is to experiment with detailed concept and relation extraction using NLP technologies in a way that is maximally language-agnostic.

Based on the extracted graphs of entities and relations, we link concepts to large Knowledge Graphs using resources such as YAGO or the Microsoft Concept Graph. Entity-link disambiguation is based on vector-based concept and hypernym similarity computation, using word embeddings and graph-based algorithms (e.g. Edge2Vec, Structure2Vec). For linking extracted entity relations to semantic predicate encodings, we utilize VerbNet and PropBank (as well as the Universal Proposition Bank). These large general Knowledge Graphs and verb or predicate semantics databases allow us to link entities and relations from multilingual texts to unique, language-independent semantic identifiers in the resulting graphs, yielding a language-independent graph representation of concepts and relations extracted from unstructured text sources of different linguistic origins. In addition to linking with general KGs, we foresee the possibility of linking concepts and relations to domain-specific taxonomies, ontologies, or KGs, e.g. for the medical, legal, business, or cybersecurity domains.
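The embedding-based entity-link disambiguation step can be sketched roughly as follows. This is a minimal illustration only: the YAGO-style identifiers and the tiny three-dimensional vectors are invented stand-ins for the pretrained word embeddings and graph-based representations (e.g. Edge2Vec, Structure2Vec) the abstract refers to, not the actual system.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def disambiguate(mention_vec, candidates):
    """Pick the candidate concept whose embedding is closest to the mention context."""
    return max(candidates, key=lambda cid: cosine(mention_vec, candidates[cid]))

# Toy 3-dimensional "embeddings" (a real system would use pretrained vectors).
candidates = {
    "yago:Jaguar_(animal)": [0.9, 0.1, 0.0],
    "yago:Jaguar_Cars":     [0.1, 0.9, 0.2],
}
# Context vector for the mention "jaguar" in a wildlife-themed sentence.
mention_context = [0.85, 0.15, 0.05]

print(disambiguate(mention_context, candidates))  # → yago:Jaguar_(animal)
```

The same similarity machinery extends naturally to hypernym comparison: a candidate whose hypernym vectors also align with the mention's context scores higher.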
The resulting graph representations can be considered Description Logic approximations of the text content that allow us to reason using, for example, OWL-based ontologies, to match content, to generate summaries, and in general to provide a representation of content that can be queried using common graph query languages (e.g. SPARQL or Cypher). I argue that such a symbolic and semantic mapping of text to restricted and controlled concept graphs, via our Text2ConceptGraph extractor, opens up new ways to handle world and discourse knowledge in autonomous agents, or to measure, for example, text similarities using these hybrid methods. These approaches can significantly outperform purely end-to-end Deep Learning semantic search methods. The resulting graphs can also be seen as knowledge representations that can be integrated into conversational systems and chatbots, as well as a wide variety of other systems and applications. The current system is a scalable and flexible ensemble of RESTful microservices, largely platform-independent and based on open software standards.
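To give a feel for what querying such a concept graph looks like, here is a minimal triple-pattern matcher in the spirit of a SPARQL basic graph pattern. The graph contents and `ex:` identifiers are invented for illustration; a production system would use an actual triple store and SPARQL or Cypher, as the abstract notes.

```python
def match(triples, pattern):
    """Return variable bindings for a (subject, predicate, object) pattern.

    Terms starting with '?' are variables, analogous to a SPARQL
    basic graph pattern; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for slot, term in zip(pattern, triple):
            if slot.startswith("?"):
                if slot in binding and binding[slot] != term:
                    break  # same variable bound to two different values
                binding[slot] = term
            elif slot != term:
                break  # constant term does not match
        else:
            results.append(binding)
    return results

# Toy concept graph as might be extracted from text (identifiers illustrative).
graph = [
    ("ex:Insulin",   "ex:treats", "ex:Diabetes"),
    ("ex:Insulin",   "rdf:type",  "ex:Hormone"),
    ("ex:Metformin", "ex:treats", "ex:Diabetes"),
]

# "What treats diabetes?" — roughly SELECT ?s WHERE { ?s ex:treats ex:Diabetes }
for b in match(graph, ("?s", "ex:treats", "ex:Diabetes")):
    print(b["?s"])  # → ex:Insulin, then ex:Metformin
```

Conjunctions of such patterns, joined on shared variables, are what make graph query languages expressive enough for content matching and reasoning over the extracted representations.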

Bio

Damir Cavar studied Linguistics and Computer Science in Germany, at the University of Frankfurt, the University of Potsdam, the University of Hamburg (CS, Natural Language Division), and the Technical University of Berlin (CS, AI Division). His research has focused on Computational Linguistics, with appointments both in the U.S. and in the Republic of Croatia. His current research focuses on computational semantics and pragmatics in NLP, aspects of ontology learning, knowledge graphs, concept-graph-based search, and content similarity metrics related to Information Extraction and Information Retrieval. He is currently an associate professor at Indiana University.