Interactive Machine Learning for Information Extraction

March 3, 2016

2:50 pm - 4:00 pm

Halligan 102

Speaker: Sameer Singh, University of Washington

Host: Roni Khardon

Abstract

Most of the world's knowledge, be it factual news, scholarly research, social communication, subjective opinions, or even fictional content, is now easily accessible as digitized text. Unfortunately, because text is unstructured, much of the useful content in these documents remains hidden. The goal of "information extraction" is to address this problem by extracting meaningful, structured knowledge (such as graphs and databases) from text collections. When machine learning is used for information extraction, the biggest challenges are the high cost of obtaining annotated data and the lack of guidance on how to understand and fix a model's mistakes.

In this talk, I propose interpretable representations that allow users and machine learning models to interact with each other. These representations let users inject domain knowledge into machine learning models, and let the models explain why they made a specific prediction. I study these techniques in the context of relation extraction, an important subtask of information extraction whose goal is to identify the types of relations between entities that are expressed in text.

I first describe how symbolic domain knowledge, provided by the user as first-order logic statements, can be injected into relational embeddings to improve their predictions. In the second part of the talk, I present an approach to "explain" machine learning predictions using symbolic representations, which the user can annotate directly for more effective supervision. Finally, I present experiments demonstrating that an interactive interface between a user and a machine learning model reduces annotation effort and quickly trains accurate extraction systems.

Bio: Sameer Singh is a Postdoctoral Research Associate at the University of Washington, working on large-scale and interactive machine learning applied to information extraction and natural language processing. He received his PhD from the University of Massachusetts, Amherst; during his PhD he also interned at Microsoft Research, Google Research, and Yahoo! Labs, working on massive-scale machine learning. He was recently selected as a DARPA Riser, won the grand prize in the Yelp Dataset Challenge, has been awarded the Yahoo! Key Scientific Challenges and the UMass Graduate School fellowships, and was a finalist for the Facebook PhD Fellowship. (http://sameersingh.org)