Extracting Timelines from Unstructured Text

February 20, 2015

12 noon - 1:15 pm

Halligan 102

Speaker: Steve Bethard, University of Alabama at Birmingham

Host: Soha Hassoun

Abstract

Extracting timelines from unstructured text is a critical component of applications like reviewing patient medical records or summarizing news stories. It is also a deceptively simple task: find the events, link each event to the time it happened, then sort the events chronologically. But human language is rarely explicit in the way that would be most convenient for a computer, and events, times and temporal relations are often implicit, left to be inferred by the reader. In this talk, I will first present a typical architecture for constructing timelines from the explicit and implicit cues of language: a series of supervised machine learning components trained on example texts whose timelines have been annotated manually by humans. Then I will show how we can improve this approach by leveraging big data that has not been annotated by humans but nonetheless reveals some patterns in how humans talk about time. Finally, I will present an alternative approach to timeline extraction, motivated by linguistic and psychological theories, that uses a structured application of machine learning methods to produce better timelines in the form of dependency structures.

Bio

Steven Bethard is an assistant professor in Computer and Information Sciences at the University of Alabama at Birmingham, where he is director of the Computational Representation and Analysis of Language (CoRAL) laboratory. He works on topics in machine learning and natural language processing, including constructing timelines from unstructured text, building information extraction models for clinical narratives, and applying machine learning to educational applications. He received a joint Ph.D. in Computer Science and Cognitive Science from the University of Colorado in 2007 and worked as a postdoctoral researcher at Stanford University's Natural Language Processing group, Johns Hopkins University's Human Language Technology Center of Excellence, KU Leuven's Language Intelligence and Information Retrieval group in Belgium, and the University of Colorado's Center for Language and Education Research.