People-centric Natural Language Processing
Machine learning and natural language processing provide a framework for extracting meaning from text, and have driven major advances over the past fifty years in areas as diverse as machine translation, question answering, and information extraction. Many of the written texts we apply these techniques to (news articles, emails, social media, books) are the product of a profoundly social phenomenon with people at its core. People are the authors of text, they are its audience, and they are often the subject of that text itself: news articles detail the roles of actors in current events, social media (including Twitter and Facebook) documents the actions and attitudes of friends, and books chronicle the stories of fictional characters and real people alike.
In this talk, I will present a set of probabilistic latent variable models that learn patterns of identity and behavior in descriptions of people in text. Unsupervised models of personas allow us to learn abstract entity types from the stereotypical actions they perform in movies and books, and unsupervised models of biographical structure allow us to learn how different life events (such as graduating from high school, marriage, and becoming a citizen) are described in text, along with the typical times in a person's life when they occur. This work reveals large-scale patterns in descriptions of people while also uncovering implicit biases of the authors. I argue that developing computational models that capture the complex interaction between people and text will yield deeper, socio-culturally relevant descriptions of these actors, and that these deeper representations open the door to socially aware language technologies with a more useful understanding of the world.
David Bamman is a PhD candidate in the School of Computer Science at Carnegie Mellon University. His research applies natural language processing and statistical machine learning to empirical questions in the humanities and social sciences. David has published in collaboration with co-authors whose home departments include English, Linguistics, Classics, and Near Eastern Studies, and designed and co-taught an interdisciplinary (English/Computer Science) course at CMU on "Digital Literary and Cultural Studies," for which he received Carnegie Mellon's 2014 Alan J. Perlis Teaching Award. Prior to CMU, David was a senior researcher in computational linguistics at the Perseus Project of Tufts University.