Designing Visual Text Analysis Methods to Support Sensemaking and Modeling
The growing volume of digital documents gives us an opportunity to study patterns in human language, social networks, and academic discourse at an unprecedented scale. Visualization and machine learning are two contrasting approaches that can greatly increase our ability to make sense of this massive amount of textual information: visualization leverages the capabilities of human perception, while machine learning leverages the power of statistical models. How do we best integrate these two approaches to build effective visual text analysis tools?
My research takes a human-centered approach. In this talk, I will present three components of my work on applying HCI (human-computer interaction) methodologies to create text visualizations that communicate the content of large text corpora: (1) empirical studies that characterize the strategies and limitations of how people process text data, (2) methods for devising and selecting appropriate natural language processing techniques to augment our capabilities to comprehend text data, and (3) visual analysis tools that support effective exploration of large document collections, incorporate expert knowledge, and provide feedback to machine learning researchers.
I will draw on examples from the following projects. I will introduce interpretation and trust, two design considerations distilled from our experiences building the Stanford Dissertation Browser. I will present our findings on how human experts organize large document collections and the implications for word-based analyses, based on our InfoVis Topics Survey. Finally, I will introduce Termite, a visualization tool for building and refining statistical topic models.
Jason Chuang is a Ph.D. candidate in Computer Science at Stanford University. He received his M.S. in Computational Mathematics from Stanford University and his B.Sc. in Mathematics from the University of British Columbia. His interests are in harnessing the combined strengths of human perception and statistical learning to enable scalable analysis of large datasets.