PhD Defense: Learning from Users' Interactions with Visual Analytics Systems
Experts in disparate fields from biology to business are increasingly called upon to make decisions based on data, but their background is not in data science, which is itself a separate field requiring years to master. Machine learning approaches tend to focus on finding a black-box answer, which the user may not understand or trust. Visualization on its own can leverage the power of human insight, but may miss out on the computational power provided by automated analysis. Visual analytics researchers aim to provide tools for domain experts to find the patterns they need in their data, and have recently become interested in systems that combine the two approaches. One promising method is to blend the best of visualization and machine learning by building systems that let users explore their data interactively with visual tools, gather their feedback through interaction mechanisms, and apply machine learning to that feedback to build analytical models.
In this dissertation, I discuss my research on such systems, showing techniques for learning from user interactions about the data and about the users themselves. Specifically, I first describe a prototype system for learning distance functions from user interactions with high-dimensional data. These distance functions are weighted Euclidean functions whose weights are human-readable as the relative importance of the dimensions of the data. Next, I show an adaptation of that prototype for text documents, with a study showing how the vector representation of the distance functions can be used to examine the participants' analysis processes numerically. Observing that users of such systems may be required to review large amounts of data to be effective, I propose an algorithm for better leveraging user efforts in this interactive context. Turning the focus of the learning back onto the user, I provide a proof of concept that models of the users, as opposed to the data, can be learned from their interactions. Finally, I introduce the sketch of a framework for future systems that will empower data stakeholders to find the answers they need without leaving their comfort zone.
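To make the idea of a weighted Euclidean distance function concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes the common form in which each squared per-dimension difference is scaled by a non-negative weight, and that weights are normalized to sum to 1 so they can be read as relative importance. The function names `weighted_euclidean` and `normalize_weights` are hypothetical, chosen only for illustration.

```python
import math


def normalize_weights(weights):
    """Scale non-negative weights so they sum to 1 (assumed convention)."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


def weighted_euclidean(x, y, weights):
    """Distance between points x and y, with one weight per dimension.

    Assumed form: sqrt(sum_i w_i * (x_i - y_i)^2). A larger weight means
    that differences along that dimension count for more.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y, and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Illustrative values only: two points in a three-dimensional data set.
a = [1.0, 2.0, 3.0]
b = [2.0, 2.0, 7.0]

# Equal weights treat every dimension as equally important.
equal = normalize_weights([1, 1, 1])

# Putting all weight on the first dimension ignores the other two.
first_only = normalize_weights([1, 0, 0])

d_equal = weighted_euclidean(a, b, equal)
d_first = weighted_euclidean(a, b, first_only)
```

Under these assumptions the weight vector is itself the readable model: inspecting which entries are large shows which dimensions the learned function treats as important, and comparing weight vectors over time gives a numerical view of how an analysis evolved.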