Using Provenance to Support Data Analysis
David's talk will be given as a lecture in the Visualization
Seminar course (COMP 250VIS). The talk will be open to the public.
=== Abstract ===
The mountains of data being gathered and generated each day have
brought about opportunities to discover and test new ideas.
Algorithms and visualization techniques help users explore,
summarize, and filter this data. In complex analyses, many
connected computational steps separate raw data from published
results. Science demands provenance, a careful accounting of this
process, to facilitate reproducibility, but the same information
can also inform and speed up future investigation. VisTrails
introduced change-based
provenance, represented as a version tree, to capture not only
version history but also the exact actions taken when constructing
analysis workflows. Users can interact with a version tree to
revisit past work, create branches with new changes, and add
annotations. Furthermore, this provenance is rich data that can be
used to offer suggestions and provide analogy-based update
functions. Recently, we have constructed more intuitive operations
over workflow collections by modifying the version tree itself
instead of individual workflows. This has led to interesting
questions about the provenance of the resulting workflows.
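The abstract does not spell out how a change-based version tree works. The following minimal Python sketch illustrates the general idea under stated assumptions. It is not the VisTrails implementation or API. The `Action`, `Version`, and `VersionTree` classes and the three action kinds are hypothetical. Each version stores only the action that produced it and a pointer to its parent. A workflow is therefore rebuilt by replaying actions from the root, and adding a second child to an existing version creates a branch without discarding earlier work.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    """A hypothetical change to a workflow (illustrative kinds only)."""
    kind: str                      # "add_module", "delete_module", or "set_param"
    module: str
    param: Optional[str] = None
    value: Optional[object] = None


@dataclass
class Version:
    id: int
    parent: Optional[int]          # None only for the root
    action: Optional[Action]       # the single change that produced this version
    notes: list = field(default_factory=list)


class VersionTree:
    """Sketch of change-based provenance: versions are nodes, edges are actions."""

    def __init__(self):
        # Version 0 is the empty root workflow.
        self.versions = {0: Version(0, None, None)}

    def add_action(self, parent_id, action):
        """Record an action as a new child of parent_id; a second child forms a branch."""
        vid = len(self.versions)
        self.versions[vid] = Version(vid, parent_id, action)
        return vid

    def annotate(self, vid, note):
        """Attach a free-text annotation to a version."""
        self.versions[vid].notes.append(note)

    def materialize(self, vid):
        """Rebuild the workflow at vid by replaying actions from the root."""
        path = []
        while vid is not None:
            v = self.versions[vid]
            if v.action is not None:
                path.append(v.action)
            vid = v.parent
        workflow = {}  # module name -> parameter dict
        for a in reversed(path):
            if a.kind == "add_module":
                workflow[a.module] = {}
            elif a.kind == "delete_module":
                workflow.pop(a.module, None)
            elif a.kind == "set_param":
                workflow[a.module][a.param] = a.value
        return workflow


# Usage: build a short history, then branch from an earlier version.
tree = VersionTree()
v1 = tree.add_action(0, Action("add_module", "ReadCSV"))
v2 = tree.add_action(v1, Action("add_module", "Histogram"))
v3 = tree.add_action(v2, Action("set_param", "Histogram", "bins", 10))
v4 = tree.add_action(v2, Action("set_param", "Histogram", "bins", 50))  # branch
tree.annotate(v4, "finer bins")
print(tree.materialize(v3))  # {'ReadCSV': {}, 'Histogram': {'bins': 10}}
print(tree.materialize(v4))  # {'ReadCSV': {}, 'Histogram': {'bins': 50}}
```

Storing actions rather than whole workflow snapshots is what lets the tree record the exact steps taken. It also makes operations over many workflows possible by editing the tree itself, at the cost of replaying a path whenever a version is opened.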
=== Bio ===
David Koop is an Assistant Professor in the Computer and
Information Science Department at UMass Dartmouth. He received his
Ph.D. in Computing from the University of Utah in 2012. His
research interests include data visualization, computational
provenance, and scientific data management. He has served as a
core developer for the VisTrails project and has collaborated with
scientists in the fields of climate science, quantum physics, and
invasive species modeling.