Using Provenance to Support Data Analysis
David's talk will be given as a lecture in the Visualization
Seminar course (COMP 250VIS). The talk will be open to the public.
=== Abstract ===
The mountains of data being gathered and generated each day have brought about opportunities to discover and test new ideas. Algorithms and visualization techniques help users explore, summarize, and filter this data, and in complex analyses, there are many connected computational steps between raw data and published results. Science demands provenance--a careful accounting of this process--to facilitate reproducibility, but this same information can also be used to inform and speed up future investigation. VisTrails introduced change-based provenance, represented as a version tree, to capture not only version history but also the exact actions taken when constructing analysis workflows. Users can interact with a version tree to revisit past work, create branches with new changes, and add annotations. Furthermore, this provenance is rich data that can be used to offer suggestions and provide analogy-based update functions. Recently, we have constructed more intuitive operations over workflow collections by modifying the version tree itself instead of individual workflows. This has led to interesting questions about the provenance of the resulting workflows.
=== Bio ===
David Koop is an Assistant Professor in the Computer and Information Science Department at UMass Dartmouth. He received his Ph.D. in Computing from the University of Utah in 2012. His research interests include data visualization, computational provenance, and scientific data management. He has served as a core developer for the VisTrails project and has collaborated with scientists in the fields of climate science, quantum physics, and invasive species modeling.