Programming Language Ideas Escape the Lab: A Declarative Data Description Language for Managing Ad Hoc Data

April 1, 2010
2:50 pm - 4:00 pm
Halligan 111
Speaker: Kathleen Fisher, AT&T Labs Research
Host: Carla Brodley

Abstract

XML. HTML. CSV. JPEG. MPEG. These data formats represent vast quantities of scientific, governmental, and industrial data. Because the formats have been standardized and are widely used, many reliable, efficient, and convenient tools exist for processing such data. In an ideal world, all data would be in such formats. In reality, vast amounts of data exist in ad hoc formats for which no such tools are readily available. Every day, financial analysts, computer scientists, physical scientists, and others deal with ad hoc data in a myriad of complex formats, wasting valuable time on low-level chores like parsing and format translation instead of actually using the information stored in their data.

In this talk, I will describe the PADS data description language that we have created to address this problem. PADS allows users to describe both the physical layout of ad hoc data sources and semantic properties of that data. From such descriptions, the PADS compiler generates libraries and tools for manipulating the data, including parsing routines, statistical profiling tools, translation programs to produce well-behaved formats such as XML, and tools for running queries over raw PADS data sources. I will highlight how various ideas from the programming language research community have informed the design and implementation of the PADS system.

Information about PADS and a list of contributors is available from the project web site: www.padsproj.org.

About Kathleen Fisher

Kathleen Fisher is a Principal Member of the Technical Staff at AT&T Labs Research, where she has worked since receiving her Ph.D. in Computer Science from Stanford University in 1996. Throughout her career, Kathleen has worked to apply ideas from programming language research to other domains. For example, she co-led the effort to design and build Hancock, a domain-specific language for cleanly expressing and efficiently computing signatures, which are evolving customer profiles used to detect fraud.

Kathleen is Past Chair of SIGPLAN, ACM's Special Interest Group on Programming Languages. She is an elected member of the CRA Board and co-chair of CRA-W, CRA's committee focused on increasing the representation of women in research roles in computer science. She is an editor of the Journal of Functional Programming and has served as program chair for the research meetings FOOL (Foundations of Object-Oriented Languages), CUFP (Commercial Uses of Functional Programming), and ICFP (International Conference on Functional Programming).