Match-Reference Regular Expressions and Match-Reference Bijective Lenses.

June 12, 2023

10:00 am ET

Cummings 402

Speaker: Jeanne-Marie Musca - PhD Defense

Host: Kathleen Fisher

Abstract

PhD Defense:

Bidirectional transformations of data are ubiquitous in programming tasks, and a domain-specific language called Boomerang developed by Foster et al. allows its users to write such bidirectional transformations as lenses that encode a translation and its inverse in a single expression. Boomerang's core set of bijective lenses provide strong correctness guarantees, but have limited expressiveness, given that they translate between formats that are represented by regular expressions. We present match-reference lenses, which extend the core bijective lenses and maintain the strong correctness guarantees for translations between data formats that have repeated sub-strings, a foundational kind of dependency. To represent the formats, we have designed match-reference regexes, a theoretically well-defined version of PCRE back-references. Together, match-reference regexes and match- reference lenses provide the ability to write translations that are guaranteed to maintain internal dependencies in a dataset.

In this thesis, we present match-reference regexes and match- reference lenses as follows. First, we focus on match-reference regexes. We show how to use match-reference regexes to represent a variety of formats, and we introduce a machine, a match-reference regex automata system (MRRAS), that decides membership in the language of match-reference regexes. We give a formal specification of the semantics of match-reference regexes and, in that context, we prove that the compilation from match-reference regexes to their corresponding machines is correct. Second, we present match-reference lenses. We extend the core bijective lenses' type system, using match- reference regexes as types for match-reference lenses. We introduce a big-step semantics for lenses and use it to prove that match-reference lenses satisfy two properties: they perform the expected translations and they obey round-tripping laws. Third, we present our prototype implementation of match-reference lenses which includes both a naive implementation and a more efficient implementation. The efficient implementation uses transducers that are modified versions of MRRASs. We use the naive implementation which is correct by construction but inefficient to validate our faster implementation. We apply our implementation to a selection of real-world datasets.