Formal Verification of Top-Down Parser Interpreters

May 25, 2022
10:00 am ET
JCC 280, Zoom
Speaker: Samuel Lasser
Host: Kathleen Fisher


PhD Defense:

Parsers are security-critical components of many software systems, and verified parsing therefore has a key role to play in secure software design. However, existing verified parsers for context-free grammars (CFGs) are limited in terms of their termination properties, performance characteristics, or expressiveness. Some verified parsers do not guarantee termination on all inputs. Others are not designed to be performant on grammars for commonly used programming languages and data formats. Finally, real-world data formats often have non- context-free specifications that purely CFG-based parsers cannot fully capture.

This talk will present Vermillion and CoStar, two formally verified parser interpreters that represent attractive points in the verified parsing space because of their termination and correctness guarantees, their empirical performance characteristics, and their expressiveness. Vermillion and CoStar are based on the LL(1) and ALL(*) parsing algorithms, respectively. We use the Coq Proof Assistant to implement both interpreters, and to prove them sound and complete with respect to high-level specifications. Vermillion provably terminates without error on all LL(1) grammars (grammars for which parsing decisions are unambiguous given a single token of lookahead), and CoStar guarantees error-free termination on all non-left-recursive grammars. Vermillion and CoStar demonstrably run in linear time on a range of grammars for popular programming languages and data formats. In addition, CoStar accepts grammars augmented with semantic predicates-- functions that represent non-context-free components of the language specification. CoStar applies these functions at parse time to ensure that its input is semantically well-formed. As part of the CoStar performance evaluation, we integrate the interpreter with Verbatim, a verified lexer that we developed in parallel. Together, these two tools constitute a verified pipeline for lexing and parsing.

Research area: programming languages, formal methods

Please join meeting in Cummings 280 or via Zoom.

Password: See colloquium email

Dial-in not an option for this event.