Feedback from example - PL research in the wild

From doing these activities students will

In class Announcements - 5 min (9:00 ~ 9:05)

Warm-up activity - 5 min (9:05 ~ 9:10)

Vocab 2 - electric boogaloo

Groups and Roles

Scribes

Managers

Ambassadors

Small group discussion - 35 min (9:10 ~ 9:45)

Below is our design questions from the 11th. Read the following reviews. In your groups, do the following tasks with examples and justifications if possible:

A review for a domain specific language for declaring memory layout.

Abstract:

In modern runtime systems, memory layout calculations are hand-coded in systems languages. Primitives in these languages are not powerful enough to describe a rich set of layouts, leading to reliance on ad-hoc macros, numerous interrelated static constants, and other boilerplate code. Memory management policies must also carefully orchestrate their application of address calculations in order to modify memory cooperatively, a task ill-suited to low-level systems languages at hand which lack proper safety mechanisms.

In this paper we introduce {DSL}, a declarative language for specifying high level memory layouts. Constraints formerly implemented by describing how to compute locations are, in Floorplan, defined declaratively using explicit layout constructs. The challenge here was to discover constructs capable of sufficiently enabling the automatic genera- tion of address calculations, including code deemed ‘unsafe’ according to the memory safety paradigm of Rust. Floorplan is implemented as a compiler for generating a Rust library from a {DSL} specification. In a case study of an existing implementation of the immix garbage collection algorithm, Floorplan eliminates 55 out of the 63 unsafe lines of code: 100% of the unsafe lines of code pertaining to memory safety.

Review 1:

This paper presents a well-motivated and interesting idea. The code shown in figure 1 and the pain that goes with it will be familiar to anyone who has implemented a memory manager. The details of {DSL}, however, seem inaccessible in this version of the paper. Part of the problem is the paper's reliance on {PARTICULAR ALGORITHM} as the sole example, and part of the problem is that the example is presented all at once; that's a problem for readers who are not already familiar with {PARTICULAR ALGORITHM}, and it means that readers do not get a handle on basic ideas before they have to deal with larger ideas. While {PARTICULAR ALGORITHM} and the benchmark results are compelling as an application of {DSL}, the paper needs more examples to provide a gentler on-ramp for readers and make clear that Floorplan is more than a domain-specific language for Immix.

As a concrete idea, consider showing how {DSL} would support the {EXAMPLE ALGORITHMS} in Bryant and O'Hallaron's textbook. Many potential readers will be familiar with those {EXAMPLE ALGORITHMS} --- and, as a bonus, might have seen how students struggle with {PROBLEMS THE DSL IS TRYING TO SOLVE}, the same as any programmer who implements a memory allocator.

I'm suggesting new material, but 12 pages is not a lot of space, and the paper is already full. To get more space, consider cutting from the version the detailed explanation of {PARTICULAR ALGORITHM} in section 4. Just Figure 5 with the results in section 6 make the example compelling enough. (Figure 4 is also really nice.) Meanwhile, explaining some simpler memory allocators will help readers understand the details and believe that {DSL} generalizes to many allocators. Section 5 looks like great work, but cutting that may also be a way to get space in this version of the paper, leaving those technical details to a future extended version.

Review 2 (ON JUST A DESIGN PAPER ON THE SAME DSL):

The goal of providing a more formal way of specifying the memory layout is definitely worth pursuing.

As this paper stands, however, I am not really sure what a {PROGRAM IN THE DSL} can be used for? At this point they do not appear to contain sufficient information to ingest raw {MEMORY LAYOUT DESCRIPTIONS} and parse/interpret them.

I also do not really understand how the approach works for relating several different parts of memory (e.g. metadata and the actual memory being managed) which is fundamental to how I understand how memory is being used in a runtime system. To be clear, when talking about how memory is laid out I would often describe how the various bits and bytes are laid out. But without combining information in multiple locations together into logical entities I would find it very difficult to describe or understand a layout. I think it would be much easier to motivate the choices for the description if there was a simple tool that used it, or if there was a way for developers to use the description as a way of providing a central authoritative description of how memory is laid out.

The authors have a tendency to be overly dramatic. They claim a culture of "specification by imperative implementation", and in section 2.4 they claim bugs are "almost impossible" to track down, and that defining constants is a way to "provide some semblance of discipline". I don't think these statements are particularly accurate or helpful.

A DSL for describing file system layouts

Abstract

A filestore is a structured collection of data files housed in a conventional hierarchical file system. Many applications use filestores as a poor-man’s database, and the correct execution of these applications requires that the collection of files, directories, and symbolic links stored on disk satisfy a variety of precise invariants. Moreover, all of these structures must have acceptable ownership, permission, and timestamp attributes. Unfortunately, current programming languages do not provide support for documenting assumptions about filestores, detecting errors in them, or safely loading from and storing to them.

This paper describes the design, implementation, and semantics of Forest, a new domain-specific language for describing filestores. The language uses a type-based metaphor to specify the expected structure, attributes, and invariants of filestores. Forest generates loading and storing functions that make it easy to connect data on disk to an isomorphic representation in memory that can be manipulated as if it were any other data structure. Forest also generates metadata that describes the degree to which the structures on the disk conform to the specification, making error detection easy. In a nutshell, Forest extends the rigorous discipline of typed programming languages to the untyped world of file systems.

We have implemented Forest as an embedded domain-specific language in Haskell. In addition to generating infrastructure for reading, writing and checking file systems, our implementation generates type class instances that make it easy to build generic tools that operate over arbitrary filestores. We illustrate the utility of this infrastructure by building a file system visualizer, a file access checker, a generic query interface, description-directed variants of several standard UNIX shell tools and (circularly) a simple Forest description inference engine. Finally, we formalize a core fragment of Forest in a semantics inspired by classical tree logics and prove round-tripping laws showing that the loading and storing functions behave sensibly.

Review 1:

I found the paper mildly interesting to read, but in the end was disappointed because this is really a niche of a niche paper. It is a domain-specific extension of a niche language, to perform extremely specific operations on directory information. The potential target audience for this is tiny.

My gut feeling also argued against the paper. In fact, I got increasingly uncomfortable while reading this, because it just seems that the file system structure is actually the wrong abstraction to manipulate for this purpose. This may be the effect of a badly chosen example - would one really model the principals as individual files here? It seems unnatural for me, and both a database solution or an XML table appear far more natural.

So I am afraid that this is a solution in search of a problem. I appreciate the information that this has been fully implemented. Not much information about the implementation is given, other than that it exists. There is a formal semantics, but really, this is a toy language anyway. So overall, this is a "cute" project, but I cannot really see it at PLDI. Perhaps this should be presented at a venue that is focusing exclusively on domain-specific languages.

Review 2:

In general, I rather like the idea of a DSL for manipulating file stores. The utility of such a language is, to me, immediately obvious, and this paper seems like a good start. Unfortunately, I feel like the current state of Forest (and this paper) feels a bit unfinished. Particular components that I think are missing:

1) A way to update file stores. Most of the described use cases essentially boil down to querying the file store for various pieces of information. But it seems that updating the store is a critical component of any filestore management solution, especially because ensuring the consistency of a store during and after updates can be quite tricky. For example, correctly moving to the next year in the Princeton system would require copying the seniors folder to the graduates folder, creating a new juniors folder, etc. Correctly implementing the update is potentially error-prone, and support for updates (with the corresponding update of the in-memory representation) would make Forest quite compelling.

2) Any sort of evaluation of the implementation. One of the stated contributions of the work is as a case study in DSL design, but there really isn't any discussion of the implementation or its performance.

Review 3:

The mechanism presented by the paper provides a very clean solution to a problem that up to this point, most languages had more or less ignored. Even languages with very good standard library support to manipulating the file system do so through relatively complicated APIs. The running example does a very good job of showing the different features of the language and its different capabilities. That said, the paper felt like it was missing a punchline; the paper starts with a formal development of the core calculus, and just when it looks like it's about to prove some property, it ends very abruptly with some related work.

Large group discussion - 20min (9:45 ~ 10:05)

Derive a plan for what feedback we would like to receive on our work and what we don't want to receive.

Cool-down activity - 10 min (10:05 ~ 10:15)

All scribes will get together to start their wiki write up to be finished at latest a week from today.

All managers will get together and make tickets for the course. This can include the tickets from their group discussions as mentioned above, a ticket to follow up on the scribe write up, and any other issues they think will be helpful.

All ambassadors will get together with Matthew and Kathleen and discuss how they think the day went, how they think the pacing went, what they are looking forward to, any worries they might have about the class, etc.

Class Dismissed

Thursday hw due on coding with the IO, Q, and Parser monads. Monday's paper will be posted soon.