Datastructure Driven Array Programming

February 7, 2024
JCC 402
Host: Jeff Foster


From FORTRAN to Numpy, arrays have revolutionized how we express computation. Arrays are the highest-performing datastructure with a long history of investment and innovation, from hardware support to compiler technology. However, arrays can only handle dense rectilinear integer grids. Real world arrays often contain underlying structure, such as sparsity, runs of repeated values, or symmetry. In this talk, we will describe a compiler, Finch, which adapts existing programs and interfaces to the structure and sparsity of the inputs. We also extend the array abstraction beyond integer grids to continuous data (e.g., A[3.14159]). Finch enables programmers to capture complex, real-world data scenarios with the same productivity they expect from dense arrays. Our approach enables new loop optimizations across multiple domains, unifying techniques such as sparse tensors, databases, and lossless compression.

In this talk, we will show how Finch uses a language of basic loop building blocks called Looplets to hierarchically decompose structured sparsity and generate efficient code. We then discuss the problem of compiler optimization in this new datastructure-driven programming paradigm. We give a notation for the asymptotic cost of sparse programs, and a tool, Pigeon, which can automatically optimize loop order and temporaries in sparse programs to minimize asymptotic cost.

Willow Ahrens is a Ph.D. student at MIT studying tensor compilers, advised by Saman Amarasinghe and graduating next year. She is the developer of Finch, a productive datastructure-driven array programming language. Willow received her BS in Computer Science with a minor in Mathematics from University of California, Berkeley. Willow is a Department of Energy Computational Science Graduate Fellow, and values scientific applications. Willow is also a glassblower, and teaches first-time glassblowers at the MIT Glass Lab.