Automatic Discovery and Synthesis of Checksum Algorithms from Binary Data Samples

December 18, 2020
10:00-11:00 am ET
Sococo VH 209; Zoom
Speaker: Lauren Labell
Host: Kathleen Fisher

Abstract

Reverse engineering unknown binary message formats is an important part of security research. Error-detecting codes such as checksums and cyclic redundancy checks (CRCs) are commonly added to messages as a guard against corrupt or untrusted input. Therefore, before an analyst can manufacture input for software that uses checksums, they must discover the algorithm that computes a valid checksum. To address this need, we have developed a program synthesis-based approach for detecting and reverse-engineering checksum algorithms automatically. Our approach takes a small set of binary messages as input and automatically returns a Python implementation of the checksum algorithm if one can be found.

Our approach first searches over the message space to identify the location of the checksum, then uses program synthesis to identify the operations performed on the message to compute it. We return runnable code that both calculates a checksum from a message and validates a message against the checksum algorithm. We also generate unit tests, allowing the user to confirm that the synthesized checksum algorithm is correct with respect to the input messages.
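
As a rough illustration of the two-step idea (locate the checksum, then find operations that reproduce it), the sketch below brute-forces every byte offset against a small, hand-picked set of candidate functions. It is not the synthesis technique presented in the talk. The candidate set, the helper names find_checksum and make_message, the single-byte checksum, and the fixed-length sample messages are all assumptions made for this example.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical candidate algorithms; each maps the non-checksum bytes to one byte.
CANDIDATES: Dict[str, Callable[[bytes], int]] = {
    "sum_mod_256": lambda data: sum(data) & 0xFF,
    "xor": lambda data: reduce(lambda a, b: a ^ b, data, 0),
    "twos_complement_sum": lambda data: (-sum(data)) & 0xFF,
}


def find_checksum(messages: List[bytes]) -> Optional[Tuple[int, str]]:
    """Return the first (byte offset, candidate name) consistent with every
    message, or None if no candidate explains all of them."""
    shortest = min(len(m) for m in messages)
    for offset in range(shortest):
        for name, func in CANDIDATES.items():
            # Treat the byte at `offset` as the checksum of all other bytes.
            if all(func(m[:offset] + m[offset + 1:]) == m[offset] for m in messages):
                return offset, name
    return None


def make_message(payload: bytes) -> bytes:
    """Build an illustrative sample: payload followed by its byte sum mod 256."""
    return payload + bytes([CANDIDATES["sum_mod_256"](payload)])


samples = [
    make_message(p)
    for p in (b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xff\x00\x7f\x01")
]
result = find_checksum(samples)
print(result)
```

For these samples the search settles on the trailing byte and the additive candidate. Note that for XOR or two's-complement checksums every byte position satisfies the same relation, so a search like this cannot pin down the location from the arithmetic alone; this is one reason a fixed candidate list is only a toy stand-in for program synthesis.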

We created the Tufts Checksum Corpus, consisting of 12 checksum inference questions collected from posts on reverse engineering question-and-answer sites and 2 instances of common internet protocol checksums. Our approach successfully synthesized the underlying checksum algorithm in 12 of the 14 cases in our test suite.

Please join the meeting in Sococo, Halligan 209. Login: tuftscs.sococo.com

Join Zoom Meeting: https://tufts.zoom.us/j/98610939077

PASSWORD: see colloquium email

Dial by your location: +1 646 558 8656 US (New York)

Meeting ID: 986 1093 9077

Passcode: see colloquium email