AI for chemical space navigation and synthesis

November 10, 2022
3:00-4:15pm ET
Cummings 270, Zoom
Speaker: Connor Coley, MIT
Host: Soha Hassoun


The discovery of functional molecules is an expensive and time-consuming process, exemplified by the rising costs of small molecule therapeutic discovery. The process of discovering a preclinical candidate is a search in a vast chemical space, where we iteratively design molecular compounds and test their performance. We should think both about designing better candidates (making each iteration more informative) and about searching more rapidly (testing a greater number of candidates).

Advances in laboratory automation promise to decrease the effort required to synthesize small molecule compounds, but determining how to synthesize a molecule is still a manual process that requires significant time investment from expert chemists. Computer-aided synthesis planning (CASP) focuses on accelerating this process by recommending synthetic pathways. The two primary aspects of CASP, proposing retrosynthetic disconnections to connect the target to purchasable materials and validating proposed reactions in silico, are highly amenable to supervised learning approaches.

Machine learning and artificial intelligence have enabled new data-driven approaches to CASP where statistical models are trained directly on published experimental data. We have developed several of these tools in a software suite, ASKCOS, that is capable of proposing retrosynthetic routes to new molecules, proposing reaction conditions for each step, and assessing the likelihood of experimental success. I will talk about the many learning tasks associated with the goal of synthesis planning, the progress that we and others in the field have made, and ongoing challenges in improving the fidelity of these models.


Connor W. Coley is an Assistant Professor at MIT in the Department of Chemical Engineering and the Department of Electrical Engineering and Computer Science. He received his B.S. and Ph.D. in Chemical Engineering from Caltech and MIT, respectively, and did his postdoctoral training at the Broad Institute. His research group at MIT develops new methods at the intersection of data science, chemistry, and laboratory automation to streamline discovery in the chemical sciences with an emphasis on therapeutic discovery. Key research areas in the group include the design of new neural models for representation learning on molecules, data-driven synthesis planning, in silico strategies for predicting the outcomes of organic reactions, model-guided Bayesian optimization, and de novo molecular generation. Connor is a recipient of C&EN’s “Talented Twelve” award, Forbes Magazine’s “30 Under 30” for Healthcare, the NSF CAREER award, and the Bayer Early Excellence in Science Award.

Please join the meeting in Cummings 270 or via Zoom.

Join Zoom Meeting:

Meeting ID: 960 3825 1227

Passcode: see colloquium email

Dial by your location: +1 646 558 8656 US (New York)
