Using Biotransformation Rules to Generate Novel Molecular Structures for Spectra Annotation

April 7, 2023
2:30pm EST
Cummings 265
Speaker: Margaret Martin - Quals talk
Host: Soha Hassoun

Abstract

Despite increased reference library coverage and new tools, metabolite annotation remains a pressing problem that hinders the interpretation and analysis of spectral data. Because biochemically related compounds often exhibit high structural similarity and, consequently, high spectral similarity, our hypothesis is that known biochemical transformations can recover the identity of an unannotated spectrum. Here, we present a method that suggests candidate molecular structures for an unannotated spectrum using biotransformation rules, given the structure of a known similar compound and its mass difference from the unknown compound. We first build biotransformation rules from known biochemical transformations (e.g., methylation, reduction, and oxidation). Then, we apply the rules whose mass change matches the input mass difference to the input known structure to generate candidates for the target molecule. Finally, we rank the generated candidates by the likelihood that the transformations are applied at specific atomic locations within the structure, using a state-of-the-art graph neural network-based approach that considers the enzyme class associated with the biotransformation. To test our hypothesis and evaluate our method, we create a dataset of 60,368 paired compounds that includes data from 1,335 experiments. Each pair consists of two compounds with high spectral similarity, which makes the pairs appropriate for testing our hypothesis. Our method first builds 8,543 biotransformations. Our method identifies the true structure in 4.55% of pairs, ranking the true structure first in 63.52% of cases and within the top two in 83.84% of cases. The average rank is 1.66. The true generated candidates map to over 120,000 spectra in 360 datasets. Thus, applying this method to such spectral data can provide important new biological insights for a wide variety of scientific studies.
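
As a rough illustration of the rule-selection step described in the abstract, the sketch below filters a small rule table by an observed mass difference. It is not the speaker's implementation: the `Rule` class, the `rules_matching` function, the three-entry rule table, and the 0.005 Da tolerance are all assumptions made for this example. The mass deltas are the standard monoisotopic masses of CH2, O, and H2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a biotransformation rule."""
    name: str
    mass_delta: float  # monoisotopic mass change in Da


# Illustrative rule table only; the talk reports thousands of learned rules.
RULES = [
    Rule("methylation", 14.01565),  # net gain of CH2
    Rule("oxidation", 15.99491),    # net gain of O
    Rule("reduction", 2.01565),     # net gain of H2
]


def rules_matching(mass_difference, rules=RULES, tolerance=0.005):
    """Return the rules whose mass change is within `tolerance` Da
    of the observed mass difference (tolerance is an assumed value)."""
    return [r for r in rules if abs(r.mass_delta - mass_difference) <= tolerance]


if __name__ == "__main__":
    # Observed difference between the known and the unknown compound.
    print([r.name for r in rules_matching(14.0160)])  # → ['methylation']
```

In a fuller pipeline, each selected rule would then be applied to the known structure to enumerate candidates, and the candidates would be ranked by a separate model; neither step is shown here.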

Research area: Computational Method for Metabolite Annotation