Machine Learning for Spectra Translation

April 14, 2023
10:20am ET
Cummings 265
Speaker: Frederick Zhang - Quals talk
Host: Soha Hassoun


Quals talk:

Mass spectrometers are valuable instruments for identifying molecules in biological samples, thus expediting biological discovery and aiding biological engineering applications. The mass spectrometer characterizes each molecule by a spectrum, which provides a unique signature for that molecule. Despite progress in creating spectral libraries and in developing annotation tools that map measured spectra to molecular identities, assigning molecular identities to spectra remains difficult. Annotation is made harder still because biological samples are often analyzed under different instrument settings, or with different mass spectrometry instruments, to maximize the number of molecules identified within the sample. In this work, we address the problem of translating a spectrum measured under one (source) instrument setting into the spectrum that would be measured under a second (target) instrument setting. We first investigate the “amenability” problem, in which we predict how likely a spectrum is to be translatable from the source setting to the target setting. We then develop two translation models, an encoder/decoder model and an attention-based model. To evaluate our amenability and translation models, we use the NIST2020 dataset and evaluate translation between positive and negative ionization modes, [M+H]+ and [M-H]-, respectively. Our amenability prediction model achieves an AUROC of 0.63 when translating from positive ionization mode and 0.80 when translating from negative ionization mode. Translation from positive-mode spectra yields a cosine similarity of 0.48 between the predicted negative-mode spectra and the ground truth, and translation from negative-mode spectra yields a cosine similarity of 0.53.
Importantly, our model outperforms a baseline that chains two models, one translating spectra to molecular fingerprints and one translating fingerprints to spectra, which achieves cosine similarities of 0.33 and 0.41 when translating from positive and from negative modes, respectively. Our work demonstrates the value of end-to-end training for the spectra translation task.
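
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how cosine similarity between a predicted and a measured spectrum might be computed. The abstract does not state how spectra are represented, so the fixed-width m/z binning, the bin width, the m/z range, and the function names `bin_spectrum` and `cosine_similarity` are all assumptions made for illustration, not the speaker's implementation.

```python
import math


def bin_spectrum(peaks, bin_width=1.0, max_mz=1000.0):
    """Turn a list of (m/z, intensity) peaks into a fixed-length vector.

    bin_width and max_mz are illustrative assumptions, not values from the talk.
    """
    n_bins = int(math.ceil(max_mz / bin_width))
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz // bin_width)
        if 0 <= idx < n_bins:  # ignore peaks outside the binned range
            vec[idx] += intensity
    return vec


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for an empty spectrum
    return dot / (norm_u * norm_v)


# Hypothetical peak lists, invented for this example only.
predicted = [(85.3, 40.0), (129.6, 100.0), (203.2, 15.0)]
measured = [(85.4, 35.0), (129.7, 100.0), (250.1, 20.0)]

score = cosine_similarity(bin_spectrum(predicted), bin_spectrum(measured))
print(f"cosine similarity: {score:.3f}")
```

A score of 1 means the two binned spectra have identical relative peak intensities, and a score of 0 means they share no occupied bins. Because cosine similarity is insensitive to overall intensity scale, the spectra do not need to be normalized first.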

Research area: Machine learning, metabolomics