soha hassoun


Computer Science (primary)
Chemical & Biological Engineering
Electrical & Computer Engineering
Tufts University




Computational Systems

Modeling and Design

Contact Info

161 College Ave
Medford, MA 02155
soha (at)
Follow @sohahassoun

Office Hours

See class-specific office hours

Hassoun Lab
Computational Systems + Synthetic Biology


The mission of my research lab is to develop ANALYSIS + DESIGN tools to advance (RE-)DESIGNING BIOLOGY. Our tools provide insight into complex biological systems. They also enable building novel biological components to produce useful chemicals and therapeutics. My lab now focuses on developing MACHINE LEARNING models that are custom-tailored for biological data to build such tools. My industrial and academic experiences in design automation for electronic systems inform how I approach re-designing biology.
Research Highlights

Graph Matching and molecular alignment. A novel deep learning method for aligning two graphs, with application to molecular matching. Useful for knowing where two molecules match and differ.
Enzyme Classification on Molecules. A new method for training enzyme-specific predictors that take a query substrate molecule as input and return whether the enzyme would act on that substrate.
Metabolic Disruption analysis in engineered hosts. Engineering cellular hosts may result in unexpected and undesirable byproducts due to promiscuous interactions of native/heterologous enzymes and molecules. These effects disrupt not only the host metabolism but also the intended end-objective of high yield. How do you analyze such disruptions? MDFlow is the answer!
Enzymatic Link Prediction. Learning graph representations of biochemical networks and applying them to predict enzymatic links between two molecules.
Probabilistic Analysis of Pathway Activities and Metabolite Annotations. Using inference, we learn the likelihood that metabolic pathways are responsible for the presence of metabolomics measurements, and the likelihood of annotations.
Extended Metabolic Models (EMMs). This workflow uses our tool PROXIMAL to create Extended Metabolic Models (EMMs) that contain not only the canonical substrates and products of enzymes already cataloged for an organism, but also metabolites that can form due to substrate promiscuity. We created an EMM for E. coli, and applied the workflow to analyze metabolomics data for CHO and murine cecal microbiota.
Machine learning analysis pipeline for antibody sequences. Important features in antibody sequences are identified relative to a reference set using machine learning and statistical analysis.
ProPASS. A workflow that links synthesis pathway construction with the exploration of available enzyme sequences that are predicted to be soluble in the host. ProSol DB is a database cataloging the predicted solubility of over 250,000 reviewed enzyme sequences from UniProt.
gEFM. A method for computing elementary flux modes.
PROXIMAL. A method to derive biotransformation operators from a reaction and to apply them to a target molecule. Operators can be applied to predict xenobiotic products (as suggested in the paper), or applied more generically to identify derivative metabolites for a specific query molecule and a specified enzyme.