Discovering Disease Subtypes that Improve Treatment Predictions: Interpretable Machine Learning for Personalized Medicine

March 15, 2018
Halligan 102
Speaker: Michael Hughes, Harvard University
Host: Matthias Scheutz


For complex diseases like depression, choosing a successful treatment from several possible drugs remains a trial-and-error process in current clinical practice. By applying statistical machine learning to the electronic health records of thousands of patients, can we discover disease subtypes that improve both population-wide understanding and patient-specific drug recommendations? One popular approach is to represent noisy, high-dimensional health records as mixtures of low-dimensional subtypes via a probabilistic topic model. I will introduce this common dimensionality reduction method and explain how off-the-shelf topic models are misspecified for downstream prediction tasks across many domains, from text analysis to healthcare. To overcome the poor predictions that result, I will introduce a new framework, prediction-constrained training, which learns interpretable topic models that offer competitive drug recommendations. I will also discuss open challenges in using machine learning to improve clinical decision-making.


Michael C. Hughes ("Mike") is currently a postdoctoral fellow in computer science at Harvard University, where he develops new machine learning methods for healthcare applications with Prof. Finale Doshi-Velez. His current research focus is helping clinicians understand and treat complex diseases like depression by training statistical models from big, messy electronic health record datasets. Other research interests include Bayesian data analysis, optimization algorithms, and any machine learning applications that advance medicine and the sciences. He completed a Ph.D. in the Department of Computer Science at Brown University in May 2016, advised by Prof. Erik Sudderth. You can find his papers and code on the web at