Towards Reliable Machine Learning for Health

February 20, 2020
3:00 - 4:00 pm
Halligan 102
Speaker: Shalmali Joshi
Host:

Abstract

Machine learning (ML) has shown tremendous success in computer vision and natural language understanding. Despite these advances, ML still falls short of the reliability required for applications such as healthcare. In particular, widely used data types such as multivariate time series have not been well studied with respect to reliability. Understanding how ML models learn can help identify their vulnerabilities and lead to new approaches for improving reliability.

In the first part of my talk, I propose a novel method to characterize feature-level importance in multivariate time series models. To understand how these models rely on their input features, I use an attribution method that leverages carefully designed perturbations generated by deep generative models. This is the first method to characterize observation-level importance over time for every individual feature in a multivariate time series model.

In the second part, I demonstrate how the reliability of ML models can be improved by incorporating domain constraints. In particular, I show how to incorporate these constraints as noisy or weak supervision when learning phenotypes of chronic conditions with non-negative matrix factorization (NMF). The proposed framework enhances the identifiability of unsupervised methods and improves the clinical relevance of the learned phenotypes.

Finally, I demonstrate how to audit existing ML-based decision-making tools for reliability with respect to target objectives such as fairness. To do so, I use generative models based on variational autoencoders (VAEs) to provide algorithmic recourse, that is, actionable changes that can improve an individual's outcome under an ML-based decision-making system.

I end with a research vision for enhancing the reliability of ML tools by combining improved representations with scalable causal inference, with applications ranging from supervised classification to reinforcement learning and fair machine learning for healthcare.