Addressing Bias and Subjectivity in Machine Learning

April 14, 2017
Halligan 102
Speaker: Yijun Zhao, Tufts University
Host: Roni Khardon


The success of supervised machine learning algorithms rests on the assumption that all data are drawn from the same underlying distribution. However, this assumption is often violated in real-world applications where the collected data involve human judgement. In this talk, we propose a collection of approaches that address bias and subjectivity in real-world data. We illustrate our work through three applications: predicting disease progression in Multiple Sclerosis (MS) patients, detecting epileptogenic lesions in focal cortical dysplasia (FCD) patients, and selecting the best-performing students in the graduate admission process. In each of these applications, subjectivity and/or bias manifest themselves in different ways.

We present four models, each of which addresses the unique challenges of its task. In the MS research, we introduce two models that estimate the prognosis for MS patients while addressing patient bias and physician subjectivity in the data: a classification model that predicts MS disease progression (‘high’ versus ‘low’), and a regression model that forecasts the actual MS severity scores. In the epilepsy research, we present a model that addresses the paucity of features in MRI images and the biases in the data that originate from inter-patient variability. Lastly, in the graduate admission application, we introduce a new variant of SVM that exploits both labeled and unlabeled data and addresses the subjectivity arising from the admission process.