Addressing Bias and Subjectivity in Machine Learning
Abstract
The success of supervised machine learning algorithms rests on the
assumption that data are drawn from the same underlying distribution.
However, this assumption is often violated in real-world applications
where the collected data involve human judgement. In this talk, we
propose a collection of approaches that address bias and subjectivity
in real-world data. We illustrate our work through three applications:
predicting disease progression in Multiple Sclerosis (MS) patients,
detecting epileptogenic lesions in focal cortical dysplasia (FCD)
patients, and selecting the best-performing students in the graduate
admission process. In each of these applications, subjectivity and/or
bias manifest themselves in different ways.
We present four models, each of which addresses the challenges unique
to its task. In the MS research, we introduce
two models to estimate the prognosis of MS patients while addressing
patient bias and physician subjectivity in the data: a
classification model that predicts MS disease progression (‘high’
versus ‘low’), and a regression model that forecasts the actual MS
severity scores. In the epilepsy research, we present a model that
addresses the paucity of features from MRI images and the biases in
the data that originate from inter-patient variability. Lastly, in the
admission application, we introduce a new variant of the support vector
machine (SVM) that exploits both labeled and unlabeled data and addresses
the subjectivity arising from the admission process.