Tufts ML Alumni
Bio
Current location: Brown University
Homepage: http://www.cebm.brown.edu/byron
Associated Publications:
Authors: Carla E. Brodley, Umaa Rebbapragada, Kevin Small, and Byron C. Wallace.
Artificial Intelligence Magazine
Volume: 33
Year: 2012
Authors: Byron C. Wallace, Kevin Small, Carla E. Brodley, Joseph Lau and Thomas A. Trikalinos
International Symposium on Health Informatics (IHI)
Year: 2012
Authors: Byron C. Wallace
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)
Year: 2012
Authors: Byron C. Wallace, Kevin Small, Carla E. Brodley, Joseph Lau, Christopher H. Schmid, Lars Bertram, Christina M. Lill, Josh T. Cohen, and Thomas A. Trikalinos
Genetics in Medicine
Year: 2012
Authors: Byron C. Wallace, Kevin Small, Carla E. Brodley and Thomas A. Trikalinos
International Conference on Data Mining (ICDM)
Year: 2011
Authors: Kevin Small, Byron C. Wallace, Carla E. Brodley, Thomas A. Trikalinos
ICML
Year: 2011
Abstract: Applying supervised learning methods to new classification tasks requires domain experts to label sufficient training data for the classifier to achieve acceptable performance. It is desirable to mitigate this annotation effort. To this end, a pertinent observation is that instance labels are often an indirect form of supervision; it may be more efficient to impart domain knowledge directly to the model in the form of labeled features. We present a novel algorithm for exploiting such domain knowledge, which we call the Constrained Weight Space SVM (CW-SVM). In addition to exploiting binary labeled features, our approach allows domain experts to provide ranked labeled features and, more generally, to express arbitrary expected relationships between sets of features. Our empirical results show that the CW-SVM outperforms both baseline supervised learning strategies and previously proposed methods for learning with labeled features.
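A minimal sketch of the general idea of constraining the weight space with labeled features, assuming a linear SVM (hinge loss plus L2 penalty) trained by projected subgradient descent. Only binary labeled features are handled, as sign constraints on the corresponding weights (w_j >= 0 for features labeled positive, w_j <= 0 for features labeled negative); the ranked features and arbitrary feature-set relationships that the CW-SVM also supports are not shown. The function name, learning rate and epoch count are hypothetical, and this is not the authors' CW-SVM implementation.

```python
import numpy as np


def train_sign_constrained_svm(X, y, pos_features, neg_features,
                               C=1.0, lr=0.01, epochs=300):
    """Linear SVM trained by projected subgradient descent (illustrative sketch).

    X: (n, d) feature matrix; y: labels in {-1, +1}.
    pos_features / neg_features: column indices an expert labeled as indicative
    of the positive / negative class (hypothetical inputs).
    Objective: 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b)).
    """
    pos = np.asarray(pos_features, dtype=int)
    neg = np.asarray(neg_features, dtype=int)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # examples inside the margin contribute hinge loss
        grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
        # Project onto the feasible set defined by the labeled features.
        w[pos] = np.maximum(w[pos], 0.0)
        w[neg] = np.minimum(w[neg], 0.0)
    return w, b
```

Because sign constraints define a box-shaped feasible set, clipping after each step is the exact Euclidean projection. Ranked constraints of the form w_a >= w_b would need a more involved projection or a quadratic-programming solver.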
Authors: Byron C. Wallace, Kevin Small, Carla E. Brodley and Thomas A. Trikalinos
Proc. of the SIAM International Conference on Data Mining (SDM)
Year: 2011
Abstract: The active learning (AL) framework is an increasingly popular strategy for reducing the amount of human labeling effort required to induce a predictive model. Most work in AL has assumed that a single, infallible oracle provides labels requested by the learner at a fixed cost. However, real-world applications suitable for AL often include multiple domain experts willing to provide labels of varying cost and quality. We explore this multiple expert active learning (MEAL) scenario and develop a novel algorithm for instance allocation that exploits the meta-cognitive abilities of novice (cheap) experts in order to make the best use of the experienced (expensive) annotators. We demonstrate that this strategy outperforms strong baseline approaches to MEAL on both a sentiment analysis dataset and two datasets from our motivating application of biomedical citation screening. Furthermore, we provide evidence that novice labelers are often aware of which instances they are likely to mislabel.
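One way to picture instance allocation that exploits novices' meta-cognitive abilities: each novice label carries a self-reported confidence, and the least confident labels are escalated to the expensive expert until the expert's budget is spent. The `NoviceLabel` record, the confidence threshold and the budget rule are hypothetical illustrations, not the allocation algorithm from the paper.

```python
from dataclasses import dataclass


@dataclass
class NoviceLabel:
    instance_id: int
    label: int
    confidence: float  # novice's self-reported confidence in [0, 1] (assumed)


def allocate(novice_labels, expert_budget, threshold=0.5):
    """Escalate the least-confident novice labels to the expert.

    Returns (accepted, escalated_ids): novice labels kept as-is, and the ids of
    at most `expert_budget` instances sent to the expert for relabeling.
    """
    unsure = sorted(
        (nl for nl in novice_labels if nl.confidence < threshold),
        key=lambda nl: nl.confidence,
    )
    escalated = {nl.instance_id for nl in unsure[:expert_budget]}
    accepted = [nl for nl in novice_labels if nl.instance_id not in escalated]
    return accepted, sorted(escalated)
```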
Authors: Byron C. Wallace, Kevin Small, Carla E. Brodley and Thomas A. Trikalinos
KDD
Year: 2010
Abstract: Active learning (AL) is an increasingly popular strategy for minimizing the amount of labeled data required to train classifiers, thereby reducing annotator effort. We describe a real-world, deployed application of AL to the problem of biomedical citation screening for systematic reviews at the Tufts Evidence-based Practice Center. We propose a novel active learning strategy that exploits a priori domain knowledge provided by the expert (specifically, labeled features) and extend this model via a Linear Programming algorithm for situations where the expert can provide ranked labeled features. Our methods outperform existing AL strategies on three real-world systematic review datasets. Additionally, we propose a new evaluation framework for finite-pool scenarios, wherein the primary aim is to label a fixed set of examples rather than to simply induce a good predictive model. We argue that evaluation must be specific to the scenario under consideration. To this end, we use a method from medical decision theory for eliciting the relative costs of false positives and false negatives from the domain expert, constructing a utility measure of classification performance that integrates the expert preferences. Our findings suggest that the expert can, and should, provide more information than instance labels alone. In addition to achieving strong empirical results on the citation screening problem, this work outlines many important steps for moving away from simulated active learning and toward deploying AL for real-world applications.
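A utility measure that integrates the expert's relative costs can be illustrated with a weighted combination of sensitivity and specificity, in which a single parameter `beta` stands in for the elicited cost of a false negative relative to a false positive. This parameterization is an assumption made for illustration; the paper's exact elicitation procedure and utility definition may differ.

```python
def weighted_utility(tp, fp, tn, fn, beta):
    """Utility weighting sensitivity `beta` times as heavily as specificity.

    beta is assumed to encode the expert's cost of a missed relevant citation
    (false negative) relative to a wrongly included one (false positive).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (beta * sensitivity + specificity) / (beta + 1)
```

With `beta = 1` this reduces to balanced accuracy; a large `beta` rewards classifiers that rarely miss relevant citations, which suits screening, where excluding an eligible study is far costlier than reading an extra abstract.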
Url: http://tuftscaes.org/citation_screening/articles/wallace_et_al_kdd_2010_preprint.pdf
Authors: Byron C. Wallace, Kevin Small, Carla E. Brodley, Joseph Lau and Thomas A. Trikalinos
ACM International Health Informatics Symposium
Year: 2010
Abstract: Comparative effectiveness reviews (CERs), a central methodology of comparative effectiveness research, are increasingly used to inform healthcare decisions. During these systematic reviews of the scientific literature, the reviewers (MD-methodologists) must screen several thousand citations for eligibility according to a pre-specified protocol. While previous research has demonstrated the theoretical potential of machine learning to reduce the workload in CERs, practical obstacles to deploying such a system remain. In this article, we describe work on an end-to-end, interactive machine learning system for assisting reviewers with the tedious task of citation screening for CERs. Specifically, we present Abstrackr, our open-source annotation tool. In addition to allowing reviewers to designate citations as 'relevant' or 'irrelevant' to the review at hand, Abstrackr facilitates communicating other information useful to the classification model, such as terms that suggest the relevance (or irrelevance) of a citation. The tool also records the time taken to screen citations, over which we conducted a time-series analysis to derive an annotator model. Using this model, we found that both the order in which the citations are screened and the length of each citation affect annotation time. We propose a strategy that integrates labeled terms and timing data into the Active Learning (AL) framework, in which an algorithm selects citations for the reviewer to label. We demonstrate empirically that this additional information can improve the performance of the semi-automated citation screening system.
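To show how timing data might enter the AL loop, the sketch below ranks unlabeled citations by model uncertainty per predicted second of screening time. It assumes a hypothetical linear annotator model in which screening time grows with citation length (the abstract reports that length affects annotation time, but the coefficients here are made up), and it does not model labeled terms. This is an illustrative cost-sensitive selection rule, not the strategy proposed in the paper.

```python
def predicted_seconds(n_words, base=10.0, per_word=0.05):
    # Hypothetical linear annotator model: longer citations take longer to screen.
    return base + per_word * n_words


def pick_next(candidates):
    """Choose the next citation to show the reviewer.

    candidates: list of (citation_id, p_relevant, n_words) tuples, where
    p_relevant is the current model's predicted probability of relevance.
    Score = uncertainty / predicted screening time, with
    uncertainty = 1 - |2p - 1| (1 at p = 0.5, 0 at p = 0 or 1).
    """
    def score(candidate):
        _, p, n_words = candidate
        uncertainty = 1.0 - abs(2.0 * p - 1.0)
        return uncertainty / predicted_seconds(n_words)

    return max(candidates, key=score)[0]
```

Dividing by predicted time trades a little informativeness for cheaper labels: a slightly less uncertain but much shorter citation can be preferred over a maximally uncertain long one.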
Authors: Byron C. Wallace, Thomas A. Trikalinos, Joseph Lau, Carla E. Brodley and Christopher H. Schmid
BMC Bioinformatics
Volume: 11
Year: 2010
Abstract: Background
Systematic reviews address a specific clinical question by assessing and analyzing the pertinent literature without bias. Citation screening is a time-consuming and critical step in systematic reviews. Typically, reviewers must evaluate thousands of citations to identify articles eligible for a given review. We explore the application of machine learning techniques to semi-automate citation screening, thereby reducing the reviewers' workload.
Results
We present a novel online classification strategy for citation screening to automatically discriminate "relevant" from "irrelevant" citations. We use an ensemble of Support Vector Machines (SVMs) built over different feature spaces (e.g., abstract and title text), and trained interactively by the reviewer(s).
Semi-automating the citation screening process is difficult because any such strategy must identify all citations eligible for the systematic review. This requirement is made harder still by class imbalance: there are far fewer "relevant" than "irrelevant" citations for any given systematic review. To address these challenges we employ a custom active-learning strategy developed specifically for imbalanced datasets. Further, we introduce a novel undersampling technique. We provide experimental results over three real-world systematic review datasets, and demonstrate that our algorithm is able to reduce the number of citations that must be screened manually by nearly half in two of these, and by around 40% in the third, without excluding any of the citations eligible for the systematic review.
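For readers unfamiliar with undersampling, the sketch below shows its simplest variant: random undersampling of the majority ("irrelevant") class before training. It is a generic baseline included only to illustrate the idea; the paper introduces its own, different undersampling technique, which is not reproduced here. Encoding "relevant" as label 1, "irrelevant" as label 0, and the `ratio` parameter are assumptions.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Randomly drop 'irrelevant' (label 0) examples so that at most
    ratio * (number of 'relevant' examples) of them remain.

    All 'relevant' (label 1) examples are kept; original order is preserved.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    keep = sorted(pos + rng.sample(neg, n_keep))
    return [examples[i] for i in keep], [labels[i] for i in keep]
```

Keeping every relevant example matters here because, as the abstract notes, the screening strategy must not lose any eligible citation; only the abundant irrelevant class is thinned out.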
Conclusions
We have developed a semi-automated citation screening algorithm for systematic reviews that has the potential to substantially reduce the number of citations reviewers have to manually screen, without compromising the quality and comprehensiveness of the review.
Url: http://www.biomedcentral.com/1471-2105/11/55
Past Research Topics:
Description: We are studying problems related to the generation of training data, in two scenarios: 1) Active Class Selection (ACS), a new class of problems we have defined, which addresses the question: if one can collect n additional instances, how should they be distributed across the classes? 2) Active Learning, in which one requests labels for existing unlabeled instances.
Specifically, Active Class Selection addresses tasks in which one can control the classes from which training data are generated. In such cases, using feedback during learning to guide the generation of new training data will yield better performance than learning from a class distribution fixed a priori. Our methods work within a multi-armed bandit framework.
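A minimal sketch of Active Class Selection cast as a multi-armed bandit, assuming one arm per class and the standard UCB1 selection rule. The reward for generating an instance of a class is left abstract (`reward_fn`); in practice it might be, say, the change in validation accuracy after adding the new instance, but that choice and every name below are hypothetical rather than the methods described here.

```python
import math


def ucb1_select(counts, total_rewards, t):
    """Pick the class to generate the next instance from, using UCB1.

    counts[k]: instances generated so far for class k.
    total_rewards[k]: summed reward observed for class k.
    t: 1-based index of the current round.
    """
    for k, n in enumerate(counts):
        if n == 0:
            return k  # try every class once before using the bound
    scores = [
        total_rewards[k] / counts[k] + math.sqrt(2.0 * math.log(t) / counts[k])
        for k in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_acs(reward_fn, n_classes, budget):
    """Allocate `budget` new instances across classes; return per-class counts."""
    counts = [0] * n_classes
    totals = [0.0] * n_classes
    for t in range(1, budget + 1):
        k = ucb1_select(counts, totals, t)
        counts[k] += 1
        totals[k] += reward_fn(k)
    return counts
```

UCB1 balances exploiting classes whose instances have helped most so far against occasionally revisiting classes it has sampled rarely, which is the feedback-driven allocation the paragraph describes, as opposed to a class distribution fixed in advance.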
In regard to active learning, we are investigating several real-world issues: specifically, how to perform active learning under severe class imbalance, how to adapt to changes in the underlying concept to be learned (concept drift), and how to inject domain knowledge into the AL framework.