Tufts Department of Computer Science
Machine Learning Group

Byron Wallace, Ph.D.

Tufts ML Alumni


Bio

Current location: Brown University


Homepage:  http://www.cebm.brown.edu/byron


Associated Publications:

  • Carla E. Brodley, Umaa Rebbapragada, Kevin Small, and Byron C. Wallace, Challenges and opportunities in applied machine learning, Artificial Intelligence Magazine, 33, 2012

    Authors: Carla E. Brodley, Umaa Rebbapragada, Kevin Small, and Byron C. Wallace

    Artificial Intelligence Magazine
    33

    Year: 2012


    Associated Research Topics:
    • Time Series Data Mining
    • Multiple Expert Active Learning
    • Finding and Eliminating Class Label Noise
    • Active Learning under Class Imbalance
    • Active Class Selection/Active Learning

    Affiliated Tufts Members:
    • Carla Brodley

    Tufts / Purdue Alumni:
    • Umaa Rebbapragada
    • Kevin Small
    • Byron Wallace
  • Byron C. Wallace, Kevin Small, Carla E. Brodley, Joseph Lau and Thomas A. Trikalinos, Deploying an Interactive Machine Learning System in an Evidence-Based Practice Center, International Symposium on Health Informatics (IHI), 2012

    Authors: Byron C. Wallace, Kevin Small, Carla E. Brodley, Joseph Lau and Thomas A. Trikalinos

    International Symposium on Health Informatics (IHI)

    Year: 2012


    Associated Research Topics:
    • Active Class Selection/Active Learning
    • Active Learning under Class Imbalance

    Affiliated Tufts Members:
    • Carla Brodley

    Tufts / Purdue Alumni:
    • Byron Wallace
    • Kevin Small
  • Byron C. Wallace, Multiple narrative disentanglement: Unraveling Infinite Jest, Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2012

    Authors: Byron C. Wallace

    Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)

    Year: 2012


    Affiliated Tufts Members:
    • None.

    Tufts / Purdue Alumni:
    • Byron Wallace
  • Byron C. Wallace, Kevin Small, Carla E. Brodley, Joseph Lau, Christopher H. Schmid, Lars Bertram, Christina M. Lill, Josh T. Cohen, and Thomas A. Trikalinos, Towards Modernizing the Systematic Review Pipeline: Efficient Updating via Data Mining, Genetics in Medicine, 2012

    Authors: Byron C. Wallace, Kevin Small, Carla E. Brodley, Joseph Lau, Christopher H. Schmid, Lars Bertram, Christina M. Lill, Josh T. Cohen, and Thomas A. Trikalinos

    Genetics in Medicine

    Year: 2012


    Affiliated Tufts Members:
    • Carla Brodley

    Tufts / Purdue Alumni:
    • Kevin Small
    • Byron Wallace
  • Byron C. Wallace, Kevin Small, Carla E. Brodley and Thomas A. Trikalinos, Class Imbalance, Redux, International Conference on Data Mining (ICDM), 2011

    Authors: Byron C. Wallace, Kevin Small, Carla E. Brodley and Thomas A. Trikalinos

    International Conference on Data Mining (ICDM)

    Year: 2011


    Associated Research Topics:
    • Active Class Selection/Active Learning

    Affiliated Tufts Members:
    • Carla Brodley

    Tufts / Purdue Alumni:
    • Kevin Small
    • Byron Wallace
  • Kevin Small, Byron C. Wallace, Carla E. Brodley, Thomas A. Trikalinos, The Constrained Weight-Space SVM: Learning With Ranked Features, ICML, 2011

    Authors: Kevin Small, Byron C. Wallace, Carla E. Brodley, Thomas A. Trikalinos

    ICML

    Year: 2011


    Abstract:  Applying supervised learning methods to new classification tasks requires domain experts to label sufficient training data for the classifier to achieve acceptable performance. It is desirable to mitigate this annotation effort. To this end, a pertinent observation is that instance labels are often an indirect form of supervision; it may be more efficient to impart domain knowledge directly to the model in the form of labeled features. We present a novel algorithm for exploiting such domain knowledge which we call the Constrained Weight Space SVM (CW-SVM). In addition to exploiting binary labeled features, our approach allows domain experts to provide ranked labeled features, and, more generally, to express arbitrary expected relationships between sets of features. Our empirical results show that the CW-SVM outperforms both baseline supervised learning strategies and previously proposed methods for learning with labeled features.


    Associated Research Topics:
    • Active Class Selection/Active Learning

    Affiliated Tufts Members:
    • Carla Brodley

    Tufts / Purdue Alumni:
    • Kevin Small
    • Byron Wallace
  • Byron C. Wallace, Kevin Small, Carla E. Brodley and Thomas A. Trikalinos, Who Should Label What? Instance Allocation in Multiple Expert Active Learning, Proc. of the SIAM International Conference on Data Mining (SDM), 2011

    Authors: Byron C. Wallace, Kevin Small, Carla E. Brodley and Thomas A. Trikalinos

    Proc. of the SIAM International Conference on Data Mining (SDM)

    Year: 2011


    Abstract:  The active learning (AL) framework is an increasingly popular strategy for reducing the amount of human labeling effort required to induce a predictive model. Most work in AL has assumed that a single, infallible oracle provides labels requested by the learner at a fixed cost. However, real-world applications suitable for AL often include multiple domain experts willing to provide labels of varying cost and quality. We explore this multiple expert active learning (MEAL) scenario and develop a novel algorithm for instance allocation that exploits the meta-cognitive abilities of novice (cheap) experts in order to make the best use of the experienced (expensive) annotators. We demonstrate that this strategy outperforms strong baseline approaches to MEAL on both a sentiment analysis dataset and two datasets from our motivating application of biomedical citation screening. Furthermore, we provide evidence that novice labelers are often aware of which instances they are likely to mislabel.


    Associated Research Topics:
    • Active Class Selection/Active Learning

    Affiliated Tufts Members:
    • Carla Brodley

    Tufts / Purdue Alumni:
    • Byron Wallace
    • Kevin Small
  • Byron C. Wallace, Kevin Small, Carla E. Brodley, Thomas A. Trikalinos, Active Learning for Biomedical Citation Screening, KDD, 2010

    Authors: Byron C. Wallace, Kevin Small, Carla E. Brodley, Thomas A. Trikalinos

    KDD

    Year: 2010


    Abstract:  Active learning (AL) is an increasingly popular strategy for mitigating the amount of labeled data required to train classifiers, thereby reducing annotator effort. We describe a real-world, deployed application of AL to the problem of biomedical citation screening for systematic reviews at the Tufts Evidence-based Practice Center. We propose a novel active learning strategy that exploits a priori domain knowledge provided by the expert (specifically, labeled features) and extend this model via a Linear Programming algorithm for situations where the expert can provide ranked labeled features. Our methods outperform existing AL strategies on three real-world systematic review datasets. Additionally, we propose a new evaluation framework for finite-pool scenarios, wherein the primary aim is to label a fixed set of examples rather than to simply induce a good predictive model. We argue that evaluation must be specific to the scenario under consideration. To this end, we use a method from medical decision theory for eliciting the relative costs of false positives and false negatives from the domain expert, constructing a utility measure of classification performance that integrates the expert preferences. Our findings suggest that the expert can, and should, provide more information than instance labels alone. In addition to achieving strong empirical results on the citation screening problem, this work outlines many important steps for moving away from simulated active learning and toward deploying AL for real-world applications.


    Url: http://tuftscaes.org/citation_screening/articles/wallace_et_al_kdd_2010_preprint.pdf


    Associated Research Topics:
    • Active Class Selection/Active Learning

    Affiliated Tufts Members:
    • Carla Brodley

    Tufts / Purdue Alumni:
    • Kevin Small
    • Byron Wallace
  • Byron C. Wallace, Kevin Small, Carla E. Brodley, Joseph Lau and Thomas A. Trikalinos, Modeling Annotation Time to Reduce Workload in Comparative Effectiveness Research, ACM International Health Informatics Symposium, 2010

    Authors: Byron C. Wallace, Kevin Small, Carla E. Brodley, Joseph Lau and Thomas A. Trikalinos

    ACM International Health Informatics Symposium

    Year: 2010


    Abstract:  Comparative effectiveness reviews (CERs), a central methodology of comparative effectiveness research, are increasingly used to inform healthcare decisions. During these systematic reviews of the scientific literature, the reviewers (MD-methodologists) must screen several thousand citations for eligibility according to a pre-specified protocol. While previous research has demonstrated the theoretical potential of machine learning to reduce the workload in CERs, practical obstacles to deploying such a system remain. In this article, we describe work on an end-to-end, interactive machine learning system for assisting reviewers with the tedious task of citation screening for CERs. Specifically, we present Abstrackr, our open-source annotation tool. In addition to allowing reviewers to designate citations as "relevant" or "irrelevant" to the review at hand, Abstrackr facilitates communicating other information useful to the classification model, such as terms that are suggestive of the relevance (or irrelevance) of a citation. The tool also records the time taken to screen citations, over which we conducted a time-series analysis to derive an annotator model. Using this model, we found that both the order in which the citations are screened and the length of each citation affect annotation time. We propose a strategy that integrates labeled terms and timing data into the Active Learning (AL) framework, in which an algorithm selects citations for the reviewer to label. We demonstrate empirically that this additional information can improve the performance of the semi-automated citation screening system.


    Associated Research Topics:
    • Active Class Selection/Active Learning

    Affiliated Tufts Members:
    • Carla Brodley

    Tufts / Purdue Alumni:
    • Kevin Small
    • Byron Wallace
  • Byron C. Wallace, Thomas A. Trikalinos, Joseph Lau, Carla Brodley, Christopher H. Schmid, Semi-automated screening of biomedical citations for systematic reviews, BMC Bioinformatics, 11, 2010

    Authors: Byron C. Wallace, Thomas A. Trikalinos, Joseph Lau, Carla Brodley, Christopher H. Schmid

    BMC Bioinformatics
    11

    Year: 2010


    Abstract:  Background

    Systematic reviews address a specific clinical question by unbiasedly assessing and analyzing the pertinent literature. Citation screening is a time-consuming and critical step in systematic reviews. Typically, reviewers must evaluate thousands of citations to identify articles eligible for a given review. We explore the application of machine learning techniques to semi-automate citation screening, thereby reducing the reviewers' workload.
    Results

    We present a novel online classification strategy for citation screening to automatically discriminate "relevant" from "irrelevant" citations. We use an ensemble of Support Vector Machines (SVMs) built over different feature-spaces (e.g., abstract and title text), and trained interactively by the reviewer(s).

    Semi-automating the citation screening process is difficult because any such strategy must identify all citations eligible for the systematic review. This requirement is made harder still due to class imbalance; there are far fewer "relevant" than "irrelevant" citations for any given systematic review. To address these challenges we employ a custom active-learning strategy developed specifically for imbalanced datasets. Further, we introduce a novel undersampling technique. We provide experimental results over three real-world systematic review datasets, and demonstrate that our algorithm is able to reduce the number of citations that must be screened manually by nearly half in two of these, and by around 40% in the third, without excluding any of the citations eligible for the systematic review.
    Conclusions

    We have developed a semi-automated citation screening algorithm for systematic reviews that has the potential to substantially reduce the number of citations reviewers have to manually screen, without compromising the quality and comprehensiveness of the review.


    Url: http://www.biomedcentral.com/1471-2105/11/55


    Associated Research Topics:
    • Active Class Selection/Active Learning

    Affiliated Tufts Members:
    • Carla Brodley

    Tufts / Purdue Alumni:
    • Byron Wallace
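
The KDD 2010 abstract above describes a utility measure built from expert-elicited relative costs of false negatives and false positives. As a minimal sketch only, assuming a simple weighted average of sensitivity and specificity in which the hypothetical parameter `beta` stands for how many times worse the expert judges a missed relevant citation than an unnecessarily screened irrelevant one (the paper's exact measure may differ):

```python
def screening_utility(tp: int, fp: int, tn: int, fn: int, beta: float) -> float:
    """Hypothetical cost-weighted utility for citation screening.

    beta is an assumed, expert-elicited ratio: cost of a false negative
    (missed relevant citation) divided by cost of a false positive.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one relevant and one irrelevant citation")
    sensitivity = tp / (tp + fn)  # share of relevant citations retrieved
    specificity = tn / (tn + fp)  # share of irrelevant citations excluded
    # Sensitivity is weighted beta times as heavily as specificity.
    return (beta * sensitivity + specificity) / (beta + 1)


# Two hypothetical classifiers on a pool of 50 relevant and 950 irrelevant citations.
catches_all = screening_utility(tp=50, fp=400, tn=550, fn=0, beta=19)
misses_five = screening_utility(tp=45, fp=100, tn=850, fn=5, beta=19)
print(catches_all > misses_five)  # prints: True
```

With a large `beta`, the classifier that finds every relevant citation scores higher even though it passes far more irrelevant citations on to the reviewers, which reflects the screening priority the abstracts above describe.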

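The BMC Bioinformatics abstract above notes that "relevant" citations are far outnumbered by "irrelevant" ones and introduces a novel undersampling technique, which is not reproduced here. For comparison, a minimal sketch of plain random undersampling, a common baseline for imbalanced data; the function name `undersample` and the `(citation_id, label)` layout with 1 = relevant and 0 = irrelevant are assumptions for illustration:

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[str, int]  # (citation_id, label); 1 = relevant, 0 = irrelevant


def undersample(examples: Sequence[Example], seed: int = 0) -> List[Example]:
    """Randomly discard examples of the larger class until both classes match.

    Every example of the smaller class is kept; in citation screening that
    is usually the relevant citations.
    """
    positives = [ex for ex in examples if ex[1] == 1]
    negatives = [ex for ex in examples if ex[1] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    rng = random.Random(seed)  # seeded so the sample is reproducible
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# 5 relevant and 95 irrelevant hypothetical citations.
citations = [(f"doc{i}", 1 if i < 5 else 0) for i in range(100)]
train = undersample(citations)
print(sum(label for _, label in train), len(train))  # prints: 5 10
```

Discarding majority-class examples at random throws away potentially informative data, which is one motivation for more selective undersampling schemes.
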
Current Research Topics:

  • Multiple Expert Active Learning
  • Active Learning under Class Imbalance

Past Research Topics:

  • Active Class Selection/Active Learning

    Description:  We are looking at problems related to the generation of training data. We are interested in two scenarios: 1) Active Class Selection (ACS), a new class of problems we have defined. ACS addresses the question: if one can collect n additional instances, how should they be distributed with respect to class? 2) Active Learning, in which one requests labels for existing unlabeled instances.

    Specifically, Active Class Selection addresses tasks in which one can control the classes from which training data are generated. In such cases, utilizing feedback during learning to guide the generation of new training data will yield better performance than learning from an a priori fixed class distribution. Our methods work within a multi-armed bandit framework.

    In regard to active learning, we are investigating several real-world issues: how to perform active learning in the context of severe class imbalance, how to adapt to changes in the underlying concept to be learned (concept drift), and how to inject domain knowledge into the AL framework.

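The Active Class Selection description above casts the choice of which class to collect new instances from as a multi-armed bandit. A minimal sketch, assuming the standard UCB1 rule (a generic bandit algorithm, not necessarily the group's method) and a hypothetical `reward` callback that returns, for example, the improvement in held-out accuracy after adding one new instance of the chosen class:

```python
import math
from typing import Callable, Dict, List


def ucb1_class_selection(
    classes: List[str], n: int, reward: Callable[[str], float]
) -> Dict[str, int]:
    """Distribute n new training instances across classes with UCB1.

    reward(c) is a hypothetical callback: generate one instance of class c,
    retrain, and return a score in [0, 1] (e.g. accuracy gain).
    Returns how many instances were requested from each class.
    """
    counts = {c: 0 for c in classes}
    totals = {c: 0.0 for c in classes}
    for t in range(1, n + 1):
        untried = [c for c in classes if counts[c] == 0]
        if untried:
            choice = untried[0]  # try every class once before using the bound
        else:
            # Mean observed reward plus an exploration bonus that shrinks
            # as a class is sampled more often.
            choice = max(
                classes,
                key=lambda c: totals[c] / counts[c]
                + math.sqrt(2 * math.log(t) / counts[c]),
            )
        counts[choice] += 1
        totals[choice] += reward(choice)
    return counts


# Hypothetical deterministic rewards: new instances of class "b" help most.
counts = ucb1_class_selection(["a", "b", "c"], 100, lambda c: 0.9 if c == "b" else 0.1)
print(counts)
```

Classes whose new instances help most receive the largest share of the n instances, while the exploration bonus keeps the other classes from being starved entirely.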