Phylogeny Reconstruction and Experimental Algorithmics
Phylogenetic trees, also known as evolutionary trees, model the evolution of biological species or genes from a common ancestor. Heuristics for the NP-hard maximum parsimony (MP) problem constitute the principle mechanism for estimating phylogenies on large datasets. (The most parsimonious tree is the one with the smallest number of evolutionary changes.) Traditional MP heuristics spend an enormous amount of computational time searching for the optimal solution; on large datasets, MP searches may require months or even years to solve optimally. My work centers on using experimental algorithmics to design and test algorithms for large-scale phylogeny reconstruction.
In this talk, I will discuss Disk-Covering Methods (DCMs), a suite of techniques for reconstructing phylogenetic trees quickly and accurately. Our DCMs reduce the time to optimal by an order of magnitude on many datasets. Yet, is it necessary to solve MP optimally? Our work shows that near- optimal solutions to MP (within a fraction of one percent of optimal) give highly accurate estimations of optimal tree topologies. Moreover, they can be obtained in a fraction of the time needed to solve to optimality. Thus, the talk concludes with a discussion of a stopping criterion for phylogenetic searches.