Reconstructing Phylogenies: Accuracy of Methods and Appropriateness of Models
Phylogenies, i.e., the evolutionary histories of groups of organisms, play a major role in representing the interrelationships among biological entities. Their pervasiveness has led biologists, mathematicians, and computer scientists to design a variety of methods for modeling, comparing, and reconstructing them. We address two problems with existing methods for phylogeny reconstruction and present our solutions.
First, we address the inaccuracy of phylogenetic tree construction methods. We present a new method, called DCM NJ+ML, that is both fast and accurate and that outperforms all methods in its class. The method is based on a divide-and-conquer approach.
The second problem that we address is reticulate evolution. Almost all existing phylogenetic methods assume that the underlying evolutionary history of a given set of entities can be represented by a tree. While this model gives a satisfactory first-order approximation for many families of organisms, other families exhibit evolutionary mechanisms that cannot be represented by trees. In particular, processes such as hybrid speciation (e.g., in groups of plants) and horizontal gene transfer (e.g., in bacteria) result in networks of relationships rather than trees of relationships. Although this problem is widely appreciated, there has been comparatively little work on computational methods for estimating and studying evolutionary networks. I will describe a mathematical model of phylogenetic networks and the simulation tools we have developed based on this model. Then, I will discuss our new measure of distance between a pair of networks; this is the first metric that allows for assessing the topological accuracy of phylogenetic networks. This suite of tools and measures allows for conducting simulations to study the performance of network reconstruction methods. Finally, I will describe our new method for reconstructing phylogenetic networks. This method, called SpNet (for Species Networks), is based on a separate-analysis approach to the dataset: individual gene trees are first reconstructed, and then the resulting trees are reconciled into a network. Our experimental studies show that SpNet significantly outperforms existing methods. Central to our method are efficient algorithms that we have designed to solve a special case of a long-standing open problem.
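To make the tree-versus-network distinction above concrete, here is a minimal sketch. It assumes a rooted phylogeny stored as a list of (parent, child) edges. The function name `reticulation_nodes` and the taxon labels are hypothetical, chosen only for illustration, and are not part of DCM NJ+ML or SpNet. In a tree every non-root node has exactly one parent, whereas a reticulation event such as hybrid speciation gives a node two parents.

```python
from collections import defaultdict


def reticulation_nodes(edges):
    """Return the nodes that have more than one parent.

    `edges` is a list of (parent, child) pairs describing a rooted,
    directed acyclic graph. This is an illustrative representation,
    not the model described in the talk.
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return sorted(node for node, ps in parents.items() if len(ps) > 1)


# A tree: every non-root node has exactly one parent.
tree_edges = [
    ("root", "x"), ("root", "y"),
    ("x", "A"), ("x", "B"),
    ("y", "C"), ("y", "D"),
]

# A network: the hypothetical hybrid taxon H descends from both x and y.
network_edges = [
    ("root", "x"), ("root", "y"),
    ("x", "A"), ("x", "H"),
    ("y", "H"), ("y", "D"),
]

print(reticulation_nodes(tree_edges))
print(reticulation_nodes(network_edges))
```

The tree yields no reticulation nodes, while the network reports `H`. No single tree can represent both of H's parent lineages at once.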
Joint work with Tandy Warnow (CS, UT Austin), Randy Linder (Biology, UT Austin), and Bernard Moret (CS, UNM).