De novo Repeat Classification and Fragment Assembly

March 17, 2004
2:50pm - 4:00pm
Halligan 111

Abstract

Repetitive sequences make up a significant fraction of almost any genome and an important and still open question in bioinformatics is how to represent all repeats in DNA sequences. We propose a radically new approach to repeat classification that is motivated by the fundamental topological notion of quotient spaces. A torus or Klein bottle are examples of quotient spaces that can be obtained from a square by gluing some points. Our new repeat classification algorithm is based on the observation that the alignment-induced quotient space of a DNA sequence compactly represents all sequence repeats. This observation leads to a simple and efficient solution of the repeat classification problem as well as new approaches to fragment assembly and multiple alignment. We show that our RepeatGluer algorithm improves on existing repeat classification tools, while our new FragmentGluer assembler improves on Phrap and ARACHNE in assembly of BACs and bacterial genomes.