Undergraduate Thesis Defense: Formatt: Correcting Protein Multiple Structural Alignments by Sequence Peeking
Multiple protein alignments have several applications in computational biology, including detection of protein homology, identification of conserved or divergent regions in proteins, and construction of HMMs as templates for protein families. Although structure-based methods have been shown to produce better multiple protein alignments than pure sequence aligners, structural alignments that ignore all sequence information are prone to easily avoidable frame offset errors. We present Formatt, a multiple protein structural alignment program that also takes sequence similarity into account when constructing alignments. Formatt is based on Matt, a purely geometric multiple structural alignment program. We show that Formatt is superior to Matt in alignment quality by objective measures (most notably Staccato Seq and Str scores), while preserving the advantages in core length and RMSD that Matt, as a flexible structural aligner, has over other multiple structural alignment programs on popular benchmark datasets. Applications include producing better training data for threading methods.