A Statistical and Information Theoretic Analysis of the PAS Transcription Factor Dimerization Domain
The standard procedure to investigate the importance of individual amino acids to protein structure and function is to mutate specific residues and then characterize the activity of these mutants. Knowledge of which residues to change usually comes from a solved three-dimensional structure, but for many biologically important molecules such data are not available. We have applied an analytical algorithm to two conserved domains (A and B) in the PAS (analogs of Per, Arnt, and Sim) family of transcription factor proteins to identify residues that covary due to structure and/or function rather than shared common ancestry. First, we derived a robust definition of domain boundaries by using a local sequence alignment program. To quantify associations among residues (both within and between domains) we used mutual information (MI) normalized by the square root of the product of constituent entropies. A parametric bootstrap algorithm simulated protein sequences that were used to calculate the distribution of mutual information values primarily due to phylogenetic association. Mutual information values for the PAS domains that were greater than 95% of the simulated MI values were considered to have a significant probability of being due to structural and/or functional constraints. Residues having significant structural/functional associations within a domain are more important to the specific function of that domain. Residues having high among-domain MI values are not considered to have as important a role in the divergence of domain function, but possibly are important to overall protein structural integrity.
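As an illustration of the association measure described above, the following is a minimal sketch, not the authors' implementation. It assumes each alignment column is given as a sequence of residue characters, that entropies are estimated from observed frequencies in bits, and that mutual information is computed as H(X) + H(Y) - H(X,Y) before dividing by the square root of the product of the two column entropies. The function names (`entropy`, `normalized_mi`, `significance_cutoff`) are hypothetical, and the simulated values passed to `significance_cutoff` stand in for the output of the parametric bootstrap, which is not reproduced here.

```python
import math
from collections import Counter


def entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def normalized_mi(col_x, col_y):
    """Mutual information of two columns divided by sqrt(H(X) * H(Y)).

    Assumption: a fully conserved column (zero entropy) carries no
    covariation signal, so the normalized value is defined as 0.0.
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must come from the same set of sequences")
    h_x = entropy(col_x)
    h_y = entropy(col_y)
    if h_x == 0 or h_y == 0:
        return 0.0
    # Joint entropy is computed over paired residues, one pair per sequence.
    h_xy = entropy(list(zip(col_x, col_y)))
    return (h_x + h_y - h_xy) / math.sqrt(h_x * h_y)


def significance_cutoff(simulated_values, quantile=0.95):
    """Smallest simulated value that is >= the given fraction of all values.

    An observed MI strictly greater than this cutoff exceeds at least
    `quantile` of the simulated (phylogeny-only) MI values.
    """
    ordered = sorted(simulated_values)
    index = max(math.ceil(quantile * len(ordered)) - 1, 0)
    return ordered[index]
```

In this sketch, two columns that covary perfectly (for example `"AAGG"` and `"CCTT"`) give a normalized value of 1, while two independent columns (for example `"AAGG"` and `"CTCT"`) give 0. An observed value would be flagged as significant only if it exceeds the cutoff derived from the simulated distribution.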