Graduate Student Talk: Mining for Gene Interactions in Entrez Gene Reference Into Function (RIF) Entries

December 11, 2006
2:00 - 3:00 pm
Extension Conference Room
Speaker: Danny Vukelich, Tufts University
Host: Donna Slonim

Abstract

A Gene Reference Into Function (GeneRIF) entry in the Entrez Gene Database is a short characterization (no more than 255 characters) describing the "function" of a gene. Each GeneRIF is uniquely associated with the PubMed article from which it is derived. A GeneRIF may also include cross-references to other genes mentioned in the text of the entry.

In support of a larger proposed effort here at Tufts to develop tools and methodologies for validating candidate gene interaction networks submitted in the literature, we are investigating whether Natural Language Processing (NLP) techniques applied to GeneRIFs can determine the nature of the relationships among the genes referenced in each entry. These relationships can then serve as a basis for evaluating candidate gene interaction networks.

As an experiment, we randomly selected 150 GeneRIFs related to the organism Saccharomyces cerevisiae (yeast). For each GeneRIF, we enumerated every pairwise combination of the genes represented in the RIF and assigned each pair one of the following values: "+" signifying an interaction; "-" signifying no interaction; and "*" signifying inconclusive evidence. We then applied some modest NLP techniques and compared their output against this "gold standard" to measure the utility of mining GeneRIFs for gene interactions. My talk will discuss the assumptions and heuristics employed as well as the pitfalls encountered when applying NLP techniques in this restricted textual environment.
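
The pairing and labeling step can be illustrated with a minimal Python sketch. It is not the code used in the study. The function names, the placeholder gene names, and the choice of precision and recall as the comparison measure are all assumptions made for illustration; the abstract does not say how utility was measured.

```python
from itertools import combinations


def gene_pairs(genes):
    """Return every unordered pair of distinct genes mentioned in one GeneRIF."""
    unique = sorted(set(genes))  # drop repeated mentions, fix a stable order
    return list(combinations(unique, 2))


def score(gold, predicted):
    """Compare predicted labels with gold labels ("+", "-", "*").

    Pairs marked "*" (inconclusive) in the gold standard are skipped.
    A pair missing from `predicted` is treated as "-" (an assumption).
    Returns (precision, recall) for the "+" label.
    """
    tp = fp = fn = 0
    for pair, label in gold.items():
        if label == "*":
            continue
        guess = predicted.get(pair, "-")
        if guess == "+" and label == "+":
            tp += 1
        elif guess == "+" and label == "-":
            fp += 1
        elif guess == "-" and label == "+":
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: placeholder gene names and made-up labels.
pairs = gene_pairs(["GENE_B", "GENE_A", "GENE_C", "GENE_A"])
gold = dict(zip(pairs, ["+", "-", "*"]))
predicted = dict(zip(pairs, ["+", "+", "+"]))
precision, recall = score(gold, predicted)
```

A GeneRIF that mentions n distinct genes yields n(n-1)/2 pairs, so the three genes above produce three pairs. Entries that mention many genes account for a disproportionate share of the labeled pairs.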