New Graph Metrics Improve Network-based Protein Function Prediction
Protein-protein interaction (PPI) networks, arise when each protein in an organism is represented as a node and two nodes are connected by an edge if the corresponding proteins interact.
We study the traditional task of inferring protein function labels based on the function labels of proteins within some notion of "neighborhood" defined in the PPInetwork.
Many previous methods that have been suggested for this problem rely on shortest-path distance to define the neighborhood, but because these graphs typically have many properties of so-called ``small world" networks, they are low diameter and shortest-path distances quickly comprise the entire graph, which inevitably results in many ties in proximity for such dense graphs.
Thus we introduce new distance metrics using a random walk paradigm that can better capture an informative notion of local neighborhood.
We first introduce the diffusion state distance (DSD) as a finer-grained distance metric based on graph diffusion properties, so that it globally captures functional similarity among proteins.
Then we will introduce three natural extensions for DSDby considering weighted and also directed graphs as well as by incorporating biological pathways as auxiliary subgraphs.
We show that our proposed DSDmetrics improve the performance of classical protein function prediction methods by replacing the shortest-path metric with DSD. The newly designed metrics can also be used in other computational biology tasks where any notion of distances is used: complex/pathway detection, disease gene prioritization, network alignment, etc. In addition, we expect that the application of the newly designed metrics will be beyond the field of computational biology.