Modeling Biological Networks

IV.1 Coordinators
IV.2 Participants
IV.3 Introduction
IV.4 Background and Significance
IV.5 Research Plan
IV.6 Specific Subprojects

IV.7 Connection to Specific Projects 2 (cytoskeleton) and 3 (organogenesis)
IV.8 Timeline

< Previous | Page 12 of 35 | Next >

IV.6.i.d.1.ii Clustering by thermal annealing:

Statistical methods (Bengtsson and Poivainen, 1995; Jain et al., 2000; Rose et al., 1990; Jain and Dubes, 1988; Blatt et al., 1996) on the other hand, define a "cost" or "energy" depending on the spatial relation of the elements. Starting from a random cluster structure, Monte Carlo simulations minimize the cost function. This minimization should converge to a natural clustering of the elements, though it can trap in spurious local minima. A temperature-like parameter controls the sensitivity of the clustering (see equation VI.5). These statistical methods embed the network in an n-dimensional space (Kamada and Kawai, 1987; Fruchterman and Reingold, 1991; Herman et al., 2000), where n is a fitting parameter. Drawbacks of these methods include the lack of a reliable graph drawing algorithm and the uncertainty regarding the unknown dimensionality of the phase space. Furthermore, the identified clusters often depend on the parameters of the statistical method. Such methods detect clusters in any system, whether naturally clustered or not and cannot determine if clustering is indeed natural. For obviously well-clustered networks these methods find the correct clusters, but when clustering is less evident, their reliability is questionable. While systematic methods can address the quality of clustering, we also need to explore approaches that unambiguously fail to group highly homogeneous networks.

IV.6.i.d.1.iii Network clustering:

Several recent clustering methods use the link structure of a network directly. These approaches try to infer clusters within a network from the arrangements of the links (Gibson et al., 1998; Adamic and Adar, 1999; Adamic, 1999; Larson, 1996; Flake et al., 2000). Despite competing definitions for "clusters" in networks or "communities" on the WWW (for studies relating to the Internet (Gibson et al., 1998; Flake et al., 2000)), their common idea is that nodes are more likely to link within a cluster than to outside nodes. The balanced minimum cut algorithm aims to minimize the number of links between clusters while still maintaining a certain minimal cluster size (Garey and Johnson, 1979; Chekuri et al., 1997). The maximum flow method notes that for two nodes from distinct communities the maximum flow is identical to the minimal cut (the number of links that we must remove to eliminate any connecting paths). We apply a polynomial time algorithm (Goldschmidt and Hochbaum, 1994; Saran and Vazirani, 1991) to compute a minimum k-cut in the graph, by removing the subset of edges with minimal weight such that the remaining graph consists of k connected components. After this step, we check whether each connected component is a cluster. The problem of computing a minimum k-cut in a general graph is computationally intractable (NP-complete), but we can achieve a good approximate solution within a factor of less than two of the optimal solution using the algorithm of Saran and Vazirani (1991) (Chen and Wu, 2001; Cvalinescu and Karloff, 2000; Dahlhaus et al., 1994; Karger et al., 1999).

We have developed and are currently testing a method to cluster metabolic networks based on measuring their clustering coefficient (Watts and Strogatz, 1998) and cutting links whose deletion increases it. The assignment of edge weights to the network must reflect the underlying biological properties. We also need some information on the minimal irreducible structure allowed in a cluster. Clusters in different biological networks may contain different irreducible structures, for example, closed enzymatic and catalytic cycles. Metabolites within a group link to each other more frequently than to metabolites outside the group. In this framework, links inside a cluster increase the clustering coefficient, while links between clusters decrease it. If we do not know the clusters beforehand, we can characterize the links according to the effect of their deletion. Our simulations show that repeatedly removing links that increase the clustering coefficient until the clustering coefficient stops increasing produces clusters consistent with our expectations. The algorithm does not cluster random networks, which are not inherently clustered. In particular, this method does cluster the metabolic network of E. coli.