Use of ILP for cancer driver prioritization
Project Information
- Discipline
- Computer Science (401)
- Orientation
- Research
A key challenge in cancer genomics is the identification and prioritization of genomic aberrations that potentially act as drivers of cancer. We introduce a combinatorial method to identify aberrant genes that can collectively influence possibly distant "outlier" genes based on what we call the "random-walk facility location" (RWFL) problem on an interaction network. RWFL differs from the standard facility location problem by its use of "multi-hitting time", the minimum expected number of hops in a random walk originating from any aberrant gene to reach an outlier. WE thus aim to find the smallest set of aberrant genes from which one can reach outliers within a desired multi-hitting time. For that it estimates multi-hitting time based on the independent hitting times from the drivers to any given outlier and reduces the RWFL to a weighted multi-set cover problem, which it solves as an integer linear program (ILP).
Intellectual MeritWe present a novel integrative method that considers potential driver events at the genomic level, i.e. single nucleotide mutations, structural or copy number changes. We try to identify “the most parsimonious” set of patient specific driver genes which have sucient “influence” over a large proportion of outlier genes. We formulate this as a “random-walk facility location” problem (RWFL), a combinatorial optimization problem, which, to the best of our knowledge, has not been explored earlier. Since RWFL problem is NP-hard, we estimate the multi-hitting time based on the independent hitting times of the drivers to an outlier, which provides an upper bound on the multi-hitting time. Our experiments show that this estimate works well for the human protein interaction network. More importantly, our estimate enables us to reduce the RWFL problem to a weighted multi-set cover problem, for which we give an ILP formulation.
Broader ImpactsIntroducing the multi-set cover method into prioritization of cancer drivers, providing bounds and ways to approximate multi-hitting time in networks and describing validation using module detection for purpose of classification.
Project Contact
- Project Lead
- Ermin Hodzic (ehodzic)
- Project Manager
- Ermin Hodzic (ehodzic)
- Project Members
- Yen-Yi Lin, Ibrahim Numanagic, Alex Gawronski
Resource Requirements
- Hardware System
-
- Not sure
Running cplex / gurobi in order to solve ILP problems that give solution to multi-hitting time approximations.
Scale of UseWill be running cplex/gurobi programs on datasets. Each run might take more than a day to solve.
Project Timeline
- Submitted
- 01/23/2014 - 13:51