Use of ILP for cancer driver prioritization

Project Information

Discipline
Computer Science (401) 
Orientation
Research 
Abstract

A key challenge in cancer genomics is the identification and prioritization of genomic aberrations that potentially act as drivers of cancer. We introduce a combinatorial method to identify aberrant genes that can collectively influence possibly distant "outlier" genes based on what we call the "random-walk facility location" (RWFL) problem on an interaction network. RWFL differs from the standard facility location problem by its use of "multi-hitting time", the minimum expected number of hops in a random walk originating from any aberrant gene to reach an outlier. WE thus aim to find the smallest set of aberrant genes from which one can reach outliers within a desired multi-hitting time. For that it estimates multi-hitting time based on the independent hitting times from the drivers to any given outlier and reduces the RWFL to a weighted multi-set cover problem, which it solves as an integer linear program (ILP).

Intellectual Merit

We present a novel integrative method that considers potential driver events at the genomic level, i.e. single nucleotide mutations, structural or copy number changes. We try to identify “the most parsimonious” set of patient specific driver genes which have sucient “influence” over a large proportion of outlier genes. We formulate this as a “random-walk facility location” problem (RWFL), a combinatorial optimization problem, which, to the best of our knowledge, has not been explored earlier. Since RWFL problem is NP-hard, we estimate the multi-hitting time based on the independent hitting times of the drivers to an outlier, which provides an upper bound on the multi-hitting time. Our experiments show that this estimate works well for the human protein interaction network. More importantly, our estimate enables us to reduce the RWFL problem to a weighted multi-set cover problem, for which we give an ILP formulation.

Broader Impacts

Introducing the multi-set cover method into prioritization of cancer drivers, providing bounds and ways to approximate multi-hitting time in networks and describing validation using module detection for purpose of classification.

Project Contact

Project Lead
Ermin Hodzic (ehodzic) 
Project Manager
Ermin Hodzic (ehodzic) 
Project Members
Yen-Yi Lin, Ibrahim Numanagic, Alex Gawronski  

Resource Requirements

Hardware System
  • Not sure
 
Use of FutureGrid

Running cplex / gurobi in order to solve ILP problems that give solution to multi-hitting time approximations.

Scale of Use

Will be running cplex/gurobi programs on datasets. Each run might take more than a day to solve.

Project Timeline

Submitted
01/23/2014 - 13:51