GPCloud: Cloud-based Automatic Repair of Real-World Software Bugs

Abstract

Bugs in software are ubiquitous, and fixing them remains an expensive, difficult, time-consuming, and manual process. The GenProg project automatically and generically repairs software defects using genetic programming, an iterative, stochastic search technique. GenProg can repair a variety of error types in many different programs while maintaining important functionality, and notably applies to off-the-shelf legacy applications without requiring formal specifications, program annotations, or special coding practices. Our current research focuses on exploiting the search process's inherent parallelism to scale and adapt GenProg to repair bugs in large, open-source programs in commodity cloud environments. This project proposes to extend these improvements, study the evolutionary processes that underlie GenProg's success, and, in particular, to understand the relationship between the underlying biological operations, repair success, and program/problem size.
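To make the search concrete, the following is a minimal sketch of the genetic-programming repair loop that GenProg embodies, written in Python for illustration. The function names (mutate, crossover, repair), the edit representation, and all parameter values are assumptions for this sketch, not GenProg's actual interface. Fitness is the number of test cases a candidate patch passes, and each fitness evaluation is independent of the others, which is the parallelism the cloud deployment exploits.

    import random

    # Illustrative sketch only: a candidate patch is a sequence of
    # statement-level edits, scored by how many tests the patched
    # program passes. Names and parameters are hypothetical.

    def mutate(patch, statements):
        """Append one random statement-level edit to a candidate patch."""
        op = random.choice(("delete", "insert", "swap"))
        return patch + [(op, random.choice(statements),
                         random.choice(statements))]

    def crossover(parent_a, parent_b):
        """One-point crossover: splice two edit sequences together."""
        cut_a = random.randint(0, len(parent_a))
        cut_b = random.randint(0, len(parent_b))
        return parent_a[:cut_a] + parent_b[cut_b:]

    def repair(statements, run_tests, num_tests, pop_size=40, max_gens=10):
        """Evolve edit sequences until one passes the whole test suite."""
        population = [mutate([], statements) for _ in range(pop_size)]
        for _ in range(max_gens):
            # Fitness evaluations are independent of one another; in the
            # cloud deployment each would run on its own virtual machine.
            scored = sorted(population, key=run_tests, reverse=True)
            best = scored[0]
            if run_tests(best) == num_tests:
                return best                        # full repair found
            survivors = scored[:pop_size // 2]     # truncation selection
            offspring = [crossover(random.choice(survivors),
                                   random.choice(survivors))
                         for _ in survivors]
            population = [mutate(p, statements) for p in survivors + offspring]
        return None                                # no repair within budget

In practice, GenProg biases mutation toward statements implicated by fault localization and draws inserted code from elsewhere in the program itself; the truncation selection above is a simplification of its actual selection scheme.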

Intellectual Merit

This project will quantify the relationship between GP repair success and
program scale, bug type, and test suite size. It will also formalize the GP
operators, parameters, and internal representation choices necessary for
deployment on commodity cloud resources. This will enable future application
to a wider variety of real-world software errors in larger and more variable
open-source programs.

Broader Impact

Mature software projects are forced to ship with both known
and unknown bugs because they lack the development resources to deal with every
defect. This is particularly troubling in critical code: in 2006, it took 28
days on average for maintainers to develop fixes for security defects. In a
2008 FBI survey of over 500 large firms, the average annual cost of computer
security defects alone was $289,000. Automatic debugging is thus a pressing
research problem, and techniques for addressing it must apply efficiently to
real-world bugs in realistic software. We are also committed to providing
other researchers with access to our source code and experimental images, both
for reproduction and for any research that could use a large benchmark set of
real bugs in real software.

Use of FutureGrid

We will use the cloud resources at FutureGrid to perform
our scale, parameter-sweep, and representation-based experiments on a benchmark
set of 105 bugs in established open-source projects.
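As a sketch of how this experimental setup could be laid out, the enumeration below generates one independent job per (bug, configuration) pair; the parameter names and value grids are hypothetical placeholders, and the bug identifiers stand in for the 105 real benchmark bugs.

    from itertools import product

    # Hypothetical sweep layout; parameter names, value grids, and
    # bug identifiers are illustrative assumptions.
    POP_SIZES = (40, 80, 160)
    MUTATION_RATES = (0.01, 0.06, 0.12)
    REPRESENTATIONS = ("ast", "patch")
    BUGS = [f"bug-{i:03d}" for i in range(105)]

    jobs = [
        {"bug": bug, "pop_size": ps, "mut_rate": mr, "representation": rep}
        for bug, ps, mr, rep in product(BUGS, POP_SIZES,
                                        MUTATION_RATES, REPRESENTATIONS)
    ]
    print(len(jobs), "independent runs")  # 105 * 3 * 3 * 2 = 1890

Each such job corresponds to one VM-level run of the kind described under Scale of Use below, so the sweep parallelizes trivially across a burst of machines.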

Scale Of Use

Our experiments entail either bursts of a large number of virtual machines,
each running from 10 minutes to a maximum of 12 hours (most complete within
1.5 hours on average), or a much smaller group of VMs running for a week at a time.

Project: FG-179
PI: Claire Le Goues, University of Virginia
Status: Active

Project Members

Jonathan Dorn
