e-Science 2008 4th IEEE International Conference on e-Science

Main Conference Sessions

A Distributed Algorithm for Determining the Provenance of Data

Authors

  • Paul Groth, Information Sciences Institute

Abstract

As computational techniques for tracking provenance have become more widely used, applications are beginning to produce large quantities of provenance information. Furthermore, many of these applications are composed of distributed components (e.g., scientific workflows) that may, for reasons of scalability, security, or policy, need to store this information across multiple sites. In this paper, we describe an algorithm, D-PQuery, for determining the provenance of data from distributed sources of provenance information in a parallel fashion. To enable scientists to use D-PQuery on existing Grid infrastructure, we present an implementation of the algorithm as a Condor DAGMan workflow that works across Kickstart records, which are produced by several production e-Science applications, including Montage, the astronomy application used as the example in this paper. Initial performance benchmarks are also presented.

Date and Time

Thursday, December 11, 10 a.m. to 10:30 a.m.

Room Number

206
