Provenance: an Open Approach to support Workflow Inter-Operability

old_uid4272
titleProvenance: an Open Approach to support Workflow Inter-Operability
start_date2008/03/12
schedule16h
onlineno
summaryOver the last few years, e-Science and e-Business have emphasized the need to expose existing and new procedures as services, so that they can be composed into sophisticated functionality for end-users. In particular, workflows have emerged as a paradigm for representing and managing complex distributed scientific computations. To some extent, workflow technology now gives e-scientists the means to express and run their experiments. However, while workflow technology is a crucial breakthrough, it is only one of the tools required to support the scientific methodology. As important to domain scientists (and very often ignored by computer scientists!) is the ability to describe past experiments, to reproduce and verify them, and to understand differences between executions. The problem is further compounded by the fact that workflow systems will inevitably be heterogeneous, and multiple workflow technologies are bound to co-exist (e.g. Taverna, Triana, Pegasus, Swift, Kepler). Provenance (also known as lineage, pedigree, or audit trail) is crucial to allow scientists to implement their scientific methodology fully in silico. The provenance of a data product is defined as the process that led to that data product. While provenance technology has traditionally been embedded in execution environments (workflow system, operating system, specific application), we have taken a radically different view by seeing a provenance management system as a distinct first-class component of any computational environment where past executions should be inspectable. Applications of our approach include not only e-science but also business, where past processes have to be audited. By taking this view, and separating provenance from workflow, we were able to identify the essence of provenance and to propose an architecture for provenance management systems, which allows past processes to be described even when multiple execution technologies are involved.
In this talk, I present the principles of provenance, its architectural design, its implementation, and its integration with several workflow technologies. We have successfully deployed the approach in multiple application domains, including astronomy, aerospace engineering, and medicine.
Professor Moreau is Professor of Computer Science in the Intelligence, Agents, Multimedia (IAM) group, School of Electronics and Computer Science, at the University of Southampton. His research is concerned with large-scale open distributed systems that are not subject to centralised control; examples include the Internet, the World Wide Web, the Grid, and pervasive computing environments.
responsiblesBishop