Provenance-based scientific workflow search
Abstract
Because scientific experiments involve data-intensive and sophisticated tasks, workflows have been widely used to automate repetitive tasks and to make data reproducible. This creates a need for effective and efficient search mechanisms for scientific workflow discovery, since workflow retrieval systems require a model that fulfills several requirements: unification, accuracy, and rich representations. Motivated by the recent uptake of provenance-based models for scientific workflow discovery, in this paper we propose a provenance-based architecture for retrieving workflows. Specifically, the architecture transforms data provenance into workflows and then organizes the data into a set of indexes that support efficient querying. The architecture supports composite queries over three types of search criteria: keywords of workflow tasks, workflow structure patterns, and metadata about workflows, e.g., how often a workflow was used.
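The abstract does not specify how provenance is converted or how the indexes are built. The following is a minimal Python sketch of the general idea, assuming a simplified PROV-style record (activity labels, the activity that generated each entity, and which activities used which entities). All names (`Workflow`, `workflow_from_provenance`, `WorkflowIndex`) are hypothetical. Structure patterns are reduced to single labeled edges, and the only metadata is a usage count. This is not the authors' implementation.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Workflow:
    wf_id: str
    tasks: dict[str, str]                       # task id -> task label
    edges: set[tuple[str, str]] = field(default_factory=set)  # (upstream, downstream)
    usage_count: int = 0                        # hypothetical metadata field


def workflow_from_provenance(wf_id, activities, generated, used, usage_count=0):
    """Derive workflow edges from provenance (assumed PROV-like input).

    activities: activity id -> label
    generated:  entity id -> id of the activity that generated it
    used:       list of (activity id, entity id) pairs
    If activity B used an entity generated by activity A, add edge A -> B.
    """
    edges = set()
    for act, ent in used:
        producer = generated.get(ent)
        if producer is not None and producer != act:
            edges.add((producer, act))
    return Workflow(wf_id, dict(activities), edges, usage_count)


class WorkflowIndex:
    """Hypothetical indexes: keyword -> workflows, labeled edge -> workflows."""

    def __init__(self):
        self.keyword_index = defaultdict(set)
        self.edge_index = defaultdict(set)
        self.workflows = {}

    def add(self, wf):
        self.workflows[wf.wf_id] = wf
        # Index every lowercase token of every task label.
        for label in wf.tasks.values():
            for token in label.lower().split():
                self.keyword_index[token].add(wf.wf_id)
        # Index each edge by the pair of task labels it connects.
        for src, dst in wf.edges:
            pair = (wf.tasks[src].lower(), wf.tasks[dst].lower())
            self.edge_index[pair].add(wf.wf_id)

    def query(self, keywords=(), edge_patterns=(), min_usage=0):
        """Composite query: intersect keyword and structure matches, then filter on metadata."""
        candidates = set(self.workflows)
        for kw in keywords:
            candidates &= self.keyword_index.get(kw.lower(), set())
        for a, b in edge_patterns:
            candidates &= self.edge_index.get((a.lower(), b.lower()), set())
        return sorted(w for w in candidates
                      if self.workflows[w].usage_count >= min_usage)


# Usage with made-up provenance records.
idx = WorkflowIndex()
idx.add(workflow_from_provenance(
    "wf1",
    {"a1": "fetch sequences", "a2": "align sequences", "a3": "build tree"},
    generated={"e1": "a1", "e2": "a2"},
    used=[("a2", "e1"), ("a3", "e2")],
    usage_count=12))
idx.add(workflow_from_provenance(
    "wf2",
    {"b1": "fetch sequences", "b2": "build tree"},
    generated={"f1": "b1"},
    used=[("b2", "f1")],
    usage_count=3))

print(idx.query(keywords=["tree"],
                edge_patterns=[("align sequences", "build tree")],
                min_usage=5))  # -> ['wf1']
```

Intersecting candidate sets lets each criterion be answered from its own index, which is one straightforward way to combine keyword, structure, and metadata conditions in a single query. A fuller system would likely support multi-edge subgraph patterns and ranking rather than plain set intersection.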