Workflow-Driven Ontologies (WDOs) capture process vocabulary; that vocabulary can be used to document processes, and documented processes can in turn be used to encode the provenance of data. This page gives a brief description of the technologies involved.
WDO-It! is a Java-based tool, available in the downloads section, that aids users through the steps of capturing process vocabulary, documenting processes, and creating data annotators that encode the provenance of data.
Workflow-Driven Ontologies (WDOs) are task ontologies intended to define concepts about a domain for the purpose of capturing process knowledge. WDOs define two types of concepts: Data and Method. Data concepts represent the data components of a scientific process; examples include datasets, documents, instrument readings, input parameters, maps, and graphs. Method concepts represent the discrete activities in a scientific process that transform data components; examples include algorithms (whether implemented as software or carried out manually by humans) and human analysis of data. Scientists define process-related concepts by extending the Data and Method hierarchies.
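In OWL terms, extending these hierarchies amounts to declaring domain classes as subclasses of Data and Method. The following is a minimal sketch in Turtle; the wdo: and geo: namespace URIs and the class names are placeholders chosen for illustration, not taken from the actual ontologies:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix wdo:  <http://example.org/wdo.owl#> .     # placeholder URI
@prefix geo:  <http://example.org/geo-wdo.owl#> . # placeholder URI

# A domain-specific data concept, extending the Data hierarchy.
geo:GravityReadings a owl:Class ;
    rdfs:subClassOf wdo:Data .

# A domain-specific method concept, extending the Method hierarchy.
geo:GriddingAlgorithm a owl:Class ;
    rdfs:subClassOf wdo:Method .
```

Any number of domain concepts can be layered this way, giving scientists a vocabulary that remains connected to the generic Data/Method distinction.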
From a technical point of view, WDOs are OWL ontologies that import and extend the wdo.owl ontology. The wdo.owl ontology defines concepts and relations useful for encoding processes. Additionally, wdo.owl is aligned to the PML-Provenance ontology, which defines concepts and relations useful for encoding the provenance of data. By aligning the two ontologies, users can ground the documentation of processes in concepts that can later be used to trace the provenance of data.
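Concretely, a domain WDO is an OWL ontology whose header imports wdo.owl. A minimal sketch of such a header, with placeholder URIs standing in for the actual ontology locations:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Placeholder URIs: a domain WDO declaring its import of wdo.owl.
<http://example.org/geo-wdo.owl> a owl:Ontology ;
    owl:imports <http://example.org/wdo.owl> .
```

The owl:imports statement is what makes the Data and Method hierarchies, and through the alignment the PML-Provenance concepts, available for extension in the domain ontology.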
As an illustrative example, the figure below shows the alignment of the wdo.owl and PML-Provenance ontologies, and their use to define concepts related to the process of creating a uniformly-distributed geospatial dataset from a sparsely-distributed dataset.
Additional information about the Geospatial WDO and other WDOs can be found in the examples section.
Semantic Abstract Workflows (SAWs) are used to document processes as understood and required by end users. SAWs capture the flow of data from the initial sources of a process, through the activities involved in the process, and finally to the data sinks at the end of the process. "Semantic" in SAWs refers to the meaning inherited by using the ontological concepts captured in a WDO. "Abstract" refers to the level of detail specified in SAWs, which is consistent with an end user's understanding of a process but lacks the additional constructs necessary to produce automated systems. In this sense, SAWs are not intended to be executable workflow specifications.
SAWs are encoded in OWL and have a graphical notation: data are represented by directed edges, methods by rectangles, and sources and sinks by ovals. As an illustrative example, the figure below shows the Grid Dataset SAW, which documents the process of creating a uniformly-distributed geospatial dataset from a sparsely-distributed dataset. The Grid Dataset SAW uses concepts from the Geospatial WDO (introduced above).
Notice that the labels have the format XXX:YYY, where XXX corresponds to the name of the concept defined in the WDO, and YYY corresponds to an assigned name for that particular instance of the concept. Additional information about the Grid Dataset SAW can be found in the examples section.
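Under the OWL encoding, each XXX:YYY element in the diagram corresponds to an individual (YYY) typed by a WDO concept (XXX). A sketch of what one step of such a SAW might look like in Turtle; the namespace URIs, concept names, and the hasInput/hasOutput property names are illustrative assumptions, not necessarily those defined by wdo.owl:

```turtle
@prefix geo: <http://example.org/geo-wdo.owl#> .  # placeholder URI
@prefix saw: <http://example.org/grid-saw.owl#> . # placeholder URI

# "GriddingAlgorithm:gridStep" in the diagram: an instance (YYY)
# of the WDO method concept (XXX) it is labeled with.
saw:gridStep a geo:GriddingAlgorithm ;
    saw:hasInput  saw:sparseData ;   # illustrative property names;
    saw:hasOutput saw:griddedData .  # wdo.owl may name these differently

saw:sparseData  a geo:SparseDataset .   # illustrative data concepts
saw:griddedData a geo:GriddedDataset .
```

The directed edges of the graphical notation thus become properties connecting data individuals to the method individuals that consume and produce them.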
Provenance of data is encoded using the PML language. Processes documented as SAWs are used to scope the level of detail of the provenance to be captured in PML: because SAWs document processes from the perspective of the end user's understanding, only the details relevant to the end user are captured.
By using the WDO-It! tool (available in the downloads section), end users can create data annotators for each method contained in a SAW. Data annotators are executable modules that capture and encode provenance in PML associated with the execution of a specific step in a process. Data annotators are used to instrument systems so that provenance is captured as a side effect of execution. The provenance captured in PML is stored in a repository (CI-Server) where it can be queried and analyzed later. These steps are illustrated in the next figure.
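The PML emitted by a data annotator for one step is, roughly, a record linking that step's output back to the method execution and its inputs. The following is a hedged sketch of such a record; the pml: class and property names are illustrative of PML-style justification structure, not an exact transcription of the PML vocabulary:

```turtle
@prefix pml: <http://example.org/pml#> .          # placeholder URI
@prefix saw: <http://example.org/grid-saw.owl#> . # placeholder URI

# Provenance for one execution of the gridding step: the gridded
# output was derived from the sparse input by the gridding method.
saw:run42 a pml:NodeSet ;                 # illustrative names
    pml:hasConclusion saw:griddedData ;
    pml:isConsequentOf [
        a pml:InferenceStep ;
        pml:hasAntecedent saw:sparseData ;
        pml:hasInferenceRule saw:gridStep
    ] .
```

Because each record is tied to a SAW step, querying the repository can reassemble the chain of such records into an end-to-end provenance trace for a dataset.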