Primary Goal

eAIRS Analysis Workflow

For your reference, I have included an example of a workflow.xml file that can be executed by PTPFlow. This file's structure cannot be altered, since it is understood by PTPFlow and outlines the steps in the workflow, including which resource to run on, which executables to run, which input files to use, etc. Our intent is to expose certain parts of the workflow to the User Interface (UI) so that when a workflow is submitted, the UI fields fill in parts of the workflow.xml file, such as "meshType".

Most of this file we don't have to worry about because it will not be changed by users. The only parts of the workflow exposed to the user through the UI will be the selection of a mesh and the specification of input parameters (such as Mach Number, Reynolds Number, etc.); these input parameters will be written to a single file that is given to the executable at the command line. We intend to track the input parameters as beans so we can determine whether a mesh has already been executed with a given set of parameters, and simply return the existing results if it has.
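For the "has this run before" check to work, parameter beans need value-based equality. A minimal sketch of what that could look like; the name/value fields and their types are assumptions, not a confirmed design:

```java
import java.util.Objects;

// Sketch only: a parameter bean whose equals/hashCode are value-based, so two
// runs with the same Mach Number, Reynolds Number, etc. compare equal.
public class WorkflowParameterBean {
    private final String name;   // e.g. "MachNumber"
    private final String value;  // e.g. "0.8"

    public WorkflowParameterBean(String name, String value) {
        this.name = name;
        this.value = value;
    }

    public String getName()  { return name; }
    public String getValue() { return value; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof WorkflowParameterBean)) return false;
        WorkflowParameterBean p = (WorkflowParameterBean) o;
        return name.equals(p.name) && value.equals(p.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, value);
    }
}
```

With equality defined this way, a list or set of parameter beans from a new submission can be compared directly against the parameters of a previously executed run.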

Looking at the workflow below, it almost maps to the WorkflowBean/WorkflowStepBean/WorkflowToolBean structure with some alterations, but time constraints mean we can only use what fits currently. I think we can adopt much of the current workflow bean hierarchy, but we would use our own bean classes for this year's project to minimize the effort required to describe PTPFlow's concept of a workflow with beans. The proposed bean structure is in the next section.

<workflow-builder name="eAIRS-Single" experimentId="singleCFDWorkflow" eventLevel="DEBUG">
  <!-- <global-resource>grid-abe.ncsa.teragrid.org</global-resource> -->
  <global-resource></global-resource>
  <scheduling>
    <profile name="batch">
      <property name="submissionType">
        <value>batch</value>
      </property>
    </profile>
  </scheduling>
  <execution>
    <profile name="mesh0">
      <property name="RESULT_LOC">
        <value>some-dir-uri</value>
      </property>
      <property name="EXEC">
        <value>some-file-uri</value>
      </property>
      <property name="MeshType">
        <value>some-file-uri</value>
      </property>
      <property name="InputParam">
        <value>some-file-uri</value>
      </property>
    </profile>
  </execution>
  <graph>
    <execute name="compute0">
      <scheduler-constraints>batch</scheduler-constraints>
      <execute-profiles>mesh0</execute-profiles>
      <payload>2DComp</payload>
    </execute>
  </graph>
  <scripts>
    <payload name="2DComp" type="elf">
      <elf>
        <serial-scripts>
          <ogrescript>
            <echo message="Result location = file:${RESULT_LOC}/${service.job.name} result directory is file:${runtime.dir}/result, copy target is file:${RESULT_LOC}/${service.job.name}"/>
            <simple-process execution-dir="${runtime.dir}" out-file="cfd.out" >
              <command-line>${EXEC} -mesh ${MeshType} -param ${InputParam}</command-line>
             <!-- <command-line>${runtime.dir}/2D_Comp-2.0 -mesh ${meshType} -param ${inputParam}</command-line> -->
            </simple-process>

            <!-- Post Processing Step where we move the files to the directory the user specified. Can we have something polling a directory for new results (e.g. cron job)? -->
            <mkdir>
            	<uri>file:${RESULT_LOC}/${service.job.name}</uri>
            </mkdir>
            <copy sourceDir="file:${runtime.dir}/result" target="file:${RESULT_LOC}/${service.job.name}"/>
          </ogrescript>
        </serial-scripts>
      </elf>
    </payload>
  </scripts>
</workflow-builder>

Workflow Hierarchy

Until we can completely decompose the workflow.xml file into bean parts that can be compiled back into a workflow.xml file, I believe we will need to store a workflow.xml "template" that is filled in by the UI. My first thought is to store the workflow.xml in a DatasetBean and then associate this with a WorkflowBean.
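As an illustration of the template idea (the class and token format below are assumptions, not PTPFlow API), the UI-supplied values could be substituted into the stored workflow.xml text before submission:

```java
import java.util.Map;

// Illustrative sketch: fill @NAME@ tokens in a stored workflow.xml template
// with values collected from the UI. The @NAME@ token format is an assumption;
// it is deliberately distinct from PTPFlow's own ${...} runtime variables so
// the two are not confused. Token names would match the execution-profile
// properties (RESULT_LOC, EXEC, MeshType, InputParam).
public class WorkflowTemplateFiller {
    public static String fill(String template, Map<String, String> uiValues) {
        String result = template;
        for (Map.Entry<String, String> e : uiValues.entrySet()) {
            result = result.replace("@" + e.getKey() + "@", e.getValue());
        }
        return result;
    }
}
```

For example, `fill("<value>@MeshType@</value>", Map.of("MeshType", "file:/meshes/wing.msh"))` would produce `<value>file:/meshes/wing.msh</value>`.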

private PersonBean creator;
private Date date = new Date();
private List<KNSGWorkflowStepBean> steps;

// This would be a workflow.xml file that needs to be filled in by the UI parts
private DatasetBean workflowTemplate;

This class would define a step in the workflow. eAIRS has single runs and parameterized runs, so would there be a step for each set of parameters?

private PersonBean creator;
private Date date = new Date();
private KNSGWorkflowToolBean tool;
// This would contain the list of parameters (e.g. Mach Number, etc) that would be used to generate the parameter input file
// Our UI will request this list for the workflow step so it can generate the fields required to populate this
private List<WorkflowParameterBean> parameters;
private List<DatasetListBean> inputs;
private List<DatasetListBean> outputs;
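The parameter input file passed to the executable via -param could be generated from that parameter list. A sketch, assuming a simple name=value line format; the actual format the eAIRS executable expects may differ:

```java
import java.util.Map;

// Sketch: render UI-collected parameters as the single input file handed to
// the executable on the command line. The name=value format is an assumption;
// the real eAIRS input file format may differ.
public class InputParamWriter {
    public static String render(Map<String, String> parameters) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : parameters.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

Because the file is generated from the tracked parameter beans rather than uploaded opaquely, the web application always knows exactly which parameter values went into a given run.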

This bean class would capture the command-line tool to execute and allow users to specify a different eAIRS executable to run with the workflow.

private String version;
private PersonBean creator = null;
private Date date = new Date();
private DatasetBean executable;

In the future, I believe we'd like to be able to decompose the entire workflow.xml file into beans so we can simply compile the workflow.xml for execution from the bean information. Your comments on whether the above is reasonable are welcome.

Note: These bean classes only contain the most important parts; fields such as title, label, etc. have been intentionally left out.

Question(s)

The question(s) we need to be able to answer about a workflow are:

1) Have these parameters been run before with the selected executable and the selected mesh?
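Answering this amounts to a lookup keyed on the (executable, mesh, parameter set) triple. A sketch with an in-memory map standing in for whatever bean store is actually used; all names here are hypothetical:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: detect whether an (executable, mesh, parameter-set) combination has
// already been executed. The in-memory map is a stand-in; in practice this
// lookup would query the stored workflow beans.
public class RunRegistry {
    private final Map<List<Object>, String> resultLocByKey = new HashMap<>();

    private static List<Object> key(String executableUri, String meshUri,
                                    Map<String, String> parameters) {
        // TreeMap sorts by parameter name, so field ordering in the UI
        // does not affect equality of the key.
        return Arrays.asList(executableUri, meshUri, new TreeMap<>(parameters));
    }

    public void record(String exec, String mesh, Map<String, String> params,
                       String resultLoc) {
        resultLocByKey.put(key(exec, mesh, params), resultLoc);
    }

    /** Returns the prior result location, or null if this combination is new. */
    public String findExisting(String exec, String mesh, Map<String, String> params) {
        return resultLocByKey.get(key(exec, mesh, params));
    }
}
```

On submission, the UI layer would call findExisting first and return the stored results instead of resubmitting the workflow when a match is found.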

Problem Scope

The workflow that must be executed by PTPFlow is in the section labelled eAIRS Analysis Workflow. The critical parts of this workflow that must be supplied by the UI are:

RESULT_LOC - A location to store analysis outputs so the web application can retrieve them.
EXEC - A URI to the eAIRS executable to run.
MeshType - A URI to the file containing the mesh to use.
InputParam - A URI to the file containing the input parameters to use (e.g. Mach Number, Reynolds Number, etc.). This should come from UI input fields and be generated on the fly, because we need to know which parameters are in the file so we can answer questions such as "Has an analysis with these parameters already been executed?".

RESULT_LOC is not a requirement, but we would need some way to track output results and which workflow/user created them so we can retrieve results in the web application. We also want to move the results elsewhere, because by default they are placed in scratch space, which gets wiped regularly.