Overview

This document looks at some possibilities for extending version 1.0 of the KNSG Architecture to include semantic technologies that will improve the framework and add value to the user experience. KNSG is a non-domain-specific application framework built upon Bard, PTPFlow, Tupelo, and MyProxy for setting up, launching and managing HPC application workflows through an easy to use set of user interfaces. The framework has simple facilities for working with data, including import/export, annotation and tagging. The user can create scenarios, add data to them and then launch workflows to HPC machines using PTPFlow. There is also a facility for retrieving result data from completed jobs so the user can continue to work with it and, if possible, visualize it. The intent of this document is to lay the foundation for how the core components and views provided by KNSG will be enhanced in the version 2.0 application framework, and to inform users how they can extend the various parts by adding in the semantic capabilities provided by Tupelo and replacing the current framework's repository system with a Tupelo context and Tupelo beans. It will also give users information on how to extend the framework for their domain-specific application.

Core Application Management

The central management piece for each KNSG application is KNSGFrame, an extension of BardFrame that registers Tupelo utility methods for the CETBeans used by the KNSG framework. For the remainder of this document we will use the term BardFrame, since our extension (KNSGFrame) only overrides the method for registering beans; the rest is the same. BardFrame provides an interface for working with the Tupelo semantic content repository and is responsible for managing contexts, bean sessions, data, firing events, etc. The use of Tupelo beans will be a core concept for persisting information in the KNSG framework, so all beans will need to descend from CETBean to remain compatible with the framework and other CET projects. Because every application will have its own bean requirements, each KNSG application should have its own instance of BardFrame to handle this, along with an ontology to define domain-specific concepts. All application bean types should register with BardFrame, and the IBardFrameService should provide the correct instance of BardFrame at runtime.
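
As a rough sketch of what this registration might look like (the registerBeans/registerBeanUtil method names and the individual bean util classes are illustrative assumptions, not the actual Bard API):

Code Block
titleKNSGFrame bean registration (sketch)

public class KNSGFrame extends BardFrame {

  // Hypothetical override: register the Tupelo bean utilities for the KNSG bean types
  // so the framework can persist and query them; the names below are assumptions.
  @Override
  protected void registerBeans(BeanSession beanSession) {
    registerBeanUtil(new ScenarioBeanUtil(beanSession));
    registerBeanUtil(new WorkflowBeanUtil(beanSession));
    registerBeanUtil(new RMIServiceBeanUtil(beanSession));
    registerBeanUtil(new X509CertificateBeanUtil(beanSession));
  }
}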

Scenario Manager

Scenarios View

The first main view provided by KNSG will be the ScenariosView. This view displays the user's scenario(s) and all of their sub-parts in a tree view. A scenario is similar to the concept of a project and is simply a way of organizing things that belong together. The scenario is responsible for managing all of the pieces that it contains, including input datasets, output datasets and workflows. A scenario may also contain the RMI service that the workflows will use to launch their jobs, but this could end up being an application-wide object or part of a Scenario Manager, since users will most likely use the same launch point to execute jobs regardless of which scenario the workflow belongs to. Users will launch jobs on the HPC machines that use the inputs in their scenario, and when a job completes, the outputs should be added back to that scenario. A user can have multiple scenarios open at once, close scenarios, or even delete scenarios from their scenario view (deleted from the view, but still in the repository), so we will need to manage which scenarios are in a session and what their current state is (open/closed). It is anticipated that new applications might extend this view to organize their view differently for their specific domain.

Tupelo Beans

...

This section outlines the bean classes that will be required for the KNSG framework. Where possible, the core CETBeans will be used to minimize the work required and maximize compatibility across projects. Some beans are marked optional if they are part of PTPFlow and it is uncertain whether they will be managed by Tupelo or continue to be managed by PTPFlow's current repository. If a bean is from the set of beans provided by edu.uiuc.ncsa.cet.bean it will be noted, so we can differentiate between new beans for KNSG and the use of existing beans (e.g. the ScenarioBean below is new, but has the same name as the scenario bean in edu.uiuc.ncsa.cet.bean).

Scenario Bean

A scenario bean will be used to organize things such as user data and workflows specific to a scenario (or project). This will include datasets (input and output) and workflows. The RMI service was previously part of the scenario, but since this is a system-wide object it will probably not be tracked as part of the scenario in version 2.0; it might end up an application-wide object that is viewable from the scenario view, but not specific to any one scenario. A snippet of what the scenario bean might look like is below:

Code Block
titleScenarioBean extends CETBean implements Serializable, CETBean.TitledBean
private String title;  // scenario title
private String description;  // scenario description
private Date date = new Date();  // date scenario created
private PersonBean creator;  // scenario creator
private Set<DatasetBean> dataSets;  // datasets associated with scenario
private List<WorkflowBean> workflows;  // workflows associated with this scenario; if possible, this needs to be able to wrap ptpflow workflow xml files, or we need our own bean type
private List<VisualizationBean> visualizations;  // it is possible in the future that visualizations might be part of a scenario so we know which datasets and tools are required to generate visualizations

While this bean looks very similar to the ScenarioBean in the cet bean plug-in, it is unclear if some of the internal bean types will match what is required for eAIRS/KNSG (e.g. the workflow bean, the visualization bean). As the scenario bean evolves, it will become more clear whether we can replace our bean with the one in the cet bean plug-in. This scenario bean will evolve as the application framework is built, and more final documentation will be put here as the design matures.

The main parts of this bean are the DatasetBeans, which will be used to manage all of the input/output datasets, and the WorkflowBean list, which will contain the workflows associated with this scenario. A user might extend the ScenarioBean if their application has other things that logically belong to their scenarios, but it is envisioned that most changes will happen at the metadata level (e.g. this dataset is a mesh, a result, etc.) since the scenario bean should be a generic container that satisfies most users' needs.
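
For illustration, a hypothetical eAIRS-specific extension might only add a few convenience references (the field names below are assumptions):

Code Block
titleEAIRSScenarioBean extends ScenarioBean (hypothetical example)

private DatasetBean mesh;         // the *.msh mesh file used by the CFD solver
private DatasetBean inputParams;  // the *.inp parameter file for the run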

Dataset Bean

We don't anticipate any changes to the DatasetBean class that is provided as part of the edu.uiuc.ncsa.cet.bean plug-in.

WorkflowBean & WorkflowStepBean

Each workflow is described in an XML file that outlines the steps in the workflow, including which resource to run on, executables that will be launched, input files to use, etc. An example of a PTPFlow workflow.xml file appears below; right now, this file cannot be altered since it is understood by PTPFlow. Initially, we will simply store the workflow information in a single WorkflowStepBean that has a reference to the file containing the XML and the DatasetBeans. Ogrescript XML files can be complex, but if we can logically separate out the steps or parts into individual beans that can be used to generate the full workflow XML file required by the HPC machines, then we can include workflow steps as separate beans and provide a UI for adding steps dynamically.

This bean will wrap the workflow and all of its parts.

Code Block
titleWorkflowBean extends CETBean implements Serializable

private String title;
private String description;
private Date date;
private List<WorkflowStepBean> workflowSteps;  // only one step initially which will be the workflow file that PTPFlow can launch right now
private PersonBean creator;
private Collection<PersonBean> contributors;

This bean will represent a step in the workflow.

Code Block
titleWorkflowStepBean extends CETBean implements Serializable

private String title;
private PersonBean creator;
private Date date;
private List<DatasetBean> inputDatasets;  // all data inputs associated with this step
private DatasetBean workflow;  // initially our steps will only include a single step, the entire workflow
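
As a usage sketch, wrapping an already-imported PTPFlow workflow.xml (and its inputs) in a single step might look like the following; the setter names simply follow standard bean conventions for the fields above and are assumptions:

Code Block
titleWrapping an existing workflow.xml in a single step (sketch)

// Assumes the workflow.xml and its inputs have already been imported as DatasetBeans
WorkflowStepBean step = new WorkflowStepBean();
step.setTitle("eAIRS-Single");
step.setWorkflow(workflowXmlDataset);                             // the entire PTPFlow workflow as one step
step.setInputDatasets(Arrays.asList(meshDataset, paramDataset));  // data inputs for this step

WorkflowBean workflow = new WorkflowBean();
workflow.setTitle("eAIRS-Single");
workflow.setWorkflowSteps(Collections.singletonList(step));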

Ideally, the workflow file shown below would be wrapped into the current WorkflowBean and/or WorkflowStepBean in the edu.uiuc.ncsa.cet.bean plug-in. If this is not possible, the KNSG framework will need its own workflow bean.

Code Block
xml
<workflow-builder name="eAIRS-Single" experimentId="singleCFDWorkflow" eventLevel="DEBUG">
  <!-- <global-resource>grid-abe.ncsa.teragrid.org</global-resource> -->
  <global-resource></global-resource>
  <scheduling>
    <profile name="batch">
      <property name="submissionType">
        <value>batch</value>
      </property>
    </profile>
  </scheduling>
  <execution>
     <profile name="mesh0">
     	 <property name="RESULT_LOC">
     	 	<value>some-file-uri</value>
     	 </property>
     	 <property name="executable">
     	 	<value>some-file-uri</value>
     	 </property>
         <property name="meshType">
           <value>some-file-uri</value>
         </property>
         <property name="inputParam">
           <value>some-file-uri</value>
         </property>
     </profile>
  </execution>
  <graph>
    <execute name="compute0">
      <scheduler-constraints>batch</scheduler-constraints>
      <execute-profiles>mesh0</execute-profiles>
      <payload>2DComp</payload>
    </execute>
  </graph>
  <scripts>
    <payload name="2DComp" type="elf">
      <elf>
        <serial-scripts>
          <ogrescript>
            <echo message="Result location = file:${RESULT_LOC}/${service.job.name} result directory is file:${runtime.dir}/result, copy target is file:${RESULT_LOC}/${service.job.name}"/>
            <simple-process execution-dir="${runtime.dir}" out-file="cfd.out" >
              <command-line>${executable} -mesh ${meshType} -param ${inputParam}</command-line>
             <!-- <command-line>${runtime.dir}/2D_Comp-2.0 -mesh ${meshType} -param ${inputParam}</command-line> -->
            </simple-process>
            <mkdir>
            	<uri>file:${RESULT_LOC}/${service.job.name}</uri>
            </mkdir>
            <copy sourceDir="file:${runtime.dir}/result" target="file:${RESULT_LOC}/${service.job.name}"/>
          </ogrescript>
        </serial-scripts>
      </elf>
    </payload>
  </scripts>
</workflow-builder>

Service Manager

RMI Service Registry View

This view shows all of the machines defined as available to the user for installing the RMI service and the PTPFlow plugins required to run HPC jobs and return status information to the client.

RMIService Bean (optional)

PTPFlow's RMI Service is the service that manages the execution of PTPFlow workflows and records event information. It is the service through which clients communicate to find the status of their workflows. The information about each service installation will be stored in an RMIServiceBean and will be used to launch and start the service. All of this information is currently used in PTPFlow and is stored in XML files managed by PTPFlow's repository system; bringing Tupelo into the service stack would allow us to store this information in Tupelo at some future date.

Code Block
titleRMIServiceBean extends CETBean implements Serializable
// Service Info
private String name;
private String platform;
private String deployUsingURI;  // e.g. file:/
private String launchUsingURI;
private String installLocation;  // e.g. /home/user_home/ptpflow
private String rmiContactURI;
private int rmiPortLowerBound;
private int rmiPortUpperBound;
private int gridftpPortLowerBound;
private int gridftpPortUpperBound;
private Date installedDate;
private boolean running;
private Set<HostResourceBean> knownHosts;  // all of the known hosts associated with this service

Repository Manager

Repository View

Rather than a single repository view, this will be multiple views that are configured to show a particular type of bean(s) coming from a content provider. The content provider would get the data required from the configured tupelo context(s). For example, we will need a "Dataset Repository View" that shows all datasets (e.g. input/output datasets) and a way to manipulate them (e.g. add tags, annotations, etc), "Workflow Repository View" that shows all imported workflows, "Scenario Repository View" that shows all saved scenarios, "Service Repository View" that shows defined RMI service endpoints for launching jobs, and a "Known Hosts View" for showing known hosts that can accept jobs. This seems like too much disparate information to display in a single view. All Repository views will descend from BardFrameView since the BardFrame will be required to get the data required for each view.
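
As a rough sketch of how such a view might get its content, the Unifier pattern shown later in X509CertificateBeanUtil could be used to list every dataset in the configured Tupelo context; the dataset type URI, the bardFrame variable and its getContext() accessor are assumptions:

Code Block
titleListing datasets for the Dataset Repository View (sketch)

// Spell out the predicates explicitly; the Dataset type URI below is an assumption
Resource RDF_TYPE = Resource.uriRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type");
Resource DC_TITLE = Resource.uriRef("http://purl.org/dc/elements/1.1/title");
Resource DATASET_TYPE = Resource.uriRef("http://cet.ncsa.uiuc.edu/2007/Dataset");  // assumed type for DatasetBean

// Find every subject typed as a dataset along with its title
Unifier uf = new Unifier();
uf.addPattern("dataset", RDF_TYPE, DATASET_TYPE);
uf.addPattern("dataset", DC_TITLE, "title");
uf.addColumnName("dataset");
uf.addColumnName("title");

bardFrame.getContext().perform(uf);  // getContext() is an assumed accessor on BardFrame

for (Tuple<Resource> row : uf.getResult()) {
  System.out.println(row.get(0) + " : " + row.get(1));
}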

Functional Requirements

...

...

A critical requirement is that repositories can be both remote and local, and users might use more than one simultaneously. Input data for workflows that is managed by Tupelo will need to move from the user's machine to a location that the HPC machine can access. Datasets should also be returned to the user's scenario or made available to it.

Host Resource Bean (optional)

Host Manager

Known Host View

This view contains a list of defined HPC hosts that the user can launch jobs on. This view will provide the user with the ability to change/view/add properties such as environment settings, user information for the host (username, user home, etc), host operating system, node properties, new hosts, etc. These changes should be propagated to the defined RMI services so they can be used immediately.

Known Host Bean

Below is the bean structure that is anticipated:

...

Code Block
titleUserPropertyBean extends CETBean implements Serializable
private String userHome;
private String userName;
private String userNameOnHost;

X509CertificateBean

Code Block
titleX509CertificateBean extends CETBean implements Serializable

private String credential = null;

public String getCredential() {
  return credential;
}

public void setCredential(String credential) {
  this.credential = credential;
}

Analysis Framework

...

Code Block
titleX509CertificateBeanUtil extends AssociatableTupeloBeanUtil<X509CertificateBean>

  public X509CertificateBeanUtil(BeanSession beansession) {
    super(beansession);
  }

  public Resource getAssociationPredicate() {
    return KNSG.HAS_CREDENTIAL;  // "http://cet.ncsa.illinois.edu/2011/security/hasCredential"
  }

  public Resource getType() {
    return KNSG.X509_CERT;  //"http://cet.ncsa.illinois.edu/2011/X509Certificate"
  }

  public BeanMapping getMapping() {
    BeanMapping map = super.getMapping();

    // Java class representing the bean
    map.setJavaClassName(X509CertificateBean.class.getName());

    // Properties for the bean
    map.addProperty(KNSG.X509_CREDENTIAL, "credential", String.class);

    return map;
  }

  // Associate a credential with a given item (e.g. WorkflowStepBean, PersonBean)
  public void addCredential(CETBean item, String credential, Date expiration) throws OperatorException {
    addCredential(Resource.uriRef(item.getUri()), credential, expiration);
  }

  public void addCredential(Resource item, String credential, Date expiration) throws OperatorException {
    TripleWriter tw = new TripleWriter();

    // Create credential bean to store credential
    Resource credentialBean = Resource.uriRef(new X509CertificateBean().getUri());

    // Create the credential triple
    tw.add(Triple.create(credentialBean, KNSG.X509_CREDENTIAL, credential));

    // associate the credential bean with the user
    tw.add(Triple.create(item, KNSG.HAS_CREDENTIAL, credentialBean ));

    getBeanSession().getContext().perform(tw);
  }

  public String getCredential(Resource item) throws OperatorException {
    String credential = null;

    Unifier uf = new Unifier();
    uf.addPattern(item, KNSG.HAS_CREDENTIAL, "thecred");
    uf.addPattern("thecred", KNSG.X509_CREDENTIAL, "credential");
    uf.addColumnName("credential");

    getBeanSession().getContext().perform(uf);

    for(Tuple<Resource> row : uf.getResult()) {
      if(row.get(0) != null) {
        credential = row.get(0).getString();
      }
    }
    return credential;
  }

  public void removeCredential(Resource item, X509CertificateBean cred) throws OperatorException {
		
    Context context = getBeanSession().getContext();
		
    TripleWriter tw = new TripleWriter();
    tw.remove(Triple.create(item, KNSG.HAS_CREDENTIAL, Resource.uriRef(cred.getUri())));
    context.perform(tw);
  }
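
A brief usage sketch of this utility (the beanSession, personBean, proxy string, and expiration date are assumed to come from the application's BardFrame and its MyProxy login):

Code Block
titleUsing X509CertificateBeanUtil (sketch)

// Associate a proxy credential (e.g. one retrieved via MyProxy) with the current user
X509CertificateBeanUtil certUtil = new X509CertificateBeanUtil(beanSession);
certUtil.addCredential(personBean, pemEncodedProxy, expirationDate);

// Later, look the credential up again before launching a workflow
String credential = certUtil.getCredential(Resource.uriRef(personBean.getUri()));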

Metadata Requirements

This is a list of information we would like to capture with Tupelo using an ontology that is built by NCSA and KISTI (note that KISTI has not provided input on the domain concepts they would like captured).

General Framework Metadata

What we need to capture:

  • Is this dataset a result or output dataset?
  • Which workflow created this dataset?
  • Is this dataset an input dataset?
  • Who imported the dataset and when was it imported?
  • What tags and annotations are associated with the dataset?
Code Block
titleKNSG

public class KNSG {

//Namespace for KNSG
public static final String NS = "http://cet.ncsa.illinois.edu/2011/";

// KNSG Scenario
public static final Resource SCENARIO = newResourceSuffix("/knsg/scenario/KNSGScenario");
public static final Resource HAS_DATA = newResourceSuffix("/knsg/scenario/hasData");
}
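
As a sketch of how some of the metadata listed above could be recorded, the same TripleWriter pattern used in X509CertificateBeanUtil could mark a dataset as a workflow result; the IS_RESULT and CREATED_BY predicates are assumptions and not yet part of the KNSG vocabulary:

Code Block
titleRecording result provenance (sketch)

// Hypothetical additions to the KNSG vocabulary; only SCENARIO and HAS_DATA exist above
public static final Resource IS_RESULT  = newResourceSuffix("/knsg/data/isResult");
public static final Resource CREATED_BY = newResourceSuffix("/knsg/data/createdBy");

// In a bean util (compare X509CertificateBeanUtil): flag a dataset as the output of a workflow
public void markAsResult(DatasetBean dataset, WorkflowBean workflow) throws OperatorException {
  Resource data = Resource.uriRef(dataset.getUri());

  TripleWriter tw = new TripleWriter();
  tw.add(Triple.create(data, KNSG.IS_RESULT, "true"));                               // result/output dataset
  tw.add(Triple.create(data, KNSG.CREATED_BY, Resource.uriRef(workflow.getUri())));  // workflow that created it
  getBeanSession().getContext().perform(tw);
}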

eAIRS Metadata

What we need to capture:

  • Is this dataset an eAIRS mesh (*.msh)?
  • Is this dataset an eAIRS input file (*.inp)?
  • Result files: coefhist.rlt, error.rlt, result.rlt, time.rlt, cp.rlt, force_com.rlt, result.vtk. We should capture enough information to know what each of these files represent as far as outputs (need input from KISTI).

Mime types of the files associated with the eAIRS workflow (a possible mapping is sketched after the list below):

  • Input
    • .inp
    • .msh
  • Output
    • .rlt
    • .vtk
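
As a rough sketch of how the framework might recognize these file types on import (the extension-to-mime-type mapping below is an assumption pending input from KISTI):

Code Block
titleeAIRS file type mapping (sketch)

// Hypothetical mapping used when importing eAIRS files; the mime type strings are placeholders
private static final Map<String, String> EAIRS_MIME_TYPES = new HashMap<String, String>();
static {
  EAIRS_MIME_TYPES.put(".inp", "application/x-eairs-input");   // eAIRS input parameter file
  EAIRS_MIME_TYPES.put(".msh", "application/x-eairs-mesh");    // eAIRS mesh
  EAIRS_MIME_TYPES.put(".rlt", "application/x-eairs-result");  // solver result files (coefhist.rlt, error.rlt, ...)
  EAIRS_MIME_TYPES.put(".vtk", "application/x-vtk");           // visualization output
}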