This document explores extending version 1.0 of the KNSG Architecture with semantic technologies that will improve the framework and add value to the user experience. KNSG is a domain-independent application framework built on Bard, PTPFlow, and MyProxy for setting up, launching, and managing HPC application workflows through an easy-to-use set of user interfaces. The framework provides simple facilities for working with data, including import/export, annotation, and tagging. The user can create scenarios, add data to them, and then launch workflows to HPC machines using PTPFlow. There is also a facility for retrieving result data from completed jobs so the user can continue to work with it and, where possible, visualize it. The intent of this document is to lay the foundation for how the core components and views will be enhanced in version 2.0 by adding the semantic capabilities provided by Tupelo and replacing the current framework's repository system with a Tupelo context and Tupelo beans. It also describes how users can extend the framework for their own domain-specific applications.
Core Application Management
The central management piece for each KNSG application is KNSGFrame, an extension of BardFrame that registers Tupelo utility methods for the CETBeans used by the KNSG framework. For the remainder of this document we will use the term BardFrame, since our extension (KNSGFrame) only overrides the method for registering beans; the rest is identical. BardFrame provides an interface for working with the Tupelo semantic content repository and is responsible for managing contexts, bean sessions, and data, firing events, etc. Tupelo beans will be the core mechanism for persisting information in the KNSG framework, so all beans must descend from CETBean to remain compatible with the framework and other CET projects. Because every application will have its own bean requirements, each KNSG application should have its own instance of BardFrame, along with an ontology that defines its domain-specific concepts. All application bean types should be registered with BardFrame, and the IBardFrameService should provide the correct instance of BardFrame at runtime.
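As a rough sketch of this registration pattern, the fragment below shows a KNSGFrame that overrides only the bean-registration step. The class and method names (registerBeanType, beanTypeFor, the namespace URI) are simplified stand-ins for illustration, not the actual Bard/Tupelo API:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the real CET classes; the actual Bard/Tupelo
// API differs, but the registration pattern is the same idea.
class CETBean {}
class ScenarioBean extends CETBean {}

class BardFrame {
    // Maps an RDF type URI to the bean class that represents it.
    private final Map<String, Class<? extends CETBean>> beanTypes = new HashMap<>();

    protected void registerBeanType(String typeUri, Class<? extends CETBean> beanClass) {
        beanTypes.put(typeUri, beanClass);
    }

    public Class<? extends CETBean> beanTypeFor(String typeUri) {
        return beanTypes.get(typeUri);
    }
}

// KNSGFrame overrides only the bean registration step; everything else
// is inherited from BardFrame.
class KNSGFrame extends BardFrame {
    // Hypothetical namespace; the real ontology URI is not yet defined.
    static final String KNSG_NS = "http://cet.ncsa.uiuc.edu/knsg#";

    public void registerBeans() {
        registerBeanType(KNSG_NS + "Scenario", ScenarioBean.class);
        // ...register the other application bean types here
    }
}
```

At runtime, the IBardFrameService would hand back the correct frame instance, so each application's lookup goes through its own registered bean types.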
This section outlines the bean classes that the KNSG framework will require. Where possible, the core CETBeans will be used to minimize the work required and maximize compatibility across projects. Some beans are marked optional because they are part of PTPFlow and it is uncertain whether they will be managed by Tupelo or continue to be managed by PTPFlow's current repository. If a bean comes from the set provided by edu.uiuc.ncsa.cet.bean, this will be noted so that we can differentiate between new KNSG beans and existing ones (e.g. the ScenarioBean below is new, but has the same name as the scenario bean in edu.uiuc.ncsa.cet.bean).
A scenario bean will be used to organize things specific to a scenario (or project), such as user data and workflows. This will include datasets (input and output), workflows, and possibly the RMI service for launching jobs. The RMI service was previously part of the scenario, but since it is a system-wide object, it will probably not be tracked as part of the scenario in version 2.0. A snippet of what the scenario bean might look like is below:
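Since the design is still evolving, the following is only a minimal sketch of what the new ScenarioBean might look like, built from the dataset and workflow lists this section describes; the field and accessor names are illustrative, and the CET classes are stand-ins:

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Simplified stand-ins for the CET bean classes.
class CETBean {}
class DatasetBean extends CETBean {}
class WorkflowBean extends CETBean {}

// Hypothetical sketch of the new KNSG ScenarioBean.
class ScenarioBean extends CETBean {
    private String title;
    private Date dateCreated = new Date();
    // Input and output datasets belonging to this scenario.
    private List<DatasetBean> datasets = new ArrayList<>();
    // Workflows associated with this scenario.
    private List<WorkflowBean> workflows = new ArrayList<>();

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public Date getDateCreated() { return dateCreated; }
    public List<DatasetBean> getDatasets() { return datasets; }
    public List<WorkflowBean> getWorkflows() { return workflows; }
}
```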
While this bean looks very similar to the ScenarioBean in the CET bean plug-in, it is unclear whether some of the internal bean types will match what is required for eAIRS/KNSG (e.g. the workflow bean and the visualization bean). As the scenario bean evolves, it will become clearer whether we can replace our bean with the one in the CET bean plug-in. Final documentation will be added here as the design matures.
The main parts of this bean are the DatasetBean list, which will be used to manage all of the input/output datasets, and the WorkflowBean list, which will contain the workflows associated with this scenario. A user might extend ScenarioBean if their application has other things that logically belong to their scenarios, but it is envisioned that most changes will happen at the metadata level (e.g. this dataset is a mesh, a result, etc.), since the scenario bean should be a generic container that satisfies most users' needs.
This section describes the types of concepts that the ontology needs to capture. We will break this into two parts: general framework concepts (e.g. a result) and eAIRS-specific concepts (e.g. a mesh). We don't anticipate any changes to the DatasetBean class provided as part of the edu.uiuc.ncsa.cet.bean plug-in.
WorkflowBean & WorkflowStepBean
Below you will find an example of a PTPFlow workflow.xml file. At present this file cannot be altered, since it is interpreted directly by PTPFlow; it outlines the steps in the workflow, including which resource to run on, the executables that will be launched, the input files to use, etc. Ideally, this file would be wrapped into the current WorkflowBean and/or WorkflowStepBean in the edu.uiuc.ncsa.cet.bean plug-in. If this is not possible, the KNSG framework will need its own workflow bean.
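The actual schema is defined by PTPFlow, so the fragment below is only an illustrative placeholder showing the kind of information the file carries (resource, executables, inputs); the element names are invented for illustration and do not reflect the real PTPFlow schema:

```xml
<!-- Illustrative placeholder only; element names do not reflect the
     actual PTPFlow workflow.xml schema. -->
<workflow name="eairs-run">
  <step name="solve">
    <resource>hpc.example.org</resource>
    <executable>/usr/local/eairs/bin/solver</executable>
    <input>mesh.msh</input>
    <input>case.inp</input>
  </step>
</workflow>
```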
RMIService Bean (optional)
PTPFlow's RMI service manages the execution of PTPFlow workflows and records event information. It is the service through which clients communicate to find the status of their workflows. This information is currently stored in XML files and managed by PTPFlow's repository system, but it could be managed by Tupelo (at some future date) using an RMIServiceBean.
Host Resource Bean (optional)
Below is the anticipated bean structure:
A HostResourceBean defines the HPC host and its properties.
A NodeBean defines an HPC node's properties, such as the protocols used and the node ID.
A UserPropertyBean defines the user's properties on the host.
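A minimal sketch of how these three beans might fit together is below. The field names (nodeId, protocols, homeDirectory, etc.) are assumptions for illustration; the real properties would come from PTPFlow's host descriptions, and CETBean is a stand-in:

```java
import java.util.ArrayList;
import java.util.List;

class CETBean {}  // stand-in for edu.uiuc.ncsa.cet.bean.CETBean

// Defines an HPC node's properties, e.g. protocols used and node ID.
class NodeBean extends CETBean {
    String nodeId;
    List<String> protocols = new ArrayList<>();
}

// Defines the user's properties on the host (field names illustrative).
class UserPropertyBean extends CETBean {
    String userName;
    String homeDirectory;
}

// Defines the HPC host and its properties, aggregating the beans above.
class HostResourceBean extends CETBean {
    String hostName;
    List<NodeBean> nodes = new ArrayList<>();
    UserPropertyBean userProperties;
}
```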
This is a list of the information we would like to capture with Tupelo using an ontology built by NCSA and KISTI (note that KISTI has not yet provided input on the domain concepts they would like captured).
General Framework Metadata
What we need to capture:
- Is this dataset a result or output dataset?
- Which workflow created this dataset?
- Is this dataset an input dataset?
- Who imported the dataset, and when was it imported?
- What tags and annotations are associated with the dataset?
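One way to picture how Tupelo would capture the facts above is as RDF-style triples about a dataset. The sketch below uses a toy in-memory triple list and made-up predicate URIs that mirror the list; the real predicates would come from the NCSA/KISTI ontology, and Tupelo's actual API is different:

```java
import java.util.ArrayList;
import java.util.List;

// Toy triple representation; Tupelo's actual API is different.
class Triple {
    final String subject, predicate, object;
    Triple(String s, String p, String o) { subject = s; predicate = p; object = o; }
}

class MetadataSketch {
    // Hypothetical predicate URIs mirroring the list above.
    static final String NS = "http://cet.ncsa.uiuc.edu/knsg#";
    static final String IS_RESULT_OF = NS + "isResultOf"; // which workflow created it
    static final String IS_INPUT    = NS + "isInput";     // input vs. output dataset
    static final String IMPORTED_BY = NS + "importedBy";  // who imported it
    static final String IMPORTED_ON = NS + "importedOn";  // when it was imported
    static final String HAS_TAG     = NS + "hasTag";      // user tags/annotations

    final List<Triple> triples = new ArrayList<>();

    void assertTriple(String s, String p, String o) { triples.add(new Triple(s, p, o)); }

    // All objects asserted for a given subject/predicate pair.
    List<String> objectsOf(String s, String p) {
        List<String> out = new ArrayList<>();
        for (Triple t : triples)
            if (t.subject.equals(s) && t.predicate.equals(p)) out.add(t.object);
        return out;
    }
}
```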
eAIRS-Specific Metadata
What we need to capture:
- Is this dataset an eAIRS mesh (*.msh)?
- Is this dataset an eAIRS input file (*.inp)?
- Result files: coefhist.rlt, error.rlt, result.rlt, time.rlt, cp.rlt, force_com.rlt, result.vtk. We should capture enough information to know what each of these files represents as output (input from KISTI is needed).
- MIME types of the files associated with the eAIRS workflow
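As a starting point for the MIME-type mapping, a simple extension-to-type table could be kept by the framework. The type strings below are placeholders (the x- types are invented, not registered MIME types) until KISTI confirms the semantics of each file:

```java
import java.util.HashMap;
import java.util.Map;

// Placeholder extension-to-MIME-type table for eAIRS workflow files.
// The x- types are invented for illustration, not registered MIME types.
class EAirsMimeTypes {
    static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("msh", "application/x-eairs-mesh");   // eAIRS mesh
        BY_EXTENSION.put("inp", "application/x-eairs-input");  // eAIRS input file
        BY_EXTENSION.put("rlt", "text/plain");                 // result files (*.rlt)
        BY_EXTENSION.put("vtk", "application/x-vtk");          // VTK visualization output
    }

    // Look up a type by file extension, defaulting to a generic binary type.
    static String mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase();
        return BY_EXTENSION.getOrDefault(ext, "application/octet-stream");
    }
}
```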