Application Management
The BardFrame is a general application component for managing contexts, bean sessions, data, etc., and will need to be extended for each use case that requires an entirely new application (e.g. e-AIRS and e-Spine launch HPC jobs in different ways and might require two separate BardFrame implementations). All application bean types should register with the BardFrame, and the BardServiceRegistry should provide the right BardFrame instance. Alternatively, BardFrame could be made to allow applications to register their bean types.
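The alternative design could look roughly like the following sketch: applications register an identifier with a registry, which hands back the matching BardFrame instance. All class and method names here are hypothetical stand-ins, not the actual API.

```java
import java.util.HashMap;
import java.util.Map;

public class BardServiceRegistrySketch {

    // Minimal stand-in for an application-specific BardFrame.
    static class BardFrame {
        private final String application;
        BardFrame(String application) { this.application = application; }
        String getApplication() { return application; }
    }

    // Hypothetical registry keyed by application id (e.g. "e-AIRS", "e-Spine").
    static class BardServiceRegistry {
        private final Map<String, BardFrame> frames = new HashMap<>();

        void register(String applicationId, BardFrame frame) {
            frames.put(applicationId, frame);
        }

        BardFrame getBardFrame(String applicationId) {
            return frames.get(applicationId);
        }
    }

    public static void main(String[] args) {
        BardServiceRegistry registry = new BardServiceRegistry();
        registry.register("e-AIRS", new BardFrame("e-AIRS"));
        registry.register("e-Spine", new BardFrame("e-Spine"));
        System.out.println(registry.getBardFrame("e-AIRS").getApplication());
    }
}
```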
Scenarios View
Displays user scenario(s) and all sub-parts in a Tree view. A scenario is similar to the concept of a project: it contains a collection of parts (input datasets, output datasets, etc.) that belong to the scenario. Users will launch jobs on the HPC machines that run workflows using the inputs in their scenario, and when a job completes, the outputs should be added to the user's scenario. A user might have multiple scenarios open at once, close scenarios, or even delete scenarios from their scenario view (deleted from the view, but still in the repository), so we'll need to manage which scenarios are in a session and what their current state is (open/closed). For example, I might have scenarios A, B, and C stored in a local repository, but only A and B loaded into my application.
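The session bookkeeping described above could be sketched as follows: the repository may hold scenarios A, B, and C, while only some are loaded into the session, each with an open/closed state. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScenarioSessionSketch {

    enum State { OPEN, CLOSED }

    static class ScenarioSession {
        // Scenarios loaded into the session, mapped to their current state.
        private final Map<String, State> loaded = new LinkedHashMap<>();

        void load(String scenario) { loaded.put(scenario, State.OPEN); }
        void close(String scenario) { loaded.put(scenario, State.CLOSED); }

        // Removing from the view does not delete it from the repository.
        void remove(String scenario) { loaded.remove(scenario); }

        List<String> openScenarios() {
            List<String> open = new ArrayList<>();
            for (Map.Entry<String, State> e : loaded.entrySet()) {
                if (e.getValue() == State.OPEN) open.add(e.getKey());
            }
            return open;
        }
    }

    public static void main(String[] args) {
        ScenarioSession session = new ScenarioSession();
        session.load("A");
        session.load("B"); // C stays in the repository, unloaded
        session.close("B");
        System.out.println(session.openScenarios());
    }
}
```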
Scenario
A scenario (or project) will contain the user data specific to it, including datasets (input and output), workflows, and an RMI service for launching jobs.
private String title;
private String description;
private Set<DatasetBean> dataSets;
private RMIServiceBean serviceBean;
private List<WorkflowBean> workflows;
RMI Service Registry
The service registry contains all machines defined as available to the user for installing the PTPFlow plugins required to run HPC jobs and return status information to the client.
RMIService Info
The information about each service installation will be stored in an RMIServiceBean.
// Service Info
private String name;
private String platform;
private String deployUsingURI;  // e.g. file:/
private String launchUsingURI;
private String installLocation; // e.g. /home/user_home/ptpflow
private String rmiContactURI;
private int rmiPortLowerBound;
private int rmiPortUpperBound;
private int gridftpPortLowerBound;
private int gridftpPortUpperBound;
private Date installedDate;
private boolean running;
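One way the registry might use these beans is to select an installation for a job, e.g. the first running service for a given platform. The sketch below reduces the bean to the fields needed here; the registry class and lookup method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RMIServiceRegistrySketch {

    // Reduced RMIServiceBean; field names follow the listing above.
    static class RMIServiceBean {
        String name;
        String platform;
        boolean running;
        RMIServiceBean(String name, String platform, boolean running) {
            this.name = name;
            this.platform = platform;
            this.running = running;
        }
    }

    static class ServiceRegistry {
        private final List<RMIServiceBean> services = new ArrayList<>();

        void add(RMIServiceBean bean) { services.add(bean); }

        // Return the first running installation for the platform, or null.
        RMIServiceBean findRunning(String platform) {
            for (RMIServiceBean s : services) {
                if (s.running && s.platform.equals(platform)) return s;
            }
            return null;
        }
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        registry.add(new RMIServiceBean("stopped-service", "linux", false));
        registry.add(new RMIServiceBean("hpc-service", "linux", true));
        System.out.println(registry.findRunning("linux").name);
    }
}
```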
Workflows
Each workflow is described by an XML file that outlines the steps in the process, including which machine to run on, the executables that will be launched, input files, etc. Initially we will simply store the workflow information in a single WorkflowStepBean that has a reference to the file containing the XML and the DatasetBeans. Ogrescript XML files can be complex, but if we can logically separate out the pieces into steps or parts that can be used to generate the full workflow XML file required by the HPC machines, then we can include workflow steps as separate beans.
private String title;
private String description;
private Date date;
private List<WorkflowStepBean> workflowSteps;
private PersonBean creator;
private Collection<PersonBean> contributors;
private String title;
private PersonBean creator;
private Date date;
private List<DatasetBean> inputDatasets;
private DatasetBean workflow; // initially our steps will only include a single step, the entire workflow
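The idea of generating the full workflow XML from separated steps could be sketched like this: each step contributes a fragment, and a single-step workflow simply wraps the entire Ogrescript file. The fragment field and generator method are hypothetical illustrations, not the real Ogrescript schema.

```java
import java.util.Arrays;
import java.util.List;

public class WorkflowXmlSketch {

    static class WorkflowStepBean {
        String title;
        String xmlFragment; // in the initial design, the entire workflow XML
        WorkflowStepBean(String title, String xmlFragment) {
            this.title = title;
            this.xmlFragment = xmlFragment;
        }
    }

    // Concatenate step fragments into one workflow document.
    static String generateWorkflowXml(List<WorkflowStepBean> steps) {
        StringBuilder xml = new StringBuilder("<workflow>\n");
        for (WorkflowStepBean step : steps) {
            xml.append("  ").append(step.xmlFragment).append("\n");
        }
        return xml.append("</workflow>").toString();
    }

    public static void main(String[] args) {
        WorkflowStepBean step =
                new WorkflowStepBean("run", "<step name=\"run\"/>");
        System.out.println(generateWorkflowXml(Arrays.asList(step)));
    }
}
```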
Repository View
Rather than a single repository view, this will probably be multiple views, each configured to show a particular type of data coming from a content provider. The content provider would get the data required from the configured Tupelo context(s). For example, we will need a "Data Repository View" that shows all datasets (e.g. input/output datasets) and a way to manipulate them (e.g. add tags, annotations, etc.), a "Scenario Repository View" that shows all saved scenarios, a "Service Repository View" that shows defined RMI service endpoints for launching jobs, and a "Known Hosts View" for showing known hosts that can accept jobs. This is too much disparate information to display in a single view. In SAGE and all derived products, a repository is going to be used for storing information that must be persisted, including input data, output data, saved scenarios, workflows, etc.
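The per-view content provider idea above could be sketched as a provider configured with a bean type that returns only matching elements from the repository contents. JFace specifics are omitted here, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RepositoryViewSketch {

    // Stand-ins for beans stored in a Tupelo context.
    static class DatasetBean { }
    static class ScenarioBean { }

    // Each view gets a provider configured with the bean type it displays.
    static class TypedContentProvider<T> {
        private final Class<T> type;
        TypedContentProvider(Class<T> type) { this.type = type; }

        // Filter repository contents down to the configured type.
        List<T> getElements(List<Object> repositoryContents) {
            List<T> result = new ArrayList<>();
            for (Object o : repositoryContents) {
                if (type.isInstance(o)) result.add(type.cast(o));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        List<Object> repo = Arrays.asList(
                new DatasetBean(), new ScenarioBean(), new DatasetBean());
        TypedContentProvider<DatasetBean> dataView =
                new TypedContentProvider<>(DatasetBean.class);
        System.out.println(dataView.getElements(repo).size());
    }
}
```

A "Scenario Repository View" would simply construct the same provider with ScenarioBean.class, which is what makes one generic provider serve all of the views listed above.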
Functional Requirements
- Import datasets that will be used as input to HPC workflows, such as mesh files and input files (e.g. Mach number, Poisson ratio, etc)
- Store output datasets from workflow runs; some workflows will be parameterized and have multiple outputs
- Store scenarios
- Store defined RMI services
- Store known-hosts
- Store workflow XML files (Ogrescript)
- Other functionality?
Repositories can be both remote and local, and users might use more than one simultaneously. Input data for workflows that is managed by Tupelo will need to move from the user's machine to a location that the HPC machine can access. Output datasets should likewise be returned to the user's scenario or otherwise made available.
Known Host View
This view lists information about the HPC hosts such as environment settings, user information for the host (username, user home, etc), host operating system, node properties, etc.
private String osName;                  // host os name
private String osVersion;               // host os version
private String architecture;            // host architecture
private String id;                      // host id
private Set<PropertyBean> envProperties; // environment properties on host
private Set<NodeBean> nodes;            // properties of each node