
Overview

SAGE is a domain-independent application framework built on Bard, PTPFlow, Tupelo, and MyProxy for setting up, launching, and managing HPC application workflows through an easy-to-use set of user interfaces. This document lays the foundation for the core components and views provided by the SAGE application framework and describes how users can extend the various parts for their domain-specific applications.

Core Application Management

The central management piece for each SAGE application is BardFrame. BardFrame provides an interface for working with the Tupelo semantic content repository and is responsible for managing contexts, bean sessions, data, etc. Beans are the core mechanism for persisting information in the SAGE framework, so all beans must descend from CETBean. Because every application will have its own bean requirements, each SAGE application should have its own instance of BardFrame. All application bean types should be registered with BardFrame, and the BardServiceRegistry should provide the correct instance of BardFrame at runtime. A rough sketch of this startup registration is shown below.
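As a minimal sketch only: the getBardFrame() and registerBeanType() calls below are hypothetical names used for illustration, not confirmed BardFrame or BardServiceRegistry API.

public class MyDomainApplication {

    public void initialize() {
        // Obtain this application's BardFrame instance
        // (hypothetical lookup method on BardServiceRegistry).
        BardFrame frame = BardServiceRegistry.getBardFrame();

        // Register every bean type this application persists;
        // all of them descend from CETBean.
        frame.registerBeanType(ScenarioBean.class);
        frame.registerBeanType(WorkflowBean.class);
        frame.registerBeanType(RMIServiceBean.class);
    }
}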

Scenarios View

The first main view provided by SAGE will be the ScenariosView. This view displays the user's scenario(s) and all of their sub-parts in a tree view. A scenario is similar to the concept of a project: it is simply a way of organizing things that belong together. The scenario is responsible for managing all of the pieces that it contains, including input datasets, output datasets, and workflows. A scenario may also contain the RMI service that the workflows will use to launch their jobs, but this could end up being an application-wide object or part of a scenario manager, since users will most likely use the same launch point to execute jobs regardless of which scenario a workflow belongs to. Users will launch jobs on the HPC machines using the inputs in their scenario, and when a job completes, the outputs should be added back to that scenario. A user can have multiple scenarios open at once, close scenarios, or even delete scenarios from the scenario view (deleted from the view, but still in the repository), so we will need to manage which scenarios are in a session and what their current state is (open/closed). It is anticipated that new applications might extend this view to organize it differently for their specific domain.

Scenario Bean

A scenario bean will be used to organize things such as user data and workflows specific to a scenario (or project). This will include datasets (input and output), workflows, and possibly the RMI service for launching jobs. As previously mentioned, the RMI service might end up as an application-wide object that is viewable from the scenario view but not specific to any one scenario. A snippet of what the scenario bean might look like is below:

import java.io.Serializable;
import java.util.Date;
import java.util.List;
import java.util.Set;

public class ScenarioBean extends CETBean implements Serializable, CETBean.TitledBean {
    private String title;  // scenario title
    private String description;  // scenario description
    private Set<DatasetBean> dataSets;  // datasets associated with scenario
    private RMIServiceBean serviceBean;  // rmi service used to launch workflows
    private List<WorkflowBean> workflows;  // workflows associated with this scenario
    private boolean open;  // is the scenario opened or closed?
    private PersonBean creator;  // scenario creator
    private Date date;  // date scenario created
}

This scenario bean will evolve as the application framework is built, and more complete documentation will be put here as the design matures. The main parts of this bean are the DatasetBeans, which manage all of the input/output datasets; the RMIServiceBean (described later), which contains the service information; and the WorkflowBeans, which contain the workflows associated with this scenario. A user might extend the ScenarioBean if their application has other things that logically belong to its scenarios.
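For example, an application could subclass the bean to carry extra domain data. The subclass name and field below are purely hypothetical:

// Purely hypothetical domain-specific extension of ScenarioBean.
public class StructuresScenarioBean extends ScenarioBean {
    private Set<DatasetBean> meshes;  // extra datasets that logically belong to this domain's scenarios
}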

RMI Service Registry View

This view shows all of the machines defined as available to the user for installing the RMI service and PTPFlow plugins required to run HPC jobs and return status information to the client.

RMIService Info Bean

The information about each service installation will be stored in an RMIServiceBean and will be used to launch and start the service. All of this information is currently used in PTPFlow, where it is stored in XML files. Bringing Tupelo into the service stack will allow us to store this information in Tupelo instead.

import java.io.Serializable;
import java.util.Date;
import java.util.Set;

public class RMIServiceBean extends CETBean implements Serializable {
    // Service Info
    private String name;
    private String platform;
    private String deployUsingURI;  // e.g. file:/
    private String launchUsingURI;
    private String installLocation;  // e.g. /home/user_home/ptpflow
    private String rmiContactURI;
    private int rmiPortLowerBound;
    private int rmiPortUpperBound;
    private int gridftpPortLowerBound;
    private int gridftpPortUpperBound;
    private Date installedDate;
    private boolean running;
    private Set<HostResourceBean> knownHosts;  // all of the known hosts associated with this service
}
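As a sketch of how this bean might be persisted once Tupelo is in the stack, the save below goes through the bean session that BardFrame manages. The getBeanSession(), save(), and setter calls are assumptions for illustration, not confirmed API:

// Hypothetical persistence sketch; method names are assumed.
BardFrame frame = BardServiceRegistry.getBardFrame();

RMIServiceBean service = new RMIServiceBean();
service.setName("abe-rmi-service");                    // assumes standard bean setters
service.setInstallLocation("/home/user_home/ptpflow");

// Store the service definition in the Tupelo semantic content
// repository instead of an XML file.
frame.getBeanSession().save(service);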

Workflows

Each workflow is described in an XML file that outlines the steps in the process, including which resource to run on, the executables that will be launched, the input files to use, etc. Initially, we will simply store the workflow information in a single WorkflowStepBean that has a reference to the file containing the XML and to the DatasetBeans. Ogrescript XML files can be complex, but if we can logically separate the steps or parts into individual beans that can be used to generate the full workflow XML file required by the HPC machines, then we can include workflow steps as separate beans and provide a UI for adding steps.

import java.io.Serializable;
import java.util.Collection;
import java.util.Date;
import java.util.List;

public class WorkflowBean extends CETBean implements Serializable {
    private String title;
    private String description;
    private Date date;
    private List<WorkflowStepBean> workflowSteps;  // initially only one step: the workflow file that PTPFlow can currently launch
    private PersonBean creator;
    private Collection<PersonBean> contributors;
}

public class WorkflowStepBean extends CETBean implements Serializable {
    private String title;
    private PersonBean creator;
    private Date date;
    private List<DatasetBean> inputDatasets;  // all data inputs associated with this step
    private DatasetBean workflow;  // initially our steps will only include a single step, the entire workflow
}
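For illustration, importing an existing Ogrescript file might then be wired up as follows. The setters and the DatasetBean wrapping are assumptions; the real import path will depend on how workflow files are brought into Tupelo:

// Hypothetical sketch of importing an Ogrescript workflow as a single step.
DatasetBean ogrescript = new DatasetBean();   // assumed to wrap the workflow XML file
WorkflowStepBean step = new WorkflowStepBean();
step.setWorkflow(ogrescript);                 // initially the entire workflow is one step

WorkflowBean workflow = new WorkflowBean();
workflow.setTitle("mach-sweep");              // hypothetical title
workflow.setWorkflowSteps(Collections.singletonList(step));  // java.util.Collections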

Repository View

Rather than a single repository view, this will be multiple views, each configured to show a particular type of bean coming from a content provider. The content provider would get the required data from the configured Tupelo context(s). For example, we will need a "Dataset Repository View" that shows all datasets (e.g. input/output datasets) and a way to manipulate them (e.g. add tags, annotations, etc.), a "Workflow Repository View" that shows all imported workflows, a "Scenario Repository View" that shows all saved scenarios, a "Service Repository View" that shows the defined RMI service endpoints for launching jobs, and a "Known Hosts View" showing the known hosts that can accept jobs. This is too much disparate information to display in a single view. All repository views will descend from BardFrameView, since the BardFrame will be required to get the data for each view. A sketch of the content-provider pattern follows.
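The sketch below assumes the views are JFace-based (PTPFlow is an Eclipse technology); the provider backs a "Dataset Repository View", and getBeansOfType() is a hypothetical BardFrame query method, not confirmed API.

import org.eclipse.jface.viewers.IStructuredContentProvider;
import org.eclipse.jface.viewers.Viewer;

// Hypothetical content provider for a "Dataset Repository View".
public class DatasetContentProvider implements IStructuredContentProvider {
    private BardFrame frame;

    public void inputChanged(Viewer viewer, Object oldInput, Object newInput) {
        // The view's input is the application's BardFrame instance.
        frame = (BardFrame) newInput;
    }

    public Object[] getElements(Object inputElement) {
        if (frame == null) {
            return new Object[0];
        }
        // getBeansOfType is an assumed query method on BardFrame.
        return frame.getBeansOfType(DatasetBean.class).toArray();
    }

    public void dispose() {
        frame = null;
    }
}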

Functional Requirements

  1. Import datasets that will be used as input to HPC workflows, such as mesh files and input files (e.g. Mach number, Poisson ratio, etc.)
  2. Store output datasets from workflow runs; some workflows will be parameterized and have multiple outputs
  3. Store scenarios
  4. Store defined RMI services
  5. Store known-hosts
  6. Store workflow xml files (Ogrescript)
  7. Other functionality?

A critical requirement is that repositories can be both remote and local, and users might use more than one simultaneously. Input data for workflows that is managed by Tupelo will need to move from the user's machine to a location that the HPC machine can access. Output datasets should likewise be returned to the user's scenario or made available to it.

Known Hosts View

This view contains the list of defined HPC hosts on which the user can launch jobs. It will provide the user with the ability to view, change, and add properties such as environment settings, user information for the host (username, user home, etc.), host operating system, node properties, and new hosts. These changes should be propagated to the defined RMI services so they can be used immediately. Below is the anticipated bean structure:

A HostResourceBean defines the HPC host and its properties.

import java.io.Serializable;
import java.util.Set;

public class HostResourceBean extends CETBean implements Serializable {
    private String osName;  // host os name
    private String osVersion;  // host os version
    private String architecture;  // host architecture
    private String id;  // host id
    private Set<PropertyBean> envProperties;  // environment properties on host
    private Set<NodeBean> nodes;  // properties of each node
    private Set<UserPropertyBean> users;  // user properties on the host - userHome, userNameOnHost, userName
}

A NodeBean defines an HPC node's properties, such as the protocols used and the nodeId.

import java.io.Serializable;
import java.util.List;

public class NodeBean extends CETBean implements Serializable {
    private String nodeId;  // id of the node, e.g. grid-abe.ncsa.teragrid.org
    private List<FileProtocolBean> fileProtocols;
    private List<BatchProtocolBean> batchProtocols;
    private List<InteractiveProtocolBean> interactiveProtocols;
}

A UserPropertyBean defines the user's properties on the host.

import java.io.Serializable;

public class UserPropertyBean extends CETBean implements Serializable {
    private String userHome;
    private String userName;
    private String userNameOnHost;
}
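For illustration, defining a new known host might assemble these beans roughly as follows; the setters are assumed and the values are examples only:

// Hypothetical sketch of defining a known host; setters are assumed.
NodeBean node = new NodeBean();
node.setNodeId("grid-abe.ncsa.teragrid.org");

UserPropertyBean user = new UserPropertyBean();
user.setUserHome("/home/user_home");
user.setUserNameOnHost("user");

HostResourceBean host = new HostResourceBean();
host.setOsName("Linux");
host.setNodes(Collections.singleton(node));    // java.util.Collections
host.setUsers(Collections.singleton(user));
// The new host definition would then be propagated to the defined RMI services.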

Analysis Framework

The analysis framework will allow users to register HPC workflows, modify the workflow inputs through a graphical user interface, and execute HPC jobs when all inputs are satisfied.
