
Overview

SAGE is a domain-independent application framework, built on Bard, PTPFlow, Tupelo, and MyProxy, for setting up, launching, and managing HPC application workflows through an easy-to-use user interface. This document lays the foundation for the core components and views provided by the SAGE application framework and explains how users can extend each of them for their domain-specific applications.

Core Application Management

The central management piece for each SAGE application is BardFrame. BardFrame provides an interface for working with the Tupelo semantic content repository and is responsible for managing contexts, bean sessions, data, etc. Beans will be the core mechanism for persisting information in the SAGE framework, so all beans will need to descend from CETBean. Because every application will have its own bean requirements, each SAGE application should have its own instance of BardFrame. All application bean types should register with BardFrame, and the BardServiceRegistry should provide the correct instance of BardFrame at runtime.
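The "one BardFrame per application, handed out by a registry" arrangement can be sketched as below. This is only an illustration of the idea, not Bard's actual API: `BardFrameSketch`, `BardServiceRegistrySketch`, and the methods `registerBeanType`, `isRegistered`, and `getFrame` are hypothetical names, and `CETBean` is an empty stand-in for the real base class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Empty stand-in for the real CETBean base class (hypothetical, illustration only).
abstract class CETBean {
}

// Hypothetical stand-in for an application's BardFrame: it only tracks which
// bean types the application has registered.
class BardFrameSketch {
    private final String applicationId;
    private final Set<Class<? extends CETBean>> beanTypes = new LinkedHashSet<>();

    BardFrameSketch(String applicationId) {
        this.applicationId = applicationId;
    }

    String getApplicationId() {
        return applicationId;
    }

    // The generic bound rejects classes that do not descend from CETBean at compile time.
    void registerBeanType(Class<? extends CETBean> type) {
        beanTypes.add(type);
    }

    boolean isRegistered(Class<?> type) {
        return beanTypes.contains(type);
    }

    Set<Class<? extends CETBean>> getBeanTypes() {
        return Collections.unmodifiableSet(beanTypes);
    }
}

// Hypothetical stand-in for BardServiceRegistry: exactly one frame per application id.
final class BardServiceRegistrySketch {
    private static final Map<String, BardFrameSketch> FRAMES = new HashMap<>();

    private BardServiceRegistrySketch() {
    }

    // Returns the frame for the given application, creating it on first request.
    static synchronized BardFrameSketch getFrame(String applicationId) {
        return FRAMES.computeIfAbsent(applicationId, BardFrameSketch::new);
    }
}
```

Keeping bean registration on a per-application frame means two SAGE-derived applications running side by side never see each other's bean types.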

Scenarios View

The first main view provided by SAGE will be the ScenariosView, which displays the user's scenarios and all of their sub-parts in a tree view. A scenario is similar to a project: it is simply a way of organizing things that belong together. The scenario is responsible for managing all of the pieces it contains, including input datasets, output datasets, and workflows. A scenario may also contain the RMI service that its workflows use to launch their jobs. However, this could end up being an application-wide object or part of a Scenario Manager, since users will most likely use the same launch point to execute jobs regardless of which scenario a workflow belongs to.

Users will launch jobs on the HPC machines using the inputs in their scenario, and when a job completes, its outputs should be added back to that scenario. A user can have multiple scenarios open at once, close scenarios, or even delete scenarios from the scenario view (a deleted scenario is removed from the view but remains in the repository), so we will need to track which scenarios are in a session and what their current state (open/closed) is. New applications may extend this view to organize it differently for their specific domain.
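The session bookkeeping described above (open/closed state, and "delete" meaning removal from the view only) could be modeled roughly as follows. This is a minimal sketch under stated assumptions: `ScenarioSession` and `ScenarioState` are hypothetical names, scenarios are identified by plain `String` ids rather than ScenarioBeans, and an in-memory set stands in for the Tupelo repository.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

enum ScenarioState { OPEN, CLOSED }

// Hypothetical session model for the ScenariosView (illustration only).
class ScenarioSession {
    // Every scenario ever saved; stands in for the Tupelo repository.
    private final Set<String> repository = new LinkedHashSet<>();
    // Scenarios currently shown in the view, with their open/closed state.
    private final Map<String, ScenarioState> inView = new LinkedHashMap<>();

    // A new scenario is saved to the repository and shown as open.
    void create(String id) {
        repository.add(id);
        inView.put(id, ScenarioState.OPEN);
    }

    // Opening works for any saved scenario, including one removed from the view.
    void open(String id) {
        if (!repository.contains(id)) {
            throw new IllegalArgumentException("Unknown scenario: " + id);
        }
        inView.put(id, ScenarioState.OPEN);
    }

    // Closing only changes state; it is a no-op for scenarios not in the view.
    void close(String id) {
        inView.computeIfPresent(id, (k, v) -> ScenarioState.CLOSED);
    }

    // "Delete" removes the scenario from the view; it stays in the repository.
    void removeFromView(String id) {
        inView.remove(id);
    }

    ScenarioState stateOf(String id) {
        return inView.get(id); // null when the scenario is not in the view
    }

    boolean isInRepository(String id) {
        return repository.contains(id);
    }

    Map<String, ScenarioState> viewSnapshot() {
        return Collections.unmodifiableMap(inView);
    }
}
```

Separating "in the view" from "in the repository" is what lets a removed scenario be reopened later without any data having been lost.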

Scenario

A scenario bean will be used to organize things such as user data and workflows specific to a scenario (or project). This includes datasets (input and output), workflows, and possibly the RMI service for launching jobs. As mentioned above, the RMI service might instead end up as an application-wide object that is viewable from the scenario view but not specific to any one scenario. A snippet of what the scenario bean might look like is shown below:

import java.io.Serializable;
import java.util.Date;
import java.util.List;
import java.util.Set;

class ScenarioBean extends CETBean implements Serializable, CETBean.TitledBean {
    private static final long serialVersionUID = 1L;
    private String title;  // scenario title
    private String description;  // scenario description
    private Set<DatasetBean> dataSets;  // datasets associated with the scenario
    private RMIServiceBean serviceBean;  // RMI service used to launch workflows
    private List<WorkflowBean> workflows;  // workflows associated with this scenario
    private boolean open;  // is the scenario open or closed?
    private PersonBean creator;  // scenario creator
    private Date date;  // date the scenario was created
}

This code will evolve as the application framework is built, and more complete documentation will be added here as the design matures. DatasetBeans will be used to manage all of the input and output datasets, the RMIServiceBean (described below) will contain the service information, and the list of WorkflowBeans will hold the workflows associated with the scenario. An application developer might extend ScenarioBean if their application has other things that logically belong to its scenarios.
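Extending ScenarioBean for a domain could look like the sketch below. The trimmed `ScenarioBean`, `DatasetBean`, and `CETBean` classes are minimal stand-ins so the example compiles on its own, and `ParameterStudyScenarioBean` with its named solver parameters (e.g. Mach number) is a purely hypothetical domain extension, not part of SAGE.

```java
import java.io.Serializable;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Minimal stand-ins for the beans described above (illustration only).
abstract class CETBean {
}

class DatasetBean extends CETBean implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String title;

    DatasetBean(String title) {
        this.title = title;
    }

    String getTitle() {
        return title;
    }
}

class ScenarioBean extends CETBean implements Serializable {
    private static final long serialVersionUID = 1L;
    private String title;
    private final Set<DatasetBean> dataSets = new HashSet<>();

    String getTitle() { return title; }
    void setTitle(String title) { this.title = title; }
    Set<DatasetBean> getDataSets() { return dataSets; }
}

// Hypothetical domain extension: a scenario that also records named solver
// parameters shared by all of its workflows.
class ParameterStudyScenarioBean extends ScenarioBean {
    private static final long serialVersionUID = 1L;
    private final Map<String, Double> parameters = new LinkedHashMap<>();

    void setParameter(String name, double value) {
        parameters.put(name, value);
    }

    // Returns null when the parameter has not been set.
    Double getParameter(String name) {
        return parameters.get(name);
    }
}
```

Because the subclass is still a ScenarioBean, the ScenariosView and repository code can keep treating it generically while the domain application reads the extra fields.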

RMI Service Registry

The service registry contains every machine that has been defined as available to the user for installing the PTPFlow plugins, which are required to run HPC jobs and return status information to the client.

RMIService Info

The information about each service installation will be stored in an RMIServiceBean.

import java.io.Serializable;
import java.util.Date;

class RMIServiceBean extends CETBean implements Serializable {
    // Service info
    private String name;
    private String platform;
    private String deployUsingURI;  // e.g. file:/
    private String launchUsingURI;
    private String installLocation;  // e.g. /home/user_home/ptpflow
    private String rmiContactURI;
    private int rmiPortLowerBound;
    private int rmiPortUpperBound;
    private int gridftpPortLowerBound;
    private int gridftpPortUpperBound;
    private Date installedDate;
    private boolean running;
}
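A registry of these service beans, with a sanity check on the RMI and GridFTP port ranges, might be sketched as follows. `RMIServiceInfo` is a trimmed, hypothetical stand-in carrying only the fields used here, and `RMIServiceRegistrySketch` with `register` and `runningServices` is an assumed in-memory design, not an existing SAGE or PTPFlow class. The check simply requires valid TCP ports (1 to 65535) with lower bound not above upper bound.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Trimmed-down stand-in for RMIServiceBean (only the fields used here).
class RMIServiceInfo {
    final String name;
    final int rmiPortLowerBound;
    final int rmiPortUpperBound;
    final int gridftpPortLowerBound;
    final int gridftpPortUpperBound;
    boolean running;

    RMIServiceInfo(String name, int rmiLow, int rmiHigh, int ftpLow, int ftpHigh) {
        this.name = name;
        this.rmiPortLowerBound = rmiLow;
        this.rmiPortUpperBound = rmiHigh;
        this.gridftpPortLowerBound = ftpLow;
        this.gridftpPortUpperBound = ftpHigh;
    }
}

// Hypothetical in-memory service registry keyed by service name.
class RMIServiceRegistrySketch {
    private final Map<String, RMIServiceInfo> services = new LinkedHashMap<>();

    // Rejects a service whose port ranges are invalid; replaces any service with the same name.
    void register(RMIServiceInfo s) {
        checkRange("RMI", s.rmiPortLowerBound, s.rmiPortUpperBound);
        checkRange("GridFTP", s.gridftpPortLowerBound, s.gridftpPortUpperBound);
        services.put(s.name, s);
    }

    // Services currently marked as running, in registration order.
    List<RMIServiceInfo> runningServices() {
        List<RMIServiceInfo> result = new ArrayList<>();
        for (RMIServiceInfo s : services.values()) {
            if (s.running) {
                result.add(s);
            }
        }
        return Collections.unmodifiableList(result);
    }

    private static void checkRange(String label, int low, int high) {
        if (low < 1 || high > 65535 || low > high) {
            throw new IllegalArgumentException(label + " port range invalid: " + low + "-" + high);
        }
    }
}
```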

Workflows

Each workflow is described by an XML file that outlines the steps in the process, including which machine to run on, the executables to launch, input files, etc. Initially, we will simply store the workflow information in a single WorkflowStepBean that references the file containing the XML and the input DatasetBeans. Ogrescript XML files can be complex, but if we can logically separate them into steps or parts that can be used to generate the full workflow XML file required by the HPC machines, then we can represent workflow steps as separate beans.

import java.io.Serializable;
import java.util.Collection;
import java.util.Date;
import java.util.List;

class WorkflowBean extends CETBean implements Serializable {
    private String title;
    private String description;
    private Date date;
    private List<WorkflowStepBean> workflowSteps;
    private PersonBean creator;
    private Collection<PersonBean> contributors;
}

class WorkflowStepBean extends CETBean implements Serializable {
    private String title;
    private PersonBean creator;
    private Date date;
    private List<DatasetBean> inputDatasets;
    private DatasetBean workflow;  // initially each workflow has a single step: the entire workflow XML
}
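The initial approach, where a workflow consists of exactly one step whose `workflow` dataset is the complete Ogrescript XML file, can be sketched like this. The classes are trimmed stand-ins for the beans above, and the `Workflows.singleStep` factory is a hypothetical helper introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Trimmed stand-ins for the beans above (illustration only).
abstract class CETBean {
}

class DatasetBean extends CETBean {
    final String title;

    DatasetBean(String title) {
        this.title = title;
    }
}

class WorkflowStepBean extends CETBean {
    final String title;
    final List<DatasetBean> inputDatasets;
    final DatasetBean workflow; // dataset holding the complete workflow XML file

    WorkflowStepBean(String title, List<DatasetBean> inputs, DatasetBean workflow) {
        this.title = title;
        this.inputDatasets = Collections.unmodifiableList(new ArrayList<>(inputs));
        this.workflow = workflow;
    }
}

class WorkflowBean extends CETBean {
    final String title;
    final List<WorkflowStepBean> workflowSteps = new ArrayList<>();

    WorkflowBean(String title) {
        this.title = title;
    }
}

// Hypothetical helper for the initial design: the whole workflow is one step.
final class Workflows {
    private Workflows() {
    }

    static WorkflowBean singleStep(String title, DatasetBean workflowXml, List<DatasetBean> inputs) {
        WorkflowBean wf = new WorkflowBean(title);
        wf.workflowSteps.add(new WorkflowStepBean(title, inputs, workflowXml));
        return wf;
    }
}
```

If the XML is later split into reusable parts, only the factory changes: it would add several steps instead of one, while code that walks `workflowSteps` stays the same.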

Repository View

In SAGE and all derived products, a repository will be used to store information that must be persisted, including input data, output data, saved scenarios, workflows, etc. Rather than a single repository view, this will probably be multiple views, each configured to show a particular type of data supplied by a content provider, which in turn gets the data it needs from the configured Tupelo context(s). For example, we will need:

  • a "Data Repository View" that shows all datasets (e.g. input/output datasets) and provides a way to manipulate them (e.g. add tags, annotations, etc.),
  • a "Scenario Repository View" that shows all saved scenarios,
  • a "Service Repository View" that shows the defined RMI service endpoints for launching jobs, and
  • a "Known Hosts View" that shows the known hosts that can accept jobs.

This seems like too much disparate information to display in a single view.
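The "one view per data type, each backed by a content provider" idea can be sketched as a provider parameterized by bean type that filters beans from all of its configured sources. `BeanSource` and `RepositoryContentProvider` are hypothetical names; a real `BeanSource` would query a Tupelo context, which is assumed rather than shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins for the bean types (illustration only).
abstract class CETBean {
}

class DatasetBean extends CETBean {
}

class ScenarioBean extends CETBean {
}

// Hypothetical source of beans; a real implementation would query one
// configured Tupelo context instead of returning an in-memory list.
interface BeanSource {
    List<CETBean> fetchAll();
}

// Hypothetical content provider: each repository view is configured with the
// bean type it displays and collects matching beans from every source.
class RepositoryContentProvider<T extends CETBean> {
    private final Class<T> type;
    private final List<BeanSource> sources;

    RepositoryContentProvider(Class<T> type, List<BeanSource> sources) {
        this.type = type;
        this.sources = new ArrayList<>(sources);
    }

    List<T> getElements() {
        List<T> result = new ArrayList<>();
        for (BeanSource source : sources) {
            for (CETBean bean : source.fetchAll()) {
                if (type.isInstance(bean)) {
                    result.add(type.cast(bean));
                }
            }
        }
        return result;
    }
}
```

Giving each provider a list of sources also covers the case below where a user works with a local and a remote repository at the same time.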

Functional Requirements

  1. Import datasets that will be used as input to HPC workflows, such as mesh files and input files (e.g. Mach number, Poisson ratio, etc.)
  2. Store output datasets from workflow runs; some workflows will be parameterized and have multiple outputs
  3. Store scenarios
  4. Store defined RMI services
  5. Store known hosts
  6. Store workflow XML files (Ogrescript)
  7. Other functionality?

Repositories can be both remote and local, and users might use more than one simultaneously. Input data for workflows that is managed by Tupelo will need to move from the user's machine to a location that the HPC machine can access. Output datasets should likewise be returned to the user's scenario or otherwise made available to the user.

Known Host View

This view lists information about the HPC hosts, such as environment settings, user information for the host (username, user home, etc.), host operating system, node properties, etc.

import java.io.Serializable;
import java.util.List;
import java.util.Set;

class HostBean extends CETBean implements Serializable {
    private String osName;  // host OS name
    private String osVersion;  // host OS version
    private String architecture;  // host architecture
    private String id;  // host id
    private Set<PropertyBean> envProperties;  // environment properties on the host
    private Set<NodeBean> nodes;  // properties of each node
    private Set<UserPropertyBean> users;  // user properties on the host: userHome, userNameOnHost, userName
}

class NodeBean extends CETBean implements Serializable {
    private String nodeId;  // id of the node, e.g. grid-abe.ncsa.teragrid.org
    private List<FileProtocolBean> fileProtocols;
    private List<BatchProtocolBean> batchProtocols;
    private List<InteractiveProtocolBean> interactiveProtocols;
}

class UserPropertyBean extends CETBean implements Serializable {
    private String userHome;
    private String userName;
    private String userNameOnHost;
}
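The `users` set on HostBean maps a local user to that user's account on the host. A lookup over it might look like the sketch below; the trimmed classes are stand-ins, and `findUser` is a hypothetical helper, not an existing SAGE method.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Trimmed stand-ins for the beans above (illustration only).
abstract class CETBean {
}

class UserPropertyBean extends CETBean {
    final String userHome;        // home directory on the host
    final String userName;        // the user's local name
    final String userNameOnHost;  // the user's account name on the host

    UserPropertyBean(String userHome, String userName, String userNameOnHost) {
        this.userHome = userHome;
        this.userName = userName;
        this.userNameOnHost = userNameOnHost;
    }
}

class HostBean extends CETBean {
    final String id;
    final Set<UserPropertyBean> users = new LinkedHashSet<>();

    HostBean(String id) {
        this.id = id;
    }

    // Hypothetical lookup: the host account properties for a given local user, if any.
    Optional<UserPropertyBean> findUser(String userName) {
        return users.stream()
                .filter(u -> u.userName.equals(userName))
                .findFirst();
    }
}
```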

Analysis Framework
