Application Management
The BardFrame is a general application component for managing contexts, bean sessions, data, etc., and will need to be extended for each use case that requires an entirely new application (e.g. e-AIRS and e-Spine launch HPC jobs in different ways and might require two separate BardFrame implementations). All application bean types should register with the BardFrame, and the BardServiceRegistry should provide the right BardFrame instance. Alternatively, BardFrame could be made to allow applications to register their bean types.
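The alternative design could look roughly like the following sketch: applications register an identifier with a registry, which hands back the matching BardFrame instance. All class and method names here are hypothetical stand-ins, not the actual API.

```java
import java.util.HashMap;
import java.util.Map;

public class BardServiceRegistrySketch {

    // Minimal stand-in for an application-specific BardFrame.
    static class BardFrame {
        private final String application;
        BardFrame(String application) { this.application = application; }
        String getApplication() { return application; }
    }

    // Hypothetical registry keyed by application id (e.g. "e-AIRS", "e-Spine").
    static class BardServiceRegistry {
        private final Map<String, BardFrame> frames = new HashMap<>();

        void register(String applicationId, BardFrame frame) {
            frames.put(applicationId, frame);
        }

        BardFrame getBardFrame(String applicationId) {
            return frames.get(applicationId);
        }
    }

    public static void main(String[] args) {
        BardServiceRegistry registry = new BardServiceRegistry();
        registry.register("e-AIRS", new BardFrame("e-AIRS"));
        registry.register("e-Spine", new BardFrame("e-Spine"));
        System.out.println(registry.getBardFrame("e-AIRS").getApplication());
    }
}
```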
Scenarios View
Displays user scenario(s) and all sub-parts in a Tree view. A scenario is similar to the concept of a project: it contains a collection of parts (input datasets, output datasets, etc.) that belong to the scenario. Users will launch jobs on the HPC machines that run workflows using the inputs in their scenario, and when a job completes, the outputs should be added to the user's scenario. A user might have multiple scenarios open at once, close scenarios, or even delete scenarios from their scenario view (deleted from the view, but still in the repository), so we'll need to manage which scenarios are in a session and what their current state is (open/closed). For example, I might have scenarios A, B, and C stored in a local repository, but only A and B loaded into my application.
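The session bookkeeping described above could be sketched as follows: the repository may hold scenarios A, B, and C, while only some are loaded into the session, each with an open/closed state. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScenarioSessionSketch {

    enum State { OPEN, CLOSED }

    static class ScenarioSession {
        // Scenarios loaded into the session, mapped to their current state.
        private final Map<String, State> loaded = new LinkedHashMap<>();

        void load(String scenario) { loaded.put(scenario, State.OPEN); }
        void close(String scenario) { loaded.put(scenario, State.CLOSED); }

        // Removing from the view does not delete it from the repository.
        void remove(String scenario) { loaded.remove(scenario); }

        List<String> openScenarios() {
            List<String> open = new ArrayList<>();
            for (Map.Entry<String, State> e : loaded.entrySet()) {
                if (e.getValue() == State.OPEN) open.add(e.getKey());
            }
            return open;
        }
    }

    public static void main(String[] args) {
        ScenarioSession session = new ScenarioSession();
        session.load("A");
        session.load("B"); // C stays in the repository, unloaded
        session.close("B");
        System.out.println(session.openScenarios());
    }
}
```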
Scenario
A scenario (or project) will contain the user data specific to it, including datasets (input and output), workflows, and an RMI service for launching jobs.
private String title;
private String description;
private Set<DatasetBean> dataSets;
private RMIServiceBean serviceBean;
private List<WorkflowBean> workflows;
RMI Service Registry
The service registry contains all machines defined as available to the user for installing the PTPFlow plugins required to run HPC jobs and return status information to the client.
RMIService Info
The information about each service installation will be stored in an RMIServiceBean.
// Service Info
private String name;
private String platform;
private String deployUsingURI;  // e.g. file:/
private String launchUsingURI;
private String installLocation; // e.g. /home/user_home/ptpflow
private String rmiContactURI;
private int rmiPortLowerBound;
private int rmiPortUpperBound;
private int gridftpPortLowerBound;
private int gridftpPortUpperBound;
private Date installedDate;
private boolean running;
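One way the registry might use these beans is to select an installation for a job, e.g. the first running service for a given platform. The sketch below reduces the bean to the fields needed here; the registry class and lookup method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RMIServiceRegistrySketch {

    // Reduced RMIServiceBean; field names follow the listing above.
    static class RMIServiceBean {
        String name;
        String platform;
        boolean running;
        RMIServiceBean(String name, String platform, boolean running) {
            this.name = name;
            this.platform = platform;
            this.running = running;
        }
    }

    static class ServiceRegistry {
        private final List<RMIServiceBean> services = new ArrayList<>();

        void add(RMIServiceBean bean) { services.add(bean); }

        // Return the first running installation for the platform, or null.
        RMIServiceBean findRunning(String platform) {
            for (RMIServiceBean s : services) {
                if (s.running && s.platform.equals(platform)) return s;
            }
            return null;
        }
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        registry.add(new RMIServiceBean("stopped-service", "linux", false));
        registry.add(new RMIServiceBean("hpc-service", "linux", true));
        System.out.println(registry.findRunning("linux").name);
    }
}
```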
Workflows
Each workflow is described by an XML file that outlines the steps in the process, including which machine to run on, the executables that will be launched, input files, etc. Initially we will simply store the workflow information in a single WorkflowStepBean that has a reference to the file containing the XML and the DatasetBeans. Ogrescript XML files can be complex, but if we can logically separate out the pieces into steps or parts that can be used to generate the full workflow XML file required by the HPC machines, then we can include workflow steps as separate beans.
private String title;
private String description;
private Date date;
private List<WorkflowStepBean> workflowSteps;
private PersonBean creator;
private Collection<PersonBean> contributors;
private String title;
private PersonBean creator;
private Date date;
private List<DatasetBean> inputDatasets;
private DatasetBean workflow; // initially our steps will only include a single step, the entire workflow
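The idea of generating the full workflow XML from separated steps could be sketched like this: each step contributes a fragment, and a single-step workflow simply wraps the entire Ogrescript file. The fragment field and generator method are hypothetical illustrations, not the real Ogrescript schema.

```java
import java.util.Arrays;
import java.util.List;

public class WorkflowXmlSketch {

    static class WorkflowStepBean {
        String title;
        String xmlFragment; // in the initial design, the entire workflow XML
        WorkflowStepBean(String title, String xmlFragment) {
            this.title = title;
            this.xmlFragment = xmlFragment;
        }
    }

    // Concatenate step fragments into one workflow document.
    static String generateWorkflowXml(List<WorkflowStepBean> steps) {
        StringBuilder xml = new StringBuilder("<workflow>\n");
        for (WorkflowStepBean step : steps) {
            xml.append("  ").append(step.xmlFragment).append("\n");
        }
        return xml.append("</workflow>").toString();
    }

    public static void main(String[] args) {
        WorkflowStepBean step =
                new WorkflowStepBean("run", "<step name=\"run\"/>");
        System.out.println(generateWorkflowXml(Arrays.asList(step)));
    }
}
```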
Repository View
Rather than a single repository view, this will probably be multiple views, each configured to show a particular type of data coming from a content provider. The content provider would get the data required from the configured Tupelo context(s). For example, we will need a "Data Repository View" that shows all datasets (e.g. input/output datasets) and a way to manipulate them (e.g. add tags, annotations, etc.), a "Scenario Repository View" that shows all saved scenarios, a "Service Repository View" that shows defined RMI service endpoints for launching jobs, and a "Known Hosts View" for showing known hosts that can accept jobs. This is too much disparate information to display in a single view. In SAGE and all derived products, a repository is going to be used for storing information that must be persisted, including input data, output data, saved scenarios, workflows, etc.
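The per-view content provider idea above could be sketched as a provider configured with a bean type that returns only matching elements from the repository contents. JFace specifics are omitted here, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RepositoryViewSketch {

    // Stand-ins for beans stored in a Tupelo context.
    static class DatasetBean { }
    static class ScenarioBean { }

    // Each view gets a provider configured with the bean type it displays.
    static class TypedContentProvider<T> {
        private final Class<T> type;
        TypedContentProvider(Class<T> type) { this.type = type; }

        // Filter repository contents down to the configured type.
        List<T> getElements(List<Object> repositoryContents) {
            List<T> result = new ArrayList<>();
            for (Object o : repositoryContents) {
                if (type.isInstance(o)) result.add(type.cast(o));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        List<Object> repo = Arrays.asList(
                new DatasetBean(), new ScenarioBean(), new DatasetBean());
        TypedContentProvider<DatasetBean> dataView =
                new TypedContentProvider<>(DatasetBean.class);
        System.out.println(dataView.getElements(repo).size());
    }
}
```

A "Scenario Repository View" would simply construct the same provider with ScenarioBean.class, which is what makes one generic provider serve all of the views listed above.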
Functional Requirements
- Import datasets that will be used as input to HPC workflows, such as mesh files and input files (e.g. Mach number, Poisson ratio, etc)
- Store output datasets from workflow runs; some workflows will be parameterized and have multiple outputs
- Store scenarios
- Store defined RMI services
- Store known-hosts
- Store workflow XML files (Ogrescript)
- Other functionality?
Repositories can be both remote and local, and users might use more than one simultaneously. Input data for workflows that is managed by Tupelo will need to move from the user's machine to a location that the HPC machine can access. Output datasets should likewise be returned to the user's scenario or otherwise made available.
Known Host View
This view lists information about the HPC hosts such as environment settings, user information for the host (username, user home, etc), host operating system, node properties, etc.
private String osName;                  // host os name
private String osVersion;               // host os version
private String architecture;            // host architecture
private String id;                      // host id
private Set<PropertyBean> envProperties; // environment properties on host
private Set<NodeBean> nodes;            // properties of each node