Introduction
This document describes the architecture of the KISTI-NCSA Science Gateway (KNSG) Framework. To fully describe this architecture, this document also provides information about external technologies that were used to build the framework (e.g. Eclipse).
KNSG provides an extensible open source software platform for building HPC software applications that are managed by PTPFlow. The KNSG framework provides an easy-to-use graphical user interface for setting up and launching HPC workflows, along with views for managing execution and retrieving results. The initial framework builds upon the Eclipse Rich Client Platform (Eclipse RCP), PTPFlow and Bard.
Purpose
The purpose of this document is to outline the major components of the KNSG Framework architecture for developers who are interested in using and extending the framework for their software applications. All software extension points are documented, with the eAIRS CFD workflow use case serving as an example concrete implementation.
Scope
The result of version 1.0 is an initial framework for building new HPC applications with some semantic capabilities and an easy to use graphical user interface for setting up, launching and monitoring HPC workflows.
Definitions, Acronyms and Abbreviations
See the Glossary for definitions, acronyms and abbreviations used in this document.
Overview
This document should contain all of the required information for developing new KNSG applications. Where external technologies are used, there will be links to documentation for them. The Use Case section identifies the initial goal of the project, which is to provide users with an application for setting up, launching and monitoring the eAIRS CFD workflow.
Architectural Goals and Constraints
Development Tools and Resources
Eclipse RCP/Subversion
The Eclipse RCP platform provides a plugin-based architecture for building new applications. Eclipse supplies core application functionality (such as drag and drop, window management, extension points and extension capabilities) that allows developers to build professional-looking applications with a native look and feel on multiple platforms. This lets them focus on their value-add instead of building the entire application framework from scratch.
Graphical Editing Framework
The graphical editing framework is a toolkit for graphing and editing models. You can learn more about GEF here and here.
C/C++ Development Plugin
The PTPFlow library included in KNSG uses a small amount of functionality from the C/C++ Development Plugin provided by Eclipse, since PTPFlow was initially intended to be just views and perspectives within the Eclipse IDE for C++. The C++ IDE was chosen primarily for its size (it is the smallest Eclipse download). This also illustrates the power of KNSG's plugin-based architecture. KNSG version 1.0 puts an easier-to-use GUI on top of PTPFlow, making it easier for novice users to run HPC workflows while still providing the full capabilities of PTPFlow for advanced users. For more information about PTPFlow, go here.
Java
JavaSE 1.6
The KNSG framework is built against JavaSE 1.6; however, it should also build with JavaSE 1.5 since no JavaSE 1.6 language specific features are being used.
Initial Use Case: eAIRS CFD
The initial use case is the eAIRS RCP application which will be built using the KNSG Application Framework. This version of eAIRS and KNSG will focus on providing an easy to use graphical user interface for setting up the eAIRS CFD Workflow and launching it with PTPFlow. After launching the workflow, PTPFlow's monitoring facilities will be used to monitor the execution of the workflow and then the framework will provide the ability to retrieve workflow results.
Data Catalog
The data catalog consists of repositories that store data used by the KNSG Framework. Below is a description of what a repository is, the API for the repository and a concrete implementation used by the framework.
Repositories
Interfaces and Base Class
public IDataset retrieveDataset(DatasetId id);
public List<DatasetProperties> listAllDatasetProperties();
public boolean hasDataset(DatasetId id);
public boolean deleteDataset(DatasetProperties properties);
public DatasetProperties getDatasetProperties(DatasetId id);
public boolean isDisabled();
public void setDisabled(boolean disabled);
public boolean isWritable();

public DatasetId importDataset(DatasetProperties properties, List<URI> files);
public void saveDatasetProperties(DatasetProperties properties);

protected boolean disabled = false;

public boolean isWritable() {
    return true;
}
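As a rough illustration of how the repository methods listed above fit together, the following is a minimal sketch of a repository that keeps everything in memory. It is not the KNSG implementation: InMemoryRepository is a hypothetical class, DatasetProperties is reduced to a name and an id, and a plain String stands in for DatasetId.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: simplified stand-ins, not the KNSG source.
class RepositorySketch {

    // Stand-in for DatasetProperties; a plain String replaces DatasetId.
    static class DatasetProperties {
        final String name;
        String datasetId;

        DatasetProperties(String name) {
            this.name = name;
        }
    }

    // Hypothetical repository that keeps all dataset properties in memory.
    static class InMemoryRepository {
        protected boolean disabled = false;
        private final Map<String, DatasetProperties> datasets =
                new LinkedHashMap<String, DatasetProperties>();
        private int nextId = 1;

        public boolean isDisabled() { return disabled; }
        public void setDisabled(boolean disabled) { this.disabled = disabled; }
        public boolean isWritable() { return true; }

        // Registers the dataset and returns its newly assigned id.
        // The files are ignored here; a real repository would copy them.
        public String importDataset(DatasetProperties properties, List<URI> files) {
            if (disabled || !isWritable()) {
                throw new IllegalStateException("repository is not accepting imports");
            }
            String id = "dataset-" + nextId++;
            properties.datasetId = id;
            datasets.put(id, properties);
            return id;
        }

        public boolean hasDataset(String id) {
            return datasets.containsKey(id);
        }

        public List<DatasetProperties> listAllDatasetProperties() {
            return new ArrayList<DatasetProperties>(datasets.values());
        }

        // Returns true only if the dataset was present and has been removed.
        public boolean deleteDataset(DatasetProperties properties) {
            return datasets.remove(properties.datasetId) != null;
        }
    }

    public static void main(String[] args) {
        InMemoryRepository repository = new InMemoryRepository();
        DatasetProperties mesh = new DatasetProperties("mesh");
        String id = repository.importDataset(mesh, Collections.<URI>emptyList());
        System.out.println(id + " stored: " + repository.hasDataset(id));
        repository.deleteDataset(mesh);
        System.out.println(id + " stored: " + repository.hasDataset(id));
    }
}
```

Checking isDisabled() and isWritable() before an import is one plausible use of those flags; the framework may apply them differently.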
Local Repository
This represents a concrete implementation of a repository on the user's local machine. It provides methods for managing the user's imported data on the local machine through the RepositoryView.
// Base directory where the local repository is located
protected File baseDirectory;
// Directory containing the datasets
protected File datasetDirectory;
// Directory containing the dataset properties
protected File propertiesDirectory;
The repository is responsible for all data used by the KNSG Framework. Users can import data into a repository, add metadata to imported datasets, and drag and drop data onto their Scenarios. The contents of a repository are visualized by the RepositoryView. The next section discusses the KNSG Framework dataset objects in more detail.
Dataset
In the KNSG Framework, a dataset consists of the data and the properties (metadata) associated with the data. In the next three sections, the Dataset (containing the data), the DatasetProperties (metadata), and the DatasetBean (the concrete implementation of IDataset), will be discussed in detail.
Dataset Interface
Each dataset object should implement this interface so that the system can interact with it. The KNSG Framework contains one concrete implementation, DatasetBean, which can generically handle any kind of data. More specific implementations can be created if needed to handle special kinds of data.
public String getFriendlyName();
public void setFriendlyName(String friendlyName);
public DatasetId getDatasetId();
public void setDatasetId(DatasetId datasetId);
DatasetProperties
This object stores all properties associated with a dataset, including any metadata that is used to describe the type of data.
private String name;
private String description;
private DatasetId datasetId;
private List<IMetadata> properties = new LinkedList<IMetadata>();
DatasetBean
This class represents a generic file dataset in KNSG. Datasets can be tagged as inputs for different workflows so that the analysis pages can use this information for adding the correct input to workflows.
private DatasetId datasetId;
private String friendlyName;
private File data;
Metadata
Metadata is loosely defined as data that describes data. In the KNSG Framework, any new metadata type should implement the IMetadata interface so that the system is aware of it and stores it properly. The first section below defines the basics of the IMetadata interface. The section that follows contains two concrete examples, TagBean and AnnotationBean, for tagging and commenting on datasets.
IMetadata Interface
public static final String EXT_PT = "edu.illinois.ncsa.knsg.metadata";
public String getLabel();
public String getValue();
Metadata Implementations
A TagBean adds a tag to a dataset. There is no limit to the number of tags a dataset can have (e.g. eAIRS-Input, result).
private String tag;

// metadata type
public String getLabel() {
    return "tag";
}

public String getValue() {
    return tag;
}
The AnnotationBean allows users to add comments to a dataset. The annotation consists of a title, comment and date the comment was made.
private String title;
private String annotation;
private Date date = new Date();
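The two metadata beans can be sketched together as follows. This is a simplified illustration, not the KNSG source: the IMetadata stand-in omits the EXT_PT constant, the label and value returned by AnnotationBean are assumptions (the text above only specifies them for TagBean), and valuesFor is a hypothetical helper.

```java
import java.util.Date;
import java.util.LinkedList;
import java.util.List;

// Sketch only: simplified stand-ins for the KNSG metadata types.
class MetadataSketch {

    interface IMetadata {
        String getLabel();
        String getValue();
    }

    static class TagBean implements IMetadata {
        private final String tag;

        TagBean(String tag) {
            this.tag = tag;
        }

        public String getLabel() { return "tag"; }
        public String getValue() { return tag; }
    }

    // Assumption: the label is "annotation" and the value joins title and comment.
    static class AnnotationBean implements IMetadata {
        private final String title;
        private final String annotation;
        private final Date date = new Date();

        AnnotationBean(String title, String annotation) {
            this.title = title;
            this.annotation = annotation;
        }

        public String getLabel() { return "annotation"; }
        public String getValue() { return title + ": " + annotation; }
        public Date getDate() { return new Date(date.getTime()); }
    }

    // Hypothetical helper: collects the values of all entries with the given label.
    static List<String> valuesFor(List<IMetadata> properties, String label) {
        List<String> values = new LinkedList<String>();
        for (IMetadata metadata : properties) {
            if (label.equals(metadata.getLabel())) {
                values.add(metadata.getValue());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<IMetadata> properties = new LinkedList<IMetadata>();
        properties.add(new TagBean("eAIRS-Input"));
        properties.add(new TagBean("result"));
        properties.add(new AnnotationBean("Mesh", "Coarse grid for a first run"));
        System.out.println(valuesFor(properties, "tag"));
    }
}
```

Because every bean exposes the same label and value pair, a view can list mixed metadata without knowing the concrete types.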
Scenario
A Scenario is the container object for everything in a user's scenario, including a list of datasets, the workflows that have been executed, and the RMI Service that the scenario uses. The scenario tracks all of the parts that make it up, and the framework is designed to support multiple scenarios. The Scenario inherits from the IUserFacing interface, as do all objects in the KNSG Framework that are saved and restored as beans. Only the Scenario class is shown here; see the repository for the other classes.
private List<WorkflowBuilderModel> workflows;
private List<IDataset> datasets;
private ServiceInfo rmiService;
Managers
Managers provide access to objects in the KNSG framework and manage them. For example, the ScenarioManager allows users to build multiple scenarios simultaneously so that many what-if analyses can be executed. Managers in the KNSG Framework should extend the BaseManager class, which provides basic services such as saving the state of the manager's members and providing change listeners so that objects are aware of changes to those members.
protected File stateDirectory;
// The type of members this manager manages
protected String type;
protected List<BaseMember> members;
protected IMemberChangeListenerSupport listeners = new IMemberChangeListenerSupport();
protected IMemberChangedListener myPassThroughListener = new MyIMemberChangeListener();
protected Object lock = new Object();
protected long lastSave = System.currentTimeMillis();

// BaseMember contains the change listener so adding and removing members is slightly different
protected void addMember(BaseMember member);
protected void removeMember(IMember member);
public BaseMember[] getAllMembers();
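The change-listener service that a manager offers can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the BaseManager source: SimpleManager and MemberChangeListener are hypothetical names, members are plain strings, and state saving is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: hypothetical stand-ins for BaseManager and its listener types.
class ManagerSketch {

    interface MemberChangeListener {
        void memberChanged(String event, String memberName);
    }

    static class SimpleManager {
        private final List<String> members = new ArrayList<String>();
        private final List<MemberChangeListener> listeners =
                new ArrayList<MemberChangeListener>();
        private final Object lock = new Object();

        void addListener(MemberChangeListener listener) {
            synchronized (lock) {
                listeners.add(listener);
            }
        }

        void addMember(String member) {
            synchronized (lock) {
                members.add(member);
            }
            fire("added", member);
        }

        // Returns true and notifies listeners only if the member was present.
        boolean removeMember(String member) {
            boolean removed;
            synchronized (lock) {
                removed = members.remove(member);
            }
            if (removed) {
                fire("removed", member);
            }
            return removed;
        }

        String[] getAllMembers() {
            synchronized (lock) {
                return members.toArray(new String[0]);
            }
        }

        // Notifies a snapshot of the listeners outside the lock.
        private void fire(String event, String member) {
            List<MemberChangeListener> snapshot;
            synchronized (lock) {
                snapshot = new ArrayList<MemberChangeListener>(listeners);
            }
            for (MemberChangeListener listener : snapshot) {
                listener.memberChanged(event, member);
            }
        }
    }

    public static void main(String[] args) {
        SimpleManager manager = new SimpleManager();
        final List<String> log = new ArrayList<String>();
        manager.addListener(new MemberChangeListener() {
            public void memberChanged(String event, String memberName) {
                log.add(event + ":" + memberName);
            }
        });
        manager.addMember("scenario-1");
        manager.removeMember("scenario-1");
        System.out.println(log);
    }
}
```

Notifying listeners from a snapshot taken under the lock is a design choice of this sketch; it lets a listener call back into the manager without deadlocking.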
Scenario Manager
The ScenarioManager manages each scenario that is created and allows users to have multiple scenarios open simultaneously. It provides access to each scenario so objects can be added and removed from the scenarios. It also listens for changes so it knows when a scenario has been updated.
private static ScenarioManager instance;

private ScenarioManager() {
    type = "scenario";
}

public Scenario getSelectedScenario();
public void addScenario(Scenario scenario);
public void removeScenario(Scenario scenario);
protected BaseMember createMember(File file);
public void removeDatasetFromAllScenarios(DatasetProperties properties);
public void removeDatasetFromAllScenarios(IDataset dataset);
The type tells KNSG where to store the state information for this manager and the objects that it manages. For example, this manager's state will be stored in user.home/NCSA/KNSG/scenario. If user.home is /home/foo, the state directory is /home/foo/NCSA/KNSG/scenario.
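The mapping from a manager's type to its state directory can be sketched as a small helper. The helper name stateDirectory is hypothetical; only the user.home/NCSA/KNSG/type layout comes from the description above.

```java
import java.io.File;

// Sketch only: the helper name is hypothetical; the layout follows the text above.
class StateDirectorySketch {

    // Builds <userHome>/NCSA/KNSG/<type>.
    static File stateDirectory(String userHome, String type) {
        File knsgHome = new File(new File(userHome, "NCSA"), "KNSG");
        return new File(knsgHome, type);
    }

    public static void main(String[] args) {
        // State directory for the current user.
        System.out.println(stateDirectory(System.getProperty("user.home"), "scenario"));
        // State directory for the /home/foo example.
        System.out.println(stateDirectory("/home/foo", "scenario"));
    }
}
```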
Analysis Framework
The analysis framework allows developers to define new workflows that can be executed by the KNSG framework. By specifying a workflow, a wizard, and the wizard pages used to set up the workflow, developers can easily add new analyses to the system. Each part of the framework is described in detail below.
Workflow
The workflow is an XML file containing the Ogrescript to launch a job on an HPC machine. You will need to determine which parts of the workflow you will make configurable and then use the KNSG framework to add the UI parts that will let the user specify values for the workflow. For information on Ogrescript, see the documentation here. Also, for a tutorial that takes you through the creation of a new application and sample workflow, go here. In the next section we will briefly discuss the important UI parts.
Analysis Wizard
The AnalysisWizard is the base class for all analysis wizards defined in the KNSG framework. New analyses should use the extension point specified below to tell the framework which wizard will set up the parts of the workflow and launch the analysis.
public static final String EXT_PT = "edu.illinois.ncsa.knsg.ui.analysisWizards";
protected Scenario scenario;
protected String jobId;

public void setScenario(Scenario scenario);
public abstract String[] getOutputs();
public String getJobId();
public String getResultPath(WorkflowInfo workflowInfo, ParsedLocalEventKey eventKey);
public abstract String[] internalGetResultPath(WorkflowDetails workflowDetails, ParsedLocalEventKey eventKey);

// Writes local file to remote machine and returns absolute path to insert into workflow script
protected String writeRemoteFile(File localFile, String remotePath, String hostURI) {
    return JobUtils.writeFile(localFile, remotePath, hostURI);
}
In the Eclipse extension point, you must specify the following items:
- id: The ID of the new analysis wizard. It must be unique because it is used to retrieve the wizard.
- name: The name of the new analysis. It is displayed so that users can select which analysis to run.
- tag: The tag for the workflow. It should match the "experimentId" of the workflow XML file because it is used to retrieve results; without a match, the framework cannot tell which results came from which analysis wizard.
- workflow: The workflow XML file that will be set up and run by this analysis.
- class: The wizard class containing all of the logic for setting up and running the analysis.
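The roles of the id and tag attributes can be illustrated with a small lookup sketch. It uses a hypothetical in-memory registry in place of the Eclipse extension registry, and all attribute values shown (ids, workflow paths, class names) are made-up examples rather than values from the framework.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a hypothetical in-memory registry, not the Eclipse extension registry.
class WizardRegistrySketch {

    // Mirrors the five attributes of the analysis wizard extension.
    static class WizardDescriptor {
        final String id;
        final String name;
        final String tag;
        final String workflow;
        final String className;

        WizardDescriptor(String id, String name, String tag,
                         String workflow, String className) {
            this.id = id;
            this.name = name;
            this.tag = tag;
            this.workflow = workflow;
            this.className = className;
        }
    }

    private final List<WizardDescriptor> descriptors = new ArrayList<WizardDescriptor>();

    // Rejects duplicate ids, since the id is used to retrieve the wizard.
    void register(WizardDescriptor descriptor) {
        if (findById(descriptor.id) != null) {
            throw new IllegalArgumentException("duplicate wizard id: " + descriptor.id);
        }
        descriptors.add(descriptor);
    }

    WizardDescriptor findById(String id) {
        for (WizardDescriptor descriptor : descriptors) {
            if (descriptor.id.equals(id)) {
                return descriptor;
            }
        }
        return null;
    }

    // Returns the wizard whose tag equals the workflow's experimentId, or null.
    WizardDescriptor findByExperimentId(String experimentId) {
        for (WizardDescriptor descriptor : descriptors) {
            if (descriptor.tag.equals(experimentId)) {
                return descriptor;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        WizardRegistrySketch registry = new WizardRegistrySketch();
        registry.register(new WizardDescriptor("example.wizard.cfd", "Example CFD",
                "example-cfd", "workflows/example-cfd.xml", "example.CfdWizard"));
        WizardDescriptor match = registry.findByExperimentId("example-cfd");
        System.out.println(match == null ? "no wizard" : match.name);
    }
}
```

If the tag did not match the experimentId, findByExperimentId would return null, which corresponds to the case where results cannot be attributed to a wizard.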
Once you have your analysis wizard, you will need to add wizard pages whose UI fields supply the input to your workflow.
Views
All views must use the org.eclipse.ui.views extension point. By registering your view with the extension point, Eclipse will automatically make it available to users who include your plug-in in their application. Below you will find some common views provided by the KNSG Framework that can be used directly or extended to provide specific functionality, layout, icons, etc.
Scenarios View
The framework provides a BaseScenariosView and its concrete implementation, KNSGScenariosView. The BaseScenariosView provides basic services such as view selections, a context menu and refresh method for updating the view. The key methods are below:
public abstract class BaseScenariosView extends ViewPart implements Refreshable {
    protected Viewer viewer;

    // Abstract methods subclasses should implement

    // Define what the view should look like
    protected abstract void internalCreatePartControl(Composite parent);

    // Handle selections within the view
    protected abstract void handleSelectionChanged(ISelection selection);

    // Something has changed, refresh what needs refreshing
    protected abstract void handleRefresh();
}
The KNSGScenariosView provides a basic tree view of each scenario's objects. Default icons are provided for folders and for the objects contained in the scenario. By extending this class or the base class, users can supply new content providers to change how objects are displayed, what additional data is shown, and which icons are used. By right-clicking on a Scenario, the user can launch jobs on HPC machines using inputs from the Scenario and the Repository. When workflows complete, users can retrieve the results back into the Scenario through menus associated with the HPC job. As mentioned in the Scenario section, the Scenario object tracks which workflows it executed along with the associated input and result data, and the ScenariosView visualizes the content of each Scenario object.
Repository View
The RepositoryView provides a Tree view of all datasets in the LocalRepository and content providers that control how the datasets are displayed in the view. It also provides simple drag and drop capabilities that allow users to drag and drop datasets among views that support receiving datasets.
Users can subclass the RepositoryView or implement their own repository view if they want to display the data differently, use different icons, provide new/different functionality, etc. It is recommended that users subclass RepositoryView if they need a different implementation since the drag and drop capabilities would be automatically provided to the subclass. One example of a possible subclass would use more metadata to offer more advanced views and organization of the data with a simple example shown in the eAIRS ScenariosView. The eAIRS ScenariosView categorizes its data based on TagBeans. If a dataset contains a result tag, it will be listed in the Scenario under a sub-folder called Results. Many customization scenarios are possible using Metadata.
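The tag-based categorization described above can be sketched as follows. This is not the eAIRS ScenariosView code: the Dataset stand-in holds only a friendly name and its tag values, and the name of the folder for untagged data ("Datasets") is an assumption; only the "result" tag and the "Results" sub-folder come from the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: simplified stand-ins, not the eAIRS ScenariosView source.
class TagCategorySketch {

    static final String RESULTS = "Results";
    static final String OTHER = "Datasets"; // hypothetical folder name

    // Stand-in for a dataset and the values of its TagBeans.
    static class Dataset {
        final String friendlyName;
        final List<String> tags;

        Dataset(String friendlyName, String... tags) {
            this.friendlyName = friendlyName;
            this.tags = Arrays.asList(tags);
        }
    }

    // Places datasets tagged "result" under Results and all others under OTHER.
    static Map<String, List<String>> categorize(List<Dataset> datasets) {
        Map<String, List<String>> folders = new LinkedHashMap<String, List<String>>();
        folders.put(RESULTS, new ArrayList<String>());
        folders.put(OTHER, new ArrayList<String>());
        for (Dataset dataset : datasets) {
            String folder = dataset.tags.contains("result") ? RESULTS : OTHER;
            folders.get(folder).add(dataset.friendlyName);
        }
        return folders;
    }

    public static void main(String[] args) {
        List<Dataset> datasets = Arrays.asList(
                new Dataset("mesh.dat", "eAIRS-Input"),
                new Dataset("pressure.out", "result"));
        System.out.println(categorize(datasets));
    }
}
```

A content provider could use a grouping like this to build the sub-folders of a tree view.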
Metadata View
The KNSG MetadataView displays all Metadata associated with a dataset. In the KNSG Framework, both the ScenariosView and the RepositoryView are registered with the Eclipse selection service, so clicking on a dataset in either view brings up the metadata for that dataset in the MetadataView. New views that contain datasets can take advantage of the MetadataView automatically by registering with the Eclipse selection service, since the MetadataView listens for dataset selections.