...

  • To assemble a loosely coupled set of components with well-defined functions, such that we could extend them or compose other components with them as the need arose;
  • To build an infrastructure above the lower middleware layer which would be general enough to suit a wide range of applications and not be tightly bound to the requirements of any single domain (along with this goal goes, needless to say, the requirement that application-level code remain insulated from this infrastructure; that is, that it need not be recompiled to work in our environment);
  • To place a high priority on productive and efficient interaction with the resource disposition typical of HPC centers, which more specifically means traditional batch queue managers; hence we have imposed a fundamental asynchronicity on the manner in which the highest level of the execution process must be handled (for instance, it would not make sense in this light for the component responsible for the execution logic to hold the entire graph in memory for the duration of the workflow; see the sketch after this list);
  • To interact with existing resources such as mass storage systems and HPCs while requiring no modifications, additions, or installations on the remote resources.
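
To make the third goal concrete, the following is a minimal sketch of the asynchronous, stateless pattern it describes: each batch-queue event loads, advances, and persists the state of a single workflow node, so no component ever holds the whole graph in memory. All class and member names here are hypothetical illustrations, not the PWE's actual API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: react to batch-queue events one node at a time
// rather than keeping an entire workflow graph resident in memory.
public class NodeStateHandler {
    enum State { QUEUED, RUNNING, DONE, FAILED }

    // Stand-in for a persistent state store (database, file, etc.).
    private final Map<String, State> store = new ConcurrentHashMap<>();

    // Called asynchronously when the batch system reports a status
    // change for one workflow node.
    public void onQueueEvent(String nodeId, State reported) {
        State current = store.getOrDefault(nodeId, State.QUEUED);
        if (current != State.DONE && current != State.FAILED) {
            store.put(nodeId, reported); // persist and release; no graph kept in memory
        }
    }
}
```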

Previous and Existing Uses

The PWE is currently used along with the Siege user interface.

Planned Extensions and Modifications

The PWE is a robust, general system for the management of workflows, and little effort should be required to modify the existing system to support the use cases provided for by this proposal. Logical extensions, however, may be required to suit the particular needs of both the back-end HPC systems and any Grid middleware required to access those systems.

...

Scientific workflows can require an immense amount of archival storage. For this proposal, these storage resources must be identified, and the appropriate access and security privileges must be granted to both the developers and the end users. These resources must be accessible both to the back-end resources, so that jobs can store their data, and to the client applications, whether web-based or desktop-based.

Previous and Existing Uses

Identify the resources that will be used on this project.

Planned Extensions and Modifications

It is assumed that the configuration and administration of these resources lie outside the purview of this proposal.

...

Managing information about and generated by scientific applications is critical to the long-term success of any project launching large numbers of jobs onto HPC resources. Merely storing the data is typically straightforward, but finding a particular data set in the black hole that a mass storage system can become can be frustrating and time consuming. To ensure the success of this project, care must be taken to develop a system to track the inputs, outputs, and provenance information generated by the scientific workflows so that users can easily find and retrieve their data when needed.

Previous and Existing Uses

The use of metadata for the management of scientific processes exists in many incarnations. At NCSA we are concentrating on the use of the Resource Description Framework (RDF) as a model for tracking this metadata, such as provenance and user-submitted metadata (e.g., tags and annotations).
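
As a rough illustration only, the sketch below uses Apache Jena, one common RDF library (this proposal does not prescribe a particular implementation), to attach provenance and a user-submitted tag to a data set. All URIs and property names are invented for the example and are not the NCSA schema.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class ProvenanceExample {
    public static void main(String[] args) {
        // Namespace and property names are illustrative placeholders.
        String ns = "http://example.org/provenance#";
        Model model = ModelFactory.createDefaultModel();

        Property generatedBy = model.createProperty(ns, "generatedBy");
        Property tag = model.createProperty(ns, "tag");

        // Link an output data set to the workflow run that produced it,
        // and record a user-submitted annotation.
        Resource dataset = model.createResource("http://example.org/data/run42/output.dat");
        dataset.addProperty(generatedBy, model.createResource("http://example.org/workflow/run42"));
        dataset.addProperty(tag, "turbulence-study");

        model.write(System.out, "TURTLE"); // serialize for storage or later query
    }
}
```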

Planned Extensions and Modifications

This component is not in the scope of this project.

...

The existing service stack uses both synchronous Java RMI calls and asynchronous Java Message Service (JMS) events to handle interprocess communication. A custom Event Service was developed to manage archival event storage and to provide a means for desktop clients, which may go offline for extended periods, to avoid missing any events.
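
The stock JMS mechanism closest to this offline guarantee is a durable topic subscription. The sketch below shows the idea, assuming an ActiveMQ broker for concreteness (this document does not name a provider); the broker URL, client ID, and topic name are illustrative placeholders, and the custom Event Service itself is a separate component not shown here.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.jms.Topic;
import javax.jms.TopicSubscriber;

import org.apache.activemq.ActiveMQConnectionFactory;

public class DurableEventClient {
    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();

        // A fixed client ID plus a durable subscription lets the broker hold
        // events published while the client is offline, which is the same
        // guarantee the custom Event Service provides to desktop clients.
        connection.setClientID("siege-desktop-client");
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("pwe.workflow.events");
        TopicSubscriber subscriber = session.createDurableSubscriber(topic, "workflow-events");

        subscriber.setMessageListener(new MessageListener() {
            public void onMessage(Message message) {
                try {
                    if (message instanceof TextMessage) {
                        System.out.println("event: " + ((TextMessage) message).getText());
                    }
                } catch (JMSException e) {
                    e.printStackTrace();
                }
            }
        });
        connection.start(); // missed events are delivered first, then live ones
    }
}
```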

Previous and Existing Uses

Both Java RMI and JMS are mature, widely used technologies. The custom Event Service is used along with the PWE and Siege.

Planned Extensions and Modifications

The event service will not be extended or modified for this project.

Grid Middleware Layer

The primary mechanism for accessing the HPC resources will be via the PWE. In order to manage the submission of a potentially large number of workflows over a potentially large number of geographically distributed resources, the PWE bypasses most of the Grid-middleware-specific functionality, such as JobManagers. Instead, it relies heavily upon a few core components such as the Grid Security Infrastructure (GSI) and GridFTP. This allows a more efficient use of the back-end resources (e.g., not overloading either the cluster head node or the queuing systems) and allows the PWE to support very large (thousands of nodes) workflows.
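
As a rough illustration of this core-component style of access, the sketch below stages a file over GridFTP with GSI authentication using the jglobus (Java CoG) client library. The host, port, file paths, and credential setup are assumptions for the example (it presumes a default GSI proxy credential already exists), not details taken from this proposal.

```java
import java.io.File;

import org.globus.ftp.GridFTPClient;
import org.globus.gsi.gssapi.ExtendedGSSManager;
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSManager;

public class StageIn {
    public static void main(String[] args) throws Exception {
        // Load the default GSI proxy credential (assumes one already exists).
        GSSManager manager = ExtendedGSSManager.getInstance();
        GSSCredential cred = manager.createCredential(GSSCredential.INITIATE_ONLY);

        // Host, port, and paths below are illustrative placeholders.
        GridFTPClient client = new GridFTPClient("gridftp.example.org", 2811);
        client.authenticate(cred);              // GSI mutual authentication
        client.get("/scratch/run42/output.dat", // remote source
                   new File("output.dat"));     // local destination
        client.close();
    }
}
```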

We plan no modifications or extensions to the Grid Middleware Layer.

Globus/gLite/etc

The Globus Toolkit is a widely used Grid middleware project and is the toolkit running on the PRAGMA resources.

gLite is the middleware stack for grid computing used by the CERN Large Hadron Collider (LHC) experiments and a wide variety of scientific domains.

Grid Security Infrastructure (GSI)

...