Ensemble Broker Documentation

Overview

The EnsembleBroker is a metaworkflow service designed to manage thousands of user-submitted "nodes".

Documentation

View schematics images.

Examples

View example files.


Towards a revision of the Broker service.

The following are some thoughts concerning what will need to be done for the next version.

(1) Modify the structure of the descriptor.

This should involve the following:

  • Eliminate the top-level "ensemble", making the spawning of an ensemble a particular kind of workflow node
  • Add pre- and post-conditions to each node. These could be file-existence or property-definition checks, and would be verified through the metadata system. Pre-condition property values can be retrieved and set on the node; post-condition property values and file locations would be registered with the metadata system.
  • The node-to-node "dependency" (edge) would carry these as attributes.
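As a minimal sketch of how pre- and post-conditions might be checked, assuming an in-memory stand-in for the metadata system: the names `Condition`, `MetadataStore`, `check_preconditions` and `register_postconditions` are hypothetical and not part of the existing Broker code.

```python
from dataclasses import dataclass, field


@dataclass
class Condition:
    kind: str  # "file" or "property" (hypothetical kinds)
    name: str


@dataclass
class MetadataStore:
    """Hypothetical in-memory stand-in for the metadata system."""
    properties: dict = field(default_factory=dict)
    files: set = field(default_factory=set)

    def satisfies(self, cond):
        if cond.kind == "file":
            return cond.name in self.files
        if cond.kind == "property":
            return cond.name in self.properties
        raise ValueError(f"unknown condition kind: {cond.kind}")


@dataclass
class Node:
    name: str
    pre: list = field(default_factory=list)
    post: list = field(default_factory=list)
    properties: dict = field(default_factory=dict)


def check_preconditions(node, store):
    """Return True if every pre-condition holds; copy property values onto the node."""
    if not all(store.satisfies(c) for c in node.pre):
        return False
    for c in node.pre:
        if c.kind == "property":
            node.properties[c.name] = store.properties[c.name]
    return True


def register_postconditions(node, store, produced_properties, produced_files):
    """Register what the node produced; return the post-conditions still unmet."""
    for c in node.post:
        if c.kind == "property" and c.name in produced_properties:
            store.properties[c.name] = produced_properties[c.name]
        elif c.kind == "file" and c.name in produced_files:
            store.files.add(c.name)
    return [c for c in node.post if not store.satisfies(c)]
```

In this sketch a node becomes eligible to run only when `check_preconditions` returns True, and a non-empty result from `register_postconditions` would signal that the node did not produce everything it promised.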

(2) Modify the synchronization system.

Synchronization should no longer be done en bloc on the user; see the model adopted for the execution service.
Synchronization on state needs to be handled differently. All updates on objects should (1) reload the object from the database and (2) return the new object with the newer state. The Hibernate class needs to maintain three semaphores on ensembles, workflows and nodes which block modification of the given object by concurrent users. This needs to replace the class-level locks on the update methods.
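The reload-and-return update pattern could look roughly like the sketch below. It assumes one lock per object, keyed by kind and id; the text above could equally mean one semaphore per kind. `LockingStore` is a hypothetical in-memory stand-in for the Hibernate-backed class, not its actual API.

```python
import threading


class LockingStore:
    """Hypothetical stand-in for the Hibernate-backed persistence class."""

    KINDS = ("ensemble", "workflow", "node")

    def __init__(self):
        self._db = {}                   # (kind, id) -> persisted state dict
        self._locks = {}                # (kind, id) -> per-object lock
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, kind, obj_id):
        if kind not in self.KINDS:
            raise ValueError(f"unknown kind: {kind}")
        with self._guard:
            return self._locks.setdefault((kind, obj_id), threading.Lock())

    def save(self, kind, obj_id, state):
        with self._lock_for(kind, obj_id):
            self._db[(kind, obj_id)] = dict(state)

    def update(self, kind, obj_id, mutate):
        """Apply mutate() to a freshly reloaded copy and return the new state."""
        with self._lock_for(kind, obj_id):
            current = dict(self._db[(kind, obj_id)])  # (1) reload from the "database"
            mutate(current)
            self._db[(kind, obj_id)] = current
            return dict(current)                      # (2) return the newer state
```

Because the lock is per object rather than on the update method itself, concurrent updates to different nodes do not block one another.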

(3) Implement a full clean-up system which also boots users who have no active workflows from the in-memory authentication repository. Since we will be using Tupelo (?) to store events, etc., there is no longer a need to clean those up.

(4) Ready Cache logic: we need to refactor the part of the Ready Cache logic that handles NodeAttempted. Some questions should be kept in mind: Should the number of previous attempts to schedule the node be taken into account in determining the next try? How do we ensure that priority factors are respected? Should we retry higher-priority ensembles more often? Currently there is only one global retry timeout setting for the entire service.
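One possible answer to these questions, offered only as a sketch, is to derive the retry delay from the attempt count and the priority. Exponential backoff is an assumption introduced here, not something the current service does, and `next_retry_delay` and its parameters are hypothetical names.

```python
def next_retry_delay(base_timeout, attempts, priority, max_delay=3600.0):
    """Seconds to wait before the next scheduling attempt.

    base_timeout: the existing global retry timeout, in seconds.
    attempts:     number of previous scheduling attempts (>= 0).
    priority:     >= 1; higher values are retried more often (assumed scale).
    """
    if attempts < 0 or priority < 1:
        raise ValueError("attempts must be >= 0 and priority >= 1")
    # Double the wait after each failed attempt, shorten it for higher priority.
    delay = base_timeout * (2 ** attempts) / priority
    return min(delay, max_delay)
```

With `attempts=0` and `priority=1` this reduces to the single global timeout, so the current behaviour is the default case.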

(5) Implement the "ensemble creator" worker and related mechanisms. These will involve (a) the unpacking of a description into an ensemble of sub-workflows; (b) a way of relating these for identification purposes (a group id); (c) a way of telling when the group has completed. This will be equivalent to the current "ensemble completed" worker's task.
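The three mechanisms (a) to (c) might be sketched as follows. `SubWorkflow`, `unpack_ensemble` and `group_completed` are hypothetical names, and the use of a random UUID as the group id is an assumption.

```python
import uuid
from dataclasses import dataclass


@dataclass
class SubWorkflow:
    name: str
    group_id: str
    parameters: dict
    done: bool = False


def unpack_ensemble(template_name, parameter_sets, group_id=None):
    """(a) Unpack a description into sub-workflows, (b) tagged with one group id."""
    group_id = group_id or uuid.uuid4().hex
    return [
        SubWorkflow(f"{template_name}-{i}", group_id, dict(params))
        for i, params in enumerate(parameter_sets)
    ]


def group_completed(workflows, group_id):
    """(c) True once every member of the group is done (and the group is non-empty)."""
    members = [w for w in workflows if w.group_id == group_id]
    return bool(members) and all(w.done for w in members)
```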

(6) Eventually implement a "redo" worker (will this be necessary?) ... a workflow description which is recovered from Tupelo, but which is restarted at a certain node ...


An initial pass at the new Workflow description.

<workflow name="test-run-90428">
   <user>arossi</user>
   <constraints>
     <property/>
     <property/>
   </constraints>
   <nodes>
      <execute name="DATA" type="unscheduled">
	<child>COMPUTATION</child>
	<payload type="elf">... serialized XML or script ...</payload>
	<!--  or: text="... serialized XML or script ..." -->
	<payloadProfile>
	   <property name="a" value="A"/>
	   <property name="x" category="output"/>
	</payloadProfile>
	<schedulerProfile>
	   <property name="host-list" value="tungsten.ncsa.uiuc.edu"/>
	</schedulerProfile>
      </execute>
      <ensemble name="COMPUTATION" type="parameterized">
	<dependency>DATA</dependency>
	<child>VIZ</child>
	<parameterDescription/> <!-- need to work this out -->
	<workflow name="wrf-elf">
	   <nodes>
	      <execute name="WRF" type="scheduled">
		 <child>ANALYSIS</child>
	   	 <payload type="elf">... serialized XML or script ...</payload>
	   	 <payloadProfile>
		     <property name="x" category="input"/>
		     <property name="y${workflow-number}" category="output"/>
		 </payloadProfile>
		 <schedulerProfile>
		     <property/>
		     <property/>
		 </schedulerProfile>
	      </execute>
	      <execute name="ANALYSIS" type="scheduled">
	   	 <dependency>WRF</dependency>
	   	 <payloadProfile>
		     <property name="a" value="A"/>
		     <property name="y${workflow-number}" category="input"/>
		     <property name="z${workflow-number}" category="output"/>
	         </payloadProfile>
	   	 <schedulerProfile>
		     <property/>
		     <property/>
		 </schedulerProfile>
	      </execute>
	   </nodes>
	</workflow>
      </ensemble>
      <execute name="VIZ" type="unscheduled">
	<dependency>COMPUTATION</dependency>
	<payloadProfile>
	   <property name="z$I{0:99}" category="input"/>
	</payloadProfile>
	<schedulerProfile>
	   <property name="host-list" value="cobalt.ncsa.uiuc.edu"/>
	</schedulerProfile>
      </execute>
   </nodes>
</workflow>
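The `${workflow-number}` references in the descriptor above could be resolved per replicated sub-workflow along these lines. This is a sketch only: `resolve` is a hypothetical helper, and it deliberately leaves unknown references and the range form used on the VIZ node untouched, since their semantics are not yet defined.

```python
import re

# Matches ${name}, where name is made of letters, digits, "_" or "-".
_REF = re.compile(r"\$\{([A-Za-z0-9_-]+)\}")


def resolve(text, bindings):
    """Replace each ${name} with bindings[name]; leave unknown names as they are."""
    def substitute(match):
        key = match.group(1)
        return str(bindings[key]) if key in bindings else match.group(0)
    return _REF.sub(substitute, text)
```

For example, resolving `y${workflow-number}` once per replica with the replica index bound to `workflow-number` yields a distinct property name for each sub-workflow.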

NOTES

  1. Node is now subclassed into Execute and Ensemble (or others). Q: what are the implications of this polymorphism for Axis/WSDL handling? (The alternative here is to continue with one Node class and serialize an "ensemble descriptor" as its payload.)
  2. Node sub-classes (or types) are handled by different workers.
  3. The parameterized ensemble type will have a workflow + parameter description (how to produce the parameters).
  4. Nodes can be given dynamic properties (input / output categories), which indicate that their value should be retrieved from the metadata system or written to the metadata system.
  5. The descriptor will allow for a limited use of reference resolution (useful in the case of multiple properties associated with parameterized, i.e., replicated, workflows with identical logic).
  6. The parameterized worker will do the following:
    • remove the ensemble node dependencies;
    • remove the dependencies from the children of the ensemble node;
    • set the latter dependencies on the last node of each sub-workflow;
    • add as dependencies to the ensemble node the first node of each sub-workflow;
    • store all workflows;
    • mark the ensemble node as done;
    • promote the first node of each sub-workflow into READY state and add associated actions to the controller queue (as did the ensemble submitted worker).
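The steps of the parameterized worker in note 6 can be sketched on a simple dependency map. Several things here are assumptions: each sub-workflow is taken to be a non-empty linear chain of node names, the state names other than READY are made up, and "add as dependencies to the ensemble node" is read as "the first node of each sub-workflow depends on the ensemble node". `expand_ensemble` is a hypothetical name.

```python
def expand_ensemble(graph, ensemble, sub_workflows):
    """Rewire the graph around an ensemble node; return the nodes promoted to READY.

    graph:         dict of name -> {"deps": set of names, "state": str}
    sub_workflows: list of non-empty chains, e.g. [["WRF-0", "ANALYSIS-0"], ...]
    """
    children = [name for name, node in graph.items() if ensemble in node["deps"]]

    # Remove the ensemble node's own dependencies.
    graph[ensemble]["deps"].clear()

    # Store all workflows: each node depends on its predecessor in the chain.
    for chain in sub_workflows:
        for prev, name in zip([None] + chain, chain):
            graph[name] = {"deps": {prev} if prev else set(), "state": "WAITING"}

    # Move the children's dependency from the ensemble node to the last nodes.
    last_nodes = {chain[-1] for chain in sub_workflows}
    for child in children:
        graph[child]["deps"].discard(ensemble)
        graph[child]["deps"] |= last_nodes

    # Attach the first node of each sub-workflow to the ensemble node.
    for chain in sub_workflows:
        graph[chain[0]]["deps"].add(ensemble)

    # Mark the ensemble node as done and promote the first nodes.
    graph[ensemble]["state"] = "DONE"
    ready = [chain[0] for chain in sub_workflows]
    for name in ready:
        graph[name]["state"] = "READY"
    return ready  # stands in for the actions added to the controller queue
```

Applied to the descriptor above with two replicas, VIZ would end up depending on ANALYSIS-0 and ANALYSIS-1 instead of COMPUTATION.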