
Siege-PWE-ELF 3.0

See also Workflow Descriptor.

What's in 3.0?

  • Refactorings & additional features to accommodate:
    1. Running on IBM systems (AIX, LoadLeveler)
    2. Basic "scheduling" across systems, including:
      1. Rudimentary "match-making"
      2. Programmatic reservation requesting
        • hard: explicit start-times given to scheduler; entire workflow is scheduled "up-front"; must complete by designated time (a.k.a. "on demand")
        • soft: largely an IBM LL feature (a.k.a. "flexible"); acts like a normal job in the queue; but the reserved resources are bound to an id, not to a specific job
        • NB: in all cases other than hard reservations, workflow nodes are scheduled lazily as they become ready to run ("just-in-time" scheduling); on non-IBM platforms, we usually bypass programmatic reservation requests.
  • Bug fixes and improvements

...

SIEGE (1): <scheduling> properties

These are the properties available to be set in a profile used for <scheduling> in the workflow description. See also Job XML Schema.

There is essentially only one required property for all workflows; the default is interactive ...

...

Code Block (xml)
    <property name="maxWallTimePerMember" type="long" />
    <property name="minWallTimePerMember" type="long" />

The following default to 'std[].log' in the initial directory:

Code Block (xml)
    <property name="stdout" type="string" />
    <property name="stderr" type="string" />

...

Code Block (xml)
<execution>
    <profile name="paths">
        <property xmlns:ncsa.updateable.id="paths-${HOST_KEY}" name="paths-${HOST_KEY}" category="platform.configuration"/>
    </profile>
</execution>

...

2. Configurations for this profile, based on the various possible resolutions of the variable, are then written to the TupleSpace service. For example, here ${HOST_KEY} has resolved to "ABE":

Code Block (xml)
<tspace-entry-builder id="1"
                      owner="/C=US/O=National Center for Supercomputing Applications/OU=People/CN=Albert L. Rossi"
                      typeName0="platform.configuration"
                      typeValue0="paths-ABE"
                      name="tspace-entry-paths-ABE">
   <ranOn/>
   <payload payloadType="ncsa.tools.common.types.Configuration">
      <configuration>
        <property name="ELF_HOME" category="environment">
          <value>/u/ncsa/arossi/elf-3.0.0</value>
        </property>
      </configuration>
   </payload>
</tspace-entry-builder>

...

  1. Can platform-dependent properties appear in <scheduling> profiles?
    • ANSWER:  No, only in <execution> profiles. If we think of <scheduling> properties as being used to determine what target resource to use, then obviously platform-dependent properties should not be included.
  2. Then why, for instance, is "account" one of the <scheduling> properties?
    • ANSWER:  A contradiction. It is included simply because it is specific to running on a resource. Note that if it were placed in an <execution> profile, the workflow would still complete successfully.
  3. Which of the <scheduling> properties must appear in a <scheduling> profile?
    • ANSWER:  Currently, those needed to make a scheduling request/reservation: values which affect the number of cores/nodes required and the wallclock time, the submission type, and any properties that must match Host Information environment properties (see the sketch below). All the others could appear either in <execution> profiles or in platform.configuration tuples if so desired.
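
Pulling these together, a minimal <scheduling> profile might look something like the sketch below. This is illustrative only: the profile name and values are assumptions, the nested <value> form is borrowed from the Configuration example above, and the properties governing cores/nodes and submission type are omitted because their names do not appear in this excerpt.

Code Block (xml)
<scheduling>
    <profile name="schedule-request">
        <!-- "account" and "maxWallTimePerMember" appear elsewhere on this page;
             the values here are made up for illustration -->
        <property name="account" type="string">
            <value>abc123</value>
        </property>
        <property name="maxWallTimePerMember" type="long">
            <value>7200</value>
        </property>
    </profile>
</scheduling>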

...

  1. The algorithm attribute defines the method used to order the potentially matching target resources, i.e., the sequence in which they will be tried. There are currently two available algorithms: one randomly orders the target names; the other ("static-load", the default) contacts each machine to determine something like a "load" number for the system.
  2. Including this element indicates the workflow should be treated as "on-demand" (hard, time-based reservations determined up front for the entire graph).
  3. This element establishes a set of rules applied, in order, to the resource request issued to each potentially matching target machine when the original request fails. There are currently three available modifiers: starttime, cpus, and walltime. Rules are separated by semicolons, and the clauses of a rule by commas; the predicate stands for a percentage alteration of the original value or, in the case of starttime, an increment. The rules shown here thus tell the scheduler to try four times: first with the original request; then by pushing the start-time forward; then by halving the number of cpus and doubling the wall time; and finally by taking one-fourth of the cpus and quadrupling the wall time (as sketched below).
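
For illustration, the four-attempt behavior described in item 3 corresponds to a rule string like the one sketched below. The element and attribute names here are placeholders (the actual element appears in the elided descriptor excerpt above); only the rule/clause syntax and the percentages follow the description.

Code Block (xml)
<!-- placeholder element/attribute names; rule syntax per item 3:
     rules separated by ';', clauses by ','; predicates are percentages of the
     original value, except starttime, which takes an increment (here a made-up 3600) -->
<modifiers rules="starttime,+3600; cpus,50,walltime,200; cpus,25,walltime,400"/>
<!-- attempt 1: the original request
     attempt 2: start-time pushed forward
     attempt 3: half the cpus (50%), double the wall time (200%)
     attempt 4: one-fourth the cpus (25%), quadruple the wall time (400%) -->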

...

SIEGE (4): <global-resource>

...

NOTE the same semantics apply to individual execute nodes:

Code Block (xml)
<execute name="setR" type="remote">
    <resource>cobalt.ncsa.uiuc.edu,tg-login.ncsa.teragrid.org</resource>
</execute>

...