Enabling Science and Engineering Workflows on Dynamically Federated Cloud Infrastructure using CometCloud, Rutgers University

Public clouds have emerged as an important resource class that lets users rent resources on demand under a pay-as-you-go pricing policy. Furthermore, private clouds and data centers are exploring the possibility of scaling out to public clouds to respond to unanticipated resource requirements. As a result, dynamically federated, hybrid cloud infrastructures that integrate private clouds, enterprise datacenters and grids, and public clouds are becoming increasingly important. Such federated cloud infrastructures also provide opportunities to improve application quality of service (QoS) by allowing application tasks to be mapped to appropriate resource classes. For example, a typical application workflow consists of multiple stages, each of which can be composed of different application components with heterogeneous computing requirements in terms of task complexity, execution time, and data requirements. Managing and optimizing these workflows on dynamically federated hybrid clouds can be challenging, especially since it requires simultaneously addressing resource provisioning, scheduling, and mapping while balancing QoS against cost.

In this work, we explore autonomic approaches to addressing these challenges, and describe an autonomic framework that provides programming abstractions and runtime services to support complex application workflows. This framework is implemented on top of the CometCloud autonomic cloud engine, which supports dynamic cloud federation, autonomic cloud bursting to scale out to public clouds, and autonomic cloud bridging to integrate multiple datacenters, grids, and clouds on demand. The workflow framework builds a federated cloud and runs user applications on it. A cloud can join or leave the federated cloud dynamically, allowing the federation to scale up or down. The resource status of the nodes in each cloud, such as available CPU, memory, and network bandwidth, is monitored and used to make job scheduling decisions. An application, or a sequence of applications described as a workflow, is submitted to the workflow framework and executed on the federated cloud. Typically, a workflow consists of a sequence of applications in which the output of one stage becomes the input of the next. Hence, the stages must be completed in order.
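The stage ordering described above can be illustrated with a minimal Python sketch. This is not CometCloud code: the names `Stage` and `run_workflow` are hypothetical, and the sketch assumes each stage is a plain function that takes the previous stage's output as its input.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Stage:
    """One workflow stage (hypothetical type, for illustration only)."""
    name: str
    run: Callable[[Any], Any]


def run_workflow(stages: List[Stage], initial_input: Any) -> Tuple[Any, List[str]]:
    """Run stages strictly in order, feeding each output to the next stage."""
    data = initial_input
    completed: List[str] = []
    for stage in stages:
        # A stage starts only after the previous one has returned its output.
        data = stage.run(data)
        completed.append(stage.name)
    return data, completed


# Example: a two-stage workflow where stage 2 consumes stage 1's output.
stages = [
    Stage("split", lambda text: text.split()),
    Stage("count", len),
]
result, order = run_workflow(stages, "a b c")
```

In a real deployment each stage would fan out into many tasks executed on the federated cloud; the sketch only captures the dependency between consecutive stages.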

The workflow framework provides support for (1) federated cloud management and (2) autonomic workflow management. A cloud can join the federated cloud dynamically by sending a join request, along with a list of its available resources, to the autonomic scheduler. The autonomic scheduler maintains the global view of resources and their availability, and provisions resources as needed. Cloud agents manage local resources and communicate with the autonomic scheduler to enable the management of the federated cloud. The workflow manager and application agents are collectively responsible for workflow management. The workflow manager submits workflow tasks, uses the autonomic scheduler to schedule these tasks, and receives results from the application agents that execute them. The application agents pick up appropriate workflow tasks from the CometCloud task pool and execute them. This workflow framework enables the provisioning of appropriate resource classes for each stage of a workflow so as to improve application quality of service.
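The interaction between these components can be sketched roughly as follows. This is a simplified, single-process Python illustration under stated assumptions, not the CometCloud API: the class and method names (`AutonomicScheduler`, `ApplicationAgent`, `join`, `leave`, `submit`, `pick_and_run`) are hypothetical, the task pool is modeled as an in-memory queue, and "executing" a task simply squares its payload.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple


@dataclass
class Task:
    task_id: int
    resource_class: str  # assumed label for the kind of resource the task needs
    payload: int


class AutonomicScheduler:
    """Hypothetical scheduler holding the global resource view and a task pool."""

    def __init__(self) -> None:
        # cloud name -> {resource class: number of free nodes}
        self.resources: Dict[str, Dict[str, int]] = {}
        self.task_pool: Deque[Task] = deque()

    def join(self, cloud: str, available: Dict[str, int]) -> None:
        # A join request carries the list of resources the cloud offers.
        self.resources[cloud] = dict(available)

    def leave(self, cloud: str) -> None:
        self.resources.pop(cloud, None)

    def submit(self, task: Task) -> None:
        self.task_pool.append(task)

    def clouds_offering(self, resource_class: str) -> List[str]:
        return sorted(
            cloud for cloud, free in self.resources.items()
            if free.get(resource_class, 0) > 0
        )


class ApplicationAgent:
    """Hypothetical agent that pulls matching tasks from the pool."""

    def __init__(self, cloud: str, resource_class: str) -> None:
        self.cloud = cloud
        self.resource_class = resource_class

    def pick_and_run(self, scheduler: AutonomicScheduler) -> Optional[Tuple[int, int]]:
        # Take the oldest task whose resource class matches this agent.
        for task in list(scheduler.task_pool):
            if task.resource_class == self.resource_class:
                scheduler.task_pool.remove(task)
                return task.task_id, task.payload ** 2  # stand-in for real work
        return None


scheduler = AutonomicScheduler()
scheduler.join("private", {"small": 4})
scheduler.join("public", {"large": 2})
scheduler.submit(Task(1, "large", 3))
scheduler.submit(Task(2, "small", 5))
agent = ApplicationAgent("private", "small")
outcome = agent.pick_and_run(scheduler)
```

Here the agent on the private cloud skips the task that needs a "large" resource and takes the one that matches its own class, which is the pull-based matching the paragraph describes. A real scheduler would also weigh monitored CPU, memory, and bandwidth as well as cost when provisioning.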
