{toc}

h1. Overview

h1. Story 1: A study of cost and performance of the application of cloud computing to astronomy ^1^

The performance of three workflow applications with different I/O, memory, and CPU requirements is investigated on Amazon EC2, and the cloud's performance is compared with that of a typical HPC cluster (Abe at NCSA).
The goal is to determine which types of _scientific workflow applications_ can be run cheaply and efficiently on the Amazon EC2 cloud.
The application of cloud computing to the generation of an atlas of periodograms for 210,000 light curves is also described.

h3. Part I - Performance of three workflow applications

h5. Tools and methods

* Cloud platform: [Amazon EC2|https://wiki.ncsa.illinois.edu/display/CLOUD/Amazon#Amazon-Compute%28AmazonEC2%29] ([http://aws.amazon.com/ec2/])

Summary of the processing resources on Amazon EC2 and the Abe high-performance cluster
|| Type || Architecture || CPU || Cores || Memory || Network || Storage || Price ||
| Amazon EC2 | | | | | | | |
| m1.small | 32-bit | 2.0-2.6 GHz Opteron | 1-2 | 1.7 GB | 1 Gbps Ethernet | Local | $0.10/hr |
| m1.large | 64-bit | 2.0-2.6 GHz Opteron | 2 | 7.5 GB | 1 Gbps Ethernet | Local | $0.40/hr |
| m1.xlarge | 64-bit | 2.0-2.6 GHz Opteron | 4 | 15 GB | 1 Gbps Ethernet | Local | $0.80/hr |
| c1.medium | 32-bit | 2.33-2.66 GHz Xeon | 2 | 1.7 GB | 1 Gbps Ethernet | Local | $0.20/hr |
| c1.xlarge | 64-bit | 2.0-2.66 GHz Xeon | 8 | 7.5 GB | 1 Gbps Ethernet | Local | $0.80/hr |
| Abe Cluster | | | | | | | |
| abe.local | 64-bit | 2.33 GHz Xeon | 8 | 8 GB | 10 Gbps InfiniBand | Local | N/A |
| abe.lustre | 64-bit | 2.33 GHz Xeon | 8 | 8 GB | 10 Gbps InfiniBand | Lustre ^TM^ | N/A |
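The table's prices reduce to a cost per core-hour, which makes the EC2 types easier to compare. The sketch below is a plain-Python illustration using the first-generation m1/c1 instance names; the dictionary layout is our own, not an AWS API, and m1.small is treated as a single core.

```python
# Illustrative: cost per core-hour for the EC2 types in the table above.
# Core counts and hourly prices are taken from the table; m1.small's
# "1-2 cores" is counted as 1 here (an assumption).
ec2_types = {
    # name: (cores, price_per_hour_usd)
    "m1.small":  (1, 0.10),
    "m1.large":  (2, 0.40),
    "m1.xlarge": (4, 0.80),
    "c1.medium": (2, 0.20),
    "c1.xlarge": (8, 0.80),
}

def cost_per_core_hour(cores, price):
    return price / cores

for name, (cores, price) in ec2_types.items():
    print(f"{name}: ${cost_per_core_hour(cores, price):.2f}/core-hour")
# The c1 types and m1.small work out to $0.10/core-hour,
# m1.large and m1.xlarge to $0.20/core-hour.
```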

* Workflow ^a^ applications
Three workflow applications with different resource profiles were chosen.
** Montage ([http://montage.ipac.caltech.edu]) from astronomy: a toolkit for aggregating astronomical images in Flexible Image Transport System (FITS) format into mosaics.
The workflow contained 10,429 tasks, read 4.2 GB of input data, and produced 7.9 GB of output data.
Montage is considered I/O-bound because it spends more than 95% of its time waiting on I/O operations.
** Broadband ([http://scec.usc.edu/research/cme]) from seismology: generates and compares intensity measures of seismograms from several high\- and low-frequency earthquake simulation codes.
The workflow contained 320 tasks, read 6 GB of input data, and produced 160 MB of output data.
Broadband is considered memory-limited because more than 75% of its runtime is consumed by tasks requiring more than 1 GB of physical memory.
** Epigenome ([http://epigenome.usc.edu]) from biochemistry: maps short DNA segments collected with high-throughput gene sequencing machines to a previously constructed reference genome.
The workflow contained 81 tasks, read 1.8 GB of input data, and produced 300 MB of output data.
Epigenome is considered CPU-bound because it spends 99% of its runtime in the CPU and only 1% on I/O and other activities.
** Summary of resource use by the workflow applications

|| Application || I/O || Memory || CPU ||
| Montage | High | Low | Low |
| Broadband | Medium | High | Medium |
| Epigenome | Low | Medium | High |

* Methods
All experiments were run on single nodes to provide an unbiased comparison of workflow performance on Amazon EC2 and Abe.
For the experiments on EC2:
** Executables were pre-installed in a virtual machine image that was deployed on the node.
** Input data was stored in Amazon EBS (Elastic Block Store).
** Output, intermediate files, and the application executables were stored on local disks.
** All jobs were managed and executed through a job submission host at the Information Sciences Institute (ISI) using the Pegasus Workflow Management System (Pegasus WMS), which includes Pegasus and Condor.
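Conceptually, each of these workflows is a DAG of tasks linked by dependencies (see note a), and the workflow manager runs a task once everything it depends on has finished. The minimal sketch below illustrates that scheduling model only; the task names and the `run` stub are hypothetical, and this is not the Pegasus API.

```python
from collections import deque

def run_workflow(deps, run):
    """Execute tasks in dependency order.

    deps maps each task to the set of tasks it depends on.
    Returns the order in which tasks were run.
    """
    remaining = {t: set(d) for t, d in deps.items()}
    # Tasks with no dependencies are immediately runnable.
    ready = deque(t for t, d in remaining.items() if not d)
    done = []
    while ready:
        task = ready.popleft()
        run(task)
        done.append(task)
        # Release any task whose last unmet dependency just finished.
        for t, d in remaining.items():
            if task in d:
                d.discard(task)
                if not d:
                    ready.append(t)
    return done

# Hypothetical Montage-like fragment: two projections feed a difference
# step, which feeds a co-addition step.
deps = {"project1": set(), "project2": set(),
        "diff": {"project1", "project2"},
        "add": {"diff"}}
order = run_workflow(deps, run=lambda t: None)
```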

h5. Cloud performance

# Montage (I/O-bound)

...


The processing times on _abe.lustre_ are nearly three times faster than the fastest EC2 machines

...

 ^b^.
# Broadband (Memory-bound)

...


The processing advantage of the parallel file system largely disappears. And _abe.local_'s performance is only 1% better than cl.xlarge.

...


For memory-intensive application, Amazon EC2 can achieve nearly the same performance as Abe.

...


# Epigenome (CPU-bound)

...


The parallel file system in Abe provides no processsing advantage for Epigenome. The machines with the most cores gave the best performance for CPU-bound application.


The figure below shows the processing times for the three workflows.
!proctime.png!


h5. Cost

The cost of Amazon EC2 includes:
* Resource cost: the figure below shows the processing costs of the three workflows on EC2.

!cost.png!

* Storage cost: the cost of storing VM images in S3 and input data in EBS.
The table below summarizes the monthly storage costs.
|| Application || Input Volume || Monthly Storage Cost ||
| Montage | 4.3 GB | $0.66 |
| Broadband | 4.1 GB | $0.66 |
| Epigenome | 1.8 GB | $0.26 |

* Transfer cost: Amazon EC2 charges $0.10 per GB for transfer into the cloud and $0.17 per GB for transfer out of the cloud.
The data sizes and transfer costs are summarized in the tables below.
Data transfer sizes per workflow on Amazon EC2
|| Application || Input || Output || Logs ||
| Montage | 4,291 MB | 7,970 MB | 40 MB |
| Broadband | 4,109 MB | 159 MB | 5.5 MB |
| Epigenome | 1,843 MB | 299 MB | 3.3 MB |
Costs of transferring data into and out of the EC2 cloud
|| Application || Input || Output || Logs || Total ||
| Montage | $0.42 | $1.32 | $<0.01 | $1.75 |
| Broadband | $0.40 | $0.03 | $<0.01 | $0.43 |
| Epigenome | $0.18 | $0.05 | $<0.01 | $0.23 |
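At the quoted rates, the cost table follows directly from the transfer sizes. A quick sanity check (assuming 1024 MB per GB):

```python
# Sanity-check of the transfer-cost table: $0.10/GB into the cloud,
# $0.17/GB out, with sizes given in MB.
RATE_IN, RATE_OUT = 0.10, 0.17
MB_PER_GB = 1024  # assumption: binary gigabytes

transfers_mb = {  # application: (input MB, output MB)
    "Montage":   (4291, 7970),
    "Broadband": (4109, 159),
    "Epigenome": (1843, 299),
}

for app, (mb_in, mb_out) in transfers_mb.items():
    cost_in = mb_in / MB_PER_GB * RATE_IN
    cost_out = mb_out / MB_PER_GB * RATE_OUT
    print(f"{app}: in ${cost_in:.2f}, out ${cost_out:.2f}")
# Montage: in $0.42, out $1.32 -- matching the table above.
```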

* Cost effectiveness study
Cost calculations are based on processing requests for 36,000 mosaics of 2MASS images, each 4 sq deg in size (total size 10 TB), over a period of three years (a typical workload for an image mosaic service).
The results show that Amazon EC2 is much less attractive than a local service for I/O-bound applications because of the high cost of data storage on Amazon EC2.
The tables below show the costs of both the local and the Amazon EC2 service.
Cost per mosaic of a locally hosted image mosaic service
|| Item || Cost ($) ||
| 12 TB RAID 5 disk farm and enclosure \\
(3 yr support) | 12,000 |
| Dell 2650 Xeon quad-core processor, \\
1 TB staging area | 5,000 |
| Power, cooling and administration | 6,000 |
| Total 3-year Cost | 23,000 |
| _Cost per mosaic_ | 0.64 |
Cost per mosaic of a mosaic service hosted in the Amazon EC2 cloud
|| Item || Cost ($) ||
| Network Transfer In | 1,000 |
| Data Storage on Elastic Block Storage | 36,000 |
| Processor Cost (c1.medium) | 4,500 |
| I/O operations | 7,000 |
| Network Transfer Out | 4,200 |
| Total 3-year Cost | 52,700 |
| _Cost per mosaic_ | 1.46 |
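The per-mosaic figures in both tables are simply each service's total three-year cost divided by the 36,000 mosaics in the assumed workload:

```python
# Cost per mosaic = total 3-year cost / number of mosaics processed.
N_MOSAICS = 36_000

# Line items from the two tables above, in USD.
local_total = 12_000 + 5_000 + 6_000               # disks + server + power/admin
ec2_total = 1_000 + 36_000 + 4_500 + 7_000 + 4_200 # in, EBS, CPU, I/O, out

print(f"local: ${local_total / N_MOSAICS:.2f} per mosaic")  # $0.64
print(f"EC2:   ${ec2_total / N_MOSAICS:.2f} per mosaic")    # $1.46
```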

h5. Summary

* For CPU-bound applications, virtualization overhead on Amazon EC2 is generally small.
* The resources offered by EC2 are generally less powerful than those available in HPC centers, particularly for I/O-bound applications.
* Amazon EC2 offers no cost benefit over locally hosted storage, but does eliminate local maintenance and energy costs, and does offer high-quality, reliable storage.
* As a result, commercial clouds may not be best suited for large-scale computations ^c^.

h3. Part II - Application to calculation of periodograms

Generation of a science product: an atlas of periodograms for the 210,000 light curves released by the NASA Kepler Mission.

Summary of periodogram calculations on the Amazon EC2 cloud
|| || Result ||
|| Runtimes | {csv:output=wiki|heading=0} Tasks, 631992
Mean Task Runtime, 6.34 sec
Jobs, 25401
Mean Job Runtime, 2.62 min
{csv} |



|| || || Result ||
|| Runtimes | Tasks | 631,992 ||
|| | Mean Task Runtime | 6.34 sec ||
|| | Jobs | 25,401 ||
|| | Mean Job Runtime | 2.62 min ||
|| | Total CPU Time | 1,113 hr ||
|| | Total Wall Time | 26.8 hr ||
|| Inputs | Input Files | 210,664 ||
|| | Mean Input Size | 0.084 MB ||
|| | Total Input Size | 17.3 GB ||
|| Outputs | Output Files | 1,263,984 ||
|| | Mean Output Size | 0.124 MB ||
|| | Total Output Size | 76.52 GB ||
|| Cost | Compute Cost | $291.58 ||
|| | Transfer Cost | $11.48 ||
|| || Total Cost || $303.06 ||
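Several of the summary numbers can be cross-checked against each other. The "effective speedup" below is our own derived quantity (total CPU time over total wall time), not a figure from the source:

```python
# Cross-checks on the periodogram summary table.
tasks = 631_992
cpu_hours = 1_113
wall_hours = 26.8

mean_task_sec = cpu_hours * 3600 / tasks  # should match "Mean Task Runtime"
speedup = cpu_hours / wall_hours          # derived: effective parallelism
total_cost = 291.58 + 11.48               # compute + transfer

print(f"mean task runtime ~ {mean_task_sec:.2f} s")  # ~6.34 s
print(f"effective speedup ~ {speedup:.0f}x")
print(f"total cost = ${total_cost:.2f}")             # $303.06
```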

h1. References

# [Berriman, G.B. _et al._ _Sixth IEEE International Conference on e-Science_, 1-7 (2010)|http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5693133]
# [Berriman, G.B. _et al._ _SPIE Conference 7740: Software and Cyberinfrastructure for Astronomy_ (2010)|http://authors.library.caltech.edu/23420/]
# [Juve, G. _et al._ _Cloud Computing Workshop in Conjunction with e-Science Oxford, UK: IEEE_ (2009)|http://arxiv.org/abs/1005.2718]

h1. Notes and other links

a. Workflow: loosely coupled parallel applications that consist of a set of computational tasks linked by data\- and control-flow dependencies.
b. A parallel file system and high-speed interconnect would dramatically improve performance. Amazon recently released a new resource type that includes a 10 Gbps interconnect.
c. There is a movement towards providing academic clouds, such as [FutureGrid|http://futuregrid.org] or [Magellan|http://www.nersc.gov/nusers/systems/magellan].