Overview

Story 1: A study of the cost and performance of applying cloud computing to astronomy [1]

The performance of three workflow applications with different I/O, memory, and CPU requirements is investigated on Amazon EC2, and the performance of the cloud is compared with that of a typical HPC system (Abe at NCSA).
The goal is to determine which types of scientific workflow applications run cheaply and efficiently on the Amazon EC2 cloud.
The application of cloud computing to generating an atlas of periodograms for 210,000 light curves is also described.

Part I - Performance of three workflow applications

Tools and methods
  • Cloud platform: Amazon EC2 (http://aws.amazon.com/ec2/)
    Summary of the processing resources on Amazon EC2 and the Abe high-performance cluster

    Type        | Architecture | CPU                 | Cores | Memory | Network            | Storage | Price
    ------------+--------------+---------------------+-------+--------+--------------------+---------+---------
    Amazon EC2
    m1.small    | 32-bit       | 2.0-2.6 GHz Opteron | 1-2   | 1.7 GB | 1 Gbps Ethernet    | Local   | $0.10/hr
    m1.large    | 64-bit       | 2.0-2.6 GHz Opteron | 2     | 7.5 GB | 1 Gbps Ethernet    | Local   | $0.40/hr
    m1.xlarge   | 64-bit       | 2.0-2.6 GHz Opteron | 4     | 15 GB  | 1 Gbps Ethernet    | Local   | $0.80/hr
    c1.medium   | 32-bit       | 2.33-2.66 GHz Xeon  | 2     | 1.7 GB | 1 Gbps Ethernet    | Local   | $0.20/hr
    c1.xlarge   | 64-bit       | 2.0-2.66 GHz Xeon   | 8     | 7.5 GB | 1 Gbps Ethernet    | Local   | $0.80/hr
    Abe Cluster
    abe.local   | 64-bit       | 2.33 GHz Xeon       | 8     | 8 GB   | 10 Gbps InfiniBand | Local   | N/A
    abe.lustre  | 64-bit       | 2.33 GHz Xeon       | 8     | 8 GB   | 10 Gbps InfiniBand | Lustre™ | N/A

  • Workflow applications (see note a)
    Three workflow applications were chosen.
    • Montage (http://montage.ipac.caltech.edu) from astronomy: a toolkit for aggregating astronomical images in Flexible Image Transport System (FITS) format into mosaics
      The workflow contained 10,429 tasks, read 4.2 GB of input data, and produced 7.9 GB of output data.
      Montage is considered I/O-bound because it spends more than 95% of its time waiting on I/O operations.
    • Broadband (http://scec.usc.edu/research/cme) from seismology: generates and compares intensity measures of seismograms from several high- and low-frequency earthquake simulation codes
      The workflow contained 320 tasks, read 6 GB of input data, and produced 160 MB of output data.
      Broadband is considered memory-limited because more than 75% of its runtime is consumed by tasks requiring more than 1 GB of physical memory.
    • Epigenome (http://epigenome.usc.edu) from biochemistry: maps short DNA segments collected using high-throughput gene sequencing machines to a previously constructed reference genome
      The workflow contained 81 tasks, read 1.8 GB of input data, and produced 300 MB of output data.
      Epigenome is considered CPU-bound because it spends 99% of its runtime in the CPU and only 1% on I/O and other activities.
    • Summary of resource use by the workflow applications

      Application | I/O    | Memory | CPU
      ------------+--------+--------+-------
      Montage     | High   | Low    | Low
      Broadband   | Medium | High   | Medium
      Epigenome   | Low    | Medium | High

  • Methods
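As a rough illustration of how the hourly EC2 prices in the resource table translate into the compute cost of a run, the sketch below multiplies an instance type's hourly price by the number of billed hours. The function name `run_cost`, the assumption that partial hours are billed as whole hours, and the 1.5-hour example runtime are illustrative assumptions, not values or methods taken from the study. Data transfer and storage charges are ignored.

```python
import math

# Hourly prices (USD) copied from the EC2 rows of the resource table.
PRICE_PER_HOUR = {
    "m1.small": 0.10,
    "m1.large": 0.40,
    "m1.xlarge": 0.80,
    "c1.medium": 0.20,
    "c1.xlarge": 0.80,
}


def run_cost(instance_type: str, runtime_hours: float, instances: int = 1) -> float:
    """Estimate compute cost in USD.

    Assumption: a partial hour is billed as a full hour, so the runtime
    is rounded up before multiplying by the hourly price.
    """
    if runtime_hours < 0 or instances < 1:
        raise ValueError("runtime must be non-negative and instances >= 1")
    billed_hours = math.ceil(runtime_hours)
    return round(PRICE_PER_HOUR[instance_type] * billed_hours * instances, 2)


if __name__ == "__main__":
    # Hypothetical 1.5-hour run; not a measured runtime from the study.
    for name in PRICE_PER_HOUR:
        print(f"{name:10s} ${run_cost(name, 1.5):.2f}")
```

Under these assumptions, a cheaper instance type is only cheaper overall if it does not lengthen the run enough to add billed hours, which is why runtime and price have to be considered together.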
Cloud performance
Summary

Part II - Application to calculation of periodograms

References

  1. Berriman, G.B. et al. Sixth IEEE International Conference on e-Science, 1-7 (2010)
  2. Berriman, G.B. et al. SPIE Conference 7740: Software and Cyberinfrastructure for Astronomy (2010)
  3. Juve, G. et al. Cloud Computing Workshop in Conjunction with e-Science Oxford, UK: IEEE (2009)

Notes and other links

a. Workflow: loosely coupled parallel applications that consist of a set of computational tasks linked by data- and control-flow dependencies.
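The definition in note a can be sketched as a small dependency graph. The example below is a toy illustration only: the task names and their dependencies are hypothetical and do not reproduce the structure of Montage, Broadband, or Epigenome. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to group tasks into waves, where every task in a wave has all of its dependencies satisfied and could run in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy workflow: each key is a task and each value is the set
# of tasks whose outputs it depends on. Names are illustrative only.
workflow = {
    "reproject_a": set(),
    "reproject_b": set(),
    "fit_overlap": {"reproject_a", "reproject_b"},
    "build_mosaic": {"fit_overlap"},
}


def execution_waves(dependencies):
    """Group tasks into waves; tasks in the same wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are cyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks with all dependencies done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock successors
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(workflow), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

The two independent reprojection tasks land in the first wave, which is the "loosely coupled" part of the definition: they share no dependency and need no communication with each other while running.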
