Table of Contents
Overview
...
Summary
- Work on cloud computing in astronomy focuses on data processing, such as images from telescopes (Berriman et al. 2010, Jackson et al. 2010, Berriman et al. 2010(2), Juve et al. 2009), or on data sharing (Juve et al. 2010).
- The common approach is to port an existing pipeline onto a public cloud platform (Berriman et al. 2010, Jackson et al. 2010, Berriman et al. 2010(2), Hoffa et al. 2008).
Workflow
- Eucalyptus is used to allocate resources and start virtual machines (VMs) (Vockler et al. 2011).
- Hadoop MapReduce is a useful tool for parallel computing applications (Wiley et al. 2011).
Data
- Data volumes in astronomy applications are usually very large. For example, astronomical sky surveys generate tens of terabytes of images and detect hundreds of millions of sources every night (Wiley et al. 2011).
- Cloud computing can reduce data processing time. For example, 20 TB of data can be processed in ~7 hours using 80 Amazon EC2 cores (Jackson et al. 2010).
Cloud platform
- Amazon EC2 is popular (Berriman et al. 2010, Jackson et al. 2010, Berriman et al. 2010(2), Juve et al. 2009, Juve et al. 2010, Vockler et al. 2011) because the IaaS model of AWS makes it convenient to port existing techniques onto the Amazon cloud.
- Community clouds are also used because they are cost-effective compared to commercial clouds. Examples include Nimbus (Hoffa et al. 2008), FutureGrid (Vockler et al. 2011), and Magellan (Vockler et al. 2011).
- Since astronomy research is usually conducted by national organizations, they sometimes build their own cloud platform, such as CANFAR (Gaudet et al. 2010).
Issues/Gaps
- Data transfer (Vockler et al. 2011, Berriman et al. 2010)
- Cost of transferring and storage of huge input/output data on commercial cloud service (Berriman et al. 2010).
- Cost of S3 is at a disadvantage for workflows with many files since Amazon charges a fee per S3 transaction (Juve et al. 2010).
- Need to replicate HPC cluster environment in cloud or the application must be modified (Jackson et al. 2010).
- Cloud performs poorly on workflows with a large number of small files (Juve, et al. 2010).
A study of cost and performance of the application of cloud computing to Astronomy 1
The performance of three workflow applications with different I/O, memory, and CPU requirements is investigated on Amazon EC2, and the cloud's performance is compared with that of a typical HPC cluster (Abe at NCSA).
The goal is to determine which types of scientific workflow applications can be run cheaply and efficiently on the Amazon EC2 cloud.
The application of cloud computing to generating an atlas of periodograms for 210,000 light curves is also described.
Part I - Performance of three workflow applications
...
...
Summary
- For CPU-bound applications, virtualization overhead on Amazon EC2 is generally small.
- The resources offered by EC2 are generally less powerful than those available in HPC systems, particularly for I/O-bound applications.
- Amazon EC2 offers no cost benefit over locally hosted storage, but does eliminate local maintenance and energy costs, and does offer high-quality, reliable storage.
- As a result, commercial clouds may not be best suited for large-scale computations c.
Cloud platform
- Cloud platform: Amazon EC2 (http://aws.amazon.com/ec2/). The table below compares the processing resources on Amazon EC2 and the Abe high-performance cluster.

| Type | Architecture | CPU | Cores | Memory | Network | Storage | Price |
|---|---|---|---|---|---|---|---|
| **Amazon EC2** | | | | | | | |
| m1.small | 32-bit | 2.0-2.6 GHz Opteron | 1-2 | 1.7 GB | 1 Gbps Ethernet | Local | $0.10/hr |
| m1.large | 64-bit | 2.0-2.6 GHz Opteron | 2 | 7.5 GB | 1 Gbps Ethernet | Local | $0.40/hr |
| m1.xlarge | 64-bit | 2.0-2.6 GHz Opteron | 4 | 15 GB | 1 Gbps Ethernet | Local | $0.80/hr |
| c1.medium | 32-bit | 2.33-2.66 GHz Xeon | 2 | 1.7 GB | 1 Gbps Ethernet | Local | $0.20/hr |
| c1.xlarge | 64-bit | 2.0-2.66 GHz Xeon | 8 | 7.5 GB | 1 Gbps Ethernet | Local | $0.80/hr |
| **Abe Cluster** | | | | | | | |
| abe.local | 64-bit | 2.33 GHz Xeon | 8 | 8 GB | 10 Gbps InfiniBand | Local | N/A |
| abe.lustre | 64-bit | 2.33 GHz Xeon | 8 | 8 GB | 10 Gbps InfiniBand | Lustre | N/A |
- Workflow a applications
Three different workflow applications were chosen:
- Montage (http://montage.ipac.caltech.edu) from astronomy: a toolkit for aggregating astronomical images in Flexible Image Transport System (FITS) format into mosaics.
The workflow contained 10,429 tasks, read 4.2 GB of input data, and produced 7.9 GB of output data.
Montage is considered I/O-bound because it spends more than 95% of its time waiting on I/O operations.
- Broadband (http://scec.usc.edu/research/cme) from seismology: generates and compares intensity measures of seismograms from several high- and low-frequency earthquake simulation codes.
The workflow contained 320 tasks, read 6 GB of input data, and produced 160 MB of output data.
Broadband is considered memory-limited because more than 75% of its runtime is consumed by tasks requiring more than 1 GB of physical memory.
- Epigenome (http://epigenome.usc.edu) from biochemistry: maps short DNA segments collected using high-throughput gene sequencing machines to a previously constructed reference genome.
The workflow contained 81 tasks, read 1.8 GB of input data, and produced 300 MB of output data.
Epigenome is considered CPU-bound because it spends 99% of its runtime in the CPU and only 1% on I/O and other activities.

Summary of resource use by the workflow applications:

| Application | I/O | Memory | CPU |
|---|---|---|---|
| Montage | High | Low | Low |
| Broadband | Medium | High | Medium |
| Epigenome | Low | Medium | High |
- Methods
The experiments were all run on single nodes to provide an unbiased comparison of the performance of workflows on Amazon EC2 and Abe.
For experiments on EC2:
- Executables were pre-installed in a virtual machine image deployed on the node.
- Input data was stored in Amazon EBS.
- Output, intermediate files, and the application executables were stored on local disks.
- All jobs were managed and executed through a job submission host at the Information Sciences Institute (ISI) using the Pegasus Workflow Management System (Pegasus WMS) including Pegasus and Condor.
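Pegasus plans a workflow as a DAG of tasks linked by data dependencies (see note a) and hands ready tasks to Condor for execution. A minimal sketch in plain Python (not the Pegasus API; the stage names are hypothetical Montage-like steps) of how such dependencies fix an execution order:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each task maps to the set of tasks it depends on; a workflow engine may
# only start a task once all of its predecessors have produced their data.
deps = {
    "mProject_1": set(),                      # reproject input image 1
    "mProject_2": set(),                      # reproject input image 2
    "mDiff": {"mProject_1", "mProject_2"},    # difference overlapping images
    "mBackground": {"mDiff"},                 # background correction
    "mAdd": {"mBackground"},                  # coadd into the final mosaic
}
order = list(TopologicalSorter(deps).static_order())
print(order)  # a valid order: the two mProject tasks first, mAdd last
```

Tasks with no unmet dependencies (here the two mProject steps) can run in parallel, which is what the single-node experiments exploit across cores.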
Cloud performance
- Montage (I/O-bound)
Processing times on abe.lustre are nearly three times faster than on the fastest EC2 machines b.
- Broadband (memory-limited)
The processing advantage of the parallel file system largely disappears; abe.local's performance is only 1% better than c1.xlarge's.
For memory-intensive applications, Amazon EC2 can achieve nearly the same performance as Abe.
- Epigenome (CPU-bound)
The parallel file system on Abe provides no processing advantage for Epigenome. The machines with the most cores gave the best performance for this CPU-bound application.
The figure below shows the processing times for the three workflows.
Cost
The cost of Amazon EC2 includes:
- Resource cost: the figure below shows the processing cost of the three workflows on EC2.
- Storage Cost: Cost to store VM images in S3 and cost of storing input data in EBS.
The table below summarizes the monthly storage cost.

| Application | Input Volume | Monthly Storage Cost |
|---|---|---|
| Montage | 4.3 GB | $0.66 |
| Broadband | 4.1 GB | $0.66 |
| Epigenome | 1.8 GB | $0.26 |
- Transfer cost: Amazon EC2 charges $0.10 per GB for transfer into the cloud and $0.17 per GB for transfer out of the cloud.
The data size and transfer costs are summarized in the tables below.
Data transfer size per workflow on Amazon EC2:

| Application | Input | Output | Logs |
|---|---|---|---|
| Montage | 4,291 MB | 7,970 MB | 40 MB |
| Broadband | 4,109 MB | 159 MB | 5.5 MB |
| Epigenome | 1,843 MB | 299 MB | 3.3 MB |

Costs of transferring data into and out of the EC2 cloud:

| Application | Input | Output | Logs | Total |
|---|---|---|---|---|
| Montage | $0.42 | $1.32 | <$0.01 | $1.75 |
| Broadband | $0.40 | $0.03 | <$0.01 | $0.43 |
| Epigenome | $0.18 | $0.05 | <$0.01 | $0.23 |
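The per-workflow transfer costs follow directly from the quoted rates ($0.10/GB in, $0.17/GB out). A quick check that reproduces the cost figures from the size figures:

```python
# Recompute the transfer costs from the data sizes and the quoted rates:
# $0.10/GB into the cloud (input), $0.17/GB out (output and logs).
RATE_IN, RATE_OUT = 0.10, 0.17
sizes_mb = {  # (input, output, logs) per workflow, in MB
    "Montage":   (4291, 7970, 40),
    "Broadband": (4109, 159, 5.5),
    "Epigenome": (1843, 299, 3.3),
}
costs = {}
for name, (inp, out, logs) in sizes_mb.items():
    c_in = inp / 1024 * RATE_IN
    c_out = out / 1024 * RATE_OUT
    c_logs = logs / 1024 * RATE_OUT           # always rounds to < $0.01 here
    costs[name] = (round(c_in, 2), round(c_out, 2),
                   round(c_in + c_out + c_logs, 2))
    print(name, costs[name])
# Montage (0.42, 1.32, 1.75), Broadband (0.4, 0.03, 0.43), Epigenome (0.18, 0.05, 0.23)
```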
- Cost effectiveness study
Cost calculations are based on processing requests for 36,000 mosaics of 2MASS images (total size 10 TB), each 4 sq deg in size, over a period of three years (a typical workload for an image mosaic service).
Results show that Amazon EC2 is much less attractive than a local service for this I/O-bound application due to the high cost of data storage on Amazon EC2. The tables below show the costs of both the local and the Amazon EC2 service.
Cost per mosaic of a locally hosted image mosaic service:

| Item | Cost ($) |
|---|---|
| 12 TB RAID 5 disk farm and enclosure (3 yr support) | 12,000 |
| Dell 2650 Xeon quad-core processor, 1 TB staging area | 5,000 |
| Power, cooling and administration | 6,000 |
| Total 3-year cost | 23,000 |
| Cost per mosaic | 0.64 |

Cost per mosaic of a mosaic service hosted in the Amazon EC2 cloud:

| Item | Cost ($) |
|---|---|
| Network transfer in | 1,000 |
| Data storage on Elastic Block Storage | 36,000 |
| Processor cost (c1.medium) | 4,500 |
| I/O operations | 7,000 |
| Network transfer out | 4,200 |
| Total 3-year cost | 52,700 |
| Cost per mosaic | 1.46 |
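The per-mosaic figures in both tables are simply the 3-year totals divided by the 36,000 requested mosaics:

```python
# Cost per mosaic = total 3-year cost / number of mosaics served.
n_mosaics = 36_000
local_total = 12_000 + 5_000 + 6_000                 # disk farm + server + power/admin
ec2_total = 1_000 + 36_000 + 4_500 + 7_000 + 4_200   # transfer in + EBS + CPU + I/O + transfer out
print(round(local_total / n_mosaics, 2))  # 0.64
print(round(ec2_total / n_mosaics, 2))    # 1.46
```

The EBS storage line alone ($36,000) exceeds the entire 3-year cost of the local service, which is why storage dominates the comparison.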
Part II - Application to calculation of periodograms
Generation of a science product: an atlas of periodograms for the 210,000 light curves released by the NASA Kepler Mission.
| | | Result |
|---|---|---|
| Runtimes | Tasks | 631,992 |
| | Mean Task Runtime | 6.34 sec |
| | Jobs | 25,401 |
| | Mean Job Runtime | 2.62 min |
| | Total CPU Time | 1,113 hr |
| | Total Wall Time | 26.8 hr |
| Inputs | Input Files | 210,664 |
| | Mean Input Size | 0.084 MB |
| | Total Input Size | 17.3 GB |
| Outputs | Output Files | 1,263,984 |
| | Mean Output Size | 0.124 MB |
| | Total Output Size | 76.52 GB |
| Cost | Compute Cost | $291.58 |
| | Transfer Cost | $11.48 |
| | Total Cost | $303.06 |
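The run statistics above are internally consistent; the total CPU time follows from the task count and mean task runtime, and the CPU-to-wall-time ratio gives the effective parallelism of the run:

```python
# Consistency checks on the periodogram run statistics.
tasks, mean_task_sec = 631_992, 6.34
total_cpu_hr = tasks * mean_task_sec / 3600
print(round(total_cpu_hr))             # ~1113 hr, matching the table

speedup = 1113 / 26.8                  # total CPU time / total wall time
print(round(speedup))                  # ~42-fold effective parallelism

print(round(291.58 + 11.48, 2))        # total cost: 303.06
```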
Seeking Supernovae in the Clouds: A Performance Study 2
Summary
The Nearby Supernova Factory (SNfactory) experiment measures the expansion history of the Universe with Type Ia supernovae to explore the nature of Dark Energy. SNfactory is a pipeline of serial processes that execute various image processing algorithms in parallel on ~10 TB of data. SNfactory was ported to the Amazon Web Services environment.
Cloud platform
- Cloud Platform: Amazon Web Services
- EC2 32-bit high-CPU medium instances (c1.medium: 2 virtual cores, 2.5 ECU each)
- 80-core runs were used.
- Design: Port the environment into EC2 first, then decide the location of data and the size of compute resource.
- Setup virtual cluster in EC2. Create EBS volume for shared file system.
- Data size:
- Raw data: 10TB
- Processed data: 20TB
Cloud performance
- EBS vs S3
- In the 80-core experiment, a processing run took ~7 hours with the EBS variants and only 3 hours with S3.
- Loading output data into S3 is an order of magnitude faster than loading it into EBS.
- Cost: data transfers between EC2 and S3 are free d, so S3 storage is better than EBS for SNfactory.
- Input data and application data are stored in EBS, and output data is written to S3.
Issues/Gaps
- Need to replicate the HPC cluster environment in EC2, or the application must be modified.
- The mean failure rate is higher in EC2 than in traditional cluster environments and needs to be handled.
- Inability to acquire all of the VM instances requested when insufficient resources are available, so the application must be modified to adapt.
- Transient errors.
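A common way to cope with the transient errors and higher failure rates noted above is to wrap each task in retries with exponential backoff. A small hypothetical helper (not from the SNfactory code):

```python
import random
import time

def run_with_retries(task, max_attempts=3, base_delay=1.0):
    """Run task(), retrying on failure with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # sleep ~base_delay * 2^attempt, with jitter so many failed
            # workers do not all retry at the same moment
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
```

In a cloud setting the same wrapper applies to VM provisioning requests as well as to individual pipeline tasks.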
Application of Cloud computing to the creation of image mosaic and management of their provenance 3
Summary
Similar content as the first paper.
Workflow
Data
Cloud platform
Cloud performance
Issues/Gaps
Scientific workflow applications on Amazon EC2 4
Summary
Similar content as the first paper.
Workflow
Data
Cloud platform
Cloud performance
Issues/Gaps
Data Sharing Options for Scientific Workflows on Amazon EC2 5
Summary
- Choice of storage system has a significant impact on workflow runtime
- Investigated data management options in the cloud for workflow applications
Workflow
- Montage: high I/O, low Memory, low CPU
- Broadband: medium I/O, high memory, medium CPU
- Epigenome: low I/O, medium memory, high CPU
Data
Cloud platform
Comparison:
- Amazon EC2/S3
- NFS
- GlusterFS
- PVFS
Cloud performance
- S3 produces good performance for one application due to the use of caching in the implementation of the S3 client
- S3 performs poorly on workflows with a large number of small files
- Cost of S3 is at a disadvantage for workflows with many files, because Amazon charges a fee per S3 transaction
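The per-transaction fee matters because it scales with file count, not data volume. A toy illustration, using assumed historical S3 prices of roughly $0.01 per 1,000 PUT and $0.01 per 10,000 GET requests (check current pricing):

```python
# Request fees scale with the number of files, independent of their size.
PUT_RATE = 0.01 / 1_000    # assumed historical price per PUT request
GET_RATE = 0.01 / 10_000   # assumed historical price per GET request

def s3_request_cost(n_puts, n_gets):
    return n_puts * PUT_RATE + n_gets * GET_RATE

# The same data moved as 10 large files vs. 100,000 small files:
print(s3_request_cost(10, 10))            # a fraction of a cent
print(s3_request_cost(100_000, 100_000))  # ~ $1.10 in request fees alone
```

For a workflow like Montage, with tens of thousands of small intermediate files, these fees can rival the data transfer charges.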
Issues/Gaps
Using MapReduce for Image Coaddition 6
Summary
- The paper presents implementation and evaluation of image coaddition within the MapReduce data-processing framework using Hadoop.
Workflow
Data
- Processed a dataset containing 100,000 individual FITS files
Cloud platform
- Hadoop on cluster
Cloud performance
- Process 100,000 files (300 million pixels) in three minutes on a 400-node cluster
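Image coaddition fits MapReduce naturally: the map stage emits (sky-pixel, flux) contributions from each input image, and the reduce stage combines all contributions for each pixel. A toy, pure-Python sketch of the idea (the paper's implementation is Hadoop/Java on real FITS data):

```python
from collections import defaultdict

def map_image(image):
    """Map: emit (sky_pixel, flux) for every pixel an image contributes."""
    for pixel, flux in image.items():
        yield pixel, flux

def reduce_pixels(pairs):
    """Reduce: average all flux contributions per sky pixel (the coadd)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for pixel, flux in pairs:
        sums[pixel] += flux
        counts[pixel] += 1
    return {p: sums[p] / counts[p] for p in sums}

# Two toy "images", each a dict of sky pixel -> flux
images = [{(0, 0): 1.0, (0, 1): 2.0}, {(0, 0): 3.0}]
coadd = reduce_pixels(p for img in images for p in map_image(img))
print(coadd)  # {(0, 0): 2.0, (0, 1): 2.0}
```

In Hadoop, the shuffle phase performs the grouping-by-pixel between the two stages, which is what lets the coaddition run in parallel across a large cluster.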
Issues/Gaps
CANFAR: Canadian Advanced Network for Astronomical Research 7
Summary
- The Canadian Advanced Network For Astronomical Research (CANFAR) is a project that is delivering a network-enabled platform for the accessing, processing, storage, analysis, and distribution of very large astronomical datasets
Workflow
Data
Cloud platform
Comparison of processing models
The models (Grid, Cloud, CANFAR) are compared along five dimensions: ample CPU cycles, job scheduling, user-customized environment, resource sharing, and portability of environment.
Cloud performance
Issues/Gaps
A Multi-Dimensional Classification Model for Scientific Workflow Characteristics 8
Summary
- A multi-dimensional classification model is presented with workflow examples.
Workflow
- Astronomy workflow:
- Pan-STARRS's (Panoramic Survey Telescope And Rapid Response System) project is a continuous survey of the entire sky
- PSLoad workflow stages incoming data files from the telescope pipeline and loads them into individual relational databases each night
- PSMerge workflow: Each week, the production databases that astronomers query are updated with the new data staged during the week
Data
Cloud platform
Cloud performance
Issues/Gaps
Trident Scientific Workflow Workbench for Data Management in the cloud 9
Summary
Workflow
Data
Cloud platform
Cloud performance
Issues/Gaps
On the use of cloud computing for scientific workflows 10
Summary
- Montage is a widely used astronomy application with short job runtimes.
- The virtual environment can provide good compute time performance but it can suffer from resource scheduling delays and wide-area communications.
Workflow
- Montage
Data
Cloud platform
- University of Chicago's 16-node TeraPort cluster with Nimbus science cloud
- Globus
Cloud performance
Issues/Gaps
- Large overheads of jobs waiting in the Condor and resource queues
- May use clustering techniques to reduce the scheduling overheads
Experiences using cloud computing for a scientific workflow application 11
Summary
- An application for processing astronomy data released by the NASA Kepler project, whose mission is to search for Earth-like planets orbiting other stars.
Workflow
- The workflow is deployed across multiple clouds using the Pegasus Workflow Management System
- Allocate 6 nodes with 8 cores each in all cases
Data
Cloud platform
Comparison:
- FutureGrid with Eucalyptus
- Magellan with Eucalyptus
- Amazon EC2
Cloud performance
- Runtime is longer on EC2 due to: 1. A lower CPU speed, and 2. Poor WAN performance.
Issues/Gaps
- Better utilization of remote resources
- Different clustering strategies: explore the benefits of different task cluster sizes
- Submit host management
- Alternative data staging mechanisms, explore different protocols, and storage solutions
References
- Berriman, G.B. et al. Sixth IEEE International Conference on e-Science, 1-7 (2010)
- Jackson, K.R. et al. Proc. ACM Int. Symp. HPDC, 421-429 (2010)
- Berriman, G.B. et al. SPIE Conference 7740: Software and Cyberinfrastructure for Astronomy (2010)
- Juve, G. et al. Cloud Computing Workshop in conjunction with e-Science, Oxford, UK: IEEE (2009)
- Juve, G. et al. SC '10 (2010)
- Wiley, K. et al. Publications of the Astronomical Society of the Pacific 123, 366-380 (2011)
- Gaudet, S. et al. Proc. SPIE (2010)
- Ramakrishnan, L. et al. WANDS '10 (2010)
- Simmhan, Y. et al. ADVCOMP '09 (2009)
- Hoffa, C. et al. eScience '08 (2008)
- Vockler, J. et al. ScienceCloud '11 (2011)
Notes and other links
a. Workflow: loosely coupled parallel applications that consist of a set of computational tasks linked by data- and control-flow dependencies.
b. A parallel file system and high-speed interconnect would dramatically improve performance. Amazon recently released a new resource type that includes a 10 Gbps interconnect.
c. There is a movement towards providing academic clouds, such as FutureGrid or Magellan.
d. Only true for intra-zone transfers (before July 1st, 2011). Also, requests for data transfer are not free.