Scientific Cloud Computing Survey White Paper

Part 1. Introduction of Cloud Computing Technology

In this survey, we use the NIST definition and categorization of Cloud computing technology. Cloud Computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of the following essential characteristics: On-demand self-service, Broad network access, Resource pooling, Rapid Elasticity, and Measured Service. The Cloud can be operated under several service models, such as PaaS, IaaS, and SaaS, and can be deployed as one of the following types:

  • Private Cloud
  • Community Cloud
  • Public Cloud
  • Hybrid Cloud

Some of the details of the definition can be found in the wiki under the section Cloud Definition: https://wiki.ncsa.illinois.edu/display/CLOUD/Cloud+Definition

Part 2. Science Stories and Requirements for the Cloud

Scientific applications have been implemented on cloud computing resources in many fields, including Biology/Bioinformatics (Stein 2010, Schatz et al. 2010), Geospatial Information Systems (Yang et al. 2011), Astronomy, and Environmental Science.

Because requirements differ from one science area to another, the focus of cloud computing applications varies. In Biology/Bioinformatics, many applications such as DNA sequencing require high data throughput (Schatz et al. 2010, Langmead et al. 2009). Many open-source projects can be easily deployed in the cloud, such as Myrna (Langmead et al. 2010), CloudBLAST (Matsunaga et al. 2008), and Galaxy (Afgan et al. 2010). The cloud computing workflow in Geospatial sciences mainly involves data storage and processing (Cui et al. 2010, Huang et al. 2010, Park et al. 2011, Yang et al. 2010, Bunzel et al. 2010) as well as simulation and modeling. A main IT challenge in Geospatial sciences is handling massive concurrent user access (Huang et al. 2010, Bernstein et al. 2010, Wang et al. 2010, Janakiraman et al. 2010, Blower et al. 2010). In Astronomy, the practice of cloud computing focuses on data processing, such as processing images from telescopes (Berriman et al. 2010, Jackson et al. 2010, Berriman et al. 2010(2), Hoffa et al. 2008), and on data sharing (Juve et al. 2010). In Environmental sciences, the practice of cloud computing focuses on modeling, such as ocean climate modeling (Evangelinos et al. 2008) and groundwater modeling (Hunt et al. 2010); cloud computing is also used in data analysis, such as parallel sequential data analysis tasks (Hasenkamp et al. 2010).

The most commonly used cloud service model in scientific applications is IaaS. Amazon's cloud services are the most popular cloud platform in almost all the scientific areas, because it is convenient to deploy existing techniques on the Amazon cloud. For example, in Biology/Bioinformatics, most applications use Linux-based systems and technologies that can be easily deployed on Amazon EC2 (Gunarathne et al. 2010, Qiu et al., Langmead et al. 2010, Vecchiola et al. 2009, Nguyen et al. 2011, Afgan et al. 2010). Amazon's cloud services are also popular in Astronomy (Berriman et al. 2010, Jackson et al. 2010, Juve et al. 2009, Vockler et al. 2011), GIS (Huang et al. 2010, Janakiraman et al. 2010, Bunzel et al. 2010), and Environmental sciences (Evangelinos et al. 2008, He et al. 2010). Other community IaaS cloud platforms are also used because they are cost effective compared with commercial clouds. For example, FutureGrid (Qiu et al. 2010) and Magellan (Taylor et al. 2010) are used in Bioinformatics applications; Nimbus (Hoffa et al. 2008), FutureGrid (Vockler et al. 2011), and Magellan (Vockler et al. 2011) are used in Astronomy; OpenNebula (Park et al. 2011) is used in a GIS application; and GoGrid (Hunt et al. 2010, He et al. 2010), FutureGrid with Nimbus and Eucalyptus (Fox et al. 2011), and Magellan with Eucalyptus (Hasenkamp et al. 2010) are used in Environmental sciences applications.

PaaS is also used in scientific areas. For example, Microsoft Azure is used in Biology/Bioinformatics applications (Qiu et al. 2009, Qiu et al. 2010, Lu et al. 2010), groundwater risk analysis (Liu et al. 2010, 2011), and Astronomy (Eye on Earth project). Google App Engine is used in the GIS area (Blower et al. 2010). Scientific researchers choose a PaaS platform when the technologies they need are built on a specific platform; for example, the MapReduce implementation Dryad is based on the Microsoft platform (Qiu et al. 2009).

The table below lists the cloud platforms used in scientific applications.

{table-plus:title=Table 1: Statistics of cloud platforms used in scientific applications (Sample: Astro: 11 papers, Bio: 15 papers, Env: 10 papers, GIS: 11 papers)}
|| || Astronomy || Biology || Environmental || GIS ||
| Amazon | 6 | 9 | 2 | 3 |
| Azure | | 4 | 1 | |
| Google App Engine | | | | 1 |
| FutureGrid | 1 | 1 | 1 | |
| Magellan | 1 | | 1 | |
| GoGrid | | | 2 | |
| Eucalyptus | 1 | | 2 | |
| Nimbus | | | 1 | |
| OpenNebula | | | | 1 |
| IBM Grid | | | 1 | |
{table-plus}

Many scientific computing applications involve parallel computing algorithms. As a framework to support distributed computing on large data sets on clusters, MapReduce is popular in scientific applications, such as in the field of sequencing analysis in Biology/Bioinformatics (Langmead et al. 2009, Gunarathne et al. 2010). Hadoop is a free open-source implementation of MapReduce, and it is commonly used in science areas such as Biology/Bioinformatics and Astronomy (Wiley et al. 2011). Other MapReduce implementations and extensions are also used, such as Microsoft Dryad (Qiu et al. 2009, Lu et al. 2010) and Twister (Qiu et al. 2010).
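
To make the MapReduce programming model concrete, the following is a minimal single-process sketch in Python. It is not taken from any of the cited projects and does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, count_kmers) and the k-mer counting task are illustrative assumptions, chosen only because counting short subsequences is a simple sequencing-style workload.

```python
from collections import defaultdict
from itertools import chain


def map_phase(read, k=3):
    """Map step: emit a (k-mer, 1) pair for every length-k window of one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one k-mer."""
    return key, sum(values)


def count_kmers(reads, k=3):
    """Run map, shuffle, and reduce in sequence over a list of reads."""
    mapped = chain.from_iterable(map_phase(read, k) for read in reads)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical toy reads, used only to exercise the sketch.
    print(count_kmers(["GATTACA", "ATTAC"]))
```

In a real framework the map and reduce calls run on different cluster nodes and the shuffle moves data over the network; here all three steps run in one process so that the data flow is easy to follow.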

The practices and experiments of scientific computing applications in the cloud demonstrate many advantages of cloud computing, such as improved data processing time and reduced cost. For instance, in the Crossbow project, a human sample comprising 2.7 billion reads can be genotyped by Crossbow in about 4 hours, including data uploading time, in the Amazon Cloud, and the cost is about $85 (Langmead et al. 2009). In the work of Schadt et al., 1 PB of data can be traversed on a 1,000-node instance on Amazon EC2 within ~350 minutes at a cost of about $2,040 (Schadt et al. 2011).
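
The figures quoted from Schadt et al. can be turned into unit costs with simple arithmetic. The sketch below is only a worked calculation on the numbers in the text; the function names are hypothetical, and treating 1 PB as 1,000 TB is an assumption.

```python
def node_hours(nodes, minutes):
    """Total node-hours consumed by a run of the given size and duration."""
    return nodes * minutes / 60.0


def cost_per_node_hour(total_cost_usd, nodes, minutes):
    """Average price paid for one node running for one hour."""
    return total_cost_usd / node_hours(nodes, minutes)


def cost_per_terabyte(total_cost_usd, data_tb):
    """Average price paid per terabyte of data traversed."""
    return total_cost_usd / data_tb


if __name__ == "__main__":
    # Values quoted in the text from Schadt et al. 2011.
    nodes, minutes, cost_usd = 1000, 350, 2040.0
    data_tb = 1000.0  # assumption: 1 PB counted as 1,000 TB

    print(f"node-hours:         {node_hours(nodes, minutes):.0f}")
    print(f"USD per node-hour:  {cost_per_node_hour(cost_usd, nodes, minutes):.2f}")
    print(f"USD per TB:         {cost_per_terabyte(cost_usd, data_tb):.2f}")
```

Under these assumptions the run works out to roughly $0.35 per node-hour and about $2 per terabyte. The same calculation cannot be repeated for the Crossbow figure, because the text does not state how many nodes that run used.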

The data throughput for Astronomy applications is usually very large. For example, astronomical surveys of the sky generate tens of terabytes of image data and detect hundreds of millions of sources every night (Wiley et al. 2011). With cloud computing, the data processing time can be reduced. In the experiment of Jackson et al., 20 TB of data can be processed in about 7 hours with an 80-core Amazon EC2 instance (Jackson et al. 2010).

Part 3. Cloud Computing Platforms and Tools

We survey several popular Cloud computing platforms and tools. These include the following:

1. Cloud Services

1.1 Community Clouds

  • FutureGrid
  • Magellan
  • Science Clouds

1.2 Public Clouds

  • Amazon
  • AT&T Synaptic
  • GoGrid
  • Google App Engine
  • Microsoft Azure
  • Rackspace
  • Joyent: http://www.joyent.com/

2. Cloud Software

2.1 Cloud Applications

  • Hadoop Distributed File System
  • Hadoop MapReduce
  • Sector & Sphere

2.2 Cloud Platforms

  • Eucalyptus
  • Nimbus
  • OpenNebula
  • OpenStack
  • WSO2 Stratos

2.3 Multi-Cloud API

  • Apache Deltacloud
  • Apache LibCloud
  • JClouds
  • Jets3t
  • SMEStorage
  • Typica

Part 4. Gap Analysis and Known Issues

Several issues arise in the practice of scientific cloud computing. The first is that the input data, which are usually large, must be deposited in a cloud resource before a cloud program can run over the data set, so the compatibility between data-generation rates and achievable transfer speeds must be assessed (Schatz et al. 2010). Currently, one option is to use a high-speed Internet connection. Another option is to ship physical hard drives to the cloud vendor.
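
A back-of-the-envelope estimate can help compare the two options. The sketch below is illustrative only: the function names, the 0.8 link efficiency, and the 48-hour shipping time are hypothetical assumptions, and the 20 TB data size is borrowed from the Jackson et al. figure quoted in Part 2.

```python
def transfer_hours(size_tb, link_mbps, efficiency=0.8):
    """Estimate hours needed to upload size_tb terabytes over a link.

    link_mbps is the nominal link speed in megabits per second, and
    efficiency is the assumed fraction of that speed actually achieved.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_tb * 1e12 * 8  # decimal terabytes to bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600.0


def faster_option(size_tb, link_mbps, shipping_hours, efficiency=0.8):
    """Return which option is expected to deliver the data sooner."""
    upload = transfer_hours(size_tb, link_mbps, efficiency)
    return "network" if upload <= shipping_hours else "ship drives"


if __name__ == "__main__":
    size_tb = 20           # data size quoted from Jackson et al. 2010
    shipping_hours = 48    # hypothetical door-to-door shipping time
    for link_mbps in (100, 1000, 10000):
        hours = transfer_hours(size_tb, link_mbps)
        choice = faster_option(size_tb, link_mbps, shipping_hours)
        print(f"{link_mbps:>6} Mbps: {hours:8.1f} h -> {choice}")
```

The estimate ignores the time needed to copy data onto and off the drives, so it should be read as a rough comparison rather than a planning tool.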

Another issue is cost. Although cloud computing avoids the costs associated with local equipment maintenance and staffing, the cost models of current cloud service providers are complex, which makes it difficult for scientific cloud computing users to determine the actual full cost (Truong et al. 2010). For instance, Juve et al. found that Amazon S3 is at a cost disadvantage for workflows with many files, since Amazon charges a fee per S3 transaction (Juve et al. 2010).
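
The effect of a per-transaction fee can be sketched with a simple cost model. Everything numeric below is a hypothetical placeholder, not Amazon's actual pricing, and the function names and the two-requests-per-file assumption (one write, one read) are illustrative only.

```python
def request_fee(num_requests, usd_per_1000_requests):
    """Fee for a number of storage requests at a given price per 1,000 requests."""
    return num_requests / 1000.0 * usd_per_1000_requests


def workflow_storage_cost(num_files, total_gb, usd_per_gb,
                          usd_per_1000_requests, requests_per_file=2):
    """Storage fee plus request fees for a workflow.

    requests_per_file=2 assumes each file is written once and read once.
    """
    storage = total_gb * usd_per_gb
    requests = request_fee(num_files * requests_per_file, usd_per_1000_requests)
    return storage + requests


if __name__ == "__main__":
    # Hypothetical prices, chosen only to show the shape of the cost curve.
    usd_per_gb = 0.10
    usd_per_1000_requests = 0.01
    total_gb = 100

    for num_files in (100, 10_000, 1_000_000):
        cost = workflow_storage_cost(num_files, total_gb,
                                     usd_per_gb, usd_per_1000_requests)
        print(f"{num_files:>9} files: ${cost:.2f}")
```

With the data volume held constant, the storage fee stays the same while the request fee grows linearly with the number of files, which is the disadvantage described above for workflows with many files.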

Security and privacy are another concern for scientific cloud computing users. In GIS applications, many location-based services involve the location and identity information of users, so location and identity privacy need to be considered (Wang et al. 2010). Also, a country's geospatial data are usually so sensitive that storing them in a cloud provided by a foreign organization raises concern (Yang et al. 2010).