h1. Survey White Paper Outline


h2. Part 1. Introduction to Cloud Computing Technology


h2. Part 2. Science Stories and Requirements for the Cloud

Scientific applications have been implemented on cloud computing resources in many fields, including biology/bioinformatics ([Stein 2010|http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2898083/?report=abstract], [Schatz _et al._ 2010|http://www.nature.com/nbt/journal/v28/n7/full/nbt0710-691.html]), Geospatial Information Systems (GIS) ([Yang _et al._ 2011|http://cisc.gmu.edu/scc/readings/spatial_cloud_computing.pdf]), Astronomy, and Environmental Science.

Because requirements differ from one science area to another, cloud computing applications have a different focus in each. In Biology/Bioinformatics, many applications, such as DNA sequencing, require high-throughput processing of large data sets (Schatz _et al._ 2010, Langmead _et al._ 2009). Cloud computing workflows in Geospatial sciences mainly involve data storage and processing ([Cui _et al._ 2010|http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5532992&tag=1], [Huang _et al._ 2010|http://portal.acm.org/citation.cfm?doid=1869692.1869699], [Park _et al._ 2011|http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5746010], [Yang _et al._ 2010|http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5602628&tag=1], [Bunzel _et al._ 2010|http://portal.acm.org/citation.cfm?doid=1823854.1823894]) as well as simulation and modeling. Another major IT challenge in Geospatial sciences is handling access by massive numbers of concurrent users (Huang _et al._ 2010, [Bernstein _et al._ 2010|http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5635224], [Wang _et al._ 2010|http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5489727], [Janakiraman _et al._ 2010|http://portal.acm.org/citation.cfm?doid=1869790.1869813], [Blower _et al._ 2010|http://portal.acm.org/citation.cfm?doid=1823854.1823893]). In Astronomy, cloud computing is used mainly for data processing, such as processing telescope images (Berriman _et al._ 2010, Jackson _et al._ 2010, Berriman _et al._ 2010(2), Hoffa _et al._ 2008), and for data sharing (Juve _et al._ 2010). In Environmental sciences, cloud computing is used mainly for modeling, such as ocean climate modeling (Evangelinos _et al._ 2008) and groundwater modeling (Hunt _et al._ 2010). It is also used for data analysis, such as running sequential data analysis tasks in parallel (Hasenkamp _et al._ 2010).

The most commonly used cloud service model in scientific applications is IaaS. [Amazon Cloud services|CLOUD:Amazon] is the most popular cloud platform in almost all of the scientific areas, because it is convenient to deploy existing techniques on the Amazon cloud. For example, in Biology/Bioinformatics, most applications use Linux-based systems and technologies, which can easily be deployed on Amazon EC2 (Gunarathne _et al._ 2010, Qiu _et al._, Langmead _et al._ 2010, Vecchiola _et al._ 2009, Nguyen _et al._ 2011, Afgan _et al._ 2010). Amazon cloud services are also popular in Astronomy (Berriman _et al._ 2010, Jackson _et al._ 2010, Juve _et al._ 2009, Vockler _et al._ 2011), GIS (Huang _et al._ 2010, Janakiraman _et al._ 2010, Bunzel _et al._ 2010), and Environmental sciences (Evangelinos _et al._ 2008, He _et al._ 2010). Other community IaaS cloud platforms are also used because they are more cost-effective than commercial clouds. For example, [FutureGrid|CLOUD:FutureGrid] (Qiu _et al._ 2010) and [Magellan|CLOUD:Magellan] (Taylor _et al._ 2010) are used in Bioinformatics applications; [Nimbus|CLOUD:Nimbus] (Hoffa _et al._ 2008), FutureGrid (Vockler _et al._ 2011), and Magellan (Vockler _et al._ 2011) are used in Astronomy; [OpenNebula|CLOUD:OpenNebula] (Park _et al._ 2011) is used in a GIS application; and [GoGrid|CLOUD:GoGrid] (Hunt _et al._ 2010, He _et al._ 2010), FutureGrid with Nimbus and [Eucalyptus|CLOUD:Eucalyptus] (Fox _et al._ 2011), and Magellan with Eucalyptus (Hasenkamp _et al._ 2010) are used in Environmental sciences applications.

PaaS platforms are also used in scientific applications. For example, [Microsoft Azure|CLOUD:Microsoft Azure] is used in Biology/Bioinformatics applications (Qiu _et al._ 2009, Qiu _et al._ 2010, Lu _et al._ 2010) and in Environmental sciences ([Eye on Earth project|http://www.eyeonearth.eu/]). Google App Engine is used in GIS (Blower _et al._ 2010). Scientific researchers choose a PaaS platform when a technology they need is built on that specific platform; for example, the MapReduce-style framework Dryad is based on the Microsoft platform (Qiu _et al._ 2009).

Table 1 lists the cloud platforms used in scientific applications, with the number of surveyed papers that use each platform in each area.
{table-plus:title=Table 1: Statistics of cloud platforms used in scientific applications (Sample: Astro: 11 papers, Bio: 15 papers, Env: 10 papers, GIS: 11 papers)}
|| Platform || Astronomy || Biology || Environmental || GIS ||
| Amazon | 6 | 9 | 2 | 3 |
| Azure | | 4 | 1 | |
| Google App Engine | | | | 1 |
| FutureGrid | 1 | 1 | 1 | |
| Magellan | 1 | | 1 | |
| GoGrid | | | 2 | |
| Eucalyptus | 1 | | 2 | |
| Nimbus | | | 1 | |
| OpenNebula | | | | 1 |
| IBM Grid | | | 1 | |
{table-plus}

h2. Part 3. Cloud Computing Platforms and Tools


h2. Part 4. Gap Analysis and Known Issues


h2. Part 5. Recommendations

What should NCSA do as a next step? Should we invest time in setting up a private cloud or some cloud tools?