Streaming in the Clouds

Abstract

In recent years BigData has become an important aspect of scientific discovery, a process referred to as the Fourth Paradigm. Across the wide spectrum of applications and acquisition methods, those that will generate the largest amounts of data fall into the category of streaming data, e.g., networks of sensors, observatories, telescopes, or experiments such as the CERN LHC. As the amount of acquired information grows and data sources become increasingly geographically distributed, it becomes important to process the data in scalable and efficient ways. Cloud computing presents an interesting option for a scalable processing platform. However, the question arises of how best to use cloud computing capabilities for geographically distributed stream processing. In this work, we explore and analyze different approaches to streaming data to the cloud and evaluate them on multiple cloud offerings, including Microsoft Azure and FutureGrid's Nimbus and OpenStack installations. Using an ATLAS application, we show that choosing the right approach to streaming data can triple the average data rate.


Intellectual Merit

The goal of the project is to understand how cloud environments support data streaming. This is a key question for the future, as BigData is increasingly expected to take the form of streaming data.

Broader Impact

The results and observations can be used by any scientific researchers who need to analyze streaming data. The findings will help them scale and tune their experiment configurations so that they can process the full volume of data they collect.

Use of FutureGrid

FutureGrid is used to run the virtual machines in which the stream processing is performed. The purpose is to understand how such data can be processed in cloud environments.

Scale of Use

The number of VMs used ranges from tens up to a hundred. Since the goal is to understand how BigData streaming is supported at large scale, scalability in terms of the number of nodes/VMs is important.

Publications


FG-361
Radu Tudoran
Pierre Riteau
INRIA Rennes
Active

Project Members

Pierre Riteau
Radu Tudoran
Sergey Panitkin

Timeline

34 weeks 3 hours ago