End-to-end Optimization of Data Transfer Throughput over Wide-area High-speed Networks

Project Information

Discipline
Computer Science (401) 
Orientation
Research 
Abstract

The data produced by large-scale scientific applications has reached multiple petabytes and is approaching exabytes, while advances in high-performance optical networking now make transfer speeds of multiple gigabits per second possible, with links supporting up to 100 Gbps today. The widely used transport-layer protocols (e.g., TCP) were not originally designed for the capacity and speed of the networks now available to the scientific community. Many alternative transport-layer protocols have been designed for high-speed networks, but none has replaced the existing ones. Moreover, reaching transfer speeds near 100 Gbps requires attention to end-system capacities as well as protocol improvements. End systems have evolved from single desktop computers into massively parallel multi-node clusters, supercomputers, and multi-disk parallel storage systems. Additional levels of parallelism, using multiple CPUs and parallel disk access, are needed in combination with network protocol optimization to achieve high data transfer throughput. In this project, we develop application-level models and algorithms that optimize both the network and the end systems to utilize the multi-Gbps bandwidth of high-speed networks with existing transport protocols and without any changes to the OS kernel. We claim that users should not have to change their existing protocols and tools to achieve high data transfer speeds. We achieve this at the application level via "end-to-end data-flow parallelism," in which parallel streams and stripes utilize multiple CPUs and disks. It is important to utilize the available network capacity without significantly affecting existing traffic; our prediction models detect that level and use a minimal number of end-system resources to reach it. The models require very little information about the network and end systems, obtained from previous transfer logs or from brief immediate samplings, and we keep the sampling overhead small enough that the overall gain in throughput far exceeds it. We want to verify the feasibility and accuracy of the proposed prediction models by comparing them against actual TCP data transfers over wide-area networks, and we would like to use several FutureGrid resources over the wide area to validate our models.
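For illustration only, the following Python sketch shows the kind of application-level stream-count prediction such parallelism relies on. It assumes the classical approximation that n parallel TCP streams achieve an aggregate of roughly n x MSS / (RTT x sqrt(p)) (p: packet loss rate), capped at link capacity, and it stops adding streams once the marginal gain becomes small; the parameter values, function names, and threshold are hypothetical placeholders, not the project's actual model.

    import math

    def predicted_throughput(n, mss_bytes, rtt_s, loss_rate, link_capacity_bps):
        """Aggregate throughput (bits/s) of n parallel streams, capped at link capacity."""
        per_stream_bps = (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate))
        return min(n * per_stream_bps, link_capacity_bps)

    def minimal_streams(mss_bytes, rtt_s, loss_rate, link_capacity_bps,
                        max_streams=64, gain_threshold=0.02):
        """Smallest n whose relative gain over n-1 streams falls below the threshold."""
        prev = predicted_throughput(1, mss_bytes, rtt_s, loss_rate, link_capacity_bps)
        for n in range(2, max_streams + 1):
            cur = predicted_throughput(n, mss_bytes, rtt_s, loss_rate, link_capacity_bps)
            if (cur - prev) / prev < gain_threshold:
                return n - 1, prev
            prev = cur
        return max_streams, prev

    if __name__ == "__main__":
        # Hypothetical wide-area path: 1460-byte MSS, 50 ms RTT, 0.01% loss, 10 Gbps link.
        n, thr = minimal_streams(1460, 0.05, 1e-4, 10e9)
        print(f"use {n} streams, predicted ~{thr / 1e9:.2f} Gbps")

In practice the loss and RTT inputs would come from previous transfer logs or a brief sampling phase rather than fixed constants.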

Intellectual Merit

We will implement application-level models and algorithms to predict the best combination of protocol parameters for optimal end-to-end network throughput (i.e., number of parallel data streams, TCP buffer size, and I/O block size); integrate disk and CPU speed parameters into the performance model to predict the optimal combination of disks (data striping) and CPUs (parallelism) for the best end-to-end performance; and develop an estimation service for advanced bandwidth reservations and provisioning. The developed models and algorithms will be implemented as a standalone service as well as used in interaction with external data scheduling and management tools such as Stork, SRM, and Globus Online.
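As a hedged illustration of how disk and CPU capacities might be folded into such a performance model (not the proposal's actual formulation), the Python sketch below treats end-to-end throughput as the minimum of the aggregate network, disk, and CPU rates and searches the small (stripes, streams-per-stripe) grid for the least-resource combination that meets a target rate; all rates, limits, and the cost ordering are hypothetical assumptions.

    def end_to_end_rate(stripes, streams, per_stream_net_bps, per_disk_bps, cpu_core_bps):
        """Predicted end-to-end rate: the tightest of network, disk, and CPU limits."""
        network = stripes * streams * per_stream_net_bps
        disk = stripes * per_disk_bps          # assume one disk and one CPU core per stripe
        cpu = stripes * cpu_core_bps
        return min(network, disk, cpu)

    def cheapest_combination(target_bps, per_stream_net_bps, per_disk_bps, cpu_core_bps,
                             max_stripes=8, max_streams=32):
        """Fewest (stripes, streams-per-stripe) whose predicted rate reaches the target."""
        best = None
        for stripes in range(1, max_stripes + 1):
            for streams in range(1, max_streams + 1):
                rate = end_to_end_rate(stripes, streams, per_stream_net_bps,
                                       per_disk_bps, cpu_core_bps)
                if rate >= target_bps:
                    cost = (stripes, streams)   # prefer fewer stripes, then fewer streams
                    if best is None or cost < best[0]:
                        best = (cost, rate)
                    break                        # extra streams cannot help this stripe count
        return best

    if __name__ == "__main__":
        # Hypothetical: 25 Mbps per stream, 400 MB/s per disk, 1.2 GB/s per core, 5 Gbps target.
        print(cheapest_combination(5e9, 25e6, 400 * 8e6, 1.2 * 8e9))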

Broader Impacts

The developed models and algorithms will be made available via a web portal and supported by a proactive training program to ensure impact across all science communities dealing with large amounts of data. We will collaborate with TeraGrid, OSG, and DOSAR to make the services developed within this proposal available to a wide range of researchers across the nation as well as to the international community.

Project Contact

Project Lead
Tevfik Kosar (tevfikkosar) 
Project Manager
Tevfik Kosar (tevfikkosar) 
Project Members
Esma Yildirim, Engin Arslan, Jangyoung Kim, Ismail Alan, Kemal Guner, MD S Q Zulkar Nine  

Resource Requirements

Hardware Systems
  • alamo (Dell OptiPlex at TACC)
  • foxtrot (IBM iDataPlex at UF)
  • hotel (IBM iDataPlex at U Chicago)
  • india (IBM iDataPlex at IU)
  • sierra (IBM iDataPlex at SDSC)
 
Use of FutureGrid

We will run TCP- and UDP-based transfer protocol services on several FutureGrid nodes and test our protocol optimization models against these services. We are especially interested in wide-area, high-bandwidth experiments.
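A rough sketch of the kind of memory-to-memory parallel-TCP probe such an experiment could use between two FutureGrid nodes is given below in Python; it assumes an iperf-style sink is already listening on the remote node, and the hostname, port, stream count, and transfer size are placeholders.

    import socket
    import threading
    import time

    HOST, PORT = "sierra.futuregrid.example", 5001   # placeholder endpoint
    STREAMS = 8                                      # parallel TCP streams
    BYTES_PER_STREAM = 256 * 1024 * 1024             # 256 MiB per stream
    BLOCK = bytes(1024 * 1024)                       # 1 MiB zero-filled send buffer

    def send_stream():
        """Push BYTES_PER_STREAM bytes of in-memory data over one TCP connection."""
        with socket.create_connection((HOST, PORT)) as sock:
            sent = 0
            while sent < BYTES_PER_STREAM:
                sent += sock.send(BLOCK)

    def main():
        start = time.time()
        threads = [threading.Thread(target=send_stream) for _ in range(STREAMS)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        elapsed = time.time() - start
        gbps = STREAMS * BYTES_PER_STREAM * 8 / elapsed / 1e9
        print(f"{STREAMS} streams, {elapsed:.1f} s, ~{gbps:.2f} Gbps aggregate")

    if __name__ == "__main__":
        main()

Repeating such runs while sweeping the stream count, buffer size, and block size gives the measurements needed to validate the prediction models against actual wide-area transfers.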

Scale of Use

We will need 1-2 nodes on each cluster for 1-2 days per experiment. We expect to perform 3-4 experiments per month.

Project Timeline

Submitted
04/13/2011 - 10:02