Parallel File System

Abstract

In today's high-end computing (HEC) systems, the parallel file system (PFS) is at the core of the storage infrastructure. PFS deployments are shared by many users and applications, but currently there are no provisions for differentiation of service: data access is provided in a best-effort manner. As systems scale, this limitation can prevent applications from efficiently utilizing HEC resources while achieving their desired performance, and it presents a hurdle to supporting a large number of data-intensive applications concurrently. This NSF HECURA project tackles the challenges of quality-of-service (QoS) driven HEC storage management, aiming to support I/O bandwidth guarantees in PFSs.

Intellectual Merit

As high-end computing systems scale in both storage capacity and I/O throughput, generic best-effort I/O scheduling can prevent applications from achieving their desired performance. The HECURA project manages data I/O to deliver QoS to applications, enabling throughput-sensitive and latency-sensitive applications to use existing parallel file systems while also improving overall I/O efficiency. HECURA is the first project to focus on QoS-based management of storage I/O in parallel file systems, and the corresponding I/O scheduling algorithms are original. The project involves six members and is organized to complete in three years.
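The project's scheduling algorithms are not detailed on this page; as a rough illustration of what QoS-driven I/O management involves, the sketch below shows one common building block, a per-application token-bucket throttle that turns a bandwidth reservation into admission decisions at an I/O server. The application names, rates, and the dispatch helper are hypothetical and are not taken from the project.

    import time

    class TokenBucket:
        """Token-bucket throttle; tokens represent bytes of I/O credit."""
        def __init__(self, rate_bytes_per_sec, burst_bytes):
            self.rate = rate_bytes_per_sec
            self.burst = burst_bytes
            self.tokens = burst_bytes
            self.last = time.monotonic()

        def admit(self, request_bytes):
            """Return True if the request can be dispatched now under the reservation."""
            now = time.monotonic()
            # Refill credit accrued since the last check, capped at the burst size.
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= request_bytes:
                self.tokens -= request_bytes
                return True
            return False

    # Hypothetical per-application bandwidth reservations (bytes/s) at one I/O server.
    reservations = {"app_A": 50 * 2**20, "app_B": 20 * 2**20}
    buckets = {app: TokenBucket(rate, burst_bytes=rate) for app, rate in reservations.items()}

    def dispatch(app_id, request_bytes):
        """Admit a request only when the owning application has enough credit left."""
        return buckets[app_id].admit(request_bytes)

A real parallel file system scheduler would additionally coordinate reservations across I/O servers and redistribute unused bandwidth among competing applications; the sketch only captures the basic per-application guarantee.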

Broader Impact

The HECURA project will promote research on QoS-based I/O management in parallel storage systems. Our goal is for the QoS-based I/O management scheme to be widely adopted in existing parallel storage systems. High-end computing systems that adopt the HECURA scheme will eventually provide QoS-guaranteed storage I/O, improving overall system performance. The scheduling evaluation results will also be useful to the scientific community for further analysis. The project broadens the participation of underrepresented groups: all six project members are foreign nationals, and one member is female.

Use of FutureGrid

FutureGrid will be used to host a small-to-medium-size cluster (<100 nodes) with Parallel Virtual File System version 2 (PVFS2) deployed. On this cluster, scheduling algorithms for parallel file system I/O will be tested and evaluated.
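The evaluation methodology is not specified here; as a minimal sketch, achieved per-client bandwidth on the PVFS2 deployment could be measured with a timed write loop like the one below. The mount point, file name, and transfer sizes are placeholders.

    import os, time

    def measure_write_bandwidth(path, total_mb=256, block_kb=1024):
        """Write total_mb of data in block_kb chunks and return achieved bandwidth in MB/s."""
        block = os.urandom(block_kb * 1024)
        iterations = (total_mb * 1024) // block_kb
        start = time.monotonic()
        with open(path, "wb") as f:
            for _ in range(iterations):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force data to the file system before stopping the clock
        return total_mb / (time.monotonic() - start)

    if __name__ == "__main__":
        # "/mnt/pvfs2/bench.dat" is a placeholder path on the PVFS2 mount.
        print(f"{measure_write_bandwidth('/mnt/pvfs2/bench.dat'):.1f} MB/s")

Running several such clients concurrently, each tagged with a different application identity, would show how well the scheduler enforces the configured bandwidth shares.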

Scale Of Use

100 virtual machines, each configured with 20 GB of storage; I will use them for 50 days.

Publications


Project: FG-34
Yonggang Liu, University of Florida
Status: Active
