SSD performance benchmarking

Project Information

Discipline
Computer Science (401) 
Orientation
Research 
Abstract

Solid-state drives (SSDs) are becoming cheaper and more common in data centers, and we expect this trend to continue. Current NAND-based SSDs on the 6 Gb/s SATA III interface deliver much better random I/O performance than traditional hard-disk drives (HDDs). The goal of this project is to understand how big data technologies can benefit from SSDs. As a first step, we are benchmarking Apache Hadoop. A key component of Apache Hadoop is the Hadoop Distributed File System (HDFS), a distributed file system that provides high-throughput access to application data. We will first compare HDFS I/O throughput (Mbps) on SSDs and HDDs. Next, we will investigate the impact of SSDs on virtualization.
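As a rough illustration of the throughput metric, here is a minimal sketch that times a sequential write to a local file and reports the result in megabits per second (Mbps). It measures a single local disk, not HDFS; distributed HDFS throughput is normally measured with a benchmark such as Hadoop's TestDFSIO. The function names, the 64 MB default size, and the 1 MB block size are illustrative assumptions, not part of this project's methodology.

```python
import os
import tempfile
import time


def throughput_mbps(num_bytes: int, seconds: float) -> float:
    """Convert bytes transferred over an interval to megabits per second (Mbps)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    # 8 bits per byte, 10^6 bits per megabit.
    return num_bytes * 8 / 1_000_000 / seconds


def sequential_write_mbps(directory: str, total_mb: int = 64, block_kb: int = 1024) -> float:
    """Write total_mb megabytes sequentially to a temp file in `directory`
    and return the observed write throughput in Mbps.

    Hypothetical helper for illustration. fsync() forces the data to the
    device, so the timing includes the drive and not only the OS page cache.
    """
    block = os.urandom(block_kb * 1024)          # incompressible payload
    num_blocks = (total_mb * 1024) // block_kb   # number of writes to issue
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(num_blocks):
                f.write(block)
            f.flush()                 # push Python's buffer to the OS
            os.fsync(f.fileno())      # push the OS buffers to the device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)               # clean up the benchmark file
    return throughput_mbps(num_blocks * len(block), elapsed)
```

To compare devices, one could call `sequential_write_mbps()` with a directory on each drive (for example, hypothetical mount points `/mnt/ssd` and `/mnt/hdd`) and repeat the runs to average out noise. The `fsync()` call matters: without it, the timing mostly reflects the operating system's page cache rather than the drive.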

Intellectual Merit

This project will help us understand how big data technologies and virtualization can benefit from SSDs.

Broader Impacts

Solid-state (flash) drives are becoming cheaper and more common in data centers, and we expect this trend to continue. By 2020, the amount of electronically stored data is projected to reach 35 trillion gigabytes (35 zettabytes). Quantifying the impact of SSDs on virtualization and big data technologies will help improve the performance and energy efficiency of data centers.

Project Contact

Project Lead
Sameer Tilak (sameer) 
Project Manager
Sameer Tilak (sameer) 

Resource Requirements

Hardware System
  • sierra (IBM iDataPlex at SDSC)
 
Use of FutureGrid

Lima is a FutureGrid cluster at SDSC consisting of 8 nodes equipped with 480 GB SATA SSDs. We will use Lima to conduct this research.

Scale of Use

The Lima cluster, for the next few months.

Project Timeline

Submitted
04/04/2013 - 16:38