SSD performance benchmarking
Project Information
- Discipline
- Computer Science (401)
- Orientation
- Research
Solid-state drives (SSDs) are becoming cheaper and more common in data centers, and we expect this trend to continue. Current 6 Gb/s SATA III NAND-based SSDs deliver better random I/O performance than traditional hard-disk drives (HDDs). The goal of this project is to understand how big data technologies can benefit from SSDs. As a first step, we are benchmarking Apache Hadoop. A key component of Apache Hadoop is the Hadoop Distributed File System (HDFS), a distributed file system that provides high-throughput access to application data. We will first compare HDFS I/O throughput (Mbps) for SSDs and HDDs. Next, we will investigate the impact of SSDs on virtualization.
This project will help us understand how big data technologies and virtualization can benefit from SSDs.
Broader Impacts
Solid-state (flash) drives are becoming cheaper and more common in data centers, and we expect this trend to continue. By 2020, the quantity of electronically stored data is projected to reach 35 trillion gigabytes. Quantifying the impact of SSDs on virtualization and big data technologies will help improve the performance and energy efficiency of data centers.
Project Contact
- Project Lead
- Sameer Tilak (sameer)
- Project Manager
- Sameer Tilak (sameer)
Resource Requirements
- Hardware System
- sierra (IBM iDataPlex at SDSC)
Lima is a FutureGrid cluster at SDSC that consists of 8 nodes equipped with 480 GB SATA SSDs. We will use Lima to conduct this research.
Scale of Use
Lima cluster for the next few months.
Project Timeline
- Submitted
- 04/04/2013 - 16:38