Parallel Clustering on GPUs

Project Information

Discipline
Computer Science (401) 
Subdiscipline
11.07 Computer Science 
Orientation
Research 
Abstract

Scientific applications generate very large data sets, which must be partitioned into subsets before meaningful conclusions can be drawn. Data clustering is the statistical analysis process that groups similar objects into relatively homogeneous sets called clusters. The computational demands of data clustering grow rapidly with data size, and processing large data sets on a single CPU is very time consuming. To address these demands, we investigated a CUDA-based high-performance solution for two data clustering algorithms: K-means and fuzzy C-means. K-means is a cluster analysis method that partitions n observations into k clusters so that each observation belongs to the cluster with the nearest mean. Fuzzy C-means is a clustering algorithm that allows one element to belong to two or more clusters, each with a different degree of membership.
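
The two definitions above can be made concrete with a small CPU-side sketch. It is written in plain Python for readability and is not the project's CUDA code; the function names (`kmeans_step`, `fuzzy_memberships`) and the fuzzifier default `m=2.0` are assumptions chosen for illustration. The first function performs one K-means iteration (nearest-mean assignment followed by recomputing the means), and the second computes the standard fuzzy C-means membership degrees of a single point.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_step(points, centers):
    """One K-means iteration: assign each point to its nearest center,
    then recompute every center as the mean of its assigned points."""
    k = len(centers)
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    labels = []
    for p in points:
        nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
        labels.append(nearest)
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    # An empty cluster keeps its previous center (one common convention).
    new_centers = [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
        for j in range(k)
    ]
    return labels, new_centers


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy C-means membership degrees of one point in every cluster.
    The degrees are non-negative and sum to 1; m > 1 is the fuzzifier."""
    dists = [math.sqrt(squared_distance(point, c)) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
    print(kmeans_step(pts, [(1, 1), (9, 1)]))
    print(fuzzy_memberships((1, 0), [(0, 0), (4, 0)]))
```

In K-means each point contributes to exactly one cluster, whereas in fuzzy C-means every point contributes to every cluster, weighted by its membership degree. This is one reason the fuzzy variant is the more compute-intensive of the two.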

For this project, we propose a low-cost, user-friendly, high-performance solution for parallel clustering on GPUs. Two data clustering algorithms, K-means and C-means, were implemented with the proposed solution. We study optimization issues in our CUDA-based implementations of these two algorithms, such as tiling of input and intermediate data and splitting of the computational kernel. We illustrate how to implement the global reduction primitives and how to apply them to K-means. The performance of the sequential, single-GPU, and multi-GPU implementations of the two algorithms is compared in detail.
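
As a rough illustration of how tiling and a global reduction fit together in K-means, the following Python sketch splits the input into tiles, computes per-cluster partial sums and counts for each tile independently, and then merges the partial results pairwise. It is a sequential CPU model under stated assumptions, not the project's CUDA implementation or its reduction interface: the names `tile_partial`, `combine`, and `kmeans_update_tiled` and the `tile_size` parameter are hypothetical. On a GPU, each tile would typically map to a thread block and the pairwise merge to a tree reduction.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tile_partial(tile, centers):
    """Local step for one tile: per-cluster coordinate sums and point counts."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in tile:
        nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def combine(a, b):
    """Reduction operator: element-wise addition of two partial results.
    It is associative and commutative, so partials can merge in any order."""
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    counts = [x + y for x, y in zip(a[1], b[1])]
    return sums, counts


def kmeans_update_tiled(points, centers, tile_size):
    """Recompute the K-means centers from tiled partial results.
    Assumes points is non-empty and tile_size >= 1."""
    partials = [
        tile_partial(points[i:i + tile_size], centers)
        for i in range(0, len(points), tile_size)
    ]
    # Pairwise (tree-shaped) merge, mirroring a parallel global reduction.
    while len(partials) > 1:
        merged = [
            combine(partials[i], partials[i + 1])
            for i in range(0, len(partials) - 1, 2)
        ]
        if len(partials) % 2:
            merged.append(partials[-1])
        partials = merged
    sums, counts = partials[0]
    # An empty cluster keeps its previous center.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
        for j in range(len(centers))
    ]


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
    print(kmeans_update_tiled(pts, [(1, 1), (9, 1)], tile_size=2))
```

Because the merge operator only adds sums and counts, the result does not depend on the tile size up to floating-point rounding. That property is what allows the same update to be spread across thread blocks, or across several GPUs, before a final global combine.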

The parallel clustering on GPUs project consists of three subprojects: GlobalReductions, Cmeans, and the Runtime tool. The timeline and associated PDF reports are as follows:

1. GlobalReductions (Jan/2013~now) (PDF)
2. Cmeans (May/2012~Sep/2012) (PDF) (DataSet)
3. Runtime (Aug/2012~Dec/2012) (PDF) (PPT)

Intellectual Merit

We expect to develop a low-cost, user-friendly, high-performance framework for parallel clustering on GPUs. On the programmability side, it provides developers with a global reduction programming interface that hides implementation and optimization details. On the performance side, it showed much better performance for K-means and C-means than the corresponding multi-core CPU implementations, and its multi-GPU performance scaled relative to the single-GPU results. This research is expected to raise awareness of the necessity and advantages of providing global reduction primitives for parallel clustering computation.

The source code of the Cmeans and Runtime tool projects is archived on GitHub:

  • Cmeans code: https://github.com/cyberaide/biostatistics
  • Panda V0.3: https://github.com/futuregrid/panda/tree/master/PandaV0.3 (Aug2012~Dec2012)
  • Panda V0.4: https://github.com/futuregrid/panda/tree/master/PandaV0.4 (Jan2013~April2013)

Broader Impacts

Three PhD students working on this project will make use of the high-performance infrastructure of the Delta cluster on FutureGrid. The research report will be archived as an FG document, and the project code will be made available to FG users as well. In addition, some of the experience gained in this project has been written up as tutorial pages on the FutureGrid portal.

Project Contact

Project Lead
Gregor von Laszewski (gvonlasz) 
Project Manager
Hui Li (lihui) 
Project Members
Hui Li, Gregor von Laszewski, Fugang Wang  

Resource Requirements

Hardware System
  • delta (GPU Cloud)
 
Use of FutureGrid

We will use FutureGrid to run experiments on multiple nodes of the Delta system. The software stack of our program includes CUDA, Pthreads, OpenMP, and MPI. Some medium-sized input data will be stored in the file system on the Delta nodes.

Scale of Use

Between 1 and 10 nodes on the Delta cluster.

Project Timeline

Submitted
04/05/2013 - 07:25 
Completed
05/01/2013