Adapting K-means Clustering for Modern GPUs

Supervisor: Martin Kruliš
Intended Scope: Master thesis
Required Skills: C/C++, parallel programming, GPU programming, CUDA/OpenCL


Clustering is one of the practical data-mining methods. K-means is an heuristic algorithm that computes an approximation of Voronoi cells partitioning of set of points in Rn space.

Even though the algorithm is quite straight forward and seems that it can be easily parallelized, a GPU implementation is quite challenging. There are many special cases that needs to be studied, based on the size of the data set, the number of final clusters, the dimensionality of the space, etc. The problem gets even more interesting, if we try to utilize multiple GPUs, or even a whole cluster of computers (with GPUs).

This work will require extensive research, prototype implementations, and a lot of experimental work. On the other hand, it is most likely the work can be presented as a conference paper, thus it provide a good start for a student, who is considering a career in research.

The content is available under Creative Commons BY-NC 3.0 License