NPRG042 Programming in Parallel Environment

Assignment 2


K-means clustering
Intel Threading Building Blocks
Assigned: 19.3.2024
Deadline: 1.4.2024 23:59 (CEST)
Supervisor: Martin Kruliš
ReCodEx: assignment
Results: w201 (32 threads), w202 (64 threads)
speedup points
2× or less 0
2× to 5× 1
5× to 10× 2
10× to 18× 3
18× or more 4

Write an implementation of k-means clustering algorithm using Intel Threading Building Blocks technology. The algorithm gets input points in a plane (R^2), the number of clusters K, and the number of iterations. It returns the coordinates of the cluster centroids and the assignment of the points after the last iteration.

The algorithm works as follows. As an initialization, the first K points are taken as the initial set of cluster centroids. After that, the algorithm performs a prescribed number of iterations, whilst each iteration consists of the following steps:

  1. Each point is assigned to its nearest cluster -- i.e., the cluster that has its centroid nearest by the Euclidean metrics. In case multiple clusters with centroids are equidistant from the point being assigned, the cluster with the lowest index is considered the closest one.
  2. A new centroid is computed for each cluster. The centroid is an average of all assigned points (computed per dimension). The task expects that integer arithmetic is used (i.e., all operations including the division are computed on integers). In case a cluster has no points assigned to it (and thus computing a new centroid will result in zero division error), the centroid should be preserved (no new centroid shall be computed).

The number of points is divisible by 1,024 and the number of clusters K will not exceed 256 (clusters are indexed from 0). All points will fit in RAM and the sum of all point coordinates in one dimension as well as the sum of squares of coordinate differences (Euclidean distance) will fit into a signed 64-bit integer.

Your solution must use a framework, which is available for you at /home/_teaching/para/02-kmeans (including the testing data, data generator, and serial implementation). Your solution is expected to modify only the implementation.hpp file of the framework while you have to preserve the name of the implementation class (KMeans) and it must implement the interface IKMeans. The modified file implementation.hpp is also the only file you are expected to submit into ReCodEx. If you need to create additional headers (*.hpp), submit them as well, but you cannot create any subdirectories (and it must not collide with any of the names in the framework).

Use the attached Makefile for compilation and there is also a build.sh and run.sh script wrappers that can be used with sbatch command. When debugging, the KMeans class has a bool template parameter DEBUG. When the program is called with -debug argument, the debugging version of the class is used. You may use it to print out debugging information, but make sure that the regular (non-debug) version is not printing anything to the stdout.

Your solution will be tested on different datasets than you are provided. However, the evaluation data will be of the same size. The homogenous partition (w2xx workers) will be used to measure the speedup. The K for the testing will be selected as rather higher (i.e., you do not have to optimize for very small K, such as lower than 10).