NPRG058: Advanced Programming in Parallel Environment

Final Assignment

The final assignment is to implement the k-medoids algorithm to cluster image feature signatures using SQFD as the distance function. SQFD was introduced in the second CUDA lecture and is revised below. The serial implementation is available on NFS at /home/_teaching/advpara/final-kmedoids, and the same code is provided in your prepared Git repository. You can use any parallel approach and any technologies you deem fit; however, we expect you to combine at least two different technologies.

Algorithm description

The k-medoids algorithm is quite similar to k-means clustering. It divides a set of objects into k clusters based on their similarity defined by a given distance function. Unlike k-means, where the objects are points in R^d and Euclidean distance defines (dis)similarity, k-medoids works with more relaxed assumptions. The clustered objects and the given (dis)similarity function d(o_1, o_2) can be anything; the only assumption is that d satisfies the metric axioms. Therefore, the assignment step of the algorithm is virtually the same as in k-means, but the construction of means (centroids) is quite different.

Since we cannot construct the centroid of a cluster using vector operations (as in k-means), we choose one of the assigned objects as a representative (the medoid) of the cluster. The best object for the job is the one that lies closest to the center, so we compute the distances between all objects within one cluster and then select the object with the lowest sum of distances to all its neighbors.
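As a rough illustration, the sketch below shows one iteration of the k-medoids loop described above: every object is assigned to its nearest medoid, and then the new medoid of each cluster is picked as the member with the smallest sum of distances to the other members. All names (Distance, kmedoidsIteration, ...) are illustrative and not taken from the provided serial code; the distance callback would be SQFD in this assignment.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical sketch: one k-medoids iteration over n objects identified by indices.
// `dist(i, j)` is any metric on the objects (SQFD in this assignment).
using Distance = std::function<float(std::size_t, std::size_t)>;

void kmedoidsIteration(std::size_t n, const Distance &dist,
                       std::vector<std::size_t> &medoids,     // k medoid indices
                       std::vector<std::size_t> &assignment)  // cluster index per object
{
    // Assignment step (same idea as in k-means): the nearest medoid wins.
    assignment.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        float best = std::numeric_limits<float>::max();
        for (std::size_t c = 0; c < medoids.size(); ++c) {
            float d = dist(i, medoids[c]);
            if (d < best) { best = d; assignment[i] = c; }
        }
    }

    // Update step: the new medoid of each cluster is the member with the
    // lowest sum of distances to all other members of that cluster.
    for (std::size_t c = 0; c < medoids.size(); ++c) {
        std::vector<std::size_t> members;
        for (std::size_t i = 0; i < n; ++i)
            if (assignment[i] == c) members.push_back(i);

        float bestSum = std::numeric_limits<float>::max();
        for (std::size_t i : members) {
            float sum = 0.0f;
            for (std::size_t j : members) sum += dist(i, j);
            if (sum < bestSum) { bestSum = sum; medoids[c] = i; }
        }
    }
}
```

Both steps offer plenty of parallelism: the assignment loop is independent per object, and the update step is independent per cluster and per candidate medoid, on top of the parallelism inside each SQFD evaluation.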

The input dataset comprises feature signatures of images. A feature signature is a vector of tuples (f_i, w_i), where f_i is a feature vector from R^d of fixed dimensionality (d = 7 in our case) and w_i is its weight (a positive real number). Signatures may differ in size, so simpler images are represented by shorter vectors and more complex images by longer ones. Our signatures are tens to hundreds of features long.
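A minimal in-memory representation of such a signature might look as follows; the struct and field names are illustrative and the provided serial code may store the data differently.

```cpp
#include <array>
#include <vector>

// Illustrative representation of one feature signature:
// a variable-length list of 7-dimensional features, each with a positive weight.
struct Signature {
    std::vector<std::array<float, 7>> features; // f_i, d = 7
    std::vector<float> weights;                 // w_i > 0, same length as features
};
```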

SQFD (Signature Quadratic Form Distance) is used as the distance function that measures dissimilarity; it is capable of dealing with the fact that the input signatures have different sizes. How the function is computed was described in detail in the lectures; as a quick refresher:

sqfd(o, q) = sqrt((w_o | -w_q) · A_f · (w_o | -w_q)^T), where o and q are the input feature signatures, | denotes vector concatenation, and A_f is a matrix defined as

A_f(i, j) = exp(-α · L_2^2(f_i, f_j)), where f_i is the i-th feature of the concatenated vector of features (from o and q), L_2 is the Euclidean distance, and α is a constant tuning parameter.
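Putting the two formulas together, a straightforward serial evaluation might look like the sketch below (all names are illustrative; alpha stands for the tuning parameter α). Note that the concatenation (w_o | -w_q) never has to be materialized: it is enough to treat the first |o| weights as positive and the remaining weights of q as negated.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Signature as in the earlier sketch: 7-dimensional features with positive weights.
struct Signature {
    std::vector<std::array<float, 7>> features;
    std::vector<float> weights;
};

// Squared Euclidean distance L_2^2 between two 7-dimensional features.
static float l2sq(const std::array<float, 7> &a, const std::array<float, 7> &b)
{
    float sum = 0.0f;
    for (std::size_t k = 0; k < 7; ++k) {
        float diff = a[k] - b[k];
        sum += diff * diff;
    }
    return sum;
}

// Serial SQFD: sqrt((w_o | -w_q) · A_f · (w_o | -w_q)^T) with
// A_f(i, j) = exp(-alpha * L_2^2(f_i, f_j)).
float sqfd(const Signature &o, const Signature &q, float alpha)
{
    std::size_t no = o.features.size(), n = no + q.features.size();

    // View of the concatenated signature (o followed by q) without building it;
    // the weights of q enter with a minus sign.
    auto feature = [&](std::size_t i) -> const std::array<float, 7> & {
        return i < no ? o.features[i] : q.features[i - no];
    };
    auto weight = [&](std::size_t i) -> float {
        return i < no ? o.weights[i] : -q.weights[i - no];
    };

    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            sum += weight(i) * weight(j) * std::exp(-alpha * l2sq(feature(i), feature(j)));

    // Rounding errors can push the quadratic form slightly below zero.
    return std::sqrt(sum > 0.0f ? sum : 0.0f);
}
```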

Remember that the best way to parallelize SQFD internally is to evaluate all items of A_f independently, multiply them by the appropriate weights (according to the distributivity rules), and sum them up using a parallel reduction.
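On a GPU this maps naturally onto one thread per element of A_f. The sketch below is a hypothetical CUDA kernel (its name, parameters, and data layout are assumptions, not part of the provided code): the concatenated signature is passed as flattened 7-dimensional features and signed weights (w_o followed by -w_q), each thread computes one weighted element, and a shared-memory reduction per block feeds a single atomicAdd into a global accumulator.

```cuda
#include <cstddef>

// Hypothetical sketch of the parallelization hint above.
// Launch with a power-of-two block size and blockDim.x * sizeof(float)
// bytes of dynamic shared memory; *result must be zeroed beforehand.
__global__ void sqfdKernel(const float *features, // n * 7 values (o concatenated with q)
                           const float *weights,  // n signed weights (w_o | -w_q)
                           size_t n, float alpha,
                           float *result)         // single global accumulator
{
    extern __shared__ float partial[];            // one slot per thread in the block

    size_t idx = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    float value = 0.0f;
    if (idx < n * n) {
        size_t i = idx / n, j = idx % n;
        float dist2 = 0.0f;
        for (int k = 0; k < 7; ++k) {
            float diff = features[i * 7 + k] - features[j * 7 + k];
            dist2 += diff * diff;
        }
        // One weighted element of A_f: w_i * w_j * exp(-alpha * L_2^2(f_i, f_j)).
        value = weights[i] * weights[j] * expf(-alpha * dist2);
    }

    // Block-wide parallel reduction in shared memory.
    partial[threadIdx.x] = value;
    __syncthreads();
    for (unsigned stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    // One atomic addition per block; the host finishes with sqrtf(max(*result, 0)).
    if (threadIdx.x == 0)
        atomicAdd(result, partial[0]);
}
```

Whether the final reduction is done with atomics, a second reduction kernel, or a library primitive is up to you; the point of the hint is only that all n^2 elements of A_f can be evaluated independently before being summed.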

Technical details

There are only two evaluation criteria:

Also provide a readme file with

Submit your solution into the prepared Git repository and, once it is complete, notify the teachers via Mattermost. There is no deadline, but it is recommended to submit your solution at least one week before you plan to sign up for an oral exam, so that there is enough time to validate the solution and award you the credit.