NPRG042 Programming in Parallel Environment

Labs 04 - SYCL

In this lab, we will work on gpulab (instead of parlab). The gpulab head node uses the same credentials and the same NFS, but it gives you access to our GPU worker nodes.

Image blur

The main objective of this lab is to implement a data-parallel version of the image blur stencil using SYCL. The blur stencil takes a grey-scale image (pixels are represented as floats) and a radius parameter as input. It then recomputes each pixel value, independently of the others, as a weighted average of all pixels in the surrounding area, whose size is determined by the radius. The weights are the inverse Manhattan distances from the central pixel, and a constant weight of 5 is used for the central pixel itself.
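Written as a formula (the symbols p, p', W, w, d and r below are introduced here for illustration only and are not names used in the lab code), the new value p' of the pixel at (x, y) with radius r is:

```latex
p'(x, y) = \frac{\sum_{(u, v) \in W} w(u, v)\, p(u, v)}{\sum_{(u, v) \in W} w(u, v)},
\qquad
W = \{ (u, v) : |u - x| \le r,\ |v - y| \le r \} \cap \text{image},
\\[4pt]
w(u, v) =
\begin{cases}
  5 & \text{if } d(u, v) = 0,\\
  1 / d(u, v) & \text{otherwise,}
\end{cases}
\qquad
d(u, v) = |u - x| + |v - y|.
```

For instance, with r = 1 and an interior pixel, the 3 × 3 window holds the centre (weight 5), four direct neighbours at distance 1 (weight 1 each) and four diagonal neighbours at distance 2 (weight 1/2 each), so the weights sum to 5 + 4 + 2 = 11.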

Assuming x and y are the coordinates of the pixel, the following pseudocode accumulates the sum and weights values; the new pixel value is then computed as sum / weights:

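// width, height: image dimensions; the window is clamped to the image boundaries
int yFrom = std::max(y - radius, 0), yTo = std::min(y + radius, height - 1);
int xFrom = std::max(x - radius, 0), xTo = std::min(x + radius, width - 1);
T sum = 0, weights = 0;
for (int dy = yFrom; dy <= yTo; ++dy) {
    for (int dx = xFrom; dx <= xTo; ++dx) {
        int distance = std::abs(dx - x) + std::abs(dy - y); // Manhattan distance from (x, y)
        T weight = (distance > 0) ? (T)1 / (T)distance : (T)5; // 5 for the central pixel
        weights += weight;
        sum += image[dy * width + dx] * weight; // neighbouring pixel (row-major layout assumed)
    }
}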

Let us emphasize that the window actually traversed by dx and dy must be clamped to the image boundaries.
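A self-contained, runnable version of the same computation can serve as a reference when checking your results. This is a minimal sketch that assumes the image is stored as a row-major std::vector<float>; the name blur_reference is hypothetical, and the provided serial_blur() may use a different signature and the image wrapper from the shared directory.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Hypothetical reference blur: `image` is a row-major width x height grey-scale image.
std::vector<float> blur_reference(const std::vector<float> &image,
                                  int width, int height, int radius)
{
    std::vector<float> result(image.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Clamp the (2 * radius + 1) x (2 * radius + 1) window to the image.
            int yFrom = std::max(y - radius, 0), yTo = std::min(y + radius, height - 1);
            int xFrom = std::max(x - radius, 0), xTo = std::min(x + radius, width - 1);
            float sum = 0.0f, weights = 0.0f;
            for (int dy = yFrom; dy <= yTo; ++dy) {
                for (int dx = xFrom; dx <= xTo; ++dx) {
                    int distance = std::abs(dx - x) + std::abs(dy - y);
                    // Inverse Manhattan distance, constant 5 for the central pixel.
                    float weight = (distance > 0) ? 1.0f / distance : 5.0f;
                    weights += weight;
                    sum += image[dy * width + dx] * weight;
                }
            }
            result[y * width + x] = sum / weights;
        }
    }
    return result;
}
```

Comparing the SYCL output pixel by pixel against such a reference with a small tolerance is a simple correctness check; exact equality is not guaranteed, because device floating-point arithmetic may differ slightly from the host.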

An example of an image blurred with radius = 5 is shown below:

Compilation and execution on gpulab

The initial source code with the serial solution is ready in the /home/_teaching/para/labs/sycl-blur directory on parlab. The directory holds the main file blur.cpp, which you need to modify: implement the sycl_blur() function, using the serial_blur() function as a reference. The shared directory contains the image wrapper implementation and the stopwatch class, and the images are available in the data directory.

For compilation, you may use the provided Makefile. Note that it uses the Intel DPC++ compiler (icpx) with the appropriate SYCL flags.

For your convenience, you may use the provided run.sh. It is a bash script (not an sbatch script) that calls make and then executes the compiled binary using separate srun invocations; since the task should be quite fast, it may be better to run it directly in the terminal.

If you wish to use srun yourself, do not forget to add the --gpus option to allocate a GPU. --gpus=1 allocates one GPU (any available one), and --gpus=V100:1 allocates one Volta GPU (these are the ones you should use primarily, since we have 10 of them). Please note that there are two versions of Volta GPUs on gpulab; for reference measurements, you may pin the target worker using the -w option (the nodes are named volta01-volta05).

Always remember to use the --gpus option with srun. Otherwise, srun will hang (due to an error), even if your program does not actually use the GPU.