NPRG042 Programming in Parallel Environment
Labs 04 - SYCL
This lab uses gpulab (instead of parlab). The gpulab head node uses the same credentials and the same NFS, but it gives you access to our GPU worker nodes.
Image blur
The main objective of this lab is to implement a data-parallel version of an image blur stencil using SYCL. The blur stencil receives a grey-scale image (pixels are represented as floats) and a radius parameter as input. It then recalculates each pixel value independently as a weighted average of all pixels in the surrounding area, the size of which is determined by the radius. The weights are inverse Manhattan distances from the central pixel, and 5 is used as a constant weight for the central pixel itself.
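The description above can be written down as a formula. This is only a restatement under assumed notation: I is the input image, I' the blurred image, W the window around the pixel (x, y) clamped to the image, and d and w the distance and weight. None of these symbols come from the lab sources.

```latex
\[
  d(i,j) = |i - x| + |j - y|, \qquad
  w(i,j) =
  \begin{cases}
    5 & \text{if } d(i,j) = 0, \\
    \dfrac{1}{d(i,j)} & \text{otherwise,}
  \end{cases}
\]
\[
  I'(x,y) = \frac{\sum_{(i,j) \in W} w(i,j)\, I(i,j)}{\sum_{(i,j) \in W} w(i,j)}
\]
% Worked example: radius = 1, pixel not touching the image border.
% The window is 3x3: one centre (weight 5), four edge neighbours (weight 1),
% four corner neighbours (weight 1/2).
\[
  \sum_{(i,j) \in W} w(i,j) = 5 + 4 \cdot 1 + 4 \cdot \tfrac{1}{2} = 11
\]
```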
Assuming x and y are the coordinates of the pixel, the pseudocode for accumulating the sum and weight values, from which the new pixel is computed as sum / weight, is as follows:
for (int dy = y - radius; dy <= y + radius; ++dy) {
    for (int dx = x - radius; dx <= x + radius; ++dx) {
        // Manhattan distance from the central pixel (x, y)
        int distance = std::abs(dx - x) + std::abs(dy - y);
        T w = (distance > 0) ? (T)1 / (T)distance : (T)5;
        weight += w;
        sum += image[dx, dy] * w;
    }
}
Note that the actual window traversed by dx, dy needs to be clamped to the image boundaries.
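A minimal serial sketch of the clamping for a single pixel might look as follows. The blur_pixel helper, its signature, and the flat row-major std::vector storage are assumptions made for illustration; the lab's own image wrapper and serial_blur() may be organised differently.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not part of the lab sources): blurs one pixel of a
// grey-scale image stored row by row in a flat vector.
template <typename T>
T blur_pixel(const std::vector<T>& image, int width, int height,
             int x, int y, int radius)
{
    // Clamp the traversed window to the image boundaries.
    const int fromX = std::max(x - radius, 0);
    const int toX = std::min(x + radius, width - 1);
    const int fromY = std::max(y - radius, 0);
    const int toY = std::min(y + radius, height - 1);

    T sum = 0;
    T weight = 0;
    for (int dy = fromY; dy <= toY; ++dy) {
        for (int dx = fromX; dx <= toX; ++dx) {
            // Manhattan distance from the central pixel (x, y).
            const int distance = std::abs(dx - x) + std::abs(dy - y);
            // Inverse distance, with the constant 5 for the central pixel.
            const T w = (distance > 0) ? T(1) / T(distance) : T(5);
            weight += w;
            sum += image[dy * width + dx] * w;
        }
    }
    return sum / weight;
}
```

Because the weights are accumulated only over the clamped window, pixels near the border are normalised by a smaller total weight, so a constant image stays constant after blurring.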
An example of an image blurred with radius = 5 is shown below:


Compilation and execution on gpulab
The initial source code with the serial solution is ready in the /home/_teaching/para/labs/sycl-blur directory on parlab. The initial directory holds the main file blur.cpp, which you need to modify. Implement the sycl_blur() function using the serial_blur() function as a reference. The shared directory contains the image wrapper implementation and the stopwatch class. Images are available in the data directory.
For compilation, you may use the provided Makefile. Note that it uses the Intel DPC++ compiler (icpx) with the appropriate sycl flags.
For your convenience, you may use the provided run.sh. This is a bash script (not an sbatch script); it calls make and then executes the compiled executable using separate srun invocations, since this task should be quite fast and it might be better to run it directly in the terminal.
If you wish to use srun yourself, do not forget to add the --gpus option to allocate a GPU. --gpus=1 will allocate one GPU (any available). --gpus=V100:1 will allocate one Volta GPU (those you should use primarily, since we have 10 of them). Please note that there are two versions of Volta GPUs on gpulab; for reference measurements, you may fix which worker is targeted using the -w option (nodes are named volta01-volta05).
Always remember to use the --gpus option with srun. Otherwise, srun will actually hang (due to an error), even if you do not require the GPU.