NPRG042 Programming in Parallel Environment

Assignment 4

Physical Simulation
SYCL
Assigned:	14.4.2025
Deadline:	4.5.2025 23:59 (CEST)
ReCodEx:	assignment
Results:	volta01 (2 GPU)

speedup	points
20× or less	0
20× to 75×	1
75× to 150×	2
150× to 300×	3
300× or more	4

The assignment is to implement a physical simulation based on the given model specification described in a separate document (Czech, English). The simulation is about simple particle mechanics based on forces and velocities. Unlike the other assignments, this one will be developed and tested on our GPU-lab (the front server is gpulab.ms.mff.cuni.cz and it is accessible to you in the same manner as parlab).

Your solution must use the framework available at /home/_teaching/para/04-potential (including the testing data and a serial implementation). Your solution is expected to modify only implementation.hpp, and that is the only file you submit for evaluation. The implementation class (ProgramPotential) must be preserved and it must implement the IProgramPotential interface.

The code is compiled with Intel DPC++ (as indicated in the attached Makefile). Your solution will be tested only with double-precision floats (coordinates) and 32-bit unsigned integers (indices), even though the interface is templated. Your solution is expected to make the following changes in the implementation.hpp file:

The virtual function initialize() is designed to initialize (allocate) the memory buffers. You may also copy input data (like edges) to the GPU or initialize the velocities to zero vectors.
The virtual function iteration(points) should implement the computation performed in each simulation iteration. The function updates the velocities and moves the points according to them. It is called once for every iteration the simulation performs. Furthermore, the API guarantees that every iteration call (starting from the second iteration) is given the points vector yielded by the previous call. In other words, you may cache the point positions (or any related data) in the GPU memory.
The virtual function getVelocities() is expected to retrieve the internal copy of the point velocities. This function is invoked for verification only and it does not have to be efficient (its execution time will not be added to the overall measured time).
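The call sequence of the three functions can be sketched in plain C++ as follows. This is only an illustration of the caching contract, not the framework interface: the class CachedSimulation, the type Vec2, the 2D layout, the in/out reference parameter, and the constant-acceleration velocity update are all hypothetical placeholders (the real update rule is given in the model specification), and ordinary std::vector members stand in for buffers that would live in GPU memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D vector type; the framework defines its own point type.
struct Vec2 { double x, y; };

// Hypothetical stand-in for ProgramPotential (NOT the framework interface).
// The std::vector members play the role of GPU-resident buffers.
class CachedSimulation
{
public:
    // Counterpart of initialize(): allocate state and zero the velocities.
    void initialize(std::size_t pointCount)
    {
        mVelocities.assign(pointCount, Vec2{0.0, 0.0});
        mPositions.clear();
        mFirstIteration = true;
    }

    // Counterpart of iteration(points): update velocities, then move points.
    void iteration(std::vector<Vec2>& points)
    {
        assert(points.size() == mVelocities.size());
        if (mFirstIteration) {
            // Only the first call has to upload the input positions; later
            // calls receive exactly what the previous call produced.
            mPositions = points;
            mFirstIteration = false;
        }
        for (std::size_t i = 0; i < mPositions.size(); ++i) {
            // Placeholder velocity update (assumption, not the real model).
            mVelocities[i].x += kPlaceholderAccel;
            // Move the point according to its velocity.
            mPositions[i].x += mVelocities[i].x;
            mPositions[i].y += mVelocities[i].y;
        }
        points = mPositions;  // hand the new positions back to the caller
    }

    // Counterpart of getVelocities(): used for verification only.
    const std::vector<Vec2>& getVelocities() const { return mVelocities; }

private:
    static constexpr double kPlaceholderAccel = 0.5;  // hypothetical constant
    std::vector<Vec2> mPositions;
    std::vector<Vec2> mVelocities;
    bool mFirstIteration = true;
};
```

In a real solution the cached copies would stay on the device, so each iteration only needs to transfer the resulting positions back to the host.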

The solution must be ready to run both on CPU and GPU (as it will be tested in ReCodEx in a CPU-only setup). The safest way is to use default_selector for queue construction. If you aim for a more elaborate (multi-GPU) solution, you need to implement a CPU fallback if no GPU device is detected.

All the parameters of the model are filled in the member variables of the ProgramPotential object before the initialize() function is called. The member variable mVerbose indicates whether the application was launched with the -verbose argument. If so, you may print debugging or testing output without any penalty. Otherwise, your implementation must not write anything to the output (neither stdout nor stderr). Critical errors should be reported by throwing UserException. Note that asynchronous exceptions must be handled correctly as explained in the lecture.
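One way to keep the output rules in a single place is a pair of small helpers, sketched below. The names logIfVerbose, requireNonEmpty, and DemoUserException are hypothetical; in particular, DemoUserException is only a stand-in for the framework's UserException, whose actual constructor may differ. The output stream is passed in explicitly so the helper can be exercised without touching stdout.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the framework's UserException.
struct DemoUserException : std::runtime_error
{
    explicit DemoUserException(const std::string& msg) : std::runtime_error(msg) {}
};

// Writes a diagnostic line only when verbose mode is on; otherwise the
// stream is left untouched, so nothing reaches stdout or stderr.
inline void logIfVerbose(bool verbose, std::ostream& out, const std::string& msg)
{
    if (verbose) out << msg << '\n';
}

// Reports a critical error by throwing instead of printing.
inline void requireNonEmpty(std::size_t count, const std::string& what)
{
    if (count == 0) throw DemoUserException(what + " must not be empty");
}
```

Inside the implementation class, a call might look like logIfVerbose(mVerbose, std::cout, "buffers allocated"), which stays silent unless -verbose was given.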

The framework tests the solution and prints out the measured times. The first value is the time of the initialization and the second value is the average time of an iteration (both in milliseconds). The initialization time is not considered for the speedup evaluation. Your solution will be tested on different data than you are given, but we will use the same numbers of vertices and edges. The verification process will be performed separately from the time measurements; thus, it will not influence the performance. All tests will be performed using the initial simulation parameters (see potential.cpp).
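To estimate where a measured time lands in the points table, the arithmetic can be sketched as below. Two things are assumptions rather than stated rules: that the speedup is the serial average iteration time divided by your average iteration time, and that a value exactly on a boundary (75×, 150×, 300×) counts for the higher band, while exactly 20× still scores 0. The function names are hypothetical.

```cpp
#include <cassert>

// Assumed definition: serial average iteration time divided by the parallel
// average iteration time (both in milliseconds, initialization excluded).
inline double speedup(double serialIterMs, double parallelIterMs)
{
    assert(serialIterMs > 0.0 && parallelIterMs > 0.0);
    return serialIterMs / parallelIterMs;
}

// Maps a speedup to points following the table in the assignment header.
// Boundary handling is an assumption: "20x or less" scores 0, and a value
// exactly at 75x, 150x or 300x is placed in the higher band.
inline int pointsForSpeedup(double s)
{
    if (s >= 300.0) return 4;
    if (s >= 150.0) return 3;
    if (s >= 75.0) return 2;
    if (s > 20.0) return 1;
    return 0;
}
```

For example, a serial iteration of 100 ms against a parallel iteration of 4 ms gives a speedup of 25×, which falls into the 1-point band under these assumptions.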

The supplied Makefile may be used for compilation. Do not forget that the code has to be compiled on the workers. The SLURM partition for you is named gpu-short-teach and it should have higher priority than most other partitions (use it for short jobs only). Also, do not forget to specify your account (nprg042s). To allocate GPUs on the workers, you need to pass a general resources request parameter to srun:

$> srun -p gpu-short-teach -A nprg042s --gpus=1

The solution will be tested on volta01 (add -w volta01 to the srun command).

Use --gpus=2 if you want to implement your solution for two GPUs (volta01 has only 2 GPUs, so it is not necessary to optimize for more).