Write a CUDA-accelerated implementation that computes a histogram (frequency analysis) of a text file. The input is plain text where one char is one byte (we do not bother with encoding details). The histogram holds the number of occurrences of each character. The range of characters in the histogram may be adjusted by algorithm parameters; the range 0-127 is computed by default.
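To make the expected output concrete, here is a minimal CPU reference sketch of the histogram described above. The function name `referenceHistogram` and its parameters are hypothetical, and the choice to silently skip bytes outside the requested range is an assumption; check the provided serial algorithm for the actual behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference (not part of the assignment sources).
// Counts occurrences of every byte value in the closed range [fromChar, toChar].
// Assumption: bytes outside the range are ignored.
std::vector<std::uint32_t> referenceHistogram(const std::uint8_t* data, std::size_t length,
                                              std::uint8_t fromChar = 0,
                                              std::uint8_t toChar = 127)
{
    assert(fromChar <= toChar);
    // One bin per character in the range; bin 0 corresponds to fromChar.
    std::vector<std::uint32_t> bins(static_cast<std::size_t>(toChar) - fromChar + 1, 0);
    for (std::size_t i = 0; i < length; ++i) {
        const std::uint8_t c = data[i];
        if (c >= fromChar && c <= toChar) {
            ++bins[c - fromChar];
        }
    }
    return bins;
}
```

A reference like this is handy for verifying the GPU results bin by bin on small inputs before measuring performance.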
The initial source code is already in /home/_teaching/advpara/ha2-cuda-histogram. CUDA kernels, along with the C functions that invoke them (runners), should be in kernels/kernels.cu. You also need to copy the runner header into headers/kernels.cuh, and do not forget to explicitly instantiate its C++ template with appropriate parameters (it is sufficient if the kernel is executable with uint8 as input element type and uint32 as result type).
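The explicit instantiation step might look roughly like the sketch below. The runner name `run_histogram_sketch` and its signature are hypothetical, and the loop in its body is only a CPU placeholder so the example compiles without CUDA; a real runner would launch the kernel at that point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// --- would go into headers/kernels.cuh (declaration only) ---
// Hypothetical runner signature; the real one comes from the assignment sources.
template<typename T, typename RES>
void run_histogram_sketch(const T* data, std::size_t n, RES* result, T fromVal, T toVal);

// --- would go into kernels/kernels.cu (definition) ---
template<typename T, typename RES>
void run_histogram_sketch(const T* data, std::size_t n, RES* result, T fromVal, T toVal)
{
    assert(fromVal <= toVal);
    // Placeholder body: a real runner would launch the CUDA kernel here.
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] >= fromVal && data[i] <= toVal) {
            ++result[data[i] - fromVal];
        }
    }
}

// Explicit instantiation: forces the compiler to emit code for this type
// combination in this translation unit, so other files that only see the
// declaration can still link against it.
template void run_histogram_sketch<std::uint8_t, std::uint32_t>(
    const std::uint8_t*, std::size_t, std::uint32_t*, std::uint8_t, std::uint8_t);
```

Without the last statement, a caller compiled by the host compiler would see only the declaration and the build would typically fail at link time with an undefined reference.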
Each algorithm has to be wrapped in a class which inherits from IHistogramAlgorithm. Place your CUDA algorithm wrappers into headers/cuda.hpp. You will also need to register your algorithm wrapper (create a unique pointer instance) in the getAlgorithm function of histogram.cpp. See the serial algorithm and register your algorithms analogously.
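The wrapping and registration pattern could be sketched as follows. Everything here is a hypothetical stand-in: `IHistogramAlgorithmSketch`, its methods, `CudaWrapperSketch`, `getAlgorithmSketch`, and the algorithm names are illustrative only, and the real IHistogramAlgorithm and getAlgorithm in the assignment sources may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for IHistogramAlgorithm; the real interface may
// declare different methods and template parameters.
class IHistogramAlgorithmSketch {
public:
    virtual ~IHistogramAlgorithmSketch() = default;
    virtual void run(const std::uint8_t* data, std::size_t n,
                     std::uint8_t fromVal, std::uint8_t toVal) = 0;
    virtual const std::vector<std::uint32_t>& result() const = 0;
};

// CPU counting loop shared by both sketches so the example runs without a GPU.
inline std::vector<std::uint32_t> countBytes(const std::uint8_t* data, std::size_t n,
                                             std::uint8_t fromVal, std::uint8_t toVal)
{
    assert(fromVal <= toVal);
    std::vector<std::uint32_t> bins(static_cast<std::size_t>(toVal) - fromVal + 1, 0);
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] >= fromVal && data[i] <= toVal) ++bins[data[i] - fromVal];
    }
    return bins;
}

class SerialSketch : public IHistogramAlgorithmSketch {
public:
    void run(const std::uint8_t* data, std::size_t n,
             std::uint8_t fromVal, std::uint8_t toVal) override
    {
        mResult = countBytes(data, n, fromVal, toVal);
    }
    const std::vector<std::uint32_t>& result() const override { return mResult; }
private:
    std::vector<std::uint32_t> mResult;
};

// Would live in headers/cuda.hpp. A real wrapper would copy the input to the
// GPU, call the runner declared in headers/kernels.cuh, and copy the bins back.
class CudaWrapperSketch : public IHistogramAlgorithmSketch {
public:
    void run(const std::uint8_t* data, std::size_t n,
             std::uint8_t fromVal, std::uint8_t toVal) override
    {
        mResult = countBytes(data, n, fromVal, toVal); // placeholder for the GPU path
    }
    const std::vector<std::uint32_t>& result() const override { return mResult; }
private:
    std::vector<std::uint32_t> mResult;
};

// Registration: map an algorithm name to a freshly created unique pointer.
std::unique_ptr<IHistogramAlgorithmSketch> getAlgorithmSketch(const std::string& name)
{
    if (name == "serial") return std::make_unique<SerialSketch>();
    if (name == "cuda_sketch") return std::make_unique<CudaWrapperSketch>();
    return nullptr; // unknown algorithm name
}
```

Returning a `std::unique_ptr` to the base class keeps ownership with the caller and lets the rest of the program treat all algorithms uniformly through the interface.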
In the first episode of the Histogram saga, you need to implement two simple approaches:
Check out all the input files and compare performance of the two aforementioned approaches.
Stretch Goal: Try to use a warp-opportunistic approach to aggregate atomic updates within one warp (see the lectures for how to achieve that).