BDgen: a generator of Big Data for testing

Tomáš Faltín, Michal Hanzeli, Irena Holubová, Dušan Variš, Jan Škvařil, Vojtěch Šípek


The amount of data that needs to be processed grows daily. Many state-of-the-art applications must therefore be designed to process large volumes of data efficiently, and they need correspondingly large data sets for testing. BDgen addresses this problem. The tool is implemented as a general framework that can be extended with new plugins, including plugins developed by third parties. The system is divided into a scalable backend, which generates Big Data on clusters using the MPI framework, and a frontend, which offers a user-friendly way to define the input for the backend. We implemented generators for two commonly used formats, JSON and CSV. The generator also contains a plugin that generates data based on a regular grammar.
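To illustrate what generating data from a regular grammar can look like, here is a minimal Python sketch. It is not BDgen's plugin API or implementation: the `GRAMMAR` table, the `generate` function, and the `max_steps` parameter are hypothetical names chosen for this example. The grammar is right-linear, so each production emits one terminal and either moves to another nonterminal or ends the derivation.

```python
import random

# Hypothetical right-linear grammar (an assumption for illustration only).
# Each nonterminal maps to a list of productions (terminal, next_nonterminal);
# next_nonterminal is None when the production ends the derivation.
# This grammar describes the regular language a*bc*d.
GRAMMAR = {
    "S": [("a", "S"), ("b", "T")],
    "T": [("c", "T"), ("d", None)],
}


def generate(grammar, start="S", rng=None, max_steps=1000):
    """Return one random string derived from a right-linear grammar."""
    rng = rng or random.Random()
    out = []
    state = start
    for _ in range(max_steps):
        # Pick one production of the current nonterminal uniformly at random.
        terminal, state = rng.choice(grammar[state])
        out.append(terminal)
        if state is None:
            return "".join(out)
    raise RuntimeError("derivation did not terminate within max_steps")


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that runs are reproducible
    for _ in range(5):
        print(generate(GRAMMAR, rng=rng))
```

Passing an explicit `random.Random` instance keeps the output reproducible. In a distributed setting, one plausible design is to give each worker its own seed, though how BDgen distributes the work is not described here.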

Where to get it

BDgen is open-source and freely accessible. Feedback from users is welcome.



Research group at the department:

XML and Web Technologies Research Group


  • Faltín T., Hanzeli M., Šípek V., Škvařil J., Variš D., Holubová I.: BDgen: A Universal Big Data Generator. In: Proceedings of the 21st International Database Engineering & Applications Symposium, Bristol, Great Britain, ACM Press, ISBN 978-1-4503-1234-9, pp. 200-208, 2017
