Master Theses

Defended Theses:

Unified Querying of Multi-Model Data

  • Most popular database systems can now be described as multi-model. Such a system can use a combination of several logical models (such as graph and document) for data storage, for defining relationships between the data, and for querying across the models. However, no standard multi-model query language currently exists; each system supports its own proprietary approach. Since multi-model data can be represented as a general graph, a possible approach may utilise an existing graph query language for this purpose (see the sketch at the end of this entry). The thesis aims to first analyse the existing popular query languages for graphs and, based on the results, to propose a query language for a unifying graph representation of multi-model data. The language should be sufficiently robust and easily transformable into the core constructs used in popular multi-model databases. The proposal will be implemented and experimentally tested over a selected multi-model database or multiple single-model databases.
  • Author: Daniel Crha
  • Status: Defended on 8.2.2023 (mark: A)
  • Reviewer: Jaroslav Pokorný
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Koupil, P. - Crha, D. - Holubova, I.: MM-quecat: A Tool for Unified Querying of Multi-Model Data. EDBT '23: Proceedings of the 26th International Conference on Extending Database Technology, Ioannina, Greece 2023. OpenProceedings.org, 2023. ISBN 978-3-89318-092-9. [www]
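  • Sketch: a minimal illustration of the premise above that multi-model data can be viewed as one general graph and traversed with a generic graph query. It uses the networkx library, and all node, edge, and label names are invented for the example; it is not the representation or language proposed in the thesis.

        import networkx as nx

        G = nx.DiGraph()
        # one "relational" record and one "document" record as nodes of a single graph
        G.add_node("customer/1", model="relational", name="Alice")
        G.add_node("order/42", model="document", items=["book", "pen"])
        G.add_edge("customer/1", "order/42", label="PLACED")

        # a generic graph traversal standing in for a cross-model query:
        # "orders placed by customer 1"
        orders = [v for _, v, d in G.out_edges("customer/1", data=True) if d["label"] == "PLACED"]
        print(orders)  # ['order/42']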

Evolution Management in Multi-Model Databases

  • The problem of efficient management of changes in applications and their data structures has been studied for many years. A change in the structure (schema) needs to be propagated to data instances, storage strategies, etc. Multi-model databases are databases that support storing and querying data in several mutually related data models (e.g., document, relational, and graph). The thesis aims to analyze the existing approaches for single-model data and propose an extension for the multi-model world, in particular for a selected subset of models involving the graph model, which is often omitted in the related work.
  • Author: Jáchym Bártík
  • Status: Defended on 13.9. 2022 (mark: A)
  • Reviewer: Martin Nečaský
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Koupil, P. - Bartik, J. - Holubova, I.: MM-evocat: A Tool for Modelling and Evolution Management of Multi-Model Data. CIKM '22: Proceedings of the 31st ACM International Conference on Information and Knowledge Management, pages 4892 - 4896, Atlanta, Georgia, USA, October 2022. ACM Press, 2022. ISBN 978-1-4503-9236-5. [www]

Schema Inference for Multi-model Data

  • The vast majority of existing database systems today are referred to by their vendors as multi-model, i.e., supporting more than one logical model (e.g., relational and graph). In addition, there are links between the supported models that allow cross-model querying. As in single-model systems (for example, relational or XML), one of the related data management problems is the inference of a schema for the given data (a toy version of the task is sketched at the end of this entry). In this case, the situation is further complicated by different approaches to the schema across models and database systems (there are schema-full, schema-less, and schema-mixed variants). The author of the work will first briefly describe existing single-model approaches and analyse the suitability of their extension/combination for multi-model data (namely, a subset of interconnected models). An integral part of the work will be the implementation of the proposed design and verification of its properties.
  • Author: Sebastián Hricko
  • Status: Defended on 07.06.2022 (mark: A)
  • Reviewer: Michal Kopecký
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    • Koupil, P. - Hricko, S. - Holubova, I.: A Universal Approach for Multi-Model Schema Inference. Journal of Big Data, volume 9, number 97. Springer Open, August 2022. ISSN 2196-1115. [CiteScore 2021: 14.4, SJR 2021: 2.592, SNIP 2021: 4.661] [www]
    • Koupil, P. - Hricko, S. - Holubova, I.: Schema Inference for Multi-Model Data. MODELS '22: Proceedings of the ACM / IEEE 25th International Conference on Model Driven Engineering Languages and Systems, pages 13 - 23, Montréal, Canada, October 2022. ACM Press, 2022. ISBN 978-1-4503-9466-6. [www]
    • Koupil, P. - Hricko, S. - Holubová, I.: MM-infer: A Tool for Inference of Multi-Model Schemas. EDBT '22: Proceedings of the 25th International Conference on Extending Database Technology, pages 566 - 569, Edinburgh, UK, March/April 2022. OpenProceedings.org, 2022. ISBN 978-3-89318-081-3. [www]
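  • Sketch: a toy version of the inference task for schema-less document data, collecting, per field, the set of value types observed across records. The sample records are hypothetical; the thesis additionally handles nesting, several models, and the links between them.

        from collections import defaultdict

        docs = [
            {"id": 1, "name": "Alice", "tags": ["vip"]},
            {"id": 2, "name": "Bob", "age": 31},
        ]

        inferred = defaultdict(set)
        for doc in docs:
            for field, value in doc.items():
                inferred[field].add(type(value).__name__)

        print(dict(inferred))  # {'id': {'int'}, 'name': {'str'}, 'tags': {'list'}, 'age': {'int'}}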

Analysis of Inferred Social Networks

  • The knowledge of a social network of clients would bring various benefits to companies and businesses. However, access to such data is highly limited. Recently, the idea of inferred social networks has emerged, i.e., networks that are not built by the people themselves but inferred from the knowledge of their particular behaviour (e.g., usage of mobile phones, public transport, bank accounts, etc.). This idea, however, brings many challenging problems. The aim of this thesis is to focus on the analysis of an inferred social network using classical graph algorithms, as well as algorithms specific to social networks (a toy example of such an analysis is sketched at the end of this entry). For this purpose, the author will use real-world data from the financial sector and adapt the approaches to the specific targets of this area. The result of the thesis will be an experimental exploration of the verified approaches for this new type of network.
  • Author: Michal Lehončák
  • Status: Defended on 29.06.2021 (mark: A)
  • Reviewer: Michal Kopecký
  • Text: [pdf]
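  • Sketch: the kind of classical graph analysis mentioned above, here degree and betweenness centrality computed with networkx on a tiny hand-made graph; the edges are purely illustrative and stand in for inferred ties, not for the financial-sector data used in the thesis.

        import networkx as nx

        # nodes are clients, edges are inferred ties (e.g., repeated shared transactions)
        G = nx.Graph([("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")])

        print(nx.degree_centrality(G))       # who has the most inferred ties
        print(nx.betweenness_centrality(G))  # who bridges otherwise distant clients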

Link Prediction in Inferred Social Networks

  • The knowledge of a social network of clients would bring various benefits to companies and businesses. However, access to such data is highly limited. Recently, the idea of inferred social networks has emerged, i.e., networks that are not built by the people themselves but inferred from the knowledge of their particular behaviour (e.g., usage of mobile phones, public transport, bank accounts, etc.). This idea, however, brings many challenging problems. The aim of this thesis is to focus on the problem of link prediction in an inferred social network using existing verified approaches (two classical scores are sketched at the end of this entry). For this purpose, the author will use real-world data from the financial sector and adapt the selected methods to the specific targets of this area. The result of the thesis will be an experimental exploration of selected suitable approaches for this new type of network.
  • Author: Ondřej Měkota
  • Status: Defended on 22.06.2021 (mark: A)
  • Reviewer: Ladislav Peška
  • Text: [pdf]
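  • Sketch: two classical link-prediction scores (common neighbours and the Jaccard coefficient) computed with networkx on a toy graph; the graph and the candidate pairs are invented, and the thesis evaluates such methods on real inferred networks.

        import networkx as nx

        G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
        candidates = [("a", "d"), ("b", "d")]

        for u, v in candidates:
            print(u, v, "common neighbours:", len(list(nx.common_neighbors(G, u, v))))

        for u, v, score in nx.jaccard_coefficient(G, candidates):
            print(u, v, "Jaccard:", round(score, 2))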

Comparison of Approaches for Querying of Chemical Compounds

  • Chemical compounds represent a unique type of graph data set with specific exploitation and querying needs. Currently, there exist various approaches for storing and querying chemical compounds. They can be represented as general graphs or as specific strings (e.g., in the SMILES format), queried using specific languages (e.g., the SMARTS language), indexed using specific indexes (e.g., GString), etc.; a small SMILES/SMARTS example is sketched at the end of this entry. The aim of the thesis is to describe, discuss and, in particular, experimentally compare the existing approaches for efficiently storing and querying chemical compounds, including NoSQL graph databases and relational databases.
  • Author: Vojtěch Šípek
  • Status: Defended on 17.06.2019 (mark: B)
  • Reviewer: Jaroslav Pokorný
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Sipek, V. - Holubová, I. - Svoboda, M.: Comparison of Approaches for Querying Chemical Compounds. DMAH@VLDB '19: Proceedings of the 5th International Workshop on Data Management and Analytics for Medicine and Healthcare, held in conjunction with VLDB '19, pages 204 - 221, Los Angeles, CA, USA, August 2019. Lecture Notes in Computer Science 11721. Springer, 2019. ISBN 978-3-030-33751-3. [www]
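  • Sketch: the SMILES/SMARTS style of representation and querying named above, shown with the RDKit toolkit (an assumption made for the example; the thesis compares storage and querying approaches, not this particular library): a molecule parsed from a SMILES string is matched against a SMARTS substructure query.

        from rdkit import Chem

        phenol = Chem.MolFromSmiles("c1ccccc1O")      # molecule given as a SMILES string
        pattern = Chem.MolFromSmarts("c1ccccc1[OH]")  # substructure query: aromatic ring with a hydroxyl

        print(phenol.HasSubstructMatch(pattern))                     # True
        print(Chem.MolFromSmiles("CCO").HasSubstructMatch(pattern))  # ethanol: False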

Evolution Management in NoSQL Document Databases

  • Since most applications are dynamic, sooner or later the structure of the data needs to be changed, and so do all related aspects (storage strategies, queries, source code, etc.). We speak about so-called evolution management or adaptability. This aspect is also relevant to the area of Big Data and NoSQL databases, where the solutions can draw inspiration from evolution management in centralized XML databases.
  • The aim of the thesis is to analyze the XML evolution management approaches with regard to the specifics of distributed NoSQL document databases (i.e., replication, weak consistency of data, mutual references, duplicates, schemalessness, etc.), which also enable storing semi-structured data (mainly in the JSON format). On the basis of the analysis, the author will propose and implement a respective extension of a selected document database (e.g., MongoDB); a minimal example of propagating a structural change in MongoDB is sketched at the end of this entry. The features of the proposal will be demonstrated experimentally.
  • Author: Michal Vavrek
  • Status: Defended on 12.06.2018 (mark: A)
  • Reviewer: Martin Svoboda
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Vavrek, M. - Holubová, I. - Scherzinger, S.: MM-evolver: A Multi-Model Evolution Management Tool. EDBT '19: Proceedings of the 22nd International Conference on Extending Database Technology, pages 586 - 589, Lisbon, Portugal, March 2019. OpenProceedings.org, 2019. ISBN 978-3-89318-081-3. [www]
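  • Sketch: the kind of change propagation discussed above on a concrete document database: a structural change (renaming a field) pushed to all existing MongoDB documents via pymongo. The connection string, collection, and field names are illustrative assumptions, not part of the thesis.

        from pymongo import MongoClient

        db = MongoClient("mongodb://localhost:27017")["shop"]

        # schema-level change: the field 'surname' becomes 'lastName';
        # propagate it to every document already stored in the collection
        db.customers.update_many(
            {"surname": {"$exists": True}},
            {"$rename": {"surname": "lastName"}},
        )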

Web Data Extraction

  • The vast majority of the information on the Internet is designed for human consumption and, therefore, has no specific structure. The area of web data extraction thus focuses on extracting important information from the unstructured data into a structured form by special programs called web wrappers.
  • This work will focus on the safe execution of web wrappers in restricted environments, e.g., in web browsers. First, the author will analyze the existing approaches and evaluate their capabilities and open problems. On the basis of the findings, the author will propose, implement, and evaluate their own solution targeting the selected issues with an emphasis on modularity and extensibility.
  • Author: Tomáš Novella
  • Status: Defended on 12.9. 2015 (mark: B)
  • Reviewer: Marek Polák
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Novella, T. - Holubová, I.: User-friendly and Extensible Web Data Extraction. ISD '17: Proceedings of the 26th International Conference on Information Systems Development, Larnaca, Cyprus, September 2017. AIS Electronic Library 2017. ISBN 978-9963-2288-3-6. [www]

Mining Parallel Corpora from the Web

  • The Web is a source of much valuable information that can be mined and efficiently exploited. One such resource is a parallel corpus, i.e., a multilingual corpus consisting of tuples (pairs) of equivalent phrases in different languages. On the other hand, the main problems of Web-crawled data are their size, veracity and “dirtiness”.
  • The aim of the thesis is to analyze existing methods for mining parallel corpora from data crawled from the Web and identify their weaknesses. On the basis of the analysis, the author will propose a technique which will make it possible to identify promising candidates more efficiently and precisely (one elementary candidate filter is sketched at the end of this entry). The proposed approach will be trained using an existing reliable parallel corpus (such as CzEng) modified according to the features of real-world data. Then it will be experimentally tested and evaluated using real-world Web-crawled data (e.g., the data from the Common Crawl).
  • Author: Jakub Kúdela
  • Status: Defended on 9.9. 2015 (mark: A)
  • Reviewer: Jindřich Helcl
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Kudela, J. - Holubová, I. - Bojar, O.: Extracting Parallel Paragraphs from Common Crawl. The Prague Bulletin of Mathematical Linguistics, number 107, pages 39 - 56. 2017. ISSN 0032-6585. [www]
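  • Sketch: one elementary filter of the kind used when mining parallel data, keeping only candidate sentence pairs whose word counts are roughly proportional; the threshold and the sample pairs are hypothetical, and the thesis builds a considerably richer, trained approach.

        def plausible_pair(src, tgt, max_ratio=2.0):
            """Crude length-ratio filter for candidate parallel sentences."""
            a, b = len(src.split()), len(tgt.split())
            return min(a, b) > 0 and max(a, b) / min(a, b) <= max_ratio

        print(plausible_pair("How are you today?", "Jak se dnes máš?"))  # True
        print(plausible_pair("How are you today?", "Dobře."))            # False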

Analysis of Real-World XML Queries

  • A statistical analysis of the structure of XML data provides useful information for various optimization strategies. Unfortunately, there is probably no analysis of real-world XML queries, which would provide even more specific information on the most commonly used constructs and their context. The software project Analyzer is a tool which enables crawling various types of data and performing their analyses. It was further extended in several Master theses; however, so far it has not been used for an extensive analysis of real-world XML operations.
    The aim of this thesis is to analyze the current structure of real-world XML queries. First, the author should study the existing results of analyses of XML data as well as the Analyzer tool. Next, the author should utilize or implement a suitable crawler. Finally, having crawled a non-trivial sample of real-world XML queries, an extensive analysis should be performed (a toy version of such construct counting is sketched at the end of this entry), possibly using the results of thesis [6]. Modifications and/or extensions of the Analyzer tool might be necessary.
  • Author: Peter Hlísta
  • Status: Defended on 9.9. 2015 (mark: B)
  • Reviewer: Martin Svoboda
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Hlista, P. - Holubová, I.: An Analysis of Real-World XML Queries. ODBASE '16: Proceedings of the 15th International Conference on Ontologies, DataBases, and Applications of Semantics, pages 608 - 624, Rhodes, Greece, October 2016. Lecture Notes in Computer Science 10033, Springer, 2016. ISBN 978-3-319-48471-6. [www]
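  • Sketch: a toy version of the construct counting such an analysis performs, tallying a few XPath features (descendant axes, wildcards, predicates, attributes) over a handful of queries with plain string matching; the query sample is made up, whereas the thesis works on a crawled real-world collection via the Analyzer tool.

        from collections import Counter

        queries = [
            "/bib/book[price < 30]/title",
            "//author/name",
            "/catalog/*/@id",
            "//book[author/last = 'Stevens']//title",
        ]

        stats = Counter()
        for q in queries:
            stats["descendant '//'"] += q.count("//")
            stats["wildcard '*'"] += q.count("*")
            stats["predicate '['"] += q.count("[")
            stats["attribute '@'"] += q.count("@")

        print(stats)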

Automatic Generation of Synthetic XML Documents

  • The aim of this work is research on the possibilities and limitations of automatic generation of synthetic XML documents for the purpose of testing XML applications. First of all, it is necessary to analyze existing data generators and to discuss their advantages and disadvantages. The core of the work should be a proposal and implementation of the author's own algorithm that would focus on a reasonable subset of possible XML data characteristics such as, e.g., size of the document, depth, fan-out, number of elements, number of attributes, mixed contents, IDs and IDREF(S), distribution of the constructs, complexity of the constructs, textual values, etc. (a toy generator driven by two of these parameters is sketched at the end of this entry). At the same time, the usage of such a system should be easy and fast. The resulting algorithm will also be (at least partly) able to deal with mutual dependencies of various parameters. The parameters can be set either manually or extracted from a given set of XML documents, XML queries, etc. The work will include suitable experimental results.
  • Author: Roman Betík
  • Status: Defended on 9.9. 2015 (mark: B)
  • Reviewer: Martin Svoboda
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Betik, R. - Holubová, I.: JBD Generator: Towards Semi-Structured JSON Big Data. ADBIS '16: Proceedings of the 20th East-European Conference on Advances in Databases and Information Systems, pages 54 - 62, Prague, Czech Republic, August 2016. Communications in Computer and Information Science 637, Springer, 2016. ISBN 978-3-319-44065-1. [www]
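  • Sketch: a toy generator driven by just two of the characteristics listed above (depth and fan-out), built on the standard xml.etree.ElementTree module; a usable generator, as targeted by the thesis, must also control element/attribute counts, mixed content, ID/IDREF(S), textual values, and the mutual dependencies among these parameters.

        import random
        import xml.etree.ElementTree as ET

        def generate(depth, fanout, tag="node"):
            elem = ET.Element(tag, id=str(random.randrange(10**6)))
            if depth > 0:
                for _ in range(random.randint(1, fanout)):
                    elem.append(generate(depth - 1, fanout))
            else:
                elem.text = "value"
            return elem

        doc = generate(depth=3, fanout=4)
        print(ET.tostring(doc, encoding="unicode")[:120], "...")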

Generating of Synthetic XML Data

  • The aim of this work is research on the possibilities and limitations of automatic generation of synthetic XML documents for the purpose of testing XML applications. First of all, it is necessary to analyze existing data generators and to discuss their advantages and disadvantages. The core of the work will be a proposal and implementation of a system that will solve selected problems of the existing tools. The work will include suitable experimental results that will provide the proof of concept.
  • Author: Dušan Rychnovský
  • Status: Defended on 26.5. 2014 (mark: A)
  • Reviewer: Marek Polák
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Rychnovsky, D. - Holubová, I.: Generating XML Data for XPath Queries. SAC '15: Proceedings of the 30th Annual ACM Symposium on Applied Computing - track Web Technologies, pages 724 - 731, Salamanca, Spain, April 2015. ACM Press, 2015. ISBN 978-1-4503-3196-8. [www]

Adaptive Similarity of XML Data

  • Exploitation of similarity of XML data is currently a typical optimization strategy in many related areas of their processing. However, most of the approaches suffer from the same problem caused by the fact that different similarity evaluations are suitable for different types of data. The current strategies are either fixed, or their calibration for a particular type of data has to be done manually, which is not an easy task.
    The aim of this work is research on various aspects of (semi-)automatic adaptive evaluation of similarity of XML data. Firstly, it is necessary to analyze existing solutions in the area of both XML and non-XML data and to discuss their advantages and disadvantages. The core of the work is a proposal and implementation of the author's own similarity evaluation strategy focusing on the identified disadvantages and shortcomings (one elementary measure that such a strategy could combine is sketched at the end of this entry). The work will include suitable experimental results.
  • Author: Eva Jílková
  • Status: Defended on 27.1. 2014 (mark: B)
  • Reviewer: Martin Svoboda
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Jilkova, E. - Polak, M. - Holubová, I.: Adaptive Similarity of XML Data. ODBASE '14: Proceedings of the 13th International Conference on Ontologies, DataBases, and Applications of Semantics, Amantea, Italy, October 2014. Springer, 2014. [www]
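  • Sketch: one deliberately crude structural measure (Jaccard overlap of element names) of the kind an adaptive strategy would weight and combine differently for different data sets; the two sample documents are hypothetical.

        import xml.etree.ElementTree as ET

        def element_names(xml_text):
            return {e.tag for e in ET.fromstring(xml_text).iter()}

        a = element_names("<book><title/><author><name/></author></book>")
        b = element_names("<book><title/><isbn/></book>")

        print(len(a & b) / len(a | b))  # 0.4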

Analysis and Experimental Comparison of Graph Databases

  • The aim of the thesis is research on the possibilities and limitations of a new class of database systems called graph databases, which belong to the family of NoSQL databases. The author will first study and describe the basic characteristics of various types of NoSQL databases (key-value, document, column-family, graph, ...) in general and provide an overview of the general features of graph database systems. Next, (s)he will select a representative set of existing implementations of graph databases and compare their key features. The core of the work will be an experimental comparison of the systems, accompanied by a respective extensible framework.
  • Author: Vojtěch Kolomičenko
  • Status: Defended on 27.5. 2013 (mark: B)
  • Reviewer: Jaroslav Pokorný
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Kolomicenko, V. - Svoboda, M. - Holubová (Mlýnková), I.: Experimental Comparison of Graph Databases. iiWAS '13: Proceedings of the 15th International Conference on Information Integration and Web-based Applications & Services, pages 115 - 124, Vienna, Austria, December 2013. ACM Press, 2013. ISBN 978-1-4503-2113-6. [www]

XSLT Benchmarking

  • The aim of this work is research on the possibilities and limitations of XSLT benchmarking and a proposal for an extensible XSLT benchmark system. First of all, it is necessary to describe and compare existing versions of XSLT, to analyze existing XSLT benchmarking projects, and to perform a study of current XSLT applications in general. The core of the work will be a proposal and implementation of the author's own project that will solve selected open problems and disadvantages of the current projects, cover a wide range of XSLT use cases, and enable user-defined parameterization and extensibility. The work will include suitable experiments with the existing XSLT processors using the proposed benchmark (the elementary measurement such a benchmark builds on is sketched at the end of this entry).
  • Author: Viktor Mašíček
  • Status: Defended on 3.9. 2012 (mark: A)
  • Reviewer: Jakub Lokoc
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Masicek, V. - Holubová (Mlýnková), I.: XSLTMark II - a Simple, Extensible and Portable XSLT Benchmark. ADBIS '13: Proceedings of the 17th East-European Conference on Advances in Databases and Information Systems, pages 113 - 120, Genoa, Italy, September 2013. Advances in Intelligent Systems and Computing, volume 241. Springer-Verlag, 2013. ISBN 978-3-319-01862-1. ISSN 2194-5357. [www]
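  • Sketch: the elementary measurement an XSLT benchmark is built around, timing repeated application of a stylesheet to a document using lxml; the stylesheet, document size, and repetition count are arbitrary choices for illustration, not the benchmark proposed in the thesis.

        import time
        from lxml import etree

        transform = etree.XSLT(etree.XML("""
          <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
            <xsl:template match="/items">
              <count><xsl:value-of select="count(item)"/></count>
            </xsl:template>
          </xsl:stylesheet>"""))

        doc = etree.XML("<items>" + "<item/>" * 10000 + "</items>")

        start = time.perf_counter()
        for _ in range(100):
            result = transform(doc)
        print(str(result).strip(), f"({time.perf_counter() - start:.3f} s for 100 runs)")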

Management of Undo/Redo Operations in Complex Environments

  • An undo/redo operation is currently a natural functionality of every application that interacts with a user. In the recent literature we can find several models which solve single-user, single-workspace issues relatively well. However, in cases when multiple users share the edited objects or there exist relations between multiple workspaces, the task becomes much more complex and the dependencies need to be handled carefully.
    The aim of this thesis is research on the possibilities and limitations of undo/redo management in the described complex environments. First of all, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work will be a proposal and implementation of the author's own approach suitable for the student software project DaemonX, designed for modelling and evolution management in various interconnected conceptual models (the single-user baseline that the complex setting extends is sketched at the end of this entry). The work will include experimental results.
  • Author: Karel Jakubec
  • Status: Defended on 28.5. 2012 (mark: A)
  • Reviewer: Jakub Lokoc
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Jakubec, K. - Polak, M. - Necasky, M. - Holubová, I.: Undo/Redo Operations in Complex Environments. ANT '14: Proceedings of the 5th International Conference on Ambient Systems, Networks and Technologies, pages 561 - 570, Hasselt, Belgium, June 2014. Procedia Computer Sciences, volume 32. Elsevier, 2014. ISSN 1877-0509. [www]
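  • Sketch: the textbook single-user baseline the thesis starts from, a pair of undo/redo stacks over invertible commands; the multi-user and multi-workspace dependencies addressed in the thesis (and in DaemonX) are precisely what this simple model does not capture.

        class History:
            def __init__(self):
                self.undo_stack, self.redo_stack = [], []

            def execute(self, do, undo):
                do()
                self.undo_stack.append((do, undo))
                self.redo_stack.clear()   # a new action invalidates the redo history

            def undo(self):
                do, undo = self.undo_stack.pop()
                undo()
                self.redo_stack.append((do, undo))

            def redo(self):
                do, undo = self.redo_stack.pop()
                do()
                self.undo_stack.append((do, undo))

        doc, h = [], History()
        h.execute(lambda: doc.append("a"), doc.pop)
        h.execute(lambda: doc.append("b"), doc.pop)
        h.undo(); h.redo()
        print(doc)  # ['a', 'b']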

Inference of an XML Schema with the Knowledge of XML Operations

  • Currently, there are plenty of papers dealing with the inference of an XML schema for given XML documents. The main aim of the approaches is to find an optimal schema which describes the structure of the data precisely and is not too complicated or artificial. For this purpose the authors exploit various metrics as well as other input data.
    The aim of this work is research on the problem of inferring an XML schema for a given set of XML data in a situation when we are also provided with a set of related operations (XML queries, XSLT scripts, etc.). Firstly, it is necessary to analyze existing inference solutions in general and to discuss their advantages and disadvantages. The core of the work is the identification and discussion of information that can be extracted from a given set of XML operations and how it can be exploited to achieve a more precise and realistic XML schema. The result of the work will be a proposal of the author's own approach involving the improvements, its implementation, and suitable experiments that will show its advantages.
  • Author: Mário Mikula
  • Status: Defended on 28.5. 2012 (mark: A)
  • Reviewer: Martin Svoboda
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Mikula, M. - Starka, J. - Mlýnková, I.: Inference of an XML Schema with the Knowledge of XML Operations. SITIS '12: Proceedings of the 8th International Conference on Signal-Image Technology and Internet-Based Systems, pages 433 - 440, Naples, Italy, November 2012. IEEE Computer Society Press, 2012. ISBN 978-1-4673-5152-2. [www]

Inference of XML Integrity Constraints

  • Currently, there are plenty of papers dealing with the inference of XML schemas of XML documents. However, most of these approaches focus on the inference of structural aspects, whereas others are often omitted. In particular, both the DTD and XML Schema languages involve ID and IDREF(S) data types that specify unique identifiers and references to them. XML Schema extends this feature with the unique, key and keyref constructs that have the same purpose but enable one to specify the unique/key values more precisely. In addition, its assert and report constructs enable one to express specific constraints on values using the XPath language. And there are also more general integrity constraints that could be inferred, though they cannot yet be expressed in the existing schema specification languages.
    The aim of this work is research on various aspects of the problem of (semi-)automatic inference of various integrity constraints of XML data (a naive detector of key/reference candidates is sketched at the end of this entry). Firstly, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is a proposal and implementation of the author's own approach addressing selected disadvantages of the existing ones. The work will include suitable experimental results.
  • Author: Matej Vitásek
  • Status: Defended on 6.2. 2012 (mark: A)
  • Reviewer: Tomáš Knap
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Vitasek, M. - Mlýnková, I.: Inference of XML Integrity Constraints. ADBIS '12: Proceedings of the 16th East-European Conference on Advances in Databases and Information Systems, pages 285 - 296, Poznan, Poland, September 2012. Advances in Intelligent and Soft Computing, Springer-Verlag, 2012. ISBN 978-3-642-32740-7. ISSN 2194-5357. [www]
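  • Sketch: a naive starting point for such inference: an attribute whose values are unique across a document is a key (ID) candidate, and an attribute whose values are all drawn from a key candidate's values is a reference (IDREF) candidate. The sample document is made up, and the thesis targets far richer constraints (unique/key/keyref and value-based ones).

        import xml.etree.ElementTree as ET
        from collections import defaultdict

        doc = ET.fromstring("""<db>
          <person id="p1"/><person id="p2"/>
          <email owner="p1"/><email owner="p2"/><email owner="p1"/>
        </db>""")

        values = defaultdict(list)
        for elem in doc.iter():
            for attr, val in elem.attrib.items():
                values[(elem.tag, attr)].append(val)

        keys = {k for k, vs in values.items() if len(vs) == len(set(vs))}
        refs = {k: t for k, vs in values.items() if k not in keys
                for t in keys if set(vs) <= set(values[t])}

        print("key candidates:", keys)        # {('person', 'id')}
        print("reference candidates:", refs)  # {('email', 'owner'): ('person', 'id')}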

Adaptation of Relational Database Schema

  • Since most of the current applications are dynamic, sooner or later the structure of the data needs to be changed, and so do all related issues. We speak about evolution and adaptability of applications. One of the aspects of this problem is the adaptation of the respective storage of the data.
    The aim of this work is research on the possibilities and limitations of adapting a relational database schema. First of all, it is necessary to analyze the related issues (such as, e.g., adaptation of the schema, adaptation of the respective queries, integration of a new schema, etc.) in general and to discuss their key problems and solutions (a miniature example of a schema change and one possible query adaptation is sketched at the end of this entry). On the basis of the analysis, the author will select one particular direction. The core of the work will be a proposal and implementation of the author's own approach dealing with the selected open issue(s) related to database schema evolution. The work will include experimental results.
  • Author: Martin Chytil
  • Status: Defended on 6.2. 2012 (mark: A)
  • Reviewer: Michal Kopecký
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Chytil, M. - Polak, M. - Necasky, M. - Holubová (Mlýnková), I.: Evolution of a Relational Schema and its Impact on SQL Queries. IDC '13: Proceedings of the 7th International Symposium on Intelligent Distributed Computing, pages 5 - 15, Prague, Czech Republic. Studies in Computational Intelligence, volume 511. Springer, 2013. ISBN 978-3-319-01570-5. [www]
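  • Sketch: the two sides of the problem on a miniature SQLite example (assumes SQLite 3.25+ for RENAME COLUMN): the stored schema is changed and a compatibility view keeps an "old" query working; the thesis addresses the adaptation of schemas and queries far more systematically.

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, surname TEXT)")
        con.execute("INSERT INTO customer VALUES (1, 'Smith')")

        # the schema change ...
        con.execute("ALTER TABLE customer RENAME COLUMN surname TO last_name")
        # ... and one possible adaptation: expose the old name through a view
        con.execute("CREATE VIEW customer_v1 AS SELECT id, last_name AS surname FROM customer")

        print(con.execute("SELECT surname FROM customer_v1").fetchall())  # [('Smith',)]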

Schematron Schema Inference

  • Currently, there are plenty of papers dealing with the inference of XML schemas of XML documents expressed either in XML Schema or DTD. However, there are other languages that enable specifying the structure of XML data in quite a different way. An example of such a language is Schematron, an ISO standard based on specifying conditions the XML data should satisfy instead of a grammar (a small Schematron schema is sketched at the end of this entry). However, there seems to be no approach for the inference of Schematron schemas.
    The aim of this work is research on various aspects of the problem of automatic inference of an XML schema for a given set of XML documents. Firstly, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is a proposal and implementation of the author's own method of automatic schema inference dealing with the constructs of Schematron. The work will include suitable experimental results.
  • Author: Michal Kozák
  • Status: Defended on 6.2. 2012 (mark: A)
  • Reviewer: Martin Svoboda
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Kozak, M. - Starka, J. - Mlýnková, I.: Schematron Schema Inference. IDEAS '12: Proceedings of the 16th International Database Engineering & Applications Symposium, pages 42 - 50, Prague, Czech Republic, August 2012. ACM Press, 2012. ISBN 978-1-4503-1234-9. [www]
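  • Sketch: what a (hand-written) Schematron schema looks like, i.e., the kind of output the inference has to produce automatically: rules with XPath conditions instead of a grammar, validated here with lxml's ISO Schematron support; the order/item vocabulary is invented for the example.

        from lxml import etree, isoschematron

        schema = isoschematron.Schematron(etree.XML("""
          <schema xmlns="http://purl.oclc.org/dsdl/schematron">
            <pattern>
              <rule context="order">
                <assert test="@id">an order must have an id attribute</assert>
                <assert test="count(item) &gt;= 1">an order must contain at least one item</assert>
              </rule>
            </pattern>
          </schema>"""))

        print(schema.validate(etree.XML('<order id="o1"><item/></order>')))  # True
        print(schema.validate(etree.XML('<order/>')))                        # False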

Efficient Detection of XML Integrity Constraints

  • An important aspect of efficient data processing is knowledge of the integrity constraints that hold in the described reality. However, similarly to schemas, which are often omitted and rather assumed implicitly, integrity constraints are not explicitly expressed in most applications, even though there exist several suitable languages and tools, such as the general OCL, Incox for XML data, etc.
    The aim of this work is research on various aspects of the problem of (semi-)automatic detection of various integrity constraints hidden in a given set of data. In particular, it will focus on XML data. Firstly, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is a proposal and implementation of the author's own approach addressing selected disadvantages of the existing ones. The work will include suitable experimental results.
  • Author: Michal Švirec
  • Status: Defended on 5.9. 2011 (mark: B)
  • Reviewer: Martin Svoboda
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Svirec, M. - Mlýnková, I.: Efficient Detection of XML Integrity Constraints Violation. NDT '12: Proceedings of the 4th International Symposium on Networked Digital Technologies, pages 259 - 273, Dubai, UAE, April 2012. Communications in Computer and Information Science 293, Springer-Verlag, 2012. ISBN 978-3-642-30506-1. ISSN 1865-0929. [www]

Optimization and Refinement of XML Schema Inference Approaches

  • Currently, there exist several works which focus on the problem of (semi-)automatic inference of XML schemas for a given set of XML documents. Even though most of the approaches focus on the inference of correct and optimal regular expressions, the results they output are still quite complex and unnatural.
    The aim of this work is research on various aspects of the problem. Firstly, it is necessary to analyze the existing solutions and to compare and discuss their outputs. The core of the work is a proposal and implementation of the author's own method focusing on the optimization and refinement of existing approaches to obtain more realistic and natural schemas. For this purpose the approach can exploit, e.g., detailed analyses of the input data, user interaction, various metrics, etc. The work will include suitable experimental results.
  • Author: Michal Klempa
  • Status: Defended on 5.9. 2011 (mark: A)
  • Reviewer: Jakub Stárka
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Klempa, M. - Starka, J. - Mlýnková, I.: Optimization and Refinement of XML Schema Inference Approaches. ANT '12: Proceedings of the 3rd International Conference on Ambient Systems, Networks and Technologies, pages 120 - 127, Niagara Falls, Ontario, Canada, August 2012. Procedia Computer Sciences, volume 10, Elsevier, 2012. ISSN 1877-0509. [www]

XML Query Adaptation

  • Since XML has become a de facto standard for data representation and manipulation, there exists a huge number of applications having their data represented in XML. However, since most applications are dynamic, sooner or later the structure of the data needs to be changed, and so do all related issues. We speak about so-called evolution and adaptability of XML applications. One of the aspects of this problem is to adapt the respective operations over the evolving XML data, in particular XML queries expressed, e.g., in XPath or XQuery.
    The aim of this work is research on the possibilities and limitations of XML query adaptation. First of all, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work should be a proposal and implementation of the author's own approach dealing with selected disadvantages and open issues. The proposal should involve a classification of the respective queries and adaptation steps as well as a discussion of possible/necessary user involvement. The work will include experimental results.
  • Author: Marek Polák
  • Status: Defended on 5.9. 2011 (mark: C)
  • Reviewer: Jakub Malý
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Polak, M. - Mlýnková, I. - Pardede, E.: XML Query Adaptation as Schema Evolves. ISD '12: Proceedings of the 21st International Conference on Information Systems Development, pages 401 - 416, Prato, Italy, August 2012. Springer Science+Business Media, Inc., 2013. ISBN 978-1-4614-7539-2. [www]

Similarity of XML Data

  • A possible enhancement of XML data management tools is to store and manage similar XML data in the same or a similar way, i.e., to exploit the idea of clustering. For this purpose it is necessary to propose a suitable technique which is able to measure similarity among XML documents, among XML schemas, or between the two groups.
    The aim of this work is research on various aspects of the problem and its limitations. Firstly, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is a proposal and implementation of the author's own method for similarity evaluation focusing on the identified disadvantages and shortcomings. The work will include suitable experimental results.
  • Author: Jakub Stárka
  • Status: Defended on 6.9. 2010 (mark: B)
  • Reviewer: Jakub Klímek
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Starka, J. - Mlýnková, I. - Klimek, J. - Necasky, M.: Integration of Web Service Interfaces via Decision Trees. IIT '11: Proceedings of the 7th International Symposium on Innovations in Information Technology, pages 47-52, Abu Dhabi, United Arab Emirates, April 2011. IEEE Computer Society, 2011. ISBN 978-1-4577-0311-9. [www]

Processing of Incorrect XML Data

  • Since XML has become a de facto standard for data representation and manipulation, there exists a huge number of applications having their data represented in XML. The problem is that all such applications assume that their input is correct, i.e., that the XML documents are well-formed and valid against a possibly existing XML schema. But, according to analyses, almost 50% of real-world XML documents contain various errors and need to be appropriately corrected or treated specifically before further processing.
    The aim of this work is research on the possibilities and limitations of techniques for processing and/or correction of incorrect XML data. First of all, it is necessary to analyze existing solutions (both theoretical and commercial) and to discuss their advantages and disadvantages. The core of the work should be a proposal and implementation of the author's own approach dealing with the identified disadvantages. The work will include experimental results.
  • Author: Martin Svoboda
  • Status: Defended on 6.9. 2010 (mark: A)
  • Reviewer: Martin Nečaský
  • Text: [pdf]
  • Notes:
    • The Dean's Award for the Best Master Thesis
    • The results of the thesis have been published in:
      Svoboda, M. - Mlýnková, I.: Correction of Invalid XML Documents with Respect to Single Type Tree Grammars. NDT '11: Proceedings of the 3rd International Symposium on Networked Digital Technologies, pages 179 - 194, Macau, China, July 2011. Communications in Computer and Information Science 136, Springer-Verlag, 2011. ISBN 978-3-642-22185-9. ISSN 1865-0929. [www]

XML Schema Evolution

  • Since XML has become a de facto standard for data representation and manipulation, there exists a huge number of applications having their data represented in XML. However, since most applications are dynamic, sooner or later the structure of the data needs to be changed, while we still need to be able to work with the old as well as the new data. We speak about so-called schema evolution, i.e., the situation where the XML schema of the data is updated and we need to apply these updates to the respective XML documents or even XML operations (queries) so that they become valid and relevant again.
    The aim of this work is research on the possibilities and limitations of techniques for XML schema evolution. First of all, it is necessary to analyze existing solutions (both theoretical and commercial) and to discuss their advantages and disadvantages. The core of the work should be a proposal and implementation of the author's own approach dealing with the identified disadvantages. The work will include experimental results.
  • Author: Jakub Malý
  • Status: Defended on 24.5. 2009 (mark: A)
  • Reviewer: Jakub Klímek
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    • Maly, J. - Necasky, M. - Mlýnková, I.: Efficient adaptation of XML data using a conceptual model. Information Systems Frontiers, pages 1 - 34. Springer Science+Business Media, 2012. ISSN 1387-3326. [IF: 0.912, 5-Year IF: 1.074] [www]
    • Maly, J. - Mlýnková, I. - Necasky, M.: XML Data Transformations as Schema Evolves. ADBIS '11: Proceedings of the 15th International Conference on Advances in Databases and Information Systems, pages 375 - 388, Vienna, Austria, September 2011. Lecture Notes in Computer Science 6909, Springer-Verlag, 2011. ISBN 978-3-642-23736-2. ISSN 0302-9743. [www]

Adaptability in XML-to-Relational Mapping Strategies

  • One of the ways to manage XML documents is to exploit the tools and functions offered by (object-)relational database systems. The key aim of such techniques is to find the optimal mapping strategy, i.e., the way the XML data are stored in relations. Currently, the most efficient approaches, so-called adaptive methods, search a space of possible mappings and choose the one which best suits the given sample data and query workload. Since the space of possibilities is theoretically infinite, the methods exploit various approximations, heuristics, termination conditions, etc., and in fact search for a suboptimal solution.
    The aim of this work is research on various aspects of the problem and its limitations. Firstly, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is a proposal and implementation of the author's own method of adaptive XML-to-relational mapping focusing on the identified disadvantages and shortcomings. One possible direction is the exploitation of a general heuristic method called ant colony optimization. The work will include suitable experimental results.
  • Author: Luboš Kulič
  • Status: Defended on 25.5. 2009 (mark: A)
  • Reviewer: Martin Nečaský
  • Text: [pdf]
  • Notes:
    • 3rd place in competition Master Thesis of 2009
    • The results of the thesis have been published in:
      Kulič L.: Adaptability in XML-to-Relational Mapping Strategies. SAC '10: Proceedings of the 25th Symposium On Applied Computing, track Database Theory, Technology, and Applications, pages 1674 - 1679, Sierre, Switzerland, March 2010. ACM Press, 2010. ISBN 978-1-60558-639-7. [www]

Automatic Construction of an XML Schema for a Given Set of XML Documents

  • Statistical analyses of real-world XML data show that a significant portion of XML documents do not have an appropriate XML schema. And even if they do, the XML Schema language is exploited even less. This is probably caused by the fact that the manual construction of an XML schema is not an easy task and that the XML Schema language is relatively complex.
    The aim of this work is research on various aspects of the problem of automatic construction of an XML schema for a given set of XML documents. Firstly, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is a proposal and implementation of the author's own method of automatic construction focusing on the identified disadvantages and shortcomings. A possible approach can focus on new XML Schema constructs such as, e.g., inheritance, global and local items, groups of elements and attributes, etc., in combination with user interaction. The work will include suitable experimental results.
  • Author: Julie Vyhnanovská
  • Status: Defended on 25.5. 2009 (mark: A)
  • Reviewer: Martin Nečaský
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Vyhnanovska, J. - Mlýnková, I.: Interactive Inference of XML Schemas. RCIS '10: Proceedings of the 4th International Conference on Research Challenges in Information Science, pages 191 - 202, Nice, France, May 2010. IEEE Computer Society Press, 2010. ISBN 978-1-4244-4840-1. [www]

XML Benchmarking

  • The main aim of this work is research on the possibilities and limitations of XML benchmarking projects that enable testing the performance of XML processing techniques. First of all, it is necessary to analyze existing solutions and projects and to discuss their advantages and disadvantages. The core of the work should be a proposal and implementation of the author's own benchmarking project, which can focus, e.g., on statistically frequent XML patterns. The work will include suitable experimental results.
  • Author: Maroš Vranec
  • Status: Defended on 24.9. 2008 (mark: A)
  • Reviewer: Jan Ulrych
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Vranec, M. - Mlýnková, I.: FlexBench: A Flexible XML Query Benchmark. DASFAA '09: Proceedings of the 14th International Conference on Database Systems for Advanced Applications, pages 421 - 436, Brisbane, Australia, April 2009. Springer-Verlag, 2009. ISBN 978-3-642-00886-3. [www]

Comparison of Fully Software and Hardware Accelerated XML Processing

  • Since XML technologies are currently exploited as a de facto standard in numerous spheres of human activity, the demand for efficient XML processing is increasing every day. A natural approach to its optimization is the exploitation of hardware designed particularly for the purpose of XML processing. But this solution brings several open questions, such as what the current state of the art of such appliances is, in which situations this solution is suitable, and what the related advantages and disadvantages are.
    The aim of this work is to compare and analyze the features of standard fully software and hardware accelerated solutions for XML processing from the point of view of currently supported technologies, efficiency, price, and other relevant aspects. Firstly, it is necessary to select an appropriate appliance and describe its XML processing features. The core of the work is the specification of appropriate testing scenarios which will focus on key features of the appliance, and the analysis of the related results. The second part of the thesis will focus on selecting suitable fully software implementations of the supported XML technologies and comparing their features and efficiency with the hardware accelerated solution.
  • Author: Tomáš Knap
  • Status: Defended on 25.9. 2008 (mark: A)
  • Reviewer: Martin Nečaský
  • Text: [pdf]
  • Notes:
    • Finalist of competition IT Master Thesis of 2009
    • The results of the thesis have been published in:
      Knap, T. - Mlýnková, I. - Necasky, M.: Performance of Fully Software and Hardware Accelerated XML Processing and Securing. Innovations '08: Proceedings of the 5th International Conference on Innovations in Information Technology, pages 64 - 68, Al Ain, United Arab Emirates, December 2008. IEEE Computer Society Press, 2008. ISBN 978-1-4244-3397-1. [www]

Similarity of XML Data

  • A possible enhancement of XML data management tools is to store and manage similar XML documents in the same or a similar way, i.e., to exploit the idea of clustering. For this purpose it is necessary to propose a suitable technique which is able to measure similarity among XML documents, among XML schemas, or between the two groups.
    The aim of this work is research on various aspects of the problem and its limitations. Firstly, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is a proposal and implementation of the author's own method for similarity evaluation focusing on the identified disadvantages and shortcomings. The work will include suitable experimental results.
  • Author: Aleš Wojnar
  • Status: Defended on 26.5. 2008 (mark: A)
  • Reviewer: Martin Nečaský
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    • Wojnar, A. - Mlýnková, I. - Dokulil, J.: Similarity of DTDs Based on Edit Distance and Semantics. IDC '08: Proceedings of the 2nd International Symposium on Intelligent Distributed Computing, pages 207 - 216, Catania, Italy, September 2008. Springer-Verlag, 2008. ISBN 978-3-540-85256-8. ISSN 1860-949X. [www]
    • Wojnar, A. - Mlýnková, I. - Dokulil, J.: Structural and Semantic Aspects of Similarity of Document Type Definitions and XML Schemas. Special Issue on Intelligent Distributed Information Systems of the International Journal on Information Sciences, volume 180, issue 10, pages 1817 - 1836. Elsevier, May 2010. ISSN 0020-0255. [www]

Automatic Construction of an XML Schema for a Given Set of XML Documents

  • Author: Ondřej Vošta
  • Status: Defended on 6.2. 2006 (mark: A)
  • Reviewer: Kamil Toman
  • Text (in Czech): [pdf]
  • Note: The results of the thesis have been published in:
    Vosta, O. - Mlýnková, I. - Pokorný, J.: Even an Ant Can Create an XSD. Proceedings of the 13th International Conference on Database Systems for Advanced Applications, pages 35 - 50, New Delhi, India, March 2008. Lecture Notes in Computer Science 4947, Springer-Verlag, 2008. ISBN 978-3-540-78567-5. ISSN 0302-9743. [www]