Master Theses

Defended Theses:

Comparison of Approaches for Querying of Chemical Compounds

  • Chemical compounds are a distinctive kind of graph data with their own usage and querying requirements. Various approaches currently exist for storing and querying them. Compounds can be represented as general graphs or as specific strings (e.g., in the SMILES format), queried using specific languages (e.g., the SMARTS language), indexed using specific indexes (e.g., GString), etc. The aim of the thesis is to describe, discuss and, in particular, experimentally compare the existing approaches to efficient storing and querying of chemical compounds, including NoSQL graph databases and relational databases.
  • Literature:
    • Holubová, I. - Kosek, J. - Minařík, K. - Novák, D.: Big Data a NoSQL databáze. Grada, Prague, Czech Republic, October 2015. ISBN 978-80-247-5466-6. (in Czech)
    • PubChem http://pubchem.ncbi.nlm.nih.gov/
    • ZINC http://zinc.docking.org/
    • ChEMBL https://www.ebi.ac.uk/chembl/
    • SMILES - A Simplified Chemical Language. http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
    • SMARTS - A Language for Describing Molecular Patterns. http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
    • Haoliang Jiang, Haixun Wang, Shuigeng Zhou: GString: A Novel Approach for Efficient Search in Graph Databases http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4221705
    • Sherif Sakr - Eric Pardede: Graph Data Management: Techniques and Applications
    • Vojtech Šípek: Vizuální dotazování v chemických databázích pomocí SMARTS vzorů. Bachelor thesis. MFF UK, 2014. (in Czech)
  • Author: Vojtech Sipek
  • Status: Defended on 17.06.2019 (mark: B)
  • Reviewer: Jaroslav Pokorny
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Sipek, V. - Holubova, I. - Svoboda, M.: Comparison of Approaches for Querying Chemical Compounds. DMAH@VLDB '19: Proceedings of the 5th International Workshop on Data Management and Analytics for Medicine and Healthcare, held in conjunction with VLDB '19, pages 204 - 221, Los Angeles, CA, USA, August 2019. Lecture Notes in Computer Science 11721. Springer, 2019. ISBN 978-3-030-33751-3. [www]

Evolution Management in NoSQL Document Databases

  • Since most applications are dynamic, sooner or later the structure of the data needs to change, and with it all related aspects (storage strategies, queries, source code, etc.). This is known as evolution management or adaptability. It is also relevant to Big Data and NoSQL databases, where solutions can draw inspiration from evolution management in centralized XML databases.
  • The aim of the thesis is to analyze the XML evolution management approaches with regard to the specifics of distributed NoSQL document databases (i.e., replication, weak consistency of data, mutual references, duplicates, schemalessness, etc.), which also allow storing semi-structured data (mainly in the JSON format). On the basis of the analysis, the author will propose and implement a corresponding extension of a selected document database (e.g., MongoDB). The features of the proposal will be demonstrated experimentally.
  • Literature:
    • Holubová, I. - Kosek, J. - Minařík, K. - Novák, D.: Big Data a NoSQL databáze. Grada, Prague, Czech Republic, October 2015. ISBN 978-80-247-5466-6. (in Czech)
    • Polak, M. - Chytil, M. - Jakubec, K. - Kudelas, V. - Pijak, P. - Necasky, M. - Holubova (Mlynkova), I.: Data and Query Adaptation using DaemonX. Computing and Informatics Journal, volume 34, number 1, pages 1001 - 1039. Institute of Informatics, Slovak Academy of Sciences, 2015. ISSN 1335-9150.
    • Maly, J. - Necasky, M. - Mlynkova, I.: Efficient Adaptation of XML Data Using a Conceptual Model. Information Systems Frontiers, volume 16, issue 4, pages 663 - 696. Springer Science+Business Media, 2014. ISSN 1387-3326.
    • Necasky, M. - Klimek, J. - Maly, J. - Mlynkova, I.: Evolution and Change Management of XML-based Systems. Journal of Systems and Software, volume 85, issue 3, pages 683 - 707. Elsevier, February 2012. ISSN 0164-1212.
  • Author: Michal Vavrek
  • Status: Defended on 12.06.2018 (mark: A)
  • Reviewer: Martin Svoboda
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Vavrek, M. - Holubova, I. - Scherzinger, S.: MM-evolver: A Multi-Model Evolution Management Tool. EDBT '19: Proceedings of the 22nd International Conference on Extending Database Technology, pages 586 - 589, Lisbon, Portugal, March 2019. OpenProceedings.org, 2019. ISBN 978-3-89318-081-3. [www]

Web Data Extraction

  • The vast majority of the information on the Internet is designed for human consumption and, therefore, has no specific structure. The area of web data extraction thus focuses on extracting important information from unstructured data into a structured form by special programs called web wrappers.
  • This work will focus on the safe execution of web wrappers in a restricted environment, e.g., in web browsers. First, the author will analyze the existing approaches and evaluate their capabilities and open problems. On the basis of the findings, the author will propose, implement, and evaluate their own solution targeting the selected issues, with an emphasis on modularity and extensibility.
  • Literature:
    • Alberto HF Laender, Berthier A Ribeiro-Neto, Altigran S da Silva, and Juliana S Teixeira. A brief survey of web data extraction tools. ACM Sigmod Record, 31(2):84–93, 2002.
    • Emilio Ferrara, Pasquale De Meo, Giacomo Fiumara, and Robert Baumgartner. Web data extraction, applications and techniques: A survey. Knowledge based systems, 70:301–323, 2014.
    • Nicholas Kushmerick. Finite-state approaches to web information extraction. In Information Extraction in the Web Era, pages 77–91. Springer, 2003.
    • Arnaud Sahuguet and Fabien Azavant. Building intelligent web applications using lightweight wrappers. Data & Knowledge Engineering, 36(3):283–316, 2001.
    • Mary Elaine Califf and Raymond J Mooney. Bottom-up relational learning of pattern matching rules for information extraction. The Journal of Machine Learning Research, 4:177–210, 2003.
    • Giovanni Grasso, Tim Furche, and Christian Schallhart. Effective web scraping with OXPath. In Proceedings of the 22nd international conference on World Wide Web companion, pages 23–26. International World Wide Web Conferences Steering Committee, 2013.
    • Tim Furche, Georg Gottlob, Giovanni Grasso, Omer Gunes, Xiaonan Guo, Andrey Kravchenko, Giorgio Orsi, Christian Schallhart, Andrew Sellers, and Cheng Wang. DIADEM: domain-centric, intelligent, automated data extraction methodology. In Proceedings of the 21st international conference companion on World Wide Web, pages 267–270. ACM, 2012.
  • Author: Tomáš Novella
  • Status: Defended on 12.9. 2015 (mark: B)
  • Reviewer: Marek Polák
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Novella, T. - Holubova, I.: User-friendly and Extensible Web Data Extraction. ISD '17: Proceedings of the 26th International Conference on Information Systems Development, Larnaca, Cyprus, September 2017. AIS Electronic Library 2017. ISBN 978-9963-2288-3-6. [www]

Mining Parallel Corpora from the Web

  • The Web is a source of a great deal of valuable information which can be mined and efficiently exploited. One example is a parallel corpus, i.e., a multilingual corpus consisting of tuples (pairs) of equivalent phrases in different languages. On the other hand, the main problems of Web-crawled data are their size, veracity and “dirtiness”.
  • The aim of the thesis is to analyze existing methods for mining parallel corpora from data crawled from the Web and to identify their weaknesses. On the basis of the analysis, the author will propose a technique that makes it possible to identify promising candidates more efficiently and precisely. The proposed approach will be trained using an existing reliable parallel corpus (such as CzEng) modified according to the features of real-world data. Then it will be experimentally tested and evaluated using real-world Web-crawled data (e.g., the data from the Common Crawl).
  • Literature:
    • Jason Smith, Herve Saint-Amand, Magdalena Plamada, Philipp Koehn, Chris Callison-Burch and Adam Lopez: Dirt cheap Web-scale parallel text from the Common Crawl. [www]
    • Miquel Espla-Gomis: Bitextor, a Free/Open-source Software to Harvest Translation Memories from Multilingual Websites. [www]
    • Philip Resnik, Noah A. Smith: The Web as a Parallel Corpus. [www]
    • CzEng 1.0 (Czech-English Parallel Corpus, version 1.0). [www]
    • Common Crawl
    • Bojar Ondřej. Čeština a strojový překlad. ÚFAL, Praha, Czechia, ISBN 978-80-904571-4-0, 168 pp. 2012. [www]
  • Author: Jakub Kúdela
  • Status: Defended on 9.9. 2015 (mark: A)
  • Reviewer: Jindřich Helcl
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Kudela, J. - Holubova, I. - Bojar, O.: Extracting Parallel Paragraphs from Common Crawl. The Prague Bulletin of Mathematical Linguistics, number 107, pages 39 - 56. 2017. ISSN 0032-6585. [www]

Analysis of Real-World XML Queries

  • A statistical analysis of the structure of XML data provides useful information for various optimization strategies. Unfortunately, there appears to be no analysis of real-world XML queries, which would provide even more specific information on their most commonly used constructs and their context. The software project Analyzer is a tool for crawling various types of data and analyzing them. It has been extended in several Master theses; however, so far it has not been used for an extensive analysis of real-world XML operations.
    The aim of this thesis is to analyze the current structure of real-world XML queries. First, the author should study the existing results of analyses of XML data as well as the Analyzer tool. Next, the author should utilize or implement a suitable crawler. Finally, having crawled a non-trivial sample of real-world XML queries, the author should perform an extensive analysis, possibly using the results of thesis [6]. Modifications and/or extensions of the Analyzer tool might be necessary.
  • Literature:
    • Mlynkova, I. - Toman, K. - Pokorny, J.: Statistical Analysis of Real XML Data Collections. Technical report 2006/5. Charles University, Prague, Czech Republic, June 2006, 43 pages. [www]
    • Analyzer - a tool for batch file analysis. [www]
    • Extensible Markup Language (XML) 1.0 (Fourth Edition). W3C Recommendation, 16 August 2006. [www]
    • W3C Technical Reports and Publications. [www]
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7. (in Czech)
    • J. Schejbal: A System for Analysis of Collections of XML Queries. Master Thesis, Charles University in Prague, Czech Republic, May 2010. [www]
    • J. Sochna: Collecting XML Data and Meta-Data from the Internet. Master Thesis, Charles University in Prague, Czech Republic, May 2010. [www] (in Czech)
    • M. Svoboda: Processing of Incorrect XML Data. Master Thesis, Charles University in Prague, Czech Republic, September 2010. [www]
    • J. Starka: Similarity of XML Data. Master Thesis, Charles University in Prague, Czech Republic, September 2010. [www]
  • Author: Peter Hlísta
  • Status: Defended on 9.9. 2015 (mark: B)
  • Reviewer: Martin Svoboda
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Hlista, P. - Holubova, I.: An Analysis of Real-World XML Queries. ODBASE '16: Proceedings of the 15th International Conference on Ontologies, DataBases, and Applications of Semantics, pages 608 - 624, Rhodes, Greece, October 2016. Lecture Notes in Computer Science 10033, Springer, 2016. ISBN 978-3-319-48471-6. [www]

Automatic Generation of Synthetic XML Documents

  • The aim of this work is to research the possibilities and limitations of automatic generation of synthetic XML documents for the purpose of testing XML applications. First, it is necessary to analyze existing data generators and to discuss their advantages and disadvantages. The core of the work should be the proposal and implementation of the author's own algorithm, focusing on a reasonable subset of possible XML data characteristics, e.g., size of the document, depth, fan-out, number of elements, number of attributes, mixed content, IDs and IDREF(S), distribution of the constructs, complexity of the constructs, textual values, etc. At the same time, the system should be easy and fast to use. The resulting algorithm will also be able (at least partly) to deal with mutual dependencies among the parameters. The parameters can be either set manually or extracted from a given set of XML documents, XML queries, etc. The work will include suitable experimental results.
  • Literature:
    • Extensible Markup Language (XML) 1.0 (Fourth Edition). W3C Recommendation, 16 August 2006.
    • W3C Technical Reports and Publications.
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7. [in Czech]
    • XMark
    • XOO7 Benchmark
    • XMach-1
    • The Michigan Benchmark
    • XBench
    • XPathMark
    • MemBeR: XQuery Micro-Benchmark Repository
    • TPoX
    • ToXgene
    • A. Aboulnaga, J. F. Naughton, and C. Zhang. Generating Synthetic Complex-Structured XML Data. In WebDB'01: Proc. of the 4th Int. Workshop on the Web and Databases, pages 79-84, Washington, DC, USA, 2001.
    • L. Afanasiev, I. Manolescu, and P. Michiels. MemBeR XML Generator.
    • P. Azalov and F. Zlatarova. SDG - A System for Synthetic Data Generation. In ITCC'03: Proc. of the Int. Conf. on Information Technology: Computers and Communications, pages 69-75, Washington, DC, USA, 2003. IEEE Computer Society.
    • Mlynkova, I. - Toman, K. - Pokorný, J.: Statistical Analysis of Real XML Data Collections. Technical report 2006/5. Charles University in Prague, Czech Republic, June 2006, 43 pages.
    • Maroš Vranec: XML Benchmarking. Master Thesis, Charles University in Prague, Czech Republic, 2008.
    • Mlynkova, I.: XML Benchmarking: Limitations and Opportunities. Technical report 2008/1. Charles University in Prague, Czech Republic, January 2008, 23 pages.
  • Author: Roman Betík
  • Status: Defended on 9.9. 2015 (mark: B)
  • Reviewer: Martin Svoboda
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Betik, R. - Holubova, I.: JBD Generator: Towards Semi-Structured JSON Big Data. ADBIS '16: Proceedings of the 20th East-European Conference on Advances in Databases and Information Systems, pages 54 - 62, Prague, Czech Republic, August 2016. Communications in Computer and Information Science 637, Springer, 2016. ISBN 978-3-319-44065-1. [www]
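A parameter-driven generator of the kind described in the assignment above can be sketched as follows. This is only a minimal illustration, not the algorithm proposed in the thesis: the function names `generate_xml` and `count_elements`, the element and attribute naming scheme, and the restriction to three parameters (depth, fan-out, number of attributes) are all assumptions made for the example.

```python
import random
import xml.etree.ElementTree as ET


def generate_xml(depth, fan_out, attr_count=1, seed=0):
    """Build a synthetic XML tree with a fixed depth and fan-out (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed, so the same parameters give the same document

    def build(level):
        elem = ET.Element(f"level{level}")
        for i in range(attr_count):
            elem.set(f"attr{i}", str(rng.randint(0, 999)))
        if level < depth:
            # inner element: exactly fan_out children
            for _ in range(fan_out):
                elem.append(build(level + 1))
        else:
            # leaf element: textual value only
            elem.text = f"value-{rng.randint(0, 999)}"
        return elem

    return ET.ElementTree(build(1))


def count_elements(tree):
    """Number of elements in the tree, including the root."""
    return sum(1 for _ in tree.getroot().iter())
```

Calling `generate_xml(depth=3, fan_out=2)` yields a complete binary tree of elements; a realistic generator would additionally draw depth and fan-out from distributions and handle dependencies between the parameters.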

Generating of Synthetic XML Data

  • The aim of this work is to research the possibilities and limitations of automatic generation of synthetic XML documents for the purpose of testing XML applications. First, it is necessary to analyze existing data generators and to discuss their advantages and disadvantages. The core of the work will be the proposal and implementation of a system that solves selected problems of the existing tools. The work will include suitable experimental results that provide a proof of concept.
  • Literature:
    • Extensible Markup Language (XML) 1.0 (Fourth Edition). W3C Recommendation, 16 August 2006.
    • W3C Technical Reports and Publications.
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7. [in Czech]
    • XMark
    • XOO7 Benchmark
    • XMach-1
    • The Michigan Benchmark
    • XBench
    • XPathMark
    • MemBeR: XQuery Micro-Benchmark Repository
    • TPoX
    • ToXgene
    • A. Aboulnaga, J. F. Naughton, and C. Zhang. Generating Synthetic Complex-Structured XML Data. In WebDB'01: Proc. of the 4th Int. Workshop on the Web and Databases, pages 79-84, Washington, DC, USA, 2001.
    • L. Afanasiev, I. Manolescu, and P. Michiels. MemBeR XML Generator.
    • P. Azalov and F. Zlatarova. SDG - A System for Synthetic Data Generation. In ITCC'03: Proc. of the Int. Conf. on Information Technology: Computers and Communications, pages 69-75, Washington, DC, USA, 2003. IEEE Computer Society.
    • Mlynkova, I. - Toman, K. - Pokorný, J.: Statistical Analysis of Real XML Data Collections. Technical report 2006/5. Charles University in Prague, Czech Republic, June 2006, 43 pages.
    • Maroš Vranec: XML Benchmarking. Master Thesis, Charles University in Prague, Czech Republic, 2008.
    • Mlynkova, I.: XML Benchmarking: Limitations and Opportunities. Technical report 2008/1. Charles University in Prague, Czech Republic, January 2008, 23 pages.
  • Author: Dušan Rychnovský
  • Status: Defended on 26.5. 2014 (mark: A)
  • Reviewer: Marek Polák
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Rychnovsky, D. - Holubova, I.: Generating XML Data for XPath Queries. SAC '15: Proceedings of the 30th Annual ACM Symposium on Applied Computing - track Web Technologies, pages 724 - 731, Salamanca, Spain, April 2015. ACM Press, 2015. ISBN 978-1-4503-3196-8. [www]

Adaptive Similarity of XML Data

  • Exploiting the similarity of XML data is currently a typical optimization strategy in many areas of XML processing. However, most of the approaches suffer from the same problem: different similarity measures are suitable for different types of data. The current strategies are either fixed, or their calibration for a particular type of data has to be done manually, which is not an easy task.
    The aim of this work is to research various aspects of (semi-)automatic adaptive evaluation of the similarity of XML data. First, it is necessary to analyze existing solutions for both XML and non-XML data and to discuss their advantages and disadvantages. The core of the work is the proposal and implementation of the author's own similarity evaluation method, focusing on the disadvantages and shortcomings found. The work will include suitable experimental results.
  • Literature:
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7. [in Czech]
    • W3C Technical Reports and Publications.
    • Jakub Stárka. Similarity of XML Data. Master Thesis. Charles University in Prague, Czech Republic, 2010.
    • E. Rahm and P. A. Bernstein. A Survey of Approaches to Automatic Schema Matching. The VLDB Journal, 10(4):334-350, 2001.
    • H. Do, S. Melnik, and E. Rahm. Comparison of Schema Matching Evaluations. In Revised Papers from the NODe 2002 Web and Database-Related Workshops on Web, Web-Services, and Database Systems, pages 221-237, London, UK, 2003. Springer-Verlag.
  • Author: Eva Jílková
  • Status: Defended on 27.1. 2014 (mark: B)
  • Reviewer: Martin Svoboda
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Jilkova, E. - Polak, M. - Holubova, I.: Adaptive Similarity of XML Data. ODBASE '14: Proceedings of the 13th International Conference on Ontologies, DataBases, and Applications of Semantics, Amantea, Italy, October 2014. Springer, 2014. [www]

Analysis and Experimental Comparison of Graph Databases

  • The aim of the thesis is to research the possibilities and limitations of a new class of database systems called graph databases, which belong to the family of NoSQL databases. The author will first study and describe basic characteristics of various types of NoSQL databases (key-value, document, column-family, graph, ...) in general and provide an overview of general features of graph database systems. Next, (s)he will select a representative set of existing implementations of graph databases and compare their key features. The core of the work will be an experimental comparison of the systems, accompanied by a corresponding extensible framework.
  • Literature:
    • http://nosql-database.org/
    • Sherif Sakr - Eric Pardede: Graph Data Management: Techniques and Applications
    • Eric Redmond - Jim R. Wilson: Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
    • Pramod J. Sadalage - Martin Fowler: NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence
    • Tom White: Hadoop: The Definitive Guide
    • Eelco Plugge - Tim Hawkins - Peter Membrey: The Definitive Guide to MongoDB: The NoSQL Database for Cloud and Desktop Computing
    • Shashank Tiwari: Professional NoSQL
  • Author: Vojtěch Kolomičenko
  • Status: Defended on 27.5. 2013 (mark: B)
  • Reviewer: Jaroslav Pokorny
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Kolomicenko, V. - Svoboda, M. - Holubova (Mlynkova), I.: Experimental Comparison of Graph Databases. iiWAS '13: Proceedings of the 15th International Conference on Information Integration and Web-based Applications & Services, pages 115 - 124, Vienna, Austria, December 2013. ACM Press, 2013. ISBN 978-1-4503-2113-6. [www]

XSLT Benchmarking

  • The aim of this work is to research the possibilities and limitations of XSLT benchmarking and to propose an extensible XSLT benchmark system. First, it is necessary to describe and compare existing versions of XSLT, to analyze existing XSLT benchmarking projects and to perform a study of current XSLT applications in general. The core of the work will be the proposal and implementation of the author's own project that solves selected open problems and disadvantages of the current projects, covers a wide range of XSLT use cases and enables user-defined parameterization and extensibility. The work will include suitable experiments with the existing XSLT processors using the proposed benchmark.
  • Literature:
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7.
    • The Extensible Stylesheet Language Family (XSL)
    • OASIS XSLT Conformance TC
    • XSLT Processor Benchmarks
    • XSLT benchmarks
    • XSLT benchmark
    • XSLTMark
    • ToXgene
    • A. Aboulnaga, J. F. Naughton, and C. Zhang. Generating Synthetic Complex-Structured XML Data. In WebDB'01: Proc. of the 4th Int. Workshop on the Web and Databases, pages 79-84, Washington, DC, USA, 2001.
    • L. Afanasiev, I. Manolescu, and P. Michiels. MemBeR XML Generator.
    • P. Azalov and F. Zlatarova. SDG - A System for Synthetic Data Generation. In ITCC'03: Proc. of the Int. Conf. on Information Technology: Computers and Communications, pages 69-75, Washington, DC, USA, 2003. IEEE Computer Society.
  • Author: Viktor Mašíček
  • Status: Defended on 3.9. 2012 (mark: A)
  • Reviewer: Jakub Lokoc
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Masicek, V. - Holubova (Mlynkova), I.: XSLTMark II - a Simple, Extensible and Portable XSLT Benchmark. ADBIS '13: Proceedings of the 17th East-European Conference on Advances in Databases and Information Systems, pages 113 - 120, Genoa, Italy, September 2013. Advances in Intelligent Systems and Computing, volume 241. Springer-Verlag, 2013. ISBN 978-3-319-01862-1. ISSN 2194-5357. [www]

Management of Undo/Redo Operations in Complex Environments

  • Undo/redo is now a natural feature of every application that interacts with a user. In the recent literature we can find several models which solve single-user, single-workspace issues relatively well. However, when multiple users share the edited objects or there are relations between multiple workspaces, the task becomes much more complex and the dependencies need to be resolved carefully.
    The aim of this thesis is to research the possibilities and limitations of undo/redo management in the described complex environments. First, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work will be the proposal and implementation of the author's own approach suitable for the student SW project DaemonX, designed for modelling and evolution management in various interconnected conceptual models. The work will include experimental results.
  • Literature:
    • Aaron G. Cass, Chris S. T. Fernandes, and Andrew Polidore. 2006. An empirical evaluation of undo mechanisms. In Proceedings of the 4th Nordic conference on Human-computer interaction: changing roles (NordiCHI '06). ACM, New York, NY, USA, 19-27.
    • Alan Dix, Roberta Mancini, Stefano Levialdi: Alas I am Undone - Reducing the Risk of Interaction? Short paper for HCI'96, Imperial College, London.
    • Roberta Mancini, Alan Dix, Stefano Levialdi: Reflections on Undo. Technical report 96/11. Dipartimento di Scienze dell'Informazione, Università degli Studi di Roma La Sapienza, Via Salaria 113, 00198, Rome, Italy.
    • A. J. Dix (1995). Moving between contexts. Design, Specification and Verification of Interactive Systems '95, Eds. P. Palanque and R. Bastide. Toulouse, France, Springer Wien. pp. 149-173.
    • Chengzheng Sun. 2002. Undo as concurrent inverse in group editors. ACM Trans. Comput.-Hum. Interact. 9, 4 (December 2002), 309-361.
    • Marco Loregian and Marco P. Locatelli. 2008. An Experimental Analysis of Undo in Ubiquitous Computing Environments. In Proceedings of the 5th international conference on Ubiquitous Intelligence and Computing (UIC '08). Springer-Verlag, Berlin, Heidelberg, 505-519.
    • Xueyi Wang, Jiajun Bu, and Chun Chen. 2002. Achieving undo in bitmap-based collaborative graphics editing systems. In Proceedings of the 2002 ACM conference on Computer supported cooperative work (CSCW '02). ACM, New York, NY, USA, 68-76.
    • Chengzheng Sun. 2000. Undo any operation at any time in group editors. In Proceedings of the 2000 ACM conference on Computer supported cooperative work (CSCW '00). ACM, New York, NY, USA, 191-200.
    • Rajiv Choudhary and Prasun Dewan. 1995. A general multi-user undo/redo model. In Proceedings of the fourth conference on European Conference on Computer-Supported Cooperative Work (ECSCW'95). Kluwer Academic Publishers, Norwell, MA, USA, 231-246.
  • Author: Karel Jakubec
  • Status: Defended on 28.5. 2012 (mark: A)
  • Reviewer: Jakub Lokoc
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Jakubec, K. - Polak, M. - Necasky, M. - Holubova, I.: Undo/Redo Operations in Complex Environments. ANT '14: Proceedings of the 5th International Conference on Ambient Systems, Networks and Technologies, pages 561 - 570, Hasselt, Belgium, June 2014. Procedia Computer Sciences, volume 32. Elsevier, 2014. ISSN 1877-0509. [www]
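For orientation, the single-user, single-workspace case that the abstract above treats as the well-understood baseline can be sketched with two stacks. This is a generic linear-history model, not the approach developed in the thesis or used in DaemonX; the class name `History` and its methods are hypothetical.

```python
class History:
    """Linear single-user undo/redo history over (do, undo) callable pairs."""

    def __init__(self):
        self._undo = []  # commands that can be undone, most recent last
        self._redo = []  # commands that were undone and can be redone

    def execute(self, do, undo):
        """Run a command and record how to revert it."""
        do()
        self._undo.append((do, undo))
        self._redo.clear()  # a new command invalidates the redo branch

    def undo(self):
        """Revert the most recent command; return False if there is none."""
        if not self._undo:
            return False
        do, undo = self._undo.pop()
        undo()
        self._redo.append((do, undo))
        return True

    def redo(self):
        """Re-apply the most recently undone command; return False if there is none."""
        if not self._redo:
            return False
        do, undo = self._redo.pop()
        do()
        self._undo.append((do, undo))
        return True
```

With multiple users or interconnected workspaces, a single linear history like this is no longer sufficient, because undoing one command may conflict with later commands issued by other users or in dependent workspaces.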

Inference of an XML Schema with the Knowledge of XML Operations

  • There are currently plenty of papers dealing with the inference of an XML schema for given XML documents. The main aim of these approaches is to find an optimal schema which describes the structure of the data precisely and is not too complicated or artificial. For this purpose the authors exploit various metrics as well as other input data.
    The aim of this work is to research the problem of inferring an XML schema for a given set of XML data when a set of related operations (XML queries, XSLT scripts, etc.) is also provided. First, it is necessary to analyze existing inference solutions in general and to discuss their advantages and disadvantages. The core of the work is the identification and discussion of information that can be extracted from a given set of XML operations and how it can be exploited to achieve a more precise and realistic XML schema. The result of the work will be the proposal of the author's own approach involving the improvements, its implementation and suitable experiments that show its advantages.
  • Literature:
    • Extensible Markup Language (XML) 1.0 (Fourth Edition). W3C Recommendation, 16 August 2006.
    • W3C Technical Reports and Publications.
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7. [in Czech]
    • Vyhnanovská, J.: Automatic Construction of an XML Schema for a Given Set of XML Documents. Master Thesis, Charles University in Prague, Czech Republic, 2010.
    • Vošta, O.: Automatická konstrukce schématu pro množinu XML dokumentů. Master Thesis, Charles University in Prague, Czech Republic, 2005.
    • Moh, C.-H. - Lim, E.-P. - Ng, W.-K.: Re-engineering Structures from Web Documents. In DL '00: Proc. of the 5th ACM Conf. on Digital Libraries, pages 67-76, New York, NY, USA, 2000. ACM Press.
    • Garofalakis, M. - Gionis, A. - Rastogi, R. - Seshadri, S. - Shim K.: XTRACT: a System for Extracting Document Type Descriptors from XML Documents. In SIGMOD '00: Proc. of the 2000 ACM SIGMOD Int. Conf. on Management of Data, pages 165-176, New York, NY, USA, 2000. ACM Press.
    • Ahonen, H.: Generating Grammars for Structured Documents Using Grammatical Inference Methods. Report A-1996-4, Department of Computer Science, University of Helsinki, 1996.
  • Author: Mário Mikula
  • Status: Defended on 28.5. 2012 (mark: A)
  • Reviewer: Martin Svoboda
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Mikula, M. - Starka, J. - Mlynkova, I.: Inference of an XML Schema with the Knowledge of XML Operations. SITIS '12: Proceedings of the 8th International Conference on Signal-Image Technology and Internet-Based Systems, pages 433 - 440, Naples, Italy, November 2012. IEEE Computer Society Press, 2012. ISBN 978-1-4673-5152-2. [www]

Inference of XML Integrity Constraints

  • There are currently plenty of papers dealing with the inference of XML schemas from XML documents. However, most of these approaches focus on the inference of structural aspects, whereas others are often omitted. In particular, both DTD and XML Schema languages involve ID and IDREF(S) data types that specify unique identifiers and references to them. XML Schema extends this feature using unique, key and keyref constructs that have the same purpose but enable one to specify the unique/key values more precisely. In addition, its assert and report constructs enable one to express specific constraints on values using the XPath language. There are also more general integrity constraints that could be inferred, though they cannot yet be expressed in the existing schema specification languages.
    The aim of this work is to research various aspects of the problem of (semi-)automatic inference of integrity constraints of XML data. First, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is the proposal and implementation of the author's own approach addressing selected disadvantages of the existing ones. The work will include suitable experimental results.
  • Literature:
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7. [in Czech]
    • W3C Technical Reports and Publications.
    • Object Constraint Language (OCL).
    • Mlynkova, I.: An Analysis of Approaches to XML Schema Inference. SITIS '08, Bali, Indonesia, November/December 2008. IEEE Computer Society Press, 2008.
    • Fassetti, F. - Fazzinga, B.: FOX: Inference of Approximate Functional Dependencies from XML Data. In DEXA'07, pages 10-14, Washington, DC, USA, 2007. IEEE.
    • Shiu, H. - Fong, J. - Biuk-Aghai, R. P.: Reverse Engineering XML Documents Into DTD Graph With SAX. WSEAS Transactions on Computers, 5(6):1236-1241, 2006.
    • Barbosa, D. - Mendelzon, A.: Finding ID Attributes in XML Documents. Database and XML Technologies, Volume 2824, pages 180-194. Springer, 2003.
    • Yu, C. - Jagadish, H. V.: XML Schema Refinement Through Redundancy Detection and Normalization. The VLDB Journal, 17(2):203-223, 2007.
    • Opocenska, K. - Kopecky, M.: Incox - a Language for XML Integrity Constraints Description. In DATESO'08, pages 1-12. CEUR-WS.org, 2008.
    • Fan, W.: XML Constraints: Specification, Analysis, and Applications. In DEXA'05, pages 805-809, IEEE, 2005.
  • Author: Matej Vitásek
  • Status: Defended on 6.2. 2012 (mark: A)
  • Reviewer: Tomáš Knap
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Vitasek, M. - Mlynkova, I.: Inference of XML Integrity Constraints. ADBIS '12: Proceedings of the 16th East-European Conference on Advances in Databases and Information Systems, pages 285 - 296, Poznan, Poland, September 2012. Advances in Intelligent and Soft Computing, Springer-Verlag, 2012. ISBN 978-3-642-32740-7. ISSN 2194-5357. [www]

Adaptation of Relational Database Schema

  • Since most current applications are dynamic, sooner or later the structure of their data needs to change, and all related issues have to change with it. We speak about the evolution and adaptability of applications. One aspect of this problem is the adaptation of the respective data storage.
    The aim of this work is to research the possibilities and limitations of adapting a relational database schema. First of all, it is necessary to analyze the related issues (e.g., adaptation of the schema, adaptation of the respective queries, integration of a new schema, etc.) in general and to discuss their key problems and solutions. On the basis of this analysis, the author will select one particular direction. The core of the work will be the proposal and implementation of an original approach dealing with selected open issue(s) related to database schema evolution. The work will include experimental results.
  • Literature:
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7. [in Czech]
    • Carlo Curino, Hyun J. Moon, and Carlo Zaniolo. 2009. Automating database schema evolution in information system upgrades. In Proceedings of the 2nd International Workshop on Hot Topics in Software Upgrades (HotSWUp '09). ACM, New York, NY, USA.
    • Carlo A. Curino, Hyun J. Moon, and Carlo Zaniolo. 2008. Graceful database schema evolution: the PRISM workbench. Proc. VLDB Endow. 1, 1 (August 2008), 761-772.
    • Jay Banerjee, Won Kim, Hyoung-Joo Kim, and Henry F. Korth. 1987. Semantics and implementation of schema evolution in object-oriented databases. SIGMOD Rec. 16, 3 (December 1987), 311-322.
    • Barbara Staudt Lerner. 2000. A model for compound type changes encountered in schema evolution. ACM Trans. Database Syst. 25, 1 (March 2000), 83-127.
    • Yuan An, Xiaohua Hu, and Il-Yeol Song. 2008. Round-Trip Engineering for Maintaining Conceptual-Relational Mappings. In Proceedings of the 20th international conference on Advanced Information Systems Engineering (CAiSE '08), Springer-Verlag, Berlin, Heidelberg, 296-311.
  • Author: Martin Chytil
  • Status: Defended on 6.2. 2012 (mark: A)
  • Reviewer: Michal Kopecký
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Chytil, M. - Polak, M. - Necasky, M. - Holubova (Mlynkova), I.: Evolution of a Relational Schema and its Impact on SQL Queries. IDC '13: Proceedings of the 7th International Symposium on Intelligent Distributed Computing, pages 5 - 15, Prague, Czech Republic. Studies in Computational Intelligence, volume 511. Springer, 2013. ISBN 978-3-319-01570-5. [www]

Schematron Schema Inference

  • Currently there exist plenty of papers dealing with inference of XML schemas of XML documents expressed either in XML Schema or DTD. However, there are other languages that enable one to specify the structure of XML data in quite a different way. An example of such a language is Schematron, an ISO standard based on specifying conditions the XML data should satisfy instead of a grammar. However, there seems to be no approach for inference of Schematron schemas.
    The aim of this work is to research various aspects of the problem of automatic inference of an XML schema for a given set of XML documents. Firstly, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is the proposal and implementation of an original method of automatic schema inference dealing with the constructs of Schematron. The work will include suitable experimental results.
  • Literature:
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7. [in Czech]
    • W3C Technical Reports and Publications
    • Schematron
    • Mlynkova, I.: An Analysis of Approaches to XML Schema Inference. SITIS '08, Bali, Indonesia, November/December 2008. IEEE Computer Society Press, 2008.
    • Vošta, O.: Automatická konstrukce schématu pro množinu XML dokumentů. Master Thesis, Charles University in Prague, Czech Republic, 2005. [in Czech]
    • Moh, C.-H. - Lim, E.-P. - Ng, W.-K.: Re-engineering Structures from Web Documents. In DL '00, pages 67-76, New York, NY, USA, 2000. ACM Press.
    • Garofalakis, M. - Gionis, A. - Rastogi, R. - Seshadri, S. - Shim K.: XTRACT: a System for Extracting Document Type Descriptors from XML Documents. In SIGMOD '00, pages 165-176, New York, NY, USA, 2000. ACM Press.
    • Ahonen, H.: Generating Grammars for Structured Documents Using Grammatical Inference Methods. Report A-1996-4, Department of Computer Science, University of Helsinki, 1996.
  • Author: Michal Kozák
  • Status: Defended on 6.2. 2012 (mark: A)
  • Reviewer: Martin Svoboda
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Kozak, M. - Starka, J. - Mlynkova, I.: Schematron Schema Inference. IDEAS '12: Proceedings of the 16th International Database Engineering & Applications Symposium, pages 42 - 50, Prague, Czech Republic, August 2012. ACM Press, 2012. ISBN 978-1-4503-1234-9. [www]

Efficient Detection of XML Integrity Constraints

  • An important aspect of efficient data processing is knowledge of the integrity constraints that hold in the described reality. However, similarly to schemas, which are often omitted and rather assumed implicitly, integrity constraints are not explicitly expressed in most applications, even though there exist several suitable languages and tools, such as the general OCL, Incox for XML data, etc.
    The aim of this work is to research various aspects of the problem of (semi)automatic detection of various integrity constraints that hold in a given set of data. In particular, it will focus on XML data. Firstly, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is the proposal and implementation of an original approach addressing selected disadvantages of the existing ones. The work will include suitable experimental results.
  • Literature:
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7. [in Czech]
    • W3C Technical Reports and Publications.
    • Object Constraint Language (OCL).
    • Mlynkova, I.: An Analysis of Approaches to XML Schema Inference. SITIS '08, Bali, Indonesia, November/December 2008. IEEE Computer Society Press, 2008.
    • Fassetti, F. - Fazzinga, B.: FOX: Inference of Approximate Functional Dependencies from XML Data. In DEXA'07, pages 10-14, Washington, DC, USA, 2007. IEEE.
    • Shiu, H. - Fong, J. - Biuk-Aghai, R. P.: Reverse Engineering XML Documents Into DTD Graph With SAX. WSEAS Transactions on Computers, 5(6):1236-1241, 2006.
    • Barbosa, D. - Mendelzon, A.: Finding ID Attributes in XML Documents. Database and XML Technologies, Volume 2824, pages 180-194. Springer, 2003.
    • Yu, C. - Jagadish, H. V.: XML Schema Refinement Through Redundancy Detection and Normalization. The VLDB Journal, 17(2):203-223, 2007.
    • Opocenska, K. - Kopecky, M.: Incox - a Language for XML Integrity Constraints Description. In DATESO'08, pages 1-12. CEUR-WS.org, 2008.
    • Fan, W.: XML Constraints: Specification, Analysis, and Applications. In DEXA'05, pages 805-809, IEEE, 2005.
  • Author: Michal Švirec
  • Status: Defended on 5.9. 2011 (mark: B)
  • Reviewer: Martin Svoboda
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Svirec, M. - Mlynkova, I.: Efficient Detection of XML Integrity Constraints Violation. NDT '12: Proceedings of the 4th International Symposium on Networked Digital Technologies, pages 259 - 273, Dubai, UAE, April 2012. Communications in Computer and Information Science 293, Springer-Verlag, 2012. ISBN 978-3-642-30506-1. ISSN 1865-0929. [www]

Optimization and Refinement of XML Schema Inference Approaches

  • Currently there exist several works which focus on the problem of (semi)automatic inference of XML schemas for a given set of XML documents. Even though most of the approaches focus on inference of correct and optimal regular expressions, the results they output are still quite complex and unnatural.
    The aim of this work is to research various aspects of the problem. Firstly, it is necessary to analyze the existing solutions and to compare and discuss their outputs. The core of the work is the proposal and implementation of an original method focusing on optimization and refinement of existing approaches to obtain more realistic and natural schemas. For this purpose the approach can exploit, e.g., detailed analyses of the input data, user interaction, various metrics, etc. The work will include suitable experimental results.
  • Literature:
    • Extensible Markup Language (XML) 1.0 (Fourth Edition). W3C Recommendation, 16 August 2006.
    • W3C Technical Reports and Publications.
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7. [in Czech]
    • Vošta, O.: Automatická konstrukce schématu pro množinu XML dokumentů. Master Thesis, Charles University in Prague, Czech Republic, 2005. [in Czech]
    • Vyhnanovská, J.: Automatic Construction of an XML Schema for a Given Set of XML Documents. Master Thesis, Charles University in Prague, Czech Republic, 2009.
    • Ahonen, H.: Generating Grammars for Structured Documents Using Grammatical Inference Methods. Report A-1996-4, Department of Computer Science, University of Helsinki, 1996.
    • Christoph Neumann. Converting deterministic finite automata to regular expressions. 2005.
    • Yo-Sub Han and Derick Wood. Obtaining shorter regular expressions from finite-state automata. Theor. Comput. Sci., 370(1-3):110-120, 2007.
  • Author: Michal Klempa
  • Status: Defended on 5.9. 2011 (mark: A)
  • Reviewer: Jakub Stárka
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Klempa, M. - Starka, J. - Mlynkova, I.: Optimization and Refinement of XML Schema Inference Approaches. ANT '12: Proceedings of the 3rd International Conference on Ambient Systems, Networks and Technologies, pages 120 - 127, Niagara Falls, Ontario, Canada, August 2012. Procedia Computer Science, volume 10, Elsevier, 2012. ISSN 1877-0509. [www]

XML Query Adaptation

  • Since XML has become a de-facto standard for data representation and manipulation, there exists a huge number of applications having their data represented in XML. However, since most applications are dynamic, sooner or later the structure of the data needs to change, and all related issues have to change with it. We speak about the so-called evolution and adaptability of XML applications. One aspect of this problem is the adaptation of the respective operations over the evolving XML data, in particular XML queries expressed, e.g., in XPath or XQuery.
    The aim of this work is to research the possibilities and limitations of XML query adaptation. First of all, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work should be the proposal and implementation of an original approach dealing with selected disadvantages and open issues. The proposal should involve a classification of the respective queries and adaptation steps as well as a discussion of possible/necessary user involvement. The work will include experimental results.
  • Literature:
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7. [in Czech]
    • W3C Technical Reports and Publications.
    • M. M. Moro, S. Malaika, L. Lim, Preserving XML queries during schema evolution, in: WWW '07: Proceedings of the 16th international conference on World Wide Web, ACM, New York, NY, USA, 2007, pp. 1341-1342.
    • P. Geneves, N. Layaida, V. Quint, Identifying query incompatibilities with evolving XML schemas, in: ICFP '09: Proceedings of the 14th ACM SIGPLAN international conference on Functional programming, ACM, New York, NY, USA, 2009, pp. 221-230.
    • C. Curino, H. J. Moon, C. Zaniolo, Automating database schema evolution in information system upgrades, in: HotSWUp '09: Proceedings of the 2nd International Workshop on Hot Topics in Software Upgrades, ACM, New York, NY, USA, 2009, pp. 1-5.
    • Guerrini, G. - Mesiti, M.: XML Schema Evolution and Versioning: Current Approaches and Future Trends. Open and Novel Issues in XML Database Applications: Future Directions and Advanced Technologies, Idea Group Publishing, 2008/9.
  • Author: Marek Polák
  • Status: Defended on 5.9. 2011 (mark: C)
  • Reviewer: Jakub Malý
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Polak, M. - Mlynkova, I. - Pardede, E.: XML Query Adaptation as Schema Evolves. ISD '12: Proceedings of the 21st International Conference on Information Systems Development, pages 401 - 416, Prato, Italy, August 2012. Springer Science+Business Media, Inc., 2013. ISBN 978-1-4614-7539-2. [www]

Similarity of XML Data

  • A possible enhancement of XML data management tools is to store and manage similar XML data in the same or a similar way, i.e. to exploit the idea of clustering. For this purpose it is necessary to propose a suitable technique that is able to measure similarity among XML documents, among XML schemas, or between the two groups.
    The aim of this work is to research various aspects of the problem and its limitations. Firstly, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is the proposal and implementation of an original method for similarity evaluation focusing on the identified disadvantages and shortcomings. The work will include suitable experimental results.
  • Literature:
    • Extensible Markup Language (XML) 1.0 (Fourth Edition)
    • W3C Technical Reports and Publications
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7. [in Czech]
    • A. Nierman and H. V. Jagadish. Evaluating Structural Similarity in XML Documents. In Proceedings of the Fifth International Workshop on the Web and Databases - WebDB 2002, Madison, Wisconsin, USA, 2002.
    • T. Jiang, L. Wang, and K. Zhang. Alignment of Trees - An Alternative to Tree Edit. Theor. Comput. Sci., 143(1):137-148, 1995.
    • Z. Zhang, R. Li, S. Cao, and Y. Zhu. Similarity Metric for XML Documents. In Proceedings of FGWM03: Workshop on Knowledge and Experience Management, Karlsruhe, Germany, 2003.
    • E. Bertino, G. Guerrini, and M. Mesiti. A Matching Algorithm for Measuring the Structural Similarity between an XML Document and a DTD and its Applications. Inf. Syst., 29(1):23-46, 2004.
    • P. K.L. Ng and V. T.Y. Ng. Structural Similarity between XML Documents and DTDs. In Springer Berlin / Heidelberg, pages 412-421. Lecture Notes in Computer Science, 2003.
    • E. Rahm and P. A. Bernstein. A Survey of Approaches to Automatic Schema Matching. The VLDB Journal, 10(4):334-350, 2001.
    • H. Do, S. Melnik, and E. Rahm. Comparison of Schema Matching Evaluations. In Revised Papers from the NODe 2002 Web and Database-Related Workshops on Web, Web-Services, and Database Systems, pages 221-237, London, UK, 2003. Springer-Verlag.
  • Author: Jakub Stárka
  • Status: Defended on 6.9. 2010 (mark: B)
  • Reviewer: Jakub Klímek
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Starka, J. - Mlynkova, I. - Klimek, J. - Necasky, M.: Integration of Web Service Interfaces via Decision Trees. IIT '11: Proceedings of the 7th International Symposium on Innovations in Information Technology, pages 47-52, Abu Dhabi, United Arab Emirates, April 2011. IEEE Computer Society, 2011. ISBN 978-1-4577-0311-9. [www]

Processing of Incorrect XML Data

  • Since XML has become a de-facto standard for data representation and manipulation, there exists a huge number of applications having their data represented in XML. The problem is that all such applications assume that their input is correct, i.e. that the XML documents are well-formed and valid against an XML schema, if one exists. However, according to analyses, almost 50% of real-world XML documents contain various errors and need to be appropriately corrected or treated specifically before further processing.
    The aim of this work is to research the possibilities and limitations of techniques for processing and/or correction of incorrect XML data. First of all, it is necessary to analyze existing solutions (both theoretical and commercial) and to discuss their advantages and disadvantages. The core of the work should be the proposal and implementation of an original approach dealing with the identified disadvantages. The work will include experimental results.
  • Literature:
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7. [in Czech]
    • W3C Technical Reports and Publications
    • HtmlCleaner
    • Bertino, E. - Guerrini, G. - Mesiti, M. - Tosetto, L.: Evolving a Set of DTDs According to a Dynamic Set of XML Documents. In EDBT '02, pages 45-66, London, UK, 2002. Springer-Verlag.
    • Guerrini, G. - Mesiti, M. - Sorrenti, M. A.: XML Schema Evolution: Incremental Validation and Efficient Document Adaptation. In XSym '07, pages 92-106, Berlin, Heidelberg, 2007. Springer.
    • Rahm, E. - Bernstein, P. A.: An Online Bibliography on Schema Evolution. SIGMOD Rec., 35(4):30-31, 2006.
  • Author: Martin Svoboda
  • Status: Defended on 6.9. 2010 (mark: A)
  • Reviewer: Martin Nečaský
  • Text: [pdf]
  • Notes:
    • The Dean's Award for the Best Master Thesis
    • The results of the thesis have been published in:
      Svoboda, M. - Mlynkova, I.: Correction of Invalid XML Documents with Respect to Single Type Tree Grammars. NDT '11: Proceedings of the 3rd International Symposium on Networked Digital Technologies, pages 179 - 194, Macau, China, July 2011. Communications in Computer and Information Science 136, Springer-Verlag, 2011. ISBN 978-3-642-22185-9. ISSN 1865-0929. [www]

XML Schema Evolution

  • Since XML has become a de-facto standard for data representation and manipulation, there exists a huge number of applications having their data represented in XML. However, since most applications are dynamic, sooner or later the structure of the data needs to change, whereas we still need to be able to work with the old as well as the new data. We speak about so-called schema evolution, i.e. the situation in which the XML schema of the data is updated and we need to apply these updates to the respective XML documents or even XML operations (queries) so that they become valid and relevant again.
    The aim of this work is to research the possibilities and limitations of techniques for XML schema evolution. First of all, it is necessary to analyze existing solutions (both theoretical and commercial) and to discuss their advantages and disadvantages. The core of the work should be the proposal and implementation of an original approach dealing with the identified disadvantages. The work will include experimental results.
  • Literature:
    • Mlynkova, I. - Necasky, M. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML - Principy a aplikace v praxi. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7. [in Czech]
    • W3C Technical Reports and Publications
    • Guerrini, G. - Mesiti, M.: XML Schema Evolution and Versioning: Current Approaches and Future Trends. Open and Novel Issues in XML Database Applications: Future Directions and Advanced Technologies, Idea Group Publishing, 2008/9.
    • Guerrini, G. - Mesiti, M. - Sorrenti, M. A.: XML Schema Evolution: Incremental Validation and Efficient Document Adaptation. In XSym '07, pages 92-106, Berlin, Heidelberg, 2007. Springer.
    • Su, H. - Kramer, D. K. - Rundensteiner, E. A.: XEM: XML Evolution Management. Technical Report WPI-CS-TR-02-09, Computer Science Department, Worcester Polytechnic Institute, Worcester, Massachusetts, 2002.
    • Fiedler, G. - Thalheim, B.: An Approach to Conceptual Schema Evolution. Technical Report 0701, Institut fur Informatik der Christian-Albrechts-Universitat, Kiel, 2007.
    • Klettke, M.: Conceptual XML Schema Evolution - the CoDEX Approach for Design and Redesign. In BTW Workshops, pages 53-63. Verlagshaus Mainz, Aachen, 2007.
    • Moro, M. M. - Malaika, S. - Lim, L.: Preserving XML Queries During Schema Evolution. In WWW'07, pages 1341 - 1342, ACM, 2007.
    • Rahm, E. - Bernstein, P. A.: An Online Bibliography on Schema Evolution. SIGMOD Rec., 35(4):30-31, 2006.
  • Author: Jakub Malý
  • Status: Defended on 24.5. 2009 (mark: A)
  • Reviewer: Jakub Klímek
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    • Maly, J. - Necasky, M. - Mlynkova, I.: Efficient adaptation of XML data using a conceptual model. Information Systems Frontiers, pages 1 - 34. Springer Science+Business Media, 2012. ISSN 1387-3326. [IF: 0.912, 5-Year IF: 1.074] [www]
    • Maly, J. - Mlynkova, I. - Necasky, M.: XML Data Transformations as Schema Evolves. ADBIS '11: Proceedings of the 15th International Conference on Advances in Databases and Information Systems, pages 375 - 388, Vienna, Austria, September 2011. Lecture Notes in Computer Science 6909, Springer-Verlag, 2011. ISBN 978-3-642-23736-2. ISSN 0302-9743. [www]

Adaptability in XML-to-Relational Mapping Strategies

  • One way to manage XML documents is to exploit the tools and functions offered by (object-)relational database systems. The key aim of such techniques is to find the optimal mapping strategy, i.e. the way the XML data are stored in relations. Currently the most efficient approaches, so-called adaptive methods, search a space of possible mappings and choose the one that best suits the given sample data and query workload. Since the space of possibilities is theoretically infinite, the methods exploit various approximations, heuristics, terminal conditions, etc., and in fact search for a suboptimal solution.
    The aim of this work is to research various aspects of the problem and its limitations. Firstly, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is the proposal and implementation of an original method of adaptive XML-to-relational mapping focusing on the identified disadvantages and shortcomings. One possible way is to exploit a general heuristic method called ant colony optimization. The work will include suitable experimental results.
  • Literature:
    • Extensible Markup Language (XML) 1.0 (Fourth Edition)
    • W3C Technical Reports and Publications
    • Mlynkova, I. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: XML Technologies. Textbook. Charles University in Prague, Czech Republic, September 2006. [in Czech]
    • Mlynkova, I. - Pokorny, J.: Adaptability of Methods for Processing XML Data using Relational Databases - the State of the Art and Open Problems. Technical report 2006/9. Charles University in Prague, Czech Republic, October 2006, 26 pages.
    • Klettke, M. - Meyer, H.: XML and Object-Relational Database Systems - Enhancing Structural Mappings Based on Statistics. In Lecture Notes in Computer Science, volume 1997, pages 151-170, 2000.
    • Ramanath, M. - Freire, J. - Haritsa, J. - Roy, P.: Searching for Efficient XML-to-Relational Mappings. In XSym 2003: Proceedings of the 1st International XML Database Symposium, volume 2824, pages 19-36, Berlin, Germany, 2003. Springer.
    • Xiao-ling, W. - Jin-feng, L. - Yi-sheng, D.: An Adaptable and Adjustable Mapping from XML Data to Tables in RDB. In Proceedings of the VLDB 2002 Workshop EEXTT and CAiSE 2002 Workshop DTWeb, pages 117-130, London, UK, 2003. Springer-Verlag.
    • Zheng, S. - Wen, J. - Lu, H.: Cost-Driven Storage Schema Selection for XML. In DASFAA 2003: Proceedings of the 8th International Conference on Database Systems for Advanced Applications, pages 337-344, Kyoto, Japan, 2003. IEEE Computer Society.
    • Dorigo, M. - Birattari, M. - Stutzle, T.: Ant Colony Optimization - Artificial Ants as a Computational Intelligence Technique. Technical Report No. TR/IRIDIA/2006-023, IRIDIA, Bruxelles, Belgium, September 2006.
  • Author: Luboš Kulič
  • Status: Defended on 25.5. 2009 (mark: A)
  • Reviewer: Martin Nečaský
  • Text: [pdf]
  • Notes:
    • 3rd place in competition Master Thesis of 2009
    • The results of the thesis have been published in:
      Kulič, L.: Adaptability in XML-to-Relational Mapping Strategies. SAC '10: Proceedings of the 25th Symposium On Applied Computing, track Database Theory, Technology, and Applications, pages 1674 - 1679, Sierre, Switzerland, March 2010. ACM Press, 2010. ISBN 978-1-60558-639-7. [www]

Automatic Construction of an XML Schema for a Given Set of XML Documents

  • Statistical analyses of real-world XML data show that a significant portion of XML documents do not have an appropriate XML schema. Even when they do, the XML Schema language is used even less frequently. This is probably because manual construction of an XML schema is not an easy task and the XML Schema language is relatively complex.
    The aim of this work is to research various aspects of the problem of automatic construction of an XML schema for a given set of XML documents. Firstly, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is the proposal and implementation of an original method of automatic construction focusing on the identified disadvantages and shortcomings. A possible approach can focus on new XML Schema constructs such as, e.g., inheritance, global and local items, groups of elements and attributes, etc., in combination with user interaction. The work will include suitable experimental results.
  • Literature:
    • Extensible Markup Language (XML) 1.0 (Fourth Edition)
    • W3C Technical Reports and Publications
    • Mlynkova, I. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: XML Technologies. Textbook. Charles University in Prague, Czech Republic, September 2006. [in Czech]
    • Vosta, O.: Automatic Construction of an XML Schema for a Given Set of XML Documents. Master Thesis, Charles University in Prague, Czech Republic, 2005. [in Czech]
    • Moh, C.-H. - Lim, E.-P. - Ng, W.-K.: Re-engineering Structures from Web Documents. In DL '00: Proc. of the 5th ACM Conf. on Digital Libraries, pages 67-76, New York, NY, USA, 2000. ACM Press.
    • Garofalakis, M. - Gionis, A. - Rastogi, R. - Seshadri, S. - Shim K.: XTRACT: a System for Extracting Document Type Descriptors from XML Documents. In SIGMOD '00: Proc. of the 2000 ACM SIGMOD Int. Conf. on Management of Data, pages 165-176, New York, NY, USA, 2000. ACM Press.
    • Ahonen, H.: Generating Grammars for Structured Documents Using Grammatical Inference Methods. Report A-1996-4, Department of Computer Science, University of Helsinki, 1996.
  • Author: Julie Vyhnanovská
  • Status: Defended on 25.5. 2009 (mark: A)
  • Reviewer: Martin Nečaský
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Vyhnanovska, J. - Mlynkova, I.: Interactive Inference of XML Schemas. RCIS '10: Proceedings of the 4th International Conference on Research Challenges in Information Science, pages 191 - 202, Nice, France, May 2010. IEEE Computer Society Press, 2010. ISBN 978-1-4244-4840-1. [www]

XML Benchmarking

  • The main aim of this work is to research the possibilities and limitations of XML benchmarking projects that enable testing the performance of XML processing techniques. First of all, it is necessary to analyze existing solutions and projects and to discuss their advantages and disadvantages. The core of the work should be the proposal and implementation of an original benchmarking project, which can focus, e.g., on statistically frequent XML patterns. The work will include suitable experimental results.
  • Literature:
  • Author: Maroš Vranec
  • Status: Defended on 24.9. 2008 (mark: A)
  • Reviewer: Jan Ulrych
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    Vranec, M. - Mlynkova, I.: FlexBench: A Flexible XML Query Benchmark. DASFAA '09: Proceedings of the 14th International Conference on Database Systems for Advanced Applications, pages 421 - 436, Brisbane, Australia, April 2009. Springer-Verlag, 2009. ISBN 978-3-642-00886-3. [www]

Comparison of Fully Software and Hardware Accelerated XML Processing

  • Since XML technologies are currently exploited as a de-facto standard in numerous spheres of human activity, the demand for efficient XML processing is increasing every day. A natural approach to its optimization is the exploitation of hardware designed particularly for the purpose of XML processing. But this solution raises several open questions, e.g., what the current state of the art of such appliances is, in which situations this solution is suitable, and what the related advantages and disadvantages are.
    The aim of this work is to compare and analyze features of standard fully software and hardware accelerated solutions for XML processing from the point of view of currently supported technologies, efficiency, price and other relevant aspects. Firstly, it is necessary to select an appropriate appliance and describe its XML processing features. The core of the work is specification of appropriate testing scenarios which will focus on key features of the appliance and analysis of the related results. The second part of the thesis will focus on selecting suitable fully software implementations of the supported XML technologies and comparison of their features and efficiency with the hardware accelerated solution.
  • Literature:
  • Author: Tomáš Knap
  • Status: Defended on 25.9. 2008 (mark: A)
  • Reviewer: Martin Nečaský
  • Text: [pdf]
  • Notes:
    • Finalist of the IT Master Thesis of 2009 competition
    • The results of the thesis have been published in:
      Knap, T. - Mlynkova, I. - Necasky, M.: Performance of Fully Software and Hardware Accelerated XML Processing and Securing. Innovations '08: Proceedings of the 5th International Conference on Innovations in Information Technology, pages 64 - 68, Al Ain, United Arab Emirates, December 2008. IEEE Computer Society Press, 2008. ISBN 978-1-4244-3397-1. [www]

Similarity of XML Data

  • One possible enhancement of XML data management tools is to store and manage similar XML documents in the same or a similar way, i.e., to exploit the idea of clustering. For this purpose it is necessary to propose a suitable technique that is able to measure similarity among XML documents, among XML schemas, or between the two groups.
    The aim of this work is to research various aspects of the problem and its limitations. First, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is the proposal and implementation of a new method for similarity evaluation that addresses the disadvantages and shortcomings found. The work will include suitable experimental results.
  • Literature:
    • Extensible Markup Language (XML) 1.0 (Fourth Edition)
    • W3C Technical Reports and Publications
    • Mlynkova, I. - Pokorny, J. - Richta, K. - Toman, K. - Toman, V.: XML Technologies. Textbook. Charles University in Prague, Czech Republic, September 2006. [in Czech]
    • A. Nierman and H. V. Jagadish. Evaluating Structural Similarity in XML Documents. In Proceedings of the Fifth International Workshop on the Web and Databases - WebDB 2002, Madison, Wisconsin, USA, 2002.
    • T. Jiang, L. Wang, and K. Zhang. Alignment of Trees - An Alternative to Tree Edit. Theor. Comput. Sci., 143(1):137-148, 1995.
    • Z. Zhang, R. Li, S. Cao, and Y. Zhu. Similarity Metric for XML Documents. In Proceedings of FGWM03: Workshop on Knowledge and Experience Management, Karlsruhe, Germany, 2003.
    • E. Bertino, G. Guerrini, and M. Mesiti. A Matching Algorithm for Measuring the Structural Similarity between an XML Document and a DTD and its Applications. Inf. Syst., 29(1):23-46, 2004.
    • P. K.L. Ng and V. T.Y. Ng. Structural Similarity between XML Documents and DTDs. In Springer Berlin / Heidelberg, pages 412-421. Lecture Notes in Computer Science, 2003.
    • E. Rahm and P. A. Bernstein. A Survey of Approaches to Automatic Schema Matching. The VLDB Journal, 10(4):334-350, 2001.
    • H. Do, S. Melnik, and E. Rahm. Comparison of Schema Matching Evaluations. In Revised Papers from the NODe 2002 Web and Database-Related Workshops on Web, Web-Services, and Database Systems, pages 221-237, London, UK, 2003. Springer-Verlag.
  • Author: Aleš Wojnar
  • Status: Defended on 26.5. 2008 (mark: A)
  • Reviewer: Martin Nečaský
  • Text: [pdf]
  • Note: The results of the thesis have been published in:
    • Wojnar, A. - Mlynkova, I. - Dokulil, J.: Similarity of DTDs Based on Edit Distance and Semantics. IDC '08: Proceedings of the 2nd International Symposium on Intelligent Distributed Computing, pages 207 - 216, Catania, Italy, September 2008. Springer-Verlag, 2008. ISBN 978-3-540-85256-8. ISSN 1860-949X. [www]
    • Wojnar, A. - Mlynkova, I. - Dokulil, J.: Structural and Semantic Aspects of Similarity of Document Type Definitions and XML Schemas. Special Issue on Intelligent Distributed Information Systems of the International Journal on Information Sciences, volume 180, issue 10, pages 1817 - 1836. Elsevier, May 2010. ISSN 0020-0255. [www]

Automatic Construction of an XML Schema for a Given Set of XML Documents

  • Author: Ondřej Vošta
  • Status: Defended on 6.2. 2006 (mark: A)
  • Reviewer: Kamil Toman
  • Text (in Czech): [pdf]
  • Note: The results of the thesis have been published in:
    Vosta, O. - Mlynkova, I. - Pokorny, J.: Even an Ant Can Create an XSD. Proceedings of the 13th International Conference on Database Systems for Advanced Applications, pages 35 - 50, New Delhi, India, March 2008. Lecture Notes in Computer Science 4947, Springer-Verlag, 2008. ISBN 978-3-540-78567-5. ISSN 0302-9743. [www]