MM-quecat is a member of the multi-model data modeling and management tool family. Namely, the tool enables one to query multi-model data regardless of the underlying multi-model database or polystore.
Using category theory, we provide a unified abstract representation of multi-model data, which can be viewed as a graph and, thus, queried using a SPARQL-based query language.
For the purpose of demonstration of key contributions, MM-quecat supports unified querying over four database systems chosen to cover most of the features of different approaches to querying (multi-model) data. In particular:
We expect that the user has already created a categorical schema for the multi-model data, either manually using the MM-evocat or by extracting the schema from the data using the MM-infer. Hence, in the demo of MM-quecat, we utilize the existing categorical schema and we gradually go through the process of querying of multi-model data utilizing proposed multi-model query language MMQL and we demonstrate all interesting cases.
In particular, the following steps will be taken during the demonstration:
The first step is to create a query over the multi-model data. The user queries over the data structure, which we describe using a categorical schema. A skilled user can enter the query using the MMQL text notation. Alternatively, we also introduce a graphical notation of the query language for less technically skilled users, who can access the data using a manual graph exploration.
Subsequently, each query is decomposed according to the mapping of the schema to the underlying database systems. It may happen that a user has multiple strategies to choose from to execute a query, e.g., when the same data are redundantly stored in multiple underlying systems. Moreover, all query evaluation strategies are price tagged (i.e., a plan cost for each strategy exists).
In this particular case, the cost-effective strategy is to decompose the query into two query parts, namely over Neo4j and MongoDB systems. The former performs the projection of customer names and identifiers of the orders they have ordered, except for customers named "Peter". The latter returns all titles ordered in a particular order.
The alternative plan is to decompose the query into three query parts over PostgreSQL, Apache Cassandra and MongoDB. Customer names except for name "Peter" along with their identifier are retrieved from PostgreSQL, ordered products along with the order identifier are retrieved from MongoDB, and the connection of customers to ordered products is realized using the Customer-Orders-Order relationship stored in Apache Cassandra. Note that this plan is more expensive and hence is not marked as the default plan.
Intermediate results are joined in a unified categorical representation and the query evaluation is finalized. Afterwards, the user can select the desired logical representation of the result, or the result can be returned in a unified representation. Currently, we support mapping the query result to JSON, RDF formats.
In this case, the query result is transformed into JSON format, and the structure of the JSON document matches the structure of the result in the SELECT clause of the MMQL expression. However, we are working on an extension that would allow the user to create a mapping of the query result to more than one logical representation.
Department of Software Engineering
Faculty of Mathematics and Physics
Charles University
Malostranské náměstí 25
118 00 Prague
Czech Republic
irena.holubova@matfyz.cuni.cz
pavel.koupil@matfyz.cuni.cz