Advisor for identifying genuine functional dependencies
Parent project: MM-Infer
Several tools already detect functional dependencies in relational data (e.g., Metanome). The approaches behind these tools are usually optimized for small data samples, so they may report functional dependencies that hold on the given sample only by chance. On the real, larger dataset, these functional dependencies may not hold.
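To illustrate the problem, here is a minimal sketch of how a dependency can hold on a sample yet fail on the full data. The helper `fd_holds`, the attribute names, and the address records are all made up for illustration; this is not how Metanome or any particular tool implements detection.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds on rows."""
    seen = {}  # maps each observed lhs value to the rhs value first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the value on first sight, otherwise returns the stored one
        if seen.setdefault(key, value) != value:
            return False  # same lhs value, different rhs value: a violation
    return True


# Illustrative data (an assumption): street names repeat across cities.
full = [
    {"zip": "11000", "city": "Prague", "street": "Main"},
    {"zip": "60200", "city": "Brno", "street": "Park"},
    {"zip": "70030", "city": "Ostrava", "street": "Main"},
]
sample = full[:2]

sample_result = fd_holds(sample, ["street"], ["city"])  # holds by chance
full_result = fd_holds(full, ["street"], ["city"])      # violated by the third row
genuine = fd_holds(full, ["zip"], ["city"])             # holds on all rows
```

On the two-row sample, `street -> city` looks like a valid dependency only because no street name happens to repeat; the third record exposes it as spurious, while `zip -> city` survives.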
The goal of this research project is to implement a tool that not only detects functional dependencies in the data but, more importantly, eliminates spurious functional dependencies that hold only on a small sample of the data. A key component of the tool will be the use of so-called negative examples: data records that purposely violate the detected functional dependencies but still correspond to potentially real data. The goal is to keep the number of these negative examples as small as possible while eliminating false functional dependencies as efficiently as possible.
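One possible way to picture negative examples is sketched below. It assumes a deliberately naive strategy: take the left-hand side of one record and pair it with a different right-hand side value observed elsewhere in the sample. The functions `make_negative_example`, `prune_candidates`, and `simulated_expert`, as well as the data, are hypothetical and serve only as an illustration; the project itself is expected to design the actual generation and selection strategy.

```python
def make_negative_example(rows, lhs, rhs):
    """Build a partial record over lhs and rhs that would violate lhs -> rhs.

    It copies the lhs values of one row and the rhs values of another row
    whose rhs differs. Returns None if the sample offers no alternative.
    """
    for base in rows:
        for other in rows:
            if any(other[a] != base[a] for a in rhs):
                negative = {a: base[a] for a in lhs}
                negative.update({a: other[a] for a in rhs})
                return negative
    return None


def prune_candidates(rows, candidates, is_realistic):
    """Keep only candidate dependencies whose negative example is judged unrealistic."""
    kept = []
    for lhs, rhs in candidates:
        negative = make_negative_example(rows, lhs, rhs)
        if negative is not None and is_realistic(negative):
            continue  # a realistic violating record exists, so the candidate is spurious
        kept.append((lhs, rhs))
    return kept


def simulated_expert(record):
    """Stand-in for a domain expert (assumption): each zip code belongs to one city."""
    known_zip_city = {"11000": "Prague", "60200": "Brno"}
    if "zip" in record and "city" in record:
        return known_zip_city.get(record["zip"]) == record["city"]
    return True  # nothing the expert knows contradicts the record


sample = [
    {"zip": "11000", "city": "Prague", "street": "Main"},
    {"zip": "60200", "city": "Brno", "street": "Park"},
]
candidates = [(("zip",), ("city",)), (("street",), ("city",))]
kept = prune_candidates(sample, candidates, simulated_expert)
```

Here both candidates hold on the sample. The negative example for `street -> city` (a street named "Main" in Brno) is judged realistic, so that candidate is dropped, whereas the one for `zip -> city` (zip 11000 in Brno) is rejected as unrealistic and the dependency is kept. The sketch asks one question per candidate; keeping the total number of such questions small is exactly the optimization the project targets.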
Furthermore, interaction with domain experts (e.g., crowdsourcing) can play an important role in assessing whether the proposed negative examples correspond to real data values without accidentally disturbing the actual functional dependencies valid in the domain.
The research project can be carried out together with the follow-up thesis "Identification of genuine functional dependencies".
Contact: Pavel Koupil