

ehmoni/Datalog-Weighted-Repairs


Datalog-Weighted-Repairs

High data quality is a prerequisite for accurate data analysis. However, inconsistencies often arise in real data and lead to untrusted decision making downstream in the data analysis pipeline. We studied the problem of inconsistency detection and repair for the Ontology Multi-dimensional Data Model (OMD) and propose a framework for data quality assessment and repair for the OMD. We formally define a weight-based repair-by-deletion semantics and present an automatic weight generation mechanism that considers the multiple criteria often needed to compute objective and accurate weights. Our methods are rooted in multi-criteria decision making, which makes it possible to account for the correlation, contrast, and conflict that may exist among multiple criteria, as is often needed in the data cleaning domain. We implemented minimal repair generation with pyDatalog and then used a genetic algorithm for scalability.
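To make the idea of a weight-based repair by deletion concrete, here is a minimal pure-Python sketch. It is not the repository's pyDatalog or genetic-algorithm code. The facts, their weights, and the constraint (a ward belongs to at most one unit) are hypothetical, and the sketch assumes that a preferred repair is one that minimizes the total weight of the deleted facts.

```python
from itertools import combinations

# Hypothetical toy instance: (relation, ward, unit) -> assumed weight.
# A higher weight is taken to mean the fact is more trustworthy.
facts = {
    ("ward_unit", "W1", "Standard"): 0.9,
    ("ward_unit", "W1", "Intensive"): 0.4,
    ("ward_unit", "W2", "Standard"): 0.7,
    ("ward_unit", "W2", "Terminal"): 0.6,
    ("ward_unit", "W3", "Intensive"): 0.8,
}


def violations(instance):
    """Return pairs of facts that break the assumed constraint:
    a ward may belong to at most one unit."""
    items = sorted(instance)
    return [(a, b) for a, b in combinations(items, 2)
            if a[1] == b[1] and a[2] != b[2]]


def min_weight_repair(weighted_facts):
    """Exhaustively search for a deletion set of minimum total weight
    whose removal leaves a consistent instance (exponential in size)."""
    all_facts = sorted(weighted_facts)
    best_deleted, best_cost = None, float("inf")
    for k in range(len(all_facts) + 1):
        for deleted in combinations(all_facts, k):
            cost = sum(weighted_facts[f] for f in deleted)
            if cost >= best_cost:
                continue  # cannot improve on the best repair found so far
            remaining = set(all_facts) - set(deleted)
            if not violations(remaining):
                best_deleted, best_cost = set(deleted), cost
    return best_deleted, best_cost


deleted, cost = min_weight_repair(facts)
repaired = set(facts) - deleted
print("deleted:", sorted(deleted))
print("total deleted weight:", cost)
```

The exhaustive search grows exponentially with the number of facts, which illustrates why a heuristic such as a genetic algorithm becomes attractive for larger instances.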

Note: Although this work is open source and available online, you must ask the author for permission before using anything from it in your own work. If you do not receive a reply within one week of asking, you may use it, provided you mention the author's name and the URL and give proper credit. If you use this code in any research work, you must include my name as one of the authors.
