Explanations for Data Repair Through Shapley Values | TransferLab

Reference

Explanations for Data Repair Through Shapley Values, Daniel Deutch, Nave Frost, Amir Gilad, Oren Sheffer. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (2021)

Abstract

Data repair, i.e., the identification and fixing of errors in the data, is a central component of the Data Science cycle. As such, significant research effort has been devoted to automating the repair process. Yet it still requires significant manual labor from Data Scientists, who spend up to 80% of their time (according to surveys) tweaking and optimizing repair modules. To address this, we propose in this paper a novel framework for explaining the results of any data repair module. Explanations involve identifying the table cells and database constraints that have the strongest influence on the process. Influence, in turn, is quantified through the game-theoretic notion of Shapley values, commonly used for explaining Machine Learning classifier results. The main technical challenge is that exact computation of Shapley values takes exponential time. We consequently devise and optimize novel approximation algorithms, and analyze them both theoretically and empirically. Our results show the efficiency of our approach when compared to the alternative of adapting existing Shapley value computation techniques to the data repair setting.
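
To make the cost argument in the abstract concrete, here is a minimal sketch contrasting exact Shapley computation, which enumerates every coalition, with generic permutation sampling. This is not the paper's algorithm: the function names, the constraint labels c1, c2, c3, and the toy value function are all hypothetical, chosen only to illustrate how influence could be attributed to constraints.

```python
import itertools
import math
import random


def exact_shapley(players, value):
    """Exact Shapley values by enumerating all coalitions (exponential time)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Probability that p joins right after a given coalition of size k.
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for coalition in itertools.combinations(others, k):
                s = frozenset(coalition)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def sampled_shapley(players, value, num_permutations=2000, seed=0):
    """Monte Carlo estimate: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = frozenset()
        previous = value(coalition)
        for p in order:
            coalition = coalition | {p}
            current = value(coalition)
            totals[p] += current - previous
            previous = current
    return {p: total / num_permutations for p, total in totals.items()}


# Hypothetical toy game (an assumption, not taken from the paper): the cell of
# interest is repaired only when constraint c1 is active together with at
# least one of c2 or c3.
def toy_value(coalition):
    repaired = "c1" in coalition and ("c2" in coalition or "c3" in coalition)
    return 1.0 if repaired else 0.0


CONSTRAINTS = ["c1", "c2", "c3"]

if __name__ == "__main__":
    print("exact:  ", exact_shapley(CONSTRAINTS, toy_value))
    print("sampled:", sampled_shapley(CONSTRAINTS, toy_value))
```

In this toy game c1 receives most of the credit because no repair happens without it, while c2 and c3 split the remainder since either one suffices. The exact routine evaluates a number of coalitions that grows exponentially with the number of players, whereas the sampled routine makes a fixed number of value-function calls per permutation, at the price of estimation error.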

Content citing this item

Blog

Applications of data valuation in machine learning

At TransferLab we have extensively covered existing and developing methods for Data valuation, the task of attributing value to samples in a …

Data Valuation

Nov 20, 2023