Influence Functions and Data Pruning: From Theory to Non-convergence

Today’s session puts influence functions under the spotlight: the theory behind them [Koh17U], their non-convergence issues, and their use for data pruning. Fabio will uncover the fragile nature of influence functions in deep learning [Bas20I], help us understand what neural networks memorize [Fel20W], and explore the possibility of beating the power-law scaling of model performance with dataset size [Sor22N].
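For orientation before the session, the core quantity from [Koh17U] is a first-order estimate of how up-weighting a training point \(z\) changes the loss at a test point \(z_{\mathrm{test}}\):

\[
\mathcal{I}_{\mathrm{up,loss}}(z, z_{\mathrm{test}}) = -\nabla_\theta L(z_{\mathrm{test}}, \hat\theta)^\top \, H_{\hat\theta}^{-1} \, \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat\theta),
\]

where \(\hat\theta\) denotes the trained parameters and \(H_{\hat\theta}\) the Hessian of the empirical training loss. In deep learning the inverse Hessian-vector product can only be approximated (e.g. with stochastic estimators such as LiSSA or with conjugate gradients), and the non-convexity of neural networks means \(H_{\hat\theta}\) need not even be positive definite; this gap between the convex theory and deep-learning practice is precisely the source of the fragility examined in [Bas20I].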
References
[Koh17U] Pang Wei Koh and Percy Liang. Understanding Black-box Predictions via Influence Functions. ICML 2017.
[Bas20I] Samyadeep Basu, Philip Pope, and Soheil Feizi. Influence Functions in Deep Learning Are Fragile. ICLR 2021.
[Fel20W] Vitaly Feldman and Chiyuan Zhang. What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation. NeurIPS 2020.
[Sor22N] Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, and Ari S. Morcos. Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning. NeurIPS 2022.