Informed decision-making based on classifiers requires that the confidence in their predictions reflect the actual error rates. When this is the case, the model is said to be calibrated. Recent work has shown that expressive neural networks can overfit the cross-entropy loss without losing accuracy, thus producing overconfident (i.e., miscalibrated) models. We analyse several definitions of calibration and the relationships between them, examine related empirical measures and their usefulness, and explore several algorithms for improving calibration.
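To make the idea of confidence matching actual error rates concrete, here is a minimal sketch of one commonly used empirical measure, a binned expected calibration error (ECE). The text above does not name a specific measure, so the choice of ECE, the function name `expected_calibration_error`, the equal-width binning and the toy data are all assumptions made for illustration.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned estimate of the average gap between confidence and accuracy.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct: 1 if the prediction was right, 0 otherwise.
    n_bins: number of equal-width confidence bins (an illustrative default).
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Equal-width bin edges on [0, 1]; using only the interior edges makes
    # np.digitize return bin indices 0 .. n_bins - 1.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1])

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        accuracy = correct[mask].mean()
        avg_confidence = confidences[mask].mean()
        # Weight each bin's gap by the fraction of samples it contains.
        ece += mask.mean() * abs(accuracy - avg_confidence)
    return ece


# Hypothetical toy data: every prediction is made with confidence 0.9.
overconfident = expected_calibration_error([0.9] * 10, [1] * 6 + [0] * 4)
calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0] * 1)
print(overconfident, calibrated)
```

In this toy setting the first model is right only 60% of the time while claiming 90% confidence, so its estimated gap is about 0.3. The second model is right 90% of the time and its gap is essentially zero. Binned estimates of this kind depend on the number of bins and on sample size, so they are best read as a rough diagnostic rather than a definitive score.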

The calibrated classifier
