Kyle is a Python library that contains utilities for measuring and visualizing the calibration of probabilistic classifiers, as well as for recalibrating them. Currently, only methods for recalibration through post-processing are supported, although we plan to include calibration-specific training algorithms in the future.
Kyle is model agnostic: any probabilistic classifier can be wrapped in a thin wrapper called CalibratableModel, which supports multiple calibration algorithms.
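The post-processing idea behind such a wrapper can be sketched with a toy example. The names below (`CalibratableWrapper`, `OverconfidentModel`, `fit_temperature`) are illustrative and are not kyle's actual API; the recalibration method shown is simple temperature scaling fit by grid search on held-out data:

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

class OverconfidentModel:
    """Toy two-class model whose logits are too sharp: the true
    P(y=1|x) is x, but the model scales the corresponding logit by 3."""
    def logits(self, x):
        x = min(max(x, 1e-6), 1 - 1e-6)
        return [0.0, 3.0 * math.log(x / (1 - x))]

class CalibratableWrapper:
    """Illustrative thin wrapper (not kyle's CalibratableModel):
    holds a base model plus a post-hoc temperature parameter."""
    def __init__(self, base_model):
        self.base_model = base_model
        self.temperature = 1.0

    def fit_temperature(self, inputs, labels, grid=None):
        """Grid-search the temperature that minimises validation NLL."""
        grid = grid or [0.5 + 0.1 * i for i in range(51)]  # 0.5 .. 5.5

        def nll(t):
            total = 0.0
            for x, y in zip(inputs, labels):
                probs = softmax([l / t for l in self.base_model.logits(x)])
                total -= math.log(max(probs[y], 1e-12))
            return total / len(labels)

        self.temperature = min(grid, key=nll)

    def predict_proba(self, x):
        return softmax([l / self.temperature for l in self.base_model.logits(x)])

# held-out data drawn from the true conditional P(y=1|x) = x
rng = random.Random(0)
xs = [rng.random() for _ in range(2000)]
ys = [1 if rng.random() < x else 0 for x in xs]

wrapped = CalibratableWrapper(OverconfidentModel())
wrapped.fit_temperature(xs, ys)
```

Since the base model is overconfident by construction, the fitted temperature comes out well above 1, softening the predicted probabilities without retraining the model.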
Apart from tools for analysing models, kyle also offers support for developing and testing custom calibration metrics, algorithms and decision processes. To avoid relying on evaluation data sets and trained models for obtaining labels and confidence vectors, kyle can construct custom samplers based on fake classifiers. These samplers can also be fit to an arbitrary data set (the outputs of a classifier together with the ground-truth labels) in case one wants to mimic a real classifier with a fake one.
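To illustrate the fake-classifier idea, here is a minimal sampler of our own (the class name and `sharpen` parameter are assumptions for this sketch, not kyle's API). It draws a "true" conditional distribution per sample, takes the ground-truth label from it, and reports a systematically distorted version of it as the confidence vector:

```python
import random

class FakeClassifierSampler:
    """Illustrative fake classifier: samples ground-truth labels and
    confidence vectors whose miscalibration is controlled by a
    sharpening exponent (>1 overconfident, <1 underconfident, 1 calibrated)."""
    def __init__(self, n_classes=3, sharpen=2.0, rng=None):
        self.n_classes = n_classes
        self.sharpen = sharpen
        self.rng = rng or random.Random(0)

    def sample(self, n):
        labels, confidences = [], []
        for _ in range(n):
            # draw a "true" conditional distribution from a flat Dirichlet
            # (normalised exponentials are equivalent to Dirichlet(1, ..., 1))
            raw = [self.rng.expovariate(1.0) for _ in range(self.n_classes)]
            total = sum(raw)
            true_probs = [r / total for r in raw]
            # the ground-truth label follows the true distribution
            label = self.rng.choices(range(self.n_classes), weights=true_probs)[0]
            # the reported confidences are a distorted version of the truth
            distorted = [p ** self.sharpen for p in true_probs]
            z = sum(distorted)
            labels.append(label)
            confidences.append([d / z for d in distorted])
        return labels, confidences

sampler = FakeClassifierSampler(n_classes=3, sharpen=2.0)
labels, confidences = sampler.sample(200)
```

Because labels and confidences come from a known generative process, the ground-truth calibration behaviour is fully controlled, which is exactly what makes such samplers useful for testing metrics and algorithms.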
Using fake classifiers, one can generate an arbitrary number of ground-truth labels and miscalibrated confidence vectors, streamlining the analysis of calibration-related algorithms (common use cases include analysing the variance and bias of calibration metrics, or the sensitivity of decision processes to miscalibration).
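As a small example of such an analysis, the snippet below implements a standard top-label expected calibration error (ECE) estimator with equal-width bins and evaluates it on synthetic confidence/correctness pairs. The data-generating setup (correctness probability equal to the reported confidence in the calibrated case, equal to its square in the overconfident case) is our own construction for illustration:

```python
import random

def expected_calibration_error(confidences, correct, n_bins=10):
    """Top-label ECE with equal-width bins: a weighted average of the
    absolute gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += len(b) / n * abs(avg_conf - accuracy)
    return ece

rng = random.Random(42)
confs = [0.5 + 0.5 * rng.random() for _ in range(5000)]  # top-class confidences
# well calibrated: correctness probability equals the reported confidence
calibrated = [rng.random() < c for c in confs]
# overconfident: actual accuracy is systematically below the reported confidence
overconfident = [rng.random() < c ** 2 for c in confs]

ece_cal = expected_calibration_error(confs, calibrated)
ece_over = expected_calibration_error(confs, overconfident)
```

Repeating this experiment across sample sizes and bin counts is one way to probe the bias and variance of the estimator, the kind of study the paragraph above refers to.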