Isolation Forests: The good, the bad and the ugly

02 Apr 20 00:00 UTC

Many industrial applications of automated decision-making involve the detection of anomalous behaviour. Precisely defining what this means is a complex problem hindered by strong class imbalances, lack of useful models or changing …

Faried firmly believes not only that understanding the mathematical foundations of Artificial Intelligence is important for anyone looking to build reliable AI applications, but also that these foundations are fun and interesting when communicated intuitively alongside real-world examples. This conviction was shaped by his journey, which led him from researching the mathematical foundations of computer science to consulting in data science and finally to applied AI research.

Anomaly detection is one of the main methods behind numerous real-world machine learning use cases, such as predictive maintenance, network intrusion detection, system health monitoring, fraud detection and novelty detection. Because many of these application areas are both highly relevant and sensitive, robustness and reliability are major concerns when designing anomaly detection systems. A good understanding of the mathematical principles behind the algorithms used in practice is therefore highly desirable.

In this talk we will investigate a simple algorithm called isolation forest, which has gained wide popularity over the last decade. Despite its success, the reasons for its good performance are currently only partially understood. We review some of the recent literature that reveals strengths and weaknesses of the algorithm, and conclude with a few observations that might point to further research directions.
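For readers unfamiliar with the algorithm, the following is a minimal pure-Python sketch of isolation forest scoring as it is commonly described after the original formulation by Liu, Ting and Zhou: random axis-aligned splits isolate points, anomalies tend to be isolated after fewer splits, and the average path length is normalised by c(n), the expected path length of an unsuccessful binary search tree lookup. It is illustrative only, not the speaker's code and not any library's API; the helper names (`isolation_forest`, `build_tree`, `path_length`, `c`) and the defaults of 100 trees and a subsample size of 256 are assumptions chosen for the example.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def c(n):
    """Expected path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # Harmonic number H(n-1) approximated by ln(n-1) + Euler's constant
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    values = [p[dim] for p in points]
    lo, hi = min(values), max(values)
    if lo == hi:  # cannot split identical values
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(x, tree, depth=0):
    """Depth at which x lands in a leaf, plus c(size) for unresolved leaves."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)

def isolation_forest(data, n_trees=100, sample_size=256, seed=0):
    """Fit trees on random subsamples; return a scoring function in (0, 1]."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))  # height limit used for early stopping
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]

    def score(x):
        mean_h = sum(path_length(x, t) for t in trees) / len(trees)
        # Short average paths give scores near 1 (likely anomalies)
        return 2.0 ** (-mean_h / c(psi))

    return score

# Usage: a 2-D Gaussian cloud with one far-away query point
data_rng = random.Random(42)
inliers = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
score = isolation_forest(inliers, n_trees=100, sample_size=256, seed=1)
print("centre :", score((0.0, 0.0)))
print("outlier:", score((6.0, 6.0)))
```

The far-away point should receive the higher score because it ends up on the extreme side of almost every split and is therefore isolated close to the root. Capping tree height at roughly log2 of the subsample size reflects the idea that only short paths matter for spotting anomalies.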