- Session 4: Unsupervised, Semi-supervised Learning, Reinforcement Learning -- Day 3 (Nov.19), talks: 10:50-11:30 (5th floor Hall 2), poster session: 11:30-14:00
- Poster number: Tue23
Guillaume Staerman (Télécom Paris); Pavlo Mozharovskyi (Télécom ParisTech); Stéphan Clémençon (Télécom ParisTech); Florence d’Alché-Buc (Télécom ParisTech)
For the purpose of monitoring the behavior of complex infrastructures (e.g. aircraft, transport or energy networks), high-rate sensors are deployed to capture multivariate data, generally unlabeled, in quasi-continuous time in order to quickly detect the occurrence of anomalies that may jeopardize the smooth operation of the system of interest. The statistical analysis of such massive data of functional nature raises many challenging methodological questions. The primary goal of this paper is to extend the popular Isolation Forest (IF) approach to Anomaly Detection, originally dedicated to finite-dimensional observations, to functional data. The major difficulty lies in the wide variety of topological structures that may equip a space of functions and the great variety of patterns that may characterize abnormal curves. We address the issue of (randomly) splitting the functional space in a flexible manner in order to isolate progressively any trajectory from the others, a key ingredient to the efficiency of the algorithm. Beyond a detailed description of the algorithm, computational complexity and stability issues are investigated at length. From the scoring function measuring the degree of abnormality of an observation provided by the proposed variant of the IF algorithm, a Functional Statistical Depth function is defined and discussed, as well as a multivariate functional extension. Numerical experiments provide strong empirical evidence of the accuracy of the extension proposed.
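To make the idea of randomly splitting a function space more concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes curves sampled on a common grid, a hypothetical dictionary of random cosines as split directions, a Riemann-sum approximation of the L2 inner product, and the standard Isolation Forest score normalisation. All function names and parameters are illustrative.

```python
import numpy as np

def c_factor(n):
    """Average path length normaliser used by the standard Isolation Forest."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) / n

def random_direction(grid, rng):
    """Assumed dictionary: a cosine with random frequency and phase."""
    freq = rng.uniform(0.5, 5.0)
    phase = rng.uniform(0.0, 2.0 * np.pi)
    return np.cos(2.0 * np.pi * freq * grid + phase)

def build_tree(curves, grid, dx, rng, depth, max_depth):
    """Recursively split curves on a random projection and a uniform threshold."""
    n = curves.shape[0]
    if n <= 1 or depth >= max_depth:
        return {"size": n}
    direction = random_direction(grid, rng)
    proj = (curves @ direction) * dx  # approximate L2 inner product
    lo, hi = proj.min(), proj.max()
    if hi - lo < 1e-12:  # curves are indistinguishable along this direction
        return {"size": n}
    thr = rng.uniform(lo, hi)
    left = proj < thr
    return {
        "dir": direction,
        "thr": thr,
        "left": build_tree(curves[left], grid, dx, rng, depth + 1, max_depth),
        "right": build_tree(curves[~left], grid, dx, rng, depth + 1, max_depth),
    }

def path_length(curve, node, dx):
    """Number of splits needed to reach a leaf, plus the leaf-size correction."""
    depth = 0
    while "dir" in node:
        proj = float(curve @ node["dir"]) * dx
        node = node["left"] if proj < node["thr"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])

def functional_isolation_scores(curves, grid, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1]; higher means easier to isolate."""
    rng = np.random.default_rng(seed)
    n = curves.shape[0]
    psi = min(sample_size, n)
    max_depth = int(np.ceil(np.log2(psi)))
    dx = grid[1] - grid[0]
    depths = np.zeros(n)
    for _ in range(n_trees):
        idx = rng.choice(n, size=psi, replace=False)
        tree = build_tree(curves[idx], grid, dx, rng, 0, max_depth)
        depths += np.array([path_length(c, tree, dx) for c in curves])
    return 2.0 ** (-(depths / n_trees) / c_factor(psi))

# Toy data (made up for illustration): 30 similar curves and one shifted curve.
data_rng = np.random.default_rng(42)
grid = np.linspace(0.0, 1.0, 100)
amplitudes = data_rng.uniform(0.9, 1.1, size=30)
normal = amplitudes[:, None] * np.sin(2.0 * np.pi * grid)
normal += 0.05 * data_rng.standard_normal(normal.shape)
anomaly = np.sin(2.0 * np.pi * grid) + 3.0
curves = np.vstack([normal, anomaly])

scores = functional_isolation_scores(curves, grid)
print("index of highest score:", int(np.argmax(scores)))
```

In this sketch, a curve that is isolated after few random splits gets a short average path and therefore a score closer to 1. The choice of dictionary and inner product determines which kinds of abnormal patterns (shift, shape, amplitude) are easy to isolate; the cosine dictionary here is only one arbitrary option.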