Comparative Evaluation of Unsupervised Machine Learning Algorithms for Anomaly Detection in Time Series Data

dc.contributor.authorAsela, H.
dc.date.accessioned2022-08-31T06:03:32Z
dc.date.available2022-08-31T06:03:32Z
dc.date.issued2021
dc.description.abstractAnomaly detection is the process of identifying data, occurrences and observations that deviate from the normal pattern. In time series data it is used to identify critical and fraudulent events, technical glitches and potential opportunities in systems. Hence it is important to build robust models that can reliably identify anomalies in time series data. In the literature, anomaly detection is done using supervised, unsupervised and hybrid machine learning algorithms. However, most research has focused on unsupervised algorithms because labelled data is rarely available. These unsupervised algorithms are based on probability, distance, density or a boundary function. This study provides a comparative evaluation of multiple unsupervised algorithms for anomaly detection in time series data, namely Elliptic Envelope, Gaussian Mixture Model, Isolation Forest, Local Outlier Factor, One Class Support Vector Machine and K-Means Clustering. Based on previous literature, these algorithms were selected as a widely used subset of algorithms for multi-domain anomaly detection. The algorithms were evaluated using the labelled Yahoo! Webscope S5 dataset. This dataset contains real and synthetic time series data in 4 classes, with 572,966 data instances and 367 metrics overall. Feature extraction was done using time series decomposition and statistical techniques. The extracted features were combined with class-specific features given in the dataset to improve the performance of the algorithms. Features were normalized using min-max scaling. Elliptic Envelope and Gaussian Mixture Model were the best performing algorithms, with F1 scores of 26.3% to 81.7%, true positive rates of 26.4% to 82.7% and false alarm rates below 2% across the 4 data classes. The study attributes this to the ability of probabilistic models to adapt to the complex patterns in time series data, which helps them identify deviations more robustly. One Class Support Vector Machine was the worst performing algorithm, with F1 scores of 1.2% to 6.5% and a false alarm rate of around 50% across the data classes, as its decision function was unable to adapt properly to the complex patterns in time series data. However, it achieved true positive rates of 96.2% to 99.5%. The other algorithms performed moderately, with Isolation Forest performing best in the high-contamination data class. Keywords: Anomaly detection; Time series data; Unsupervised machine learning algorithms; Comparative evaluationen_US
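The abstract reports results as F1 score, true positive rate and false alarm rate after min-max normalization. As a minimal sketch of how these quantities are commonly defined (the function names `min_max_scale` and `detection_metrics` and the toy labels are illustrative assumptions, not taken from the study), the following shows why a detector can reach a very high true positive rate while still having a low F1 score when its false alarm rate is near 50%.

```python
from typing import Dict, List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute F1, true positive rate and false alarm rate.

    Labels are 1 for an anomaly and 0 for a normal point.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    tpr = tp / (tp + fn) if (tp + fn) else 0.0          # recall on anomalies
    far = fp / (fp + tn) if (fp + tn) else 0.0          # share of normal points flagged
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"f1": f1, "tpr": tpr, "far": far}


# Toy labels (hypothetical, not from the dataset): 10 anomalies among 1,000 points.
y_true = [1] * 10 + [0] * 990
# A detector that catches every anomaly but also flags half of the normal points.
y_pred = [1] * 10 + [1] * 495 + [0] * 495

metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the true positive rate is 100% and the false alarm rate is 50%, yet the F1 score is below 4%, because the false alarms vastly outnumber the true anomalies. This is the same qualitative pattern the abstract describes for One Class Support Vector Machine.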
dc.identifier.isbn978-624-5856-04-6
dc.identifier.urihttp://www.erepo.lib.uwu.ac.lk/bitstream/handle/123456789/9574/Page%20109%20-%20IRCUWU2021-127%20-Asela-Comparative%20Evaluation%20of%20Unsupervised%20Machine%20Learning%20Algorithms%20for%20Anomaly.pdf?sequence=1&isAllowed=y
dc.language.isoenen_US
dc.publisherUva Wellassa University of Sri Lankaen_US
dc.subjectElectrical and Information Engineeringen_US
dc.subjectComputing and Information Scienceen_US
dc.subjectEnvironment Scienceen_US
dc.titleComparative Evaluation of Unsupervised Machine Learning Algorithms for Anomaly Detection in Time Series Dataen_US
dc.title.alternativeInternational Research Conference 2021en_US
dc.typeOtheren_US
Files
Original bundle
Name: Page 109 - IRCUWU2021-127 -Asela-Comparative Evaluation of Unsupervised Machine Learning Algorithms for Anomaly.pdf
Size: 147.97 KB
Format: Adobe Portable Document Format