Comparative Evaluation of Unsupervised Machine Learning Algorithms for Anomaly Detection in Time Series Data

Asela, H.

dc.contributor.author	Asela, H.
dc.date.accessioned	2022-08-31T06:03:32Z
dc.date.available	2022-08-31T06:03:32Z
dc.date.issued	2021
dc.description.abstract	Anomaly detection identifies data points, occurrences and observations that deviate from the normal pattern. In time series data it is used to identify critical and fraudulent events, technical glitches and potential opportunities in systems, so it is important to build robust models that can properly identify anomalies in such data. In the literature, anomaly detection is done using supervised, unsupervised and hybrid machine learning algorithms. However, most research has focused on unsupervised algorithms because labelled data is often unavailable. These unsupervised algorithms are based on probability, distance, density or a boundary function. This study provides a comparative evaluation of multiple unsupervised algorithms for anomaly detection in time series data, namely Elliptic Envelope, Gaussian Mixture Model, Isolation Forest, Local Outlier Factor, One Class Support Vector Machine and K-Means Clustering. Based on previous literature, these algorithms were selected as a widely used subset of algorithms for multi-domain anomaly detection. The algorithms were evaluated using the labelled Yahoo! Webscope S5 dataset, which contains real and synthetic time series data in 4 classes with a total of 572,966 data instances and 367 metrics. Feature extraction was done using time series decomposition and statistical techniques. The extracted features were combined with the specific features given in the data classes to improve the performance of the algorithms. Features were normalized using min-max scaling. Elliptic Envelope and Gaussian Mixture Model were the best performing algorithms, with F1 scores of 26.3% to 81.7%, true positive rates of 26.4% to 82.7% and false alarm rates below 2% across the 4 data classes in the dataset. The reason for this is the ability of probabilistic models to adapt to and identify the complex patterns in time series data, which helps to identify deviations more robustly. One Class Support Vector Machine was the worst performing algorithm, with F1 scores of 1.2% to 6.5% and a false alarm rate of around 50% for the data classes in the dataset, as its decision function was unable to properly adapt to the complex patterns in time series data. However, it had a true positive rate of 96.2% to 99.5%. The other algorithms performed moderately, with Isolation Forest performing best in the high-contamination data class. Keywords: Anomaly detection; Time series data; Unsupervised machine learning algorithms; Comparative evaluation	en_US
dc.identifier.isbn	978-624-5856-04-6
dc.identifier.uri	http://www.erepo.lib.uwu.ac.lk/bitstream/handle/123456789/9574/Page%20109%20-%20IRCUWU2021-127%20-Asela-Comparative%20Evaluation%20of%20Unsupervised%20Machine%20Learning%20Algorithms%20for%20Anomaly.pdf?sequence=1&isAllowed=y
dc.language.iso	en	en_US
dc.publisher	Uva Wellassa University of Sri Lanka	en_US
dc.subject	Electrical and Information Engineering	en_US
dc.subject	Computing and Information Science	en_US
dc.subject	Environment Science	en_US
dc.title	Comparative Evaluation of Unsupervised Machine Learning Algorithms for Anomaly Detection in Time Series Data	en_US
dc.title.alternative	International Research Conference 2021	en_US
dc.type	Other	en_US
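
The abstract names min-max scaling as the normalization step and reports results as F1 score, true positive rate and false alarm rate. The sketch below illustrates those quantities on a toy series. It is not the author's implementation: the function names, the toy data and the simple z-score detector (a stand-in for a probability-based method, not Elliptic Envelope or a Gaussian Mixture Model) are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def gaussian_flags(values, z_threshold=3.0):
    """Flag points whose absolute z-score exceeds z_threshold.

    Hypothetical stand-in for a probability-based detector (1 = anomaly).
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0 for _ in values]
    return [1 if abs(v - mu) / sigma > z_threshold else 0 for v in values]


def evaluate(y_true, y_pred):
    """Return F1 score, true positive rate and false alarm rate (1 = anomaly)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # share of true anomalies found
    far = fp / (fp + tn) if (fp + tn) else 0.0  # share of normal points flagged
    return {"f1": f1, "tpr": tpr, "far": far}


# Toy series (assumed data): one injected spike at index 6.
series = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0, 11.0, 9.0, 10.0, 11.0, 10.0]
labels = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

scaled = min_max_scale(series)
flags = gaussian_flags(scaled)
metrics = evaluate(labels, flags)
print(metrics)
```

The false alarm rate here is the fraction of normal points flagged as anomalous, which is why a detector can combine a very high true positive rate with a low F1 score, as the abstract reports for One Class Support Vector Machine: flagging about half of all normal points catches most anomalies but produces many false positives.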

Files

Original bundle

Name: Page 109 - IRCUWU2021-127 -Asela-Comparative Evaluation of Unsupervised Machine Learning Algorithms for Anomaly.pdf
Size: 147.97 KB
Format: Adobe Portable Document Format

Collections

International Research Conference of UWU-2021