Result: Detecting Cyber Threats in UWF-ZeekDataFall22 Using K-Means Clustering in the Big Data Environment.

Title:
Detecting Cyber Threats in UWF-ZeekDataFall22 Using K-Means Clustering in the Big Data Environment.
Authors:
Bagui, Sikha S.1 (AUTHOR) bagui@uwf.edu, Carvalho, Germano Correa Silva De1,2 (AUTHOR), Mishra, Asmi1,3 (AUTHOR), Mink, Dustin1,2 (AUTHOR), Bagui, Subhash C.2,3 (AUTHOR), Eager, Stephanie1,3 (AUTHOR)
Source:
Future Internet. Jun2025, Vol. 17 Issue 6, p267. 35p.
Database:
Library, Information Science & Technology Abstracts

Further Information

In an era marked by the rapid growth of the Internet of Things (IoT), network security has become increasingly critical. Traditional Intrusion Detection Systems, particularly signature-based methods, struggle to identify evolving cyber threats such as Advanced Persistent Threats (APTs) and zero-day attacks. Such attacks can also go undetected by supervised machine-learning methods. In this paper, we apply K-means clustering, an unsupervised clustering technique, to a newly created modern network attack dataset, UWF-ZeekDataFall22. Because this dataset contains labeled Zeek logs, the labels were removed before K-means clustering was applied. The labels were, however, used in the evaluation phase to determine the attack clusters after clustering. To identify APT as well as zero-day attack clusters, three different labeling heuristics were evaluated. To address the challenges posed by Big Data, the Big Data frameworks Apache Spark and PySpark were used as the development environment. A further distinguishing feature of this work is its use of connection-based features. Using these features, an in-depth study was conducted to determine the effect of the number of clusters, the seed, and the feature set for each of the labeling heuristics. If the objective is to detect every single attack, the results indicate that 325 clusters with a seed of 200, using an optimal set of features, correctly place 99% of attacks. [ABSTRACT FROM AUTHOR]
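The workflow the abstract describes (cluster unlabeled records, then use the held-back labels only to decide which clusters count as attack clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain standard-library Python rather than Apache Spark and PySpark, a six-point toy dataset rather than UWF-ZeekDataFall22, and a single majority-threshold rule as a stand-in labeling heuristic, since the abstract does not specify the three heuristics that were evaluated. The function names, the `threshold` parameter, and the data are all assumptions made for illustration.

```python
import random

def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # seed-dependent initial centers
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

def attack_clusters(assign, labels, threshold=0.5):
    """Hypothetical heuristic: flag a cluster as 'attack' when more than
    `threshold` of its members carry an attack label (1)."""
    totals, attacks = {}, {}
    for a, y in zip(assign, labels):
        totals[a] = totals.get(a, 0) + 1
        attacks[a] = attacks.get(a, 0) + y
    return {c for c in totals if attacks[c] / totals[c] > threshold}

def attack_recall(assign, labels, flagged):
    """Fraction of labeled attacks that fall inside flagged clusters."""
    hits = sum(1 for a, y in zip(assign, labels) if y == 1 and a in flagged)
    return hits / sum(labels)

# Toy data (assumed): two well-separated groups of connection features.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = [0, 0, 0, 1, 1, 1]  # held back during clustering, used only afterwards

assign = kmeans(points, k=2, seed=200)
flagged = attack_clusters(assign, labels)
recall = attack_recall(assign, labels, flagged)
```

The labels never enter `kmeans`; they are consulted only in `attack_clusters` and `attack_recall`, which mirrors the de-label, cluster, then evaluate order given in the abstract. Varying `k`, `seed`, and the columns kept in `points` corresponds to the three factors the study reports examining.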