STG2P: A two-stage pipeline model for intrusion detection based on improved LightGBM and K-means

Simulation Modelling Practice and Theory

Zhiqiang Zhang, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
Le Wang, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
Guangyao Chen, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
Zhaoquan Gu, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
Zhihong Tian, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
Xiaojiang Du, Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ, United States
Mohsen Guizani, Mohamed bin Zayed University of Artificial Intelligence

Document Type

Article

Abstract

Network attack behavior is always mixed with a large amount of normal communication, so attack characteristics account for only a very small fraction of the log data. From the perspective of simulation and modeling, the data for attack detection is extremely unbalanced if we regard the attack behavior as the positive label. Network intrusion detection is an important topic in identifying attack behavior, but detection methods based on simulation and modeling, such as traditional machine learning, face challenges of poor effectiveness and efficiency. Supervised models, such as LightGBM, can effectively classify abnormal data owing to their fast training speed and high efficiency. However, they perform poorly on sparse negative data, such as network intrusion data. Unsupervised models, such as K-means, can achieve good performance, though at an undesirable training time cost, and it is difficult to select appropriate parameters for them in network intrusion detection. In this paper, we propose a two-stage pipeline model named STG2P, which leverages an improved LightGBM and a reinforced K-means. Specifically, STG2P introduces a threshold for LightGBM in the coarse classification stage and passes the preliminary results to K-means, which filters out false positive samples in the fine classification stage. By adaptively combining the pipelined data of the improved LightGBM and K-means, the method avoids the shortcomings of both models. We also conduct extensive simulations on the LANL dataset, and the results show that the AUC value can be improved by as much as 29.48%. The detection rate of our method reaches 96.64%, which is superior to that of some traditional detection methods. © 2022 Elsevier B.V.
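
The two-stage idea described in the abstract (a thresholded coarse classifier followed by K-means filtering of false positives) can be illustrated with a small sketch. This is not the authors' implementation. It assumes the stage-one scores come from some classifier such as LightGBM and are supplied here as plain numbers, and it assumes one possible filtering rule: flagged samples are clustered, and the cluster whose centroid lies closest to the average of the unflagged samples is treated as false positives. The names `two_stage_detect`, `kmeans`, and `mean_point`, the toy data, and the threshold value are all hypothetical.

```python
import math


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, k, iters=100):
    """Plain K-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    assignment = []
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Recompute centroids; an empty cluster keeps its old centroid.
        updated = []
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            updated.append(mean_point(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assignment


def two_stage_detect(features, scores, threshold, k=2):
    """Return indices of samples still flagged after both stages."""
    # Stage 1 (coarse): flag every sample whose score reaches the threshold.
    # The scores stand in for the output of a classifier such as LightGBM.
    flagged = [i for i, s in enumerate(scores) if s >= threshold]
    passed = [i for i, s in enumerate(scores) if s < threshold]
    if len(flagged) < k or not passed:
        return flagged  # too little data to cluster or to compare against
    # Stage 2 (fine): cluster the flagged samples only.
    centroids, assignment = kmeans([features[i] for i in flagged], k)
    # ASSUMPTION: the cluster nearest the unflagged average is benign.
    normal_profile = mean_point([features[i] for i in passed])
    benign = min(range(k), key=lambda j: math.dist(centroids[j], normal_profile))
    return [i for i, a in zip(flagged, assignment) if a != benign]


# Toy data: indices 0-3, 6 and 7 resemble normal traffic; 4 and 5 do not.
features = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (0.1, 0.1),
            (5.0, 5.1), (5.2, 4.9), (0.2, 0.2), (0.3, 0.1)]
scores = [0.05, 0.10, 0.02, 0.08, 0.90, 0.40, 0.35, 0.45]

alerts = two_stage_detect(features, scores, threshold=0.3)
print(alerts)  # [4, 5]
```

With the deliberately low threshold of 0.3, stage one flags samples 4 to 7, so the weakly scored attack at index 5 is caught along with two benign samples. Stage two then drops indices 6 and 7 because their cluster sits next to the unflagged traffic. A single higher threshold of 0.5 would have avoided those false positives but missed index 5.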

DOI

10.1016/j.simpat.2022.102614

Publication Date

11-1-2022

Keywords

Improved LightGBM,
Intrusion detection,
Pipeline model,
Reinforced K-means,
Efficiency,
Pipelines,
Reinforcement,
Attack behavior,
Detection methods,
Improved lightgbm,
Intrusion-Detection,
K-means,
Network intrusions,
Performance,
Pipeline models,
Reinforced K-mean,
Simulation and modeling

Comments

IR Deposit conditions:

OA version (pathway b): Accepted version

24 months embargo

Licence: CC BY-NC-ND

Must link to publisher version with DOI

Citation Information

Z. Zhang et al., "STG2P: A two-stage pipeline model for intrusion detection based on improved LightGBM and K-means," in Simulation Modelling Practice and Theory, Nov. 2022, vol. 120 (102614), doi: 10.1016/j.simpat.2022.102614