Article
STG2P: A two-stage pipeline model for intrusion detection based on improved LightGBM and K-means
Simulation Modelling Practice and Theory
  • Zhiqiang Zhang, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
  • Le Wang, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
  • Guangyao Chen, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
  • Zhaoquan Gu, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
  • Zhihong Tian, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
  • Xiaojiang Du, Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ, United States
  • Mohsen Guizani, Mohamed bin Zayed University of Artificial Intelligence
Document Type
Article
Abstract

Network attack behavior is usually mixed with a large volume of normal communication, so attack characteristics account for only a very small fraction of the log data. From the perspective of simulation and modeling, the data for attack detection is extremely unbalanced if we regard attack behavior as the positive label. Network intrusion detection is an important topic in identifying attack behavior, but detection methods based on simulation and modeling, such as traditional machine learning, face challenges of poor effectiveness and efficiency. Supervised models, such as LightGBM, can classify abnormal data effectively because they train quickly and run efficiently. However, they perform poorly on sparse negative data, such as network intrusion data. Unsupervised models, such as K-means, can achieve good performance, but at an undesirable training time cost, and it is difficult to select an appropriate parameter for network intrusion data. In this paper, we propose a two-stage pipeline model named STG2P, which leverages an improved LightGBM and a reinforced K-means. Specifically, STG2P introduces a threshold for LightGBM in the coarse classification stage and pipelines the draft results to K-means, which filters out false positive samples in the fine classification stage. By adaptively adopting the pipelined data of the improved LightGBM and K-means, the method can avoid the shortcomings of both models. We also conduct extensive simulations on the LANL dataset, and the results show that the AUC value can be improved by as much as 29.48%. The detection rate of our method can reach 96.64%, which is superior to that of some traditional detection methods. © 2022 Elsevier B.V.
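
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stage-1 scores below are hard-coded stand-ins for the probabilities a classifier such as LightGBM would produce, the function names (coarse_stage, fine_stage, kmeans) are hypothetical, and the rule used to decide which cluster to keep (the cluster with the higher mean stage-1 score) is an assumption made purely for illustration, since the abstract does not specify it.

```python
import random


def coarse_stage(scores, threshold):
    """Stage 1 (coarse): keep indices whose classifier score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def fine_stage(candidates, features, scores, k=2):
    """Stage 2 (fine): cluster the stage-1 candidates and keep one cluster.

    ASSUMPTION: the cluster with the highest mean stage-1 score is treated as
    the attack cluster; the others are discarded as false positives.
    """
    if len(candidates) < k:
        return list(candidates)
    points = [features[i] for i in candidates]
    _, labels = kmeans(points, k)
    mean_score = {}
    for c in range(k):
        member_scores = [scores[i] for i, lab in zip(candidates, labels) if lab == c]
        mean_score[c] = (sum(member_scores) / len(member_scores)
                         if member_scores else float("-inf"))
    keep = max(mean_score, key=mean_score.get)
    return [i for i, lab in zip(candidates, labels) if lab == keep]


# Toy data: samples 4 and 5 are attacks; samples 2 and 3 are normal traffic
# that the (hypothetical) stage-1 classifier scored above the threshold.
features = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.2, 0.2),
            (5.0, 5.1), (5.2, 4.9), (0.1, 0.1)]
scores = [0.05, 0.10, 0.45, 0.40, 0.90, 0.85, 0.02]

candidates = coarse_stage(scores, threshold=0.3)
detected = fine_stage(candidates, features, scores)
print("stage 1 candidates:", candidates)
print("stage 2 detections:", detected)
```

A deliberately low stage-1 threshold favors recall on the rare attack class, and the clustering step then removes candidates that sit with normal traffic in feature space. In practice the scores would come from a trained model and the number of clusters would need tuning, which the abstract notes is difficult.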

DOI
10.1016/j.simpat.2022.102614
Publication Date
11-1-2022
Keywords
  • Improved LightGBM
  • Intrusion detection
  • Pipeline model
  • Reinforced K-means
  • Efficiency
  • Pipelines
  • Reinforcement
  • Attack behavior
  • Detection methods
  • Improved lightgbm
  • Intrusion-Detection
  • K-means
  • Network intrusions
  • Performance
  • Pipeline models
  • Reinforced K-mean
  • Simulation and modeling
Comments

IR Deposit conditions:

OA version (pathway b): Accepted version

24 months embargo

Licence: CC BY-NC-ND

Must link to publisher version with DOI

Citation Information
Z. Zhang, et al., "STG2P: A two-stage pipeline model for intrusion detection based on improved LightGBM and K-means", in Simulation Modelling Practice and Theory, Nov. 2022, vol. 120 (102614), doi: 10.1016/j.simpat.2022.102614