Article
Combining Software Metrics and Text Features for Vulnerable File Prediction
20th International Conference on Engineering of Complex Computer Systems (ICECCS 2015)
  • Yun ZHANG
  • David LO, Singapore Management University
  • Xin XIA
  • Bowen XU
  • Jianling SUN
  • Shanping LI
Publication Type
Conference Proceeding Article
Publication Date
12-2015
Abstract

In recent years, to help developers reduce the time and effort required to build highly secure software, a number of prediction models built on different kinds of features have been proposed to identify vulnerable source code files. In this paper, we propose a novel approach, VULPREDICTOR, to predict vulnerable files; it analyzes software metrics and text features together to build a composite prediction model. VULPREDICTOR first builds 6 underlying classifiers on a training set of vulnerable and non-vulnerable files represented by their software metrics and text features, and then constructs a meta classifier to process the outputs of the 6 underlying classifiers. We evaluate our solution on datasets from three web applications (Drupal, PHPMyAdmin, and Moodle), which contain a total of 3,466 files and 223 vulnerabilities. The experimental results show that VULPREDICTOR can achieve F1 and EffectivenessRatio@20% scores of up to 0.683 and 75%, respectively. On average across the 3 projects, VULPREDICTOR improves the F1 and EffectivenessRatio@20% scores of the best-performing state-of-the-art approaches proposed by Walden et al. by 46.53% and 14.93%, respectively.
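
The two-stage design described in the abstract (underlying classifiers whose outputs feed a meta classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only two toy base learners instead of six, and the learners themselves (a threshold stump on a lines-of-code metric, a per-token vulnerability rate, and a logistic-regression meta classifier), the sample data, and every function name are assumptions made for illustration. For brevity the meta classifier is trained on the base learners' outputs for the same training files; a careful stacking setup would normally use held-out predictions instead.

```python
import math

def train_stump(values, labels):
    """Hypothetical metric learner: pick the threshold that best splits labels."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(values)):
        acc = sum((v >= t) == bool(y) for v, y in zip(values, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda v: 1.0 if v >= best_t else 0.0

def train_token_scorer(token_sets, labels):
    """Hypothetical text learner: average vulnerability rate of known tokens."""
    seen, vuln = {}, {}
    for tokens, y in zip(token_sets, labels):
        for tok in tokens:
            seen[tok] = seen.get(tok, 0) + 1
            vuln[tok] = vuln.get(tok, 0) + y
    def score(tokens):
        known = [vuln[t] / seen[t] for t in tokens if t in seen]
        return sum(known) / len(known) if known else 0.5
    return score

def train_meta(rows, labels, epochs=500, lr=0.5):
    """Meta classifier: logistic regression over the base learners' outputs."""
    w, b = [0.0] * len(rows[0]), 0.0
    def prob(x):
        z = b + sum(wi * xi for wi, xi in zip(w, x))
        return 1.0 / (1.0 + math.exp(-z))
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = y - prob(x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return prob

# Toy training set (invented): (lines_of_code, tokens, label), 1 = vulnerable.
files = [
    (820, {"$_GET", "mysql_query", "echo"}, 1),
    (640, {"$_POST", "eval", "echo"}, 1),
    (120, {"function", "return"}, 0),
    (90, {"class", "return"}, 0),
    (700, {"$_GET", "echo", "include"}, 1),
    (150, {"function", "class"}, 0),
]
locs = [f[0] for f in files]
token_sets = [f[1] for f in files]
labels = [f[2] for f in files]

# Stage 1: train the underlying classifiers on metrics and on text features.
metric_clf = train_stump(locs, labels)
text_clf = train_token_scorer(token_sets, labels)

# Stage 2: train the meta classifier on the underlying classifiers' outputs.
meta_rows = [[metric_clf(l), text_clf(t)] for l, t in zip(locs, token_sets)]
meta_clf = train_meta(meta_rows, labels)

def predict(loc, tokens):
    """Return the meta classifier's estimated probability that a file is vulnerable."""
    return meta_clf([metric_clf(loc), text_clf(tokens)])

print(predict(900, {"$_GET", "eval"}))
print(predict(100, {"function"}))
```

In this sketch the meta classifier learns how much weight to give each underlying classifier's output, which is the general idea behind a composite model; the actual features, learners, and combination rule used by VULPREDICTOR are described in the paper itself.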

Keywords
  • Machine Learning
  • Text Mining
  • Vulnerable File
ISBN
9781467385817
Identifier
10.1109/ICECCS.2015.15
Publisher
IEEE
City or Country
Gold Coast, Australia
Additional URL
http://dx.doi.org/10.1109/ICECCS.2015.15
Citation Information
Yun ZHANG, David LO, Xin XIA, Bowen XU, et al. "Combining Software Metrics and Text Features for Vulnerable File Prediction" 20th International Conference on Engineering of Complex Computer Systems (ICECCS 2015) (2015) pp. 40-49
Available at: http://works.bepress.com/david_lo/223/