Evolving Decision Trees for the Categorization of Software
Proceedings of the IEEE 38th Annual International Computers, Software and Applications Conference Workshops, COMPSACW 2014
  • Jasenko Hosic
  • Daniel R. Tauritz, Missouri University of Science and Technology
  • Samuel A. Mulder
Abstract

Current manual techniques of static reverse engineering are inefficient at providing semantic program understanding. We have developed an automated method to categorize applications in order to quickly determine pertinent characteristics. Prior work in this area has had some success, but a major strength of our approach is that it produces heuristics that can be reused for quick analysis of new data. Our method relies on a genetic programming algorithm to evolve decision trees which can be used to categorize software. The terminals, or leaf nodes, within the trees each contain values based on selected features from one of several attributes: system calls, byte n-grams, opcode n-grams, cyclomatic complexity, and bonding. The evolved decision trees are reusable and achieve average accuracies above 95% when categorizing programs based on compiler origin and versions. Developing new decision trees simply requires more labeled datasets and potentially different feature selection algorithms for other attributes, depending on the data being classified.
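
The evolve-then-reuse idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made purely for illustration: the feature names (`syscall_count`, `cyclomatic_complexity`), the category labels, the toy dataset, the tuple-based tree layout (internal nodes test one feature against a threshold and leaves hold a label, which may differ from how the paper places feature values in its terminals), and the simple truncation selection with subtree mutation standing in for the paper's genetic programming operators.

```python
import random

FEATURES = ["syscall_count", "cyclomatic_complexity"]  # hypothetical feature names
LABELS = ["compiler_a", "compiler_b"]                   # hypothetical categories


def random_tree(rng, depth):
    """Build a random tree: ("leaf", label) or ("split", feature, threshold, low, high)."""
    if depth <= 0 or rng.random() < 0.3:
        return ("leaf", rng.choice(LABELS))
    return ("split", rng.choice(FEATURES), rng.uniform(0, 10),
            random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def classify(tree, sample):
    """Walk from the root to a leaf and return that leaf's label."""
    while tree[0] == "split":
        _, feature, threshold, low, high = tree
        tree = low if sample[feature] <= threshold else high
    return tree[1]


def accuracy(tree, data):
    """Fitness: fraction of labeled samples the tree categorizes correctly."""
    return sum(classify(tree, s) == label for s, label in data) / len(data)


def mutate(rng, tree, depth=3):
    """Replace one randomly chosen subtree with a freshly generated one."""
    if tree[0] == "leaf" or rng.random() < 0.3:
        return random_tree(rng, depth)
    _, feature, threshold, low, high = tree
    if rng.random() < 0.5:
        return ("split", feature, threshold, mutate(rng, low, depth - 1), high)
    return ("split", feature, threshold, low, mutate(rng, high, depth - 1))


def evolve(data, generations=30, pop_size=40, seed=0):
    """Keep the fitter half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: accuracy(t, data), reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng, rng.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=lambda t: accuracy(t, data))


# Toy labeled dataset (invented values, not from the paper).
TOY_DATA = [
    ({"syscall_count": 2.0, "cyclomatic_complexity": 1.0}, "compiler_a"),
    ({"syscall_count": 3.0, "cyclomatic_complexity": 8.0}, "compiler_a"),
    ({"syscall_count": 7.0, "cyclomatic_complexity": 2.0}, "compiler_b"),
    ({"syscall_count": 9.0, "cyclomatic_complexity": 6.0}, "compiler_b"),
]

best = evolve(TOY_DATA)
# The evolved tree is reusable: new samples are categorized without re-running evolution.
prediction = classify(best, {"syscall_count": 8.0, "cyclomatic_complexity": 3.0})
```

Once evolved, `best` is an ordinary data structure, so it can be stored and applied to new programs with `classify` alone, which mirrors the reuse property the abstract highlights. Feature extraction for n-grams and the other attributes is outside the scope of this sketch.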

Meeting Name
38th Annual IEEE Computer Software and Applications Conference Workshops, COMPSACW 2014 (2014: Jul. 27-29, Vasteras, Sweden)
Department(s)
Computer Science
Research Center/Lab(s)
Center for High Performance Computing Research
Sponsor(s)
Missouri University of Science and Technology. Natural Computation Laboratory
Keywords and Phrases
  • Genetic Programming
  • Program Understanding
International Standard Book Number (ISBN)
9781479935789
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2014 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.
Publication Date
1-1-2014
Citation Information
Jasenko Hosic, Daniel R. Tauritz and Samuel A. Mulder. "Evolving Decision Trees for the Categorization of Software" Proceedings of the IEEE 38th Annual International Computers, Software and Applications Conference Workshops, COMPSACW 2014 (2014) p. 337
Available at: http://works.bepress.com/daniel-tauritz/36/