Using runtime execution artifacts to identify malware and its associated "family" is an established technique in the security domain. Many papers in the literature rely on explicit features derived from network, file system, or registry interaction. While effective, use of these fine-granularity data points makes these techniques computationally expensive. Moreover, the signatures and heuristics this analysis produces are often circumvented by subsequent malware authors.
To this end we propose CHATTER, a system that is concerned only with the order in which high-level system events take place. Individual events are mapped onto an alphabet and execution traces are captured via terse concatenations of those letters. Then, leveraging an analyst labeled corpus of malware, n-gram document classification techniques are applied to produce a classifier predicting malware family. This paper describes that technique and its proof-of-concept evaluation. In its prototype form only network events are considered and three malware families are highlighted. We show the technique achieves roughly 80% accuracy in isolation and makes non-trivial performance improvements when integrated with a baseline classifier of non-ordered features (with accuracy of roughly 95%).
- malware family,
- document classification,