Article
Dictionary Extraction and Detection of Algorithmically Generated Domain Names in Passive DNS Traffic
Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
  • M. Pereira
  • S. Coleman
  • B. Yu
  • Martine DeCock
  • Anderson Nascimento, University of Washington Tacoma
Publication Date
1-1-2018
Document Type
Conference Proceeding
Abstract

Automatic detection of algorithmically generated domains (AGDs) is a crucial element in fighting botnets. Modern AGD detection systems have benefited from the combination of powerful machine learning algorithms and linguistic distinctions between legitimate domains and malicious AGDs. However, a more evolved class of AGDs misleads these detection systems by generating domains based on wordlists (also called dictionaries). The resulting domains, Dictionary-AGDs, appear benign both to human analysts and to most AGD detection methods that receive only the domain itself as input. In this paper, we design and implement a method called WordGraph for extracting the dictionaries used by Domain Generation Algorithms (DGAs) solely from DNS traffic. Our result immediately gives us an efficient mechanism for detecting this elusive, new type of DGA, without any need for reverse engineering to extract dictionaries. Our experimental results on data from known Dictionary-AGDs show that our method can extract dictionary information that is embedded in the malware code even when the number of DGA domains is much smaller than that of legitimate domains, or when multiple dictionaries are present in the data. This allows our approach to detect Dictionary-AGDs in real traffic more accurately than state-of-the-art methods based on human-defined features or featureless deep learning approaches. © Springer Nature Switzerland AG 2018.
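
The idea of recovering a wordlist from observed domains can be illustrated with a toy sketch. The abstract does not describe WordGraph's actual procedure, so everything below is an assumption made for illustration: the generator `toy_dictionary_agds`, the six-word wordlist, the benign domains, the `min_len` and `min_count` thresholds, and the simplification that each generated domain is exactly two concatenated words. The sketch splits each domain label in two, keeps only splits whose parts recur across several domains, links the two parts in a graph, and reads candidate dictionaries off the connected components.

```python
from collections import Counter, defaultdict
from itertools import permutations


def toy_dictionary_agds(wordlist, tld=".com"):
    """Hypothetical dictionary DGA: every ordered pair of distinct words."""
    return [a + b + tld for a, b in permutations(wordlist, 2)]


def extract_word_graph(domains, min_len=3, min_count=3):
    """Link the two halves of a label when both halves recur across domains."""
    labels = [d.split(".")[0] for d in domains]

    # Count how many times each string occurs as a prefix or suffix of a label.
    part_count = Counter()
    for label in labels:
        for i in range(min_len, len(label) - min_len + 1):
            part_count[label[:i]] += 1
            part_count[label[i:]] += 1

    # Keep only splits where both parts are frequent; add an undirected edge.
    graph = defaultdict(set)
    for label in labels:
        for i in range(min_len, len(label) - min_len + 1):
            left, right = label[:i], label[i:]
            if part_count[left] >= min_count and part_count[right] >= min_count:
                graph[left].add(right)
                graph[right].add(left)
    return graph


def connected_components(graph):
    """Return the connected components of an undirected graph as sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        components.append(comp)
    return components


# Illustrative data only; none of these names come from the paper.
WORDLIST = ["table", "river", "cloud", "green", "music", "happy"]
BENIGN = ["google.com", "wikipedia.org", "github.com",
          "openstreetmap.org", "tablespoon.net"]

traffic = toy_dictionary_agds(WORDLIST) + BENIGN
recovered = connected_components(extract_word_graph(traffic))
print([sorted(component) for component in recovered])
```

In this toy setting the benign label `tablespoon` shares the word `table` with the wordlist, but its other half does not recur, so it contributes no edge. Real traffic is far noisier, and a practical system would need a more robust way to find candidate words and to tell dictionary components apart from benign ones.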

DOI
10.1007/978-3-030-00470-5_14
Citation Information
M. Pereira, S. Coleman, B. Yu, Martine DeCock, et al. "Dictionary Extraction and Detection of Algorithmically Generated Domain Names in Passive DNS Traffic" Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11050 LNCS (2018) pp. 295-314
Available at: http://works.bepress.com/anderson-nascimento/23/