Polyglot: Automatic Extraction of Protocol Message Format using Dynamic Binary Analysis
Department of Electrical and Computer Engineering
  • Juan Caballero, Carnegie Mellon University
  • Heng Yin, Carnegie Mellon University
  • Zhenkai Liang, Carnegie Mellon University
  • Dawn Song, Carnegie Mellon University
Date of Original Version
Conference Proceeding
Rights Management
Abstract or Description

Protocol reverse engineering, the process of extracting the application-level protocol used by an implementation without access to the protocol specification, is important for many network security applications. Recent work [17] has proposed protocol reverse engineering by clustering network traces. That kind of approach is limited by the lack of semantic information in network traces. In this paper we propose a new approach using program binaries. Our approach, shadowing, uses dynamic binary analysis and is based on a unique intuition: the way that an implementation of the protocol processes the received application data reveals a wealth of information about the protocol message format. We have implemented our approach in a system called Polyglot and evaluated it extensively using real-world implementations of five different protocols: DNS, HTTP, IRC, Samba and ICQ. We compare our results with the manually crafted message formats included in Wireshark, one of the state-of-the-art protocol analyzers. The differences we find are small and usually due to different implementations handling fields in different ways. Finding such differences between implementations is an added benefit, as they are important for problems such as fingerprint generation, fuzzing, and error detection.
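
The intuition behind shadowing can be illustrated with a small source-level sketch. This is not Polyglot's implementation, which the abstract describes as operating on program binaries through dynamic binary analysis; it is a toy analogue in Python, and every name in it (`Trace`, `TaintedBytes`, `toy_parser`, `infer_fields`) is hypothetical. The sketch assumes a parser that scans the received bytes for a space separator, records each comparison together with the input offset involved, and then derives field boundaries from the offsets where the comparison succeeded.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Log of (input offset, constant compared against, comparison result)."""
    comparisons: list = field(default_factory=list)


class TaintedBytes:
    """Wraps received data so that each byte comparison is recorded with its offset."""

    def __init__(self, data: bytes, trace: Trace):
        self.data = data
        self.trace = trace

    def __len__(self) -> int:
        return len(self.data)

    def equals(self, offset: int, constant: int) -> bool:
        result = self.data[offset] == constant
        self.trace.comparisons.append((offset, constant, result))
        return result


def toy_parser(msg: TaintedBytes) -> list:
    """Hypothetical stand-in for the implementation under analysis: splits on spaces."""
    tokens, start = [], 0
    for i in range(len(msg)):
        if msg.equals(i, ord(" ")):
            tokens.append(msg.data[start:i])
            start = i + 1
    tokens.append(msg.data[start:])
    return tokens


def infer_fields(data: bytes, trace: Trace) -> list:
    """Derive (start, end) field ranges from offsets where a separator comparison succeeded."""
    hits = sorted(offset for offset, _, matched in trace.comparisons if matched)
    fields, start = [], 0
    for offset in hits:
        fields.append((start, offset))
        start = offset + 1
    fields.append((start, len(data)))
    return fields


trace = Trace()
request = b"GET /index.html HTTP/1.1"
toy_parser(TaintedBytes(request, trace))
print(infer_fields(request, trace))
```

The field boundaries here are recovered only from how the parser touched the input, not from any knowledge of HTTP, which is the property the abstract attributes to analyzing the implementation rather than network traces alone.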

Citation Information
Juan Caballero, Heng Yin, Zhenkai Liang and Dawn Song. "Polyglot: Automatic Extraction of Protocol Message Format using Dynamic Binary Analysis" (2007)
Available at: