Polyglot: Automatic Extraction of Protocol Message Format using Dynamic Binary Analysis
Protocol reverse engineering, the process of extracting the application-level protocol used by an implementation, without access to the protocol specification, is important for many network security applications. Recent work  has proposed protocol reverse engineering by using clustering on network traces. That kind of approach is limited by the lack of semantic information on network traces. In this paper we propose a new approach using program binaries. Our approach, shadowing, uses dynamic binary analysis and is based on a unique intuition—the way that an implementation of the protocol processes the received application data reveals a wealth of information about the protocol message format. We have implemented our approach in a system called Polyglot and evaluated it extensively using real-world implementations of five different protocols: DNS, HTTP, IRC, Samba and ICQ.We compare our results with the manually crafted message format, included in Wireshark, one of the state-of-the-art protocol analyzers. The differences we find are small and usually due to different implementations handling fields in different ways. Finding such differences between implementations is an added benefit, as they are important for problems such as fingerprint generation, fuzzing, and error detection.
Juan Caballero, Heng Yin, Zhenkai Liang, and Dawn Song. "Polyglot: Automatic Extraction of Protocol Message Format using Dynamic Binary Analysis" Department of Electrical and Computing Engineering (2007).
Available at: http://works.bepress.com/heng_yin/3