Web Robot Detection Techniques: Overview and Limitations
Data Mining and Knowledge Discovery
  • Derek Doran, Wright State University - Main Campus
  • Swapna S. Gokhale
Document Type
Article
Publication Date
1-1-2011
Abstract

Most modern Web robots that crawl the Internet to support value-added services and technologies possess sophisticated data collection and analysis capabilities. Some of these robots, however, may be ill-behaved or malicious, and hence, may impose a significant strain on a Web server. It is thus necessary to detect Web robots in order to block undesirable ones from accessing the server. Such detection is also essential to ensure that the robot traffic is considered appropriately in the performance and capacity planning of Web servers. Despite a variety of Web robot detection techniques, there is no consensus regarding a single technique, or even a specific “type” of technique, that performs well in practice. Therefore, to aid in the development of a practically applicable robot detection technique, this survey presents a critical analysis and comparison of the prevalent detection approaches. We propose a framework to classify the existing detection techniques into four categories based on their underlying detection philosophy. We compare the different classes to gain insights into those characteristics that make up an effective robot detection scheme. Finally, we discuss why the contemporary techniques fail to offer a general solution to the robot detection problem and propose a set of key ingredients necessary for strong Web robot detection.

DOI
10.1007/s10618-010-0180-z
Citation Information
Derek Doran and Swapna S. Gokhale. "Web Robot Detection Techniques: Overview and Limitations." Data Mining and Knowledge Discovery, Vol. 22, Iss. 1-2 (2011), pp. 183-210. ISSN: 1384-5810.
Available at: http://works.bepress.com/derek_doran/19/