Detecting Extreme Rank Anomalous Collections
12th SIAM International Conference on Data Mining 2012: Anaheim, California, 26-28 April 2012
  • Hanbo DAI, Singapore Management University
  • Feida ZHU, Singapore Management University
  • Ee-Peng LIM, Singapore Management University
  • Hwee Hwa PANG, Singapore Management University
Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
4-2012
Abstract

Anomaly or outlier detection has a wide range of applications, including fraud and spam detection. Most existing studies focus on detecting point anomalies, i.e., individual, isolated entities. However, in a growing number of applications, anomalies occur not individually but in small collections. Unlike the majority of entities, the members of an anomalous collection tend to share certain extreme behavioral traits. The knowledge needed to understand why and how such a set of entities becomes outliers is revealed only by examining them at the collection level. A good example is a group of web spammers adopting common spamming techniques. To discover this kind of anomalous collection, we introduce a novel definition of anomaly, called the Extreme Rank Anomalous Collection. We propose a statistical model to quantify the anomalousness of such a collection, and present both an exact and a heuristic algorithm for finding the top-K extreme rank anomalous collections. We apply the algorithms to real Web spam data to detect spamming sites, and to IMDB data to detect unusual actor groups. Our algorithms achieve higher precision than existing spam and anomaly detection methods. More importantly, our approach succeeds in finding meaningful anomalous collections in both datasets.
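The abstract does not spell out the statistical model or the search algorithms. Purely as an illustration of the general idea of "extreme rank" scoring, and explicitly not the authors' model, the sketch below assumes a simple null hypothesis: a collection of m entities is a uniformly random m-subset of n ranked entities, so the probability that all of its members rank at or above rank r on a feature is C(r, m) / C(n, m). Features are assumed independent, and a collection's score is the negative log of the product of these per-feature tail probabilities. The names `tail_probability`, `collection_score`, and `top_k_collections` are hypothetical, and the brute-force search is a stand-in for the paper's exact and heuristic algorithms, practical only for toy sizes.

```python
from itertools import combinations
from math import comb, log
import heapq


def tail_probability(max_rank: int, m: int, n: int) -> float:
    """P(every member of a uniformly random m-subset of n entities has rank <= max_rank).

    Illustrative null model only (an assumption, not the paper's model).
    """
    return comb(max_rank, m) / comb(n, m)


def collection_score(members, ranks, n):
    """Hypothetical anomalousness score: -log of the product of per-feature tail probabilities.

    ranks[f][e] is entity e's rank on feature f (1 = most extreme), and each
    ranks[f] is assumed to be a permutation of 1..n. Features are assumed
    independent under the null model.
    """
    m = len(members)
    score = 0.0
    for feature_ranks in ranks:
        worst = max(feature_ranks[e] for e in members)  # least extreme member's rank
        score -= log(tail_probability(worst, m, n))
    return score


def top_k_collections(ranks, n, m, k):
    """Exhaustively score every size-m collection and keep the k highest scores.

    Exponential in m; a toy stand-in for a real exact or heuristic search.
    """
    scored = ((collection_score(c, ranks, n), c) for c in combinations(range(n), m))
    return heapq.nlargest(k, scored)


# Toy data: 6 entities, 2 features; entities 0 and 1 rank 1st and 2nd on both.
example_ranks = [
    [1, 2, 3, 4, 5, 6],
    [2, 1, 5, 6, 3, 4],
]

if __name__ == "__main__":
    for score, collection in top_k_collections(example_ranks, n=6, m=2, k=3):
        print(collection, round(score, 3))
```

In this toy example, the pair (0, 1) scores highest because it is the only pair whose worst rank is 2 on both features, which under the assumed null model has probability (1/15) per feature.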

ISBN
9781622760947
Identifier
10.1137/1.9781611972825.76
Publisher
SIAM
City or Country
Philadelphia, PA
Creative Commons License
Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International
Additional URL
http://doi.org/10.1137/1.9781611972825.76
Citation Information
Hanbo DAI, Feida ZHU, Ee-Peng LIM and Hwee Hwa PANG. "Detecting Extreme Rank Anomalous Collections" 12th SIAM International Conference on Data Mining 2012: Anaheim, California, 26-28 April 2012 (2012) pp. 883-894
Available at: http://works.bepress.com/hweehwa-pang/21/