Autonomous Link Spam Detection in Purely Collaborative Environments
7th International Symposium on Wikis and Open Collaboration (WikiSym '11)
  • Andrew G. West, University of Pennsylvania
  • Avantika Agrawal, University of Pennsylvania
  • Phillip Baker, University of Pennsylvania
  • Brittney Exline, University of Pennsylvania
  • Insup Lee, University of Pennsylvania
Date of this Version
10-5-2011
Document Type
Conference Paper
Comments
Seventh International Symposium on Wikis and Open Collaboration, Mountain View, California, USA, October 2011.
Abstract

Collaborative models (e.g., wikis) are an increasingly prevalent Web technology. However, the open access that defines such systems can also be exploited for nefarious purposes. In particular, this paper examines the use of collaborative functionality to add inappropriate hyperlinks to destinations outside the host environment (i.e., link spam). The collaborative encyclopedia Wikipedia is the basis for our analysis.

Recent research has exposed vulnerabilities in Wikipedia's link spam mitigation, finding that human editors are slow to respond (latent) and dwindling in number. To address this, we propose and develop an autonomous classifier for link additions. Such a system presents unique challenges. For example, low barriers to entry invite a diversity of spam types, not just those with economic motivations. Moreover, problems can arise from how a link is presented, regardless of its destination.

In this work, a spam corpus is extracted from over 235,000 link additions to English Wikipedia. From this corpus, more than 40 features are codified and analyzed. These indicators are computed using "wiki" metadata, landing-site analysis, and external data sources. The resulting classifier attains 64% recall at a 0.5% false-positive rate (ROC-AUC = 0.97). Such performance could enable egregious link additions to be blocked automatically with a low false-positive rate, while the remainder are prioritized for human inspection. Finally, a live Wikipedia implementation of the technique has been developed.
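The abstract reports performance as recall at a fixed false-positive rate and proposes an "intelligent routing" split: block high-confidence spam automatically and queue the rest for humans. The sketch below illustrates only that evaluation and routing step, assuming some classifier has already assigned each link addition a spam score in [0, 1] and that labeled data contains both spam and non-spam examples. The function names `recall_at_fpr` and `route` and the toy data are hypothetical; this is not the authors' implementation or feature set.

```python
from typing import List, Sequence, Tuple


def recall_at_fpr(scores: Sequence[float], labels: Sequence[int],
                  max_fpr: float) -> Tuple[float, float]:
    """Return (threshold, recall) for the lowest score threshold whose
    false-positive rate stays <= max_fpr. An item is flagged as spam when
    score >= threshold; labels are 1 for spam and 0 for legitimate."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both spam and non-spam examples")
    best_threshold, best_recall = float("inf"), 0.0
    # Walk candidate thresholds from strictest to most lenient.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / neg > max_fpr:
            break  # lowering t further can only add false positives
        best_threshold, best_recall = t, tp / pos
    return best_threshold, best_recall


def route(edits: List[Tuple[str, float]],
          threshold: float) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split (edit_id, score) pairs into auto-blocked edits and a human
    review queue ordered from most to least suspicious."""
    blocked = [e for e in edits if e[1] >= threshold]
    queue = sorted((e for e in edits if e[1] < threshold),
                   key=lambda e: e[1], reverse=True)
    return blocked, queue


if __name__ == "__main__":
    # Toy, made-up scores and labels purely for illustration.
    scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 0, 0, 1, 0]
    threshold, recall = recall_at_fpr(scores, labels, 0.005)
    print(threshold, recall)  # prints "0.9 0.5"
    blocked, queue = route(list(zip("abcdefgh", scores)), threshold)
    print([e[0] for e in blocked], [e[0] for e in queue])
```

Because the false-positive count can only grow as the threshold is lowered, the loop can stop at the first threshold that exceeds the budget. On a small toy set a 0.5% budget effectively permits zero false positives; the rate becomes meaningful only on corpora with many legitimate link additions.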

DOI
10.1145/2038558.2038574
Copyright/Permission Statement
© ACM 2011. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym '11), http://dx.doi.org/10.1145/2038558.2038574.
Keywords
  • Wikipedia
  • collaboration
  • collaborative security
  • information security
  • link spam
  • spam mitigation
  • reputation
  • spatio-temporal features
  • machine-learning
  • intelligent routing
Citation Information
Andrew G. West, Avantika Agrawal, Phillip Baker, Brittney Exline, et al. "Autonomous Link Spam Detection in Purely Collaborative Environments." 7th International Symposium on Wikis and Open Collaboration (WikiSym '11) (2011), pp. 91-100.
Available at: http://works.bepress.com/andrew_g_west/20/