Article
Detecting click fraud in online advertising: A data mining approach
Journal of Machine Learning Research
  • Richard OENTARYO, Singapore Management University
  • Ee Peng LIM, Singapore Management University
  • Michael FINEGOLD, Carnegie Mellon University
  • David LO, Singapore Management University
  • Feida ZHU, Singapore Management University
  • Clifton PHUA, Institute for Infocomm Research
  • Eng-Yeow CHEU, Institute for Infocomm Research
  • Ghim-Eng YAP, Institute for Infocomm Research
  • Kelvin SIM, Institute for Infocomm Research
  • Kasun PERERA, Masdar Institute of Science and Technology
  • Bijay NEUPANE, Masdar Institute of Science and Technology
  • Mustafa FAISAL, Masdar Institute of Science and Technology
  • Zeyar AUNG, Masdar Institute of Science and Technology
  • Wei Lee WOON, Masdar Institute of Science and Technology
  • Wei CHEN, Institute for Infocomm Research
  • Dhaval PATEL, Indian Institute of Technology Roorkee
  • Daniel BERRAR, Tokyo Institute of Technology
Publication Type
Journal Article
Version
acceptedVersion
Publication Date
1-2014
Abstract

Click fraud - the deliberate clicking on advertisements with no real interest in the product or service offered - is one of the most daunting problems in online advertising. Building an effective fraud detection method is thus pivotal for online advertising businesses. We organized a Fraud Detection in Mobile Advertising (FDMA) 2012 Competition, opening the opportunity for participants to work on real-world fraud data from BuzzCity Pte. Ltd., a global mobile advertising company based in Singapore. In particular, the task is to identify fraudulent publishers who generate illegitimate clicks, and distinguish them from normal publishers. The competition was held from September 1 to September 30, 2012, attracting 127 teams from more than 15 countries. The mobile advertising data are unique and complex, involving heterogeneous information, noisy patterns with missing values, and highly imbalanced class distribution. The competition results provide a comprehensive study on the usability of data mining-based fraud detection approaches in a practical setting. Our principal findings are that features derived from fine-grained time series analysis are crucial for accurate fraud detection, and that ensemble methods offer promising solutions to highly imbalanced nonlinear classification tasks with mixed variable types and noisy/missing patterns.

Keywords
  • Data mining
  • Ensemble learning
  • Feature engineering
  • Fraud detection
  • Imbalanced classification
Publisher
MIT Press
Copyright Owner and License
Authors
Creative Commons License
Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International
Comments

Submit request for dataset at https://larc.smu.edu.sg/buzzcity-mobile-advertisement-dataset

Additional URL
https://www.jmlr.org/papers/volume15/oentaryo14a/oentaryo14a.pdf
Citation Information
Richard OENTARYO, Ee Peng LIM, Michael FINEGOLD, David LO, et al. "Detecting click fraud in online advertising: A data mining approach" Journal of Machine Learning Research Vol. 15 Iss. 1 (2014) p. 99 - 140 ISSN: 1533-7928
Available at: http://works.bepress.com/david_lo/117/