Skip to main content
Article
L2AP: Fast Cosine Similarity Search With Prefix L-2 Norm Bounds
30th IEEE International Conference on Data Engineering (ICDE 2014) (2014)
  • David C. Anastasiu, University of Minnesota - Twin Cities
  • George Karypis, University of Minnesota - Twin Cities
Abstract
The All-Pairs similarity search, or self-similarity join problem, finds all pairs of vectors in a high dimensional sparse dataset with a similarity value higher than a given threshold. The problem has been classically solved using a dynamically built inverted index. The search time is reduced by early pruning of candidates using size and value-based bounds on the similarity. In the context of cosine similarity and weighted vectors, leveraging the Cauchy-Schwarz inequality, we propose new ℓ2-norm bounds for reducing the inverted index size, candidate pool size, and the number of full dot-product computations. We tighten previous candidate generation and verification bounds and introduce several new ones to further improve our algorithm's performance. Our new pruning strategies enable significant speedups over baseline approaches, most times outperforming even approximate solutions. We perform an extensive evaluation of our algorithm, L2AP, and compare against state-of-the-art exact and approximate methods, AllPairs, MMJoin, and BayesLSH, across a variety of real-world datasets and similarity thresholds.
Keywords
  • Vectors,
  • Upper bound,
  • Search problems,
  • Approximation algorithms,
  • Indexing,
  • Heuristic algorithms
Publication Date
2014
DOI
10.1109/ICDE.2014.6816700
Publisher Statement
This is the Accepted Manuscript of an article that appeared in Proceedings of the 18th ACM Conference on Information and Knowledge Management. The Version of Record is available at the following link.


Citation Information
David C. Anastasiu and George Karypis. "L2AP: Fast Cosine Similarity Search With Prefix L-2 Norm Bounds" 30th IEEE International Conference on Data Engineering (ICDE 2014) (2014) p. 784 - 795
Available at: http://works.bepress.com/david-anastasiu/9/