Contribution to Book
Cosine Approximate Nearest Neighbors
Data Science – Analytics and Applications: Proceedings of the 1st International Data Science Conference – iDSC2017 (2017)
  • David C. Anastasiu, San Jose State University
Abstract
Cosine similarity graph construction, or all-pairs similarity search, is an important kernel in many data mining and machine learning methods. Building the graph is a difficult task. Up to n² pairs of objects should be naïvely compared to solve the problem for a set of n objects. For large object sets, approximate solutions for this problem have been proposed that address the complexity of the task by retrieving most, but not necessarily all, of the nearest neighbors. We propose a novel approximate graph construction method that leverages properties of the object vectors to effectively select few comparison candidates, those that are likely to be neighbors. Furthermore, our method leverages filtering strategies recently developed for exact methods to quickly eliminate unpromising comparison candidates, leading to few overall similarity computations and increased efficiency. We compare our method against several state-of-the-art approximate and exact baselines on six real-world datasets. Our results show that our approach provides a good tradeoff between efficiency and effectiveness, showing up to 35.81x efficiency improvement over the best alternative at 0.9 recall.
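For readers unfamiliar with the problem setting, the naïve baseline and the recall measure mentioned in the abstract can be illustrated with a minimal sketch. This is not the method proposed in the chapter. It is only the brute-force comparison of all pairs that the chapter aims to avoid, and it assumes dense Python lists as vectors and a fixed neighborhood size k. The function names (cosine, exact_knn_graph, recall) are hypothetical and chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def exact_knn_graph(vectors, k):
    """Brute force: compare every object with every other object.

    Assumption: k (neighbors kept per object) is an illustrative parameter.
    """
    n = len(vectors)
    graph = []
    for i in range(n):
        sims = [(cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i]
        # Highest similarity first; ties broken by the smaller index.
        sims.sort(key=lambda t: (-t[0], t[1]))
        graph.append([j for _, j in sims[:k]])
    return graph

def recall(approx, exact):
    """Fraction of the true neighbors that an approximate graph retrieved."""
    found = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    total = sum(len(e) for e in exact)
    return found / total if total else 1.0
```

An approximate method would replace the inner loop over all j with a much smaller candidate set. Its output can then be scored against the exact graph with recall, which is how a figure such as "0.9 recall" is interpreted.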
Publication Date
2017
Editor
Peter Haber, Thomas Lampoltshammer, and Manfred Mayr
Publisher
Springer Vieweg
ISBN
978-3-658-19287-7
DOI
10.1007/978-3-658-19287-7
Citation Information
David C. Anastasiu. "Cosine Approximate Nearest Neighbors" Data Science – Analytics and Applications: Proceedings of the 1st International Data Science Conference – iDSC2017 (2017) p. 45 - 50
Available at: http://works.bepress.com/david-anastasiu/44/