Skip to main content
Unpublished Paper
Towards Theoretical Bounds for Resource-bounded Information Gathering for Correlation Clustering
(2009)
  • Pallika Kanani
  • Andrew McCallum, University of Massachusetts - Amherst
  • Ramesh Sitaraman
Abstract
Resource-bounded Information Gathering for Correlation Clustering deals with designing efficient methods for obtaining and incorporating information from external sources to improve accuracy of clustering tasks. In this paper, we formulate the problem, and some specific goals and lay the foundation for better theoretical understanding of this framework. We address the challenging problem of analytically quantifying the effect of changing a single edge weight on the partitioning of the entire graph, under some simplifying assumptions, hence demonstrating a method to calculate the expected reduction in error. Our analysis of different query selection criteria provides a formal way of comparing different heuristics. We compare the solution of our theoretical analysis with simulation results. We also estimate the probability of recovering the true partition under various query selection strategies for general random graphs and discuss some possible directions for approximation. Next, we prove a related bound under certain assumptions. We also describe some general techniques to efficiently query and select nodes for expanding graphs.
Disciplines
Publication Date
2009
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Pallika Kanani, Andrew McCallum and Ramesh Sitaraman. "Towards Theoretical Bounds for Resource-bounded Information Gathering for Correlation Clustering" (2009)
Available at: http://works.bepress.com/andrew_mccallum/73/