Unpublished Paper
Disambiguating Web Appearances of People in a Social Network
(2005)
  • Ron Bekkerman
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
You are looking for information about a particular person. A search engine returns many pages for that person's name, but which pages are about the person you care about, and which are about other people who happen to have the same name? Furthermore, if we are looking for multiple people who are related in some way, how can we best leverage this social network? This paper presents two unsupervised frameworks for solving this problem: one based on the link structure of the web pages, the other using the recently introduced Bootstrapped Information Bottleneck (BIB) clustering method. To evaluate our methods, we collected and hand-disambiguated a dataset of over 1000 web pages retrieved from Google queries on 12 personal names appearing together in someone's email folder. On this dataset our proposed methods outperform traditional agglomerative clustering by more than 20%, achieving over 80% F-measure.
Keywords
  • Information Systems
  • Information storage and retrieval
  • Information Search and Retrieval
  • Web appearance
  • name disambiguation
  • social network
  • document clustering
  • link structure
  • information bottleneck
Publication Date
2005
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Ron Bekkerman and Andrew McCallum. "Disambiguating Web Appearances of People in a Social Network" (2005)
Available at: http://works.bepress.com/andrew_mccallum/47/