Article
Measuring Importance and Query Relevance in Topic-Focused Multi-Document Summarization
Departmental Papers (CIS)
  • Surabhi Gupta, Stanford University
  • Ani Nenkova, University of Pennsylvania
  • Dan Jurafsky, Stanford University
Date of this Version
6-1-2007
Document Type
Conference Paper
Comments

Gupta, S., Nenkova, A., & Jurafsky, D., Measuring Importance and Query Relevance in Topic-Focused Multi-Document Summarization, 45th Annual Meeting of the Association for Computational Linguistics, June 2007, doi: anthology-new/P

Abstract

The increasing complexity of summarization systems makes it difficult to analyze exactly which modules make a difference in performance. We carried out a principled comparison between the two most commonly used schemes for assigning importance to words in the context of query-focused multi-document summarization: raw frequency (word probability) and log-likelihood ratio (LLR). We demonstrate that the advantages of log-likelihood ratio come from its known distributional properties, which allow for the identification of a set of words that in its entirety defines the aboutness of the input. We also find that LLR is more suitable for query-focused summarization since, unlike raw frequency, it is more sensitive to the integration of the information need defined by the user.
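
As a rough illustration of the two word-weighting schemes named in the abstract, the Python sketch below computes raw word probability and a log-likelihood ratio score against a background corpus. It uses the standard binomial form of the likelihood-ratio statistic, which is asymptotically chi-square distributed. The function names, the toy counts, and the 10.83 cutoff (the chi-square critical value at p = 0.001 with one degree of freedom) are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def word_probabilities(tokens):
    """Raw frequency scheme: p(w) = count(w) / total tokens in the input."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}


def _binomial_log_likelihood(p, k, n):
    """log of p^k * (1 - p)^(n - k), treating 0 * log(0) as 0."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1 - p)
    return ll


def llr(k1, n1, k2, n2):
    """-2 log(lambda) for a word seen k1 times in n1 input tokens
    and k2 times in n2 background tokens."""
    p1 = k1 / n1
    p2 = k2 / n2
    p = (k1 + k2) / (n1 + n2)  # pooled estimate under the null hypothesis
    return 2.0 * (
        _binomial_log_likelihood(p1, k1, n1)
        + _binomial_log_likelihood(p2, k2, n2)
        - _binomial_log_likelihood(p, k1, n1)
        - _binomial_log_likelihood(p, k2, n2)
    )


def topic_signature(input_tokens, background_tokens, cutoff=10.83):
    """Words significantly more frequent in the input than in the background.
    The cutoff is an assumed value, not one reported in the paper."""
    in_counts = Counter(input_tokens)
    bg_counts = Counter(background_tokens)
    n1, n2 = len(input_tokens), len(background_tokens)
    signature = set()
    for word, k1 in in_counts.items():
        k2 = bg_counts.get(word, 0)
        if k1 / n1 > k2 / n2 and llr(k1, n1, k2, n2) > cutoff:
            signature.add(word)
    return signature


# Toy data (hypothetical): "the" is equally common everywhere, "storm" is not.
input_tokens = ["storm"] * 20 + ["the"] * 30
background_tokens = ["the"] * 600 + ["storm"] * 1 + ["other"] * 399
signature = topic_signature(input_tokens, background_tokens)
```

In this toy setup, raw probability ranks "the" above "storm" because it is simply more frequent, while the LLR test selects only "storm", since "the" occurs at the same rate in the input and the background.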

Citation Information
Surabhi Gupta, Ani Nenkova and Dan Jurafsky. "Measuring Importance and Query Relevance in Topic-Focused Multi-Document Summarization" (2007)
Available at: http://works.bepress.com/ani_nenkova/17/