Unpublished Paper
Learning to Select Actions for Resource-bounded Information Extraction
(2011)
  • P. Kinani
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
Given a database with missing or uncertain information, our goal is to extract specific information from a large corpus, such as the Web, under limited resources. We cast the information-gathering task as a sequence of choices among alternative, resource-consuming actions, and propose a new algorithm for learning to select the best action to perform at each time step. The function that selects these actions is trained using an online, error-driven algorithm called SampleRank. We present a system that finds the faculty directory pages of top Computer Science departments in the U.S. and show that the learning-based approach accomplishes this task very efficiently under a limited action budget, obtaining approximately 90% of the overall F1 using less than 2% of the actions. When applied to the task of filling missing values in a large-scale database with millions of rows and a large number of columns, the system can obtain just the required information from the Web very efficiently.
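The idea of scoring candidate actions and spending a fixed budget on the highest-scoring ones can be illustrated with a minimal sketch. It is not the system described in the paper: the linear scoring function, the feature names, the `gain` values, and the helpers `score`, `select_actions`, and `rank_update` are all assumptions introduced here for illustration. The update is a generic pairwise, error-driven (perceptron-style) rule in the spirit of SampleRank, and the sketch scores actions once rather than re-evaluating them as the state changes at each time step.

```python
from typing import Callable, Dict, List, Sequence

Features = Dict[str, float]  # hypothetical sparse feature map for one action


def score(weights: Features, feats: Features) -> float:
    """Linear score of an action under the current weights (an assumption)."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def select_actions(
    candidates: Sequence[str],
    featurize: Callable[[str], Features],
    weights: Features,
    budget: int,
) -> List[str]:
    """Greedily pick at most `budget` actions, highest score first."""
    remaining = list(candidates)
    chosen: List[str] = []
    while remaining and len(chosen) < budget:
        best = max(remaining, key=lambda a: score(weights, featurize(a)))
        remaining.remove(best)
        chosen.append(best)
    return chosen


def rank_update(
    weights: Features,
    feats_a: Features,
    feats_b: Features,
    gain_a: float,
    gain_b: float,
    lr: float = 1.0,
) -> bool:
    """Pairwise error-driven update.

    `gain_a` and `gain_b` stand in for how much each action improves the
    training objective (hypothetical values). If the model does not rank the
    more useful action strictly higher, shift the weights toward it.
    Returns True when the weights were changed.
    """
    if gain_a == gain_b:
        return False  # the objective expresses no preference
    better, worse = (feats_a, feats_b) if gain_a > gain_b else (feats_b, feats_a)
    if score(weights, better) > score(weights, worse):
        return False  # already ranked correctly, no update
    for name, value in better.items():
        weights[name] = weights.get(name, 0.0) + lr * value
    for name, value in worse.items():
        weights[name] = weights.get(name, 0.0) - lr * value
    return True
```

In this sketch, training repeatedly compares pairs of actions and calls `rank_update` only when the model's ordering disagrees with the objective; at test time, `select_actions` spends the action budget on the candidates the learned weights score highest.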
Keywords
  • Resource-bounded Information Extraction
  • Active Information Acquisition
  • Learning Value Function
  • Missing Data
  • SampleRank
Publication Date
2011
Comments
This is the pre-published version harvested from CIIR.
Citation Information
P. Kinani and Andrew McCallum. "Learning to Select Actions for Resource-bounded Information Extraction" (2011)
Available at: http://works.bepress.com/andrew_mccallum/69/