"Learning to Select Actions for Resource-bounded Information Extraction" by P. Kinani

Selected Works of Andrew McCallum

Unpublished Paper

(2011)

P. Kinani
Andrew McCallum, University of Massachusetts - Amherst

Abstract

Given a database with missing or uncertain information, our goal is to extract specific information from a large corpus, such as the Web, using limited resources. We cast information gathering as a sequence of choices among alternative, resource-consuming actions and propose a new algorithm for learning to select the best action to perform at each time step. The function that selects these actions is trained using an online, error-driven algorithm called SampleRank. We present a system that finds the faculty directory pages of top Computer Science departments in the U.S. and show that the learning-based approach accomplishes this task very efficiently under a limited action budget, obtaining approximately 90% of the overall F1 while using fewer than 2% of the actions. When applied to the task of filling missing values in a large-scale database with millions of rows and a large number of columns, the system can obtain just the required information from the Web very efficiently.
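The abstract describes a learned function that scores candidate actions, trained online with an error-driven method (SampleRank), and then used to pick actions under a fixed budget. The sketch below illustrates that idea under simplifying assumptions that are ours, not the paper's: each action is described by a fixed dictionary of features, the scoring model is linear, the objective value of each training action (for example, whether it leads to a faculty directory page) is known, and the update is a perceptron-style pairwise correction in the spirit of SampleRank. All function names, feature names, and toy data are hypothetical.

```python
import random
from typing import Dict, List, Sequence

Features = Dict[str, float]


def score(weights: Features, feats: Features) -> float:
    """Linear model score w . f(action)."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def samplerank_update(weights: Features, better: Features, worse: Features,
                      lr: float = 1.0, margin: float = 1.0) -> bool:
    """Error-driven pairwise update (hypothetical SampleRank-style step).

    Changes the weights only when the model fails to rank the objectively
    better action above the worse one by at least `margin`.
    Returns True if an update was made.
    """
    if score(weights, better) - score(weights, worse) < margin:
        for k in set(better) | set(worse):
            weights[k] = weights.get(k, 0.0) + lr * (better.get(k, 0.0) - worse.get(k, 0.0))
        return True
    return False


def train(actions: List[Features], objective: Sequence[float],
          epochs: int = 20, seed: int = 0) -> Features:
    """Sample pairs of actions and apply pairwise updates online."""
    rng = random.Random(seed)
    weights: Features = {}
    for _ in range(epochs):
        for _ in range(len(actions)):
            a, b = rng.sample(range(len(actions)), 2)
            if objective[a] == objective[b]:
                continue  # the objective expresses no preference, so no signal
            better, worse = (a, b) if objective[a] > objective[b] else (b, a)
            samplerank_update(weights, actions[better], actions[worse])
    return weights


def select_actions(actions: List[Features], weights: Features, budget: int) -> List[int]:
    """Greedy selection: return indices of the `budget` highest-scoring actions."""
    ranked = sorted(range(len(actions)),
                    key=lambda i: score(weights, actions[i]), reverse=True)
    return ranked[:budget]


# Hypothetical toy data: candidate links to follow, labeled 1.0 if the
# link leads to a faculty directory page and 0.0 otherwise.
actions = [
    {"anchor_has_faculty": 1.0, "depth": 1.0},
    {"anchor_has_people": 1.0, "depth": 1.0},
    {"anchor_has_news": 1.0, "depth": 1.0},
    {"anchor_has_news": 1.0, "depth": 2.0},
]
objective = [1.0, 1.0, 0.0, 0.0]

w = train(actions, objective)
print(select_actions(actions, w, budget=2))
```

Because the features here are static, choosing the best action at each step reduces to taking the top `budget` actions. In a setting where an action's value depends on what has already been gathered, the features (and hence the scores) would need to be recomputed after every action.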

Keywords

Resource-bounded Information Extraction,
Active Information Acquisition,
Learning Value Function,
Missing Data,
SampleRank

Disciplines

Computer Sciences

Publication Date

2011

Comments

This is the pre-publication version harvested from CIIR.

Citation Information

P. Kinani and Andrew McCallum. "Learning to Select Actions for Resource-bounded Information Extraction" (2011)
Available at: http://works.bepress.com/andrew_mccallum/69/