Learning to Select Actions for Resource-bounded Information Extraction (2011)
Abstract
Given a database with missing or uncertain information, our goal is to extract specific information from a large corpus such as the Web under limited resources. We cast the information-gathering task as a series of alternative, resource-consuming actions to choose from, and we propose a new algorithm for learning to select the best action to perform at each time step. The function that selects these actions is trained using an online, error-driven algorithm called SampleRank. We present a system that finds the faculty directory pages of top Computer Science departments in the U.S. and show that the learning-based approach accomplishes this task very efficiently under a limited action budget, obtaining approximately 90% of the overall F1 using fewer than 2% of the actions. Applied to the task of filling missing values in a large-scale database with millions of rows and a large number of columns, the system can obtain just the required information from the Web very efficiently.
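The idea of learning a function that picks the next action under a fixed budget can be illustrated with a minimal sketch. This is not the authors' system or their SampleRank implementation: the linear scoring function, the simplified pairwise error-driven update, the toy feature vectors, and all names (`score`, `select_action`, `pairwise_update`, `run_with_budget`, `gain`) are assumptions made only for illustration.

```python
from itertools import combinations


def score(weights, features):
    """Linear value estimate for one candidate action (assumed model form)."""
    return sum(w * f for w, f in zip(weights, features))


def select_action(weights, candidates):
    """Pick the candidate action with the highest estimated value."""
    return max(candidates, key=lambda a: score(weights, a["features"]))


def pairwise_update(weights, a, b, lr=0.1):
    """Error-driven update on one pair of candidate actions.

    If the model does not rank the truly better action strictly higher,
    move the weights toward the better action's features.
    """
    if a["gain"] == b["gain"]:
        return weights  # no preference to learn from a tie
    better, worse = (a, b) if a["gain"] > b["gain"] else (b, a)
    if score(weights, better["features"]) <= score(weights, worse["features"]):
        weights = [
            w + lr * (fb - fw)
            for w, fb, fw in zip(weights, better["features"], worse["features"])
        ]
    return weights


def run_with_budget(weights, candidates, budget):
    """Greedily perform up to `budget` actions and return the total true gain.

    The true gain is read here only to evaluate the policy; the selection
    itself uses the learned scores alone.
    """
    remaining = list(candidates)
    total = 0.0
    for _ in range(min(budget, len(remaining))):
        best = select_action(weights, remaining)
        remaining.remove(best)
        total += best["gain"]
    return total


# Toy candidate actions (hypothetical): "gain" stands in for the improvement
# in the extraction objective that performing the action would yield.
candidates = [
    {"name": "B", "features": [0.0, 1.0], "gain": 0.0},
    {"name": "D", "features": [0.0, 0.0], "gain": 0.0},
    {"name": "C", "features": [0.5, 0.5], "gain": 0.5},
    {"name": "A", "features": [1.0, 0.0], "gain": 1.0},
]

untrained = [0.0, 0.0]
trained = list(untrained)
for _ in range(5):  # a few passes over all pairs of candidate actions
    for a, b in combinations(candidates, 2):
        trained = pairwise_update(trained, a, b)

gain_untrained = run_with_budget(untrained, candidates, budget=1)
gain_trained = run_with_budget(trained, candidates, budget=1)
```

On this toy data, the trained weights let the greedy policy spend a budget of one action on the most useful candidate, whereas the untrained weights simply take the first candidate in the list.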
- Resource-bounded Information Extraction
- Active Information Acquisition
- Learning Value Function
- Missing Data
Citation Information
P. Kanani and Andrew McCallum. "Learning to Select Actions for Resource-bounded Information Extraction" (2011)
Available at: http://works.bepress.com/andrew_mccallum/69/