Unpublished Paper
TaxaMiner: Improving Taxonomy Label Quality using Latent Semantic Indexing
Kno.e.sis Publications
  • Cartic Ramakrishnan, Wright State University - Main Campus
  • Christopher Thomas
  • Vipul Kashyap
  • Amit P. Sheth, Wright State University - Main Campus
Document Type
Report
Publication Date
1-1-2006
Abstract

The development of taxonomies/ontologies is a human-intensive process requiring prohibitively large resource commitments in terms of time and cost. In our previous work, we identified an experimentation framework for semi-automatic taxonomy/hierarchy generation from unstructured text. In the preliminary results presented there, the taxonomy/hierarchy quality was lower than we had anticipated. In this paper, we present two variations of our experimentation framework, viz. Latent Semantic Indexing (LSI) for document indexing and the use of term vectors to prune labels assigned to nodes in the final taxonomy/hierarchy. Using our previous taxonomy/hierarchy quality results as the baseline, we present results that demonstrate significant improvement in taxonomy/hierarchy label quality resulting from these two variations, and offer insights into the reasons for this improvement. Finally, we discuss methods for further improving taxonomy/hierarchy quality.

Comments

University of Georgia, Athens, Computer Science Department, UGA-CS-TR-04-006

Citation Information
Cartic Ramakrishnan, Christopher Thomas, Vipul Kashyap and Amit P. Sheth. "TaxaMiner: Improving Taxonomy Label Quality using Latent Semantic Indexing" (2006)
Available at: http://works.bepress.com/amit_sheth/182/