Unpublished Paper
Rapid Development of Hindi Named Entity Recognition using Conditional Random Fields and Feature Induction
(2003)
  • Wei Li
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
This paper describes our application of Conditional Random Fields (CRFs) with feature induction to a Hindi named entity recognition task. With only five days of development time and little knowledge of the language, we discover relevant features automatically by providing a large array of lexical tests and using feature induction to construct the features that most increase conditional likelihood. To reduce overfitting, we combine a Gaussian prior with early stopping based on the results of 10-fold cross-validation.
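The overfitting controls named in the abstract can be illustrated with a small sketch. It is not the authors' implementation. A binary logistic regression stands in for the CRF, because both are log-linear models trained by maximizing conditional likelihood. The Gaussian prior appears as the penalty term, the sum of w^2 / (2 * sigma2). The abstract does not say how the cross-validation results determine the stopping point, so the rule used here (pick the iteration count with the best total held-out log-likelihood) is an assumption. All function names, the step size, and the variance `sigma2` are hypothetical choices made for illustration.

```python
import math


def k_fold_splits(n_items, k):
    """Yield (train_indices, held_out_indices) for k interleaved folds."""
    indices = list(range(n_items))
    for fold in range(k):
        held_out = indices[fold::k]
        held_out_set = set(held_out)
        train = [i for i in indices if i not in held_out_set]
        yield train, held_out


def sigmoid(z):
    # Clamp the score so exp() cannot overflow and log() never sees 0.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def penalized_gradient(weights, data, sigma2):
    """Gradient of sum_i log p(y_i | x_i) - sum_j w_j^2 / (2 * sigma2)."""
    grad = [-w / sigma2 for w in weights]  # contribution of the Gaussian prior
    for x, y in data:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for j, xi in enumerate(x):
            grad[j] += (y - p) * xi
    return grad


def held_out_log_likelihood(weights, data):
    """Unpenalized conditional log-likelihood of the given examples."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        total += math.log(p if y == 1 else 1.0 - p)
    return total


def cv_stopping_iteration(data, k=10, max_iters=50, step=0.1, sigma2=10.0):
    """Return the iteration count whose held-out log-likelihood, summed
    over the k folds, is highest (assumed early-stopping rule)."""
    totals = [0.0] * (max_iters + 1)
    for train_idx, held_idx in k_fold_splits(len(data), k):
        train = [data[i] for i in train_idx]
        held = [data[i] for i in held_idx]
        weights = [0.0] * len(data[0][0])
        totals[0] += held_out_log_likelihood(weights, held)
        for it in range(1, max_iters + 1):
            grad = penalized_gradient(weights, train, sigma2)
            # Plain gradient ascent, scaled by the training-set size.
            weights = [w + step * g / len(train) for w, g in zip(weights, grad)]
            totals[it] += held_out_log_likelihood(weights, held)
    return max(range(max_iters + 1), key=lambda it: totals[it])
```

Here `data` is a list of `(feature_vector, label)` pairs with labels 0 or 1. A final model would then be trained on all the data for the returned number of iterations. Feature induction, the other half of the method, is not shown.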
Keywords
  • Artificial intelligence
  • natural language processing
  • text analysis
  • mathematics of computing
  • probabilistic algorithms
Publication Date
2003
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Wei Li and Andrew McCallum. "Rapid Development of Hindi Named Entity Recognition using Conditional Random Fields and Feature Induction" (2003)
Available at: http://works.bepress.com/andrew_mccallum/138/