Unpublished Paper
Chinese Segmentation and New Word Detection using Conditional Random Fields
(2004)
  • Fuchun Peng
  • Fangfang Feng
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
Chinese word segmentation is a difficult, important and widely-studied sequence modeling problem. This paper demonstrates the ability of linear-chain conditional random fields (CRFs) to perform robust and accurate Chinese word segmentation by providing a principled framework that easily supports the integration of domain knowledge in the form of multiple lexicons of characters and words. We also present a probabilistic new word detection method, which further improves performance. Our system is evaluated on four datasets used in a recent comprehensive Chinese word segmentation competition. State-of-the-art performance is obtained.
Publication Date
2004
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Fuchun Peng, Fangfang Feng and Andrew McCallum. "Chinese Segmentation and New Word Detection using Conditional Random Fields" (2004)
Available at: http://works.bepress.com/andrew_mccallum/43/