Unpublished Paper
Chinese Segmentation and New Word Detection using Conditional Random Fields
(2004)
Abstract
Chinese word segmentation is a difficult, important and widely studied sequence modeling problem. This paper demonstrates the ability of linear-chain conditional random fields (CRFs) to perform robust and accurate Chinese word segmentation by providing a principled framework that easily supports the integration of domain knowledge in the form of multiple lexicons of characters and words. We also present a probabilistic new word detection method, which further improves performance. Our system is evaluated on four datasets used in a recent comprehensive Chinese word segmentation competition. State-of-the-art performance is obtained.
Publication Date
2004
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Fuchun Peng, Fangfang Feng and Andrew McCallum. "Chinese Segmentation and New Word Detection using Conditional Random Fields" (2004) Available at: http://works.bepress.com/andrew_mccallum/43/