Citation: CHEN Jingwen, CHEN Jianguo, WANG Chengbin, ZHU Yueqin. Research on segmentation of geological mineral text using conditional random fields[J]. CHINA MINING MAGAZINE, 2018, 27(9): 69-74,101. DOI: 10.12075/j.issn.1004-4051.2018.09.035

    Research on segmentation of geological mineral text using conditional random fields

    Abstract: Unlike English, Chinese has no natural delimiter such as a space between words, which makes word segmentation a hard problem in Chinese information processing. Geological mineral text contains a large number of out-of-vocabulary geological terms, and no segmentation method yet handles them well. This motivated us to develop a segmenter specifically for geological mineral text that combines dictionary features with a conditional random fields (CRF) model trained on two corpora. We compare it experimentally with a general-domain segmentation method and with a CRF model trained on a single corpus. In open testing, the proposed method clearly outperforms the other methods, reaching 94.80% precision, 92.68% recall, and an F-score of 93.73%. The approach identifies out-of-vocabulary geological terms well while preserving the recognition rate for ordinary words, and it lays a foundation for natural language processing in the field of geology.
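The three reported scores can be cross-checked against one another. The sketch below assumes the F-score is the balanced F-measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is illustrative and not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the paper's F-score weights precision and recall equally.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_precision = 94.80
reported_recall = 92.68

f_value = f_score(reported_precision, reported_recall)
print(f"F-score: {f_value:.2f}%")
```

Under that assumption the result rounds to 93.73%, which agrees with the figure reported in the abstract.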

       
