Research on segmentation of geological mineral text using conditional random fields
-
Abstract
Unlike English, Chinese text has no spaces between words, which makes it difficult for machines to detect what constitutes a word. Geological mineral text contains a large number of unknown geological words, for which there is still no effective Chinese word segmentation method. This motivated us to develop a segmenter specifically for geological mineral text that combines the characteristics of a dictionary with a conditional random fields (CRFs) model. We compare it experimentally with a generic segmentation method and with a conditional random fields model that uses only a single corpus. The results show that this approach goes a long way towards solving the Chinese word segmentation problem, achieving 94.80% precision, 92.68% recall, and a 93.73% F-score. We explore CRFs for Chinese word segmentation of geological mineral text and find that the approach identifies unknown geological words well while maintaining the recognition rate of ordinary words. This work lays a foundation for natural language processing in the field of geology.
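As a reading aid for the reported scores, the following minimal sketch shows how word-level precision, recall, and F-score are commonly computed for a segmenter. It assumes the F-score is the balanced harmonic mean of precision and recall, which the abstract does not state explicitly but which agrees with the reported figures. The function names and the toy sentence are hypothetical and are not taken from the paper.

```python
def to_spans(words):
    """Convert a segmented sentence into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def f_score(precision, recall):
    """Balanced F-score (harmonic mean); assumed here, not confirmed by the source."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(gold_words, pred_words):
    """Word-level precision, recall, and F-score for one sentence.

    A predicted word counts as correct only if both of its boundaries
    match a word in the gold segmentation.
    """
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


# Hypothetical toy example: the prediction wrongly splits one geological term.
gold = ["花岗岩", "中", "含有", "黄铁矿"]
pred = ["花岗", "岩", "中", "含有", "黄铁矿"]
p, r, f = evaluate(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f={f:.4f}")

# Consistency check on the figures reported in the abstract.
print(round(f_score(94.80, 92.68), 2))
```

Under this assumption, a precision of 94.80% and a recall of 92.68% give an F-score that rounds to 93.73%, matching the value reported above.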
-
-