Research on intelligent classification of coal mine safety hazards based on the CoalBERT model

李泽荃; 陈豪斌; 赵嘉良; 刘飞翔

doi:10.12075/j.issn.1004-4051.20242260

Abstract: Coal mining enterprises have widely adopted information platforms to optimize operations and improve production safety, and these platforms have allowed them to accumulate and manage large volumes of data. However, the semantic complexity and domain specificity of coal mine safety hazard texts make this data difficult to use effectively. To address this, and based on the 2022 edition of the Coal Mine Safety Regulations, this paper defines 17 first-level and 109 second-level hazard categories as the label system for coal mine safety hazard data and builds a systematic hazard classification method. The CoalBERT pre-trained language model is used to classify hazard texts under this two-level category system, with the BERT model as the baseline for comparison. By introducing domain-term masked language modeling (DP-MLM) and sentence order prediction (SOP) tasks, CoalBERT addresses two limitations of general-purpose models in the coal mine safety domain: insufficient semantic understanding of professional terms such as “bolt support” and “gas drainage”, and a limited grasp of the logical coherence of hazard descriptions. The model is trained in the PyTorch framework with a set learning rate and number of iterations, and is optimized with stochastic gradient descent. The results show that CoalBERT performs well on the coal mine safety hazard classification task. In the first-level classification experiment, CoalBERT exceeds BERT in precision, recall, and F₁ score by 0.34%, 0.21%, and 0.27%, respectively. In the second-level classification experiment, the F₁ score of CoalBERT improves by 3% to 5% on average, and the best classification result reaches 97.75%. The advantage is most pronounced in categories such as “mine construction”, “rockburst prevention and control”, and “hazard inspection”. These results indicate that a hazard classification algorithm based on the CoalBERT pre-trained language model can serve as an important auxiliary tool for coal mine safety management, supporting better safety management and accident prevention.
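The abstract names domain-term masked language modeling (DP-MLM) but does not spell out how tokens are chosen for masking. One plausible reading, offered only as a minimal sketch and not as the authors' implementation, is whole-term masking: every character of a matched domain term is masked together, while other characters are masked at a lower rate. The function names, the tiny lexicon, the character-level tokenization, and the probabilities `term_prob` and `token_prob` below are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"


def find_term_spans(tokens, lexicon):
    """Return non-overlapping (start, end) spans of lexicon terms.

    Assumes character-level tokens (as in Chinese BERT vocabularies) and
    prefers the longest matching term at each position.
    """
    spans = []
    max_len = max((len(term) for term in lexicon), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if "".join(tokens[i:i + n]) in lexicon:
                spans.append((i, i + n))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return spans


def mask_domain_terms(tokens, lexicon, term_prob=0.5, token_prob=0.15, seed=None):
    """Mask whole domain terms with term_prob, other tokens with token_prob.

    Returns (masked_tokens, labels); labels hold the original token at
    masked positions and None elsewhere. Both rates are hypothetical.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    in_term = set()
    for start, end in find_term_spans(tokens, lexicon):
        in_term.update(range(start, end))
        if rng.random() < term_prob:
            for k in range(start, end):
                labels[k] = tokens[k]
                masked[k] = MASK
    for k, tok in enumerate(tokens):
        if k not in in_term and rng.random() < token_prob:
            labels[k] = tok
            masked[k] = MASK
    return masked, labels


# Hypothetical lexicon built from the two example terms quoted in the abstract.
LEXICON = {"锚杆支护", "瓦斯抽采"}
example_tokens = list("巷道锚杆支护不到位")
example_masked, example_labels = mask_domain_terms(
    example_tokens, LEXICON, term_prob=1.0, token_prob=0.0
)
print(example_masked)
# ['巷', '道', '[MASK]', '[MASK]', '[MASK]', '[MASK]', '不', '到', '位']
```

Masking a term as a unit forces the model to recover it from the surrounding context instead of from its own neighboring characters, which is the kind of term-level semantic understanding the abstract says general-purpose models lack. A production pipeline would also need the usual random-replacement and keep-original variants of BERT-style masking, which this sketch omits.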

Research on intelligent classification of coal mine safety hazards based on the CoalBERT model