Abstract:
Helium is a non-renewable strategic resource essential to high-tech fields such as aerospace, medical imaging, semiconductor manufacturing, and quantum computing. China’s helium industry chain faces two challenges, resource scarcity and limited access to information, which together have become a critical bottleneck for independent innovation in related sectors. At the resource level, China’s dependence on imported helium has long been high, making helium a typical “chokehold” resource; moreover, the helium content of major domestic natural gas fields is generally below 0.1%, well under the 0.3% threshold for economic extraction, so extraction is rarely economically viable and cannot readily support large-scale domestic production. At the information level, over 90% of the core literature on the helium industry is published in English, so domestic researchers face high literature-acquisition costs, inconsistent translation of technical terms, and inefficient knowledge extraction, all of which hinder independent innovation. To address these challenges, this study builds on the open-source ChatGLM series of large language models (LLMs) from Zhipu AI to develop a domain-specific large model and an intelligent knowledge graph system for the entire helium industry chain in China. The study constructs a dedicated helium dataset covering 1990 to 2024 that integrates six categories of data sources (global journal papers, patents, industry reports, technical standards, monographs, and news reports) and exceeds 1.2 million document segments, providing high-quality data for model training and knowledge graph construction.
Using entity recognition and relation extraction, the system converts unstructured knowledge into “entity-relation-entity” triples stored in a Neo4j database, forming a comprehensive and dynamic knowledge graph of the whole helium industry chain. In parallel, a Retrieval-Augmented Generation (RAG) architecture combined with an expert annotation mechanism lets the system deliver fact-prioritized knowledge services, substantially reducing the “hallucination” problem common in domain-specific large models. On a professional test set of 500 questions, the system performs well: knowledge question-answering accuracy reaches 86.4%, the F1 score for entity recognition is 89.7%, and relation extraction accuracy is 84.3%. All metrics improve significantly over general-purpose models, and the average response time is only 1.2 seconds, which meets real-time research needs. Practical applications indicate that the system efficiently supports whole-chain research tasks such as helium resource assessment, extraction process optimization, recycling technology pathway design, and industry policy analysis. The domain-specific anti-hallucination knowledge service system built in this study provides technical support for overcoming technological bottlenecks and strengthening independent innovation in China’s helium industry. It also offers a replicable technical paradigm for the intelligent development of other strategic resource sectors, and it is of strategic importance for ensuring the autonomous controllability and high-quality development of China’s helium industry chain.
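
The fact-prioritized idea described above can be illustrated with a minimal sketch. It is not the system's implementation: the `Triple` class, the `retrieve` and `build_prompt` helpers, the keyword-matching retrieval, and the example facts are all hypothetical, and an in-memory list stands in for the Neo4j database so that the example is self-contained. The point is only that an answer is grounded in retrieved “entity-relation-entity” triples, and that no prompt is produced when nothing relevant is found.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Triple:
    """One "entity-relation-entity" fact (hypothetical schema)."""
    head: str
    relation: str
    tail: str


# Illustrative facts only; not taken from the study's knowledge graph.
TRIPLES = [
    Triple("helium", "extracted_from", "natural gas"),
    Triple("helium", "used_in", "MRI magnet cooling"),
    Triple("cryogenic distillation", "is_a", "helium extraction process"),
]


def retrieve(question: str, triples: List[Triple]) -> List[Triple]:
    """Return triples whose head or tail entity is mentioned in the question.

    Plain substring matching is an assumption made for brevity; a real
    system would query the graph database and/or a vector index.
    """
    q = question.lower()
    return [t for t in triples if t.head.lower() in q or t.tail.lower() in q]


def build_prompt(question: str, triples: List[Triple]) -> Optional[str]:
    """Build a prompt grounded in retrieved facts, or None if none match.

    Returning None lets the caller decline to answer instead of letting
    the language model guess, which is the fact-prioritized behavior.
    """
    facts = retrieve(question, triples)
    if not facts:
        return None
    fact_lines = "\n".join(f"- {t.head} {t.relation} {t.tail}" for t in facts)
    return (
        "Answer using only these facts:\n"
        f"{fact_lines}\n"
        f"Question: {question}"
    )
```

Calling `build_prompt("Where is helium extracted from?", TRIPLES)` yields a prompt that lists the two helium triples, while a question that mentions no known entity returns `None`, so the caller can respond that the knowledge base has no supporting fact.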