Named Entity Recognition of Fresh Egg Supply Chain Based on BERT-CRF Architecture

Fund Project: Science and Technology Program of the Beijing Municipal Science and Technology Commission (Z191100008619007)

    Abstract:

    Recognizing named entities in raw text is the first step in constructing a knowledge graph of the fresh egg supply chain, and it supports a variety of downstream natural language processing tasks. The task organizes the information in the supply chain and provides a basis for food safety traceability. Raw text about the fresh egg supply chain contains diverse entity names, and feature information is extracted from it insufficiently. To address these problems, a named entity recognition (NER) method based on a bidirectional encoder representations from transformers-conditional random field (BERT-CRF) model is proposed for entities of predefined types. The method labels sequences with the BIO (begin, internal, other) scheme and takes character vectors and position vectors as input. The pre-trained BERT language model extracts the global features of the input sequence, and a CRF layer added at the end of the model introduces hard constraints. The proposed model was compared with three other NER models on a self-constructed dataset of 12,810 text samples covering 5 categories and 21 subcategories. The BERT-CRF model outperformed the other three models, with precision, recall and F1-score of 91.82%, 90.44% and 91.01%, respectively. Finally, an experiment on a self-constructed recipe dataset from the food domain showed that the model has a certain generalization ability.
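
    The BIO labeling scheme and the kind of hard constraint a CRF layer can impose may be illustrated with a minimal sketch. The abstract does not list the 21 entity subcategories or describe how the CRF is parameterized, so the entity types PRODUCT and LOCATION, the sample sentence, and the function names below are all hypothetical. A trained CRF would normally express the constraint through transition scores; the boolean check here only stands in for that idea and is not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical entity span: (start index, end index exclusive, entity type).
Span = Tuple[int, int, str]


def bio_encode(chars: List[str], spans: List[Span]) -> List[str]:
    """Assign one BIO tag per character; spans are assumed not to overlap."""
    tags = ["O"] * len(chars)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def is_valid_transition(prev: str, curr: str) -> bool:
    """Return False for tag pairs that BIO forbids, e.g. O -> I-X or B-X -> I-Y."""
    if curr.startswith("I-"):
        etype = curr[2:]
        return prev in (f"B-{etype}", f"I-{etype}")
    return True


def bio_decode(tags: List[str]) -> List[Span]:
    """Recover entity spans from a tag sequence, skipping ill-formed I- tags."""
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        if start is not None and tag == f"I-{etype}":
            continue  # still inside the current entity
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


if __name__ == "__main__":
    # Hypothetical sentence: "eggs are produced in Beijing".
    chars = list("鸡蛋产自北京")
    spans = [(0, 2, "PRODUCT"), (4, 6, "LOCATION")]
    tags = bio_encode(chars, spans)
    for ch, tag in zip(chars, tags):
        print(ch, tag)
    print(bio_decode(tags))
```

    Tagging each character separately matches the character-level input described in the abstract. Forbidding transitions such as O followed by I-PRODUCT is one common way a CRF layer keeps the predicted label sequence well formed.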

Cite this article

LIU Xinliang, ZHANG Mengqi, GU Qing, REN Yanzhao, HE Dongbin, GAO Wanlin. Named Entity Recognition of Fresh Egg Supply Chain Based on BERT-CRF Architecture[J]. Transactions of the Chinese Society for Agricultural Machinery, 2021, 52(S0): 519-525.

History
  • Received: 2021-07-17
  • Published online: 2021-11-10
  • Publication date: 2021-12-10