Kiwifruit Planting Entity Recognition Based on Character and Word Information Fusion
Author: LI Shuqin, ZHANG Mingmei, LIU Bin

Fund Project:

National Key Research and Development Program of China (2020YFD1100601), Shaanxi Province Key Research and Development Program (2021NY-138), and Fundamental Research Funds for the Central Universities (2452019064)

    Abstract:

    Entity words in the kiwifruit planting domain are highly complex, which leads to low precision in named entity recognition. To address this problem, an entity recognition method for kiwifruit planting that fuses character-level and word-level semantic information is proposed. The method takes BiGRU-CRF as its base model and fuses word-level and character-level information. At the word level, word set information is introduced, and multi-head self-attention (MHA) is used to adjust the weights of different words within each word set; an attention mechanism is also used to ignore unreliable word sets and focus on the important ones, which improves recognition. At the character level, the unsupervised bidirectional encoder representations from transformers (BERT) pre-trained model is introduced to enhance the semantic representation of characters. Experiments were conducted on a self-built kiwifruit planting corpus containing 12477 annotated samples and 7 entity categories. The results show that the F1 value of the proposed model is 1.58 percentage points higher than that of the SoftLexicon model. In addition, on the public ResumeNER dataset, the proposed model achieved the best result among the compared models, which include Lattice-LSTM and WC-LSTM, with an F1 value of 96.17%, indicating that the model has a certain generalization ability.
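
    The word-level step in the abstract (collect the lexicon words that cover each character, then let attention decide how much each word counts) can be illustrated with a small sketch. It is not the authors' implementation. The B/M/E/S grouping is an assumption borrowed from SoftLexicon, the baseline the abstract compares against; the abstract itself does not say how the word sets are built. The attention here is a single-head simplification of the multi-head self-attention the paper describes, and the names `build_word_sets` and `attention_pool`, the toy lexicon, and the toy vectors are all hypothetical.

```python
import math


def build_word_sets(sentence, lexicon, max_len=4):
    """For each character, collect the lexicon words that contain it,
    grouped by the character's position inside the word:
    B (begin), M (middle), E (end), S (single-character word).
    The B/M/E/S grouping is an assumption taken from SoftLexicon."""
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                sets[start]["S"].add(word)
                continue
            sets[start]["B"].add(word)
            sets[end - 1]["E"].add(word)
            for mid in range(start + 1, end - 1):
                sets[mid]["M"].add(word)
    return sets


def attention_pool(query, vectors):
    """Single-head scaled dot-product attention over one word set.
    Returns the softmax weights and the weighted sum of the word vectors.
    This simplifies the multi-head self-attention named in the abstract."""
    d = len(query)
    scores = [
        sum(q * v for q, v in zip(query, vec)) / math.sqrt(d) for vec in vectors
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(d)]
    return weights, pooled


if __name__ == "__main__":
    # Toy sentence: "猕猴桃溃疡病" (kiwifruit canker disease); toy lexicon.
    toy_lexicon = {"猕猴桃", "溃疡", "溃疡病", "病"}
    for char, groups in zip("猕猴桃溃疡病", build_word_sets("猕猴桃溃疡病", toy_lexicon)):
        print(char, {tag: sorted(words) for tag, words in groups.items()})

    # Toy 2-dimensional embeddings for the two words in one character's B set.
    weights, pooled = attention_pool([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(weights, pooled)
```

    In a full model the pooled word-set vectors would be concatenated with the BERT character representation before the BiGRU-CRF layers. That wiring is omitted here because the abstract does not specify it.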

Cite this article

LI Shuqin, ZHANG Mingmei, LIU Bin. Kiwifruit Planting Entity Recognition Based on Character and Word Information Fusion[J]. Transactions of the Chinese Society for Agricultural Machinery, 2022, 53(12): 323-331.

History
  • Received: 2021-12-19
  • Published online: 2022-01-24