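Named Entity Recognition of Chinese Agricultural Text Based on Attention Mechanism (基于注意力机制的农业文本命名实体识别)
Authors: ZHAO Pengfei, ZHAO Chunjiang, WU Huarui, WANG Wei

Fund Project:

National Natural Science Foundation of China (61871041), National Key Research and Development Program of China (2019YFD1101105), and Beijing Municipal Science and Technology Project (Z191100004019007)
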
    Abstract:

    Agricultural named entity recognition (NER) is a fundamental task for natural language processing in the agricultural field and a key step in constructing agricultural knowledge graphs and intelligent question answering systems. Traditional agricultural NER methods based on the conditional random field (CRF) model rely on large numbers of hand-crafted feature templates, extract feature information insufficiently, and cannot resolve the inconsistent tagging caused by the diversity of entity names. To address these problems, an attention-based framework for agricultural text, Att-BiLSTM-CRF, was proposed. Firstly, the continuous bag of words (CBOW) model was used to pre-train character embeddings on a large unlabeled agricultural corpus, which enriched the character-level feature information and alleviated the impact of word segmentation accuracy on model performance. Then, a document-level attention mechanism was introduced to capture similarity information between entities in the text, so as to keep the tags of an entity consistent across different contexts. Finally, a model framework suited to agricultural NER was built on the bi-directional long short-term memory (BiLSTM) network and the CRF model. A total of 4604 agricultural texts were selected for recognition experiments on four entity types: diseases, pests, pesticides and crop varieties. The results showed that the model effectively identified entities in agricultural text and alleviated the problem of inconsistent entity tagging, reaching a precision of 93.48%, a recall of 90.60% and an F-score of 92.01% on the agricultural corpus. Compared with three other methods (the LSTM, LSTM-CRF and BiLSTM-CRF models), Att-BiLSTM-CRF achieved higher precision on corpora of different sizes, showing a clear performance advantage.
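
    The document-level attention step can be illustrated with a minimal sketch. The abstract does not state the scoring function or how the attended vectors are combined with the BiLSTM output, so the scaled dot-product score, the function names (`softmax`, `dot`, `document_attention`) and the toy vectors below are all assumptions chosen for illustration, not the authors' implementation. The sketch only shows the general idea: each character position attends over every position in the same document, so repeated mentions of one entity receive similar context vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def document_attention(hidden):
    """Attend from every position to all positions of one document.

    hidden: list of per-character vectors (stand-ins for BiLSTM outputs).
    Returns a list of (weights, context) pairs, one per position.
    The scaled dot-product score is an assumption, not taken from the paper.
    """
    dim = len(hidden[0])
    scale = math.sqrt(dim)
    results = []
    for query in hidden:
        # Similarity of this position to every position in the document.
        weights = softmax([dot(query, key) / scale for key in hidden])
        # Weighted sum of all hidden vectors gives the document-level context.
        context = [
            sum(w * vec[k] for w, vec in zip(weights, hidden))
            for k in range(dim)
        ]
        results.append((weights, context))
    return results


# Hypothetical toy document: positions 0 and 2 stand for the same entity mention.
toy_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
for position, (weights, context) in enumerate(document_attention(toy_states)):
    print(position, [round(w, 3) for w in weights], [round(c, 3) for c in context])
```

    In this toy input, positions 0 and 2 weight each other more heavily than position 1, which is the behaviour that would help a downstream tagger assign the same label to both mentions. A tagger could, for example, concatenate each context vector with the corresponding hidden vector before the CRF layer; that combination is likewise an assumption here.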

Cite this article

ZHAO Pengfei, ZHAO Chunjiang, WU Huarui, WANG Wei. Named Entity Recognition of Chinese Agricultural Text Based on Attention Mechanism[J]. Transactions of the Chinese Society for Agricultural Machinery (农业机械学报), 2021, 52(1): 185-192.

History
  • Received: 2020-04-13
  • Published online: 2021-01-10