Recognition of Chinese Agricultural Diseases and Pests Named Entity with Joint Radical-embedding and Self-attention Mechanism

Fund Project: National Key Research and Development Program of China (2016YFD0300710)

Abstract:

Chinese named entity recognition in the agricultural diseases and pests domain (CNER-ADP) plays an important role in agricultural natural language processing tasks such as relation extraction, agricultural knowledge graph construction, and agricultural question answering. It still faces three problems: inherent semantic information is lost, local contextual features are easily overlooked, and the ability to capture long-distance dependencies is insufficient, all of which lower accuracy and robustness. To address these problems, a Chinese named entity recognition model for agricultural diseases and pests that jointly uses radical embedding and self-attention (RS-ADP) was proposed, with agricultural disease and pest texts as the research object. Firstly, the model integrated radical embedding into character embedding as input to enrich semantic information; three feature extraction strategies were designed for the radical embedding, namely a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and CNN-BiLSTM. Secondly, multiple CNN layers with different window sizes were used to extract local contextual features at different scales. Thirdly, on top of the global sequence features extracted by the BiLSTM layer, a self-attention mechanism was used to further enhance the ability of the model to capture longer-distance dependencies. Finally, a conditional random field (CRF) was used to jointly identify entity boundaries and assign entity categories. Experiments were carried out on a self-built corpus of agricultural diseases and pests, named AgCNER, which contained 11 categories and 24715 annotated samples. Over the whole dataset, the RS-ADP model achieved precision, recall, and F1 values of 94.16%, 94.47%, and 94.32%, respectively. For specific categories, it achieved F1 values of 95.81%, 97.76%, and 97.23% on easily identifiable entities such as crop, disease, and pest, respectively, and still maintained F1 values above 86% on entities that are difficult to recognize, such as weed and pathogen.

The experimental results showed that the proposed model could effectively recognize named entities of agricultural diseases and pests without feature engineering. Moreover, its recognition accuracy was higher than that of the other models compared, and it showed a certain degree of generalization ability.
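The abstract reports entity-level precision, recall, and F1 but does not state the tagging scheme or the matching rule. The following is a minimal sketch, assuming BIO tags and exact-match scoring (a predicted entity counts only if both its boundaries and its category equal the gold annotation). The function names `bio_to_spans` and `prf` and the labels `CROP` and `DISEASE` are illustrative and are not taken from the paper.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect (start, end, type) spans from one BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1 over a list of sentences."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)  # boundaries and type must both match
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Toy example: the predicted disease entity is one character too short.
gold = [["B-CROP", "I-CROP", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE"]]
pred = [["B-CROP", "I-CROP", "O", "B-DISEASE", "I-DISEASE", "O"]]
print(prf(gold, pred))  # (0.5, 0.5, 0.5)
```

In the toy example the truncated disease entity is scored as wrong under exact matching, so only the crop entity counts as correct. As a rough consistency check, and assuming the overall F1 is the harmonic mean of the overall precision and recall, 2 × 94.16 × 94.47 / (94.16 + 94.47) ≈ 94.31, which agrees with the reported 94.32% up to rounding of the inputs.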

Cite this article

GUO Xuchao, TANG Zhan, DIAO Lei, ZHOU Han, LI Lin. Recognition of Chinese Agricultural Diseases and Pests Named Entity with Joint Radical-embedding and Self-attention Mechanism[J]. Transactions of the Chinese Society for Agricultural Machinery, 2020, 51(s2): 335-343.

History
  • Received: 2020-08-01
  • Published online: 2020-12-10
  • Publication date: 2020-12-10