Joint Extraction Method of Entity and Relation in Maize Breeding Based on BERT-CRF and Word Embedding
Authors: LI Shuqin, PANG Wenting

Fund Project:

National Key Research and Development Program of China (2020YFD1100601), Key Research and Development Program of Shaanxi Province (2021NY-138), and Fundamental Research Funds for the Central Universities (2452019064)

    Abstract:

    Maize breeding text contains overlapping triples and entities that are expressed in many different ways. To address these problems, a joint entity and relation extraction method for maize breeding is proposed, based on a BERT-CRF (bidirectional encoder representations from transformers-conditional random field) model with embedded lexical information. First, the expression characteristics of the maize breeding corpus are analyzed, and a tagging strategy is adopted that labels entity boundary, relation type, and entity position information simultaneously. Second, a BERT-CRF model with embedded lexical information is constructed for training and prediction. A self-built maize breeding knowledge dictionary supplies lexical information that is embedded in BERT, so that character features and lexical features are fused and the semantic representation ability of the model is enhanced. The CRF layer outputs the globally optimal label sequence, and an entity and relation triple matching algorithm (ERTM) matches and maps the labels to obtain triples. Finally, to verify the effectiveness of the method, experiments were carried out on a maize breeding data set. The precision, recall and F1 value of the proposed model were 91.84%, 95.84% and 93.80%, respectively, all higher than those of existing models. The method can effectively extract maize breeding domain knowledge and provide a data basis for constructing a maize breeding knowledge graph and for other downstream tasks.
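The tagging and matching steps summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' ERTM algorithm, whose details are not given on this page. The tag format `B-REL-1` / `I-REL-2` (boundary, relation type, entity position, with 1 for head and 2 for tail), the nearest-span pairing rule, the relation name `TRAIT`, and the sample tokens are all assumptions made for illustration. The last function computes precision, recall and F1 over sets of triples in the usual way.

```python
from collections import defaultdict


def extract_spans(tokens, tags):
    """Collect (relation, role, start, text) spans from hypothetical B-/I- tags."""
    spans, current = [], None
    for i, (token, tag) in enumerate(zip(tokens, tags)):
        parts = tag.split("-")
        if len(parts) == 3 and parts[0] == "B":
            # A new entity starts; close any span that is still open.
            if current:
                spans.append(current)
            current = [parts[1], parts[2], i, token]
        elif (len(parts) == 3 and parts[0] == "I" and current
              and current[0] == parts[1] and current[1] == parts[2]):
            # Continuation of the open span with the same relation and role.
            current[3] += " " + token
        else:
            # "O" tag or an inconsistent "I" tag ends the open span.
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [tuple(span) for span in spans]


def match_triples(tokens, tags):
    """Pair heads (role "1") and tails (role "2") of the same relation.

    Assumed heuristic: every span is linked to the nearest span of the
    opposite role, so one head can be matched with several tails.
    """
    by_relation = defaultdict(lambda: {"1": [], "2": []})
    for relation, role, start, text in extract_spans(tokens, tags):
        if role in ("1", "2"):
            by_relation[relation][role].append((start, text))
    triples = set()
    for relation, roles in by_relation.items():
        heads, tails = roles["1"], roles["2"]
        if not heads or not tails:
            continue
        for start, head in heads:
            _, tail = min(tails, key=lambda span: abs(span[0] - start))
            triples.add((head, relation, tail))
        for start, tail in tails:
            _, head = min(heads, key=lambda span: abs(span[0] - start))
            triples.add((head, relation, tail))
    return triples


def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 over sets of (head, relation, tail) triples."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sentence: one head entity shared by two tail entities.
tokens = ["VarietyA", "has", "high", "yield", "and", "lodging", "resistance"]
tags = ["B-TRAIT-1", "O", "B-TRAIT-2", "I-TRAIT-2", "O", "B-TRAIT-2", "I-TRAIT-2"]
triples = match_triples(tokens, tags)
```

Under these assumptions, `triples` holds the two triples that share the head `VarietyA`. As a consistency check on the reported figures, the harmonic mean of 91.84% precision and 95.84% recall is about 93.80%, which agrees with the F1 value stated in the abstract.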

Cite this article

LI Shuqin, PANG Wenting. Joint Extraction Method of Entity and Relation in Maize Breeding Based on BERT-CRF and Word Embedding[J]. Transactions of the Chinese Society for Agricultural Machinery (农业机械学报), 2023, 54(11): 286-294.

History
  • Received: 2023-04-28
  • Published online: 2023-11-10