Tobacco Interrogative Intent Recognition Based on SBERT-Attention-LDA and ML-LSTM Feature Fusion
Authors: ZHU Bo, LI Kui, QIU Lan, LI Bo
Fund project: Key Project of the Yunnan Provincial Tobacco Company of China National Tobacco Corporation (2021530000241012)

    Abstract:

    Question intent recognition in the tobacco domain suffers from sparse features, abundant domain terminology, and difficulty in capturing semantic associations within the text. To address these problems, a question intent recognition method was proposed based on the feature fusion of SBERT-Attention-LDA (sentence-bidirectional encoder representations from transformers, attention mechanism, latent Dirichlet allocation) and ML-LSTM (multi-layer long short-term memory). The method first dynamically encoded each tobacco question with the SBERT pre-trained model combined with the attention mechanism, converting it into a semantically rich feature vector; at the same time, the LDA model was used to build a topic vector for the question and capture its topic information. A modified model-level feature fusion method, ML-LSTM, then produced a joint feature representation carrying more complete and accurate question semantics. Finally, a 3-channel convolutional neural network (CNN) extracted the hidden features in the hybrid semantic representation of the question and fed them into a fully connected layer and a Softmax function to classify the question intent. Experiments were carried out on a dataset collected from authoritative tobacco industry websites. The results showed that the proposed method clearly improved precision, recall and F1 value over several other deep learning methods combined with attention mechanisms. Compared with the BERT and ERNIE (enhanced representation through knowledge integration and embedding)-CNN models, the F1 value improved by 2.07 and 2.88 percentage points, respectively, which supported the construction of a Q&A system for tobacco websites.
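
The abstract reports precision, recall and F1 value, and states the gains in percentage points. As a minimal sketch of how such figures are commonly computed for a multi-class intent classifier, the snippet below uses macro averaging. The averaging scheme is an assumption, since the abstract does not say how the metrics were aggregated; the function names `macro_prf` and `percentage_point_gain`, the intent labels and the toy predictions are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_prf(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all observed labels.

    Macro averaging is an assumption here; the abstract does not state
    which averaging scheme the authors used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # pred was assigned to a question of another intent
            fn[truth] += 1  # the true intent was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def percentage_point_gain(f1_new, f1_base):
    """Absolute difference of two F1 scores (fractions in [0, 1]) in percentage points."""
    return (f1_new - f1_base) * 100.0


# Hypothetical intent labels and predictions, for illustration only.
y_true = ["price", "price", "policy", "policy", "planting", "planting"]
y_pred = ["price", "policy", "policy", "policy", "planting", "price"]

precision, recall, f1 = macro_prf(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

A gain in percentage points is an absolute difference between two scores, not a relative change: under this reading, an F1 value rising from a hypothetical 0.88 to 0.90 is a gain of 2 percentage points.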

Cite this article

ZHU Bo, LI Kui, QIU Lan, LI Bo. Tobacco Interrogative Intent Recognition Based on SBERT-Attention-LDA and ML-LSTM Feature Fusion[J]. Transactions of the Chinese Society for Agricultural Machinery, 2024, 55(5): 273-281.

History
  • Received: 2023-12-26
  • Published online: 2024-03-08