Rice Knowledge Text Classification Based on Deep Convolution Neural Network
Authors: FENG Shuai, XU Tongyu, ZHOU Yuncheng, ZHAO Dongxue, JIN Ning, WANG Haoriqin

Fund Project:

National Key Research and Development Program of China (2018YFD0300309)

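    Summary:

    To address inaccurate text feature extraction and the degradation of classification performance as network depth increases, a rice knowledge text classification method based on a deep convolutional neural network is proposed. Given the characteristics of rice knowledge text, the Word2Vec method was used for text vectorization and compared with the One-Hot, TF-IDF and Hashing methods. Word2Vec gave relatively high classification accuracy, 86.44%, and effectively mitigated the sparsity and incomplete information of text vector representations. By adjusting the structure of the residual network (ResNet), the influence of the residual module structure and the network depth on the classification network was analyzed, and nine classification network structures were constructed. Tests showed that the network with a 4-layer residual module structure had good feature extraction accuracy, with a Top-1 accuracy of 99.79%. Taking the selected 4-layer residual module structure as the basic structure and using a capsule network (CapsNet) in place of its pooling layer, a rice knowledge text classification model was designed. Comparison with six text classification models, namely FastText, BiLSTM, Atten-BiGRU, RCNN, DPCNN and TextCNN, showed that the designed model can accurately classify rice knowledge texts of different sample sizes and levels of complexity: its precision, recall and F1 value were no less than 95.17%, 95.83% and 95.50%, respectively, and its accuracy was 98.62%. The model achieves accurate and efficient rice knowledge text classification and meets practical application requirements.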
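    The per-class precision, recall and F1 figures and the overall accuracy reported above follow standard definitions, which the minimal sketch below works through. The labels and predictions are hypothetical and are not data from the paper, and the helper name `per_class_metrics` is an assumption made for illustration.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return ({label: (precision, recall, f1)}, overall accuracy)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[t] += 1  # the true class t was missed
    scores = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    accuracy = sum(tp.values()) / len(y_true)
    return scores, accuracy


# Hypothetical labels for four rice knowledge categories (not the paper's data).
y_true = ["disease", "disease", "pest", "pest", "weed", "cultivation"]
y_pred = ["disease", "pest", "pest", "pest", "weed", "cultivation"]
scores, accuracy = per_class_metrics(y_true, y_pred)
```

    A statement such as "precision no less than 95.17%" then corresponds to the minimum of the per-class precision values, while accuracy is computed once over all samples.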

    Abstract:

    Extracting data on rice weeds, pests, diseases and cultivation management from agricultural text is a typical text classification problem, and it underpins key text information extraction, text data mining and agricultural intelligent question answering. Chinese texts, and agricultural texts in particular, are characterized by redundant, sparse and poorly standardized data. Deep learning can automatically extract the key features of text, and the resulting models adapt and transfer well. Therefore, to address the degradation of classification performance caused by inaccurate text feature extraction and by increasing network depth, a rice knowledge text classification method oriented to question answering systems was proposed. The Python Scrapy framework was used to collect Chinese text data on rice pests, weeds, and cultivation and management from sources such as the experts online system of Hownet and a planting question-and-answer website, and these data served as training and test samples. Jieba was applied to the rice knowledge text for word segmentation, and useless symbols and stop words were removed. Because Chinese segmentation results depend heavily on the segmentation lexicon, a rice-related corpus was constructed on the basis of the Sogou agricultural corpus to improve segmentation precision and reduce wrong and missed segmentations. This corpus extended the basic Jieba segmentation dictionary and improved the recognition of specialized terms for rice diseases, insect pests, weeds and pesticides, and cultivation and management.
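The cleaning step that follows segmentation can be sketched roughly as below. The sketch assumes the sentence has already been segmented (for example by Jieba with the extended lexicon); the stop-word list, the sample tokens and the helper name `clean_tokens` are hypothetical, since the abstract does not give the actual lists.

```python
import re

# Hypothetical stop-word list; the paper's actual list is not given.
STOP_WORDS = {"的", "了", "是", "吗"}

# Matches tokens made only of punctuation, symbols or whitespace.
# For str patterns \w is Unicode-aware, so Chinese characters are kept.
SYMBOL_ONLY = re.compile(r"^[\W_]+$")


def clean_tokens(tokens):
    """Drop symbol-only tokens and stop words from a segmented sentence."""
    return [
        t for t in tokens
        if t not in STOP_WORDS and not SYMBOL_ONLY.match(t)
    ]


# Assumed segmenter output for the question "水稻稻瘟病的防治方法是什么?"
segmented = ["水稻", "稻瘟病", "的", "防治", "方法", "是", "什么", "?"]
cleaned = clean_tokens(segmented)
```

Domain terms such as "稻瘟病" (rice blast) survive as single tokens only if the segmentation lexicon contains them, which is the motivation for extending the dictionary with a rice-related corpus.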
Word2Vec was used to vectorize the text data and was compared with the One-Hot, TF-IDF and Hashing methods; Word2Vec effectively addressed typical problems of text vectors such as sparsity and incomplete information. Based on the basic structure of ResNet, nine rice knowledge text classification networks were constructed by varying the residual module design and the network depth. The test results indicated that a network with a 4-layer residual module structure had good feature extraction accuracy, with a Top-1 accuracy of 99.79%. In a convolutional neural network, the pooling layer performs down-sampling, which discards some of the relative position information of text phrases and thus reduces classification accuracy. Therefore, the selected 4-layer residual module structure was taken as the basic structure, CapsNet was used to replace the pooling layer, and a rice knowledge text classification model, referred to as RIC-Net, was designed. Comparison with six text classification models, namely FastText, BiLSTM, Atten-BiGRU, RCNN, DPCNN and TextCNN, showed that the designed model classified rice knowledge texts of different sample sizes and levels of complexity precisely: its precision, recall and F1 value were no less than 95.17%, 95.83% and 95.50%, respectively, and its accuracy reached 98.62%. The model achieves accurate and efficient classification of rice knowledge text and meets practical application requirements.
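
The claim that pooling discards relative position information can be illustrated with a small sketch. It uses plain Python and made-up activations, and it assumes max-over-time pooling as the pooling operation; it is not the paper's network or RIC-Net itself.

```python
def max_over_time(feature_map):
    """Max pooling over the sequence axis: one value per channel."""
    return [max(channel) for channel in zip(*feature_map)]


# Hypothetical feature maps: rows are token positions, columns are channels.
phrase_a = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.3]]
phrase_b = [[0.1, 0.3], [0.2, 0.8], [0.9, 0.1]]  # same rows, reversed order

pooled_a = max_over_time(phrase_a)
pooled_b = max_over_time(phrase_b)

# The two inputs differ in word order, yet pooling maps them to one vector.
same_after_pooling = pooled_a == pooled_b
```

Here the two inputs contain the same activations in a different order, and the pooled vectors are identical, so any later layer cannot tell the two orderings apart. Replacing the pooling layer with a capsule layer is the abstract's stated way of retaining this kind of positional information.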

Cite this article

FENG Shuai, XU Tongyu, ZHOU Yuncheng, ZHAO Dongxue, JIN Ning, WANG Haoriqin. Rice Knowledge Text Classification Based on Deep Convolution Neural Network[J]. Transactions of the Chinese Society for Agricultural Machinery,2021,52(3):257-264.

History
  • Received: 2020-06-13
  • Published online: 2021-03-10