Automatic Extraction Method of Hot Words Based on Agricultural Network Information Classification

Fund Project:

National High Technology Research and Development Program of China (863 Program) (2013AA102306) and National Key Technology Research and Development Program of China during the 12th Five-Year Plan Period (2012BAD35B06)

    Abstract:

    With the rapid development of the Internet, online information, including agricultural information, is growing quickly. Extracting hot words from this mass of information is important for monitoring and analyzing agricultural public opinion. Hot word extraction has been studied before, but existing methods are poorly targeted and cannot meet the personalized needs of user groups in different agricultural industries. This paper therefore proposes a method for automatically extracting hot words based on the classification of agricultural network information. First, the text corpus is classified with a multi-label classification algorithm, and a separate corpus is built for each category. Second, hot word candidates are extracted for each category with an information-entropy-based method. Third, the heat of each candidate is calculated with a method based on change over time. Finally, the candidates are sorted by heat, and the hot words are taken from the sorted results. In an experiment on 15354 texts collected from agricultural websites, the method automatically obtained the hot words for a specified time period with an extraction accuracy above 0.9. The results indicate that the proposed method extracts agricultural hot words with high quality and helps different agricultural user groups find and analyze hot topics in their industries.
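
    The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: texts are taken to be already tokenized and already carry their multi-label classification results, candidate words are filtered by left/right neighbor entropy, and heat is a simple blend of current document frequency and its change from the previous time window. The function names, the thresholds, and the `alpha` weight are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def build_corpora(labeled_docs):
    """Group tokenized texts by category; a multi-label text joins every category it has."""
    corpora = defaultdict(list)
    for tokens, labels in labeled_docs:
        for label in labels:
            corpora[label].append(tokens)
    return dict(corpora)


def neighbor_entropy(neighbors):
    """Shannon entropy (in bits) of a Counter of neighboring tokens."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def candidate_words(docs, min_entropy=1.0, min_count=2):
    """Assumed filter: keep words whose left and right contexts are both diverse."""
    left, right, count = defaultdict(Counter), defaultdict(Counter), Counter()
    for tokens in docs:
        padded = ["<s>"] + list(tokens) + ["</s>"]  # sentence boundary markers
        for i in range(1, len(padded) - 1):
            word = padded[i]
            count[word] += 1
            left[word][padded[i - 1]] += 1
            right[word][padded[i + 1]] += 1
    return {
        w for w, c in count.items()
        if c >= min_count
        and min(neighbor_entropy(left[w]), neighbor_entropy(right[w])) >= min_entropy
    }


def heat(word, current_docs, previous_docs, alpha=0.5):
    """Assumed heat: blend of current document frequency and its change over time."""
    def rate(docs):
        return sum(word in d for d in docs) / len(docs) if docs else 0.0

    cur, prev = rate(current_docs), rate(previous_docs)
    return alpha * cur + (1 - alpha) * (cur - prev)


def rank_hot_words(candidates, current_docs, previous_docs, top_k=10):
    """Sort candidates by heat (descending) and return the top_k words."""
    ranked = sorted(candidates, key=lambda w: heat(w, current_docs, previous_docs), reverse=True)
    return ranked[:top_k]
```

    In this sketch, `build_corpora` would be called once on the classified texts, and `candidate_words` and `rank_hot_words` would then be run separately on each category's corpus, so that each user group gets its own hot word list.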

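Cite this article

段青玲, 张璐, 刘怡然, 王沙沙. Automatic Extraction Method of Hot Words Based on Agricultural Network Information Classification [J]. 农业机械学报 (Transactions of the Chinese Society for Agricultural Machinery), 2018, 49(7): 160-167.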

History
  • Received: 2017-12-15
  • Published online: 2018-07-10