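Classification of Rice Phenomics Entities Based on Stacking Ensemble Learning
Authors: YUAN Peisen, YANG Chenglin, SONG Yuhong, ZHAI Zhaoyu, XU Huanliang

Fund Project: National Natural Science Foundation of China (61502236), the Fundamental Research Funds for the Central Universities (KJQN201651), and the Undergraduate Innovation and Entrepreneurship Training Program (S20190025)
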
    Abstract:

    As rice phenomics research develops, comprehensively analyzing, mining, and applying rice phenomics data has become increasingly important. To integrate knowledge related to rice phenomics and to explore the factors affecting rice phenotypic traits, a rice phenomics knowledge graph system was implemented, consisting of functional modules for entity recognition, entity query, relation query, and knowledge visualization. Rice phenomics data were collected from the National Rice Data Center website with a distributed crawler framework, and the Hudong Baike (互动百科) online encyclopedia served as an auxiliary data source. The dataset was preprocessed with TF-IDF combined with a latent semantic model (latent semantic indexing), and the entities were classified and labeled manually before machine learning methods were trained and tested. For entity classification, a classifier combination model based on two-stage stacking ensemble learning was studied, with K-nearest neighbor, support vector machine, random forest, and gradient boosting decision tree as base classifiers. The stacking model showed good multi-class classification ability across the different categories of rice phenomics data. On the imbalanced rice phenomics dataset, it achieved the best performance compared with the four base classifiers: the F1 score for Gene entities reached 90.47%, and the overall accuracy was 80.55%, on average 6.78 percentage points higher than the accuracy of the four base classifiers.
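
The processing chain named in the abstract (TF-IDF, a latent semantic projection, then a two-stage stacking ensemble over K-nearest neighbor, support vector machine, random forest, and gradient boosting decision tree) might be sketched with scikit-learn as below. This is an illustration under stated assumptions, not the authors' implementation: the toy vocabularies, the "Trait" and "Environment" labels, the logistic-regression second-stage learner, and all hyperparameters are hypothetical choices; only the "Gene" category name and the four base classifiers come from the abstract.

```python
import random

from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy vocabularies standing in for the crawled entity text.
# Only the "Gene" label name appears in the abstract; the rest is assumed.
TOY_VOCAB = {
    "Gene": ["gene", "locus", "allele", "mutant", "chromosome", "expression"],
    "Trait": ["height", "tiller", "grain", "panicle", "yield", "leaf"],
    "Environment": ["drought", "salinity", "temperature", "soil", "light", "flood"],
}


def make_toy_corpus(n_per_class=30, seed=0):
    """Generate short synthetic documents, one label per document."""
    rng = random.Random(seed)
    texts, labels = [], []
    for label, words in TOY_VOCAB.items():
        for _ in range(n_per_class):
            texts.append(" ".join(rng.choices(words, k=5)))
            labels.append(label)
    return texts, labels


def build_model():
    """TF-IDF -> truncated SVD (latent semantic projection) -> stacking."""
    base_learners = [
        ("knn", KNeighborsClassifier(n_neighbors=3)),
        ("svm", SVC()),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gbdt", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ]
    # Stage 1: the base learners produce out-of-fold predictions (cv=3).
    # Stage 2: a meta-learner (assumed here to be logistic regression)
    # is trained on those predictions.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )
    return make_pipeline(
        TfidfVectorizer(),
        TruncatedSVD(n_components=5, random_state=0),
        stack,
    )


def run_demo():
    """Train on the toy corpus; return overall accuracy and per-class F1."""
    texts, labels = make_toy_corpus()
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=0
    )
    model = build_model().fit(x_train, y_train)
    pred = model.predict(x_test)
    classes = sorted(TOY_VOCAB)
    scores = f1_score(y_test, pred, labels=classes, average=None)
    return accuracy_score(y_test, pred), dict(zip(classes, scores))


if __name__ == "__main__":
    accuracy, per_class_f1 = run_demo()
    print("overall accuracy:", accuracy)
    print("per-class F1:", per_class_f1)
```

Reporting per-class F1 next to overall accuracy mirrors how the abstract presents its results, which matters on an imbalanced dataset where accuracy alone can hide weak minority classes. The numbers this toy corpus produces say nothing about the reported 90.47% and 80.55%.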

Cite this article

YUAN Peisen, YANG Chenglin, SONG Yuhong, ZHAI Zhaoyu, XU Huanliang. Classification of Rice Phenomics Entities Based on Stacking Ensemble Learning[J]. Transactions of the Chinese Society for Agricultural Machinery (农业机械学报), 2019, 50(11): 144-152.

History
  • Received: 2019-07-11
  • Published online: 2019-11-10