YUAN Peisen, YANG Chenglin, SONG Yuhong, ZHAI Zhaoyu, XU Huanliang. Classification of Rice Phenomics Entities Based on Stacking Ensemble Learning[J]. Transactions of the Chinese Society for Agricultural Machinery, 2019, 50(11): 144-152.
Classification of Rice Phenomics Entities Based on Stacking Ensemble Learning
Received: 2019-07-11
DOI:10.6041/j.issn.1000-1298.2019.11.016
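Keywords (translated from the Chinese): rice phenomics; entity classification; stacking ensemble learning; knowledge graph; latent semantic model
Funding: National Natural Science Foundation of China (61502236), Fundamental Research Funds for the Central Universities (KJQN201651), and College Student Innovation and Entrepreneurship Training Program (S20190025)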
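Author  Affiliation
YUAN Peisen  Nanjing Agricultural University
YANG Chenglin  Nanjing Agricultural University
SONG Yuhong  Nanjing Agricultural University
ZHAI Zhaoyu  Technical University of Madrid
XU Huanliang  Nanjing Agricultural University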
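Abstract (translated from the Chinese): To integrate knowledge related to rice phenomics and systematically build a rice phenomics knowledge graph, a rice phenomics dataset was collected from the National Rice Data Center website with a distributed crawler framework, with the Hudong Baike (interactive encyclopedia) website as an auxiliary data source. The data were preprocessed with TF-IDF combined with a latent semantic model, and the rice phenomics entities were manually classified and labeled. For entity classification, a classifier combination model based on two-stage stacking ensemble learning was studied. It combines K-nearest neighbor, support vector machine, random forest and gradient boosting decision tree learners to improve classification performance on rice phenomics entity data. The results show that the stacked model has good multi-class classification ability across the different categories of rice phenomics data. On the unbalanced rice phenomics dataset, the combined model gave the best classification results: the F1 for the Gene category was 90.47% and the overall accuracy reached 80.55%, on average 6.78 percentage points higher than the accuracy of the four base classifiers (support vector machine, K-nearest neighbor, random forest and gradient boosting decision tree).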
YUAN Peisen  YANG Chenglin  SONG Yuhong  ZHAI Zhaoyu  XU Huanliang
Nanjing Agricultural University, Nanjing Agricultural University, Nanjing Agricultural University, Technical University of Madrid, and Nanjing Agricultural University
Key Words: rice phenomics; entity classification; stacking ensemble learning; knowledge graph; latent semantic indexing
Abstract: As rice phenomics research develops, comprehensively analyzing, mining and applying rice phenomics data has become increasingly important. To integrate knowledge related to rice phenomics and explore the factors affecting rice phenotypic traits, a rice phenomics knowledge graph system was implemented. The system consists of functional modules for entity recognition, entity query, relation query and knowledge visualization. Rice phenomics data were collected from the National Rice Data Center website with a distributed web crawler, and the interactive encyclopedia website was used as an auxiliary data source. The dataset was first preprocessed with TF-IDF and latent semantic indexing and then manually classified and labeled, after which machine learning approaches were applied for training and testing. Entity classification was studied with a stacking ensemble that combines base classifiers: K-nearest neighbor, support vector machine, random forest and gradient boosting decision tree. The stacking ensemble classified the different types of rice phenomics entities well. On the unbalanced rice phenomics entities, the proposed method outperformed the support vector machine, K-nearest neighbor, random forest and gradient boosting decision tree algorithms, with the F1-measure for Gene entities reaching 90.47%. The overall accuracy was 80.55%, on average 6.78 percentage points higher than that of the four base classifiers.
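
The pipeline described in the abstract (TF-IDF weighting, latent semantic indexing, then a two-stage stacking ensemble over four base classifiers) could be sketched roughly as follows with scikit-learn. This is a minimal illustration under several assumptions and not the authors' implementation: the toy corpus and the labels `Trait` and `Environment` are invented for the example (only `Gene` is named in the abstract), the logistic-regression meta-learner is a guess because the abstract does not name the second-stage model, and all hyperparameters (`n_topics`, `n_neighbors`, `cv=2`, and so on) are placeholders. `TruncatedSVD` applied to TF-IDF vectors stands in for latent semantic indexing.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus: short entity descriptions with made-up labels.
texts = [
    "gene locus on chromosome encodes protein regulating expression",
    "cloned gene encodes kinase protein expressed in root",
    "mutant gene allele alters protein expression in leaf",
    "gene on chromosome controls transcription factor expression",
    "plant height measured at maturity in centimeters",
    "grain length and grain width measured after harvest",
    "tiller number counted per plant at heading stage",
    "panicle length measured per plant at maturity",
    "drought stress and high temperature during growing season",
    "soil salinity and low temperature stress in field",
    "nitrogen fertilizer and irrigation water in paddy field",
    "flooding stress and soil nitrogen during growing season",
]
labels = ["Gene"] * 4 + ["Trait"] * 4 + ["Environment"] * 4


def build_model(n_topics=5, seed=0):
    """TF-IDF -> truncated SVD (LSI) -> stacking over four base learners."""
    # Stage 1: the four base classifiers named in the abstract.
    base_learners = [
        ("knn", KNeighborsClassifier(n_neighbors=3)),
        ("svm", SVC(kernel="linear", random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        ("gbdt", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
    ]
    # Stage 2: a meta-learner trained on out-of-fold base predictions.
    # Logistic regression is an assumption, not taken from the paper.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=2,  # small only because the toy corpus is tiny
    )
    return make_pipeline(
        TfidfVectorizer(),
        TruncatedSVD(n_components=n_topics, random_state=seed),
        stack,
    )


model = build_model()
model.fit(texts, labels)
predictions = model.predict(["gene encodes protein on chromosome"])
```

On a real, unbalanced dataset one would hold out a test split and report per-class F1 alongside overall accuracy, for instance with `sklearn.metrics.classification_report`, since accuracy alone can hide poor performance on rare classes.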

Transactions of the Chinese Society for Agricultural Machinery (CSAM), supervised by the China Association for Science and Technology (CAST) and sponsored by CSAM and the Chinese Academy of Agricultural Mechanization Science (CAAMS), started publication in 1957. It is the earliest Chinese-language interdisciplinary journal combining agriculture and engineering. It closely follows the development of agricultural engineering disciplines, and its published papers represent the highest academic level of agricultural engineering in China. Nearly 8,000 papers have been published to date. Around 3,000 papers are submitted to the journal each year, of which only around 600 are accepted. Transactions of the CSAM covers a wide range of topics, including agricultural machinery, irrigation, electronics, robotics, agro-products engineering, biological energy, and agricultural structures and environment. The journal is indexed by many internationally well-known index systems, such as EI Compendex, CA and CSA.
