Near-infrared Analysis of Fishmeal Protein Based on Random Forest

Fund Project:

Supported by the National Natural Science Foundation of China (11226219, 61164020) and the Guangxi Natural Science Foundation (2014GXNSFBA118023)

    Abstract:

    Random forest (RF) regression was used to determine the protein content of feed fishmeal from near-infrared (NIR) spectra. Because RF models are inherently random, candidate models were optimized by tuning two key parameters: the number of decision trees (ntree) and the number of split variables (nsv). The decrease in the Gini coefficient (G) was taken as the measure of each NIR wavelength variable's modeling importance and was used to select informative wavelengths for fishmeal protein analysis, with the aim of improving the accuracy of the quantitative models. On statistical grounds, an equivalent optimal model with relatively low computational complexity was selected. The selected RF model builds 471 decision trees and randomly draws 103 wavelength variables for node splitting as the trees grow; 52 informative NIR wavelengths were then selected according to the average decrease in G before and after node splitting, taken over the trees in the forest. This equivalent optimal model gave a root mean square error (RMSEv) of 3.970% and a correlation coefficient (Rv) of 0.943 on the validation set. Evaluated on independent prediction samples excluded from the modeling process, it gave an RMSEp of 5.271% and an Rp of 0.906. These results indicate that RF regression combined with G-based wavelength selection can effectively improve the predictive ability of NIR spectroscopy for the quantitative determination of fishmeal protein.
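The workflow described in the abstract (grow ntree trees, draw nsv random variables at each node, rank wavelengths by their average impurity decrease, then recalibrate on the top-ranked wavelengths) can be illustrated with a small pure-Python sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the synthetic "spectra", and the toy settings (20 trees, 4 split variables, 3 selected wavelengths instead of 471, 103 and 52). It also uses the decrease in the sum of squared errors as the node impurity measure, a common choice for regression trees, in place of the Gini-coefficient decrease named in the abstract.

```python
import math
import random

def sse(values):
    """Sum of squared deviations from the mean (regression node impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(X, y, rows, feats):
    """Return (gain, feature, threshold) with the largest impurity decrease."""
    parent = sse([y[i] for i in rows])
    best = (0.0, None, None)
    for f in feats:
        levels = sorted({X[i][f] for i in rows})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            gain = parent - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, f, t)
    return best

def grow(X, y, rows, nsv, importance, rng, depth=0, max_depth=3):
    """Grow one tree; each node only sees nsv randomly drawn variables."""
    mean = sum(y[i] for i in rows) / len(rows)
    if depth >= max_depth or len(rows) < 4:
        return mean  # leaf: predict the mean response
    gain, f, t = best_split(X, y, rows, rng.sample(range(len(X[0])), nsv))
    if f is None:
        return mean
    importance[f] += gain  # accumulate the impurity decrease per variable
    left = [i for i in rows if X[i][f] <= t]
    right = [i for i in rows if X[i][f] > t]
    return (f, t,
            grow(X, y, left, nsv, importance, rng, depth + 1, max_depth),
            grow(X, y, right, nsv, importance, rng, depth + 1, max_depth))

def fit_forest(X, y, ntree, nsv, rng):
    """Fit ntree bootstrapped trees; return them with the mean decrease per variable."""
    importance = [0.0] * len(X[0])
    trees = []
    for _ in range(ntree):
        rows = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        trees.append(grow(X, y, rows, nsv, importance, rng))
    return trees, [v / ntree for v in importance]

def predict(trees, x):
    """Average the leaf values reached by x over all trees."""
    total = 0.0
    for node in trees:
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if x[f] <= t else right
        total += node
    return total / len(trees)

# Synthetic stand-in for spectra: only "wavelengths" 2 and 7 carry signal.
rng = random.Random(0)
n_wavelengths = 12

def sample():
    x = [rng.random() for _ in range(n_wavelengths)]
    return x, 3.0 * x[2] - 2.0 * x[7] + rng.gauss(0, 0.05)

train = [sample() for _ in range(40)]
test = [sample() for _ in range(20)]
X_train, y_train = [s[0] for s in train], [s[1] for s in train]
X_test, y_test = [s[0] for s in test], [s[1] for s in test]

# Step 1: full-spectrum forest, used only to rank the wavelengths.
_, importance = fit_forest(X_train, y_train, ntree=20, nsv=4, rng=rng)
selected = sorted(range(n_wavelengths), key=lambda j: -importance[j])[:3]

# Step 2: recalibrate on the selected wavelengths, then test on held-out samples.
X_sel = [[x[j] for j in selected] for x in X_train]
trees_sel, _ = fit_forest(X_sel, y_train, ntree=20, nsv=2, rng=rng)
pred = [predict(trees_sel, [x[j] for j in selected]) for x in X_test]

rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y_test)) / len(y_test))
mp, mt = sum(pred) / len(pred), sum(y_test) / len(y_test)
cov = sum((p - mp) * (t - mt) for p, t in zip(pred, y_test))
r = cov / math.sqrt(sum((p - mp) ** 2 for p in pred) * sum((t - mt) ** 2 for t in y_test))
print(selected, round(rmse, 3), round(r, 3))
```

In practice a library implementation would normally replace the hand-written trees. The sketch only aims to show where ntree, nsv and the per-variable importance ranking enter the procedure, and how the prediction-set RMSE and correlation coefficient are computed.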

Cite this article

Chen Huazhou, Chen Fu, Shi Kai, Feng Quanxi. Near-infrared Analysis of Fishmeal Protein Based on Random Forest[J]. Transactions of the Chinese Society for Agricultural Machinery, 2015, 46(5): 233-238.

History
  • Received: 2014-08-06
  • Published online: 2015-05-10