Inversion of Cadmium Concentration in Agricultural Soil Based on the SGA-RF Algorithm
Fund Project: Key Program of the National Natural Science Foundation of China (41731280) and National Natural Science Foundation of China (11701310)
    Abstract (Chinese version):

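    In hyperspectral detection of heavy metals in agricultural soil, the high dimensionality and high redundancy of the near-infrared spectra associated with soil cadmium seriously degrade the accuracy and stability of hyperspectral inversion models. To address this problem, this paper proposes a Spearman correlation analysis-based genetic random forest feature selection algorithm (SGA-RF). First, the algorithm applies a Spearman-correlation-based pre-selection method to the initial feature set, removing a large number of redundant bands and retaining the characteristic bands most strongly correlated with cadmium. Second, in the feature refinement stage, a random forest-based fitness evaluation method is proposed. It combines the strong global search ability of the genetic algorithm with the high inversion ability of random forest, improves the ability to distinguish between similar individuals, and yields an optimal characteristic band subset with minimal redundancy and maximal discriminability. To verify the effectiveness of the proposed algorithm, 124 representative soil samples from the Dagu River Basin in Qingdao were selected as experimental objects. SGA-RF reduced the original 2051 bands to the 37 most representative sensitive bands, and the resulting model was compared with models built using existing feature selection algorithms. Experimental results show that combining this feature selection method with a random forest regression model gives a low root mean square error of prediction (0.0601), a high correlation coefficient (0.9502) and a high residual predictive deviation (2.03). As an important step in the quantitative inversion of agricultural soil cadmium concentration by visible/near-infrared spectroscopy, SGA-RF achieves good inversion performance with few sensitive bands and can provide a theoretical basis for monitoring soil heavy metal pollution.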
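    The first (pre-selection) stage ranks every band by its Spearman correlation with the measured cadmium concentration. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how many bands are kept, so a hypothetical absolute-correlation threshold (`threshold`) is used, and the function names and toy data are invented for illustration. It scores only relevance to the target and does not model redundancy between bands.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # constant column: treat as uncorrelated
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def preselect_bands(
    spectra: Sequence[Sequence[float]], target: Sequence[float], threshold: float
) -> List[Tuple[int, float]]:
    """Keep bands with |rho| >= threshold (hypothetical rule), strongest first.

    spectra: one row per soil sample, one value per band.
    target:  measured Cd concentration per sample.
    """
    scored = []
    for band in range(len(spectra[0])):
        column = [sample[band] for sample in spectra]
        rho = spearman(column, target)
        if abs(rho) >= threshold:
            scored.append((band, rho))
    scored.sort(key=lambda t: abs(t[1]), reverse=True)  # stable for ties
    return scored


if __name__ == "__main__":
    # Toy data (made up): band 0 rises with Cd, band 1 falls, band 2 is noise.
    target = [0.1, 0.2, 0.3, 0.4, 0.5]
    spectra = [[0.50, 0.9, 0.3], [0.52, 0.8, 0.1], [0.55, 0.7, 0.4],
               [0.60, 0.6, 0.2], [0.61, 0.5, 0.3]]
    print(preselect_bands(spectra, target, threshold=0.5))
    # → [(0, 1.0), (1, -1.0)]
```

    Because Spearman's rho works on ranks, it captures monotonic but non-linear band/concentration relationships, and negatively correlated bands are kept as long as their absolute correlation is high.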

    Abstract (English version):

    In the field of hyperspectral detection of heavy metal pollution in agricultural soils, the accuracy and stability of hyperspectral inversion models for soil cadmium are seriously affected by the high dimensionality and high redundancy of visible/NIR spectra. To solve this problem, a Spearman's rank correlation analysis-based genetic algorithm using random forest (SGA-RF) was proposed to select characteristic wavelengths from hyperspectral data. In the first feature selection stage, a Spearman correlation analysis-based method was applied to remove redundancy among all spectral features and retain the characteristic wavelengths most relevant to cadmium content. In the second stage, a new fitness function based on random forest was proposed, combining the strong global search ability of the genetic algorithm with the high inversion ability of random forest. Evaluating individuals with this fitness function improved the ability to distinguish between similar individuals, and an optimal spectral feature subset with minimum redundancy and maximum discriminability was obtained. To verify the validity of the proposed algorithm, a total of 124 representative soil samples collected from the Dagu River Basin were used. The optimal feature subset, containing 37 sensitive wavelengths, was used to build a soil available cadmium content inversion model, and its performance was compared with that of existing feature selection methods. Results indicated that the proposed method selected the fewest wavelength features while achieving a lower root mean square error of prediction (0.0601), a higher correlation coefficient (0.9502) and a higher residual predictive deviation (2.03). As an important step in the quantitative inversion of cadmium concentration from visible/NIR spectra, this research could provide a theoretical basis for monitoring soil heavy metal pollution.
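    The second (refinement) stage is a genetic algorithm whose chromosomes encode band subsets. The sketch below is a minimal, generic GA wrapper under stated assumptions: each bit keeps or drops one pre-selected band, and the operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are illustrative choices, not taken from the paper. The abstract does not give the exact random-forest fitness formula, so the fitness is a caller-supplied function where higher is better; `toy_fitness` is a hypothetical stand-in used only to make the example runnable.

```python
import random
from typing import Callable, List, Sequence, Tuple


def ga_select(
    n_bands: int,
    fitness: Callable[[Sequence[int]], float],
    pop_size: int = 30,
    generations: int = 50,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.02,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Binary-chromosome GA: bit i == 1 keeps band i. Returns (bands, fitness)."""
    rng = random.Random(seed)

    def random_chromosome() -> List[int]:
        c = [rng.randint(0, 1) for _ in range(n_bands)]
        if not any(c):  # never start from an empty band subset
            c[rng.randrange(n_bands)] = 1
        return c

    def score(c: Sequence[int]) -> float:
        selected = [i for i, bit in enumerate(c) if bit]
        return fitness(selected) if selected else float("-inf")

    def tournament(pop: List[List[int]], scores: List[float]) -> List[int]:
        k = min(3, len(pop))
        winner = max(rng.sample(range(len(pop)), k), key=lambda i: scores[i])
        return pop[winner]

    population = [random_chromosome() for _ in range(pop_size)]
    best_c, best_s = population[0][:], score(population[0])

    for _ in range(generations):
        scores = [score(c) for c in population]
        for c, s in zip(population, scores):
            if s > best_s:
                best_c, best_s = c[:], s
        next_pop = [best_c[:]]  # elitism: the best subset survives unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(population, scores), tournament(population, scores)
            if n_bands > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bands)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        population = next_pop

    for c in population:  # score the final generation as well
        s = score(c)
        if s > best_s:
            best_c, best_s = c[:], s
    return [i for i, bit in enumerate(best_c) if bit], best_s


# Hypothetical stand-in for the paper's random-forest fitness: reward three
# "informative" bands and penalise subset size.
INFORMATIVE = {2, 5, 7}


def toy_fitness(selected: Sequence[int]) -> float:
    return len(INFORMATIVE.intersection(selected)) - 0.1 * len(selected)


if __name__ == "__main__":
    bands, fit = ga_select(n_bands=10, fitness=toy_fitness, seed=1)
    print(bands, fit)
```

    To plug in a random-forest fitness in the spirit of the paper, one option (an assumption, not the authors' formula) is to return the mean of scikit-learn's `cross_val_score(RandomForestRegressor(), X[:, selected], y, scoring="neg_root_mean_squared_error")`, so that subsets with lower cross-validated RMSE score higher. Elitism keeps the best subset found so far, so the returned fitness never decreases across generations.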

Cite this article:

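WANG Xuanhui, CHEN Jianyi, ZHENG Xilai, ZHU Cheng, WANG Xuanli, SHAN Chunzhi. Inversion of Cadmium Content in Agriculture Soil Based on SGA-RF Algorithm[J]. Transactions of the Chinese Society for Agricultural Machinery, 2018, 49(10): 261-269.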

History
  • Received: 2018-04-12
  • Published online: 2018-10-10