Chen Huazhou, Chen Fu, Shi Kai, Feng Quanxi. Near-infrared Analysis of Fishmeal Protein Based on Random Forest[J]. Transactions of the Chinese Society for Agricultural Machinery, 2015, 46(5): 233-238.
Near-infrared Analysis of Fishmeal Protein Based on Random Forest
Received: 2014-08-06
DOI:10.6041/j.issn.1000-1298.2015.05.033
Chinese keywords (translated):  fishmeal protein  near-infrared spectroscopy  random forest  Gini coefficient  wavelength selection
Funding: National Natural Science Foundation of China (11226219, 61164020) and Guangxi Natural Science Foundation (2014GXNSFBA118023)
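Author  Affiliation
Chen Huazhou  Guilin University of Technology
Chen Fu  Shanghai Su-Pro Bio-tech Co., Ltd.
Shi Kai  Guilin University of Technology
Feng Quanxi  Guilin University of Technology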
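Chinese abstract (translated): Based on near-infrared (NIR) spectroscopy, random forest (RF) regression was used to determine the protein content of feed fishmeal. Given the randomness of the RF model, models were optimized by tuning the number of decision trees (ntree) and the number of split variables (nsv). The decrease in the Gini coefficient (G) was used to judge the modeling importance of each NIR wavelength variable, and informative wavelengths were then selected for NIR analysis of fishmeal protein in order to improve the accuracy of NIR quantitative analysis. Following statistical principles, an equivalent optimal model with lower computational complexity was chosen. The selected RF model built 471 decision trees and required 103 randomly chosen wavelength variables for tree-node splitting; in addition, 52 informative NIR wavelengths were selected for calibration by computing the mean decrease in G before and after node splitting. The resulting equivalent optimal calibration model had a root mean square error of calibration of 3.970% and a calibration correlation coefficient of 0.943. When the optimal RF model was tested on an independent prediction set, the root mean square error of prediction was 5.271% and the prediction correlation coefficient was 0.906, indicating that RF regression combined with G-based wavelength selection can effectively improve the predictive ability of NIR spectroscopy for quantifying fishmeal protein.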
Chen Huazhou  Chen Fu  Shi Kai  Feng Quanxi
Guilin University of Technology; Shanghai Su-Pro Bio-tech Co., Ltd.; Guilin University of Technology; Guilin University of Technology
Key Words: Fishmeal protein  Near-infrared spectroscopy  Random forest  Gini coefficient  Wavelength selection
Abstract: A random forest (RF) regression algorithm was used to determine the protein content of fishmeal samples by near-infrared (NIR) spectrometry. Given the randomness of the RF method, the models were optimized by tuning two key modeling parameters: the number of decision trees (ntree) and the number of split variables (nsv). The decrease in the Gini coefficient (G) was taken as the indicator of the modeling importance of the NIR variables and was used to select informative wavelengths for NIR analysis of fishmeal, with the aim of improving the accuracy of the quantitative models. Following statistical theory, an equivalent optimal model with relatively low computational complexity was selected. The optimized RF model constructed 471 decision trees and randomly selected 103 wavelength variables for node splitting as the trees grew. In addition, 52 informative NIR wavelengths were selected according to the average decrease in G over the trees in the forest. The equivalent optimal RF model gave a root mean square error (RMSEv) of 3.970% and a correlation coefficient (Rv) of 0.943 on the validation set. The optimized model was further evaluated on prediction samples excluded from the modeling process, giving an RMSEp of 5.271% and an Rp of 0.906. The results show that RF regression combined with G-based wavelength selection is feasible and effective for improving the NIR predictive ability for quantitative determination of fishmeal protein.
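The workflow summarized in the abstract (fit a forest, rank wavelengths by mean impurity decrease, refit on the top-ranked wavelengths, then report RMSE and R on held-out samples) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's RandomForestRegressor, whose regression trees split on squared-error reduction rather than a Gini coefficient, so `feature_importances_` serves only as a stand-in for the G-decrease ranking. The spectra are synthetic, the function names are hypothetical, and capping nsv at the number of retained wavelengths in the refit is an assumption made for the sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def corr(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def make_synthetic_spectra(n_samples=120, n_wavelengths=300, n_informative=10, seed=0):
    """Synthetic stand-in for NIR spectra (NOT real fishmeal data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_wavelengths))
    informative = rng.choice(n_wavelengths, size=n_informative, replace=False)
    coef = rng.uniform(1.0, 2.0, size=n_informative)
    # Only a few "wavelengths" carry signal; the rest are noise.
    y = 60.0 + X[:, informative] @ coef + rng.normal(scale=0.5, size=n_samples)
    return X, y


def fit_rf_with_selection(X_cal, y_cal, ntree, nsv, n_keep, seed=0):
    """Fit on all wavelengths, keep the n_keep most important, then refit."""
    full = RandomForestRegressor(n_estimators=ntree, max_features=nsv, random_state=seed)
    full.fit(X_cal, y_cal)
    # Mean decrease in impurity, averaged over the trees in the forest.
    ranked = np.argsort(full.feature_importances_)[::-1]
    selected = np.sort(ranked[:n_keep])
    # Assumption: nsv cannot exceed the number of retained wavelengths.
    reduced = RandomForestRegressor(
        n_estimators=ntree, max_features=min(nsv, n_keep), random_state=seed
    )
    reduced.fit(X_cal[:, selected], y_cal)
    return reduced, selected


if __name__ == "__main__":
    X, y = make_synthetic_spectra()
    X_cal, y_cal, X_pred, y_pred_ref = X[:80], y[:80], X[80:], y[80:]
    # The abstract reports ntree=471, nsv=103 and 52 wavelengths;
    # smaller illustrative values are used here for the synthetic data.
    model, selected = fit_rf_with_selection(X_cal, y_cal, ntree=100, nsv=30, n_keep=20)
    y_hat = model.predict(X_pred[:, selected])
    print("selected wavelength indices:", selected)
    print("RMSEp:", rmse(y_pred_ref, y_hat), "Rp:", corr(y_pred_ref, y_hat))
```

In practice ntree and nsv would be tuned over a grid, with each candidate scored on a validation set, before the prediction set is touched; the sketch fixes them for brevity.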

Transactions of the Chinese Society for Agricultural Machinery (Transactions of the CSAM), supervised by the China Association for Science and Technology (CAST) and sponsored by the CSAM and the Chinese Academy of Agricultural Mechanization Science (CAAMS), started publication in 1957. It is the earliest Chinese-language interdisciplinary journal combining agriculture and engineering. It closely follows the development of the agricultural engineering disciplines, and its published papers represent the highest academic level of agricultural engineering in China. Nearly 8,000 papers have been published to date. Around 3,000 papers are submitted to the journal each year, of which only around 600 are accepted. Transactions of the CSAM covers a wide range of topics, including agricultural machinery, irrigation, electronics, robotics, agro-products engineering, biological energy, and agricultural structures and environment. The journal is indexed by many internationally known index systems, such as EI Compendex, CA, and CSA.
