Experimental Optimization of Big Data Cleaning Method for Agricultural Machinery

Fund Project: National Key Research and Development Program of China (2017YFD0700205)

    Abstract:

    Data quality is a bottleneck in the development of agricultural machinery big data platforms. Existing data cleaning algorithms are poorly suited to real-time agricultural machinery data, which is large-scale, multi-source and heterogeneous, high-dimensional, and strongly correlated in space and time. To address this, the sources and characteristics of abnormal agricultural machinery operation data in complex field environments were analyzed, techniques for detecting and correcting abnormal data were studied, and an online cleaning method for agricultural machinery operation data based on a sliding window mechanism was proposed. The method identifies abnormal data using a variance constraint principle, generates candidate repair values using a minimum change principle, and then exploits the temporal correlation of the data to obtain the final repair value through iterative optimization with AR and ARX models. It runs on the Flink distributed computing platform to accommodate the high throughput and high concurrency of agricultural machinery data. The algorithm was validated on agricultural machinery operation data from one province. With a data volume of 1×10^5 records and an anomaly rate of 5%, the anomaly recognition rate reached 0.94, and the root mean square error was smaller than that of existing cleaning algorithms. Experiments were designed with the Box-Behnken method, and regression models obtained through response surface analysis were used to analyze the influence of the algorithm parameters on root mean square error and running time. The parameters were then optimized with a binary-coded hybrid genetic algorithm; with the optimized parameter combination, the root mean square error reached 0.16 and the running time reached 0.13 s. The method can provide high-quality data support for real-time processing on agricultural machinery big data platforms.
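    The abstract names the stages of the method but not their formulas, so the following is only a minimal Python sketch of the general idea, not the authors' algorithm. It assumes a single numeric signal, a fixed-size sliding window of already cleaned values, a band of k standard deviations around the window mean as the variance constraint, clamping to that band as the minimum-change candidate, and an AR(1)-style one-step prediction averaged with the candidate as a stand-in for the AR/ARX iterative optimization. The function name `clean_stream` and the parameters `window`, `k`, `phi`, and `min_sigma` are hypothetical, and the sketch runs in plain Python rather than on Flink.

```python
from collections import deque
from statistics import mean, pstdev


def clean_stream(values, window=5, k=3.0, phi=0.8, min_sigma=1e-6):
    """Flag and repair outliers in a 1-D stream using a sliding window.

    All parameters are illustrative assumptions, not values from the paper.
    """
    history = deque(maxlen=window)  # most recent cleaned values
    cleaned, flags = [], []
    for x in values:
        if len(history) < window:
            # Warm-up: not enough context yet, accept the reading as-is.
            history.append(x)
            cleaned.append(x)
            flags.append(False)
            continue

        mu = mean(history)
        sigma = max(pstdev(history), min_sigma)
        low, high = mu - k * sigma, mu + k * sigma

        if low <= x <= high:
            repaired, is_anomaly = x, False
        else:
            # Minimum-change candidate: nearest value that satisfies the band.
            candidate = min(max(x, low), high)
            # AR(1)-style one-step prediction around the window mean.
            prediction = mu + phi * (history[-1] - mu)
            # Assumed refinement: average the candidate and the prediction.
            repaired, is_anomaly = 0.5 * (candidate + prediction), True

        history.append(repaired)  # repaired values feed later windows
        cleaned.append(repaired)
        flags.append(is_anomaly)
    return cleaned, flags


# Made-up readings with one obvious spike at index 5.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 50.0, 10.1, 9.9]
cleaned, flags = clean_stream(readings)
print(flags)
print([round(v, 3) for v in cleaned])
```

    Feeding repaired values back into the window keeps one spike from inflating the variance used to judge later readings. In a streaming deployment the per-record loop body would presumably live inside a keyed, stateful operator, but that mapping is an assumption and is not described in the abstract.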

Cite this article

YUAN Yanwei, XU Ling, JI Fuhua, GUO Dafang, AN Sa, NIU Kang. Experimental Optimization of Big Data Cleaning Method for Agricultural Machinery[J]. Transactions of the Chinese Society for Agricultural Machinery, 2021, 52(6): 35-42.

History
  • Received: 2020-09-27
  • Published online: 2021-06-10
  • Published: 2021-06-10