Abstract: To address the difficulty of annotation, the complexity of features, and the low precision of boundary extraction for solid waste in remote sensing imagery, a two-stage method based on weakly supervised learning was proposed. In the first stage, an image-level labeled dataset was used to compare five network models, and the Swin Transformer was selected as the feature learning model. Gradient-weighted class activation mapping (Grad-CAM) was then applied to visualize the feature regions as heatmaps, which were processed with a combination of adaptive thresholding and color difference methods to obtain a coarse outline of the solid waste. In the second stage, the DeepSnake model was used to refine the coarse outlines into fine contours. Experiments were conducted on unmanned aerial vehicle (UAV) multispectral remote sensing images of six typical urban-rural interface areas in the Langfang Development Zone, Hebei Province. In the first stage, the Swin Transformer showed a pronounced advantage over the other four network models in the quantitative analysis of feature extraction, with a precision of 93.8%, a recall of 95.0%, and an F1 score of 94.4%; visualization of the attention regions also indicated that it had the best recognition effect. For coarse outline extraction, the combination of adaptive thresholding and color difference proved superior in the binary comparison experiment. In the second stage, fine contour extraction was evaluated quantitatively with the average precision (AP) metric of the COCO dataset, yielding an AP of 91.3% at an IoU threshold of 0.5 and 77.5% at an IoU threshold of 0.75; a qualitative comparison of the contours from the first and second stages further highlighted the optimization effect of DeepSnake.
The results demonstrated that the proposed method can accurately identify and extract solid waste by using an image-level labeled dataset, offering a pronounced accuracy advantage and providing a viable approach for ecological environment management in urban and rural areas of China.
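
As an illustration of the evaluation metrics named above, the following minimal Python sketch shows how an F1 score follows from precision and recall, and how the intersection over union (IoU) of two binary masks can be computed. It is not the evaluation code used in this study: the function names `f1_score` and `mask_iou` and the toy masks are hypothetical, and the COCO AP metric involves additional steps (ranking detections by confidence and integrating the precision-recall curve) that are not reproduced here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mask_iou(mask_a, mask_b):
    """IoU of two binary masks given as equally sized nested lists of 0/1 values."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty masks have no defined overlap; return 0.0 by convention here.
    return intersection / union if union else 0.0


# Precision and recall reported for the Swin Transformer in the abstract.
swin_f1 = f1_score(0.938, 0.950)

# Toy masks (hypothetical, for illustration only): predicted vs. reference region.
predicted = [[1, 1, 0],
             [1, 1, 0],
             [0, 0, 0]]
reference = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
toy_iou = mask_iou(predicted, reference)

# A prediction counts as a match at a given IoU threshold only if its IoU
# with the reference reaches that threshold (0.5 or 0.75 in the abstract).
matches_at_050 = toy_iou >= 0.5
```

With the reported precision and recall, `swin_f1` rounds to 0.944, consistent with the stated F1 score of 94.4%. The toy masks overlap in 2 of 6 occupied pixels, so they would not count as a match at either IoU threshold.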