Abstract: To address the difficulty of accurately assessing residual film coverage during the operation of residual film recovery machinery, caused by interference from visually similar non-target scenes, complex background textures in target scene images, and the small size, high fragmentation, and irregular contours of residual film, a residual film recognition method based on vehicle-mounted imaging and deep convolutional neural networks was proposed. A multi-feature-enhanced SE-DenseNet-DC classification model was developed by integrating channel attention mechanisms before and after the nonlinear composite functions in each dense block of the DenseNet121 architecture, strengthening the weighting of informative feature channels. In addition, the first convolutional layer of the original model was replaced with multi-scale cascaded dilated convolutions to enlarge the receptive field while preserving sensitivity to fine detail, enabling effective feature extraction from target scene images. Furthermore, a CDC-TransUnet segmentation model with enhanced detail representation and multi-scale feature fusion was constructed. CBAM modules were introduced into the encoder of the TransUnet framework to capture finer and more precise global features; DAB modules were embedded in the skip connections to fuse multi-scale semantic information and bridge the semantic gap between encoder and decoder features; and CCAF modules were incorporated into the decoder to mitigate detail loss during upsampling, achieving precise segmentation of residual film against the complex backgrounds of target scenes. Experimental results showed that the SE-DenseNet-DC model achieved a classification accuracy, precision, recall, and F1 score of 96.26%, 91.54%, 94.49%, and 92.83%, respectively, on target scene image classification, while the CDC-TransUnet model achieved a mean intersection over union (MIoU) of 77.17% for surface residual film segmentation. The coefficient of determination (R²) between predicted and manually annotated film coverage was 0.92, with a root mean square error (RMSE) of 0.23% and an average relative error of 2.95%; the average evaluation time was 0.54 s per image. The method combines high accuracy with fast processing for real-time monitoring and evaluation of residual film coverage after recovery, providing solid technical support for quality assessment of residual film recovery operations.
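The abstract gives no implementation code; the following is a minimal PyTorch sketch of the two SE-DenseNet-DC modifications it describes: squeeze-and-excitation (SE) channel attention applied before and after the BN-ReLU-Conv composite function of a dense layer, and a multi-scale cascaded dilated-convolution stem replacing the first-layer convolution. All module names, layer widths, and dilation rates here are illustrative assumptions, not the authors' implementation, and the 1×1 bottleneck of the real DenseNet121 dense layer is omitted for brevity.

```python
# Minimal sketch of the SE-DenseNet-DC ideas; names and hyperparameters assumed.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention: re-weights feature channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # emphasize informative channels, suppress the rest

class SEDenseLayer(nn.Module):
    """Dense layer with SE attention before and after the BN-ReLU-Conv
    composite function, as the abstract describes for each dense block."""
    def __init__(self, in_channels: int, growth_rate: int = 32):
        super().__init__()
        self.se_in = SEBlock(in_channels)
        self.composite = nn.Sequential(
            nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, growth_rate, 3, padding=1, bias=False),
        )
        self.se_out = SEBlock(growth_rate)

    def forward(self, x):
        y = self.se_out(self.composite(self.se_in(x)))
        return torch.cat([x, y], dim=1)  # dense connectivity

class DilatedStem(nn.Module):
    """Cascaded dilated 3x3 convs replacing the first-layer convolution,
    enlarging the receptive field without discarding fine detail."""
    def __init__(self, out_channels: int = 64, dilations=(1, 2, 3)):
        super().__init__()
        layers, c_in = [], 3
        for d in dilations:
            layers += [
                nn.Conv2d(c_in, out_channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True),
            ]
            c_in = out_channels
        self.stem = nn.Sequential(*layers)

    def forward(self, x):
        return self.stem(x)
```

Under these assumptions, a DilatedStem followed by stacked SEDenseLayers reproduces the data path the abstract outlines; for example, `SEDenseLayer(64)(DilatedStem()(torch.randn(1, 3, 224, 224)))` yields a `(1, 96, 224, 224)` tensor.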
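Of the three modules added to TransUnet, only CBAM is a published, well-specified design (Woo et al., 2018); DAB and CCAF are specific to this work and the abstract gives no internals, so they are not sketched. Below is a standard CBAM implementation of the kind the encoder could insert; the reduction ratio and 7×7 kernel are the original paper's defaults, assumed here.

```python
# Standard CBAM: channel attention followed by spatial attention.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))  # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))   # global max pooling branch
        return x * torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)   # per-pixel channel mean
        mx = x.amax(dim=1, keepdim=True)    # per-pixel channel max
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Applies channel attention, then spatial attention, to a feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```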
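Finally, the reported evaluation reduces to computing a per-image coverage ratio from the binary segmentation mask and comparing it with manual annotation via R², RMSE, and mean relative error. A minimal NumPy sketch, with illustrative function names:

```python
# Illustrative coverage evaluation; function names are assumptions.
import numpy as np

def coverage_rate(mask: np.ndarray) -> float:
    """Residual film coverage = film pixels / total pixels in the mask."""
    return float(mask.astype(bool).mean())

def evaluate_coverage(pred: np.ndarray, truth: np.ndarray):
    """R^2, RMSE, and mean relative error between predicted and manually
    annotated coverage rates (both in %). Assumes truth values are nonzero."""
    resid = truth - pred
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((truth - truth.mean()) ** 2)
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mre = float(np.mean(np.abs(resid) / truth))
    return float(r2), rmse, mre
```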