Abstract: Accurate information on the spatial distribution of tea plantations provides scientific support for land use planning and the optimization of planting layouts, contributing to the sustainable development of the tea industry. Multimodal remote sensing features of tea plantations were constructed from the RGB bands of GF-2 PMS imagery, NDVI calculated from Sentinel-2 optical imagery, phenological characteristics derived from Sentinel-1 time-series SAR data (growth amplitude (GA) and growth length (GL)), and slope aspect, slope gradient, and curvature calculated from GF-7 stereo imagery. The optimal feature combination was selected with a random forest feature selection algorithm. A dual-branch network, the multi-modal information parallel branch network (MIPBNet), was built using a multi-network joint learning strategy, with an attentional multiscale lightweight encoder-decoder network (AMLNet) as the first branch and a vanilla AMLNet as the second. A feature fusion module, the dual-branch feature fusion block (DBFF), performed feature-level fusion at the end of the decoder, and a composite loss function was employed for optimization during training. The findings were as follows: the combination of NDVI, GA, slope aspect, and slope gradient improved classification accuracy the most and was identified as the optimal multimodal feature set. When the RGB data were sequentially augmented with NDVI, GA, slope aspect, and slope gradient, experiments showed a marked reduction in both omitted and falsely extracted tea plantation areas, with an improvement in overall accuracy (OA) of 3.11%. Compared with typical semantic segmentation models such as UNet, UNeXt, and Segformer, the single-branch AMLNet within MIPBNet achieved superior tea plantation extraction results.
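As a point of reference for the feature-selection step mentioned above, the sketch below shows how a random-forest importance ranking over the candidate multimodal features could be carried out. It is a minimal illustration, not the authors' implementation: the file names, sampling scheme, and hyperparameters are hypothetical assumptions, and only the candidate feature list (NDVI, GA, GL, slope aspect, slope gradient, curvature) follows the abstract. NDVI itself is the standard (NIR − Red)/(NIR + Red) index, computed here from Sentinel-2 bands B8 and B4.

```python
# Minimal sketch of random-forest feature ranking for the candidate
# multimodal features. Data loading and parameters are illustrative
# assumptions, not the paper's exact procedure.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-pixel samples: each row stacks the candidate features
# (NDVI from Sentinel-2, GA/GL from Sentinel-1 phenology, terrain variables
# from GF-7 stereo imagery); labels are 1 = tea plantation, 0 = other.
feature_names = ["NDVI", "GA", "GL", "aspect", "slope", "curvature"]
X = np.load("training_samples.npy")  # shape (n_pixels, 6); assumed file
y = np.load("training_labels.npy")   # shape (n_pixels,); assumed file

rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
rf.fit(X, y)

# Rank candidates by mean decrease in Gini impurity; the abstract reports
# that NDVI, GA, aspect, and slope formed the best-performing subset.
ranking = sorted(zip(feature_names, rf.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Under this kind of ranking, the top-scoring features would then be combined with the RGB bands as additional input channels to the segmentation network, which matches the sequential augmentation experiment described in the abstract.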