Abstract: The rational application of nitrogen fertilizer during rice growth exerts a decisive influence on development and yield. Traditional unimodal data, such as spectral or image data, struggle to comprehensively capture the complex physiological states and nitrogen response mechanisms of rice seedlings. To address this, a multi-modal fusion network (MMFN) was constructed for identifying nitrogen application levels. A multi-modal dataset was compiled comprising near-infrared spectral data, leaf images and growth indicators of rice seedlings subjected to different nitrogen application concentrations. A recognition model based on the MMFN was developed by incorporating an improved channel attention mechanism (ICAM) and a concatenation mechanism, enabling the fusion of physical growth metrics with image feature information. The experimental results showed that the fusion model achieved an accuracy of 97.55%, a recall of 95.34%, a precision of 95.87% and an F1 score of 95.72% in identifying the nitrogen application level. Through collaborative optimisation of multi-modal data, the proposed MMFN effectively extracted complementary information across modalities, demonstrating significant superiority over single-modal approaches and enhancing the accuracy and robustness of nitrogen application level identification. The model can offer reliable technical support for precise rice nitrogen monitoring and fertility regulation.
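The fusion step described in the abstract, channel attention over image features followed by concatenation with tabular growth indicators, can be sketched as follows. The abstract does not specify what the "improvement" in ICAM is, so this sketch uses a standard squeeze-and-excitation form of channel attention; all function names, weight shapes and dimensions here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Sketch of channel attention (squeeze-and-excitation style; the
    paper's actual ICAM modification is not given in the abstract).
    feat: image feature map of shape (C, H, W)."""
    c = feat.shape[0]
    squeeze = feat.reshape(c, -1).mean(axis=1)        # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)            # bottleneck FC + ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # FC + sigmoid -> per-channel weights
    return feat * weights[:, None, None]              # reweight each channel

def fuse(image_feat, growth_vec, w1, w2):
    """Concatenation fusion of attended image features with growth
    indicators (e.g. physical growth metrics), as the abstract describes."""
    attended = channel_attention(image_feat, w1, w2)
    pooled = attended.reshape(attended.shape[0], -1).mean(axis=1)  # (C,)
    return np.concatenate([pooled, growth_vec])       # joint feature vector
```

The fused vector would then feed a classifier head that predicts the nitrogen application level; spectral features could be appended to the concatenation in the same way.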