Abstract: To address the current lack of agricultural video processing technologies, such as agricultural video shot detection and agricultural video annotation, this work studied video scene detection, video semantic annotation, and disease diagnosis algorithms for agriculture, and constructed a semantic scene detection model oriented toward semantic mining. First, a vegetable video shot database was established using multimodal-fusion video segmentation. Second, through a study of the vegetable knowledge system, a Chinese dictionary of vegetable knowledge was constructed. Finally, by analyzing target recognition techniques in multiple modalities, a multimodal-fusion-based semantic annotation model for vegetable scenes was presented. The semantic similarity of the recognition results from the three modalities was measured with HowNet, and a video semantic scene detection model was built on this basis: the semantic similarity of adjacent shots was measured, and semantically similar adjacent shots were clustered, so that the resulting video scenes are better suited to content-based video retrieval. Experimental results showed that the proposed method achieved an accuracy of 96.9% on vegetable semantic scene detection. The method can resolve the ambiguity that existing algorithms exhibit when annotating vegetable videos, and it should help realize professional and objective semantic annotation of vegetable video scenes.
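The scene detection step (measure the semantic similarity of adjacent shots, then cluster semantically similar neighbours into one scene) can be illustrated with a minimal sketch. It is not the authors' implementation. Each shot is assumed to carry a set of semantic labels already fused from its three modalities. HowNet-based similarity is not reproduced here, so Jaccard overlap of the label sets serves as a stand-in. The threshold of 0.5 and the example labels are hypothetical values chosen only for illustration.

```python
from typing import Callable, List, Set

# Assumption: a shot's annotation is a set of semantic labels fused from
# its visual, audio, and text modalities (hypothetical representation).
Annotation = Set[str]


def jaccard(a: Annotation, b: Annotation) -> float:
    """Stand-in for HowNet-based semantic similarity (assumption):
    size of the label intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_adjacent_shots(
    shots: List[Annotation],
    similarity: Callable[[Annotation, Annotation], float] = jaccard,
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> List[List[int]]:
    """Group consecutive shot indices into scenes: a shot joins the current
    scene if its similarity to the previous shot is >= threshold, and
    otherwise starts a new scene."""
    scenes: List[List[int]] = []
    for i, shot in enumerate(shots):
        if scenes and similarity(shots[i - 1], shot) >= threshold:
            scenes[-1].append(i)
        else:
            scenes.append([i])
    return scenes


# Hypothetical fused annotations for four consecutive shots.
example_shots = [
    {"tomato", "leaf", "blight"},
    {"tomato", "blight"},
    {"cucumber", "irrigation"},
    {"cucumber", "irrigation", "greenhouse"},
]
print(cluster_adjacent_shots(example_shots))  # [[0, 1], [2, 3]]
```

Because only adjacent shots are compared, each scene stays a contiguous run of shots, which keeps the scenes aligned with the video timeline for retrieval. In practice the stand-in `jaccard` would be replaced by a HowNet-based similarity function passed through the `similarity` parameter.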