Abstract: Text mining for crop diseases and insect pests is becoming increasingly important as the number of documents on the subject grows rapidly. Effective, highly accurate named entity recognition (NER) systems for crop diseases and insect pests can help extract findings from research reports and support suggestions for disease and pest control. To address the lack of an annotated corpus, a Stopwait algorithm based on semi-remote supervision was proposed to construct a corpus of Chinese crop diseases and insect pests. In addition, an agricultural information extraction (Agr-IE) method was proposed. The method is based on BERT-BILSTM-CRF and uses multi-source word segmentation information and global lexical embeddings to enrich the character vectors before the character information is integrated. Experiments with Agr-IE on the crop disease and insect pest datasets showed that the model can effectively distinguish four types of entities: the F1 scores for diseases, pests, pharmaceuticals, and plants were 96.56%, 95.12%, 94.48%, and 95.54%, respectively. The model also performed well in identifying pathogen entities (81.48% F1 score), which was higher than the corresponding values for BERT-BILSTM-CRF, BERT, and other models. The proposed model was also compared with CAN-NER, Lattice-LSTM-CRF, and other models on the MSRA and Weibo datasets, where it obtained the best recognition results, with F1 scores of 95.80% and 94.57%, respectively, which indicates that the algorithm has good generalization ability and stability.
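The per-entity F1 scores reported above can be illustrated with a minimal sketch of entity-level scoring. It assumes BIO-style tags and exact-match span scoring, neither of which the abstract states; the labels `DIS` and `PEST` and the helper names `extract_spans` and `f1_by_type` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# (entity type, start index, end index exclusive); hypothetical representation
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def f1_by_type(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Exact-match F1 per entity type; a span counts only if type and bounds agree."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    scores: Dict[str, float] = {}
    for etype in {s[0] for s in gold_spans | pred_spans}:
        g = {s for s in gold_spans if s[0] == etype}
        p = {s for s in pred_spans if s[0] == etype}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        denom = precision + recall
        scores[etype] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy example: the disease span matches exactly, the pest span is truncated.
gold_tags = ["B-DIS", "I-DIS", "O", "B-PEST", "I-PEST", "O"]
pred_tags = ["B-DIS", "I-DIS", "O", "B-PEST", "O", "O"]
scores = f1_by_type(gold_tags, pred_tags)
```

Under exact-match scoring, a prediction with a wrong boundary counts as both a false positive and a false negative, so partially correct entities receive no credit.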