Abstract: To address overlapping triples and the difficulty of extracting nested and complex entities from text in the field of citrus diseases and pests, a joint entity-relation extraction method, the dual-pointer network annotation-cascade binary tagging framework for relational triple extraction (DPNA-CASREL), was proposed. The encoder combined the pre-trained model RoBERTa-wwm-ext (robustly optimized BERT pre-training approach with whole word masking and extended training data) with a bi-directional long short-term memory (BiLSTM) network to obtain multi-dimensional vector encodings of the text. According to the semantic characteristics of citrus diseases and pests, a decoding network with dual-pointer network annotation was designed: a multi-level pointer network annotation method was introduced for decoding the head entity, and a complex entity labeling strategy was adopted in the tail entity decoding network to improve the model's extraction performance for complex entities. In this way, entity-relation triples were extracted synchronously, and the problems of overlapping triples and nested entities were addressed. Experimental results on a self-built citrus diseases and pests dataset showed that the precision, recall, and F1-score of the DPNA-CASREL model reached 82.12%, 81.97%, and 82.05%, respectively, which were superior to those of the other models compared. Compared with CASREL, the F1-scores for nested and complex entity extraction improved by 8.16 and 6.58 percentage points, respectively. The method can effectively handle entity nesting and unclear entity boundaries, and it can provide a basis for citrus diseases and pests knowledge-graph construction and other downstream tasks.
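To illustrate why a single pair of start/end pointers can struggle with nested entities, and how several pointer levels may help, the following is a minimal sketch and not the authors' implementation. It assumes the nearest-end pairing heuristic commonly used when decoding CASREL-style binary tags. The function names, the token example, and the assignment of spans to levels are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start index, end index), both inclusive


def decode_spans(starts: List[int], ends: List[int]) -> List[Span]:
    """Pair each start tag with the nearest end tag at or after it."""
    spans = []
    for i, s in enumerate(starts):
        if s != 1:
            continue
        for j in range(i, len(ends)):
            if ends[j] == 1:
                spans.append((i, j))
                break  # nearest end only; later ends are ignored
    return spans


def decode_multilevel(levels: List[Tuple[List[int], List[int]]]) -> List[Span]:
    """Decode each pointer level independently and merge the spans.

    Assumption: every level carries its own start/end tag sequences,
    so spans that share a boundary can be placed on different levels.
    """
    spans = set()
    for starts, ends in levels:
        spans.update(decode_spans(starts, ends))
    return sorted(spans)


# Hypothetical example: "citrus canker" is nested in "citrus canker pathogen".
tokens = ["citrus", "canker", "pathogen"]

# Single level: both entities start at token 0, so only the shorter span survives.
single = decode_spans([1, 0, 0], [0, 1, 1])

# Two levels: the inner and outer entities are tagged on separate levels.
multi = decode_multilevel([
    ([1, 0, 0], [0, 1, 0]),  # level 1: "citrus canker"
    ([1, 0, 0], [0, 0, 1]),  # level 2: "citrus canker pathogen"
])

print([" ".join(tokens[a:b + 1]) for a, b in single])
print([" ".join(tokens[a:b + 1]) for a, b in multi])
```

In this sketch the single-level decoder recovers only the inner entity, whereas the two-level decoder recovers both the inner and the outer one. How the actual model assigns entities to levels is not specified in the abstract.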