Abstract: To address feature sparsity, domain-specific terminology, and the difficulty of capturing semantic associations within the text in question intent recognition for the tobacco domain, a feature fusion method was proposed that combines sentence-bidirectional encoder representations from transformers with an Attention mechanism and latent Dirichlet allocation (SBERT-Attention-LDA) and multi-layer long short-term memory (ML-LSTM) feature fusion. The method first dynamically encoded the tobacco question with the pre-trained SBERT model combined with the Attention mechanism, converting it into semantically rich feature vectors; in parallel, the LDA model produced a topic vector that captured the topic information in the question. A modified model-level ML-LSTM feature fusion method, using a three-layer LSTM, then merged the two into a joint feature representation with more complete and accurate question semantics. Finally, a 3-channel convolutional neural network (CNN) extracted the hidden features from this hybrid semantic representation and fed them into a fully connected layer and a Softmax function to classify the question intent. Compared with CNN models built on bidirectional encoder representations from transformers (BERT) and enhanced representation through knowledge integration (ERNIE) embeddings, the F1 value improved by 2.07 and 2.88 percentage points, respectively, which supports the construction of a Q&A system for tobacco websites.
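The data flow described in the abstract can be illustrated with a toy, standard-library-only sketch. It is not the authors' implementation. Random vectors stand in for SBERT token embeddings, a hand-written distribution stands in for the LDA topic vector, plain concatenation stands in for the ML-LSTM fusion, and "3-channel" is assumed here to mean three convolution kernel widths. All function and parameter names (`attention_pool`, `conv1d_maxpool`, `classify`, `random_params`) are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(token_vecs, query):
    """Weight token vectors by softmax(token . query) and sum them."""
    weights = softmax([dot(t, query) for t in token_vecs])
    dim = len(token_vecs[0])
    return [sum(w * t[i] for w, t in zip(weights, token_vecs)) for i in range(dim)]


def conv1d_maxpool(vec, kernel):
    """Slide one 1-D kernel over vec, apply ReLU, keep the largest response."""
    k = len(kernel)
    responses = [max(0.0, dot(vec[i:i + k], kernel)) for i in range(len(vec) - k + 1)]
    return max(responses)


def classify(token_vecs, topic_vec, params):
    """Return a probability for each intent class."""
    semantic = attention_pool(token_vecs, params["query"])
    # Assumption: concatenation replaces the paper's ML-LSTM fusion step.
    fused = semantic + topic_vec
    # Assumption: one convolution channel per kernel width.
    features = [conv1d_maxpool(fused, kernel) for kernel in params["kernels"]]
    # Fully connected layer followed by softmax.
    logits = [dot(features, row) + b for row, b in zip(params["weights"], params["bias"])]
    return softmax(logits)


def random_params(embed_dim, n_classes, kernel_widths=(2, 3, 4), seed=0):
    """Untrained, randomly initialised parameters for the sketch."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.uniform(-1, 1) for _ in range(n)]

    return {
        "query": rand(embed_dim),
        "kernels": [rand(w) for w in kernel_widths],
        "weights": [rand(len(kernel_widths)) for _ in range(n_classes)],
        "bias": [0.0] * n_classes,
    }


# Toy input: 5 tokens with 8-dimensional embeddings and a 3-topic distribution.
rng = random.Random(1)
tokens = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
topics = [0.7, 0.2, 0.1]
params = random_params(embed_dim=8, n_classes=4)
probs = classify(tokens, topics, params)
predicted_intent = probs.index(max(probs))
```

Because the parameters are random and untrained, the predicted class carries no meaning; the sketch only shows how a semantic vector and a topic vector might be combined before convolutional feature extraction and Softmax classification.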