Newswise — Lexical simplification (LS) aims to simplify a sentence by replacing complex words with simpler ones without changing the sentence's meaning, which can make text easier to comprehend for non-native speakers and children. Traditional LS methods use linguistic databases or word-embedding models to extract synonyms or highly similar words for a complex word, and then rank them by their appropriateness in context.

Recently, BERT-based LS methods entirely or partially mask the complex word in the original sentence and then feed the sentence into the pretrained language model BERT, taking the top-probability tokens at the masked position as substitute candidates. By making full use of the context of the complex word, these methods have made remarkable progress in generating substitutes and effectively alleviate the shortcomings of traditional methods. However, the paucity of annotated LS data limits the applicability of BERT to this task, which leads to the following two limitations.
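To make the masking procedure concrete, here is a minimal sketch of BERT-based substitute generation using Hugging Face Transformers; the checkpoint, example sentence, and top-k value are illustrative choices, not taken from the paper:

```python
# Sketch of BERT-based LS candidate generation: mask the complex word,
# then read BERT's top-probability tokens at the masked position.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The committee reached a unanimous decision."
complex_word = "unanimous"

# Replace the complex word with the [MASK] token.
masked = sentence.replace(complex_word, tokenizer.mask_token)
inputs = tokenizer(masked, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits

# The top-probability tokens at the masked position are the candidates.
top_ids = logits[0, mask_pos].topk(10).indices
print(tokenizer.convert_ids_to_tokens(top_ids))
```

Note that because the complex word is replaced by [MASK] before prediction, the model never sees the word itself, which is exactly the second limitation discussed below.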

(1) BERT is a self-supervised pretrained model trained to recover corrupted text, so it does not explicitly learn the word-substitution operation.

(2) Masking the complex word discards its semantic information, so the generated substitutes may fail to preserve the sentence's meaning.

To address these limitations, we treat LS as a single-word generation task and propose an unsupervised LS method, PaGeLS, based on non-autoregressive paraphrase generation. After training an encoder-decoder model on a paraphrase corpus, we feed the sentence into the encoder and let the decoder predict the probability distribution over the vocabulary at the position of the complex word's hidden representation; the words with the highest probabilities are chosen as candidates. Compared with pretrained BERT, PaGeLS incorporates three kinds of information: the semantic information of the complex word, the context of the complex word, and the meaning of the original sentence.
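The paper trains its own non-autoregressive paraphrase model, which is not reproduced here. As a hypothetical stand-in, the sketch below uses an off-the-shelf BART checkpoint to illustrate the core idea: feed the sentence through the encoder-decoder and read the decoder's vocabulary distribution at the complex word's position, without masking the word:

```python
# Hypothetical illustration of the PaGeLS idea with off-the-shelf BART.
# Model choice and decoding details are assumptions; the paper's actual
# non-autoregressive paraphrase model is trained on a paraphrase corpus.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
model.eval()

sentence = "The committee reached a unanimous decision."
complex_word = " unanimous"  # leading space: BART uses byte-level BPE

inputs = tokenizer(sentence, return_tensors="pt")
# Feed the sentence to both encoder and decoder so the decoder emits a
# distribution for every position in parallel.
with torch.no_grad():
    logits = model(input_ids=inputs.input_ids,
                   decoder_input_ids=inputs.input_ids).logits

# Locate the complex word (assumed to map to a single token here).
word_id = tokenizer.encode(complex_word, add_special_tokens=False)[0]
pos = (inputs.input_ids[0] == word_id).nonzero()[0, 0]

# BART's decoder is autoregressive, so the logits at pos - 1 predict the
# token at pos; a true non-autoregressive decoder reads pos directly.
top_ids = logits[0, pos - 1].topk(10).indices
print(tokenizer.convert_ids_to_tokens(top_ids))
```

Unlike the masking approach, the complex word stays in the input, so its own semantics can shape the predicted substitutes.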

In general, our contributions are as follows. (1) We propose an LS method, PaGeLS, that does not rely on any annotated LS data. To the best of our knowledge, PaGeLS is the first LS method to produce substitute candidates based on the essence of the LS task, i.e., without changing the sentence's meaning. (2) We propose a novel strategy for candidate ranking: we adopt the text-generation evaluation metric BARTScore to measure the relationship between the original sentence and each updated sentence (see the sketch after this list). We found that BARTScore is very well suited to candidate ranking; it outperforms previous state-of-the-art methods on three popular LS datasets when all methods are given the same substitution candidate list. (3) Experimental results show that PaGeLS achieves state-of-the-art results.
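BARTScore is essentially the average token log-likelihood of one sentence given the other under a sequence-to-sequence BART model. The following is a minimal from-scratch sketch of BARTScore-style candidate ranking; the checkpoint, complex word, and candidate list are illustrative, and the official implementation lives at https://github.com/neulab/BARTScore:

```python
# Rank substitute candidates by BARTScore-style log-likelihood of the
# updated sentence given the original sentence.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
model.eval()

def bart_score(src: str, tgt: str) -> float:
    """Average log p(tgt | src) per target token."""
    src_ids = tokenizer(src, return_tensors="pt").input_ids
    tgt_ids = tokenizer(tgt, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return the cross-entropy loss of
        # the target given the source; negate it for a log-likelihood.
        loss = model(input_ids=src_ids, labels=tgt_ids).loss
    return -loss.item()

original = "The committee reached a unanimous decision."
candidates = ["united", "joint", "common"]  # hypothetical substitutes

# Higher score means the updated sentence better preserves the original.
ranked = sorted(
    candidates,
    key=lambda w: bart_score(original, original.replace("unanimous", w)),
    reverse=True,
)
print(ranked)
```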

Journal Link: Frontiers of Computer Science, Apr-2023