Published November 17, 2023 | Version v1
Conference paper (Open Access)

Neutralization of Evaluative Expressions Based on Dictionary Data and Distributional Models

  • 1. Saint Petersburg State University

Description

Text style transfer (TST) is an important task in natural language generation that aims to change the stylistic properties of a text while preserving its style-independent content. Following the success of deep learning in the last decade, a variety of neural networks have been proposed for TST. When parallel data is available, sequence-to-sequence models are typically used; however, most use cases lack parallel data. This paper therefore presents three methods that require no parallel data for the automatic identification and replacement of obscene evaluative expressions in a text: one based on the online dictionary Wiktionary, and two based on transformer models (BERT, GPT-2). The methods are evaluated both manually and automatically on a toxic dataset extracted from the popular Russian social network VKontakte (VK). Experimental results demonstrate that the transformer-based (BERT) method achieves the highest average score (0.86) across style-strength and content-preservation metrics.
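As a rough illustration of the dictionary-based approach described above, the identification-and-replacement step can be sketched as a lexicon lookup with substitution. The word list and replacement map below are hypothetical stand-ins; the paper's actual lexicon is derived from Wiktionary, and the BERT variant would instead mask the offending token and let a masked language model propose a neutral substitute.

```python
import re

# Hypothetical stand-in for the Wiktionary-derived lexicon of
# obscene/evaluative expressions (the real list comes from the paper's data).
EVALUATIVE_LEXICON = {"moron", "idiot", "stupid"}

# Hypothetical neutral replacements; the BERT-based method would instead
# replace each flagged word with a [MASK] token and take the model's
# top fill-in candidate.
NEUTRAL_MAP = {"moron": "person", "idiot": "person", "stupid": "poor"}

def neutralize(text: str) -> str:
    """Replace lexicon words with neutral substitutes, keeping other content."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        return NEUTRAL_MAP.get(word.lower(), word)

    # Whole-word, case-insensitive match against the lexicon.
    pattern = r"\b(" + "|".join(map(re.escape, EVALUATIVE_LEXICON)) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

print(neutralize("Only a moron would write such a stupid comment."))
# prints: Only a person would write such a poor comment.
```

The same scaffolding supports the transformer variants: the detection step stays identical, and only the substitution step changes (masked-LM infilling for BERT, conditional generation for GPT-2).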

Files (846.3 kB)

Vyb.pdf (846.3 kB, md5:c91c070221652d6648c1cc033d97e670)