Published November 17, 2023
| Version v1
Conference paper
Open
Neutralization of Evaluative Expressions Based on Dictionary Data and Distributional Models
Authors/Creators
- 1. Saint Petersburg State University
- 2. Saint Petersburg State University
Description
Text style transfer (TST) is an important task in natural language generation that aims to change the stylistic properties of a text while preserving its style-independent content. With the success of deep learning algorithms over the last decade, a variety of neural networks have been proposed for TST. When parallel data is available, sequence-to-sequence models are usually used; however, most use cases lack parallel data. This paper therefore presents three methods, requiring no parallel data, for automatic identification and replacement of obscene evaluative expressions in a text: one based on the online dictionary Wiktionary, and two based on transformer models (BERT, GPT-2). The models are then evaluated manually and automatically on a toxic dataset extracted from the popular Russian social network VKontakte (VK). Experimental results demonstrate that the transformer-based (BERT) method achieves the highest average score (0.86) across style-strength and content-preservation metrics.
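The dictionary-based method described above can be sketched roughly as follows: look up each token in a list of evaluative expressions and substitute a neutral counterpart. This is a minimal illustration only; the word pairs below are invented placeholders standing in for Wiktionary data, not the paper's actual dictionary or pipeline.

```python
# Toy stand-in for a Wiktionary-derived mapping of evaluative words
# to neutral replacements (illustrative assumptions, not real data).
EVALUATIVE_TO_NEUTRAL = {
    "terrible": "bad",
    "idiotic": "unreasonable",
    "awesome": "good",
}

def neutralize(text: str) -> str:
    """Replace dictionary-listed evaluative tokens with neutral ones."""
    out = []
    for tok in text.split():
        # Strip simple punctuation so dictionary lookups match.
        core = tok.strip(".,!?")
        repl = EVALUATIVE_TO_NEUTRAL.get(core.lower())
        if repl is not None:
            # Keep the original token's trailing punctuation.
            out.append(tok.lower().replace(core.lower(), repl))
        else:
            out.append(tok)
    return " ".join(out)

print(neutralize("What an idiotic, terrible plan!"))
```

The transformer-based variants in the paper instead detect evaluative spans and generate replacements with BERT or GPT-2 rather than a fixed lookup table.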
Files

| Name | Size | md5 |
|---|---|---|
| Vyb.pdf | 846.3 kB | c91c070221652d6648c1cc033d97e670 |