MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts

Macko, Dominik; Kopál, Jakub; Móro, Róbert; Srba, Ivan

doi:10.18653/v1/2025.acl-long.36

Published July 27, 2025 | Version v1

Conference paper Open

1. Kempelen Institute of Intelligent Technologies

Recent LLMs can generate high-quality multilingual texts that humans cannot distinguish from authentic human-written ones. However, research in machine-generated text detection has mostly focused on English and on longer texts, such as news articles, scientific papers, or student essays. Social-media texts are usually much shorter and often feature informal language, grammatical errors, or distinct linguistic items (e.g., emoticons, hashtags). The ability of existing methods to detect such texts remains understudied, which is also reflected in the lack of multilingual benchmark datasets. To fill this gap, we propose MultiSocial, the first multilingual (22 languages) and multi-platform (5 social-media platforms) dataset for benchmarking machine-generated text detection in the social-media domain. It contains 472,097 texts, of which about 58k are human-written and approximately the same number is generated by each of 7 multilingual LLMs. We use this benchmark to compare existing detection methods in both zero-shot and fine-tuned form. Our results indicate that fine-tuned detectors can be trained on social-media texts without difficulty and that the choice of platform for training matters.

Files

Name	Size
2025.acl-long.36.pdf md5:94b47b8351723d0f76cb8ac08d0363ac	545.9 kB

Additional details

DOI: 10.18653/v1/2025.acl-long.36
arXiv: arXiv:2406.12549
DOI: 10.48550/arXiv.2406.12549

European Commission
AI-CODE - AI services for COntinuous trust in emerging Digital Environments 101135437
European Commission
VIGILANT - Vital IntelliGence to Investigate ILlegAl DisiNformaTion 101073921

