Conversational Networks For Automatic Online Moderation

Cécillon, Noé; Papégnies, Étienne; Labatut, Vincent; Dufour, Richard; Linarès, Georges

doi:10.5281/zenodo.11617245

Published March 6, 2024 | Version 2.0.0

Dataset Open

1. Avignon Université

Description. This repository contains several datasets of conversational networks, extracted from the chat messages exchanged by players of the SpaceOrigin MMORPG. Each graph represents a specific conversation and belongs to one of two classes: Abusive (1) or Non-abusive (0). Vertices represent users, and an edge indicates that the two connected users exchanged messages during the considered time period. Edges are weighted and directed: weights represent the intensity of the message exchanges, and directions represent who sent messages to whom.
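The graph model described above can be illustrated with a small in-memory sketch. It is not the file format used in the archive, which is not described here: the class name, the method names, the user names, and the numeric weights are all hypothetical and only show how a directed, weighted, labeled conversation graph might be held in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConversationGraph:
    """Toy stand-in for one conversational network (hypothetical structure)."""

    label: int  # 1 = Abusive, 0 = Non-abusive
    # Maps (sender, receiver) to a weight; the edge points from sender to receiver.
    edges: dict[tuple[str, str], float] = field(default_factory=dict)

    def add_exchange(self, sender: str, receiver: str, weight: float) -> None:
        # Accumulate the weight when the same ordered pair interacts again.
        key = (sender, receiver)
        self.edges[key] = self.edges.get(key, 0.0) + weight

    def out_strength(self, user: str) -> float:
        # Total weight of the edges leaving this user.
        return sum(w for (src, _), w in self.edges.items() if src == user)

    def in_strength(self, user: str) -> float:
        # Total weight of the edges arriving at this user.
        return sum(w for (_, dst), w in self.edges.items() if dst == user)

    def users(self) -> set[str]:
        # Vertices are the users that appear at either end of an edge.
        return {user for pair in self.edges for user in pair}


# Made-up conversation with three users, labeled Abusive (1).
toy = ConversationGraph(label=1)
toy.add_exchange("alice", "bob", 2.0)
toy.add_exchange("bob", "alice", 0.5)
toy.add_exchange("alice", "carol", 1.0)
```

Keeping the two directions of a pair as separate keys preserves who sent messages to whom, which an undirected structure would lose.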

We provide two types of graphs: unsigned and signed. Unsigned graphs were extracted using the method described in paper [1] below. Version 1.0 of this dataset contains only a subset of the conversations, subsampled to obtain balanced classes. Version 1.1 is extended to contain all available conversations, and it includes many more Non-abusive than Abusive conversations. Signed graphs were extracted later, using the method described in publication [9] below. In these graphs, each edge carries an additional sign that indicates the polarity of the messages exchanged by the two users: friendly (positive) vs. hostile (negative).
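To make the notion of a signed edge concrete, here is a minimal sketch. It assumes that a sign is encoded as +1 (friendly) or -1 (hostile); this encoding, the `SignedEdge` type, and the `hostile_share` function are hypothetical and are not taken from the archive or from publication [9].

```python
from typing import Iterable, NamedTuple


class SignedEdge(NamedTuple):
    """One directed, weighted, signed edge (hypothetical representation)."""

    sender: str
    receiver: str
    weight: float
    sign: int  # assumed encoding: +1 = friendly, -1 = hostile


def hostile_share(edges: Iterable[SignedEdge]) -> float:
    """Return the fraction of the total edge weight carried by hostile edges."""
    edges = list(edges)
    total = sum(edge.weight for edge in edges)
    if total == 0:
        # An empty or zero-weight graph has no hostile weight to report.
        return 0.0
    hostile = sum(edge.weight for edge in edges if edge.sign < 0)
    return hostile / total


# Made-up signed conversation: two hostile edges and one friendly edge.
toy_edges = [
    SignedEdge("alice", "bob", 2.0, -1),
    SignedEdge("bob", "alice", 1.0, -1),
    SignedEdge("carol", "alice", 1.0, +1),
]
share = hostile_share(toy_edges)
```

A summary of this kind is only an illustration of what the sign makes possible; it is not one of the features used in the papers listed below.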

These datasets were used to train a classifier to automatically recognize abusive messages. See the papers below for more details. The repository also contains some figures that appear in these papers.

Publications. The following papers used the unsigned version of the conversational networks. The extraction method is described in paper [1].

[1] É. Papégnies, V. Labatut, R. Dufour & G. Linarès. “Conversational Networks for Automatic Online Moderation,” IEEE Transactions on Computational Social Systems 6(1):38–55, 2019. ⟨hal-01999546⟩ DOI: 10.1109/tcss.2018.2887240
[2] É. Papégnies, R. Dufour, V. Labatut & G. Linarès. “Détection de messages abusifs au moyen de réseaux conversationnels,” in 8ème Conférence sur les modèles et l'analyse de réseaux : approches mathématiques et informatiques (MARAMI), 2017. ⟨hal-01614279⟩
[3] É. Papégnies, V. Labatut, R. Dufour & G. Linarès. “Graph-based Features for Automatic Online Abuse Detection,” in International Conference on Statistical Language and Speech Processing (SLSP), Springer, Lecture Notes in Computer Science 10583:70–81, 2017. ⟨hal-01571639⟩ DOI: 10.1007/978-3-319-68456-7_6
[4] N. Cécillon. “Exploration de descripteurs de plongements de graphes pour la détection de messages abusifs,” MSc Thesis, Université d'Avignon, 2019. ⟨dumas-04073337⟩
[5] N. Cécillon, V. Labatut, R. Dufour & G. Linarès. “Abusive Language Detection in Online Conversations by Combining Content- and Graph-based Features,” in International Workshop on Modeling and Mining Social-Media Driven Complex Networks, Frontiers in Big Data 2:8, 2019. ⟨hal-02130205⟩ DOI: 10.3389/fdata.2019.00008
[6] N. Cécillon, V. Labatut, R. Dufour & G. Linarès. “Tuning Graph2vec with Node Labels for Abuse Detection in Online Conversations,” in 11ème Conférence sur les modèles et l'analyse de réseaux : approches mathématiques et informatiques (MARAMI), 2020. ⟨hal-02993571⟩
[7] N. Cécillon, V. Labatut, R. Dufour & G. Linarès. “Graph embeddings for Abusive Language Detection,” Springer Nature Computer Science 2:37, 2021. ⟨hal-03042171⟩ DOI: 10.1007/s42979-020-00413-7
[8] N. Cécillon, R. Dufour & V. Labatut. “Approche multimodale par plongements de texte et de graphes pour la détection de messages abusifs,” Traitement Automatique des Langues 62:13–38, 2021. ⟨hal-03527016⟩

The following publications use the signed version of the graphs. The modified extraction method is described in publication [9].

[9] N. Cécillon. “Combining Graph and Text to Model Conversations: An Application to Online Abuse Detection,” PhD Thesis, Université d'Avignon, 2024. ⟨tel-04441308⟩

Funding. Part of this work was funded by a grant from the Provence-Alpes-Côte-d'Azur region (PACA, France) and the Nectar de Code company.

Citation. If you use this dataset, please cite paper [1] for the unsigned networks:

@Article{Papegnies2019,
author = {Papégnies, Étienne and Labatut, Vincent and Dufour, Richard and Linarès, Georges},
title = {Conversational Networks for Automatic Online Moderation},
journal = {IEEE Transactions on Computational Social Systems},
year = {2019},
volume = {6},
number = {1},
pages = {38-55},
doi = {10.1109/TCSS.2018.2887240},
}

and [9] for the signed ones:

@PhdThesis{Cecillon2024,
author = {Cécillon, Noé},
title = {Combining Graph and Text to Model Conversations: An Application to Online Abuse Detection},
school = {Université d'Avignon},
year = {2024},
type = {PhD Thesis},
address = {Avignon, FR},
url = {https://theses.fr/2024AVIG0100},
}

Files

Files (18.9 MB)

Name	Size	MD5
SpaceOrigin_graphs.zip	18.9 MB	f66923268374b1322f47780958be3925
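
The MD5 checksum listed for SpaceOrigin_graphs.zip can be used to check that a downloaded copy is intact. The sketch below relies only on the Python standard library; the function names and the idea that the archive sits at a local path of your choosing are assumptions made for illustration.

```python
import hashlib

# Checksum copied from the file listing for SpaceOrigin_graphs.zip.
EXPECTED_MD5 = "f66923268374b1322f47780958be3925"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `archive_is_intact("SpaceOrigin_graphs.zip")` would return `True` for an unmodified copy saved under that name in the current directory. MD5 is adequate for detecting a corrupted download, but it is not a safeguard against deliberate tampering.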

Additional details

Is documented by: Journal article: 10.1109/tcss.2018.2887240 (DOI); Conference paper: 10.3389/fdata.2019.00008 (DOI)
Is required by: Conference paper: 10.1007/978-3-319-68456-7_6 (DOI); Journal article: 10.1007/s42979-020-00413-7 (DOI)
Obsoletes: Dataset: 10.6084/m9.figshare.7442273 (DOI)

Conseil Régional Provence-Alpes-Côte d'Azur

Created: 2017
Updated: 2024

Repository URL: https://github.com/CompNet/Alert
Development Status: Inactive

	All versions	This version
Views	300	96
Downloads	308	16
Data volume	418.9 MB	302.6 MB
