Published March 6, 2024 | Version 2.0.0
Dataset | Open access

Conversational Networks For Automatic Online Moderation

Description

Description. This repository contains several datasets of conversational networks, extracted from the chat messages exchanged by players of the SpaceOrigin MMORPG. Each graph represents a specific conversation and belongs to one of two classes: Abusive (1) or Non-abusive (0). Vertices represent users, and an edge indicates that the two connected users exchanged messages during the considered time period. Edges are weighted and directed: weights represent the intensity of the message exchanges, and directions represent who sent messages to whom.
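To make the structure described above concrete, here is a minimal Python sketch of one conversational network held in memory: a class label plus a map of directed, weighted edges. The names (`ConversationGraph`, `add_exchange`, `out_strength`, `in_strength`) are hypothetical and say nothing about the file format inside the archive. Summing repeated exchanges onto the same edge is also an assumption made for illustration; the actual weighting scheme is the one defined in paper [1].

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConversationGraph:
    """Hypothetical in-memory model of one conversational network."""
    label: int  # 1 = Abusive, 0 = Non-abusive
    # edges[(sender, receiver)] = intensity of messages sent from sender to receiver
    edges: dict[tuple[str, str], float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Only the two classes described in the dataset are allowed
        if self.label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {self.label!r}")

    def add_exchange(self, sender: str, receiver: str, weight: float) -> None:
        # Assumption: repeated exchanges accumulate on the same directed edge
        key = (sender, receiver)
        self.edges[key] = self.edges.get(key, 0.0) + weight

    def vertices(self) -> set[str]:
        # Users are the endpoints of the edges
        return {user for edge in self.edges for user in edge}

    def out_strength(self, user: str) -> float:
        # Total weight of edges leaving `user` (how intensely they addressed others)
        return sum(w for (s, _), w in self.edges.items() if s == user)

    def in_strength(self, user: str) -> float:
        # Total weight of edges entering `user` (how intensely others addressed them)
        return sum(w for (_, r), w in self.edges.items() if r == user)
```

For example, after adding alice→bob (3.0), bob→alice (1.0), carol→bob (2.0) and again alice→bob (1.0), the graph has three vertices and three directed edges, alice has an out-strength of 4.0, and bob has an in-strength of 6.0. Keeping the edge direction as part of the key is what distinguishes "who sent messages to whom".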

We provide two types of graphs: unsigned and signed. Unsigned graphs were extracted using the method described in paper [1] below. Version 1.0 of this dataset contains only part of the conversations, subsampled to obtain balanced classes. Version 1.1 is extended to contain all available conversations, so Non-abusive conversations greatly outnumber Abusive ones. Signed graphs were extracted later, using the method described in publication [9] below. In these graphs, each edge carries an additional sign indicating the polarity of the messages exchanged by the two users: friendly (positive) vs. hostile (negative).
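The signed variant can be pictured as the same weighted, directed edges with one extra field. The Python sketch below assumes an encoding of +1 for friendly and -1 for hostile edges; that encoding and the helper names (`SignedEdge`, `split_by_sign`, `hostile_weight_share`) are hypothetical and are not taken from the archive or from publication [9]. It shows two simple operations such a structure enables: separating the friendly and hostile layers, and measuring how much of the exchange intensity is hostile.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SignedEdge:
    """Hypothetical record for one directed edge of a signed conversational graph."""
    sender: str
    receiver: str
    weight: float  # intensity of the exchange, as in the unsigned graphs
    sign: int      # assumed encoding: +1 = friendly, -1 = hostile

    def __post_init__(self) -> None:
        # Reject anything other than the two polarities described above
        if self.sign not in (1, -1):
            raise ValueError(f"sign must be +1 or -1, got {self.sign!r}")


def split_by_sign(edges: list[SignedEdge]) -> tuple[list[SignedEdge], list[SignedEdge]]:
    """Separate the friendly (positive) and hostile (negative) edges of a graph."""
    positive = [e for e in edges if e.sign > 0]
    negative = [e for e in edges if e.sign < 0]
    return positive, negative


def hostile_weight_share(edges: list[SignedEdge]) -> float:
    """Fraction of the total exchange intensity carried by hostile edges."""
    total = sum(e.weight for e in edges)
    if total == 0:
        return 0.0  # empty graph or all-zero weights: no hostility to measure
    return sum(e.weight for e in edges if e.sign < 0) / total
```

With a hostile edge alice→bob of weight 3.0 and a friendly edge bob→alice of weight 1.0, `hostile_weight_share` returns 0.75. Because the sign is stored separately from the weight, the unsigned view of the same graph is recovered simply by ignoring the `sign` field.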

These datasets were used to train classifiers that automatically detect abusive messages. See the papers below for more details. The repository also contains some figures that appear in these papers.
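A classifier typically needs each graph turned into a fixed-length feature vector. The Python sketch below computes a handful of standard directed-graph measures (vertex count, edge count, directed density, total edge weight, reciprocity) from an edge map of the form `{(sender, receiver): weight}`. This feature set is chosen purely for illustration and is not the one used in the papers above; the function name `graph_features` is hypothetical.

```python
from __future__ import annotations


def graph_features(edges: dict[tuple[str, str], float]) -> list[float]:
    """Hypothetical, simplified feature vector for one conversational graph."""
    vertices = {user for edge in edges for user in edge}
    n = len(vertices)
    m = len(edges)
    # Directed density: share of the n * (n - 1) possible ordered pairs that are linked
    density = m / (n * (n - 1)) if n > 1 else 0.0
    # Total intensity of all message exchanges in the conversation
    total_weight = float(sum(edges.values()))
    # Reciprocity: share of directed edges whose reverse edge also exists
    reciprocal = sum(1 for (u, v) in edges if (v, u) in edges)
    reciprocity = reciprocal / m if m else 0.0
    return [float(n), float(m), density, total_weight, reciprocity]
```

For the edges alice→bob (4.0), bob→alice (1.0) and carol→bob (2.0), this yields 3 vertices, 3 edges, a density of 0.5, a total weight of 7.0 and a reciprocity of 2/3 (only the alice/bob pair is mutual). One such vector per conversation, paired with its Abusive/Non-abusive label, is the kind of input a standard supervised classifier expects.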

Publications. The following papers used the unsigned version of the conversational networks. The extraction method is described in paper [1].

  • [1] É. Papégnies, V. Labatut, R. Dufour & G. Linarès, “Conversational Networks for Automatic Online Moderation,” IEEE Transactions on Computational Social Systems 6(1):38–55, 2019. ⟨hal-01999546⟩ DOI: 10.1109/tcss.2018.2887240
  • [2] É. Papegnies, R. Dufour, V. Labatut & G. Linarès, “Détection de messages abusifs au moyen de réseaux conversationnels” [Detection of abusive messages by means of conversational networks], in 8ème Conférence sur les modèles et l'analyse de réseaux : approches mathématiques et informatiques (MARAMI), 2017. ⟨hal-01614279⟩
  • [3] É. Papegnies, V. Labatut, R. Dufour & G. Linarès, “Graph-based Features for Automatic Online Abuse Detection,” in International Conference on Statistical Language and Speech Processing (SLSP), Springer, Lecture Notes in Computer Science 10583:70–81, 2017. ⟨hal-01571639⟩ DOI: 10.1007/978-3-319-68456-7_6
  • [4] N. Cécillon, “Exploration de descripteurs de plongements de graphes pour la détection de messages abusifs” [Exploring graph embedding descriptors for the detection of abusive messages], MSc Thesis, Université d'Avignon, 2019. ⟨dumas-04073337⟩
  • [5] N. Cécillon, V. Labatut, R. Dufour & G. Linarès, “Abusive Language Detection in Online Conversations by Combining Content- and Graph-based Features,” in International Workshop on Modeling and Mining Social-Media-Driven Complex Networks, Frontiers in Big Data 2:8, 2019. ⟨hal-02130205⟩ DOI: 10.3389/fdata.2019.00008
  • [6] N. Cécillon, V. Labatut, R. Dufour & G. Linarès, “Tuning Graph2vec with Node Labels for Abuse Detection in Online Conversations,” in 11ème Conférence sur les modèles et l'analyse de réseaux : approches mathématiques et informatiques (MARAMI), 2020. ⟨hal-02993571⟩
  • [7] N. Cécillon, V. Labatut, R. Dufour & G. Linarès, “Graph Embeddings for Abusive Language Detection,” Springer Nature Computer Science 2:37, 2021. ⟨hal-03042171⟩ DOI: 10.1007/s42979-020-00413-7
  • [8] N. Cécillon, R. Dufour & V. Labatut, “Approche multimodale par plongements de texte et de graphes pour la détection de messages abusifs” [A multimodal approach using text and graph embeddings for the detection of abusive messages], Traitement Automatique des Langues 62:13–38, 2021. ⟨hal-03527016⟩

The following publication uses the signed version of the graphs; the modified extraction method is described in [9].

  • [9] N. Cécillon, “Combining Graph and Text to Model Conversations: An Application to Online Abuse Detection,” PhD Thesis, Université d'Avignon, 2024. ⟨tel-04441308⟩

Funding. Part of this work was funded by a grant from the Provence-Alpes-Côte-d'Azur region (PACA, France) and the Nectar de Code company.

Citation. If you use this dataset, please cite paper [1] for the unsigned networks:


@Article{Papegnies2019,
  author    = {Papegnies, Étienne and Labatut, Vincent and Dufour, Richard and Linarès, Georges},
  title     = {Conversational Networks for Automatic Online Moderation},
  journal   = {IEEE Transactions on Computational Social Systems},
  year      = {2019},
  volume    = {6},
  number    = {1},
  pages     = {38-55},
  doi       = {10.1109/TCSS.2018.2887240},
}

and [9] for the signed ones:


@PhdThesis{Cecillon2024,
  author          = {Cécillon, Noé},
  title           = {Combining Graph and Text to Model Conversations: An Application to Online Abuse Detection},
  school          = {Université d'Avignon},
  year            = {2024},
  type            = {PhD Thesis},
  address         = {Avignon, FR},
  url             = {https://theses.fr/2024AVIG0100},
}

Files

SpaceOrigin_graphs.zip
Size: 18.9 MB
md5: f66923268374b1322f47780958be3925
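After downloading, the archive can be checked against the md5 checksum listed above. The Python sketch below reads the file in chunks with the standard `hashlib` module, so large files are not loaded into memory at once. The helper names `md5_of` and `verify_archive` are hypothetical, and the default path assumes the archive was saved under its original name in the current directory.

```python
import hashlib
from pathlib import Path

# Checksum published in the Files section of this record
EXPECTED_MD5 = "f66923268374b1322f47780958be3925"


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal md5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path=Path("SpaceOrigin_graphs.zip"), expected: str = EXPECTED_MD5) -> bool:
    """True if the file's md5 matches the expected value (case-insensitive)."""
    return md5_of(path) == expected.strip().lower()
```

Calling `verify_archive()` from the download directory returns `True` only if the local copy is byte-for-byte identical to the published file; a `False` result usually means an interrupted or corrupted download.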

Additional details

Related works

Is documented by
Journal article: 10.1109/tcss.2018.2887240 (DOI)
Conference paper: 10.3389/fdata.2019.00008 (DOI)
Is required by
Conference paper: 10.1007/978-3-319-68456-7_6 (DOI)
Journal article: 10.1007/s42979-020-00413-7 (DOI)
Obsoletes
Dataset: 10.6084/m9.figshare.7442273 (DOI)

Funding

Conseil Régional Provence-Alpes-Côte d'Azur

Dates

Created
2017
Updated
2024

Software

Repository URL
https://github.com/CompNet/Alert
Development Status
Inactive